I spent almost two days struggling to compile the Tesseract project on Windows and ran into many errors: missing DLLs, path issues, and so on. To keep it short, here are the complete steps to compile the Tesseract GitHub project on Windows 10, 8, 7 or XP.
First, you need a C++ compiler for Tesseract and a Linux terminal on Windows in order to run Linux commands. Download the following programs:
1. Visual Studio Community 2015. You can download it for FREE from this link (https://www.visualstudio.com)
2. Cygwin (64-bit). You can download it from https://www.cygwin.com
After downloading the programs, we first have to enable the C++ tools in Visual Studio and install the other dependencies through Cygwin.
1. Install Visual Studio Community 2015. After installation, open Visual Studio and go to File > New > Project, then go to “Visual C++” and install it.
2. Now install Cygwin (on drive C). In the package selection, check the following categories to install: Devel, Archive and Graphics. (You can pick only the packages you need, but I suggest installing everything from these categories.)
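Before going further, it can help to confirm from the Cygwin Terminal that the build tools actually got installed. The snippet below is only a minimal sketch: the helper name missing_tools is made up for illustration, and the tool list (git, cmake, make, gcc, g++) is an assumption about what this build needs, so adjust it to your setup.

```shell
#!/bin/sh
# Print every tool from the argument list that is not found on PATH.
# (missing_tools is a hypothetical helper name, not part of Cygwin.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumed tool list for this guide; adjust it to your setup.
missing=$(missing_tools git cmake make gcc g++)
if [ -n "$missing" ]; then
    echo "Missing tools (install them through the Cygwin setup program):"
    echo "$missing"
else
    echo "All build tools found."
fi
```

If anything is reported as missing, re-run the Cygwin setup program and select the matching package.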
Now that we have the compiler and the Linux terminal, we can compile Tesseract.
1. In Cygwin, the terminal's default folder is (C:\cygwin64\home\YOUR_USERNAME).
2. (Optional) If you want Linux commands to work in your Command Prompt, add Cygwin's bin directory to the Environment Variables on your computer. Go to the properties of My Computer, then Advanced system settings, then click “Environment Variables”. In the window that pops up, find and edit “Path” in the lower list. Click “New”, browse to Cygwin's bin directory (C:\cygwin64\bin\), then click OK.
3. Open the Cygwin Terminal and type the following commands:
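git clone https://github.com/tesseract-ocr/tesseract.git
git clone https://github.com/egorpugin/leptonica.git
# Build Leptonica first
cd ~/leptonica && mkdir build && cd build
cmake ..
cmake --build .
# Then build Tesseract against it
cd ~/tesseract && mkdir build && cd build
cmake .. -DLeptonica_DIR="$HOME/leptonica/build"
cmake --build .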
4. Now that you have successfully compiled Tesseract, the last thing to do is get the trained data for the languages. Open the Cygwin Terminal and run the following command:
git clone https://github.com/tesseract-ocr/tessdata.git tessdata
5. When done, go to the Leptonica build folder (C:\cygwin64\home\YOUR_USERNAME\leptonica\build\bin) and copy the DLL (cyglept173.dll).
6. Paste it into (C:\cygwin64\home\YOUR_USERNAME\tesseract\build\bin).
7. Now that we have the files needed to run Tesseract on Windows, let’s try running it.
8. Paste any JPG, GIF or PNG file you wish to convert to text into (C:\cygwin64\home\YOUR_USERNAME\tesseract\build\bin). You can grab my example image below.
9. Now open Command Prompt, go to that folder, and run the following:
tesseract "YOUR_IMAGE.jpg" stdout -l eng -psm 6
You can alternatively write the result to a text file. Tesseract appends .txt to the output name by itself, so this creates RESULT.txt:
tesseract "YOUR_IMAGE.jpg" "RESULT" -l eng -psm 6
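If you prefer the terminal over Explorer, the DLL copy from steps 5 and 6 can also be done in the Cygwin Terminal. This is a rough sketch under a few assumptions: copy_lept_dll is a made-up helper name, both repositories were cloned into the Cygwin home folder, and the DLL name starts with cyglept (the version number in the file name may differ from 173 on your build).

```shell
#!/bin/sh
# Copy the Leptonica DLL(s) from one build folder to another.
# Usage: copy_lept_dll SRC_DIR DST_DIR   (hypothetical helper for this guide)
copy_lept_dll() {
    src=$1
    dst=$2
    found=0
    for dll in "$src"/cyglept*.dll; do
        [ -e "$dll" ] || continue      # the pattern matched no file
        cp "$dll" "$dst"/ && found=1
    done
    [ "$found" -eq 1 ]                 # non-zero status if nothing was copied
}

# Paths assume both repositories were cloned into the Cygwin home folder.
if copy_lept_dll "$HOME/leptonica/build/bin" "$HOME/tesseract/build/bin"; then
    echo "Leptonica DLL copied."
else
    echo "No cyglept*.dll copied; check that the Leptonica build finished."
fi
```

The wildcard is used so the copy still works if your Leptonica build produces a DLL with a different version number.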
So, everything works now! Thank you for taking the time to follow my instructions. If you have a problem, please leave a comment below.
You can also download my compiled version at this link:
- Tesseract 184.108.40.206dev (http://gensanblog.com/downloads/tesseract-220.127.116.11dev.zip)
Please don’t mind the text below; it’s for search engines, so other people can find this post if they encounter one of these errors from missing dependency libraries.
Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR)
Could NOT find PNG (missing: PNG_LIBRARY PNG_PNG_INCLUDE_DIR)
Could NOT find JPEG (missing: JPEG_LIBRARY JPEG_INCLUDE_DIR)
Could NOT find TIFF (missing: TIFF_LIBRARY TIFF_INCLUDE_DIR)
Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
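If you do hit errors like these, a small filter can list which packages CMake failed to find. This is just a sketch that relies on the “Could NOT find NAME” wording shown above; missing_packages is a made-up helper name, and the sample input is two of the lines from this post.

```shell
#!/bin/sh
# Read CMake output on stdin and print the name of each package
# reported as "Could NOT find NAME", one per line, without duplicates.
# (missing_packages is a hypothetical helper name.)
missing_packages() {
    sed -n 's/^.*Could NOT find \([A-Za-z0-9_]*\).*$/\1/p' | sort -u
}

# Demo with two of the error lines from this post.
missing_packages <<'EOF'
Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR)
Could NOT find PNG (missing: PNG_LIBRARY PNG_PNG_INCLUDE_DIR)
EOF
```

In a real build you would pipe the configure step into it, for example cmake .. 2>&1 | missing_packages, then install the matching development packages through the Cygwin setup program and run cmake again.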