I spent almost two days struggling to compile the Tesseract project on Windows and ran into many errors: missing DLLs, path issues, and so on. To keep it short, here are the complete steps to compile the Tesseract GitHub project on Windows 10, 8, 7 or XP.
Program Requirements:
First, you need a Linux-style terminal on Windows in order to run Linux commands. Download the following program:
1. Cygwin Terminal at Cygwin.com (setup-x86_64.exe for 64bit and setup-x86.exe for 32bit)
I also tried “Git for Windows”, “CMake” and “MinGW”, but I had a hard time tracking down missing DLLs.
Installation:
After downloading the installer, we have to install the dependencies through Cygwin.
1. Install Cygwin (on drive C), then select the following categories to install: Devel, Archive and Graphics (you can pick only the packages you need, but I suggest installing everything from these categories).
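Once the installation finishes, it can help to confirm that the build tools actually landed on the PATH before going further. The sketch below is only a quick check; the list of tool names (git, cmake, make, gcc, g++) is my assumption about what the compile steps need, not an official list.

```shell
# Report which of the given command-line tools are missing from PATH.
# Prints one missing tool name per line; prints nothing if all are found.
missing_tools() {
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
        fi
    done
}

# Tools assumed to be needed by the compile steps in this post.
missing_tools git cmake make gcc g++
```

If this prints any names, rerun the Cygwin setup program and select the matching packages.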
Compiling:
Now that we have the compiler and a Linux terminal, we can compile Tesseract.
1. In Cygwin, the default folder of the terminal is (C:\cygwin64\home\YOUR_USERNAME).
2. (Optional) If you want Linux commands to work in Command Prompt, add the Cygwin bin directory to the environment variables on your computer. Open the properties of My Computer, go to Advanced system settings, then click “Environment Variables”. In the window that pops up, find “Path” in the lower list and edit it. Click “New”, browse to the Cygwin bin directory (C:\cygwin64\bin\), then click OK.
3. Open Cygwin Terminal and type the following commands:
git clone https://github.com/tesseract-ocr/tesseract.git
git clone https://github.com/DanBloomberg/leptonica.git
cd leptonica
mkdir build
cd build
cmake ..
cmake --build .
cd ../
cd ../
cd tesseract
mkdir build
cd build
cmake .. -DLeptonica_DIR="$HOME/leptonica/build"
cmake --build .
exit
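If the Tesseract configure step complains that it cannot find Leptonica, one thing worth checking is whether the directory passed as -DLeptonica_DIR really contains a CMake package file. The sketch below assumes the Leptonica build writes a file named LeptonicaConfig.cmake into its build folder, and that Leptonica was cloned into your home folder as in the steps above; the function name is my own.

```shell
# Return success if the given directory looks like a configured Leptonica
# build, i.e. it contains the LeptonicaConfig.cmake file (assumed name).
has_leptonica_config() {
    [ -f "$1/LeptonicaConfig.cmake" ]
}

# Assumed location: Leptonica cloned and built in the home folder.
lept_dir="$HOME/leptonica/build"
if has_leptonica_config "$lept_dir"; then
    echo "OK: $lept_dir can be passed as -DLeptonica_DIR"
else
    echo "Not found: $lept_dir/LeptonicaConfig.cmake (build Leptonica first)"
fi
```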
4. Now that you have successfully compiled Tesseract, the last thing to do is get the trained data containing the languages. Open Cygwin Terminal and run the following commands:
cd tesseract
cd build
cd bin
git clone https://github.com/tesseract-ocr/tessdata.git tessdata
5. When done, go to the Leptonica build folder (C:\cygwin64\home\YOUR_USERNAME\leptonica\build\bin) and copy the DLL (cyglept173.dll).
6. Paste it into (C:\cygwin64\home\YOUR_USERNAME\tesseract\build\bin).
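The same copy can be done from the Cygwin Terminal instead of Explorer. This is a small sketch, not part of the original steps: the function name is made up, and it matches cyglept*.dll rather than the exact file name because I assume the number in the name can differ between Leptonica versions.

```shell
# Copy every Leptonica DLL (cyglept*.dll) from one directory into another.
# Returns non-zero if no matching DLL exists in the source directory.
copy_lept_dlls() {
    src=$1
    dest=$2
    found=1
    for dll in "$src"/cyglept*.dll; do
        [ -e "$dll" ] || continue      # the glob matched nothing
        cp "$dll" "$dest"/ && found=0
    done
    return "$found"
}

# Example call for the folders used in this post:
#   copy_lept_dlls ~/leptonica/build/bin ~/tesseract/build/bin
```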
7. Now that we have the files to run Tesseract on Windows, let’s try running it.
8. Paste any JPG, GIF or PNG files you wish to convert to text into (C:\cygwin64\home\YOUR_USERNAME\tesseract\build\bin). You can grab my example image below.
9. Now open Command Prompt and enter the following:
cd /d C:\
cd cygwin64
cd home
cd YOUR_USERNAME
cd tesseract
cd build
cd bin
tesseract "YOUR_IMAGE.jpg" stdout -l eng -psm 6
You can alternatively write the result to a text file. Tesseract appends .txt to the output name, so this creates RESULT.txt:
tesseract "YOUR_IMAGE.jpg" "RESULT" -l eng -psm 6
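If you have several images, the same command can be looped from the Cygwin Terminal (this is bash, so it will not work in Command Prompt). The sketch below is just one way to do it; the function names are my own, and it relies on tesseract adding .txt to the output name itself, so photo.jpg ends up as photo.txt.

```shell
# Print the output base name to give tesseract for an image:
# the file name with its last extension removed (photo.jpg -> photo).
output_base() {
    printf '%s\n' "${1%.*}"
}

# Run OCR on every JPG and PNG in the current directory, using the same
# options as the single-image command in this post.
ocr_all_images() {
    for img in *.jpg *.png; do
        [ -e "$img" ] || continue      # skip patterns that matched nothing
        tesseract "$img" "$(output_base "$img")" -l eng -psm 6
    done
}
```

Run ocr_all_images from inside tesseract/build/bin after pasting your images there.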
10. And here’s the final output:
11. Lastly, you need to copy the following Cygwin DLLs into the same directory as your tesseract exe in order to run Tesseract without Cygwin. You can find these binaries in the C:\cygwin64\bin folder:
*I will edit this post when I find a way to include these binaries during compilation in Cygwin.
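One possible way to see which Cygwin DLLs the exe depends on is the ldd command in the Cygwin Terminal. The sketch below is only a helper idea and not a tested packaging method: the function name is made up, and it assumes ldd prints lines of the form "name.dll => /usr/bin/name.dll (address)" for DLLs that come from Cygwin.

```shell
# Read `ldd` output on stdin and print the paths of DLLs that live in
# Cygwin's /usr/bin, one per line. Windows system DLLs are skipped.
cygwin_dll_paths() {
    awk '$2 == "=>" && $3 ~ /^\/usr\/bin\// { print $3 }'
}

# Example use from inside tesseract/build/bin:
#   ldd tesseract.exe | cygwin_dll_paths | while read -r dll; do cp "$dll" .; done
```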
So, everything works now! Thank you for taking the time to follow my instructions. If you have a problem, please leave a comment below.
Compiled Tesseract:
You can also download my compiled version from these links:
- Tesseract 3.5.0.0dev (http://gensanblog.com/downloads/tesseract-3.5.0.0dev.zip)
- Tesseract 3.5.0.0dev with DLL’s (http://gensanblog.com/downloads/tesseract-3.5.0.0dev-dlls.zip)
Please don’t mind the text below; it’s for search engines, so other people can find this post if they encounter one of these errors from missing dependency libraries.
Could NOT find ZLIB (missing: ZLIB_LIBRARY ZLIB_INCLUDE_DIR)
Could NOT find PNG (missing: PNG_LIBRARY PNG_PNG_INCLUDE_DIR)
Could NOT find JPEG (missing: JPEG_LIBRARY JPEG_INCLUDE_DIR)
Could NOT find TIFF (missing: TIFF_LIBRARY TIFF_INCLUDE_DIR)
Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)