Installing Tesseract-OCR

Preparation:

Build environment: gcc, gcc-c++ and make (most machines already have these, in which case this step can be skipped)

yum install gcc gcc-c++ make
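
Whether the build environment is already present can be checked before running yum. The following is a minimal sketch; missing_tools is a hypothetical helper name, and it assumes the gcc-c++ package provides the g++ command.

```shell
#!/bin/sh
# missing_tools NAME...: print each named tool that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# g++ is the compiler binary assumed to come from the gcc-c++ package.
missing=$(missing_tools gcc g++ make)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found"
fi
```

If nothing is reported missing, the yum command above can be skipped.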

Dependencies: autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel leptonica (1.67 or later)
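
The "1.67 or later" requirement for leptonica can be checked with a small version comparison. This is only a sketch: version_ge is a hypothetical helper, the version strings are typed in by hand rather than detected, and it assumes GNU sort with the -V (version sort) option.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to B.
# Assumes GNU sort, which provides -V for version ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# 1.68 is the leptonica release downloaded in this article.
if version_ge "1.68" "1.67"; then
    echo "leptonica 1.68 satisfies the >= 1.67 requirement"
fi
```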

1. autoconf automake libtool libjpeg-devel libpng-devel libtiff-devel zlib-devel can be installed with yum:

yum install autoconf automake libtool
yum install libjpeg-devel libpng-devel libtiff-devel zlib-devel

2. leptonica has to be compiled and installed from source

References:

http://paramountideas.com/tesseract-ocr-30-and-leptonica-installation-centos-55-and-opensuse-113

http://www.leptonica.org/source/README.html

Download the leptonica package: http://www.leptonica.org/source/leptonica-1.68.tar.gz

After extracting it, change to the leptonica-1.68 root directory

./configure
make
make install

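Installing tesseract:

Once the dependencies are installed, tesseract itself can be installed.

Download the tesseract-3.01 package: http://tesseract-ocr.googlecode.com/files/tesseract-3.01.tar.gz

After extracting it, change to the tesseract-3.01 root directory

(If make fails with an error like strngs.h:1: error: stray '\357' in program, re-save tesseract-3.01/ccutil/strngs.h in ANSI encoding and compile again. The stray byte at line 1 is usually the start of a UTF-8 byte order mark.)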
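
On a machine without an editor that can change encodings, the same repair can be attempted from the shell. The sketch below assumes the stray '\357' comes from a UTF-8 byte order mark (bytes EF BB BF) at the start of the file; has_bom and strip_bom are hypothetical helper names. It only removes those three bytes and does not convert any other characters.

```shell
#!/bin/sh
# has_bom FILE: succeed when FILE starts with the UTF-8 BOM bytes EF BB BF.
has_bom() {
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}

# strip_bom FILE: rewrite FILE without its leading BOM, if it has one.
strip_bom() {
    if has_bom "$1"; then
        tail -c +4 "$1" > "$1.nobom" && mv "$1.nobom" "$1"
    fi
}

# Run from the directory that contains the extracted tesseract-3.01 tree.
f=tesseract-3.01/ccutil/strngs.h
if [ -f "$f" ]; then
    strip_bom "$f"
fi
```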

./autogen.sh
./configure
make
make install
ldconfig
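
Each of these steps only makes sense if the previous one succeeded. One way to stop at the first failure is a small wrapper like the sketch below; run_steps is a hypothetical helper, not part of tesseract, and the guard on ./autogen.sh is an assumption that the script is run from the tesseract-3.01 root directory.

```shell
#!/bin/sh
# run_steps CMD...: run each command in order, stopping at the first failure.
run_steps() {
    for step in "$@"; do
        echo "==> $step"
        sh -c "$step" || { echo "FAILED: $step" >&2; return 1; }
    done
}

# Only attempt the build from inside the extracted source tree.
if [ -x ./autogen.sh ]; then
    run_steps "./autogen.sh" "./configure" "make" "make install" "ldconfig"
fi
```

The same wrapper could be used for the leptonica build by passing "./configure" "make" "make install".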

Installing the tesseract English language pack:

Download the tesseract-3.01 English language pack: http://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.01.eng.tar.gz

After extracting it, copy everything under tesseract-ocr/tessdata into /usr/local/share/tessdata
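
The copy step can be written as a short shell sketch. install_tessdata is a hypothetical helper name; the check for eng.traineddata assumes that is the file the English pack provides for "-l eng".

```shell
#!/bin/sh
# install_tessdata SRC DEST: copy the contents of SRC into DEST,
# creating DEST first if it does not exist yet.
install_tessdata() {
    mkdir -p "$2" && cp -r "$1"/. "$2"/
}

# Run from the directory where the language pack was extracted.
if [ -d tesseract-ocr/tessdata ]; then
    install_tessdata tesseract-ocr/tessdata /usr/local/share/tessdata
    if [ -f /usr/local/share/tessdata/eng.traineddata ]; then
        echo "eng language data installed"
    else
        echo "eng.traineddata not found after copying" >&2
    fi
fi
```

Copying the contents rather than moving the directory keeps any files that make install already placed in /usr/local/share/tessdata.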

Installation is complete.

A quick test:

Change to the extracted tesseract-3.01 root directory (it ships with a phototest.tif that can be used for testing)

Command line:

tesseract phototest.tif phototest -l eng


Output:

Tesseract Open Source OCR Engine v3.01 with Leptonica
Page 0

A phototest.txt text file should now exist in the current directory, containing the text shown in phototest.tif.
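
The result can also be checked from the shell instead of opening the file by hand. A minimal sketch, where check_ocr_output is a hypothetical helper that only confirms the output file exists and is not empty; it does not judge the accuracy of the recognized text.

```shell
#!/bin/sh
# check_ocr_output FILE: succeed when FILE exists and is not empty,
# printing how many words it contains.
check_ocr_output() {
    if [ -s "$1" ]; then
        echo "$1: $(wc -w < "$1" | tr -d ' ') words recognized"
    else
        echo "$1 is missing or empty" >&2
        return 1
    fi
}

# Run from the directory where the tesseract command above was executed.
if [ -f phototest.tif ]; then
    check_ocr_output phototest.txt
fi
```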


Reference: http://my.oschina.net/iceman/blog/40771


Configuration script:

# Install the build dependencies and leptonica

yum -y install gcc gcc-c++ make

yum -y install autoconf automake libtool

yum -y install libjpeg-devel libpng-devel libtiff-devel zlib-devel


wget http://www.leptonica.org/source/leptonica-1.72.tar.gz

tar zxvf leptonica-1.72.tar.gz

cd leptonica-1.72

./configure

make

make install

cd ..


# Install tesseract-ocr

wget https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.02.tar.gz

tar zxvf tesseract-ocr-3.02.02.tar.gz

cd tesseract-ocr/

./autogen.sh

./configure

make

make install

ldconfig

cd /root/

wget https://tesseract-ocr.googlecode.com/files/tesseract-ocr-3.02.eng.tar.gz

tar zxvf tesseract-ocr-3.02.eng.tar.gz

cp -r /root/tesseract-ocr/tessdata/* /usr/local/share/tessdata/


# Test

cd tesseract-ocr/

tesseract phototest.tif phototest -l eng

ls -l phototest*