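Introduction:
OCR (Optical Character Recognition): the process of analyzing and recognizing the text in an image file and extracting it.
Tesseract: an open-source OCR engine. It was originally developed at HP Labs and later released as open source; Google then improved it, fixed bugs, optimized it, and re-released it.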
Links:
Q&A:
Q1. Language data error:
Error opening data file /usr/local/share/tessdata/eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to the parent directory of your "tessdata" directory. Failed loading language eng Tesseract couldn't load any languages! Could not initialize tesseract.
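A1: Locate the language pack (the tessdata folder containing eng.traineddata) and drag it into the project. The key point when dragging it in is this: "Make sure you select the "Create folder references" option, when adding the tessdata folder to your project". For the detailed solution, see here.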
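The error message in Q1 says Tesseract expects a "tessdata" directory directly under the prefix directory, with eng.traineddata inside it. A minimal sketch of that check, useful for verifying a directory layout before pointing Tesseract at it, might look like the following. The function names `find_traineddata` and `diagnose` are hypothetical helpers made up for illustration, not part of Tesseract, and the layout rule is taken only from the wording of the error message above.

```python
import os
import tempfile


def find_traineddata(prefix, lang="eng"):
    """Return <prefix>/tessdata/<lang>.traineddata if that file exists, else None."""
    candidate = os.path.join(prefix, "tessdata", lang + ".traineddata")
    return candidate if os.path.isfile(candidate) else None


def diagnose(prefix, lang="eng"):
    """Describe what is missing under prefix, or return "ok" (hypothetical helper)."""
    tessdata_dir = os.path.join(prefix, "tessdata")
    if not os.path.isdir(tessdata_dir):
        return "missing tessdata directory under " + prefix
    if find_traineddata(prefix, lang) is None:
        return "missing " + lang + ".traineddata in " + tessdata_dir
    return "ok"


if __name__ == "__main__":
    # Demo on a throwaway directory: first without the folder, then with it.
    with tempfile.TemporaryDirectory() as prefix:
        print(diagnose(prefix))
        os.mkdir(os.path.join(prefix, "tessdata"))
        # An empty placeholder file stands in for the real language data.
        open(os.path.join(prefix, "tessdata", "eng.traineddata"), "w").close()
        print(diagnose(prefix))
```

If the check reports a missing tessdata directory for the location the app actually reads from, the folder structure was probably not preserved when the files were added, which is the situation the "Create folder references" advice addresses.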