How To: Compile and Use Tesseract (3.01) on iOS (SDK 5)

邹齐智
2023-12-01

http://tinsuke.wordpress.com/2011/11/01/how-to-compile-and-use-tesseract-3-01-on-ios-sdk-5/

Update

  • I don’t have access to a Mac computer now (actually it has been 3 months) and I couldn’t update the guide to Xcode 4.5 (iOS 6), but the fine gentlemanbengl3rt has done so and the updated script is available at: http://goo.gl/wQea5 (I haven’t been able to test it, so some feedback on whether it works or not would be appreciated!)

I never thought that my last post would have so much audience. Among other things, it earned me 3 direct job interview offers (1 of  ‘em from Google itself, maintainer of tesseract), an invite to write articles to a TI digital e-magazine and a few digital friends, but that’s something to discuss at other posts. Thank you!

Getting back to what really matters: last post was focused on cross compiling (potentially) any library for iOS (armv6/armv7/i386) and to use as an example I chose Tesseract, which was the library I was using on a work project. But the repercussion was so great and both Tesseract and iOS got newer versions that I’ve decided to write this post specifically about getting Tesseract compiled and using it on your iOS project.

As stated earlier, Tesseract has been officially launched at version 3.01 (that now uses an autogen.sh setup script and an improved configure script ) and iOS has received a major upgrade, version 5.0. As you may guess, these changes broke my script!


So let’s restart this party! (or: Compiling Tesseract 3.01 for iOS SDK 5.0)

The basics about the script were explained at last post and I’ll be just covering the changes and how to use it.

As noted by Rafael, the default C/C++/Objective-C compilers for iOS 5 (bundled with Xcode 4.2) have changed, actually, now you just need Clang, so the CPP, CXX, CXXPP, and CC definitions (inside setenv_all()) have changed to:

1
2
export CXX= "$DEVROOT/usr/bin/llvm-g++"
export CC= "$DEVROOT/usr/bin/llvm-gcc"

Additionally, as Tesseract now has on autogen.sh script to run before configuring, we run it before each configure call:

1
bash autogen.sh

And because Tesseract’s configure script now accepts a path to Leptonica to be specified, no hacks with it are needed, just calling it with another parameter is just enough:

1
. /configure -- enable -shared=no LIBLEPT_HEADERSDIR=$GLOBAL_OUTDIR /include/

To build your desired library, create a directory, I’ll refer to it as “./build/”. Inside it, create the following structure:

  • ./build/

Open Terminal, enter our “./build/” directory, cross your fingers (one very important step pointed out byVenusbai) and run:

1
bash build_dependencies.sh

Well, if you’re lucky enough and deserve the holy right to use Tesseract on mobile Apps, check the dependencies folder content and there you’ll have all the needed header and library files to play with OCR on your iPhone (I don’t have one, personally prefer Android, but you got it….).

Great!!! Now what?! (or: Using Tesseract on your iOS project)

  1. Create one new iOS project at Xcode (or just open your existing one)
  2. Add the generated ./build/dependencies/ folder to your project. It contains the needed .h Header and lib*.a Library files
  3. Add the tessdata folder, containing, well, erhm, hum, the tessdata files you need at your project. If you don’t know what the “tessdata” folder is: it contains preprocessed data for a certain language so Tesseract can recognize that language, download language data from: http://code.google.com/p/tesseract-ocr/downloads/list. Check the sub-instructions below to add it the right way (not the default Xcode way…)
    1. Right-click your project/group at Xcode
    2. Choose “Add files to your project”
    3. Select the “tessdata” folder
    4. At the same window, check the “Create folder references for any added folders”. This is the most important step, as it instructs Xcode to add your “tessdata” folder as a regular folder (a resource, as well), not as a Xcode project group.
  4. Create your TessBaseAPI object with the code below to start playing with it!
  5. Make sure that every source file that includes/imports or sees (includes/imports one file that may include/import) Tesseract Header files has the .mm extension instead of the regular .m. This allows the compiler to interpret Tesseract Headers as C/C++ headers.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
// Set up the tessdata path. This is included in the application bundle
// but is copied to the Documents directory on the first run.
NSArray *documentPaths = NSSearchPathForDirectoriesInDomains(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentPath = ([documentPaths count] > 0) ? [documentPaths objectAtIndex:0] : nil;
 
NSString *dataPath = [documentPath stringByAppendingPathComponent:@ "tessdata" ];
NSFileManager *fileManager = [NSFileManager defaultManager];
// If the expected store doesn't exist, copy the default store.
if (![fileManager fileExistsAtPath:dataPath]) {
     // get the path to the app bundle (with the tessdata dir)
     NSString *bundlePath = [[NSBundle mainBundle] bundlePath];
     NSString *tessdataPath = [bundlePath stringByAppendingPathComponent:@ "tessdata" ];
     if (tessdataPath) {
         [fileManager copyItemAtPath:tessdataPath toPath:dataPath error:NULL];
     }
}
 
setenv( "TESSDATA_PREFIX" , [[documentPath stringByAppendingString:@ "/" ] UTF8String], 1);
 
// init the tesseract engine.
tesseract = new tesseract::TessBaseAPI();
tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng" );

Well, that’s it! Hope you can reproduce this and I also provide to download one Xcode 4.2 iOS SDK 5 project with Tesseract configured and already recognizing one sample image, check it out if having any troubles following this howto.

Files for Download

Final Considerations

I really hope you guys have enjoyed it and if you have any opinion, compliment, suggestion or just wanna state something, feel free to comment, I’ll try to approve it ASAP.

acknowledgements

 类似资料:

相关阅读

相关文章

相关问答