Optical character recognition in image files

Ph Vouters


Dear reader,


I did not find a suitable ITRC Linux forum, so I am posting this here.


If you are interested in text recognition in image files, such as those produced by scanners, here are some results I achieved on Linux and documented at


I am currently porting one of these programs (Tesseract OCR) to Windows, so this work is still in progress. The subject interests many national libraries, which want to turn their collections, currently held on printed paper, into text documents accessible from anywhere in the world via the Internet.
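
For readers who want to try this themselves, here is a minimal sketch of how one might drive the Tesseract command-line tool from Python. It assumes the `tesseract` executable is installed and on the PATH, and that it is invoked as `tesseract <image> <outputbase> -l <lang>`. The function names and the file names in the usage note are hypothetical, chosen only for illustration; this is not taken from my documented results.

```python
import subprocess
from pathlib import Path


def build_tesseract_command(image_path, output_base, lang="eng"):
    """Return the argument list for: tesseract <image> <outputbase> -l <lang>."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="eng"):
    """Run Tesseract on one image and return the recognized text.

    Assumption: the `tesseract` executable is on the PATH. Tesseract
    writes its result to <output_base>.txt, which is read back here.
    """
    command = build_tesseract_command(image_path, output_base, lang)
    # check=True raises CalledProcessError if Tesseract exits with an error.
    subprocess.run(command, check=True)
    return Path(f"{output_base}.txt").read_text(encoding="utf-8")
```

With a scanned page saved as, say, `scan.tif`, calling `run_ocr("scan.tif", "scan_text")` should leave the recognized text in `scan_text.txt` and return it as a string. Which image formats and languages are accepted depends on the installed Tesseract version and language data.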


As it happens, Tesseract OCR, which is probably the most advanced of these programs at present, was initially developed by HP Labs and then released by HP to the open source community. Google now sponsors the development of this open source code. This sponsorship is presumably connected with Google's project to digitize all printed paper documents.


Best regards to everyone,

Philippe Vouters