- Community Home
- >
- Servers and Operating Systems
- >
- Operating Systems
- >
- Operating System - Linux
- >
- Optical Characters recognition in image files
Categories
Company
Local Language
Forums
Discussions
Forums
- Data Protection and Retention
- Entry Storage Systems
- Legacy
- Midrange and Enterprise Storage
- Storage Networking
- HPE Nimble Storage
Discussions
Discussions
Discussions
Forums
Discussions
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
- BladeSystem Infrastructure and Application Solutions
- Appliance Servers
- Alpha Servers
- BackOffice Products
- Internet Products
- HPE 9000 and HPE e3000 Servers
- Networking
- Netservers
- Secure OS Software for Linux
- Server Management (Insight Manager 7)
- Windows Server 2003
- Operating System - Tru64 Unix
- ProLiant Deployment and Provisioning
- Linux-Based Community / Regional
- Microsoft System Center Integration
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Discussion Boards
Community
Resources
Forums
Blogs
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
09-15-2011 01:16 AM
09-15-2011 01:16 AM
Optical Characters recognition in image files
Dear reader,
I did not find any suitable ITRC Linux forum, hence me posting this here.
If you are interested with text recognition in image files such as produced by scanners, here are some results I acheived on Linux and that I documented at http://vouters.dyndns.org/tima/Linux-Java-Tesseract-ocr-Porting_Tesseract_from_android_Graphics_to_Sun_Graphics.html
I am currently working on porting one of the software (Tesseract OCR) onto Windows. So the subject is being worked upon. This subject interests many national libraries to turn their richnesses currently paper printed into text documents accessible from anywhere over the world via Internet.
It happens the Tesseract OCR software which would currently be the most advanced software has initially been developed by HP Labs, then offered by HP to the Opensource world. Google is as of today sponsoring this Opensource code development. This Google sponsorship has to be connected with Google's project to digitalize all paper printed documents.
Best regards to everyone,
Philippe Vouters