
OCRopus is a free document analysis and optical character recognition system released under the Apache License

OCRopus(tm) is a state-of-the-art document analysis and OCR system, featuring pluggable layout analysis, pluggable character recognition, statistical natural language modeling, and multilingual capabilities.

OCRopus is a free document analysis and optical character recognition system released under the Apache License, Version 2.0. Its design is highly modular: components are implemented as plugins, so they can be swapped out easily.
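The plugin idea can be illustrated with a small Python sketch. It assumes a two-stage pipeline (layout analysis, then character recognition) in which each stage is looked up by name in a registry. The registries, plugin names, and the toy string-based "page images" are hypothetical and are not OCRopus's actual interfaces; the point is only that the pipeline code stays the same while components are swapped.

```python
from typing import Callable, Dict, List

# Hypothetical plugin types (illustrative only, not the real OCRopus API).
# Pages and line "images" are modeled as plain strings to keep this runnable.
LayoutPlugin = Callable[[str], List[str]]  # page -> list of text-line images
RecognizerPlugin = Callable[[str], str]    # line image -> recognized text

LAYOUT_PLUGINS: Dict[str, LayoutPlugin] = {}
RECOGNIZER_PLUGINS: Dict[str, RecognizerPlugin] = {}


def register(registry: Dict, name: str):
    """Decorator that stores a component in `registry` under `name`."""
    def wrap(fn):
        registry[name] = fn
        return fn
    return wrap


@register(LAYOUT_PLUGINS, "split-newlines")
def split_lines(page: str) -> List[str]:
    # Toy layout analysis: each non-empty line of the "page" is a text line.
    return [line for line in page.splitlines() if line.strip()]


@register(RECOGNIZER_PLUGINS, "identity")
def identity(line: str) -> str:
    # Toy recognizer: the "image" is already text, so return it unchanged.
    return line


@register(RECOGNIZER_PLUGINS, "upper")
def upper(line: str) -> str:
    # A second, interchangeable recognizer, used to show swapping components.
    return line.upper()


def run_pipeline(page: str, layout: str = "split-newlines",
                 recognizer: str = "identity") -> List[str]:
    """Run layout analysis, then recognition, with plugins chosen by name."""
    segment = LAYOUT_PLUGINS[layout]
    recognize = RECOGNIZER_PLUGINS[recognizer]
    return [recognize(line) for line in segment(page)]
```

For example, `run_pipeline("hello\n\nworld\n")` returns `["hello", "world"]`, while passing `recognizer="upper"` returns `["HELLO", "WORLD"]`: only the plugin name changes, not the pipeline.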

OCRopus is currently developed under the leadership of Thomas Breuel at the German Research Centre for Artificial Intelligence (DFKI) in Kaiserslautern, Germany, and is sponsored by Google.

OCRopus is developed for Linux; however, users have reported success running it on Mac OS X, and an application called TakOCR[1] has been developed that installs OCRopus on Mac OS X and provides a simple droplet interface.

The OCRopus engine is based on two research projects: a high-performance handwriting recognizer developed in the mid-1990s and deployed by the US Census Bureau, and novel high-performance layout analysis methods.

OCRopus development is sponsored by Google, and the system is initially intended for high-throughput, high-volume document conversion. We expect that it will also be an excellent OCR system for many other applications.

Releases

The current release is ocropus-0.4.3. It is still an alpha release, so don't expect stability or high performance yet. We will not be providing new tarballs until the beta release. To obtain and install ocropus-0.4.3, use commands like the following:

mkdir ~/build
cd ~/build
hg clone https://iulib.googlecode.com/hg/ iulib
cd iulib
hg update -r ocropus-0.4.3
scons
sudo scons install
cd ~/build
hg clone https://ocropus.googlecode.com/hg/ ocropus
cd ocropus
hg update -r ocropus-0.4.3
scons
sudo scons install


That should work on Ubuntu 9.04 if you have all the necessary packages installed; if not, have a look at the DevInstall page or the Google Group pages.


Resources


Related Projects
  • iulib Library (you need to install this)

  • hOCR Tools -- tools for manipulating OCR output

  • DECAPOD -- camera-based document capture and tagged PDF generation

  • PyOpenFST -- Python bindings for OpenFST (for language modeling)

Documentation

The following is the most important documentation:

If you want to contribute to the primary documentation, please clone the wiki repository with hg clone https://wiki.ocropus.googlecode.com/hg and submit patches against the documentation. Additional links you may find useful are here:

Copyright © 2014 Linuxlandit & The Conqueror Penguin