OCR – Optical Character Recognition

What exactly is meant by OCR?

Optical Character Recognition, or OCR, is a technology that converts different types of documents, such as scanned paper documents, PDF files or images captured by a digital camera, into editable and searchable data.

All a scanner can do is create an image, or snapshot, of the document: a collection of black-and-white or colour dots known as a raster image. To extract and repurpose data from scanned documents, camera images or image-only PDFs, you need OCR software that singles out the letters in the image, assembles them into words and then assembles the words into sentences, so that you can access and edit the content of the original document.
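To make the idea of a raster image concrete, the sketch below represents a tiny black-and-white scan as a grid of dots and singles out letter-like shapes by looking for blank columns between them. The grid contents, the `split_glyphs` name and the blank-column rule are assumptions made for illustration only; real OCR software uses far more robust segmentation.

```python
# A tiny black-and-white raster image: '#' is a black dot, '.' is a white dot.
# The content is made up for illustration.
RASTER = [
    "#.#..###",
    "###...#.",
    "#.#..###",
]

def split_glyphs(rows):
    """Split a raster into glyphs separated by at least one all-white column."""
    width = len(rows[0])
    glyphs, start = [], None
    for x in range(width + 1):
        # Treat the position just past the right edge as a blank column
        # so that the last glyph is always closed.
        blank = x == width or all(row[x] == "." for row in rows)
        if not blank and start is None:
            start = x                                   # a glyph begins here
        elif blank and start is not None:
            glyphs.append([row[start:x] for row in rows])
            start = None                                # the glyph has ended
    return glyphs

glyphs = split_glyphs(RASTER)
for glyph in glyphs:
    print("\n".join(glyph), end="\n\n")
```

Each returned glyph is itself a small raster image, which is the form a recognition step would then try to identify as a letter.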

What technology lies behind OCR?

The most advanced optical character recognition systems, such as ABBYY FineReader OCR, focus on replicating natural, or “animal-like”, recognition. At the heart of these systems lie three fundamental principles: Integrity, Purposefulness and Adaptability (IPA). The principle of integrity says that the observed object must always be considered as a “whole” consisting of many interrelated parts. The principle of purposefulness supposes that any interpretation of data must always serve some purpose. The principle of adaptability means that the program must be capable of self-learning.

FineReader OCR recognizes text in a few steps:

  1. The program analyzes the structure of the document image.
  2. It divides the page into elements such as blocks of text, tables and images.
  3. It divides the lines into words, and then the words into characters.
  4. It compares the characters with a set of pattern images and forms hypotheses about what each character is.
  5. Based on these hypotheses, it analyzes different variants of breaking lines into words and words into characters.
  6. Finally, it makes a decision and presents you with the recognized text.

In addition, ABBYY FineReader provides dictionary support for 48 languages. This enables secondary analysis of text elements at the word level. With dictionary support, the program analyzes and recognizes documents more accurately and simplifies later verification of the recognition results.
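The comparison, hypothesis and dictionary steps can be illustrated with a toy example. The sketch below is not ABBYY's algorithm: the 3x3 pattern images, the dot-agreement score, the dictionary bonus and all function names are assumptions chosen for illustration, and the segmentation into characters is taken as already fixed.

```python
from itertools import product

# Pattern images for a toy 3x3 alphabet (hypothetical, for illustration only).
PATTERNS = {
    "H": ["#.#", "###", "#.#"],
    "I": ["###", ".#.", "###"],
    "T": ["###", ".#.", ".#."],
    "O": ["###", "#.#", "###"],
}

def similarity(glyph, pattern):
    """Fraction of dots that agree between a glyph and a pattern image."""
    pairs = [(g, p) for grow, prow in zip(glyph, pattern)
             for g, p in zip(grow, prow)]
    return sum(g == p for g, p in pairs) / len(pairs)

def hypotheses(glyph, keep=2):
    """Return the `keep` best (character, score) guesses for one glyph."""
    scored = [(ch, similarity(glyph, pat)) for ch, pat in PATTERNS.items()]
    return sorted(scored, key=lambda cs: cs[1], reverse=True)[:keep]

def recognize_word(glyphs, dictionary):
    """Combine per-character hypotheses, preferring words in the dictionary."""
    options = [hypotheses(g) for g in glyphs]
    best_word, best_score = None, -1.0
    for combo in product(*options):
        word = "".join(ch for ch, _ in combo)
        score = sum(s for _, s in combo) / len(combo)
        if word in dictionary:
            score += 1.0  # dictionary bonus: arbitrary weight for this sketch
        if score > best_score:
            best_word, best_score = word, score
    return best_word

# Two clean glyphs and one damaged glyph that matches "I" and "T" equally well.
NOISY_GLYPHS = [
    ["#.#", "###", "#.#"],
    ["###", "#.#", "###"],
    ["###", ".#.", ".##"],
]

print(recognize_word(NOISY_GLYPHS, {"HOT", "HIT"}))  # -> HOT
```

The last glyph alone cannot be resolved from the pattern images, so the word-level dictionary check is what settles the reading. This is the kind of secondary analysis that dictionary support makes possible.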

One does not have to be an OCR specialist to see the advantages of an OCR application built on the IPA principles. These principles endow the program with maximum flexibility and intelligence, bringing it as close as possible to human recognition.

After years of research, ABBYY was able to implement the IPA principles described above in its OCR technologies.

Recognition of Digital Camera Images

Images captured by a digital camera differ from scanned documents or image-only PDFs. They often have defects such as distortion at the edges and poor lighting, which make it difficult for most OCR applications to recognize the text correctly. The latest version of ABBYY FineReader supports adaptive recognition technology designed specifically for processing camera images. It offers a range of features that improve the quality of such images, so that you can make full use of the capabilities of your digital devices.

For more information on the recognition of digital camera images, please contact us.

How to use OCR software?

Using ABBYY FineReader OCR is easy. The process generally consists of three stages: Open (Scan) the document, Recognize it, and then Save it in a convenient format (DOC, RTF, XLS, PDF, HTML, TXT, etc.) or export the data directly to an application such as Microsoft Word, Microsoft Excel or Adobe Acrobat.

In addition, the latest version of ABBYY FineReader supports an Automated Tasks mode, which is useful when you regularly deal with routine tasks. With this feature, recognition tasks run automatically, so you do not have to carry out each of the steps above manually.
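The Open, Recognize and Save stages, run automatically over a batch of files, can be sketched as a simple loop. This is a generic illustration and not FineReader's interface: the `run_automated_task` name, the folder layout, the `*.png` filter and the caller-supplied `recognize` function are all assumptions.

```python
from pathlib import Path
from typing import Callable

def run_automated_task(inbox: Path, outbox: Path,
                       recognize: Callable[[bytes], str],
                       pattern: str = "*.png") -> list[Path]:
    """Open each image in `inbox`, recognize it, and save the text to `outbox`.

    `recognize` stands in for whatever OCR engine is available; it receives
    the raw image bytes and returns the recognized text.
    """
    outbox.mkdir(parents=True, exist_ok=True)
    saved = []
    for image_path in sorted(inbox.glob(pattern)):
        data = image_path.read_bytes()                # 1. Open
        text = recognize(data)                        # 2. Recognize
        target = outbox / (image_path.stem + ".txt")
        target.write_text(text, encoding="utf-8")     # 3. Save
        saved.append(target)
    return saved
```

Passing the recognizer in as a function keeps the three stages separate, so the same loop works with any OCR engine and any output folder.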

Benefits

  • Save a lot of time and effort when creating, processing and repurposing various documents.
  • Scan paper documents for further editing and sharing with colleagues and partners.
  • Extract quotes from books and magazines and use them in course studies and papers without retyping.
  • Capture text outdoors from banners, posters and timetables, and then use the captured information for your own purposes.
  • Use OCR software for creating searchable PDF archives.
  • Convert an original paper document, image or PDF in less than a minute, and get a final recognized document that looks just like the original.