Automated Document Conversion Solution | ABBYY FineReader Server (Recognition Server)

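ABBYY FineReader Server (Recognition Server) is a powerful server-based OCR solution for automated document capture and PDF conversion. Designed for batch processing of large volumes of documents, it lets companies and scanning service providers build cost-effective workflows by converting paper documents and image files such as TIFF, JPEG and PDF into fully searchable electronic documents suitable for long-term digital archiving.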

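Why Choose ABBYY FineReader Server

Automated Conversion to PDF and PDF/A

Digitize large document collections and automatically convert scanned files to PDF or PDF/A for electronic storage and archiving.

Enterprise-Wide Document Conversion Service

Provide employees and customers with a flexible OCR and document conversion server that is available online anytime, anywhere.

Create Full-Text Searchable Repositories

Convert scans or faxes to searchable text and store them in Microsoft® SharePoint® libraries so that they can be found through the SharePoint search engine.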

OCR Licensing

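Product Highlights

An automated document conversion solution for digital archiving, electronic search and enterprise-wide access

Recognition Server automatically acquires images from scans, files, faxes, e-mail and Microsoft SharePoint libraries, performs server-side optical character recognition (OCR) and allows metadata to be added. The results are then delivered in the required format directly to a network folder, a SharePoint library or another storage or management system. Supported formats include MRC-compressed searchable PDF or PDF/A files, XML data, editable documents (such as Microsoft Word and Excel® files) and plain text files.

Recognition Server's distributed and highly scalable architecture means that it can run on multiple servers in a network and convert large volumes of documents in a short time. Its fast deployment, easy administration and automated workflows make Recognition Server an investment with a quick return.

Powerful server-based OCR software for automated document capture and PDF conversion.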

Product Highlights


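Automated Conversion to PDF & PDF/A

In ABBYY Recognition Server, accurate OCR and PDF conversion are server-based and fully automated. ABBYY Recognition Server crawls specified "hot folders", file shares and network repositories, converts the image files it finds into searchable documents and sends the results back to the same repository or to a location specified by the customer. This minimizes the manual work involved in converting large volumes of documents and so lowers the cost of business processes.
The advanced PDF and PDF/A creation capabilities of ABBYY Recognition Server meet the standards for long-term digital archiving. Enhanced MRC compression technology creates small PDFs with high visual quality that are well suited to online publishing. PDF encryption can be used to prevent unauthorized viewing, printing or modification of the created PDF files. ABBYY Recognition Server can detect scanner-generated PDFs and add a text layer to them so that they become full-text searchable. If a PDF file already contains a text layer, ABBYY Recognition Server evaluates its quality and, where needed, replaces it with a higher-quality text layer. All bookmarks, comments, metadata and attachments of the original PDF file are preserved.
With the convenient indexing and metadata extraction tools provided by ABBYY Recognition Server, it is easy to define custom file names, destination folders or metadata fields. Barcodes on cover sheets or data contained in the document can be captured, used for document classification and routing, and stored with the document in the digital archive.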

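• Create Full-Text Searchable SharePoint Libraries

State-of-the-art ABBYY OCR technology delivers high recognition accuracy even on low-quality documents. All scanned or faxed documents can be converted to searchable PDF or PDF/A so that they can be indexed and found through Microsoft SharePoint.
ABBYY Recognition Server integrates with Microsoft SharePoint at several levels. It can be set up as a front end to the SharePoint server so that all incoming image documents are consistently converted to searchable PDF before being uploaded to Microsoft SharePoint. Image documents already stored in SharePoint libraries can be converted automatically within the library without any user intervention. In addition, new files that users upload in image formats can be detected, processed by OCR and saved back to the library in a searchable format.
Document conversion runs as a background process and is completely invisible to SharePoint end users. Their experience of using SharePoint does not have to change, while ABBYY Recognition Server ensures that incoming documents are continuously processed and made full-text searchable.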

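• Enterprise-Wide Document Conversion Server

With ABBYY Recognition Server, OCR is not limited to desktop computers or to operators' working hours. The service running on the server can be made available to all users or to designated user groups, anytime and anywhere.
Unlike desktop applications that IT staff must maintain on multiple workstations, ABBYY Recognition Server is installed, configured and managed centrally, which makes it a cost-effective enterprise-wide solution.
Users can start using the document conversion service immediately without needing to know what OCR is. They simply choose the format they want their documents converted to (searchable PDF, PDF/A, Microsoft Word or Excel) and receive the requested file.
Because the document conversion process is fully automated and hidden from users, ABBYY Recognition Server is equally suitable for single-user and multi-user environments. An installation can easily be scaled to handle documents from additional clients without loss of productivity. A flexible priority system allows important documents to be moved automatically to the front of the queue.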

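Automated Document Digitization

How It Works

1. Flexible Import Options

Import from Network/FTP Folders
ABBYY Recognition Server can automatically import images from the following network sources:

  • Network folders
  • FTP folders (for example, images uploaded from remote locations)
  • E-mail folders (for example, images that users send by e-mail)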
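The idea of watching an input folder for new images can be illustrated with a minimal Python sketch. It is not ABBYY's implementation or API: the function name `find_new_images`, the `IMAGE_EXTENSIONS` set and the `already_seen` bookkeeping are all assumptions made for illustration, and a real deployment would configure the import source in Recognition Server itself.

```python
from pathlib import Path

# Hypothetical set of file extensions treated as importable images.
IMAGE_EXTENSIONS = {".tif", ".tiff", ".jpg", ".jpeg", ".pdf", ".png", ".bmp"}


def find_new_images(hot_folder, already_seen):
    """Return image files in hot_folder that have not been picked up yet.

    already_seen is a set of file names; it is updated in place so that a
    second call does not return the same files again.
    """
    new_files = []
    for path in sorted(Path(hot_folder).iterdir()):
        is_image = path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
        if is_image and path.name not in already_seen:
            new_files.append(path)
            already_seen.add(path.name)
    return new_files
```

A caller would invoke `find_new_images` periodically and hand each returned path to whatever conversion step follows.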

Input File Formats

  • TIFF / multi-page TIFF
    Compression methods: Unpacked, CCITT Group 3, CCITT Group 3 FAX (2D), CCITT Group 4, PackBits, JPEG, ZIP, LZW
  • JPEG, JPEG 2000
  • PDF
  • DjVu
  • BMP
  • PNG
  • PCX, DCX

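2. Scanning Station

The Scanning Station provides batch scanning and prepares images for further processing:

  • Scanning via TWAIN, WIA and ISIS
  • Quick image preview
  • Image pre-processing (rotation, deskewing, despeckling, etc.)
  • Document separation by barcodes, blank pages or page count

For batch-scanned images, ABBYY Recognition Server provides several built-in document separation options: blank pages, barcode sheets, or barcodes pasted or printed on the first page of each document. Additional custom rules based on the recognized text can be created with scripts.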
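Separating a scanned batch into documents can be sketched in a few lines of Python. This is only an illustration of the general technique, not the product's separation logic: the function `separate`, the `is_separator` callback and the `keep_separator` flag are hypothetical names. A blank separator sheet is dropped from the output, while a page carrying a barcode on the first page of a document is kept as the first page of the new document.

```python
def separate(pages, is_separator, keep_separator=False):
    """Split a flat list of pages into documents.

    is_separator(page) decides whether a page starts a new document.
    With keep_separator=False the separator page is discarded (blank
    separator sheets); with keep_separator=True it becomes the first page
    of the next document (barcode printed on the first page).
    """
    documents, current = [], []
    for page in pages:
        if is_separator(page):
            if current:
                documents.append(current)
            current = [page] if keep_separator else []
        else:
            current.append(page)
    if current:
        documents.append(current)
    return documents
```

For example, `separate(pages, lambda p: p == "")` treats empty strings as blank sheets, and consecutive blank sheets do not produce empty documents.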

OCR is done on a Processing Station automatically. It is possible to connect several computers to the Server Manager as Processing Stations, and the Server Manager will balance the workload among these stations evenly. This will result in much faster processing of documents.
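How a manager component might spread work evenly across stations can be sketched as a least-loaded assignment. The source does not describe the actual balancing algorithm used by the Server Manager, so the following Python is an assumption-laden illustration: `distribute`, the station names and the use of page counts as a load measure are all hypothetical.

```python
import heapq


def distribute(jobs, station_names):
    """Assign (job_name, page_count) jobs to the least-loaded station.

    Largest jobs are placed first, and each job goes to the station with
    the fewest pages assigned so far. Returns {station_name: [job_name, ...]}.
    """
    heap = [(0, name) for name in station_names]
    heapq.heapify(heap)
    assignment = {name: [] for name in station_names}
    for job_name, pages in sorted(jobs, key=lambda job: -job[1]):
        load, name = heapq.heappop(heap)
        assignment[name].append(job_name)
        heapq.heappush(heap, (load + pages, name))
    return assignment
```

With two stations and jobs of 10, 8, 5 and 3 pages, each station ends up with 13 pages in this sketch.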

The OCR and barcode recognition technologies implemented in Recognition Server deliver unprecedented accuracy, support various types of text and support the most popular 1D and 2D barcodes. The OCR process has extensive language support: it covers 198 languages, including languages written in Latin, Cyrillic and Greek scripts as well as Arabic, Chinese, Japanese, Korean, Vietnamese, Hebrew, Yiddish and Thai. European languages written in Gothic fonts are also supported.

To preserve the original document layout, ABBYY Recognition Server uses Adaptive Document Recognition Technology (ADRT). ADRT significantly improves document layout retention when saving documents to DOC and RTF formats. The logical structure of an entire document is reproduced, including headers, footers, footnotes, page numbers, table of contents linked to document sections and notes to pictures and diagrams.

Support for Many Recognition Languages

  • 43 main languages with dictionary support: Arabic (Saudi Arabia), Armenian (Eastern), Armenian (Grabar), Armenian (Western), Azeri (Latin), Bashkir, Bulgarian, Catalan, Croatian, Czech, Danish, Dutch, Dutch (Belgian), English, Estonian, Finnish, French, German, German (new spelling), Greek, Hebrew, Hungarian, Indonesian, Italian, Latvian, Lithuanian, Norwegian, Norwegian (Bokmal), Norwegian (Nynorsk), Polish, Portuguese, Portuguese (Brazilian), Romanian, Russian, Slovak, Slovenian, Spanish, Swedish, Tatar, Thai, Turkish, Ukrainian, Vietnamese;
  • 133 additional languages without dictionary support: Abkhaz, Adyghe, Afrikaans, Agul, Albanian, Altai, Avar, Aymara, Azerbaijani (Cyrillic), Basque, Belarusian, Bemba, Blackfoot, Breton, Bugotu, Buryat, Cebuano, Chamorro, Chechen, Chukchee, Chuvash, Corsican, Crimean Tatar, Crow, Dargwa, Dungan, Eskimo (Cyrillic), Eskimo (Latin), Even, Evenki, Faroese, Fijian, Frisian, Friulian, Gagauz, Galician, Ganda, German (Luxembourg), Guarani, Hani, Hausa, Hawaiian, Icelandic, Indonesian, Ingush, Irish, Jingpo, Kabardian, Kalmyk, Karachay-balkar, Karakalpak, Kasub, Kawa, Kazakh, Khakass, Khanty, Kikuyu, Kirghiz, Kongo, Koryak, Kpelle, Kumyk, Kurdish, Lak, Latin, Lezgi, Luba, Macedonian, Malagasy, Malay (Malaysian), Malinke, Maltese, Mansi, Maori, Mari, Maya, Miao, Minangkabau, Mohawk, Moldavian, Mongol, Mordvin, Nahuatl, Nenets, Nivkh, Nogay, Nyanja, Ojibway, Ossetian, Papiamento, Provencal, Quechua, Rhaeto-Romanic, Romany, Rundi, Russian (Old Spelling), Rwanda, Sami (Lappish) , Samoan, Scottish Gaelic, Selkup, Serbian (Cyrillic), Serbian (Latin), Shona, Sioux (Dakota), Somali, Sorbian, Sotho, Sunda, Swahili, Swazi, Tabasaran, Tagalog, Tahitian, Tajik, Tok Pisin, Tongan, Tswana, Tun, Turkmen, Tuvinian, Udmurt, Uigur (Cyrillic), Uigur (Latin), Uzbek (Cyrillic), Uzbek (Latin), Welsh, Wolof, Xhosa, Yakut, Yiddish, Zapotec, and Zulu;
  • 5 East Asian languages: Chinese (Traditional, Simplified), Japanese, Korean and Hangul (Korean);
  • 6 languages for recognition of old European documents and Gothic fonts in books printed in the 18th-20th centuries:
    • English,
    • French,
    • German,
    • Italian,
    • Spanish,
    • Latvian;
  • 4 artificial languages: Esperanto, Ido, Interlingua, and Occidental;
  • 6 programming languages: Basic, C/C++, COBOL, Fortran, Java, and Pascal;
  • Simple chemical formulas
  • Digits
  • 1D Barcodes
    • Check Code 39, Check Interleaved 25, Code 128, Code 39, EAN 13, EAN 8, Interleaved 25, CODABAR (without checksum), UCC Code 128, Code 2 of 5 (Industrial, IATA, Matrix), Code 93, UPC-A, UPC-E, Patch Code and Postnet;
  • 2D Barcodes
    • PDF 417, Aztec, Data Matrix, QR Code
  • Multiple Text Types
    • Normal, Fax (mode for low-resolution texts), Typewriter, Dot Matrix Printer, OCR-A, OCR-B, MICR (E13B), Gothic

Sometimes there is a need to process important documents which have to be recognized with exceptional accuracy. At the same time, the quality of the scans may not be perfect, suffering from low resolution and unwanted noise. In this case it is very important to have a reliable quality assurance mechanism.

Automatic quality control allows the administrator to set a threshold for recognition accuracy: documents with poor-quality text will not be converted, but rather stored in a separate folder for special treatment.
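The threshold rule described above can be illustrated with a short Python sketch. It assumes each recognized document carries a confidence score between 0.0 and 1.0; the `RecognizedDocument` class, the `confidence` field and `route_by_quality` are hypothetical names, and the way Recognition Server actually measures accuracy is not specified in the source.

```python
from dataclasses import dataclass


@dataclass
class RecognizedDocument:
    name: str
    confidence: float  # hypothetical recognition confidence, 0.0 to 1.0


def route_by_quality(documents, threshold):
    """Split documents into (converted, needs_review) lists of names.

    Documents at or above the threshold are converted; the rest are set
    aside for special treatment, mirroring the separate folder above.
    """
    converted, needs_review = [], []
    for doc in documents:
        target = converted if doc.confidence >= threshold else needs_review
        target.append(doc.name)
    return converted, needs_review
```

Raising the threshold sends more documents to manual review, so the value is a trade-off between operator effort and output quality.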

Verification Station

A client station for proofreading recognition results. Verification can be enabled for all pages or it can be based on the accuracy threshold. Verification permissions management is supported.

Indexing Station

A client station for document indexing and classification.

1. Multiple Export Destinations

ABBYY Recognition Server enables multiple destinations for data and images as well as generation of searchable PDFs.

2. Flexible File Output Formats

  • PDF, PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-2u
  • RTF
  • DOC, DOCX
  • XLS, XLSX
  • TXT, CSV
  • HTML
  • TIFF
  • JPEG, JPEG 2000
  • JBIG2
  • PNG
  • EPUB
  • XML, Alto XML
  • FineReader internal format (FineReader Engine-compatible)

3. Available Connectors to Enterprise Systems

  • Export to Microsoft SharePoint
  • IFilter for TIFF files
  • Connector to Google Search Appliance

4. Available Customization and Integration Options

  • Custom processing parameters defined via XML files (XML Tickets)
  • Web API
  • COM API
  • Scripting in VBScript and JScript

ABBYY Recognition Server is server-based software for automating document processing, OCR and PDF conversion in enterprise and service-based environments. Its architecture makes it easy to deploy document processing solutions that scale to any size, with significant time and cost savings.

ABBYY Recognition Server automatically converts large volumes of paper documents or document images into fully searchable electronic text suitable for business processes including archiving, e-discovery, and enterprise search. It enables automated, unattended document processing that can be managed and accessed from within an organization or remotely. Recognition Server can also connect with a variety of back-end systems and third-party applications, integrating via Scripts, XML tickets, a Web-service API or a COM-based API. ABBYY intelligent OCR and PDF conversion technology delivers highly accurate document conversion with recognition of up to 190 languages.

Architecture

ABBYY Recognition Server consists of several components, which can be installed on one or many computers in a LAN. The main components are:

  • Server Manager — a central service component, which controls the document processing queue and distributes the tasks among the stations.
  • Processing Station — a service that performs recognition and document conversion.
  • Scanning Station — a client station for batch scanning and image pre-processing.
  • Indexing Station — a client station for document indexing and classification.
  • Connector to Google Search Appliance™ (GSA) — a component that allows Google Search Appliance to use ABBYY Recognition Server for extracting content from document images .
  • Connector to Microsoft® Search Systems (IFilter) — a component that allows Microsoft Office SharePoint Server and Windows Search to use ABBYY Recognition Server for extracting content from document images.
  • Remote Administration Console — a client console used for configuring and monitoring Recognition Server.

Document Processing

ABBYY Recognition Server processes each image file according to a workflow — a set of processing parameters predefined by the administrator. ABBYY Recognition Server can run several workflows with different parameters simultaneously. Each workflow corresponds to a unique input source (a folder, a SharePoint library or a mailbox).
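The relationship between workflows and input sources can be pictured with a small data model. The Python below is a sketch under stated assumptions, not the product's configuration format: the `Workflow` fields and `check_unique_sources` are hypothetical, and the only rule taken from the text is that each workflow has its own unique input source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    name: str
    input_source: str   # e.g. a folder path, library URL or mailbox (illustrative)
    output_format: str  # e.g. "PDF/A-1b" (illustrative)
    priority: int = 0


def check_unique_sources(workflows):
    """Return {input_source: workflow_name}, rejecting shared input sources."""
    owners = {}
    for workflow in workflows:
        if workflow.input_source in owners:
            raise ValueError(
                f"{workflow.input_source!r} is already used by "
                f"{owners[workflow.input_source]!r}"
            )
        owners[workflow.input_source] = workflow.name
    return owners
```

Several such workflows with different parameters can then be defined side by side, as long as no two of them read from the same source.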

Processing Steps

A workflow in ABBYY Recognition Server typically includes up to six configurable stages. Each workflow runs independently of others according to its own schedule and priority.

Six Stages of Document Processing

1. Scanning/Import of images. Images can be either scanned by an operator on the Scanning Station and then sent to ABBYY Recognition Server, or automatically imported by ABBYY Recognition Server from an input folder (network folder, FTP folder, SharePoint® library, or mailbox). ABBYY Recognition Server arranges image files in a queue to process them automatically according to priorities.

2. Recognition. The OCR process runs automatically on the Processing Station. If several Processing Stations are installed in the system, the files will be distributed among these Processing Stations evenly for optimal performance. Deploying additional Processing Stations brings a linear increase in OCR speed.

3. Verification (optional). In some cases, for example when digitizing books, verification of the recognition results might be necessary. Verification Stations allow operators to check all documents or only documents below a certain accuracy threshold.

4. Document separation (optional). When batch scanning or import is performed, document separation may be required. The documents can be separated using blank separator sheets, barcodes or a fixed number of pages per document. Separation can also be done according to a scripted rule.

5. Classification and indexing (optional). Indexing of documents can be done either automatically by a script, or by an operator on the Indexing Station, which allows the operator to manually select the document type and assign document attributes. The operator can also verify the data that has been populated by the script.

6. Export. In the final stage, ABBYY Recognition Server delivers the output documents to their destination (which can be a network folder, a SharePoint document library, or an e-mail address). Additionally, scripts can be applied for intelligent routing and delivery of documents to ECM systems based on document types and attributes.

Recognition Server is administered via a convenient interface based on the Microsoft Management Console. It allows the administrator to configure the system and monitor its activity: to set processing parameters, to manage licenses, stations, user permissions, processing queues and to view logs.

With the priority management and scheduling features, the administrator can control the order in which the documents are processed and use the stations’ hardware resources efficiently by scheduling OCR for night hours or weekends.
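Priority ordering and time-window scheduling can both be illustrated with standard-library Python. This is a generic sketch rather than Recognition Server's scheduler: `JobQueue`, its methods and `in_window` are hypothetical names, and the assumption that a larger number means a higher priority is made only for this example.

```python
import heapq
import itertools
from datetime import time


class JobQueue:
    """Jobs leave in priority order; equal priorities leave first in, first out."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker that preserves arrival order

    def add(self, name, priority):
        # heapq is a min-heap, so the priority is negated to pop the largest first.
        heapq.heappush(self._heap, (-priority, next(self._order), name))

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def in_window(now, start, end):
    """True if now falls in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end
```

A worker loop could check `in_window(current_time, time(22, 0), time(6, 0))` before popping the next job, so that heavy OCR runs only during night hours.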

Benefits

Increase your business competitiveness
This high-performance, highly scalable technology helps you speed up decision-making processes, provide instant and efficient services to your clients, and attract new customers and business.

Enjoy a new level of Arabic recognition
ABBYY Recognition Server provides fast document capture with 99%* accuracy for 190 languages including Arabic, which is an unprecedented success in OCR technologies.

Reduce cost of your business processes
Digitizing your workflow and archive reduces the cost of paper, hard-copy storage, and manual entry and processing, which saves both money and man-hours.

Easily set and forget
ABBYY Recognition Server has an intuitive user interface that allows quick, simple setup and implementation, with ready-to-use demo projects, no training required, and fast technical support.

Get fast ROI
The flexible licensing of ABBYY technologies, together with their high scalability and 24/7 automated operation, allows you to get a fast ROI.

ABBYY technology seamlessly integrates with your existing environment
ABBYY Recognition Server has a built-in mechanism for integrating with Microsoft SharePoint and, thanks to its open API, you can also integrate it smoothly into other existing workflow systems with no additional expense or effort.