
Clever Page©

Clever Page© was selected as the best among dozens of project proposals submitted to the Information Technology Academia Collaboration (ITAC) fund of the Egyptian Information Technology Industry Development Agency (ITIDA). It has been granted Product Development Programme (PDP) funding of 1 million EGP over 18 months, from Apr. 2010 to Nov. 2011.

This RDI project aims to meet a pressing need in the field of Arabic Human Language Technologies. The community of Arabic-enabled application integrators and vendors has long sought a reliable, high-performance Arabic omni font-written OCR technology.

Like other OCR systems, such software takes scanned Arabic paper documents as input and automatically produces the corresponding digital text files, as if a typist had keyed those documents into a computer.

Document management systems (DMS), library digitization, information retrieval (IR), uni/multi-modal text entry, and reading aids for the blind are just some of the applications that may benefit from a reliable OCR. Moreover, robust Arabic OCR software may also serve other languages written in Arabic script, such as Persian and Urdu.

OCR systems in general have been under development for decades, tens of research Arabic OCR pilots have been produced by academia, and a handful of Arabic OCR products are even available on the market. However, reliable Arabic OCR software that works on real-life documents (multi-font, multi-size, possibly noisy) with a practically acceptable average word error rate (WER) of 3% or less is not yet available on the market.
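To make the 3% figure concrete, the sketch below computes WER in the way it is commonly defined: the word-level edit distance between the reference text and the OCR output, divided by the number of reference words. This page does not state how RDI measures WER, so the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative only: tokenization is plain whitespace splitting, which is
    an assumption and not necessarily how any OCR vendor scores output.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so
    # far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a 3% WER means roughly three word-level errors (substitutions, deletions, or insertions) for every hundred words in the reference text.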

RDI addresses this gap with a new, proven technology built on theoretical foundations similar to those deployed in digital speech recognition systems. In RDI's CleverPage© system, hidden Markov models (HMMs) are used as the recognition tool, autonomously normalized horizontal differentials have been invented as the recognition feature, and a new algorithm for decomposing pages into lines and words has been designed.
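As a rough illustration of how an HMM can act as a recognition tool, the sketch below runs textbook Viterbi decoding over a toy discrete model. It is not RDI's implementation: the state labels `"A"` and `"B"`, the quantized feature symbols `0` and `1`, and all probabilities are hypothetical, and a real system would use far richer features and trained models.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely state path and its log-probability."""
    if not observations:
        raise ValueError("observations must not be empty")

    # best[s] = log-probability of the best path that ends in state s.
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []

    for obs in observations[1:]:
        step, pointers = {}, {}
        for s in states:
            # Pick the predecessor that maximizes the score of reaching s.
            prev_state = max(states, key=lambda p: best[p] + log_trans[p][s])
            step[s] = best[prev_state] + log_trans[prev_state][s] + log_emit[s][obs]
            pointers[s] = prev_state
        best = step
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical toy model: two glyph-like states emitting binary symbols.
STATES = ["A", "B"]
LOG_START = to_log({"A": 0.5, "B": 0.5})
LOG_TRANS = to_log({"A": {"A": 0.7, "B": 0.3}, "B": {"A": 0.3, "B": 0.7}})
LOG_EMIT = to_log({"A": {0: 0.9, 1: 0.1}, "B": {0: 0.1, 1: 0.9}})

path, score = viterbi([0, 0, 1, 1], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path)
```

The analogy with speech recognition is that a text line, scanned column by column, yields a sequence of feature vectors in much the same way an audio signal yields a sequence of frames, so the same decoding machinery can be applied.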

Early versions of CleverPage© show excellent results when tested on numerous documents containing multiple fonts and sizes. In fact, these results are the best reported in the published literature to date for Arabic omni font-written OCR.

RDI's long history in this field, especially that of Prof. Mohsen A. A. Rashwan and Dr. Mohamed Attia, includes numerous MSc and PhD theses, papers published in international conferences and journals, implemented pilots, and an international patent. Together these form a strong basis for the expected success of this new core technology from RDI.

For more detailed information:

Arabic OCR System Analogous to HMM-Based ASR Systems, Dec. 2007
(PDF, 429 KB)

A Large Scale HMM-Based Omni Font-Written OCR System for Cursive Scripts: PhD thesis by Mohamed S. M. El-Mahallaway, Cairo University, April 2008
(PDF, 2,652 KB)

Our system: results & conclusions
(PPS, 5,135 KB)

Competitor analysis of commercial Arabic OCRs.
(Arabic PDF, 728 KB)

Presentation on RDI's lines & words decomposition algorithm.
(PPS, 1,888 KB)

Research history of RDI on Arabic OCR.
(PDF, 116 KB)

RDI© - Research and Development International.
Since 1993 - All rights reserved.