which was selected as the best among tens of project proposals
submitted to the Information Technology Academia Collaboration (ITAC)
fund of the Egyptian Information Technology Industry Development
Agency (ITIDA), has been granted Product Development Programme (PDP)
funding of 1 million EGP over 18 months, from Apr. 2010 to Nov. 2011.
This RDI project aims to meet a pressing need in the field
of Arabic Human Language Technologies.
The community of Arabic-enabled application integrators and vendors has long been seeking
a reliable, high-performance Arabic omni font-written OCR software technology.
Like other OCR systems, such software takes scanned Arabic paper documents as its input and automatically
produces the corresponding digital text files,
as if a typist had typed those paper documents into a computer.
Document management systems (DMS), library digitization, information retrieval (IR), uni-/multi-modal text entry,
and reading aids for the blind are just some examples of applications that may benefit from a reliable OCR.
Moreover, robust Arabic OCR software may also serve other languages that use the Arabic script, such as Persian
and Urdu.
It is remarkable that OCR systems in general have been under development for decades, tens of research Arabic OCR pilots have been
produced by academia, and a handful of Arabic OCR products are even available on the market. However, a reliable Arabic OCR
software that works on real-life (multi-font, multi-size, possibly noisy, …) documents at a practically acceptable average
word-error-rate (WER) within 3% is
still far from being available on the market!
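To make the 3% target concrete, here is a minimal sketch of how a word-error-rate could be measured. It assumes the conventional definition (word-level substitutions, deletions and insertions divided by the number of reference words); the text above does not say exactly how its WER is computed, and the function name is only illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: the conventional WER definition, not necessarily the
    exact measure used for the figures quoted in the text.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    truth = "the quick brown fox"
    ocr_output = "the quick brawn fox"
    print(f"WER = {word_error_rate(truth, ocr_output):.2%}")
```

Under this definition, a WER within 3% means that on average no more than about 3 words in every 100 of the ground-truth text are recognized wrongly, dropped, or spuriously added.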
RDI now offers a new, proven technology based on theoretical foundations similar to those used in digital speech
recognition systems. In RDI's CleverPage© system, hidden Markov models (HMMs) are used as the recognition tool, autonomously
normalized horizontal differentials have been invented as the recognition features, and a new algorithm for lines & words
decomposition has been designed.
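As a rough illustration of HMM-based recognition in general, the sketch below runs Viterbi decoding on a toy discrete HMM. It is not CleverPage©'s model: the two states, the symbols "x" and "y", and all probabilities are hypothetical, and a real system would work on continuous feature vectors rather than a two-symbol alphabet.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a discrete HMM.

    Generic textbook Viterbi in log space; all model parameters are
    supplied by the caller and are purely illustrative here.
    """
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # scores[t][s]: best log-probability of any path ending in state s at time t.
    scores = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    back = [{}]

    for t in range(1, len(observations)):
        scores.append({})
        back.append({})
        for s in states:
            best_prev = max(
                states, key=lambda p: scores[t - 1][p] + log(trans_p[p][s])
            )
            scores[t][s] = (
                scores[t - 1][best_prev]
                + log(trans_p[best_prev][s])
                + log(emit_p[s][observations[t]])
            )
            back[t][s] = best_prev

    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical two-state, left-to-right model (state "A" then state "B").
STATES = ["A", "B"]
START_P = {"A": 1.0, "B": 0.0}
TRANS_P = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.0, "B": 1.0}}
EMIT_P = {"A": {"x": 0.9, "y": 0.1}, "B": {"x": 0.1, "y": 0.9}}

if __name__ == "__main__":
    print(viterbi(["x", "x", "y", "y"], STATES, START_P, TRANS_P, EMIT_P))
```

The left-to-right topology, where a state can only repeat or advance, is the usual choice in speech recognition, and it maps naturally onto reading a text line in one direction.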
The early versions of CleverPage© show excellent results when tried on numerous documents containing
multiple fonts and sizes. In fact, these results are the best reported in the published literature to date for Arabic
omni font-written OCR.
The long history of RDI in this field, especially that of Prof. Mohsen A. A. Rashwan and Dr. Mohamed Attia,
including numerous MSc and PhD theses, papers published in international conferences and journals,
implemented pilots, and an international patent, provides a strong basis for the prospective success of this
new core technology from RDI.
For more detailed information:
Arabic OCR System Analogous to HMM-Based ASR Systems, Dec. 2007
A Large Scale HMM-Based Omni Font-Written OCR System for Cursive Scripts: PhD thesis by Mohamed S. M. El-Mahallaway, Cairo University, April 2008 (PDF, 2,652 KB)
Our system; results & conclusions (PPS, 5,135 KB)
Competitors analysis of commercial Arabic OCRs (Arabic PDF, 728 KB)
Presentation on RDI's lines & words decomposition algorithm
Research history of RDI on Arabic OCR (PDF, 116 KB)