Arabic omni font-written OCR
(Unleashed)
CleverPage©
The community of Arabic-enabled application integrators and vendors has long been seeking
a reliable, high-performance Arabic omni font-written OCR software technology.
Like other OCR systems, such software takes scanned Arabic paper documents as its input and automatically
produces the corresponding digital files, as if a typist had keyed those paper documents into a computer.
Document management systems (DMS), library digitization, information retrieval (IR), uni/multi-modal text entry,
and reading aids for the blind are just a few examples of applications that can benefit from a reliable OCR.
Moreover, robust Arabic OCR software may also serve other languages that use the Arabic script, such as Persian
and Urdu.
Remarkably, OCR systems in general have been under development for decades, academia has produced tens of Arabic OCR
research pilots, and a handful of Arabic OCR products are even available on the market. Yet Arabic OCR software that
works reliably on real-life documents (multi-font, multi-size, possibly noisy, …) at a practically acceptable average
word error rate (WER) of 3% or less is still not available on the market.
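WER is commonly computed as the word-level edit distance (substitutions + deletions + insertions) between the OCR output and a reference transcription, divided by the number of words in the reference. A 3% WER therefore means roughly 3 word errors per 100 reference words. The Python sketch below illustrates that standard definition under the assumption of simple whitespace tokenization; the function name `word_error_rate` is hypothetical, and this is not CleverPage©'s own evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumes whitespace tokenization (an illustrative simplification).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the minimum edits turning ref[:i-1] into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Usage: a hypothetical 100-word page with 3 misrecognized words.
reference = " ".join(f"w{i}" for i in range(100))
words = reference.split()
words[10], words[50], words[90] = "x", "y", "z"
hypothesis = " ".join(words)
print(word_error_rate(reference, hypothesis))
```

Normalizing by the reference length (rather than the hypothesis length) is the usual convention, which is why a heavily over-segmented output can even yield a WER above 100%.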
This is where RDI comes in, with a new, proven technology based on theoretical foundations similar to those deployed in
digital speech recognition systems. In RDI's CleverPage© system, hidden Markov models (HMMs) are used as the recognition
tool, autonomously normalized horizontal differentials have been invented as the recognition features, and the line and
word decomposition algorithm has been redesigned.
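This page does not detail CleverPage©'s models. As a generic illustration only, HMM-based recognizers, in speech and OCR alike, typically decode an observation sequence with the Viterbi algorithm, which finds the single most likely sequence of hidden states. The sketch below assumes a tiny discrete, left-to-right HMM whose states (`s1`, `s2`, `s3`), symbols (`a`, `b`, `c`) and probabilities are all hypothetical. In a real OCR setting the observations would be features extracted from successive slices of a text line; that mapping is an assumption here, not RDI's documented design.

```python
import math
from typing import Dict, List, Sequence, Tuple


def _log(p: float) -> float:
    # Map zero probability to log(0) = -inf instead of raising an error.
    return math.log(p) if p > 0 else float("-inf")


def viterbi(
    observations: Sequence[str],
    states: Sequence[str],
    start_p: Dict[str, float],
    trans_p: Dict[str, Dict[str, float]],
    emit_p: Dict[str, Dict[str, float]],
) -> Tuple[List[str], float]:
    """Return the most likely hidden-state path and its log-probability."""
    if not observations:
        raise ValueError("observations must be non-empty")
    # score[s]: best log-probability of any path ending in state s so far
    score = {s: _log(start_p.get(s, 0.0)) + _log(emit_p[s].get(observations[0], 0.0))
             for s in states}
    backptrs: List[Dict[str, str]] = []
    for obs in observations[1:]:
        new_score: Dict[str, float] = {}
        ptr: Dict[str, str] = {}
        for s in states:
            # Pick the predecessor r that maximizes the score of entering s.
            prev, best = max(
                ((r, score[r] + _log(trans_p[r].get(s, 0.0))) for r in states),
                key=lambda item: item[1],
            )
            new_score[s] = best + _log(emit_p[s].get(obs, 0.0))
            ptr[s] = prev
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state to recover the full path.
    last = max(states, key=lambda s: score[s])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


# Usage with a hypothetical three-state left-to-right model.
states = ["s1", "s2", "s3"]
start = {"s1": 1.0}
trans = {"s1": {"s1": 0.5, "s2": 0.5}, "s2": {"s2": 0.5, "s3": 0.5}, "s3": {"s3": 1.0}}
emit = {"s1": {"a": 0.9, "b": 0.1}, "s2": {"b": 0.9, "c": 0.1}, "s3": {"c": 0.9, "a": 0.1}}
path, log_prob = viterbi(["a", "b", "c"], states, start, trans, emit)
print(path)  # ['s1', 's2', 's3']
```

Working in log-probabilities avoids numerical underflow on long observation sequences, which is the usual reason HMM decoders do so.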
Early versions of CleverPage© have shown excellent results when tested on numerous documents containing
multiple fonts and sizes. In fact, these results are the best reported to date in the published literature on Arabic
omni font-written OCR.
RDI's long track record in this field, especially that of Prof. Mohsen A. A. Rashwan and Dr. Mohamed Attia, including
numerous MSc and PhD theses, papers published in international conferences and journals, implemented pilots, and an
international patent, provides a strong basis for the anticipated success of this new core technology from RDI.
For more detailed information:
(PDF, 429 KB)
(PPS, 5.135 KB)
(Arabic PDF, 728 KB)
(PPS, 1,888 KB)
(PDF, 16 KB)