Sotoor is an all-in-one optical character recognition (OCR) software package for typewritten Arabic. It converts scanned images of Arabic documents into fully editable and searchable text files. In addition to its accurate and reliable recognition engine, Sotoor preserves the layout of the original document.



Supports major input image formats (JPEG, PNG, BMP, HDR, PSD, TGA, PIC, PGM, PPM) and PDF.

Extracted text can be downloaded in one of the following file formats: PDF; editable formats: DOC(X), HTML, RTF, TXT, and ODT; e-book formats: EPUB and FB2.

Provides single-page as well as batch recognition strategies.

Recognizes text in scans and photos of different types of Arabic documents (newspapers, computer-printed documents). Future releases will support new types (captions, historical typewritten and handwritten manuscripts).

The recognition engine supports different resolutions; images scanned at 300 dpi are recommended.

Improves the quality of input images through perspective correction and noise removal stages, which raises recognition rates.

The built-in editor displays the recognized text side by side with the original input image, which makes the output easier to review.

Provides automated and customized conversion procedures through workflows.


  • As Arabic is our mother tongue, the Sotoor recognition model handles complex patterns at the syntactic and semantic levels.
  • Sotoor can be customized to support the customer's preferred input and output formats.
  • Sotoor achieves the highest recognition rates compared to other Arabic OCR solutions.
  • Sotoor respects customer privacy and does not share customers' input or output files with third parties. We can set up the solution environment on your premises, train your employees, and provide full customer support.
  • Sotoor is the only available OCR solution that can deal with noisy documents.


Can Sotoor capture hand-written text?
No. Sotoor currently captures only typewritten Arabic documents (newspapers, computer-printed documents). Historical handwritten manuscripts will be supported in future releases.
What scan resolution does Sotoor need?
Sotoor can recognize scanned text of very poor quality and low resolution, but scanning at 300 dpi yields higher recognition rates.
Can I convert multiple pages at once?
Yes, this is what the batch recognition strategy does.
Will Sotoor handle rotated or skewed input images before the recognition?
Yes, Sotoor has a preprocessing engine that fixes rotated and skewed images before the recognition phase starts.
How much time does Sotoor take to recognize one page?
It depends on the number of words per page and the available hardware. On a Linux machine with a 2.50 GHz × 4 CPU and 8 GB of RAM, Sotoor can run the OCR process on a 250-word computer-printed document in less than 2 minutes.
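For planning a batch job, the figure above can be turned into a rough upper bound. The sketch below is only arithmetic, not part of Sotoor: the function name is hypothetical, and it assumes pages are processed one after another, each resembling the 250-word page on comparable hardware.

```python
def batch_upper_bound_minutes(pages: int, minutes_per_page: float = 2.0) -> float:
    """Rough upper bound on batch time, assuming sequential processing.

    minutes_per_page defaults to the "less than 2 minutes" figure quoted
    for a 250-word computer-printed page; denser pages or slower hardware
    may take longer.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return pages * minutes_per_page


if __name__ == "__main__":
    # A 30-page document should finish within about an hour under these assumptions.
    print(batch_upper_bound_minutes(30))
```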
What are the optimal settings for recognition?
The following settings are optimal for the recognition process:
  • Resolution of 300 dpi (font sizes 12 to 24).
  • Monochrome, clean, computer-printed documents (white background).
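To check whether an existing scan is close to the recommended 300 dpi, it can help to compare its pixel dimensions with the paper size. The sketch below is an illustration only and not part of Sotoor: the function names are hypothetical, and the A4 page size (210 × 297 mm) is an assumption.

```python
MM_PER_INCH = 25.4


def pixels_at_dpi(width_mm: float, height_mm: float, dpi: int = 300) -> tuple[int, int]:
    """Pixel dimensions a page of the given size has when scanned at `dpi`."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


def effective_dpi(pixel_width: int, paper_width_mm: float) -> float:
    """Resolution implied by an image's pixel width and the paper's physical width."""
    return pixel_width / (paper_width_mm / MM_PER_INCH)


if __name__ == "__main__":
    # An A4 page (assumed 210 x 297 mm) scanned at 300 dpi.
    print(pixels_at_dpi(210, 297))          # (2480, 3508)
    # An A4 scan that is only 1240 pixels wide is roughly 150 dpi.
    print(round(effective_dpi(1240, 210)))  # 150
```

An image whose effective resolution falls well below 300 dpi is a candidate for rescanning before recognition.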
Can I recognize a specific paragraph/section of the input image?
Yes, Sotoor provides a customized OCR procedure that lets users select specific lines of the input image to be recognized.