Tashkeel is RDI's diacritization system. It converts raw Arabic text into vocalized text by adding diacritics to bare letters. It combines statistical and rule-based approaches to take advantage of the strengths of each while avoiding their weaknesses. Besides working as a standalone system, Tashkeel produces accurate output that can improve the performance of many other systems such as Text to Speech, Automatic Arabic Speech Recognition, and Search Engines.

TASHKEEL FEATURES


Supports morphological and syntactic diacritization, with the option of using each separately.

Averages 35 words per second on a CPU and reaches 150 words per second on a GPU.

Can enforce or ignore diacritics already present in the input.

Supports a user-defined preference list of diacritized words.

Supports the major text formats for input and output (TXT, DOC(X), ODT, RTF).

Diacritizes English words written in Arabic letters, as well as previously unseen Arabic words, with high accuracy.
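
As an illustration of what enforcing or ignoring input diacritics can mean in practice, here is a minimal Python sketch. It is not Tashkeel's implementation or API: the names strip_diacritics and prepare_input are hypothetical, and the sketch assumes that "ignore" simply means removing existing diacritics before the text is processed. The Unicode code points used for the Arabic diacritics are standard.

```python
import re

# Arabic tanwin, harakat, shadda and sukun (U+064B..U+0652),
# plus the superscript alef (U+0670).
ARABIC_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return the text with Arabic diacritics removed (bare letters only)."""
    return ARABIC_DIACRITICS.sub("", text)


def prepare_input(text: str, enforce_input_diacritics: bool) -> str:
    """Hypothetical pre-step: keep the input diacritics when they are
    enforced, or drop them when they should be ignored."""
    if enforce_input_diacritics:
        return text
    return strip_diacritics(text)
```

With enforce_input_diacritics set to False, a partially vocalized word reaches the diacritizer as bare letters; with True, the existing marks pass through untouched.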

WHY TASHKEEL FOR ARABIC DIACRITIZATION?


  • Tashkeel contains a set of models designed specifically for the Arabic language, so it outperforms its general-model counterparts.
  • Tashkeel improves your personal and business productivity: it is 40 times faster than manual diacritization, and more powerful hardware can raise processing rates further.
  • Tashkeel can be used to improve the performance of many other systems such as Text to Speech, Automatic Arabic Speech Recognition, and Search Engines.

TASHKEEL FAQS


Can Tashkeel pull text from a .DOCX file?
Yes, Tashkeel supports many text formats such as TXT, DOC(X), ODT, and RTF.
Can Tashkeel process an input text that already has diacritics?
Yes, Tashkeel can handle diacritized input in either of two ways, as the user chooses. It can diacritize the entire text, or it can keep the existing diacritics and add diacritics only to bare letters.
Can Tashkeel generate only the morphological diacritics while ignoring the syntactic diacritics?
Yes, Tashkeel can apply morphological and syntactic diacritization separately.
What happens if the input text contains numbers or English letters?
Tashkeel has a preprocessing module that recognizes and sets aside non-Arabic characters while preserving their context, then reproduces them unchanged in the output.
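
The preprocessing idea in the last answer can be sketched in Python as follows. This is only an illustration under stated assumptions, not Tashkeel's actual module: diacritize_mixed is a hypothetical name, the diacritize argument stands in for whatever diacritizer is used, and "non-Arabic" is simplified to runs of ASCII letters and digits. Unlike the real module, this sketch does not carry context across the tokens it sets aside.

```python
import re
from typing import Callable

# Simplifying assumption: "non-Arabic" means runs of ASCII letters or digits.
# The capturing group makes re.split keep the matched runs in its result.
NON_ARABIC = re.compile(r"([A-Za-z0-9]+)")


def diacritize_mixed(text: str, diacritize: Callable[[str], str]) -> str:
    """Diacritize only the Arabic parts of text; copy the rest unchanged."""
    pieces = NON_ARABIC.split(text)
    out = []
    for i, piece in enumerate(pieces):
        # With one capturing group, odd indexes hold the set-aside runs and
        # even indexes hold the text between them.
        if i % 2 == 1 or not piece.strip():
            out.append(piece)  # non-Arabic run or whitespace: keep as is
        else:
            out.append(diacritize(piece))
    return "".join(out)
```

Because the set-aside runs are reinserted at their original positions, numbers and English words appear in the output exactly where they stood in the input.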