Tashkeel is RDI's diacritization system. It converts raw Arabic text into vocalized text by adding diacritics to bare letters. It combines statistical and rule-based approaches to take advantage of the strengths of each while avoiding their individual weaknesses. In addition to working as a standalone system, its accurate output can improve the performance of other systems such as text-to-speech, automatic Arabic speech recognition, and search engines.
Supports morphological and syntactic diacritization, each of which can also be used separately.
Diacritizes an average of 35 words per second on a CPU and up to 150 words per second on a GPU, roughly a fourfold speedup.
Ability to enforce or ignore input diacritics.
Supports a user-defined preference list of diacritized words.
Supports major text formats for input and output (TXT, DOC(X), ODT, RTF).
Ability to diacritize, with high accuracy, English words written in Arabic letters as well as Arabic words that have not been seen before.
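
To make two of the features above concrete (enforcing or ignoring input diacritics, and the user preference list), here is a minimal Python sketch. It does not use RDI's software or API, which is not described here. The function names, the `keep_input_diacritics` flag, and the dictionary-based preference list are all hypothetical, and the sketch only looks up whole words; it performs no morphological or syntactic analysis.

```python
import re

# Arabic diacritic marks: fathatan through sukun (U+064B to U+0652)
# plus the superscript alef (U+0670).
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Return the text with all Arabic diacritic marks removed."""
    return DIACRITICS.sub("", text)


def apply_user_preferences(text, preferences, keep_input_diacritics=True):
    """Vocalize words found in a user preference list (hypothetical helper).

    preferences maps a bare word to the user's preferred diacritized form.
    With keep_input_diacritics=True, a word that already carries diacritics
    is left untouched (the input is "enforced"). With False, input
    diacritics are discarded before the lookup (the input is "ignored").
    """
    result = []
    for token in text.split():
        bare = strip_diacritics(token)
        if keep_input_diacritics and bare != token:
            result.append(token)              # trust the diacritics supplied
        elif bare in preferences:
            result.append(preferences[bare])  # use the user's preferred form
        else:
            result.append(bare)               # unknown word: left bare
    return " ".join(result)


# Example preference list: the bare word "ktb" mapped to the verb form "kataba".
PREFERENCES = {"\u0643\u062A\u0628": "\u0643\u064E\u062A\u064E\u0628\u064E"}
```

A real diacritizer would choose among candidate vocalizations using context. In this sketch the preference list simply overrides the output for exact word matches, and the flag decides whether diacritics already present in the input take priority over it.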