The major goal of the OrienTel data collection is to enable the project's participants to design and develop multilingual interactive communication services for the Mediterranean and the Middle East, ranging from Morocco in the west to the Gulf States in the east, including Turkey and Cyprus. Applications that use the OrienTel databases will be speech-based and will typically be implemented on mobile and multi-modal platforms such as cellular (GSM) or fixed-line phones. To develop such applications, spoken language resources must first be created that adequately cover various aspects of speech variation.
A prerequisite for successfully creating spoken language resources is a comprehensive specification of the speech data to be collected, which is provided by work package WP2. Based on the desired functionality of speech-driven interfaces, this document specifies the design issues that must be addressed to create databases that adequately cover the targeted applications. The main issues are the recording platform and recording scenarios; the corpus contents; the annotation, transcription, and structure of the data; the speaker characteristics; and the format and structure of the lexicon.