The objective of the LC-STAR project is to improve human-to-human and man-machine communication in multilingual environments. The project aims to create the lexica and corpora needed to transfer speech-to-speech translation (SST) components, i.e. flexible-vocabulary speech recognition, high-quality text-to-speech synthesis (TTS) and speech-centered translation, into selected languages. The SST components are intended for integration into speech-driven interfaces embedded in mobile appliances and network servers. On the one hand, LC-STAR will concentrate on the creation of language resources, i.e. pronunciation lexica with phonetic, prosodic and morpho-syntactic content, and bilingual aligned text corpora. All language resources will be validated by external validation centers. On the other hand, speech-to-speech translation technologies will be investigated with respect to their demands on language resources. The transfer will be shown by a demonstrator translating among three languages.
Summary of 2003 activities

Main activities

Track I: Overview of the demonstrator
Future work
- Production of large lexicons suited for speech recognition and synthesis
- Validation of large lexica

Track II:
- Specifications on content of translation lexica suited for statistical machine translation
- Specifications of validation criteria for translation lexica
- Creation of lexicons for ASR and TTS components in a tourist domain
- Creation of translation lexica suited for statistical machine translation
- Continuation of translation experiments
- Demonstrator testing
- 13 lexicons for speech recognition and synthesis will be created
- Text corpora and databases for three languages to demonstrate transfer are created
- 9 lexicons suited for statistical machine translation will be created
- Experimental results for speech-centered translation approaches concerning their requirements on language resources
- Language transfer will be shown with a demonstrator translating between Catalan, Spanish and US-English
Dissemination and Awareness

To promote the project to the international community, our website (http://www.lc-star.com) is updated regularly. It provides general information on the project (objectives, milestones and expected results) as well as a description of the consortium and further details. Documents such as specifications, technical reports and research papers are publicly available via the website. All relevant major events, press releases and presentations of the demonstrator, as well as links to other projects and institutes, can also be found on the site. A new version of the leaflet was created and distributed at the Eurospeech Conference in Geneva (September 2003), at SEPLN in Madrid (2003) and at RANLP in Borovets, Bulgaria.

The following papers were presented:
- Ueffing N., Ney H. (2003): Using POS Information for Statistical Machine Translation into Morphologically Rich Languages. In: Proc. of EACL, Budapest, Hungary, p. 347-354.
- Ueffing N., Macherey K., Ney H. (2003): Confidence Measures for Statistical Machine Translation. In: Proc. of MT Summit IX, New Orleans, LA, September 2003.
- Leusch G., Ueffing N., Ney H. (2003): A Novel String-to-String Distance Measure with Applications to Machine Translation Evaluation. To appear in: Proc. of MT Summit IX, New Orleans, LA, September 2003.
- Hartikainen E., Maltese G., Moreno A., Shammass Sh., Ziegenhain U. (2003): Large Lexica for Speech-to-Speech Translation: From Specification to Creation. In: Proc. of Eurospeech, Geneva, p. 1529-1532.
- Conejero D. et al. (2003): Lexica and Corpora for Speech-to-Speech Translation: A Trilingual Approach. In: Proc. of Eurospeech, Geneva, 2003, p. 1593-1596.
- Bisani M., Bonafonte A., Castell N., Hartikainen E., Maltese G., Moreno A., Shammass Sh., Ziegenhain U. (2003): Lexica and Corpora for Speech-To-Speech Translation (LC-STAR). In: Proc. of SEPLN, Madrid, September 2003.

Available documents can be downloaded from the webpage.
Exploitation Prospects

All lexica created within the project will be distributed via ELRA (http://www.elra.info/) no later than 18 months after the official end of the project. All data will thus be available to research institutes and companies worldwide for further exploitation in research and commercial applications. The specifications will be made publicly available on the webpage.