Ebrahim, S., El-Beltagy, S., Hegazy, D., Mostafa, M. (2017). Toward Building a Comprehensive Phrase-based English-Arabic Statistical Machine Translation System. The Egyptian Journal of Language Engineering, 4(2), 10-26. doi: 10.21608/ejle.2017.59427
Toward Building a Comprehensive Phrase-based English-Arabic Statistical Machine Translation System
Sara Ebrahim, Samha R. El-Beltagy, Doaa Hegazy, Mostafa G. Mostafa
1. Scientific Computing Department, Faculty of Computer and Information Sciences (FCIS), Ain Shams University, Cairo, Egypt
2. Center for Informatics Science, Nile University (NU)
3. Scientific Computing Department, Faculty of Computer and Information Sciences (FCIS), Ain Shams University, Cairo, Egypt
4. Computer Science, Faculty of Computer and Information Sciences (FCIS), Ain Shams University, Cairo, Egypt
Abstract
This paper explores a phrase-based statistical machine translation (PBSMT) pipeline for the English-Arabic (En-Ar) language pair. It surveys the most recent experiments conducted to improve Arabic machine translation in the En-Ar direction. It also focuses on freely available datasets and linguistically motivated techniques that enhance phrase-based En-Ar statistical machine translation (SMT), since the goal is to build a large-scale En-Ar SMT system using only such resources. In addition, the paper highlights the linguistic challenges that Arabic poses for machine translation (MT) in general. The paper can serve as a guide for building an En-Ar PBSMT system, and the presented pipeline can be generalized to any language pair.