Toward Building a Comprehensive Phrase-based English-Arabic Statistical Machine Translation System

Document Type : Original Article

Authors

1 Scientific Computing Department, Faculty of Computer and Information Sciences (FCIS), Ain Shams University, Cairo, Egypt

2 Nile University (NU), Center for Informatics Science

3 Scientific Computing Department, Faculty of Computer and Information Sciences (FCIS), Ain Shams University, Cairo, Egypt

4 Computer Science at the Faculty of Computer and Information Sciences (FCIS), Ain Shams University

Abstract

This paper explores a phrase-based statistical machine translation (PBSMT) pipeline for the English-Arabic (En-Ar) language pair. The work surveys the most recent experiments conducted to enhance Arabic machine translation in the En-Ar direction. It focuses on free datasets and linguistically motivated ideas that enhance phrase-based En-Ar statistical machine translation (SMT), as it aims to use only those resources to build a large-scale En-Ar SMT system. In addition, the paper highlights the linguistic challenges that Arabic poses for machine translation (MT) in general. This paper can be considered a guide for building an En-Ar PBSMT system. Furthermore, the presented pipeline can be generalized to any language pair.

Keywords