Abdulwasea, W., Abdou, S., Barhamtoshy, H. (2014). Classifiers Fusion for Arabic Named Entity Recognition. The Egyptian Journal of Language Engineering, 1(2), 19-34. doi: 10.21608/ejle.2014.59923
Wasim M. Abdulwasea; Sherif M. Abdou; Hassanin Barhamtoshy. "Classifiers Fusion for Arabic Named Entity Recognition". The Egyptian Journal of Language Engineering, 1, 2, 2014, 19-34. doi: 10.21608/ejle.2014.59923
Classifiers Fusion for Arabic Named Entity Recognition
1 Computers Department, Faculty of Engineering, Cairo University
2 King Abdel Aziz City for Sciences
Abstract
This paper presents a new approach to Arabic Named Entity Recognition (ANER). The approach combines language-independent and language-specific feature sets within both discriminative and generative machine learning frameworks, namely conditional random fields (CRF), support vector machines (SVM), Naive Bayes (NB), decision trees (DT), SVM for sequence tagging using hidden Markov models (SVMhmm), k-nearest neighbors (K-NN), a logistic classifier, and the Weka SVM implementation known as SMO. All of these classifiers were also fused together, and the fused configuration yielded more accurate ANER than any of the classifiers used individually. The approach was evaluated on two datasets: the ALTEC Named Entity Corpus for Modern Standard Arabic, a recently published corpus from the Arabic Language Technology Center (ALTEC), and ANERcorp, a standard Arabic NER dataset proposed by Benajiba. On both datasets, the proposed approach outperforms state-of-the-art Arabic NER systems under the 6-fold evaluation criterion.
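The abstract does not state how the individual classifiers are fused, so the following is only an illustrative sketch of one common fusion scheme, per-token majority voting, and should not be read as the authors' method. The function name `fuse_by_majority_vote`, the BIO-style labels, and the three example outputs are all hypothetical.

```python
from collections import Counter


def fuse_by_majority_vote(predictions):
    """Fuse per-token label sequences from several classifiers.

    predictions: list of label sequences, one per classifier, all
    covering the same tokens in the same order.
    Assumption: simple majority voting; ties go to the label proposed
    by the classifier that appears earliest in `predictions`.
    """
    if not predictions:
        raise ValueError("need at least one classifier output")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all label sequences must have the same length")

    fused = []
    for token_labels in zip(*predictions):
        counts = Counter(token_labels)
        top = max(counts.values())
        # Walk labels in classifier order so ties are broken deterministically.
        fused.append(next(lab for lab in token_labels if counts[lab] == top))
    return fused


# Hypothetical outputs of three classifiers on a four-token sentence.
crf_out = ["B-PER", "I-PER", "O", "B-LOC"]
svm_out = ["B-PER", "O", "O", "B-LOC"]
nb_out = ["O", "I-PER", "O", "O"]

fused_labels = fuse_by_majority_vote([crf_out, svm_out, nb_out])
print(fused_labels)
```

In this sketch each classifier makes one error on a different token, and the vote recovers a label sequence that none of them is penalized for individually, which is the intuition behind a fused configuration outperforming its members. Weighted voting or a trained meta-classifier would be alternative fusion choices.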