The Egyptian Journal of Language Engineering

Machine Learning and Feature Selection Approaches for Categorizing Arabic Text: Analysis, Comparison, and Proposal

Article 1, Volume 7, Issue 2, September 2020, Pages 1-19
Document Type: Original Article
DOI: 10.21608/ejle.2020.29313.1006
Authors
Ayat Elnahas 1; Nawal Elfishawy 2; Mohamed Nour 3; Maha Tolba 2
1 Department of Research Informatics, Electronics Research Institute, Cairo, Egypt
2 Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menoufia, Egypt
3 Department of Research Informatics, Electronics Research Institute, Cairo, Egypt
Abstract
This work applies several classification approaches to categorizing Arabic text, using two datasets as test beds. A comparative study evaluates the performance of the adopted classifiers. Several feature selection methods are also analyzed and evaluated. Selecting the most significant features is important because a very large number of features can degrade text classification performance. The adopted feature selection methods are compared on the task of classifying Arabic documents.
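As an illustration of score-based feature selection, the sketch below ranks terms by a chi-square statistic and keeps the top k. The abstract does not name the feature selection methods that were evaluated, so chi-square is an assumption here, chosen only because it is a common filter for text classification. The function names, the toy corpus, and the English tokens (standing in for preprocessed Arabic words) are hypothetical.

```python
from collections import Counter

def chi_square_scores(docs, labels):
    """Return {term: highest chi-square score over all classes}.

    docs is a list of token lists; labels holds one class label per document.
    """
    n = len(docs)
    class_counts = Counter(labels)
    # doc_freq[term][label] = number of documents of that class containing term
    doc_freq = {}
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            doc_freq.setdefault(term, Counter())[label] += 1

    scores = {}
    for term, per_class in doc_freq.items():
        term_total = sum(per_class.values())
        best = 0.0
        for label, class_total in class_counts.items():
            a = per_class[label]    # term present, document in class
            b = term_total - a      # term present, document in another class
            c = class_total - a     # term absent, document in class
            d = n - a - b - c       # term absent, document in another class
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:               # skip degenerate tables (score stays 0)
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[term] = best
    return scores

def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Toy corpus (hypothetical): two "sports" and two "economy" documents.
docs = [
    ["team", "goal", "in"],
    ["team", "match", "in"],
    ["bank", "price", "in"],
    ["bank", "market", "in"],
]
labels = ["sports", "sports", "economy", "economy"]

scores = chi_square_scores(docs, labels)
selected = select_top_k(scores, 2)
print(selected)  # ['bank', 'team']
```

The term "in" occurs in every document, so it carries no class information and scores 0, while "team" and "bank" each occur in exactly one class and score highest. Discarding low-scoring terms in this way is how a filter reduces the feature count before a classifier is trained.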
Moreover, the feature selection approaches are modified by combining the chosen methods. A novel method is also proposed for selecting the most appropriate features; it constructs features based on semantic fusion and multiple words (SF-MW). The proposed method is compared with the adopted feature selection methods.
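The abstract does not say how the chosen feature selection methods are combined. The sketch below shows two generic possibilities, a union of each method's top-k terms and a mean-rank merge, purely as assumptions; neither is claimed to be the authors' procedure, and it does not attempt to reproduce SF-MW. The score values and function names are hypothetical.

```python
def top_k(scores, k):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

def combine_union(score_maps, k):
    """Keep every term that is in the top k of at least one method."""
    selected = set()
    for scores in score_maps:
        selected.update(top_k(scores, k))
    return sorted(selected)

def combine_mean_rank(score_maps, k):
    """Rank terms within each method, average the ranks, keep the k best."""
    terms = set().union(*score_maps)
    mean_rank = {}
    for term in terms:
        ranks = []
        for scores in score_maps:
            ordering = top_k(scores, len(scores))
            # A term that a method never scored gets that method's worst rank.
            ranks.append(ordering.index(term) if term in scores else len(scores))
        mean_rank[term] = sum(ranks) / len(ranks)
    return sorted(terms, key=lambda t: (mean_rank[t], t))[:k]

# Hypothetical scores from two different feature selection methods.
method_a = {"team": 4.0, "bank": 4.0, "goal": 1.3, "in": 0.0}
method_b = {"team": 0.9, "goal": 0.8, "bank": 0.2, "in": 0.1}

union_features = combine_union([method_a, method_b], 2)
rank_features = combine_mean_rank([method_a, method_b], 2)
print(union_features)  # ['bank', 'goal', 'team']
print(rank_features)   # ['team', 'bank']
```

The union keeps a term that any single method considers important, so it tends to select more features, whereas the mean-rank merge keeps a fixed number of features and favors terms that all methods rank well.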
The experimental results show that the SVM classifier performed best, ahead of the KNN and NB classifiers. Combining the adopted feature selection methods gives better results than using any of them individually. The proposed feature selection method (SF-MW) is promising: it reduced the number of features and achieved higher classification accuracy. The accuracy improvement was about 22% on the two chosen Arabic test beds, which contain 1246 and 1500 documents, respectively. The proposed method is also expected to be efficient for other Arabic and English datasets.
Keywords
Classification Algorithms; Feature Selection; Multiple-Arabic-Words; Semantic Fusion; Measurable Evaluation Criteria