Correctness, Strength and Similarity Evaluation of Stemming Algorithms for Arabic

Document Type: Original Article

Authors

1 Princess Somaya University for Technology

2 GETALP, LIG, Université Joseph Fourier, France

Abstract

In this paper, we present a comprehensive evaluation of four Arabic stemmers based on metrics for correctness, strength and similarity. Two data sets were used. For the correctness evaluation, we used a list of 8,697 Arabic words grouped into 1,606 conceptual classes. For the similarity and strength evaluation, we used a list of 72,000 unique Arabic words. We report conclusions about the correctness, strength and similarity of the four stemming algorithms.
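
The abstract does not name the specific metrics, so the following is only a minimal sketch of how such an evaluation could be set up, under stated assumptions. It assumes correctness is measured as pairwise under-stemming and over-stemming rates over words grouped into conceptual classes, and strength as an index compression factor (the fraction of distinct words removed by stemming). The names `toy_stemmer`, `stemming_error_rates` and `index_compression` are hypothetical, and the toy stemmer and English sample words are stand-ins, not any of the four evaluated stemmers or the paper's data.

```python
from itertools import combinations


def toy_stemmer(word):
    """Hypothetical stand-in stemmer: strips one trailing 's' from longer words."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word


def stemming_error_rates(classes, stem):
    """Return (under_stemming_rate, over_stemming_rate) for a stemmer.

    classes: list of word lists; each list is one conceptual class.
    Assumes every word appears in exactly one class.
    """
    stems = {w: stem(w) for group in classes for w in group}
    label = {w: i for i, group in enumerate(classes) for w in group}
    same = same_missed = diff = diff_merged = 0
    for a, b in combinations(stems, 2):
        if label[a] == label[b]:
            same += 1
            # Under-stemming: same class, but the stems differ.
            same_missed += stems[a] != stems[b]
        else:
            diff += 1
            # Over-stemming: different classes, but the stems coincide.
            diff_merged += stems[a] == stems[b]
    under = same_missed / same if same else 0.0
    over = diff_merged / diff if diff else 0.0
    return under, over


def index_compression(words, stem):
    """Strength sketch: fraction of distinct words removed by stemming."""
    unique_words = set(words)
    unique_stems = {stem(w) for w in unique_words}
    return (len(unique_words) - len(unique_stems)) / len(unique_words)


# Toy data standing in for the conceptual classes (assumption, not the paper's list).
classes = [["cat", "cats"], ["car", "cars", "caring"], ["new"], ["news"]]
under, over = stemming_error_rates(classes, toy_stemmer)
strength = index_compression([w for g in classes for w in g], toy_stemmer)
print(under, over, strength)
```

Under these assumptions, a lower under-stemming or over-stemming rate indicates a more correct stemmer, while a higher compression value indicates a stronger (more aggressive) one. The same functions could be applied to each stemmer in turn to compare them on a shared word list.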