MASHAEIR: Bootstrapping a Multi-Dialect Fine-Grained Emotion Thesaurus for Arabic Using Twitter

Document Type: Original Article

Author

Faculty of Alsun, Ain Shams University

Abstract

User-generated content on social media sites, e.g. Twitter and Facebook, provides a rich source of information about
people's emotions towards products, issues, people and major events. Accordingly, the focus of much research has shifted
from negative-positive sentiment classification to the recognition of more fine-grained emotions. However,
research on and resources for fine-grained emotion identification in Arabic texts are still lacking. To fill this gap, this
paper introduces MASHAEIR (an Arabic word that means ‘emotions’), a corpus-based multi-dialect fine-grained
emotion thesaurus for Arabic. MASHAEIR was bootstrapped using 'big data' collected from Arabic Twitter between January 2007 and
July 2015. The thesaurus is enriched with (i) different types of single- as well as multi-word terms expressing emotions,
(ii) Arabic dialectal variations in the expression of emotions and (iii) scores that reflect the intensity of the emotions
conveyed through these units. The paper also presents a simple evaluation of the thesaurus's coverage on a sample Twitter
corpus. MASHAEIR is intended to present an outline of a large-scale and easy-to-update emotion thesaurus for Arabic
that could also be enriched in the future with more information such as gender and age preferences in expressing
emotions.

Keywords