The Egyptian Journal of Language Engineering
Moftah, M., Fakhre, M., El-Ramly, S. (2018). Spoken Arabic Dialect Identification Using Motif Discovery. The Egyptian Journal of Language Engineering, 5(1), 25-36. doi: 10.21608/ejle.2018.59306

Spoken Arabic Dialect Identification Using Motif Discovery

Article 3, Volume 5, Issue 1, April 2018, Pages 25-36
Document Type: Original Article
DOI: 10.21608/ejle.2018.59306
Authors
Mohsen Moftah 1; Mohamed Fakhre 2; Salwa El-Ramly 1
1 Electronics and Communications Engineering Department, Faculty of Engineering, Ain Shams University
2 The Arab Academy for Science and Technology (Cairo, Egypt)
Abstract
Traditional Dialect Identification (DID) approaches, regardless of the level and type of features they use, either rely on predefined references such as phones, phonemes, or acoustic sounds that characterize a language or dialect, or involve some form of transcription of the input data. The transcription may be manual or automatic, using tools such as Automatic Speech Recognition (ASR) systems, tokenizers, or phone recognizers. In this paper, we introduce a new approach that analyzes the speech signal directly and extracts the features that characterize the dialect without any predefined references and without any transcription. The main idea is to treat the speech signal as a time series and apply motif discovery techniques to extract the dialect's repeated sequences (motifs) directly from the signal. For motif extraction, we adopted an extremely fast, parameter-free self-join motif discovery algorithm called Scalable Time series Ordered-search Matrix Profile (STOMP). We implemented the new approach in two stages. In the first, we built a baseline system in which we extracted 12 Mel Frequency Cepstral Coefficients (MFCC) from each motif. In the second, we built an improved system using 39 coefficients by adding 13 Delta coefficients, 13 Delta-Delta coefficients, and 1 Log Energy coefficient. In both systems, we used a Gaussian Mixture Model-Universal Background Model (GMM-UBM) as the classifier. We applied the approach to three motif lengths (500 ms, 1000 ms, and 1500 ms), using from 1 up to 2048 GMM components. We downloaded the data set from the Qatar Computing Research Institute domain. We carried out our experiments on four Arabic dialects: Egyptian (EGY), Gulf (GLF), Levantine (LEV), and North African (NOR). The baseline results were very competitive with those of traditional, more sophisticated approaches, while the improved system showed very good results. The improvement was significant enough that the new approach can be considered competitive, simple, and dialect-independent.
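To make the idea of self-join motif discovery concrete, the following is a minimal sketch, not the authors' implementation. It assumes a brute-force search over z-normalized windows in place of STOMP (which computes the same kind of matrix profile far more efficiently), and the function names `znorm`, `matrix_profile_naive`, and `top_motif`, as well as the exclusion-zone width of half the window, are illustrative choices rather than details taken from the paper. The window length `m` is given in samples, so a 500 ms motif would correspond to `m` equal to half the sampling rate.

```python
import numpy as np

def znorm(x):
    """Z-normalize a 1-D window; constant windows map to all zeros."""
    s = x.std()
    return (x - x.mean()) / s if s > 0 else np.zeros_like(x)

def matrix_profile_naive(ts, m):
    """Brute-force self-join matrix profile, O(n^2 * m).

    This is a stand-in for STOMP, assumed here only for illustration.
    Returns, for every window start i, the distance to its nearest
    non-trivial neighbour and that neighbour's start index.
    """
    ts = np.asarray(ts, dtype=float)
    n = len(ts) - m + 1
    windows = np.array([znorm(ts[i:i + m]) for i in range(n)])
    excl = max(1, m // 2)  # assumed exclusion zone against trivial matches
    profile = np.full(n, np.inf)
    index = np.full(n, -1)
    for i in range(n):
        # Euclidean distance from window i to every other window
        d = np.linalg.norm(windows - windows[i], axis=1)
        # Ignore overlapping windows close to i itself
        d[max(0, i - excl):min(n, i + excl + 1)] = np.inf
        j = int(np.argmin(d))
        profile[i], index[i] = d[j], j
    return profile, index

def top_motif(ts, m):
    """Return (start_a, start_b, distance) of the closest window pair."""
    profile, index = matrix_profile_naive(ts, m)
    i = int(np.argmin(profile))
    return i, int(index[i]), float(profile[i])
```

Calling `top_motif(signal, m)` on a one-dimensional signal returns the start positions of the two most similar non-overlapping windows. In a pipeline like the one described above, such segments would then be passed to feature extraction (MFCC) and to the GMM-UBM classifier.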
Keywords
motif discovery; dialect identification; language identification; GMM-UBM; time series