Abdelmaksoud, E., Hassen, A., Hassan, N., Hesham, M. (2021). CONVOLUTIONAL NEURAL NETWORK FOR ARABIC SPEECH RECOGNITION. The Egyptian Journal of Language Engineering, 8(1), 27-38. doi: 10.21608/ejle.2020.47685.1015
Engy Ragaei Abdelmaksoud; Arafa Hassen; Nabila Hassan; Mohamed Hesham. "CONVOLUTIONAL NEURAL NETWORK FOR ARABIC SPEECH RECOGNITION". The Egyptian Journal of Language Engineering, 8, 1, 2021, 27-38. doi: 10.21608/ejle.2020.47685.1015
CONVOLUTIONAL NEURAL NETWORK FOR ARABIC SPEECH RECOGNITION
1Basic Science, Faculty of Computers and Information, Fayoum University
2Physics Department, Faculty of Science, Fayoum University
3Basic Science Department, Faculty of Computers & Information, Fayoum University
4Engineering Math & Physics Department, Faculty of Engineering, Cairo University
Abstract
This work focuses on single-word Arabic automatic speech recognition (AASR). Two feature types are used in the feature extraction phase: log Mel-frequency spectral coefficients (MFSC) and Gammatone-frequency cepstral coefficients (GFCC), each with its first- and second-order derivatives. A convolutional neural network (CNN) performs feature learning and classification. CNNs have improved performance in automatic speech recognition (ASR); local connectivity, weight sharing, and pooling are the key properties behind that potential. We tested the CNN model on an Arabic speech corpus of isolated words. The corpus was synthetically augmented by applying transformations such as changing the pitch, speed, and dynamic range, adding noise, and shifting the signal forward and backward in time. The maximum accuracy, 99.77%, was obtained using GFCC with the CNN. The results are compared with previously reported results and indicate that the CNN achieves better performance in AASR.
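To make the augmentation step concrete, the following is a minimal sketch of three of the waveform transformations named in the abstract: additive noise, forward and backward time shift, and speed change. It is not the authors' implementation. The function names (`add_noise`, `time_shift`, `change_speed`), the 16 kHz sampling rate, the SNR, the shift sizes, and the speed factor are all assumptions chosen for illustration. Pitch and dynamic-range changes are omitted because they need more involved signal processing, and the naive resampling used here for speed also alters pitch.

```python
import numpy as np


def add_noise(signal, snr_db, rng):
    """Add white Gaussian noise at the requested signal-to-noise ratio (dB)."""
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def time_shift(signal, shift):
    """Shift by `shift` samples (positive = later in time), zero-filling the gap."""
    shifted = np.zeros_like(signal)
    if shift > 0:
        shifted[shift:] = signal[:-shift]
    elif shift < 0:
        shifted[:shift] = signal[-shift:]
    else:
        shifted[:] = signal
    return shifted


def change_speed(signal, rate):
    """Resample by linear interpolation; rate > 1 shortens the signal.

    This simple resampling changes pitch together with speed.
    """
    n_out = int(round(len(signal) / rate))
    old_idx = np.arange(len(signal))
    new_idx = np.linspace(0, len(signal) - 1, n_out)
    return np.interp(new_idx, old_idx, signal)


# Hypothetical input: one second of a 440 Hz tone standing in for a spoken word.
sample_rate = 16000  # assumed sampling rate
t = np.arange(sample_rate) / sample_rate
word = np.sin(2.0 * np.pi * 440.0 * t)

rng = np.random.default_rng(0)
augmented = [
    add_noise(word, snr_db=10.0, rng=rng),
    time_shift(word, 800),    # forward shift of 50 ms
    time_shift(word, -800),   # backward shift of 50 ms
    change_speed(word, 1.1),  # about 10% faster
]
```

Each augmented waveform would then pass through the same feature extraction as the original recording, so the classifier sees several variants of every word.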