The Egyptian Journal of Language Engineering
Ahmed, M., AlGhamdi, F., Hawwari, A. (2024). Constructing and Augmenting a Bidirectional Paraphrases Dataset from an English-Arabic Subtitling Parallel Corpus. The Egyptian Journal of Language Engineering, 11(2), 1-12. doi: 10.21608/ejle.2024.308019.1070

Constructing and Augmenting a Bidirectional Paraphrases Dataset from an English-Arabic Subtitling Parallel Corpus

Article 1, Volume 11, Issue 2, October 2024, Pages 1-12
Document Type: Original Article
DOI: 10.21608/ejle.2024.308019.1070
Authors
Mohamed Attia Ahmed 1; Fahad AlGhamdi 2; Abdelati Hawwari 3
1RDI; www.rdi-eg.ai
2Al-Baha University, Al-Baha - Saudi Arabia, fghamdi@bu.edu.sa
3Datalex4ai, Santa Clara – California - USA
Abstract
Paraphrasing is one of the major, and most challenging, tasks in the deep semantic analysis of natural languages. In this paper we present a novel algorithm that operates on a large parallel text corpus and automatically generates paraphrases in both languages of the corpus. Like several previous algorithms, ours exploits the bidirectional translation provided by large parallel corpora to infer pairs of synonymous phrases; however, our algorithm is simpler and more efficient. Moreover, it is the only one that constructs the whole paraphrase during its run, without any need for further post-processing. We implemented and ran our algorithm on the English-Arabic portion of the 2018 version of the OpenSubtitles (OPUS) parallel corpora, and a statistical evaluation of random samples showed the semantic quality among the phrases of the automatically generated paraphrases to be remarkably high.
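The general idea of inferring paraphrases from bidirectional translation can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details the abstract does not give; it is the generic "pivoting" idea, under the assumption that the corpus has already been reduced to aligned (English, Arabic) phrase pairs. The function name `pivot_paraphrases` and the toy data are hypothetical.

```python
from collections import defaultdict


def pivot_paraphrases(aligned_pairs):
    """Group phrases of each language that share a translation in the other.

    aligned_pairs: iterable of (english_phrase, arabic_phrase) tuples.
    Returns (english_sets, arabic_sets), each a list of sets holding
    two or more candidate paraphrases.
    """
    en_by_ar = defaultdict(set)  # Arabic phrase -> English phrases aligned to it
    ar_by_en = defaultdict(set)  # English phrase -> Arabic phrases aligned to it
    for en, ar in aligned_pairs:
        en_by_ar[ar].add(en)
        ar_by_en[en].add(ar)
    # Keep only groups with at least two members: a single phrase
    # has nothing to be a paraphrase of.
    english_sets = [s for s in en_by_ar.values() if len(s) > 1]
    arabic_sets = [s for s in ar_by_en.values() if len(s) > 1]
    return english_sets, arabic_sets


# Toy, hand-made pairs (not taken from OpenSubtitles).
pairs = [
    ("thank you", "شكرا"),
    ("thanks", "شكرا"),
    ("thanks", "شكرا لك"),
    ("good night", "تصبح على خير"),
]
english_sets, arabic_sets = pivot_paraphrases(pairs)
```

Here the two English phrases that share an Arabic translation end up in one candidate set, and the two Arabic phrases that share an English translation end up in another. A real system would also need phrase alignment, frequency thresholds, and noise filtering, none of which this sketch attempts.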
Keywords
bidirectional semantic augmentation; paraphrase; paraphrasing; phrase; semantic analysis