Community Question Answering Ranking: Methodology Survey

Document Type: Original Article

Authors

1 Faculty of Engineering, Cairo University

2 Electronics and Communication Department, Faculty of Engineering, Cairo University, Giza, Egypt

3 IBM

Abstract

This paper surveys the evolution of word embeddings alongside the methodologies used in Community Question Answering (cQA), and examines how these methodologies use word embeddings to achieve higher performance metrics. The paper first discusses vector modelling and its impact on Natural Language Processing (NLP) as a whole. It then details several representation approaches, including one-hot encoding and word2vec. Next, it discusses contextualized embeddings and how they improve on these earlier techniques. The paper then sheds light on language modelling and on newer attention-based architectures (Transformers), briefly explaining how they work and how they have affected not only cQA but NLP in general. Finally, the paper briefly discusses the field's shift from model-centric AI, where most of the focus is on producing a model with high performance metrics, to data-centric AI, where the focus is on establishing a systematic way of labelling data that eases the generation of a high-performance model.

Keywords