Title: A survey on word embedding techniques and semantic similarity for paraphrase identification

Authors: Divesh R. Kubal; Anant V. Nimkar

Addresses: Department of Computer Engineering, Sardar Patel Institute of Technology, Mumbai, 400058, India; Department of Computer Engineering, Sardar Patel Institute of Technology, Mumbai, 400058, India

Abstract: In natural language processing (NLP), paraphrase identification (PI) determines the relatedness between a pair of sentences that have little or negligible lexical overlap but still convey the same meaning. The major challenge in solving this problem is the large number of linguistic variations that can express the same meaning. This paper provides a detailed survey of traditional similarity measures, statistical machine translation metrics, machine learning and deep learning techniques, and a well-defined flow between them. It covers various word embedding methods and gives a step-wise derivation of their learning module. The survey also traces the evolution of deep learning in an unambiguous manner. A comparative analysis of various techniques for solving PI is presented, which provides research directions for work in similar domains.

Keywords: paraphrase identification; word embedding; deep learning; convolutional neural network; CNN; semantic similarity.

DOI: 10.1504/IJCSYSE.2019.10019686

International Journal of Computational Systems Engineering, 2019 Vol.5 No.1, pp.36-52

Received: 20 Dec 2017
Accepted: 09 Mar 2018

Published online: 22 Mar 2019
