Title: A document similarity approach using grammatical linkages with graph databases

Authors: V. Priya; K. Umamaheswari

Addresses: Department of Computer Science and Engineering, Dr. Mahalingam College of Engineering and Technology, Pollachi-642003, India; Department of Information Technology, PSG College of Technology, Peelamedu, Coimbatore 641004, India

Abstract: Document similarity has become essential in many applications such as document retrieval, recommendation systems and plagiarism checking. Many similarity evaluation approaches rely on word-based document representations because they are very fast. However, these approaches are not accurate when the documents differ in language and vocabulary. Graph-based document representations draw on relational knowledge, but they are not feasible in many applications because graph operations are expensive. In this work, a novel approach to document similarity computation that utilises verbal intent has been developed. It improves similarity estimation by using verbs to increase the number of linkages between two documents. Graph databases were used for faster performance. The performance of the system is evaluated using metrics such as cosine similarity, Jaccard similarity and the Dice coefficient on different review datasets. The verbal intent-based approach has registered promising results based on the links between two documents.
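
As a rough illustration of the three metrics named in the abstract, the Python sketch below computes Jaccard, Dice and (binary) cosine similarity over the sets of verbs found in two documents. It is not the authors' implementation: the verb sets are assumed to have been extracted already, the function names and sample data are hypothetical, and the grammatical-linkage extraction and graph database storage described in the paper are not reproduced.

```python
from math import sqrt


def jaccard(a, b):
    """Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def dice(a, b):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    a, b = set(a), set(b)
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def cosine(a, b):
    """Cosine similarity for binary (set) vectors: |A ∩ B| / sqrt(|A| * |B|)."""
    a, b = set(a), set(b)
    denom = sqrt(len(a) * len(b))
    return len(a & b) / denom if denom else 0.0


# Hypothetical verb sets for two reviews (assumed to be pre-extracted).
verbs_doc1 = {"buy", "like", "return"}
verbs_doc2 = {"buy", "like", "recommend", "use"}

for name, metric in (("jaccard", jaccard), ("dice", dice), ("cosine", cosine)):
    print(f"{name}: {metric(verbs_doc1, verbs_doc2):.3f}")
```

Treating each shared verb as a link between the two documents, all three scores increase as the number of shared verbs grows relative to the sizes of the verb sets, which matches the abstract's idea of improving similarity by increasing verb linkages.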

Keywords: graph databases; text similarity; grammatical linkages; verbal intent modelling; knowledge graphs.

DOI: 10.1504/IJENM.2019.103143

International Journal of Enterprise Network Management, 2019 Vol.10 No.3/4, pp.211 - 223

Received: 23 May 2018
Accepted: 17 Oct 2018

Published online: 21 Oct 2019
