Title: An efficient context-aware agglomerative fuzzy clustering framework for plagiarism detection

Authors: Anirban Chakrabarty; Sudipta Roy

Addresses: Computer Science and Engineering Department, Assam University, Silchar, India; Computer Science and Engineering Department, Assam University, Silchar, India

Abstract: Plagiarism refers to the act of copying content without acknowledging the original source. Although several commercial tools for plagiarism detection exist, detecting plagiarism remains challenging because of the growing volume of online publications. Existing plagiarism detection methods use paraphrase, sentence and keyword matching, but such techniques have not been very effective. In this work, a framework for fuzzy-based plagiarism detection is proposed, using a context-aware agglomerative clustering approach with improved time complexity. The work aims to retrieve key concepts at the word, sentence and paragraph levels by integrating semantic features in a novel optimisation function to detect plagiarism effectively. Fuzzy clustering is applied to improve the robustness and consistency of results when clustering multi-disciplinary papers. The experimental analysis is supported by a comparison with other contemporary techniques, which indicates the superiority of the proposed approach for plagiarism detection.

Keywords: fuzzy clustering; context similarity; plagiarism detection; spanning tree; agglomerative clustering; validity index; constrained objective function.

DOI: 10.1504/IJDMMM.2018.092533

International Journal of Data Mining, Modelling and Management, 2018 Vol.10 No.2, pp.188 - 208

Received: 29 Nov 2016
Accepted: 07 Aug 2017

Published online: 24 Jun 2018
