Authors: Anirban Chakrabarty; Sudipta Roy
Addresses: Computer Science and Engineering Department, Assam University, Silchar, India; Computer Science and Engineering Department, Assam University, Silchar, India
Abstract: Plagiarism refers to the act of copying content without acknowledging the original source. Although several commercial plagiarism detection tools exist, detecting plagiarism remains tricky and challenging because of the growing volume of online publications. Existing plagiarism detection methods rely on paraphrase detection and on sentence and keyword matching, but such techniques have not been very effective. In this work, a framework for fuzzy-based plagiarism detection is proposed, using a context-aware agglomerative clustering approach with improved time complexity. The work aims at retrieving key concepts at the word, sentence and paragraph levels by integrating semantic features into a novel optimisation function, so that plagiarism can be detected effectively. Fuzzy clustering is applied to improve the robustness and consistency of results when clustering multi-disciplinary papers. The experimental analysis includes a comparison with other contemporary techniques, which indicates the superiority of the proposed approach for plagiarism detection.
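The abstract motivates fuzzy clustering for multi-disciplinary papers: a paper that spans several topics can belong partially to more than one cluster instead of being forced into exactly one. The sketch below illustrates that idea with textbook fuzzy c-means (Bezdek) in pure Python. It is not the authors' context-aware agglomerative method or their optimisation function. It assumes documents have already been mapped to numeric feature vectors, and the function name `fcm` and its parameters are hypothetical.

```python
import math
import random


def fcm(points, c, m=2.0, iters=200, seed=0):
    """Textbook fuzzy c-means (hypothetical helper, not the paper's method).

    points: list of equal-length numeric feature vectors (assumed input)
    c:      number of clusters
    m:      fuzzifier (> 1); larger values give softer memberships
    Returns (centres, u), where u[i][j] is point i's membership in cluster j.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial memberships, each row normalised to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([x / total for x in row])

    exp = 2.0 / (m - 1.0)
    centres = []
    for _ in range(iters):
        # Centre update: mean of the points weighted by u^m.
        centres = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(n)]
            tw = sum(w)
            centres.append([sum(w[i] * points[i][d] for i in range(n)) / tw
                            for d in range(dim)])

        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        for i in range(n):
            dists = [math.dist(points[i], centres[j]) for j in range(c)]
            if any(d == 0.0 for d in dists):
                # A point sitting exactly on a centre belongs to it fully.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(hits)
                u[i] = [h / total for h in hits]
                continue
            u[i] = [1.0 / sum((dists[j] / dists[k]) ** exp for k in range(c))
                    for j in range(c)]
    return centres, u
```

For example, `fcm([(0, 0), (0, 1), (10, 10), (10, 11), (5, 5.5)], c=2)` assigns the first two and the next two points mostly to different clusters, while the point between the groups receives a membership close to 0.5 in each, which is the kind of partial assignment a multi-disciplinary paper would get.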
Keywords: fuzzy clustering; context similarity; plagiarism detection; spanning tree; agglomerative clustering; validity index; constrained objective function.
International Journal of Data Mining, Modelling and Management, 2018 Vol.10 No.2, pp.188 - 208
Received: 29 Nov 2016
Accepted: 07 Aug 2017
Published online: 07 Jun 2018