Title: Text plagiarism detection method based on path patterns

Authors: Chun Kit See, Kuok-Shoong Wong, Wei Lee Woon

Addresses: Department of Information Technology, Malaysia University of Science and Technology, Unit GL33, Block C, Dataran Usahawan Kelana, 17 Jalan SS7/26, 47301 Petaling Jaya, Malaysia (all three authors)

Abstract: This paper extends the forward method of plagiarism detection to find the percentage of similarity between documents. We have developed an algorithm that quantifies similarity based on path patterns. The method is simple, as it involves only ordinary mathematics, which simplifies application programming and speeds up processing. It converts words into steps that trace a walk on a mesh, using a newly proposed hash function. The hash function guarantees that the number of steps for each distinct word is unique, and thus that the walk pattern on the mesh is unique. Hence, a plagiarised version of a document will display a unique pattern on the mesh that is similar to that of the original document. This extended paper presents the algorithm in detail and compares its results with those of an available online plagiarism detection tool.
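
The abstract does not specify the hash function, the mesh rules, or the similarity formula, so the following is only an illustrative sketch of the general idea rather than the authors' algorithm. It assumes a hypothetical injective hash (`word_to_steps`, which reads a lowercase a-z word as a bijective base-26 number), a hypothetical walk rule (each successive word turns 90 degrees on a square mesh), and a hypothetical score (the percentage of the suspect document's path segments that also occur in the original's path). All function names are invented for this example.

```python
import re

# Assumption: the walk turns right, up, left, down in a fixed cycle.
DIRECTIONS = [(1, 0), (0, 1), (-1, 0), (0, -1)]


def word_to_steps(word: str) -> int:
    """Hypothetical injective hash: bijective base-26 value of an a-z word."""
    steps = 0
    for ch in word:
        steps = steps * 26 + (ord(ch) - ord("a") + 1)
    return steps


def walk(text: str) -> list:
    """Turn a text into a list of mesh segments ((x0, y0), (x1, y1))."""
    words = re.findall(r"[a-z]+", text.lower())
    x, y = 0, 0
    segments = []
    for i, word in enumerate(words):
        dx, dy = DIRECTIONS[i % 4]
        n = word_to_steps(word)
        nx, ny = x + dx * n, y + dy * n
        segments.append(((x, y), (nx, ny)))
        x, y = nx, ny
    return segments


def similarity(original: str, suspect: str) -> float:
    """Percentage of the suspect's segments that also lie on the original's path."""
    original_segments = set(walk(original))
    suspect_segments = walk(suspect)
    if not suspect_segments:
        return 0.0
    shared = sum(1 for seg in suspect_segments if seg in original_segments)
    return 100.0 * shared / len(suspect_segments)


if __name__ == "__main__":
    print(similarity("the cat sat", "the cat sat"))
    print(similarity("the cat sat", "the dog sat"))
```

Because this sketch compares absolute mesh positions, a single changed word shifts every later segment; the published method may well score paths differently.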

Keywords: hash function; documents matching; simple mathematics; text plagiarism; plagiarism detection; path patterns; similarity.

DOI: 10.1504/IJBIDM.2008.020515

International Journal of Business Intelligence and Data Mining, 2008 Vol.3 No.2, pp.136 - 146

Published online: 28 Sep 2008
