Title: PDE4Java: Plagiarism Detection Engine for Java source code: a clustering approach

Authors: Ameera Jadalla, Ashraf Elnagar

Address (both authors): Department of Computer Science, College of Arts and Science, University of Sharjah, 27272 Sharjah, UAE

Abstract: The educational community worldwide faces a growing problem of plagiarism. The proposed Plagiarism Detection Engine for Java (PDE4Java) detects source-code plagiarism by applying data mining techniques. The engine consists of three main phases: Java tokenisation, similarity measurement and clustering. An optional default tokeniser makes it flexible enough to be used with almost any programming language. In addition to a textual representation, the system provides a visual representation of each cluster. In simulations, PDE4Java performed comparably to JPlag and exceeded expectations when compared with the findings of domain experts.
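The three phases named in the abstract (tokenisation, similarity measurement, clustering) can be illustrated with a minimal Python sketch. The abstract does not specify PDE4Java's actual tokeniser, similarity measure or clustering algorithm, so every concrete choice below is an assumption: the normalisation scheme (identifiers become `ID`, literals become `NUM` or `STR`), Jaccard similarity over token trigrams (a deliberately simple stand-in; JPlag, for comparison, uses Greedy String Tiling), single-link grouping via union-find, and the 0.5 threshold are all hypothetical. The names `tokenise`, `similarity` and `cluster` are illustrative only.

```python
import re
from itertools import combinations

# Hypothetical normalisation: a small subset of Java keywords is kept verbatim
# so program structure survives identifier renaming. Not PDE4Java's own list.
JAVA_KEYWORDS = {
    "class", "public", "private", "protected", "static", "void", "int",
    "double", "boolean", "return", "if", "else", "for", "while", "new",
}

# Alternation order matters: comments and literals are tried before identifiers.
TOKEN_RE = re.compile(
    r"//[^\n]*|/\*.*?\*/"           # line and block comments (discarded)
    r'|"(?:\\.|[^"\\])*"'           # string literals
    r"|\d+(?:\.\d+)?"               # numeric literals
    r"|[A-Za-z_$][A-Za-z0-9_$]*"    # identifiers and keywords
    r"|[{}()\[\];,.=<>+\-*/%!&|]",  # single-character operators and separators
    re.DOTALL,
)


def tokenise(source):
    """Phase 1 (assumed scheme): map Java source to a normalised token stream."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        lexeme = match.group()
        if lexeme.startswith("//") or lexeme.startswith("/*"):
            continue  # comments carry no structural information
        if lexeme.startswith('"'):
            tokens.append("STR")
        elif lexeme[0].isdigit():
            tokens.append("NUM")
        elif lexeme in JAVA_KEYWORDS:
            tokens.append(lexeme)
        elif lexeme[0].isalpha() or lexeme[0] in "_$":
            tokens.append("ID")  # every renamed variable collapses to ID
        else:
            tokens.append(lexeme)
    return tokens


def kgrams(tokens, k=3):
    """Set of contiguous k-token windows."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(tokens_a, tokens_b, k=3):
    """Phase 2 (assumed measure): Jaccard similarity of k-gram sets, in [0, 1]."""
    a, b = kgrams(tokens_a, k), kgrams(tokens_b, k)
    if not a and not b:  # both streams shorter than k tokens
        return 1.0 if tokens_a == tokens_b else 0.0
    return len(a & b) / len(a | b)


def cluster(submissions, threshold=0.5, k=3):
    """Phase 3 (assumed method): single-link groups of pairs >= threshold."""
    names = sorted(submissions)
    streams = {n: tokenise(submissions[n]) for n in names}
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(names, 2):
        if similarity(streams[a], streams[b], k) >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    # Singletons are not suspicious; report only groups of two or more.
    return sorted(g for g in groups.values() if len(g) > 1)


# Usage: bob.java is alice.java with renamed identifiers and an added comment.
submissions = {
    "alice.java": "int sum(int a, int b) { return a + b; }",
    "bob.java": "int total(int x, int y) { return x + y; } // renamed",
    "carol.java": "void hello() { System.out.println(\"hi\"); }",
}
print(cluster(submissions))  # → [['alice.java', 'bob.java']]
```

Collapsing identifiers to a single `ID` token is what lets the renamed copy match the original exactly; the trade-off is that short, structurally common fragments can also collide, which is one reason a similarity threshold is needed before grouping.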

Keywords: plagiarism detection; Java source code; clustering; code plagiarism; data mining; similarity measurement; Java tokenisation.

DOI: 10.1504/IJBIDM.2008.020514

International Journal of Business Intelligence and Data Mining, 2008 Vol.3 No.2, pp.121 - 135

Published online: 28 Sep 2008
