Authors: Ameera Jadalla, Ashraf Elnagar
Addresses: Department of Computer Science, College of Arts and Science, University of Sharjah, 27272 Sharjah, UAE; Department of Computer Science, College of Arts and Science, University of Sharjah, 27272 Sharjah, UAE
Abstract: The educational community across the world faces a growing problem of plagiarism. The proposed Plagiarism Detection Engine for Java (PDE4Java) detects code plagiarism by applying data mining techniques. The engine consists of three main phases: Java tokenisation, similarity measurement and clustering. An optional default tokeniser makes it flexible enough to be used with almost any programming language. In addition to a textual representation, the system provides a visual representation of each cluster. In simulations, PDE4Java performed comparably to JPlag and exceeded expectations when compared with the domain experts' findings.
Keywords: plagiarism detection; Java source code; clustering; code plagiarism; data mining; similarity measurement; Java tokenisation.
International Journal of Business Intelligence and Data Mining, 2008 Vol.3 No.2, pp.121 - 135
Published online: 28 Sep 2008
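
The three phases named in the abstract (tokenisation, similarity measurement, clustering) can be illustrated with a minimal sketch. This is not PDE4Java's implementation: the abstract does not say which token set, similarity measure or clustering algorithm the engine uses. The sketch assumes a small regex tokeniser that collapses identifiers and literals to placeholder tokens, Jaccard similarity over token trigrams, and threshold-based single-link grouping. All function names, the keyword subset and the 0.8 threshold are hypothetical choices made for illustration.

```python
import re

# Assumption: a small illustrative subset of Java keywords, not the full list.
JAVA_KEYWORDS = {"class", "public", "private", "static", "void", "int",
                 "for", "while", "if", "else", "return", "new"}

TOKEN_RE = re.compile(r'''
    //[^\n]*            |  # line comment
    /\*.*?\*/           |  # block comment
    "(?:\\.|[^"\\])*"   |  # string literal
    \d+(?:\.\d+)?       |  # numeric literal
    [A-Za-z_]\w*        |  # identifier or keyword
    \S                     # any other single non-space character
''', re.VERBOSE | re.DOTALL)


def tokenise(source):
    """Phase 1: map Java source to tokens, hiding names and literal values."""
    tokens = []
    for match in TOKEN_RE.finditer(source):
        text = match.group()
        if text.startswith("//") or text.startswith("/*"):
            continue  # comments carry no structure, so drop them
        if text[0] == '"':
            tokens.append("STR")
        elif text[0].isdigit():
            tokens.append("NUM")
        elif text[0].isalpha() or text[0] == "_":
            tokens.append(text if text in JAVA_KEYWORDS else "ID")
        else:
            tokens.append(text)
    return tokens


def ngrams(tokens, n=3):
    """Set of all contiguous token n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def similarity(tokens_a, tokens_b, n=3):
    """Phase 2 (assumed measure): Jaccard similarity over token n-grams."""
    grams_a, grams_b = ngrams(tokens_a, n), ngrams(tokens_b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def cluster(names, sims, threshold=0.8):
    """Phase 3 (assumed method): merge any pair at or above the threshold."""
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), score in sims.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())
```

To use the sketch, tokenise each submission, compute `similarity` for every pair of submissions, store the scores in a dict keyed by `(name_a, name_b)`, and pass that dict to `cluster`. Because identifiers collapse to `ID`, renaming variables alone does not change a submission's token sequence.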