Title: EXCLS: enhanced XML clustering by level structure accuracy

Authors: Rehab Desoki; Ahmed Elfatatry

Addresses: Information Technology Department, Institute of Graduate Studies and Research, Alexandria University, Egypt; Information Technology Department, Institute of Graduate Studies and Research, Alexandria University, Egypt

Abstract: The increasing popularity of XML on the internet has raised a number of research problems concerning data management, indexing, and retrieval in large repositories. XML clustering is used to reduce the size of large collections of XML documents in a repository in order to facilitate retrieval operations. Most clustering approaches focus on improving performance by using a structure summary, but at the cost of accuracy. A major drawback of summarisation techniques is the loss of the XML documents' characteristics. The main objective of this work is to improve the accuracy of XML document clustering, specifically in the case of homogeneous datasets, while preserving performance. To this end, we propose a new XML document structure and present an enhanced matching procedure to calculate the similarity between XML documents. The proposed method is implemented and evaluated using homogeneous and heterogeneous datasets. The experimental results show a significant improvement in clustering accuracy, especially for homogeneous XML documents, without a significant impact on processing time.

Keywords: Extensible Markup Language; XML clustering; level structure accuracy; XCLS; XEdge; document type definition; DTD; data mining; large repositories; XML document structure; similarity matching; XML documents; processing time.

DOI: 10.1504/IJWET.2014.067539

International Journal of Web Engineering and Technology, 2014 Vol.9 No.4, pp.303 - 329

Published online: 28 Feb 2015
