Article: Efficient similarity measure for comparing tree structures Journal: International Journal of Advanced Intelligence Paradigms (IJAIP) 2016 Vol.8 No.1 pp.77 - 97 Abstract: The problem of comparing tree structures is known to be a task often characterised by a particularly high computational complexity. Any attempt to reduce this complexity by considering a tree as a linear structure has generally resulted in a loss of information. Indeed, a comparison of tree structures (based on their similarity in order to classify them) which considers a tree as a single vector, obviously takes less execution time, but unfortunately has less credibility with respect to the classification task. The hierarchical relationships are thus ignored or suppressed and tree structures then behave like sequential data structures. The goal in this paper is to find a compromise between the processing time, on the one hand, and the preservation of information, on the other hand. For this purpose, the proposed approach relies on two types of traversal algorithm, namely the depth-first traversal and breadth-first traversal. The strategy aims to exploit the advantages of combining the two types of algorithms. To validate our approach, we conducted experiments on two sets of tree structures obtained from two collections of real and synthetic XML documents, respectively. Inderscience Publishers - linking academia, business and industry through research

Title: Efficient similarity measure for comparing tree structures

Authors: Fatiha Souam; Ali Aït El Hadj

Addresses: Computer Science Department, Faculty of Electrical Engineering and Computer Science, Mouloud Mammeri University of Tizi-Ouzou, Algeria ' Computer Science Department, Faculty of Electrical Engineering and Computer Science, Mouloud Mammeri University of Tizi-Ouzou, Algeria

Abstract: The problem of comparing tree structures is known to be a task often characterised by a particularly high computational complexity. Any attempt to reduce this complexity by considering a tree as a linear structure has generally resulted in a loss of information. Indeed, a comparison of tree structures (based on their similarity in order to classify them) which considers a tree as a single vector, obviously takes less execution time, but unfortunately has less credibility with respect to the classification task. The hierarchical relationships are thus ignored or suppressed and tree structures then behave like sequential data structures. The goal in this paper is to find a compromise between the processing time, on the one hand, and the preservation of information, on the other hand. For this purpose, the proposed approach relies on two types of traversal algorithm, namely the depth-first traversal and breadth-first traversal. The strategy aims to exploit the advantages of combining the two types of algorithms. To validate our approach, we conducted experiments on two sets of tree structures obtained from two collections of real and synthetic XML documents, respectively.

Keywords: clustering; depth-first traversal; breadth-first traversal; distance threshold; hierarchical context; structural similarity; time complexity; tree structures; XML documents; similarity measures; processing time; information preservation.

DOI: 10.1504/IJAIP.2016.074779

International Journal of Advanced Intelligence Paradigms, 2016 Vol.8 No.1, pp.77 - 97

Received: 10 Feb 2015
Accepted: 27 May 2015
Published online: 17 Feb 2016 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article

Title: Efficient similarity measure for comparing tree structures

Keep up-to-date