
Title: OLAP textual aggregation approach using the Google similarity distance

Authors: Mustapha Bouakkaz; Sabine Loudcher; Youcef Ouinten

Addresses: LIM Laboratory, Laghouat University, Algeria; ERIC Laboratory, Lyon 2 University, France; LIM Laboratory, Laghouat University, Algeria

Abstract: Data warehousing and online analytical processing (OLAP) are essential elements of decision support. In the case of textual data, decision support requires new tools, mainly textual aggregation functions, for better and faster high-level analysis and decision making. Such tools provide textual measures to users who wish to analyse documents online. In this paper, we propose a new aggregation function for textual data in an OLAP context based on the K-means method. This approach yields aggregates that are semantically richer than those provided by classical OLAP operators. The distance used in K-means is replaced by the Google similarity distance, which takes the semantic similarity of keywords into account when aggregating them. The performance of our approach is analysed and compared with that of other methods such as Topkeywords, TOPIC, TuBE and BienCube. The experimental study shows that our approach achieves better performance in terms of recall, precision, F-measure, complexity and runtime.
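The abstract does not spell out the distance or the clustering loop, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the commonly stated form of the normalized Google distance, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))), and, as a further assumption, estimates the counts from a local document collection rather than from search-engine hits. Because keywords have no arithmetic mean, the sketch swaps in a medoid-based variant of K-means. All function and parameter names (`ngd`, `make_distance`, `k_medoids`, `seeds`) are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def ngd(fx, fy, fxy, n):
    """Normalized Google distance from occurrence counts.

    fx, fy: number of documents containing each keyword;
    fxy: number containing both; n: total number of documents.
    """
    if fx == 0 or fy == 0 or fxy == 0:
        return float("inf")  # the keywords never co-occur
    lx, ly = math.log(fx), math.log(fy)
    denom = math.log(n) - min(lx, ly)
    if denom == 0:
        return 0.0  # both keywords occur in every document
    return (max(lx, ly) - math.log(fxy)) / denom


def make_distance(docs):
    """Build a keyword distance from a list of keyword sets (assumed input)."""
    single, pair = Counter(), Counter()
    for keywords in docs:
        kws = sorted(set(keywords))
        single.update(kws)
        pair.update(combinations(kws, 2))  # pairs are stored in sorted order

    def dist(a, b):
        if a == b:
            return 0.0
        return ngd(single[a], single[b], pair[tuple(sorted((a, b)))], len(docs))

    return dist


def k_medoids(items, seeds, dist, max_iter=20):
    """K-means-style loop in which each centre is a keyword (a medoid)."""
    medoids = list(seeds)
    for _ in range(max_iter):
        # Assignment step: attach every keyword to its nearest medoid.
        clusters = {m: [] for m in medoids}
        for item in items:
            clusters[min(medoids, key=lambda m: dist(item, m))].append(item)
        # Update step: the new medoid minimises the total distance
        # to the other members of its cluster.
        new = [
            min(c, key=lambda cand: sum(dist(cand, o) for o in c)) if c else m
            for m, c in clusters.items()
        ]
        if new == medoids:
            break
        medoids = new
    return clusters
```

In this sketch each resulting cluster would play the role of one aggregate, with its medoid as a candidate representative keyword. The seeds are passed in explicitly because the outcome of such a loop depends on initialisation, and pairs that never co-occur are given an infinite distance.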

Keywords: online analytical processing; OLAP; textual aggregation; Google similarity distance; K-means clustering; semantic similarity; keywords.

DOI: 10.1504/IJBIDM.2016.076425

International Journal of Business Intelligence and Data Mining, 2016 Vol.11 No.1, pp.31-48

Received: 23 Sep 2015
Accepted: 26 Sep 2015
Published online: 06 May 2016
