
Title: Three level weight for latent semantic analysis: an efficient approach to find enhanced semantic themes

Authors: Pooja Kherwa; Poonam Bansal

Addresses: Maharaja Surajmal Institute of Technology, New Delhi, 110058, India; affiliated to GGSIPU, India. Indira Gandhi Delhi Technical University for Women, Opp. St., Kashmere Gate, New Delhi, Delhi 110006, India

Abstract: Latent semantic analysis (LSA) is a prominent technique for semantic theme detection and topic modelling. In this paper, we design a three-level weighting scheme for latent semantic analysis that creates an optimised semantic space for a large collection of documents. Using this novel approach, an efficient latent semantic space is created in which terms that appear far apart in the original document collection are drawn closer together. The authors use two datasets: the first is a synthetic dataset consisting of small stories collected by the authors; the second is the benchmark BBC News dataset widely used in text-mining applications. The proposed three-level weight models assign weights at the term level, the document level, and the corpus level. These weight models are known as: 1) NPC; 2) NTC; 3) APC; 4) ATC. The weight models are tested on both datasets and compared with state-of-the-art term frequency, showing significantly improved performance in term-set correlation and document-set correlation, as well as the highest correlation in semantic similarity of terms in the semantic space generated through these three-level weights. Our approach also demonstrates automatic context clustering in the datasets produced through the three-level weights.
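The abstract describes weighting a term-document matrix at three levels and then building a latent semantic space. The exact NPC/NTC/APC/ATC formulas are not given in the abstract, so the minimal sketch below uses the classic log-entropy weighting (a local term-level weight combined with a corpus-level global weight) as a stand-in, followed by a truncated SVD to obtain term coordinates in the semantic space, as LSA prescribes:

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
# NOTE: the paper's NPC/NTC/APC/ATC weights are not specified in the
# abstract; log-entropy weighting below is an illustrative assumption.
A = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 3.0, 1.0],
    [1.0, 0.0, 2.0],
])

n_docs = A.shape[1]
local = np.log1p(A)                          # term-level (local) weight
p = A / np.maximum(A.sum(axis=1, keepdims=True), 1e-12)
with np.errstate(divide="ignore", invalid="ignore"):
    ent = np.where(p > 0, p * np.log(p), 0.0).sum(axis=1)
global_w = 1.0 + ent / np.log(n_docs)        # corpus-level (global) weight
W = local * global_w[:, None]                # weighted matrix

# Rank-k singular value decomposition yields the latent semantic space.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 2
term_space = U[:, :k] * s[:k]                # term coordinates in k dimensions

def cos(u, v):
    """Cosine similarity between two term vectors in the reduced space."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cos(term_space[0], term_space[3]))
```

In the reduced space, terms that co-occur across similar documents obtain similar coordinates even if they never co-occur directly, which is the mechanism behind the "terms drawn closer together" effect the abstract describes.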

Keywords: singular value decomposition; SVD; latent semantic analysis; LSA; context clustering; semantic space.

DOI: 10.1504/IJKL.2023.127328

International Journal of Knowledge and Learning, 2023 Vol.16 No.1, pp.56 - 72

Received: 18 May 2021
Accepted: 04 Apr 2022

Published online: 30 Nov 2022
