Title: Statistical data mining technique for salient feature extraction

Authors: Yeturu Jahnavi

Addresses: Department of Computer Science and Engineering, Geethanjali Institute of Science and Technology, Nellore, Andhra Pradesh, India

Abstract: Internet-based news documents are an important channel for transmitting information, and large numbers of them are available online from various newswire sources. The objective of this work is to study existing term weighting algorithms for feature extraction and to develop an efficient term weighting algorithm for mining salient features from internet-based newswire sources. TF*PDF is an influential algorithm that captures a basic property of features in news documents, namely frequency, and thereby achieves higher accuracy than other term weighting algorithms. However, frequency alone is not sufficient for salient topic extraction. To overcome this limitation, the paper presents an innovative and effective term weighting algorithm that considers position, scattering and topicality along with frequency when extracting salient events. Experimental evaluation shows that the proposed algorithm performs better than existing term weighting algorithms in terms of coverage rate.
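For readers unfamiliar with the TF*PDF baseline named in the abstract, the following is a minimal sketch of the formulation commonly cited in the literature; the abstract itself does not spell it out, so treat this as an assumption. In that formulation, a term's weight is summed over news channels, with each channel contributing the term's normalized frequency multiplied by the exponential of the proportion of that channel's documents containing the term. The function name `tf_pdf` and the `channels` input layout are hypothetical, chosen only for illustration, and the sketch does not cover the position, scattering or topicality components of the proposed algorithm.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Illustrative TF*PDF weighting (assumed formulation, not taken from this paper).

    channels: dict mapping a channel name to a list of documents,
              where each document is a list of already-tokenized terms.
    Returns a dict mapping each term to its summed weight across channels.
    """
    weights = Counter()
    for docs in channels.values():
        if not docs:
            continue
        term_freq = Counter()  # occurrences of each term in this channel
        doc_freq = Counter()   # number of documents in this channel containing the term
        for doc in docs:
            term_freq.update(doc)
            doc_freq.update(set(doc))
        # Normalize frequencies so channels of different sizes are comparable.
        norm = math.sqrt(sum(f * f for f in term_freq.values()))
        if norm == 0:
            continue
        n_docs = len(docs)
        for term, freq in term_freq.items():
            # Normalized frequency times exp(proportion of documents with the term).
            weights[term] += (freq / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


# Toy example with two hypothetical channels.
sample = {
    "channel_a": [["flood", "rescue"], ["flood"]],
    "channel_b": [["flood", "election"]],
}
scores = tf_pdf(sample)
```

In this toy input, "flood" appears in every document of both channels, so it receives the largest weight, which reflects the frequency-only behaviour that the abstract describes as insufficient on its own.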

Keywords: term weighting; TF*PDF; FPST.

DOI: 10.1504/IJISTA.2019.100797

International Journal of Intelligent Systems Technologies and Applications, 2019 Vol.18 No.4, pp.353 - 376

Received: 03 Nov 2016
Accepted: 04 Dec 2017

Published online: 18 Jul 2019
