Authors: Yeturu Jahnavi
Addresses: Department of Computer Science and Engineering, Geethanjali Institute of Science and Technology, Nellore, Andhra Pradesh, India
Abstract: Internet-based news documents are an important medium for disseminating information. Large numbers of news documents from various newswire sources are available on the internet. The objective of this work is to study existing term weighting algorithms for feature extraction and to develop an efficient term weighting algorithm for mining salient features from internet-based newswire sources. TF*PDF is an influential algorithm that captures the basic property of features in news documents, namely frequency, and thus achieves higher accuracy than other term weighting algorithms. However, frequency alone is not sufficient for salient topic extraction. To overcome this problem, this paper presents a new and effective term weighting algorithm that considers position, scattering and topicality along with frequency for extracting salient events. Experimental evaluation shows that the proposed term weighting algorithm outperforms the existing term weighting algorithms in terms of coverage rate.
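To make the frequency-based baseline concrete, the following is a minimal Python sketch of TF*PDF as it is commonly stated in the literature: for each term j, W_j = Σ_c (F_jc / sqrt(Σ_k F_kc²)) · exp(n_jc / N_c), where F_jc is the frequency of term j in channel (news source) c, n_jc is the number of documents in c that contain j, and N_c is the total number of documents in c. The function name `tf_pdf`, the channel names and the toy documents are hypothetical illustrations, and tokenization is assumed to happen upstream. This sketch covers only the frequency-based baseline; it is not the proposed FPST algorithm, whose position, scattering and topicality components are not given here.

```python
import math
from collections import Counter


def tf_pdf(channels):
    """Compute TF*PDF weights (sketch of the commonly cited formulation).

    channels: dict mapping a channel (news source) name to a list of
    documents, where each document is a list of pre-tokenized terms.
    Returns a dict mapping each term to its TF*PDF weight.
    """
    weights = Counter()
    for docs in channels.values():
        n_docs = len(docs)  # N_c: number of documents in this channel
        if n_docs == 0:
            continue
        # F_jc: total frequency of each term across the channel's documents
        freq = Counter(tok for doc in docs for tok in doc)
        # n_jc: number of documents in the channel that contain each term
        doc_freq = Counter(tok for doc in docs for tok in set(doc))
        # Euclidean norm of the channel's term-frequency vector
        norm = math.sqrt(sum(f * f for f in freq.values()))
        for term, f in freq.items():
            # normalized frequency times exp of the proportional document frequency
            weights[term] += (f / norm) * math.exp(doc_freq[term] / n_docs)
    return dict(weights)


# Hypothetical toy corpus: two news channels, two short documents each.
sample = {
    "wire_a": [["quake", "city", "rescue"], ["quake", "aid"]],
    "wire_b": [["quake", "relief"], ["election", "vote"]],
}
weights = tf_pdf(sample)
for term, w in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{term}: {w:.3f}")
```

Because the exponential term grows with the share of a channel's documents mentioning a term, and the weights are summed over channels, terms reported frequently across many sources (here "quake") rise to the top, which is the frequency property the abstract attributes to TF*PDF.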
Keywords: term weighting; TF*PDF; FPST.
International Journal of Intelligent Systems Technologies and Applications, 2019 Vol.18 No.4, pp.353 - 376
Received: 03 Nov 2016
Accepted: 04 Dec 2017
Published online: 18 Jul 2019