Title: An improved kNN text classification method

Authors: Fengfei Wang; Zhen Liu; Chundong Wang

Addresses: Graduate School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin, China ' Graduate School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin, China; Graduate School of Engineering, Nagasaki Institute of Applied Science, 536 Aba Machi, Nagasaki, 851-0193, Japan ' Graduate School of Computer and Communication Engineering, Tianjin University of Technology, Tianjin, China

Abstract: This paper proposes an improved kNN text classification method. The kNN algorithm in vector space models (VSM) has several limitations: it occupies excessive storage space, and all dimensions share the same weight, which makes classification inaccurate. To solve these problems, this paper proposes a SOM neural network with principal component weighting. In this model, the principal component analysis process is embedded into the SOM neural network. Specifically, principal component analysis is used to extract the main feature components of the assessed target, which are then input into the network for computation. Meanwhile, the variance contribution rates of the principal components are introduced into the Euclidean distance function as weights. Using the principal component weighting SOM algorithm to compute the weights of the VSM dimensions, together with the kNN algorithm, can effectively reduce the dimensions of the vector space and increase the precision and speed of the kNN text classification method.
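
The weighting idea in the abstract can be illustrated with a minimal sketch, assuming NumPy and dense document vectors. It covers only the principal component projection, the variance contribution rates used as weights in the Euclidean distance, and a plain kNN vote in the reduced space. The SOM training step is omitted, and the function names (pca_fit, weighted_distance, knn_predict) are hypothetical, chosen for illustration; this is not the authors' implementation.

```python
import numpy as np
from collections import Counter


def pca_fit(X, n_components):
    """Return the mean, projection matrix, and variance contribution rates."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # SVD of the centred data; the rows of Vt are the principal directions.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    var = s ** 2 / (X.shape[0] - 1)
    rates = var / var.sum()  # share of total variance per component
    return mean, Vt[:n_components].T, rates[:n_components]


def weighted_distance(a, b, w):
    """Euclidean distance with one weight per dimension (assumed form)."""
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))


def knn_predict(Z_train, y_train, z, w, k=3):
    """Majority vote among the k nearest training points under weights w."""
    d = np.sqrt(((Z_train - z) ** 2 * w).sum(axis=1))
    nearest = np.argsort(d)[:k]
    votes = Counter(np.asarray(y_train)[nearest].tolist())
    return votes.most_common(1)[0][0]
```

To use the sketch, project both training and query vectors with `(X - mean) @ components` before calling `knn_predict`, and pass the returned rates as `w`. Components that explain more variance then contribute more to the distance, which is the effect the abstract describes.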

Keywords: text classification; k-nearest neighbours; kNN; self-organising map; SOM; neural network; computer science; engineering.

DOI: 10.1504/IJCSE.2019.103944

International Journal of Computational Science and Engineering, 2019 Vol.20 No.3, pp.397 - 403

Received: 18 Nov 2016
Accepted: 27 Jun 2017

Published online: 03 Dec 2019
