Title: An efficient weighted nearest neighbour classifier using vertical data representation

Authors: William Perrizo, Qin Ding, Maleq Khan, Anne Denton, Qiang Ding

Addresses: Department of Computer Science, North Dakota State University, P.O. Box 5164, Fargo, ND 58105-5164, USA; Department of Computer Science, Pennsylvania State University at Harrisburg, Middletown, PA 17057, USA; Department of Computer Science, Purdue University, 250 N. University St., West Lafayette, IN 47907, USA; Department of Computer Science and Operations Research, North Dakota State University, P.O. Box 5164, Fargo, ND 58105-5164, USA; Jiangsu Telecom Co., Ltd., Huan Cheng Nan Lu #88, Nantong, Jiangsu 226001, China

Abstract: The k-nearest neighbour (KNN) technique is a simple yet effective method for classification. In this paper, we propose an efficient weighted nearest neighbour classification algorithm, called PINE, that uses a vertical data representation. A metric called HOBBit is used as the distance metric. The PINE algorithm applies a Gaussian podium function to assign weights to different neighbours. We compare PINE with classical KNN methods using horizontal and vertical representations with different distance metrics. The experimental results show that PINE outperforms the other KNN methods in terms of both classification accuracy and running time.
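
As a rough illustration of the two ideas named in the abstract (a bit-prefix distance and Gaussian weighting of neighbours), the following Python sketch may help. It is not the PINE algorithm: it scans a plain horizontal list of records rather than using P-trees or any vertical representation. The abstract does not define HOBBit, so the definition used here is an assumption: the distance between two non-negative integers is the number of low-order bits that must be ignored before their high-order bits agree, and the distance between two records is the maximum over their attributes. The function names, the `sigma` parameter, and the sample data are all hypothetical.

```python
import math
from collections import defaultdict


def hobbit_distance(x, y):
    """Assumed HOBBit-style distance between two tuples of non-negative ints.

    For one attribute, (a ^ b).bit_length() is the position of the highest
    differing bit, i.e. how many low-order bits must be dropped before a and b
    match. The per-record distance is taken as the maximum over attributes.
    """
    return max((a ^ b).bit_length() for a, b in zip(x, y))


def gaussian_weight(distance, sigma=2.0):
    """Gaussian weight for a neighbour; sigma is a hypothetical tuning knob."""
    return math.exp(-(distance * distance) / (2.0 * sigma * sigma))


def classify(query, training, sigma=2.0):
    """Weighted vote in which every training record counts as a neighbour.

    `training` is a list of (feature_tuple, label) pairs. Closer records
    (smaller distance) contribute larger weights to their label's total.
    """
    votes = defaultdict(float)
    for features, label in training:
        votes[label] += gaussian_weight(hobbit_distance(query, features), sigma)
    return max(votes, key=votes.get)


# Hypothetical sample data: two records of class "a", one of class "b".
training = [((10, 12), "a"), ((11, 13), "a"), ((200, 220), "b")]
predicted = classify((10, 13), training)
```

In this sketch every training record votes, with distant records contributing weights close to zero, so no explicit choice of k is needed. Whether this matches the paper's exact weighting scheme cannot be confirmed from the abstract alone.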

Keywords: nearest neighbours; k-nearest neighbours; KNN; classification; data mining; vertical data; spatial data; podium functions; nearest neighbour algorithms; distance metrics; podium incremental neighbour evaluator; PINE; P-trees; high order basic bit; HOBBit.

DOI: 10.1504/IJBIDM.2007.012946

International Journal of Business Intelligence and Data Mining, 2007, Vol. 2, No. 1, pp. 64-78

Published online: 31 Mar 2007
