Title: Efficient data clustering algorithm designed using a heuristic approach

Authors: Poonam Nandal; Deepa Bura; Meeta Singh

Addresses: Department of Computer Science and Engineering, Faculty of Engineering and Technology, Manav Rachna International Institute of Research and Studies, India (all three authors)

Abstract: Retrieving relevant information from the large volumes of data held in databases is a major challenge today. Relevant information is extracted from the voluminous content available on the web using techniques such as natural language processing, lexical analysis, clustering and categorisation. In this paper, we discuss methods for clustering large amounts of data, which use different features to classify the data. Many problem-solving techniques now use a heuristic approach to design and develop efficient algorithms. We propose a clustering technique that uses a heuristic function to select the centroids so that the clusters formed meet the needs of the user. The heuristic function is based on conceptually similar data points, so that such points are grouped into accurate clusters. The paper focuses on the k-means clustering algorithm, which is widely used to cluster data. Empirically, the clusters formed, and the data points assigned to each cluster, are closer to human analysis than those produced by existing clustering algorithms.
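
As an illustration of the general idea described in the abstract (k-means in which the initial centroids come from a heuristic rather than from random choice), the following is a minimal sketch only. The abstract does not specify the authors' heuristic function, so the farthest-point seeding used here is a stand-in assumption, and all names (`farthest_point_seeds`, `kmeans`, `choose_seeds`) are hypothetical. Euclidean and Manhattan distances are included because both appear in the keywords.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    """Straight-line distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def manhattan(a: Point, b: Point) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def farthest_point_seeds(points: Sequence[Point], k: int, dist: Distance) -> List[Point]:
    # ASSUMPTION: stand-in heuristic, NOT the heuristic from the paper.
    # Start from the first point, then repeatedly add the point that is
    # farthest from its nearest already-chosen seed.
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(dist(p, s) for s in seeds)))
    return seeds


def kmeans(
    points: Sequence[Point],
    k: int,
    dist: Distance = euclidean,
    choose_seeds: Callable[[Sequence[Point], int, Distance], List[Point]] = farthest_point_seeds,
    max_iter: int = 100,
) -> Tuple[List[int], List[Point]]:
    """Cluster `points` into `k` groups; returns (labels, centroids)."""
    centroids = list(choose_seeds(points, k, dist))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        # (The mean is kept even with Manhattan distance, as a simplification.)
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(data, k=2)
labels_l1, _ = kmeans(data, k=2, dist=manhattan)
```

Passing the seeding rule as the `choose_seeds` argument keeps the k-means loop unchanged, so a different heuristic (for example one based on conceptual similarity, as the abstract describes) could be substituted without touching the rest of the sketch.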

Keywords: clustering; natural language processing; k-means; concept; heuristic; Euclidean distance; 2D algorithm; information retrieval; Manhattan distance; density concept.

DOI: 10.1504/IJDATS.2021.114666

International Journal of Data Analysis Techniques and Strategies, 2021 Vol.13 No.1/2, pp.3 - 14

Received: 19 Sep 2018
Accepted: 05 Mar 2019

Published online: 23 Apr 2021