Title: Incremental clustering algorithm based on representative points and covariance for large data

Authors: Jiayao Li; Qiannan Wu; Li Li; Ruizhi Sun; Huiyu Mu; Kaiyi Zhao

Addresses: College of Information and Electrical Engineering, China Agricultural University, Beijing 110000, China ' College of Information and Electrical Engineering, China Agricultural University, Beijing 110000, China ' Computer School, Beijing Information Science and Technology University, Beijing, 100101, China ' College of Information and Electrical Engineering, China Agricultural University, Beijing 110000, China; Scientific Research Base for Integrated Technologies of Precision Agriculture (Animal Husbandry), The Ministry of Agriculture, Beijing 110000, China ' College of Information and Electrical Engineering, China Agricultural University, Beijing 110000, China ' College of Information and Electrical Engineering, China Agricultural University, Beijing 110000, China

Abstract: As the volume of dynamic data grows, more space is needed to store it. However, most traditional clustering methods are time-consuming and suitable only for static data. To address this problem, incremental clustering methods are increasingly applied to dynamic data. This study proposes an incremental clustering algorithm based on representative points and covariance for large data (IDPC_RC). First, representative points are selected from the initial data. Then, the similarity between new data points and the representative points is calculated to find the pre-allocated cluster. Finally, the covariance determinant is used to measure the degree of local imbalance in the pre-allocated clusters after new data are added, and the number of clusters is adjusted adaptively. The proposed scheme was tested on five benchmark datasets and on real consumption data. The experimental results show that the scheme achieves excellent clustering performance and low time consumption on all datasets, making it useful for incremental clustering tasks.
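
The pre-allocation and covariance steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' IDPC_RC implementation: the abstract does not give the similarity measure, the rule for selecting representative points, or the exact imbalance criterion. The sketch assumes Euclidean distance as the similarity and uses the ratio of covariance determinants before and after insertion as a stand-in imbalance measure. All function names are hypothetical.

```python
import numpy as np


def assign_to_nearest_representative(point, representatives, rep_labels):
    """Pre-allocate `point` to the cluster of its closest representative.

    Assumption: similarity is Euclidean distance (not stated in the abstract).
    """
    dists = np.linalg.norm(representatives - point, axis=1)
    return int(rep_labels[int(np.argmin(dists))])


def covariance_determinant(points):
    """Determinant of the sample covariance matrix (rows are samples)."""
    cov = np.atleast_2d(np.cov(points, rowvar=False))
    return float(np.linalg.det(cov))


def imbalance_ratio(cluster_points, new_point):
    """Ratio of covariance determinants after vs. before adding `new_point`.

    Assumption: this ratio is only a proxy for the paper's "degree of local
    imbalance". A value well above 1 suggests the new point stretches the
    cluster and may warrant a new cluster.
    """
    before = covariance_determinant(cluster_points)
    after = covariance_determinant(np.vstack([cluster_points, new_point]))
    return after / before


# Toy data: one cluster on the unit square, two representatives.
cluster = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
representatives = np.array([[0.5, 0.5], [10.0, 10.0]])
rep_labels = np.array([0, 1])

near_point = np.array([0.5, 0.5])
far_point = np.array([5.0, 5.0])

near_label = assign_to_nearest_representative(near_point, representatives, rep_labels)
far_label = assign_to_nearest_representative(far_point, representatives, rep_labels)
near_ratio = imbalance_ratio(cluster, near_point)
far_ratio = imbalance_ratio(cluster, far_point)
```

In this toy case both points are pre-allocated to cluster 0, but only the distant one inflates the covariance determinant. A threshold on the ratio (a hypothetical parameter here) could then decide whether to keep the point in the pre-allocated cluster or start a new one.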

Keywords: density peaks; incremental clustering algorithms; clustering algorithms; representative points; covariance; clustering methods.

DOI: 10.1504/IJSPM.2023.136478

International Journal of Simulation and Process Modelling, 2023 Vol.20 No.2, pp.113 - 124

Received: 23 Dec 2022
Accepted: 15 Aug 2023

Published online: 02 Feb 2024
