Title: Concept drift detection method for noisy big data streams based on least squares line fitting

Authors: Zhihong Qin; Zhanlei Shang; Fuhong Geng

Addresses: Department of Network Security, Henan Police College, Zhengzhou, Henan, China; Engineering Training Centre, Zhengzhou University of Light Industry, Zhengzhou, Henan, China; Automation Major of School of Electrical Engineering, Xinjiang University, Wulumuqi, Xinjiang, China

Abstract: To improve the accuracy of concept drift detection in big data streams, this paper proposes a concept drift detection method for noisy big data streams based on least squares line fitting. First, a Kalman filter model identifies and removes noise components from the original data stream, and a wavelet thresholding algorithm then filters out the remaining noise. Second, a least squares line is fitted to the denoised data stream to extract linear trend features that model the trend of concept drift. Finally, the Tr-OEM algorithm identifies concept drift in the stream, and a machine learning model that dynamically adapts to data changes achieves efficient and accurate detection. Experimental results show that the detection accuracy of the proposed method remains between 96.89% and 97.45%, and the maximum average change distance does not exceed 1.
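
The line-fitting stage mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it omits the Kalman filter, wavelet thresholding, and Tr-OEM stages, and it assumes an already denoised stream, non-overlapping windows, and a simple slope-difference threshold. The function names `fit_line`, `window_slopes`, and `flag_slope_changes`, the window size, and the threshold are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple


def fit_line(ys: Sequence[float]) -> Tuple[float, float]:
    """Least squares fit of y = slope * t + intercept over t = 0..n-1."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    t_mean = (n - 1) / 2.0
    y_mean = sum(ys) / n
    # Closed-form solution: slope = cov(t, y) / var(t).
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * t_mean


def window_slopes(stream: Sequence[float], window: int) -> List[float]:
    """Slope of the fitted line in each non-overlapping window (assumed scheme)."""
    return [fit_line(stream[i:i + window])[0]
            for i in range(0, len(stream) - window + 1, window)]


def flag_slope_changes(slopes: Sequence[float], threshold: float) -> List[int]:
    """Indices of windows whose slope differs from the previous window's
    slope by more than threshold (an illustrative rule, not Tr-OEM)."""
    return [k for k in range(1, len(slopes))
            if abs(slopes[k] - slopes[k - 1]) > threshold]


# Toy denoised stream: flat for 20 samples, then rising by 0.5 per sample.
stream = [1.0] * 20 + [1.0 + 0.5 * k for k in range(1, 21)]
slopes = window_slopes(stream, window=10)
changed = flag_slope_changes(slopes, threshold=0.25)
```

On this toy stream the first two windows have slope 0 and the last two have slope 0.5, so only the third window (index 2) is flagged as a change in linear trend.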

Keywords: least squares linear fitting; noisy big data stream; concept drift detection; Kalman Filtering.

DOI: 10.1504/IJIPT.2025.148569

International Journal of Internet Protocol Technology, 2025 Vol.18 No.3, pp.141 - 149

Received: 24 Mar 2025
Accepted: 26 Apr 2025

Published online: 12 Sep 2025
