Title: Improved Dirichlet mixture model clustering algorithm for medical data anomaly detection

Authors: Lili Wu; Majid Khan Majahar Ali; Fam Pei Shan; Ying Tian; Li Tao

Addresses: Department of Computer Science, Xinzhou Teachers University, Xinzhou, 034000, China; School of Mathematical Sciences, Universiti Sains Malaysia (USM), Pulau Pinang, 11800, Malaysia ' School of Mathematical Sciences, USM, Pulau Pinang, 11800, Malaysia ' School of Mathematical Sciences, USM, Pulau Pinang, 11800, Malaysia ' Department of Mathematics, Taiyuan Institute of Technology, Taiyuan, 030024, China ' School of Mathematical Sciences, USM, Pulau Pinang, 11800, Malaysia

Abstract: To identify over-diagnosis and anomalous expenses in the healthcare service process, a local outlier mining clustering algorithm (ILOF-DPMM) is proposed that combines the clustering-based local outlier factor (CBLOF) algorithm with the Dirichlet process mixture model (DPMM). Patients' hospitalisation records are extracted from the medical record homepage, the factors influencing hospitalisation costs are classified for different disease types, and the random forest method is used to reduce the feature dimension by disease type. This feature extraction and dimensionality reduction enables effective clustering of medical insurance expense data. When the LOF value of each data point is calculated, a weighted method based on the similarity of discrete and continuous features detects abnormal data points in the data set more accurately and can evaluate new data in real time, improving both detection accuracy and efficiency.
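
The weighted LOF idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' ILOF-DPMM: it is the standard LOF definition computed on a simple Gower-style distance that averages a continuous-feature term and a discrete-feature mismatch term. The function names (`mixed_distance`, `lof_scores`), the equal weights, the choice of k, and the toy records are all assumptions made for illustration, and continuous features are assumed to be pre-scaled to [0, 1].

```python
from typing import List, Sequence, Tuple

# A record is (continuous features, discrete features); hypothetical layout.
Record = Tuple[Sequence[float], Sequence[str]]


def mixed_distance(a: Record, b: Record,
                   w_cont: float = 0.5, w_disc: float = 0.5) -> float:
    """Weighted dissimilarity over continuous and discrete features.

    Continuous part: mean absolute difference (inputs assumed in [0, 1]).
    Discrete part: fraction of mismatching categories.
    The weights are illustrative, not taken from the paper.
    """
    cont_a, disc_a = a
    cont_b, disc_b = b
    d_cont = sum(abs(x - y) for x, y in zip(cont_a, cont_b)) / max(len(cont_a), 1)
    d_disc = sum(x != y for x, y in zip(disc_a, disc_b)) / max(len(disc_a), 1)
    return w_cont * d_cont + w_disc * d_disc


def lof_scores(data: List[Record], k: int = 2) -> List[float]:
    """Standard local outlier factor computed on mixed_distance."""
    n = len(data)
    dist = [[mixed_distance(data[i], data[j]) for j in range(n)] for i in range(n)]

    # k nearest neighbours of each point (excluding itself) and its k-distance.
    neighbours, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbours.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance.
    eps = 1e-12  # avoids division by zero when points coincide
    lrd = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbours[i]]
        lrd.append(1.0 / (sum(reach) / k + eps))

    # LOF: average neighbour density relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]


# Invented toy records: (scaled cost, scaled length of stay), (disease type, ward).
records: List[Record] = [
    ((0.10, 0.20), ("A", "x")),
    ((0.12, 0.22), ("A", "x")),
    ((0.11, 0.18), ("A", "x")),
    ((0.13, 0.21), ("A", "x")),
    ((0.95, 0.90), ("A", "y")),  # unusually expensive stay in a different ward
]
scores = lof_scores(records, k=2)
most_anomalous = scores.index(max(scores))
print(most_anomalous, [round(s, 2) for s in scores])
```

Scores close to 1 indicate a point whose local density matches that of its neighbours, while scores well above 1 flag candidates for review. In the pipeline the abstract describes, such a score would presumably be computed within clusters after feature reduction, which this sketch omits.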

Keywords: over-diagnosis; anomaly expenses; anomaly detection; DPMM; CBLOF.

DOI: 10.1504/IJBIC.2025.143652

International Journal of Bio-Inspired Computation, 2025 Vol.25 No.1, pp.11 - 21

Received: 20 Nov 2023
Accepted: 14 Jan 2024

Published online: 03 Jan 2025
