Title: A systematic approach for pre-processing electronic health records for mining: case study of heart disease

Authors: Leila Baradaran Sorkhabi; Farhad Soleimanian Gharehchopogh; Jafar Shahamfar

Addresses: Leila Baradaran Sorkhabi and Farhad Soleimanian Gharehchopogh: Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran. Jafar Shahamfar: Department of Computer Engineering, Urmia Branch, Islamic Azad University, Urmia, Iran; Department of Community and Family Medicine, Tabriz University, Tabriz, Iran

Abstract: Electronic Health Records (EHRs) form a major part of Medical Big Data (MBD) and are an enormous resource of knowledge. Mining EHRs can lead to new generations of medicine (e.g. precision medicine). In practice, however, this is not straightforward, because raw EHRs are unsuitable for mining. Any raw data is dirty to some degree, but challenges specific to EHRs make them especially prone to data quality problems. To extract more precise and reliable knowledge, EHRs must be pre-processed. Applying pre-processing techniques that are tailored to the specific properties of EHRs yields higher-quality and more usable data. Here we introduce PEPMED, a systematic pre-processing approach that consists of three main stages. Each stage includes hybrid methods to deal with the challenges of dirty data. To evaluate the approach, four well-known subgrouping methods were applied to both the raw and the pre-processed data, with precision and overall accuracy as the measures. Results show that PEPMED dramatically improved accuracy.

Keywords: EHRs; pre-processing; medical big data; data mining; precision medicine; heart disease; systematic; data quality; accuracy; data volume.

DOI: 10.1504/IJDMB.2020.110154

International Journal of Data Mining and Bioinformatics, 2020 Vol.24 No.2, pp.97 - 120

Received: 28 Jan 2019
Accepted: 28 Feb 2020

Published online: 07 Oct 2020
