Title: A cross-sectional analysis of severe SARS cases evolution in a Brazilian municipality using data mining techniques
Authors: Silvano Herculano da Luz Júnior; Willian Farias Carvalho Oliveira; Luis Cesar de Albuquerque Neto; Hugo Araujo Souza; Yúri Faro Dantas de Sant'Anna
Addresses: Computer Center, Federal University of Pernambuco, Avenida Professor Moraes Rego, S/N – Cidade Universitária, Recife – PE. CEP: 50670-420, Brazil (shared by all five authors)
Abstract: The first severe acute respiratory syndrome (SARS) outbreak occurred in China in 2002. It was followed by other coronavirus outbreaks, such as MERS (2012) and 2019-nCoV (2019), and by the Omicron variant (2021). Data mining (DM) has been widely used for SARS classification and decision-making, but most studies overlook socioeconomic factors such as income and education. This study applies the cross-industry standard process for data mining (CRISP-DM) framework and DM techniques to predict the progression of severe SARS cases in Recife, Brazil. Using open datasets, it incorporates attributes related to symptoms, pre-existing conditions, and socioeconomic indicators. Three healthcare experts participated in the analysis. The apriori algorithm performed best in rule induction, while the decision tree slightly outperformed logistic regression. Notably, correlations emerged between severe case progression and socioeconomic data, underscoring the importance of integrating social determinants into disease classification models. These findings provide insights for improving predictive models and public health strategies.
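The abstract reports that the apriori algorithm performed best for rule induction. The following is a minimal sketch of that kind of rule induction, not the authors' implementation. It runs a level-wise Apriori search for frequent itemsets and then derives association rules scored by support and confidence. The records, attribute names (`dyspnea`, `diabetes`, `low_income`, `death`, and so on), and thresholds are hypothetical. They illustrate how symptom, comorbidity, and socioeconomic flags could appear together in one rule.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset itemset: support} for all itemsets with support >= min_support."""
    n = len(transactions)
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data.
    current = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent = {}
    k = 1
    while current:
        # Support = fraction of records that contain the whole candidate itemset.
        level = {}
        for cand in current:
            support = sum(1 for t in transactions if cand <= t) / n
            if support >= min_support:
                level[cand] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, keeping only those whose
        # k-subsets are all frequent (the Apriori downward-closure pruning step).
        k += 1
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)
    return frequent


def association_rules(frequent, min_confidence):
    """Yield (antecedent, consequent, support, confidence) for rules above the threshold."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent, so lhs is present.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules


# Hypothetical one-hot records: symptoms, comorbidities, socioeconomic flags, outcome.
records = [
    {"dyspnea", "diabetes", "low_income", "death"},
    {"dyspnea", "low_income", "death"},
    {"fever", "high_income", "cure"},
    {"dyspnea", "diabetes", "low_income", "death"},
    {"fever", "diabetes", "high_income", "cure"},
]

if __name__ == "__main__":
    frequent_sets = apriori(records, min_support=0.4)
    for lhs, rhs, sup, conf in association_rules(frequent_sets, min_confidence=0.9):
        if rhs == frozenset({"death"}):
            print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

With these toy records, rules such as `low_income -> death` reach confidence 1.0, while `diabetes -> death` (confidence about 0.67) is filtered out. Restricting the printed rules to an outcome consequent mirrors the idea of linking attributes to case progression. The brute-force support counting is kept for clarity. A production analysis would more likely use an optimised library implementation.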
Keywords: SARS; severe acute respiratory syndrome; data mining; machine learning; apriori; ROC curve; CRISP-DM; cross-industry standard process for data mining.
DOI: 10.1504/IJDATS.2025.148564
International Journal of Data Analysis Techniques and Strategies, 2025 Vol.17 No.3, pp.196-215
Received: 03 Aug 2023
Accepted: 03 Jun 2024
Published online: 12 Sep 2025