Title: Feature importance analysis for a highly unbalanced multiple myeloma data classification

Authors: Rima Guilal; Nesma Settouti; Gonzalo Martínez-Muñoz; Mohammed Amine Chikh

Addresses: Biomedical Engineering Laboratory, University of Tlemcen, Algeria; Biomedical Engineering Laboratory, University of Tlemcen, Algeria; Escuela Politécnica Superior, Universidad Autónoma de Madrid, Spain; Biomedical Engineering Laboratory, University of Tlemcen, Algeria

Abstract: Multiple myeloma (MM) is a hematological cancer associated with abnormal plasma cell proliferation. Its diagnosis is a lengthy process because the disease is very difficult to detect at an early stage. This paper presents an approach to aid in MM diagnosis and staging. Tree-based ensemble learning methods are used to measure feature importance in models built to predict MM stages. A comparative analysis showed that random forest outperformed the other algorithms, with an accuracy of over 97%, while XGBoost provided a ranking of the features considered most prognostic for MM staging. A discussion of the results with hematology specialists supported and validated the proposed approach.
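To make the idea of tree-based feature importance on unbalanced data concrete, here is a minimal, standard-library-only sketch. It is not the authors' pipeline: it swaps full random-forest trees for bagged depth-1 decision stumps, uses "balanced" class weights (w_c = n / (n_classes · n_c)) so the rare class is not ignored, and scores each feature by its summed weighted Gini impurity decrease, normalized to 1 (a mean-decrease-in-impurity style score). It does not cover XGBoost or grid search. All function names, parameters, and the synthetic data are hypothetical.

```python
import random
from collections import Counter

def weighted_gini(labels, weights):
    """Gini impurity of a node in which every sample carries a weight."""
    total = sum(weights)
    if total == 0:
        return 0.0
    mass = Counter()
    for label, weight in zip(labels, weights):
        mass[label] += weight
    return 1.0 - sum((m / total) ** 2 for m in mass.values())

def best_stump(X, y, w, features):
    """Best single split among `features`: (feature, threshold, impurity decrease)."""
    parent = weighted_gini(y, w)
    total = sum(w)
    best = (None, None, 0.0)
    for f in features:
        # The largest value is skipped because it would leave the right child empty.
        for t in sorted({row[f] for row in X})[:-1]:
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            w_left = sum(w[i] for i in left)
            w_right = sum(w[i] for i in right)
            child = (w_left * weighted_gini([y[i] for i in left], [w[i] for i in left])
                     + w_right * weighted_gini([y[i] for i in right], [w[i] for i in right])) / total
            if parent - child > best[2]:
                best = (f, t, parent - child)
    return best

def stump_forest_importance(X, y, n_trees=25, max_features=2, seed=0):
    """Hypothetical helper: impurity-based importance from bagged, class-weighted stumps."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    counts = Counter(y)
    # "Balanced" weighting: rare classes receive proportionally larger weights.
    class_w = {c: n / (len(counts) * k) for c, k in counts.items()}
    totals = [0.0] * n_features
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]           # bootstrap sample
        feats = rng.sample(range(n_features), max_features)  # random feature subset
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        wb = [class_w[label] for label in yb]
        f, _, gain = best_stump(Xb, yb, wb, feats)
        if f is not None:
            totals[f] += gain
    s = sum(totals)
    return [t / s for t in totals] if s > 0 else totals

# Synthetic, hypothetical data: 108 majority vs 12 minority samples.
# Only feature 0 differs between classes; features 1 and 2 are pure noise.
rng = random.Random(42)
y = [0] * 108 + [1] * 12
X = [[rng.gauss(3.0 if label else 0.0, 1.0), rng.gauss(0, 1), rng.gauss(0, 1)] for label in y]
importances = stump_forest_importance(X, y)
print([round(v, 3) for v in importances])
```

On this toy data the informative feature should receive most of the importance mass. In practice, a library implementation such as scikit-learn's `RandomForestClassifier` (with `class_weight="balanced"` and its `feature_importances_` attribute) would replace these hand-written stumps.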

Keywords: blood cancers; multiple myeloma; prognostic factors; ensemble learning; feature importance; unbalanced data; grid search.

DOI: 10.1504/IJMEI.2024.138289

International Journal of Medical Engineering and Informatics, 2024 Vol.16 No.3, pp.199 - 209

Received: 10 Jun 2021
Accepted: 11 Jan 2022

Published online: 01 May 2024
