Title: Meta-analysis of computational methods for breast cancer classification
Authors: Tri-Cong Pham; Chi-Mai Luong; Antoine Doucet; Van-Dung Hoang; Diem-Phuc Tran; Duc-Hau Le
Addresses: School of Computer Science and Engineering, Thuyloi University, 175 Tay Son, Dong Da, Hanoi, Vietnam; Department of Informatics and Communication Technology, University of Science and Technology of Hanoi, Hanoi, Vietnam; Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Cau Giay, Hanoi, Vietnam; Department of Informatics and Communication Technology, University of Science and Technology of Hanoi, Hanoi, Vietnam; Institute of Information Technology, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet, Cau Giay, Hanoi, Vietnam; Laboratory L3i, University of La Rochelle, France; Quang Binh University, Dong Hoi, Quang Binh, Vietnam; Duy Tan University, Da Nang, Vietnam; Department of Computational Biomedicine, Vingroup Big Data Institute, No. 7, Bang Lang 1 Street, Viet Hung Ward, Long Bien District, Hanoi, Vietnam
Abstract: Millions of women suffer from breast cancer, which places a heavy burden on patients and on the global economy. Meanwhile, treatments are generally applied without considering each patient's health profile and genetic features. Artificial intelligence appears to be a promising approach to breast cancer subtyping. Most studies, however, have addressed binary classification with a limited number of samples; multi-class classification is considerably harder, especially on large sample sets. This study uses machine learning to improve breast cancer subtyping and to identify new disease-causative genes, which could support more personalised treatment with fewer side effects in the future. We compare the accuracy of three classification methods combined with eight feature selection methods on a dataset of 2,682 samples. The highest accuracy, 83.96%, was obtained with the SVM-C005 classifier and percentile feature selection (800 genes). In addition, our method predicts causative genes of breast cancer: four of them are already known to be associated with breast cancer, and 29 are promising candidates with supporting evidence in the literature. These results support the effectiveness of the proposed approach.
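The abstract names percentile feature selection but does not describe how genes are scored. The following is a minimal, standard-library-only sketch of the general idea, assuming each gene is scored by a one-way ANOVA F statistic across subtype labels and the top fraction of genes is kept. The scoring function, the helper names `anova_f` and `select_percentile`, and the toy data are all hypothetical illustrations, not the authors' implementation; the downstream SVM classifier is not shown.

```python
from statistics import mean
import math


def anova_f(values, labels):
    """One-way ANOVA F statistic of one gene's expression across class labels.

    Assumption: this scorer stands in for whatever ranking the paper used.
    """
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    k, n = len(groups), len(values)
    if k < 2 or n <= k:
        raise ValueError("need at least 2 classes and more samples than classes")
    grand = mean(values)
    # Variation of class means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of samples around their own class mean.
    ss_within = sum(sum((v - mean(g)) ** 2 for v in g) for g in groups.values())
    if ss_within == 0:
        return math.inf if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_percentile(X, labels, percentile):
    """Return sorted column indices of the top `percentile` % of genes by F score.

    X is a list of samples, each a list of gene expression values.
    """
    n_genes = len(X[0])
    scores = [anova_f([row[j] for row in X], labels) for j in range(n_genes)]
    n_keep = max(1, int(round(n_genes * percentile / 100)))
    ranked = sorted(range(n_genes), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:n_keep])


# Hypothetical toy data: 4 samples x 3 genes, two subtypes "A" and "B".
X = [
    [1.0, 5.0, 0.1],
    [1.2, 5.1, 0.9],
    [3.0, 5.0, 0.5],
    [3.1, 4.9, 0.4],
]
labels = ["A", "A", "B", "B"]

if __name__ == "__main__":
    print(select_percentile(X, labels, 34))  # keeps only gene 0
    print(select_percentile(X, labels, 67))  # keeps genes 0 and 1
```

Gene 0 separates the two toy subtypes cleanly and receives by far the highest F score, so it survives even the strictest percentile. In the setting the abstract describes, the selected gene columns would then be passed to a classifier such as an SVM.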
Keywords: breast cancer; gene expression; multi-class classification; feature selection; gene selection; microarray data; subtype related gene.
DOI: 10.1504/IJIIDS.2020.108226
International Journal of Intelligent Information and Database Systems, 2020 Vol.13 No.1, pp.89 - 111
Received: 22 Apr 2019
Accepted: 08 Dec 2019
Published online: 06 Jul 2020