Title: Feature prioritisation on big genomic data for analysing gene-gene interactions

Authors: Ahmad A. Aloqaily; Siamak Tafavogh; Bronwyn L. Harvey; Daniel R. Catchpoole; Paul J. Kennedy

Addresses: Department of Computer Science and its Applications, Faculty of Prince Al-Hussein Bin Abdallah II For Information Technology, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan; University of Technology Sydney, P.O. Box 123, Broadway, NSW, 2007, Australia; University of Technology Sydney, P.O. Box 123, Broadway, NSW, 2007, Australia; The Tumour Bank, Children's Cancer Research Unit, The Kids Research Institute, The Children's Hospital at Westmead, Westmead, NSW, 2145, Australia; Faculty of Engineering and Information Technology, Australian Artificial Intelligence Institute, Joint Research Centre in AI for Health and Wellness, University of Technology Sydney, P.O. Box 123, Broadway, NSW, 2007, Australia

Abstract: Complex diseases are not caused by single genes but result from intricate non-linear interactions among them. There is a critical need for approaches that take non-linear gene-gene interactions into account when searching for markers that jointly cause disease. Determining the interactions among more than two single nucleotide polymorphisms (SNPs) across whole-genome data is computationally expensive and often infeasible. In this paper, we develop an approach to classify patients with Acute Lymphoblastic Leukaemia by analysing multiple SNP interactions. A novel feature prioritisation algorithm called interaction effect quantity (IEQ) selects SNPs with high potential for interaction by analysing their distribution throughout the genomic data, and enables deeper analysis of non-linear interactions within large datasets. We show that IEQ enables analysis of interactions among up to four SNPs, achieving a classification F-measure greater than 89%. Without IEQ, such an analysis is typically much more computationally challenging.
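
To illustrate why the abstract describes higher-order SNP interaction analysis as computationally expensive, the following minimal Python sketch counts how many distinct SNP combinations an exhaustive search would have to test at each interaction order. It is not the IEQ algorithm, which is not specified on this page; the function names `interaction_count` and `search_space` are hypothetical, and the SNP counts used are illustrative assumptions rather than figures from the paper.

```python
from math import comb


def interaction_count(n_snps: int, order: int) -> int:
    """Number of distinct SNP sets of size `order` among `n_snps` SNPs."""
    return comb(n_snps, order)


def search_space(n_snps: int, max_order: int = 4) -> dict[int, int]:
    """Map each interaction order (2..max_order) to its combination count."""
    return {k: interaction_count(n_snps, k) for k in range(2, max_order + 1)}


if __name__ == "__main__":
    # Illustrative SNP counts only (assumed, not taken from the paper).
    for n in (1_000, 100_000):
        for order, count in search_space(n).items():
            print(f"{n} SNPs, order {order}: {count:,} combinations")
```

Because the count grows roughly as n^k for interaction order k, reducing the number of candidate SNPs before the search, which is the role the abstract assigns to feature prioritisation, shrinks the search space far more at order four than at order two.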

Keywords: large genomic data; dimensionality reduction; feature prioritisation; gene-gene interaction.

DOI: 10.1504/IJBRA.2021.114420

International Journal of Bioinformatics Research and Applications, 2021 Vol.17 No.2, pp.158 - 177

Received: 18 May 2018
Accepted: 13 May 2019

Published online: 21 Apr 2021
