Title: Comparative analysis of machine learning based QSAR models and molecular docking studies to screen potential anti-tubercular inhibitors against InhA of Mycobacterium tuberculosis

Authors: Madhulata Kumari; Neeraj Tiwari; Subhash Chandra; Naidu Subbarao

Addresses: Department of Information Technology, Kumaun University, SSJ Campus, Almora, Uttarakhand 263601, India; Department of Statistics, Kumaun University, SSJ Campus, Almora, Uttarakhand 263601, India; Department of Botany, Kumaun University, SSJ Campus, Almora, Uttarakhand 263601, India; School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi 110067, India

Abstract: Machine learning techniques can be used to build quantitative structure-activity relationship (QSAR) models from compound datasets, identifying the descriptors that best predict a specific biological activity so that the activity of untested compounds can be estimated and better drug candidates discovered. In the present study, descriptors were optimised using correlation-based feature selection (CFS), principal component analysis, and genetic programming, and several machine learning techniques were then used to build QSAR models on three different experimental datasets of InhA inhibitors. The best QSAR models were applied to a dataset of 1450 approved drugs from DrugBank to screen for new InhA inhibitors. Amoxicillin showed the highest predicted activity (pIC50 = 6.54) and itraconazole the second highest (pIC50 = 6.4), as calculated by the best random forest (RF) model using the CFS-GS-FW descriptor set on the ChEMBL997779 dataset of InhA of Mycobacterium tuberculosis (Mtb). Additionally, screening by molecular docking identified the 10 top-ranked approved drugs as anti-tubercular hits, with G-scores of -8.23 to -6.95 kcal/mol, compared with G-scores of -7.86 to -6.68 kcal/mol for the control compounds (known InhA Mtb inhibitors). These results indicate that the identified compounds may have better binding affinity for InhA of Mtb. From our studies, we conclude that machine learning based QSAR models can be useful for the development of novel target-specific anti-tubercular compounds.
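
For readers less familiar with the pIC50 scale used for the predicted activities, the following is a minimal sketch of the conversion, assuming the standard definition pIC50 = -log10(IC50 in mol/L). The function names are illustrative and do not come from the paper; the two pIC50 values are the predictions quoted in the abstract.

```python
import math


def pic50_to_ic50_nm(pic50: float) -> float:
    """Convert pIC50 (assumed to be -log10 of IC50 in mol/L) to IC50 in nM."""
    return 10.0 ** (-pic50) * 1e9


def ic50_nm_to_pic50(ic50_nm: float) -> float:
    """Convert an IC50 given in nM back to pIC50."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)


# Predicted activities quoted in the abstract (RF model, CFS-GS-FW descriptors).
predicted = {"Amoxicillin": 6.54, "Itraconazole": 6.4}

for name, pic50 in predicted.items():
    print(f"{name}: pIC50 = {pic50} -> IC50 of roughly {pic50_to_ic50_nm(pic50):.0f} nM")
```

Under this definition a higher pIC50 means a lower IC50, so both predictions correspond to sub-micromolar potency.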

Keywords: machine learning algorithms; quantitative structure-activity relationships; SVM; support vector machine; random forest; multilayer perceptron; genetic algorithm; genetic programming; regression; Mycobacterium tuberculosis; Gaussian process; correlation-based feature selection; InhA.

DOI: 10.1504/IJCBDD.2018.094630

International Journal of Computational Biology and Drug Design, 2018 Vol.11 No.3, pp.209 - 235

Received: 17 Jul 2017
Accepted: 04 Oct 2017

Published online: 10 Sep 2018
