Authors: Nesma Settouti; Mostafa El Habib Daho; Mohamed Amine Chikh
Addresses: Biomedical Engineering Laboratory, Tlemcen University, Chetouane, Algeria; Biomedical Engineering Laboratory, Tlemcen University, Chetouane, Algeria; Biomedical Engineering Laboratory, Tlemcen University, Chetouane, Algeria
Abstract: Variable importance measures from Random Forests (RF) have received increasing attention as a means of variable selection in classification tasks. Random Forest variable importance is an effective variable selection tool in many applications, but it is unreliable when candidate predictors differ in their scale of measurement or in their number of categories. In this paper, we implement a Random Forest built from Conditional Inference Trees (CIT), called a Conditional Inference Forest (CIF). In each tree of the forest, a node is split on the predictor that shows the strongest statistical association with the response, and this association is measured with the chi-square test statistic. Beyond identifying the variables that improve classification accuracy, the methodology also clearly identifies the variables that are neutral to accuracy and those that interfere with correct classification. We are particularly interested in applying the CIF algorithm to the classification of large biological datasets. The algorithm is evaluated on its ability to select a reduced number of features while preserving a satisfactory classification rate.
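The association-based choice of split variable described in the abstract can be sketched as follows. This is a simplified, stdlib-only illustration, not the authors' implementation and not the R `party`/`partykit` `ctree` code (which is built on a more general permutation-test framework). It assumes every predictor is categorical, uses an uncorrected Pearson chi-square test of independence, applies a Bonferroni adjustment across candidate predictors, and stops splitting when no adjusted p-value falls below `alpha`. The function names, the toy data, and the 0.05 default are hypothetical.

```python
import math
from collections import Counter


def chi2_sf(x: float, dof: int) -> float:
    """Upper-tail probability P(X >= x) of a chi-square variable with integer dof."""
    if dof < 1:
        raise ValueError("dof must be a positive integer")
    if x <= 0.0:
        return 1.0
    half = x / 2.0
    # Base case: Q(x; 2) = exp(-x/2) for even dof, Q(x; 1) = erfc(sqrt(x/2)) for odd dof.
    if dof % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < dof:
        q += math.exp((k / 2.0) * math.log(half) - half - math.lgamma(k / 2.0 + 1.0))
        k += 2
    return min(q, 1.0)


def chi2_independence(xs, ys):
    """Uncorrected Pearson chi-square test of independence for two categorical sequences."""
    n = len(xs)
    cell = Counter(zip(xs, ys))          # observed contingency-table counts
    row, col = Counter(xs), Counter(ys)  # marginal totals
    stat = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            stat += (cell[(a, b)] - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    p_value = chi2_sf(stat, dof) if dof > 0 else 1.0  # constant predictor: no evidence
    return stat, dof, p_value


def select_split_variable(features, labels, alpha=0.05):
    """Return (name, adjusted p-value) of the best split predictor, or None to stop.

    `features` maps a hypothetical predictor name to its list of category labels.
    """
    m = len(features)
    best = None
    for name, column in features.items():
        _, _, p_value = chi2_independence(column, labels)
        adjusted = min(1.0, m * p_value)  # Bonferroni correction over the m candidates
        if best is None or adjusted < best[1]:
            best = (name, adjusted)
    if best is None or best[1] > alpha:
        return None  # no significant association: the node becomes a leaf
    return best


# Toy node: "informative" tracks the class perfectly, "noise" is independent of it.
labels = ["neg"] * 10 + ["pos"] * 10
features = {
    "informative": ["lo"] * 10 + ["hi"] * 10,
    "noise": ["u", "v"] * 10,
}
print(select_split_variable(features, labels))
```

Ranking candidates by adjusted p-value rather than by the raw statistic puts predictors with different numbers of categories on a common scale: a predictor with more categories has more degrees of freedom and would otherwise tend to produce larger chi-square values by chance, which is the bias the abstract attributes to standard Random Forest importance. On the toy node above, the sketch selects `"informative"` and would return `None` if only `"noise"` were offered.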
Keywords: variable selection; conditional inference forest; conditional inference trees; variable importance; biological datasets; random forest; variables; classification accuracy; feature selection; bioinformatics.
International Journal of Bioinformatics Research and Applications, 2017 Vol.13 No.2, pp. 95-108
Accepted: 11 Mar 2016
Published online: 21 Mar 2017