Article: Using conditional inference forest to identify variable importance. Journal: International Journal of Bioinformatics Research and Applications (IJBRA), 2017, Vol.13 No.2, pp.95-108

Title: Using conditional inference forest to identify variable importance

Authors: Nesma Settouti; Mostafa El Habib Daho; Mohamed Amine Chikh

Addresses: Biomedical Engineering Laboratory, Tlemcen University, Chetouane, Algeria; Biomedical Engineering Laboratory, Tlemcen University, Chetouane, Algeria; Biomedical Engineering Laboratory, Tlemcen University, Chetouane, Algeria

Abstract: Variable importance measures from Random Forests (RF) have received increasing attention as a means of variable selection in classification tasks. Random Forest variable importance is an effective approach to variable selection in many applications, but it is not reliable when the candidate predictor variables differ in their scale of measurement or in their number of categories. In this paper, we implement a Random Forest built from Conditional Inference Trees (CIT), called a Conditional Inference Forest (CIF). In each tree of the forest, node splits are chosen according to the strength of association between the predictors and the response, measured with the chi-square test statistic. In addition to identifying the variables that improve classification accuracy, the methodology clearly identifies the variables that are neutral to accuracy and those that interfere with correct classification. We are particularly interested in applying the CIF algorithm to the classification of large biological datasets. The algorithm is evaluated on its ability to select a reduced number of features while preserving a very satisfactory classification rate.
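The split-variable selection step of a conditional inference tree can be sketched roughly as follows: at a node, test each predictor for association with the class label, adjust the p-values for multiple testing, and split on the predictor with the smallest adjusted p-value, stopping if none is significant. This is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes categorical predictors, a Pearson chi-square test of independence, and a Bonferroni correction with alpha = 0.05 (similar in spirit to the defaults of `ctree` in the R package party, but the paper's exact settings are not given here). The helper names `chi2_sf`, `chi_square_test` and `select_split_variable` are hypothetical.

```python
import math
from collections import Counter


def _gamma_q(a, x):
    """Regularized upper incomplete gamma Q(a, x) (series or continued fraction)."""
    if x <= 0.0:
        return 1.0
    log_prefactor = -x + a * math.log(x) - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion for P(a, x); Q = 1 - P.
        term = total = 1.0 / a
        ap = a
        for _ in range(1000):
            ap += 1.0
            term *= x / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Modified Lentz continued fraction for Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 1000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = d if abs(d) >= tiny else tiny
        c = b + an / c
        c = c if abs(c) >= tiny else tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return math.exp(log_prefactor) * h


def chi2_sf(stat, df):
    """Upper-tail p-value of a chi-square statistic with df degrees of freedom."""
    return _gamma_q(df / 2.0, stat / 2.0)


def chi_square_test(x, y):
    """Pearson chi-square test of independence between two categorical lists."""
    n = len(x)
    joint = Counter(zip(x, y))
    rows, cols = Counter(x), Counter(y)
    stat = 0.0
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            stat += (joint[(a, b)] - expected) ** 2 / expected
    df = (len(rows) - 1) * (len(cols) - 1)
    if df == 0:
        return stat, 1.0  # constant predictor or label: no evidence of association
    return stat, chi2_sf(stat, df)


def select_split_variable(features, y, alpha=0.05):
    """Return (best_name or None, Bonferroni-adjusted p-values) for one node.

    Comparing p-values instead of raw statistics keeps predictors with many
    categories (larger df) from being favoured automatically.
    """
    m = len(features)
    adjusted = {name: min(1.0, chi_square_test(col, y)[1] * m)
                for name, col in features.items()}
    best = min(adjusted, key=adjusted.get)
    if adjusted[best] > alpha:
        return None, adjusted  # stopping rule: no significant association
    return best, adjusted
```

For example, with a label `y = ["a"] * 10 + ["b"] * 10`, a predictor that mirrors the label is selected, while an alternating noise predictor gets an adjusted p-value of 1 and, on its own, triggers the stopping rule. A forest is then obtained by growing many such trees on bootstrap samples with random predictor subsets, which this sketch does not cover.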

Keywords: variable selection; conditional inference forest; conditional inference trees; variable importance; biological datasets; random forest; variables; classification accuracy; feature selection; bioinformatics.

DOI: 10.1504/IJBRA.2017.083129

International Journal of Bioinformatics Research and Applications, 2017 Vol.13 No.2, pp.95 - 108

Accepted: 11 Mar 2016
Published online: 21 Mar 2017


