Title: A commensurate univariate variable ranking method for classification

Authors: Nuo Xu; Xuan Huang; Thanh Nguyen; Jake Y. Chen

Addresses: Department of MISQ, Collat School of Business, University of Alabama at Birmingham, Birmingham, AL 35294, USA; Department of MISQ, Collat School of Business, University of Alabama at Birmingham, Birmingham, AL 35294, USA; Informatics Institute, University of Alabama at Birmingham, Birmingham, AL 35294, USA; Informatics Institute, University of Alabama at Birmingham, Birmingham, AL 35294, USA

Abstract: Applying a variable ranking method to feature selection in classification requires the notion of commensurateness, because a dataset may contain independent variables of different types. A commensurate ranking method is one that produces consistent and comparable ranking results among independent variables of different types, such as numeric vs. categorical and discrete vs. continuous. We propose a ranking method named conditional empirical expectation (CEE) and demonstrate that it is the most commensurate among several representative ranking methods. Further, it has the highest statistical power as a test of independence when the categorical dependent variable is imbalanced. These properties make CEE uniquely suitable for fast feature selection on any dataset, especially high-dimensional datasets with mixed types of variables. Its use is demonstrated in a case study on facilitating preprocessing for classification.

Keywords: variable types; variable ranking; variable relevance; commensurate; statistical dependence.

DOI: 10.1504/IJDS.2025.149831

International Journal of Data Science, 2025 Vol.10 No.2, pp.175 - 194

Received: 18 May 2023
Accepted: 23 Aug 2024

Published online: 14 Nov 2025
