Title: Learning from high-dimensional unlabelled data with outliers: a novel robust approach
Authors: Abdul Wahid
Addresses: Department of Mathematics and Statistics, Institute of Southern Punjab, Multan, 60800, Pakistan
Abstract: This paper investigates the problem of feature selection and classification in the presence of multivariate outliers in high-dimensional unlabelled data. The research question is how to identify outliers and deal with them in unsupervised learning so as to improve clustering accuracy relative to state-of-the-art non-robust learning techniques. For this purpose, a robust method is proposed that utilises the Mahalanobis distance for outlier identification, based on the minimum regularised covariance determinant approach. Furthermore, a new weighting scheme based on the Mahalanobis distance is developed for dealing with outlying data points. Finally, it is suggested to combine the proposed weight function with a least squares loss function, along with graph and sparsity constraints, to achieve robustness. This new procedure is named robust self-representation sparse reconstruction and manifold regularisation (RSSRMR). The novel technique is compared with previously proposed unsupervised feature selection techniques in simulation and real-world data experiments and exhibits better performance.
Keywords: clustering; high-dimensional data; feature selection; Mahalanobis distance; multivariate outliers.
DOI: 10.1504/IJBIDM.2025.145353
International Journal of Business Intelligence and Data Mining, 2025 Vol.26 No.3/4, pp.282 - 302
Received: 06 Mar 2024
Accepted: 03 Aug 2024
Published online: 31 Mar 2025
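
Illustrative sketch (not taken from the paper): the abstract describes flagging outliers by Mahalanobis distance under a regularised covariance estimate and then down-weighting them. The Python below shows that general idea only. It is not the RSSRMR procedure and it does not implement the minimum regularised covariance determinant estimator; as a stand-in it shrinks the sample covariance toward a scaled identity. The function names, the shrinkage weight rho, the use of the coordinate-wise median as centre, the weight formula and the cutoff rule are all assumptions made for illustration.

```python
import numpy as np


def regularised_mahalanobis(X, rho=0.1):
    """Squared Mahalanobis distance of each row of X.

    Assumption: the covariance is shrunk toward a scaled identity with
    weight rho (a simple stand-in, NOT the MRCD estimator), which keeps
    it invertible even when there are more features than samples.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    center = np.median(X, axis=0)            # hypothetical robust centre
    Xc = X - center
    S = Xc.T @ Xc / n                        # scatter around the centre
    target = np.eye(p) * np.trace(S) / p     # scaled-identity target
    S_reg = (1.0 - rho) * S + rho * target
    sol = np.linalg.solve(S_reg, Xc.T)       # S_reg^{-1} Xc^T, shape (p, n)
    return np.einsum("ij,ji->i", Xc, sol)    # row-wise quadratic forms


def distance_weights(d2, cutoff):
    """Hypothetical weight function: 1 inside the cutoff, cutoff/d2 beyond."""
    d2 = np.asarray(d2, dtype=float)
    return np.where(d2 <= cutoff, 1.0, cutoff / np.maximum(d2, 1e-12))


# Usage on synthetic data: 200 points in 5 dimensions, first 3 shifted.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:3] += 10.0
d2 = regularised_mahalanobis(X)
cutoff = 3.0 * np.median(d2)                 # arbitrary illustrative cutoff
w = distance_weights(d2, cutoff)
print("mean weight, shifted points:", w[:3].mean())
print("mean weight, remaining points:", w[3:].mean())
```

Weights of this kind could multiply each sample's contribution to a least squares reconstruction loss, so that points far from the bulk of the data influence the fit less; how the paper actually combines its weight function with the graph and sparsity constraints is not stated in the abstract.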