A semi-supervised clustering-based classification model for classifying imbalanced data streams in the presence of scarcely labelled data
Online publication date: Fri, 11-Feb-2022
by Kiran Bhowmick; Meera Narvekar
International Journal of Business Intelligence and Data Mining (IJBIDM), Vol. 20, No. 2, 2022
Abstract: Data streams are potentially infinite in length, fast-changing and scarcely labelled. It is practically impossible to label all the observed instances. Online frameworks for classifying data streams are generally supervised: they assume that labelled data is available, and hence cannot be applied directly to scarcely labelled streams. Semi-supervised learning (SSL) addresses this problem by using a large amount of unlabelled data together with the labelled data to build classifiers. Data streams may also suffer from class imbalance. Previous works on learning from data streams have analysed the problem of imbalanced data. However, to the best of our knowledge, no work so far has applied semi-supervised learning approaches to classifying imbalanced data streams. This paper proposes a model that uses a semi-supervised clustering technique to classify an imbalanced data stream in the presence of scarcely labelled data. The results show that the model outperforms many state-of-the-art techniques.
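To make the general idea of semi-supervised clustering on a scarcely labelled, imbalanced chunk of a stream more concrete, the following is a minimal illustrative sketch. It is not the model proposed in the paper, whose details are not given in this abstract. It assumes a simple "cluster then label" scheme: all instances in a chunk (labelled or not) are clustered with k-means, and each cluster then takes the label that wins a vote among the few labelled instances it contains, with votes weighted by inverse class frequency so that the minority class is not drowned out. The function names (`kmeans`, `label_clusters`, `predict`), the class names and the toy data are all hypothetical.

```python
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: sq_dist(point, centroids[i]))


def kmeans(points, k, iters=10):
    """Plain k-means over one chunk; labels are not used at this stage."""
    # Deterministic farthest-point initialisation (an assumption of this sketch).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its members; keep it if empty.
        centroids = [
            tuple(sum(col) / len(groups[i]) for col in zip(*groups[i]))
            if groups[i] else centroids[i]
            for i in range(k)
        ]
    return centroids


def label_clusters(centroids, labelled):
    """Label each cluster by a class-weighted vote of its labelled members."""
    counts = defaultdict(int)
    for _, y in labelled:
        counts[y] += 1
    votes = defaultdict(lambda: defaultdict(float))
    for x, y in labelled:
        # Inverse-frequency weight: a rare class gets a larger vote per instance.
        votes[nearest(x, centroids)][y] += 1.0 / counts[y]
    return {c: max(v, key=v.get) for c, v in votes.items()}


def predict(point, centroids, cluster_labels):
    """Classify a new instance by its nearest cluster that received a label."""
    best = min(cluster_labels, key=lambda i: sq_dist(point, centroids[i]))
    return cluster_labels[best]


# Toy chunk: 36 majority-class points near the origin, 4 minority points far away.
chunk = [(x * 0.1, y * 0.1) for x in range(6) for y in range(6)]
chunk += [(10 + x * 0.1, 10.0) for x in range(4)]
# Only four instances in the chunk carry labels.
labelled = [
    ((0.0, 0.0), "normal"),
    ((0.1, 0.2), "normal"),
    ((0.3, 0.3), "normal"),
    ((10.1, 10.0), "rare"),
]

centroids = kmeans(chunk, k=2)
cluster_labels = label_clusters(centroids, labelled)
print(predict((9.5, 9.8), centroids, cluster_labels))
print(predict((0.4, 0.1), centroids, cluster_labels))
```

In a streaming setting, a sketch like this would be re-run (or updated incrementally) on each arriving chunk; how the paper's model handles chunking, concept change and cluster maintenance is not stated in the abstract.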