Title: Ascertaining the impact of balancing the flood dataset on the performance of classification based flood forecasting models for the northern districts of Bihar
Authors: Vikas Mittal; T.V. Vijay Kumar; Aayush Goel
Addresses: School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, 110067, India; School of Computer and Systems Sciences, Jawaharlal Nehru University, New Delhi, 110067, India; Department of Electronics and Communication Engineering, Bharati Vidyapeeth's College of Engineering, New Delhi, 110063, India
Abstract: Bihar is the most flood-affected state in India, and the losses it incurs amount to one-third of the total losses due to floods in India. These losses can be alleviated by designing models that forecast floods in real time. One such existing model uses classification-based machine learning techniques to forecast floods in the northern districts of Bihar. However, the flood dataset it used was imbalanced, as the non-flooding instances far exceeded the flooding instances. This paper addresses this problem by balancing the data using oversampling techniques and then using the balanced data to design flood forecasting models. The objective of the paper is to ascertain whether balancing the dataset improves the performance of classifiers. An experimental comparison showed that the classifiers performed better on the balanced dataset in terms of accuracy, precision, recall, F-measure and AUC-ROC. Further, the dataset balanced using KMeans SMOTE resulted in the maximum improvement in the performance of all classifiers.
Keywords: natural hazards; floods; forecasting; artificial intelligence; machine learning; supervised learning; classification; SMOTE; synthetic minority oversampling technique.
International Journal of Water, 2022, Vol. 15, No. 2, pp. 75-100
Received: 18 Apr 2022
Accepted: 14 Dec 2022
Published online: 17 Jul 2023
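
The oversampling idea described in the abstract can be illustrated with a minimal sketch. It implements plain SMOTE (interpolating between a minority sample and one of its nearest minority neighbours), not the KMeans SMOTE variant the abstract reports as performing best, and it is not the authors' code. The function name `smote`, its parameters, and the toy feature values are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the remaining minority samples by distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical toy data: each tuple could stand for (rainfall, river level).
no_flood = [(10.0, 1.0), (12.0, 1.2), (8.0, 0.9), (11.0, 1.1),
            (9.0, 1.0), (13.0, 1.3), (10.5, 1.05), (9.5, 0.95)]
flood = [(80.0, 5.0), (90.0, 5.5), (85.0, 5.2)]

# Oversample the flooding class until both classes have the same size.
new_flood = smote(flood, n_new=len(no_flood) - len(flood))
balanced_flood = flood + new_flood
print(len(no_flood), len(balanced_flood))  # prints "8 8"
```

In practice the synthetic samples would be generated from the training split only, so that the test data used to measure accuracy, precision, recall, F-measure and AUC-ROC still reflects the original class distribution.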