Title: Assaying SARIMA and generalised regularised regression for particulate matter PM10 modelling and forecasting
Authors: Snezhana Gocheva-Ilieva; Atanas Ivanov
Addresses: Department of Applied Mathematics and Modelling, Faculty of Mathematics and Informatics, University of Plovdiv 'Paisii Hilendarski', 24 Tsar Asen str., 4000 Plovdiv, Bulgaria; Department of Applied Mathematics and Modelling, Faculty of Mathematics and Informatics, University of Plovdiv 'Paisii Hilendarski', 24 Tsar Asen str., 4000 Plovdiv, Bulgaria
Abstract: Two predictive modelling approaches are used to examine the concentration of particulate matter PM10 in the town of Kardzhali, Bulgaria: the classical SARIMA time series methodology and the new Generalised PathSeeker (GPS) regularised regression method, supported by stochastic gradient boosting trees, RuleLearner and other data mining techniques. Empirical models are developed to simulate and forecast pollution levels from hourly PM10 data for 1 January 2011 to 28 February 2014 as a function of six meteorological variables. The constructed models are used for 5-day-ahead hourly forecasts, which are compared with actual data from 1 to 5 March 2014. The SARIMA and GPS models fit the historical data very well, with coefficients of determination R² = 90% and 82% and root mean square errors RMSE = 0.114 and 0.151, respectively. In forecasting, the GPS models outperform the SARIMA approach. This could be explained by the preliminary classification provided by the data mining techniques and the cross-validation procedure.
Keywords: air pollution; particulate matter PM10; seasonal ARIMA; generalised PathSeeker regularised regression; stochastic gradient boosting; data mining; forecasting; environmental pollution.
International Journal of Environment and Pollution, 2019 Vol.66 No.1/2/3, pp.41 - 62
Received: 11 Apr 2018
Accepted: 09 Jan 2019
Published online: 15 Jan 2020
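
The abstract reports goodness of fit through the coefficient of determination and the root mean square error. As a reading aid, the sketch below shows the standard textbook definitions of both statistics. It is not the authors' code: the function names and the sample values are hypothetical, and this record does not state on which scale (raw or transformed PM10) the reported figures were computed.

```python
import math


def rmse(actual, predicted):
    """Root mean square error of predictions against observed values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical illustrative values, NOT data from the paper.
observed = [1.0, 2.0, 3.0, 4.0]
fitted = [1.1, 1.9, 3.2, 3.8]

print(f"RMSE = {rmse(observed, fitted):.3f}")
print(f"R^2  = {r_squared(observed, fitted):.1%}")
```

A lower RMSE and a higher R² both indicate a closer fit, which is the sense in which the abstract compares the SARIMA and GPS models on historical data.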