Authors: Robert L. Grant
Addresses: BayesCamp Ltd., 16 City Business Centre, Hyde Street, Winchester, SO23 7TA, UK
Abstract: The concept of 'updating' parameter estimates and predictions as more data arrive is an important attraction for people adopting Bayesian methods, and is essential in big data settings. Implementation via the hyperparameters of a joint prior distribution is challenging. This paper considers non-parametric updating, using a previous posterior sample as a new prior sample. Streaming data can be analysed in a moving window of time by subtracting old posterior sample(s) with appropriate weights. We evaluate three forms of kernel density estimation, a sampling importance resampling implementation, and a novel algorithm called kudzu, which smooths density estimation trees. Methods are tested for distortion of illustrative prior distributions, long-run performance in a low-dimensional simulation study, and feasibility with a realistically large and fast dataset of taxi journeys. Kernel estimation appears to be useful in low-dimensional problems, and kudzu in high-dimensional problems, but careful tuning and monitoring are required. Areas for further research are outlined.
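As an illustration of the general idea of using a previous posterior sample as the new prior, the following is a minimal sketch, not the paper's implementation and not kudzu. It assumes a single parameter (a normal mean with known standard deviation `sigma`), approximates the old posterior sample with a Gaussian kernel density whose bandwidth follows Silverman's rule of thumb, and then updates on a new batch of data by sampling importance resampling: draws from the kernel density act as prior draws, are weighted by the likelihood of the new batch, and are resampled. The function names `kde_prior_draws` and `sir_update` are hypothetical.

```python
import numpy as np


def kde_prior_draws(prev_posterior, n_draws, rng):
    """Draw from a Gaussian KDE fitted to a 1-D posterior sample.

    Assumption: Silverman's rule-of-thumb bandwidth; other kernels and
    bandwidth choices are possible and may distort the prior differently.
    """
    n = prev_posterior.size
    bandwidth = 1.06 * prev_posterior.std(ddof=1) * n ** (-1 / 5)
    # Sampling from a Gaussian KDE: pick a kernel centre, then add kernel noise
    centres = rng.choice(prev_posterior, size=n_draws, replace=True)
    return centres + rng.normal(0.0, bandwidth, size=n_draws)


def sir_update(prev_posterior, new_data, sigma, n_draws, rng):
    """Update a posterior sample with a new batch via sampling importance resampling.

    Assumption: new_data ~ Normal(theta, sigma) with sigma known.
    """
    theta = kde_prior_draws(prev_posterior, n_draws, rng)
    # Log-likelihood of the whole new batch for each prior draw
    loglik = (-0.5 * ((new_data[None, :] - theta[:, None]) / sigma) ** 2).sum(axis=1)
    # Normalise importance weights on the log scale for numerical stability
    weights = np.exp(loglik - loglik.max())
    weights /= weights.sum()
    # Resample prior draws in proportion to their weights
    idx = rng.choice(n_draws, size=n_draws, replace=True, p=weights)
    return theta[idx]


# Usage: two batches from Normal(2, 1); the first posterior is taken as the
# conjugate flat-prior result, Normal(mean(batch1), 1/sqrt(50)).
rng = np.random.default_rng(42)
batch1 = rng.normal(2.0, 1.0, size=50)
batch2 = rng.normal(2.0, 1.0, size=50)
prev = rng.normal(batch1.mean(), 1.0 / np.sqrt(50), size=4000)
post = sir_update(prev, batch2, sigma=1.0, n_draws=20000, rng=rng)
print(post.mean(), post.std())
```

With both batches, the updated sample should sit close to the pooled mean of all 100 observations with a standard deviation near 0.1, although the kernel smoothing slightly inflates the prior variance, which is one of the distortions that needs monitoring.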
Keywords: Bayesian data analysis; big data; density estimation trees; kernel density estimation; non-parametric statistics; streaming data.
International Journal of Computational Economics and Econometrics, 2022 Vol.12 No.4, pp.405-428
Received: 31 Jan 2021
Accepted: 02 Sep 2021
Published online: 20 Oct 2022