Title: A confidence-prioritisation approach for learning noisy data

Authors: Nathaniel Gustafson; Christophe Giraud-Carrier

Addresses: Department of Computer Science, Brigham Young University, Provo, UT 84602, USA; Department of Computer Science, Brigham Young University, Provo, UT 84602, USA

Abstract: In a number of real-world applications, there is a range of noise associated with individual data points. Some points are extracted under relatively clear and defined conditions, while others may be affected by a variety of known or unknown confounding factors, which may decrease those points' validity. These points may or may not remain useful for training, depending on how much uncertainty they contain. We submit that in situations where some variability exists in the clarity or confidence associated with individual data points, an approach that takes this confidence into account during the training phase is beneficial. We propose a methodological framework for assigning confidence to individual data records and augmenting training with that information. We test the methodology on two datasets: a simulated dataset and a streamflow diel signals dataset. Results indicate that applying and utilising confidence in training improves performance.

Keywords: data mining; confidence prioritisation; diel signal prediction; applications; noisy data; learning; environmental data; training.

DOI: 10.1504/IJDATS.2014.066603

International Journal of Data Analysis Techniques and Strategies, 2014 Vol.6 No.4, pp.307 - 326

Published online: 14 Jan 2015
