Authors: Hang Yang; Simon Fong
Addresses: Department of Computer and Information Science, Faculty of Science and Technology, University of Macau, Av. Padre Tomás Pereira Taipa, Macau, China (both authors)
Abstract: Big data has become a significant challenge for software applications. Extracting a classification model from such data requires an incremental learning process: the model should update as new data arrive, without re-scanning historical data. A single-pass algorithm suits environments in which data arrive continuously. However, one practical and important aspect has received relatively little study: noisy data streams, which are inevitable in real-world applications. This paper presents a new classification model built on a single decision tree, the incrementally Optimised Very Fast Decision Tree (iOVFDT), which embeds multi-objective incremental optimisation and a functional tree leaf. In the performance evaluation, noisy values were added to synthetic data to investigate performance under noisy conditions. The results showed that iOVFDT outperforms the existing algorithms it was compared against.
Keywords: big data; data streams; classification models; decision trees; noisy data; performance evaluation; incremental learning; multi-objective optimisation.
International Journal of Computer Applications in Technology, 2013 Vol.47 No.2/3, pp. 206-214
Published online: 05 Jun 2013
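iOVFDT extends the Very Fast Decision Tree (VFDT), whose single-pass learning rests on the Hoeffding bound: a leaf keeps only per-attribute class counts, updates them once per arriving example, and splits when the observed gain gap between the two best attributes exceeds epsilon = sqrt(R^2 ln(1/delta) / (2n)). The abstract does not state this; it is the well-known VFDT baseline, and the sketch below illustrates only that baseline under stated assumptions. It does not implement iOVFDT's multi-objective optimisation or its functional tree leaf, which the abstract does not specify. The class and method names (`HoeffdingLeaf`, `should_split`) and the defaults `delta=1e-7` and `tie_threshold=0.05` are hypothetical, chosen for illustration; the sketch also omits VFDT's grace period and its "no split" candidate, and handles nominal attributes only.

```python
import math
from collections import defaultdict


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)), the VFDT split criterion."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def entropy(counts) -> float:
    """Shannon entropy (bits) of a list of class counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


class HoeffdingLeaf:
    """Hypothetical single-pass leaf: sufficient statistics for nominal attributes.

    Each example updates the counts once and is then discarded, so
    historical data never has to be re-scanned.
    """

    def __init__(self, n_attributes: int):
        self.n_attributes = n_attributes
        self.n_seen = 0
        self.class_counts = defaultdict(int)
        # stats[attribute][value][label] -> count
        self.stats = [defaultdict(lambda: defaultdict(int)) for _ in range(n_attributes)]

    def update(self, x, y) -> None:
        """Absorb one example (attribute tuple x, label y) in O(#attributes)."""
        self.n_seen += 1
        self.class_counts[y] += 1
        for a, v in enumerate(x):
            self.stats[a][v][y] += 1

    def info_gain(self, a: int) -> float:
        """Information gain of splitting this leaf on attribute a."""
        base = entropy(list(self.class_counts.values()))
        remainder = 0.0
        for label_counts in self.stats[a].values():
            subtotal = sum(label_counts.values())
            remainder += subtotal / self.n_seen * entropy(list(label_counts.values()))
        return base - remainder

    def should_split(self, delta: float = 1e-7, tie_threshold: float = 0.05):
        """Return the attribute index to split on, or None if not yet justified."""
        # A pure leaf (or one with fewer than two attributes) is never split here.
        if self.n_attributes < 2 or len(self.class_counts) < 2:
            return None
        gains = sorted(((self.info_gain(a), a) for a in range(self.n_attributes)), reverse=True)
        (g1, best), (g2, _) = gains[0], gains[1]
        # Range R of information gain is log2(number of classes).
        eps = hoeffding_bound(math.log2(len(self.class_counts)), delta, self.n_seen)
        # Split if the best attribute clearly wins, or break a near-tie once eps is small.
        if g1 - g2 > eps or eps < tie_threshold:
            return best
        return None
```

In use, a leaf fed a stream in which attribute 0 determines the label and attribute 1 is uninformative declines to split after 4 examples (epsilon is still about 1.42, larger than the gain gap of 1.0) but chooses attribute 0 after 2,000 examples, when epsilon has fallen to about 0.063. Because the bound depends only on `n`, `delta`, and `R`, the decision needs no stored examples, which is what makes the learner single-pass.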