Authors: Marcio P. Basgalupp, Andre C.P.L.F. De Carvalho, Rodrigo C. Barros, Duncan D. Ruiz, Alex A. Freitas
Addresses: Institute of Mathematical Sciences and Computing, University of Sao Paulo, Av. Trabalhador Sao-Carlense 400, Sao Carlos, 668, Brazil. ' Institute of Mathematical Sciences and Computing, University of Sao Paulo, Av. Trabalhador Sao-Carlense 400, Sao Carlos, 668, Brazil. ' Faculty of Informatics, Pontifical Catholic University of RS, Av. Ipiranga 6681, Porto Alegre, CEP 90619-900, Brazil. ' Faculty of Informatics, Pontifical Catholic University of RS, Av. Ipiranga 6681, Porto Alegre, CEP 90619-900, Brazil. ' Computing Laboratory, University of Kent, Canterbury, Kent, CT2 7NZ, UK
Abstract: Among the several tasks that evolutionary algorithms have successfully employed, the induction of classification rules and decision trees has been shown to be a relevant approach for several application domains. Decision tree induction algorithms represent one of the most popular techniques for dealing with classification problems. However, conventionally used decision trees induction algorithms present limitations due to the strategy they usually implement: recursive top-down data partitioning through a greedy split evaluation. The main problem with this strategy is quality loss during the partitioning process, which can lead to statistically insignificant rules. In this paper, we propose a new GA-based algorithm for decision tree induction. The proposed algorithm aims to prevent the greedy strategy and to avoid converging to local optima. For such, it is based on a lexicographic multi-objective approach. In order to evaluate the proposed algorithm, it is compared with a well-known and frequently used decision tree induction algorithm using different public datasets. According to the experimental results, the proposed algorithm is able to avoid the previously described problems, reporting accuracy gains. Even more important, the proposed algorithm induced models with a significantly reduction in the complexity considering tree sizes.
Keywords: lexicographic genetic algorithms; GAs; decision tree induction; multi-objective genetic algorithms; classification tasks; data mining; bio-inspired computation; evolutionary induction.
International Journal of Bio-Inspired Computation, 2009 Vol.1 No.1/2, pp.105 - 117
Published online: 26 Jan 2009 *Full-text access for editors Access for subscribers Purchase this article Comment on this article