Authors: Fadila Bentayeb, Jerome Darmont, Cecile Favre, Cedric Udrea
Addresses: ERIC, University of Lyon 2, 5 avenue Pierre Mendes-France, 69676 Bron Cedex, France; ERIC, University of Lyon 2, 5 avenue Pierre Mendes-France, 69676 Bron Cedex, France; ERIC, University of Lyon 2, 5 avenue Pierre Mendes-France, 69676 Bron Cedex, France; EURISE, University of St Etienne, 23 rue du Docteur Paul Michelon, 42023 Saint Etienne Cedex 2, France
Abstract: Considerable effort has gone into applying data mining algorithms to large databases. However, long processing times remain a practical issue. This paper presents a framework that offers database users online operators for mining large databases, without size limit and in acceptable processing times. First, we integrate decision tree algorithms directly into database management systems. We are thus limited only by disc capacity and not by main memory. However, disc accesses still induce long response times. Hence, in a second step, we propose two optimisations: reducing the size of the learning database by building its corresponding contingency table, and reducing the number of database accesses by exploiting bitmap indices. Thus, the various decision tree-based methods we implemented within Oracle operate on contingency tables or bitmap indices rather than on the whole training set. Our experiments show the efficiency of our integrated methods.
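As an illustration of the contingency table idea mentioned in the abstract, the following is a minimal sketch, not the authors' Oracle implementation. It assumes a toy training set held in memory, collapses identical rows into counts (the in-memory analogue of a GROUP BY over all attributes with COUNT(*)), and computes an entropy-based information gain from the counts alone. The function names, the toy data, and the choice of information gain as the split criterion are all assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def contingency_table(rows):
    """Collapse identical rows into {row: count}.

    Roughly what SELECT a1, ..., an, COUNT(*) ... GROUP BY a1, ..., an
    would return inside a DBMS.
    """
    return Counter(rows)


def entropy(class_counts):
    """Shannon entropy (base 2) of a collection of class frequencies."""
    counts = [c for c in class_counts if c > 0]
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(table, attr_index, class_index):
    """Information gain of one attribute, using only aggregated counts."""
    total = sum(table.values())
    class_totals = Counter()
    per_value = defaultdict(Counter)
    for row, count in table.items():
        class_totals[row[class_index]] += count
        per_value[row[attr_index]][row[class_index]] += count
    # Weighted entropy of the partitions induced by the attribute.
    remainder = sum(
        (sum(counts.values()) / total) * entropy(counts.values())
        for counts in per_value.values()
    )
    return entropy(class_totals.values()) - remainder


# Hypothetical toy training set: (outlook, windy, play)
rows = [
    ("sunny", "no", "yes"),
    ("sunny", "no", "yes"),
    ("sunny", "yes", "no"),
    ("rainy", "yes", "no"),
    ("rainy", "yes", "no"),
    ("rainy", "no", "yes"),
]

table = contingency_table(rows)
gain_outlook = information_gain(table, 0, 2)
gain_windy = information_gain(table, 1, 2)
```

In this sketch the six training rows collapse into four distinct combinations, and the split criterion never touches the individual rows again. On real data with many duplicate attribute combinations, the aggregated table can be much smaller than the training set.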
Keywords: bitmap indices; contingency table; large databases; decision trees; online data mining; performance; relational views; database management systems; business information systems.
International Journal of Business Information Systems, 2007 Vol.2 No.3, pp.328 - 350
Published online: 07 Jan 2007