Forthcoming Papers: International Journal of Business Intelligence and Data Mining (IJBIDM)

This page lists papers submitted to IJBIDM via the web that have been reviewed and accepted but not yet published. Please note that titles, authors, abstracts and keywords may change upon publication.

5 papers in press:
- ReliefMSS: a variation on a feature ranking ReliefF algorithm
by Salim Chikhi
Abstract: Attribute quality estimation is an important issue in machine learning. Several important machine learning tasks, such as feature subset selection, constructive induction and decision tree building, contain an attribute estimation procedure as their principal component. Relief algorithms are successful attribute estimators. They are able to detect conditional dependencies between attributes and provide a unified view of attribute estimation. In addition, their quality estimates have a natural interpretation. They have commonly been viewed as feature subset selection methods that are applied in a pre-processing step before a model is learned. In this paper, we propose a variant of the ReliefF algorithm: ReliefMSS. We analyse the ReliefMSS parameters and compare the performance of ReliefF and ReliefMSS with respect to the number of iterations, the number of random attributes, the effect of noise, the number of nearest neighbours and the number of examples presented. We find that, for most of these parameters, ReliefMSS performs better than ReliefF.
Keywords: feature selection, Relief algorithms, number of nearest neighbours.
- Exact and Inexact Methods for Solving the Problem of View Selection for Aggregate Queries
by Zohreh Asgharzadeh Talebi, Rada Chirkova, Yahya Fathi
Abstract: We present a study of the following warehouse view-selection problem: Given a frequency distribution on parameterized aggregate queries on a data warehouse, return definitions of aggregate views that, when materialized in the warehouse, would reduce the evaluation costs of the frequent queries. Optimizing the layout of stored data using view selection has a direct impact on the performance of data warehouses. However, the optimization problem is intractable, even under natural restrictions on the types of queries of interest. We introduce an integer-programming model to obtain optimal solutions for the warehouse view-selection problem, and propose a heuristic to obtain competitive inexact solutions where our exact method is inapplicable. We show that both our approaches can be used to solve realistic-size instances of the problem. In addition, we experimentally compare our methods to those of Harinarayan et al. (1996) and Shukla et al. (1998), and delineate applicability areas for these and our approaches.
Keywords: business intelligence, data reporting, OLAP, business intelligence cycle, schema specification selection, view selection, data warehouse design, data analysis tools.
- Shape matching through contour extraction using Circular Augmented Rotational Trajectory (CART) algorithm
by Russel Apu, Marina Gavrilova
Abstract: We present a novel Circular Augmented Rotational Trajectory (CART) algorithm that computes R-Space-based shape descriptors, allowing efficient shape matching, generalization and classification. The shape descriptor is rotation- and scale-invariant, and is capable of detecting invariant geometric properties despite considerable noise and quantization errors. The distinctive features of this method are corner preservation and the ability to detect points of discontinuity even in a noisy trajectory. The method can efficiently process any general contour, addressing difficult ambiguities present within the original shape. Experimental analysis performed on a number of non-trivial (difficult or ambiguous) object boundaries shows that the CART method can correctly detect and represent the inherent shape and extract the geometric properties of the object. Its universality, robustness and consistent performance on a variety of shapes make this method a powerful technique for contour representation and analysis.
Keywords: R-Space, Shape Recognition, Intelligent Data Processing, Circular Augmented Rotational Trajectory (CART)
- When to Choose an Ensemble Classifier Model for Data Mining
by Mordechai Gal-Or, Jerrold May, William Spangler
Abstract: This study empirically explores the use of a group, or ensemble, of classifiers to support managerial decision making in domains characterized by asymmetric misclassification costs. The approach developed in this study is intended to help a decision maker determine whether a given situation warrants choosing an ensemble over an individual classifier. The decision is based primarily on the misclassification costs in the decision context and the associated basis on which performance is assessed. We show that the criteria for evaluating classifier performance depend fundamentally on whether misclassification costs are symmetric or asymmetric. The result of this study is a set of heuristics for identifying high- and low-performing ensembles.
Keywords: data mining; classification costs; multiple classifiers; ensemble
- WebUser: Mining Unexpected Web Usage
by Haoyuan Li, Anne Laurent, Pascal Poncelet
Abstract: Web usage mining has largely concentrated on discovering relevant user behaviours from Web access records. Although sequential pattern mining is well suited to discovering frequent user behaviours, decision makers are increasingly interested in unexpected behaviours that contradict existing knowledge about user navigation. In this paper, we present WebUser, an approach to discovering unexpected usage in Web access logs. We first formalize the Web access log file as a user session sequence database, on which we propose different forms of sequence rules for describing Web usage behaviours. We then present a belief-driven method for extracting unexpected Web usage sequences, where the belief system consists of sequence rules with temporal-relation and semantic constraints, acquired with respect to prior knowledge. Our experiments show the effectiveness and usefulness of the proposed approach. Furthermore, the discovered rules of unexpected Web usage can be used for Web content personalization and recommendation, site structure optimization and critical event prediction.
Keywords: Data mining; Web usage mining; log analysis; unexpected usage; sequence rules; concept hierarchies.
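The ReliefMSS paper uses ReliefF as its baseline and studies parameters such as the number of iterations and the number of nearest neighbours. As background only, here is a minimal sketch of the standard ReliefF weight update, not of ReliefMSS, whose modifications the abstract does not describe. It assumes numeric attributes, a Manhattan distance over range-normalised differences, and class priors estimated from the data; the function name `relieff` and its parameters `m` (sampled instances, i.e. iterations) and `k` (nearest hits and misses) are our own illustrative choices.

```python
import random


def relieff(X, y, m=None, k=10, seed=0):
    """Return one ReliefF weight per attribute (numeric attributes only).

    X: list of feature vectors (lists of floats); y: list of class labels.
    m: number of randomly sampled instances (defaults to len(X)).
    k: number of nearest hits, and of nearest misses per other class.
    """
    n, a = len(X), len(X[0])
    m = n if m is None else m
    rng = random.Random(seed)
    # Range of each attribute, used to scale diff() into [0, 1].
    mins = [min(row[j] for row in X) for j in range(a)]
    maxs = [max(row[j] for row in X) for j in range(a)]
    spans = [(hi - lo) or 1.0 for lo, hi in zip(mins, maxs)]

    def diff(j, r1, r2):
        return abs(r1[j] - r2[j]) / spans[j]

    def dist(r1, r2):
        # Manhattan distance over normalised attribute differences.
        return sum(diff(j, r1, r2) for j in range(a))

    classes = sorted(set(y))
    prior = {c: y.count(c) / n for c in classes}
    w = [0.0] * a
    for _ in range(m):
        i = rng.randrange(n)
        r, cr = X[i], y[i]
        # k nearest neighbours of r within every class, excluding r itself.
        near = {}
        for c in classes:
            cand = [X[t] for t in range(n) if t != i and y[t] == c]
            cand.sort(key=lambda row: dist(r, row))
            near[c] = cand[:k]
        for j in range(a):
            hits = sum(diff(j, r, h) for h in near[cr])
            # Misses from each other class are weighted by that class's prior,
            # renormalised to exclude the class of r.
            misses = sum(
                prior[c] / (1.0 - prior[cr]) * sum(diff(j, r, mm) for mm in near[c])
                for c in classes
                if c != cr
            )
            w[j] += (misses - hits) / (m * k)
    return w


if __name__ == "__main__":
    # Toy data: attribute 0 separates the classes, attribute 1 is noise.
    X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.1]]
    y = [0, 0, 0, 1, 1, 1]
    print(relieff(X, y, k=2))  # attribute 0 positive, attribute 1 negative
```

On this toy data every sampled instance pushes the informative attribute up and the noise attribute down, so the signs of the weights do not depend on the random seed; with real data, `m` and `k` trade estimation stability against run time, which is the kind of trade-off the paper examines.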
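The view-selection paper compares its methods with those of Harinarayan et al. (1996), whose well-known approach greedily materializes, one at a time, the view that most reduces the total cost of answering queries over a data-cube lattice. The sketch below illustrates only that greedy baseline; it is not the authors' integer-programming model or heuristic. It assumes the linear cost model (a group-by costs the row count of the smallest materialized view containing its attributes) and optionally weights each group-by by a query frequency, a simplification introduced here to echo the frequency distribution in the problem statement. The function name `greedy_views` and the example sizes are hypothetical.

```python
def greedy_views(sizes, k, freq=None):
    """Pick up to k extra views to materialize, HRU-style greedy (sketch).

    sizes: dict mapping a frozenset of group-by attributes to its estimated
    row count; the view over all attributes is always materialized.
    freq: optional dict of query frequencies per view (default 1 each);
    this weighting is an illustrative assumption.
    """
    freq = freq if freq is not None else {v: 1 for v in sizes}
    top = max(sizes, key=len)
    chosen = {top}

    def cost(v):
        # A group-by v can be answered from any materialized view whose
        # attributes include v's; assume the smallest such view is scanned.
        return min(sizes[w] for w in chosen if v <= w)

    for _ in range(k):
        best, best_gain = None, 0
        for s in sizes:
            if s in chosen:
                continue
            # Benefit of s: cost saved on every group-by answerable from s.
            gain = sum(
                freq.get(v, 0) * max(0, cost(v) - sizes[s])
                for v in sizes
                if v <= s
            )
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:  # no remaining view lowers any query cost
            break
        chosen.add(best)
    return chosen


if __name__ == "__main__":
    # Hypothetical row counts for a cube over part (p), supplier (s), customer (c).
    sizes = {frozenset(v): n for v, n in [
        ("psc", 6_000_000), ("pc", 6_000_000), ("ps", 800_000), ("sc", 6_000_000),
        ("p", 200_000), ("s", 10_000), ("c", 100_000), ("", 1)]}
    picked = greedy_views(sizes, k=3)
    print(sorted("".join(sorted(v)) or "()" for v in picked))
```

With these sizes and k = 3, the sketch keeps the top view and adds ps, then c, then s. Because each step looks only one view ahead, the greedy choice can be suboptimal, which is one motivation for exact formulations such as an integer program.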