Title: Graphical tools for assessing information quality: loan application decisions

Authors: Dominique Haughton, Mary Ann Robbert, Linda P. Senne

Addresses: Bentley College, 175 Forest Street, Waltham, MA 02452, USA; Bentley College, 175 Forest Street, Waltham, MA 02452, USA; Bentley College, 175 Forest Street, Waltham, MA 02452, USA.

Abstract: Using a loan application data set, this paper demonstrates the use of several graphical tools to assess information quality: histograms to study individual variables, scatter plots to compare original and cleaned variables as well as to examine the effects that cleaning a particular predictor has on models of a decision, decision trees to identify important predictors of a decision, and ROC curves to evaluate the predictive value of each attribute. Proposed techniques for cleaning a data set include eliminating erroneous records, excluding attributes with too many incorrect values from the model and applying domain knowledge. We suggest that our approach can be applied to a small sample of a data set to help prioritise which variables should be cleaned.
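As a rough illustration of the last tool named in the abstract, ranking attributes by their ROC curves, the sketch below computes the area under the ROC curve (AUC) for each attribute taken on its own. It is not the authors' code: the attribute names, the toy records and the `auc` helper are all hypothetical, and the AUC is obtained through the standard pair-counting (Mann-Whitney) identity rather than by plotting the curve.

```python
def auc(scores, labels):
    """AUC of a single attribute used directly as a score.

    Uses the pair-counting identity: AUC is the fraction of
    (positive, negative) pairs in which the positive record has the
    higher score, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy records, invented for illustration only.
# 1 = application approved, 0 = application denied.
approved = [1, 1, 1, 0, 0, 0]
attributes = {
    "income": [80, 65, 70, 40, 45, 30],
    "loan_amount": [100, 300, 200, 150, 250, 350],
}

# Rank attributes from highest to lowest AUC.
ranking = sorted(
    ((auc(values, approved), name) for name, values in attributes.items()),
    reverse=True,
)
for area, name in ranking:
    print(f"{name}: AUC = {area:.2f}")
```

An AUC near 0.5 suggests the attribute carries little information about the decision, while values far from 0.5 in either direction suggest predictive value (below 0.5, the relationship is simply inverted). Under that reading, attributes with AUC far from 0.5 would be natural candidates to clean first.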

Keywords: information quality; data cleaning; logistic regression models; decision trees; ROC curves; decision making; loan applications; decision variables; mortgage loans.

DOI: 10.1504/IJTPM.2005.008634

International Journal of Technology, Policy and Management, 2005 Vol.5 No.4, pp.330 - 347

Published online: 12 Jan 2006
