Data quality improvement in data warehouse: a framework
by Rajiv Arora; Payal Pahwa; Daya Gupta
International Journal of Data Analysis Techniques and Strategies (IJDATS), Vol. 9, No. 1, 2017

Abstract: Data cleansing is an essential process that, when carried out on datasets, eliminates inconsistency and duplication from the data. It also handles null or missing values in an organised manner, thereby enhancing the quality of the data. In this paper, we use the Kullback-Leibler divergence (KL-divergence) technique to eliminate duplication in the datasets. Inconsistencies and null or missing values in the datasets are also handled. This is done by maintaining data marts that are built on the basis of test data. Accordingly, a framework for efficient data cleansing is suggested in order to make the data suitable for decision making. A brief comparison of existing data cleansing approaches is also presented. This comparison is based on parameters such as prediction error, bias, mean square error, variance, mean absolute error, root mean square error and Theil statistics. These parameters are used by the distance sum-based approach (DSA) to accomplish the task. The results obtained demonstrate the feasibility and validity of our method.
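
The abstract does not state how KL-divergence is applied to records or how the comparison parameters are computed, so the following Python sketch is an illustration under stated assumptions, not the authors' method. It assumes each record is a whitespace-tokenised string, turns a pair of records into smoothed token distributions over their shared vocabulary, and scores the pair with a symmetrised KL-divergence (lower scores suggest likely duplicates). The helper names `token_distribution`, `symmetric_kl` and `error_metrics`, the add-one smoothing, and the choice to symmetrise are all hypothetical. The last function computes four of the listed comparison parameters using their standard textbook definitions.

```python
import math
from collections import Counter


def kl_divergence(p, q):
    """Discrete KL-divergence D(P || Q) = sum_i p_i * log(p_i / q_i), in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def token_distribution(text, vocabulary, smoothing=1.0):
    """Smoothed token frequencies of `text` over `vocabulary` (assumed helper).

    Additive smoothing keeps every probability positive, so the
    divergence stays finite even when a token is absent from one record.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[t] for t in vocabulary) + smoothing * len(vocabulary)
    return [(counts[t] + smoothing) / total for t in vocabulary]


def symmetric_kl(record_a, record_b):
    """Symmetrised divergence D(P || Q) + D(Q || P) between two records."""
    vocabulary = sorted(set(record_a.lower().split()) | set(record_b.lower().split()))
    p = token_distribution(record_a, vocabulary)
    q = token_distribution(record_b, vocabulary)
    return kl_divergence(p, q) + kl_divergence(q, p)


def error_metrics(actual, predicted):
    """Bias, MAE, MSE and RMSE from their standard definitions."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    return {
        "bias": sum(errors) / n,
        "mae": sum(abs(e) for e in errors) / n,
        "mse": mse,
        "rmse": math.sqrt(mse),
    }
```

In use, a pair of records would be flagged as candidate duplicates when `symmetric_kl` falls below a threshold; the threshold itself is a tuning choice that the abstract does not specify.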

Online publication date: Mon, 20-Mar-2017
