Int. J. of Big Data Intelligence   »   2016 Vol.3, No.1

 

 


 

 

Title: Automated validation of structured large databases: an illustration of material code bulk validation

 

Authors: Ravindra Patankar; Sandeep Dulluri

 

Addresses:
Big Data Analytics Team, Reliance Industries Ltd., Mumbai, India
Big Data Analytics Team, Reliance Industries Ltd., Mumbai, India

 

Abstract: Data accumulation, and with it data storage, is growing at an exponential pace owing to falling memory costs and increasingly complex business processes. As data volumes increase, the complexity of validating the data typically increases as well. Often, the complexity and effort of validating large-scale databases grow nonlinearly with database size (Lee et al., 1999). In this paper, we discuss a novel methodology for bulk validation of large-scale structured databases. The approach we propose is generic and has been tested in a real-time environment. We illustrate it on a material code validation problem faced by a Fortune 100 enterprise. The demonstration highlights the heterogeneity, scale, and scope of data validation problems, and shows how they can be tackled effectively by applying machine learning techniques to Big Data.

 

Keywords: decision trees; structured large databases; automated validation; big data; MADlib; Greenplum; material codes; bulk validation; data validation; machine learning.

 

DOI: 10.1504/IJBDI.2016.073888

 

Int. J. of Big Data Intelligence, 2016 Vol.3, No.1, pp.38 - 50

 

Available online: 29 Dec 2015

 

 
