Title: Automated validation of structured large databases: an illustration of material code bulk validation
Authors: Ravindra Patankar; Sandeep Dulluri
Addresses: Big Data Analytics Team, Reliance Industries Ltd., Mumbai, India; Big Data Analytics Team, Reliance Industries Ltd., Mumbai, India
Abstract: The accumulation of data, and with it the need for storage, is growing at an exponential pace owing to falling memory costs and increasingly complex business processes. As data volumes increase, the complexity of validating the data typically increases as well. Often, the complexity and effort of validating large-scale databases grow nonlinearly with database size (Lee et al., 1999). In this paper, we discuss a novel methodology for bulk validation of large-scale structured databases. The approach we propose is generic and has been tested in a real-time environment. We present an illustration on a material code validation problem faced by a Fortune 100 enterprise. The demonstration highlights the heterogeneity, scale and scope of data validation problems, and shows how these problems can be tackled effectively by applying machine learning techniques to Big Data.
Keywords: decision trees; structured large databases; automated validation; big data; MADlib; Greenplum; material codes; bulk validation; data validation; machine learning.
International Journal of Big Data Intelligence, 2016 Vol.3 No.1, pp.38 - 50
Received: 05 Mar 2014
Accepted: 14 Sep 2014
Published online: 29 Dec 2015