Title: A methodology for dealing with spatial big data

Authors: Gabriella Schoier; Giuseppe Borruso

Addresses: DEAMS – Department of Economic, Business, Mathematical and Statistical Sciences 'Bruno de Finetti', University of Trieste, Trieste, Italy; DEAMS – Department of Economic, Business, Mathematical and Statistical Sciences 'Bruno de Finetti', University of Trieste, Trieste, Italy

Abstract: Spatial data mining (SDM) is the extraction of knowledge from spatial data. Location-based services have recently made it possible to gather large amounts of geo-referenced data, i.e., spatial big data (SBD). Spatial datasets often exceed what current computing systems can manage with reasonable effort; data-intensive computing and data mining techniques are therefore useful tools for analysing them. In this paper, we present an approach to clustering high-dimensional data that supports flexible statistical modelling of phenomena characterised by unobserved heterogeneity. Numerous clustering algorithms have been developed for large databases; density-based algorithms in particular are used to handle huge amounts of data in large spatial databases. We present the Modified Density-Based Spatial Clustering of Applications with Noise (MDBSCAN) algorithm and compare it with the classical k-means approach. Both algorithms are applied to synthetic datasets and to a dataset of satellite images.

Keywords: spatial data mining; clustering algorithms; arbitrary cluster shape; Lagrange-Chebyshev metrics; efficiency; large spatial databases; handling noise; image analysis; spatial big data; k-means clustering; density-based spatial clustering.

DOI: 10.1504/IJBIDM.2017.082705

International Journal of Business Intelligence and Data Mining, 2017 Vol.12 No.1, pp.1-13

Available online: 03 Mar 2017
