Title: A scalable system for executing and scoring K-means clustering techniques and its impact on applications in agriculture

Authors: Nevena Golubovic; Chandra Krintz; Rich Wolski; Balaji Sethuramasamyraja; Bo Liu

Addresses: Department of Computer Science, University of California, Santa Barbara, 5112 Harold Frank Hall, Santa Barbara, CA 93106, USA; Department of Computer Science, University of California, Santa Barbara, 5112 Harold Frank Hall, Santa Barbara, CA 93106, USA; Department of Computer Science, University of California, Santa Barbara, 5112 Harold Frank Hall, Santa Barbara, CA 93106, USA; Department of Industrial Technology, California State University, Jordan College of Agricultural Sciences and Technology, Fresno, 2255 East Barstow Avenue, M/S IT9, Fresno, CA 93740, USA; BioResource and Agricultural Engineering Department, California Polytechnic State University, 8-106, 1 Grand Ave., San Luis Obispo, CA 93407, USA

Abstract: We present Centaurus, a scalable, open-source service for K-means clustering of correlated, multidimensional data. Centaurus provides users with automatic deployment on public or private cloud resources, model selection (using the Bayesian information criterion), and data visualisation. We apply Centaurus to a real-world agricultural analytics application and compare its results to those of the industry-standard clustering approach. The application uses soil electrical conductivity (EC) measurements, GPS coordinates, and elevation data from a field to produce a 'map' of differing soil zones, so that management can be specialised for each zone. We use Centaurus and these datasets to empirically evaluate the impact of considering multiple K-means variants and large numbers of experiments. We show that Centaurus yields more consistent and useful clusterings than the competing approach for zone-based soil decision-support applications in which a 'hard' decision is required.
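The abstract states that Centaurus scores K-means clusterings with the Bayesian information criterion (BIC) to choose among them. As a minimal sketch, assuming a spherical Gaussian model with a single shared variance (one common way to attach a BIC score to hard K-means assignments; the exact model and formula Centaurus uses are not given here), the following pure-Python code runs Lloyd's algorithm with several random restarts per candidate K and keeps the K with the lowest BIC. The function names `kmeans`, `bic`, and `select_k`, the restart count, and the toy data are hypothetical illustrations, not the Centaurus API.

```python
import math
import random


def _sqdist(p, q):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, rng, max_iter=100):
    """Lloyd's algorithm with initial centroids sampled from the data.

    Returns (labels, centroids, sse), where sse is the within-cluster
    sum of squared distances.
    """
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: _sqdist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(_sqdist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, sse


def bic(points, labels, k, sse):
    """BIC under an assumed spherical Gaussian model with one shared variance.

    Lower is better in this sign convention (-2 log L + p log n).
    """
    n, d = len(points), len(points[0])
    var = max(sse / (n * d), 1e-12)  # MLE of the shared variance; guard log(0)
    counts = [labels.count(j) for j in range(k)]
    log_lik = (-0.5 * n * d * math.log(2 * math.pi * var)
               - sse / (2 * var)
               + sum(c * math.log(c / n) for c in counts if c > 0))
    n_params = k * d + (k - 1) + 1  # centroids, mixing weights, shared variance
    return -2 * log_lik + n_params * math.log(n)


def select_k(points, k_values, restarts=10, seed=0):
    """Return (score, k, labels, centroids) for the lowest-BIC K."""
    rng = random.Random(seed)
    best = None
    for k in k_values:
        # Several random restarts per K; keep the run with the lowest SSE.
        runs = [kmeans(points, k, rng) for _ in range(restarts)]
        labels, centroids, sse = min(runs, key=lambda r: r[2])
        score = bic(points, labels, k, sse)
        if best is None or score < best[0]:
            best = (score, k, labels, centroids)
    return best
```

For example, on three well-separated synthetic 2-D blobs, `select_k(points, range(1, 7))` should favour K = 3. In the soil-zoning setting, each point would instead be a row of features such as EC, GPS coordinates, and elevation; scaling those features to comparable ranges before clustering is an assumption of this sketch, not something the abstract describes.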

Keywords: K-means clustering; cloud computing.

DOI: 10.1504/IJBDI.2019.100883

International Journal of Big Data Intelligence, 2019 Vol.6 No.3/4, pp. 163-175

Received: 27 Feb 2018
Accepted: 16 May 2018

Published online: 19 Jul 2019
