Title: Optimising the calculation of statistical functions

Authors: André Rodrigues; Carla Silva; Paulo Borges; Sérgio Silva; Inês Dutra

Addresses: NLPC Lda., Praça Mouzinho de Albuquerque, 113 – 5º, 4100-359 Porto, Portugal; NLPC Lda., Praça Mouzinho de Albuquerque, 113 – 5º, 4100-359 Porto, Portugal; NLPC Lda., Praça Mouzinho de Albuquerque, 113 – 5º, 4100-359 Porto, Portugal; NLPC Lda., Praça Mouzinho de Albuquerque, 113 – 5º, 4100-359 Porto, Portugal; Department of Computer Science, CRACS INESC TEC and University of Porto, Rua do Campo Alegre, 1021, 4169-007 Porto, Portugal

Abstract: Statistical data analysis methods are well known for their difficulty in handling large numbers of instances or parameters. In this paper, we study popular statistical functions commonly applied in data analysis and assess their performance as implemented in SPSS, MATLAB, R and our own software, DataIP. Using medium to large datasets, we show that DataIP outperforms SPSS, MATLAB and R by several orders of magnitude. We argue that the design and implementation of these functions need to be rethought to meet today's data challenges.

Keywords: statistical data analysis; statistical functions; performance evaluation; SPSS; MATLAB; optimisation.

DOI: 10.1504/IJBDI.2017.083155

International Journal of Big Data Intelligence, 2017 Vol.4 No.2, pp.123-138

Received: 22 Mar 2016
Accepted: 12 Sep 2016

Published online: 21 Mar 2017
