Title: A comprehensive analysis about the influence of low-level preprocessing techniques on mass spectrometry data for sample classification

Authors: Hugo López-Fernández; Miguel Reboiro-Jato; Daniel Glez-Peña; Florentino Fernández-Riverola

Addresses (all four authors): Department Informática, University of Vigo, Escuela Superior de Ingeniería Informática, Edificio Politécnico, Campus Universitario As Lagoas s/n, 32004, Ourense, Spain

Abstract: Matrix-Assisted Laser Desorption Ionisation Time-of-Flight (MALDI-TOF) is a high-throughput mass spectrometry technology whose data require extensive preprocessing before subsequent analysis. Several low-level preprocessing techniques have been developed for this purpose, covering baseline correction, smoothing, normalisation, peak detection and peak alignment. In this work, we present a systematic comparison of software packages that support this mandatory preprocessing of MALDI-TOF data. To ensure the validity of the study, we test multiple configurations of each preprocessing technique and use the resulting data to train a set of classifiers, whose performance (kappa and accuracy) provides the basis for the final comparison. The experimental results show that the choice of preprocessing has a real impact on classification: MassSpecWavelet provides the best performance, and Support Vector Machines (SVM) are among the most accurate classifiers.
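
The abstract names kappa and accuracy as the performance measures but does not say how they were computed. As a minimal sketch, assuming that "kappa" refers to Cohen's kappa (an assumption, not confirmed by the abstract), both can be derived from true and predicted class labels as follows. The function names and the sample labels are hypothetical and serve only as illustration.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: agreement corrected for agreement expected by chance."""
    n = len(y_true)
    p_observed = accuracy(y_true, y_pred)
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    # Chance agreement: product of the marginal label frequencies, summed
    # over classes (a Counter returns 0 for labels it has not seen).
    p_expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if p_expected == 1.0:
        # Only one class occurs in both lists; kappa is undefined here.
        return float("nan")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for ten spectra (illustrative only, not data from the paper).
y_true = ["healthy"] * 5 + ["disease"] * 5
y_pred = ["healthy"] * 4 + ["disease"] + ["disease"] * 3 + ["healthy"] * 2

acc = accuracy(y_true, y_pred)
kappa = cohen_kappa(y_true, y_pred)
print(f"accuracy={acc:.2f} kappa={kappa:.2f}")
```

Kappa is a useful complement to accuracy because it discounts the agreement a classifier would reach by chance, which matters when class sizes are unbalanced.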

Keywords: mass spectrometry data; data preprocessing; low-level preprocessing; sample classification; model comparison; software comparison; bioinformatics; support vector machines; SVM; classification accuracy.

DOI: 10.1504/IJDMB.2014.064897

International Journal of Data Mining and Bioinformatics, 2014 Vol.10 No.4, pp.455 - 473

Received: 02 Nov 2012
Accepted: 25 Mar 2013

Published online: 21 Oct 2014
