Title: Discovery of metabolite features for the modelling and analysis of high-resolution NMR spectra

Authors: Hyun-Woo Cho, Seoung Bum Kim, Myong K. Jeong, Youngja Park, Nana Gletsu Miller, Thomas R. Ziegler, Dean P. Jones

Addresses: Department of Industrial and Information Engineering, University of Tennessee, Knoxville, TN 37996, USA; Department of Industrial and Manufacturing Systems Engineering, University of Texas at Arlington, Arlington, TX 76019, USA; Center for Operations Research and Department of Industrial and Systems Engineering, Rutgers, The State University of New Jersey, Piscataway, NJ 08854-8003, USA; Clinical Biomarkers Laboratory, Department of Medicine, Emory University, Atlanta, GA 30322, USA; Department of Surgery, Emory University, Atlanta, GA 30322, USA; Clinical Biomarkers Laboratory, Center for Clinical and Molecular Nutrition, Department of Medicine, Emory University, Atlanta, GA 30322, USA; Clinical Biomarkers Laboratory, Center for Clinical and Molecular Nutrition, Department of Medicine, Emory University, Atlanta, GA 30322, USA

Abstract: This study presents three feature selection methods for identifying the metabolite features in nuclear magnetic resonance (NMR) spectra that help distinguish samples across varying nutritional conditions. Principal component analysis, Fisher discriminant analysis, and Partial Least Squares Discriminant Analysis (PLS-DA) were used to calculate the importance of individual metabolite features in the spectra. Moreover, an Orthogonal Signal Correction (OSC) filter was used to eliminate unnecessary variation in the spectra. We evaluated the presented methods by comparing the classification performance obtained with the features selected by each method. The results showed that the best classification was achieved by an OSC-PLS-DA model.
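
As a rough illustration of what "calculating the importance of individual metabolite features" can look like, the sketch below ranks spectral bins with a univariate Fisher criterion (squared difference of class means divided by the sum of class variances). This is a simplified stand-in and not necessarily the importance measure used in the paper; the function names `fisher_scores` and `top_features`, the two-class restriction, and the toy data are all assumptions made for the example.

```python
from statistics import mean, pvariance


def fisher_scores(spectra, labels):
    """Return one univariate Fisher ratio per spectral bin (two classes only).

    Hypothetical helper for illustration; not the paper's implementation.
    """
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")
    scores = []
    for j in range(len(spectra[0])):
        # Split the intensities of bin j by class label.
        a = [row[j] for row, y in zip(spectra, labels) if y == classes[0]]
        b = [row[j] for row, y in zip(spectra, labels) if y == classes[1]]
        between = (mean(a) - mean(b)) ** 2      # separation of class means
        within = pvariance(a) + pvariance(b)    # spread inside each class
        scores.append(between / within if within > 0 else 0.0)
    return scores


def top_features(scores, k):
    """Indices of the k highest-scoring bins, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


# Toy data (made up): 6 samples x 4 spectral bins, two nutritional conditions.
spectra = [
    [1.0, 5.0, 0.2, 3.0],
    [1.1, 5.2, 0.3, 2.0],
    [0.9, 4.8, 0.1, 4.0],
    [1.0, 9.0, 0.2, 3.1],
    [1.2, 9.1, 0.3, 2.2],
    [0.8, 8.9, 0.1, 3.9],
]
labels = ["A", "A", "A", "B", "B", "B"]

scores = fisher_scores(spectra, labels)
ranked = top_features(scores, 2)
print(ranked)
```

In this toy set only the second bin (index 1) differs clearly between the two conditions, so it receives by far the largest score. A multivariate method such as PLS-DA would instead weigh bins jointly, which a per-bin ratio like this cannot capture.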

Keywords: nuclear magnetic resonance; NMR spectra; feature selection; metabolomics; multivariate statistical analysis; orthogonal signal correction; OSC; data mining; bioinformatics; metabolite features; classification.

DOI: 10.1504/IJDMB.2008.019097

International Journal of Data Mining and Bioinformatics, 2008 Vol.2 No.2, pp.176 - 192

Published online: 28 Jun 2008
