Title: Using multivariate methods to infer knowledge from genomic data

Authors: Liliana López-Kleine; Nicolás Molano; Luis Ospina

Addresses: Statistics Department, Universidad Nacional de Colombia, Colombia; Statistics Department, Universidad Nacional de Colombia, Colombia; Statistics Department, Universidad Nacional de Colombia, Colombia

Abstract: Since the introduction of genome sequencing techniques, several methods for genomic data preprocessing and analysis have been published and applied to answer different biological questions. Multivariate methods, however, have rarely been used to extract knowledge about protein roles. Two of the most informative types of data are gene expression data (microarrays) and phylogenetic profiles, which indicate the presence of genes in other organisms and therefore provide information about their co-evolution. Here we show that these two types of data, analyzed by means of principal component analysis and nonparametric discriminant analysis, provide useful information about protein function and about the participation of proteins in virulence processes.

Keywords: statistical genomics; microarray data; phylogenetic profiles; multivariate statistical analysis; protein function; virulence factors; bioinformatics; gene expression data; principal component analysis; PCA; nonparametric discriminant analysis.

DOI: 10.1504/IJBRA.2013.053607

International Journal of Bioinformatics Research and Applications, 2013 Vol.9 No.3, pp.285-300

Received: 02 Mar 2011
Accepted: 31 Aug 2011

Published online: 06 Sep 2014
