Title: Bi-level clustering of mixed categorical and numerical biomedical data

Authors: Bill Andreopoulos, Aijun An, Xiaogang Wang

Addresses: Department of Computer Science and Engineering, York University, Toronto, Ontario M3J 1P3, Canada; Department of Computer Science and Engineering, York University, Toronto, Ontario M3J 1P3, Canada; Department of Mathematics and Statistics, York University, Toronto, Ontario M3J 1P3, Canada

Abstract: Biomedical data sets often have mixed categorical and numerical types, where the former represent semantic information on the objects and the latter represent experimental results. We present the BILCOM algorithm for "Bi-Level Clustering of Mixed categorical and numerical data types". BILCOM performs a pseudo-Bayesian process, where the prior is categorical clustering. BILCOM partitions biomedical data sets of mixed types, such as hepatitis, thyroid disease and yeast gene expression data with Gene Ontology annotations, more accurately than clustering on either type alone.

Keywords: bi-level clustering; categorical; numerical data; nominal; ordinal; biomedical data sets; Bayesian; data mining; bioinformatics; semantic information; experimental results; hepatitis; thyroid disease; yeast gene expression; gene ontology.

DOI: 10.1504/IJDMB.2006.009920

International Journal of Data Mining and Bioinformatics, 2006 Vol.1 No.1, pp.19 - 56

Published online: 02 Jun 2006
