Title: Combining multiple clusterings for protein structure prediction

Authors: C. Okan Sakar; Olcay Kursun; Huseyin Seker; Fikret Gurgen

Addresses: Department of Computer Engineering, Bahcesehir University, Istanbul, Turkey; Department of Computer Engineering, Istanbul University, Istanbul, Turkey; Bio-Health Informatics Research Group, Department of Informatics, De Montfort University, UK; Department of Computer Engineering, Bogazici University, Istanbul, Turkey

Abstract: Computational annotation and prediction of protein structure are very important in the post-genome era due to the existence of many different proteins, most of which are yet to be verified. Mutual-information-based feature selection methods can be used to select minimal yet predictive subsets of features. However, protein features are organised into natural partitions (views). Individual feature selection that ignores these views, dismantles them, and intermixes their variables with those of other views results, at best, in a complex, uninterpretable predictive system for such multi-view datasets. In this paper, instead of selecting a subset of individual features, each view (feature subset) is passed through a clustering step so that it is represented in discrete form by its cluster indices; this makes mutual-information-based methods applicable to view selection. We present our experimental results on a multi-view protein dataset that is used to predict protein structure.
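The view-selection idea described in the abstract can be sketched roughly as follows: cluster each view separately, replace the view by its cluster indices, and score it by the mutual information between those indices and the class labels. This is only an illustrative sketch, not the authors' implementation. The function names (`kmeans`, `mutual_information`, `rank_views`), the use of a single plain k-means run per view (rather than a combination of multiple clusterings), and the choice of `k` are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on a list of equal-length tuples; returns one cluster index per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # assumption: initial centers are k random data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Recompute each center as the mean of its members; empty clusters keep their center.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (nxy / n) * math.log2(nxy * n / (px[x] * py[y]))
        for (x, y), nxy in pxy.items()
    )


def rank_views(views, labels, k):
    """Cluster each view, then score it by MI between its cluster indices and the class labels."""
    scores = {
        name: mutual_information(kmeans(rows, k), labels)
        for name, rows in views.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Here `views` is assumed to be a dictionary mapping a view name to its rows (one tuple of feature values per protein), and `labels` holds one class label per protein. Views at the top of the returned ranking are the ones whose cluster structure is most informative about the class, so selection operates on whole views instead of on individual features.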

Keywords: cluster ensembles; protein structure prediction; view selection; robust clustering; mutual information; bioinformatics; feature selection.

DOI: 10.1504/IJDMB.2014.064012

International Journal of Data Mining and Bioinformatics, 2014 Vol.10 No.2, pp.162 - 174

Accepted: 02 May 2012
Published online: 21 Oct 2014
