Title: Identification of Intrinsically Unstructured Proteins using hierarchical classifier

Authors: Jack Y. Yang, Mary Qu Yang

Addresses: Department of Radiation Oncology, Massachusetts General Hospital and Harvard Medical School, Harvard University, Boston, Massachusetts 02114, USA; National Human Genome Research Institute, National Institutes of Health, US Department of Health and Human Services, Bethesda, MD 20852, USA

Abstract: It has been suggested that a protein functions only when folded into a particular 3-D structure. Recently, however, many protein regions and some entire proteins have been identified that lack a definite tertiary structure and instead exist as dynamic, disordered ensembles under different physicochemical conditions. These proteins and regions are known as Intrinsically Unstructured Proteins and regions (IUP). We constructed a classifier based on a Recursive Maximum Contrast Tree (RMCT) to identify IUP. External evaluators benchmarked the classifier against the industry-standard predictor PONDR VLXT on out-of-sample data. The resulting IUP predictor is a viable alternative software tool for identifying intrinsically unstructured regions and proteins.

Keywords: intrinsically unstructured regions; intrinsically unstructured proteins; recursive maximum contrast tree; RMCT; IUP identification; machine learning; classification; data mining; bioinformatics.

DOI: 10.1504/IJDMB.2008.019093

International Journal of Data Mining and Bioinformatics, 2008, Vol. 2, No. 2, pp. 121-133

Published online: 28 Jun 2008
