Perturbation and candidate analysis to combat overfitting of gene expression microarray data
Online publication date: Sat, 24-Jan-2015
by Ravi Mathur; J. David Schaffer; Walker H. Land
International Journal of Computational Biology and Drug Design (IJCBDD), Vol. 4, No. 4, 2011
Abstract: Analysis of gene expression microarray datasets carries a high risk of over-fitting (finding spurious patterns) because the datasets are feature-rich but case-poor. This paper describes our ongoing efforts to develop a method that combats over-fitting and determines the strongest signal in the dataset. A GA-SVM hybrid, combined with Gaussian noise (with a manually set noise gain), is used to discover feature sets of minimal size that accurately classify the cases under cross-validation. Initial results on a colorectal cancer dataset show that the strongest signal (a modest number of candidates) can be found by a binary search.
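The perturbation idea in the abstract can be illustrated with a small sketch. This is not the authors' GA-SVM method: it omits the genetic algorithm entirely, swaps in a nearest-centroid classifier for the SVM so that it runs on the Python standard library alone, and uses synthetic data rather than the colorectal cancer dataset. All function names (`perturb`, `loo_accuracy`, `make_toy_data`) and all parameter values, including the noise gain of 0.5, are assumptions chosen for illustration. It shows only the general pattern of adding zero-mean Gaussian noise with a manually chosen gain and scoring a small feature set under leave-one-out cross-validation.

```python
import random


def perturb(X, gain, rng):
    """Return a copy of X with zero-mean Gaussian noise (std = gain) added to every value."""
    return [[v + rng.gauss(0.0, gain) for v in row] for row in X]


def centroid(rows):
    """Column-wise mean of a non-empty list of equal-length rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier
    restricted to the given feature columns (stand-in for an SVM)."""
    correct = 0
    for i in range(len(X)):
        # Hold out case i; fit class centroids on the remaining cases.
        train = [(X[j], y[j]) for j in range(len(X)) if j != i]
        cents = {}
        for label in set(lbl for _, lbl in train):
            rows = [[r[f] for f in features] for r, lbl in train if lbl == label]
            cents[label] = centroid(rows)
        probe = [X[i][f] for f in features]
        pred = min(cents, key=lambda lbl: sq_dist(probe, cents[lbl]))
        correct += pred == y[i]
    return correct / len(X)


def make_toy_data(n_cases=20, n_features=50, seed=0):
    """Synthetic feature-rich, case-poor data (hypothetical):
    feature 0 carries the class signal, all other features are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_cases):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        row[0] = rng.gauss(6.0 * label, 0.5)
        X.append(row)
        y.append(label)
    return X, y


X, y = make_toy_data()
rng = random.Random(1)

# Score the candidate feature set {0} on clean data and on perturbed data.
acc_clean = loo_accuracy(X, y, features=[0])
acc_noisy = loo_accuracy(perturb(X, gain=0.5, rng=rng), y, features=[0])
print(f"clean: {acc_clean:.2f}  perturbed: {acc_noisy:.2f}")
```

In a sketch like this, a feature set whose accuracy survives increasing noise gain is a plausible candidate for a real signal, while one whose accuracy collapses under mild perturbation is more likely to reflect over-fitting.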