Title: Two stages weighted sampling strategy for detecting the relation between gene expression and disease

Authors: Chih-Chung Yang; Wen-Shin Lin; Chien-Pang Lee; Yungho Leu

Addresses: Department of Information Management, National Taiwan University of Science and Technology, No. 43, Section 4, Keelung Road, Taipei 106, Taiwan; Department of Plant Industry, National Pingtung University of Science and Technology, No. 1, Shuefu Road, Neipu, Pingtung 912, Taiwan; Department of Information Management, Da-Yeh University, No. 168, University Rd., Dacun, Changhua 515, Taiwan; Department of Information Management, National Taiwan University of Science and Technology, No. 43, Section 4, Keelung Road, Taipei 106, Taiwan

Abstract: Most microarray data analyses focus on selecting relevant genes and calculating the classification accuracy achieved with the selected genes. This paper aims to detect the relation between gene expression levels and the classes of a cancer (or a disease) to assist researchers in initial diagnosis. The proposed method is called the Two Stages Weighted Sampling strategy (TSWS strategy). According to the results, the TSWS strategy performs better than other existing methods in terms of classification accuracy and the number of selected relevant genes. Furthermore, the TSWS strategy can also be used to understand and detect the relation between gene expression levels and the classes of a cancer (or a disease).

Keywords: microarray data; gene expression data; gene selection; boxplot; weighted sampling; disease classification; cancer classification; bioinformatics; initial diagnosis; classification accuracy.

DOI: 10.1504/IJDMB.2015.069417

International Journal of Data Mining and Bioinformatics, 2015 Vol. 12 No. 2, pp. 207-223

Received: 21 Feb 2013
Accepted: 19 Jul 2013

Published online: 15 May 2015