Title: Discovering cancer biomarkers: from DNA to communities of genes

Authors: Mohammed Alshalalfa, Tansel Ozyer, Reda Alhajj, Jon Rokne

Addresses: Department of Computer Science, University of Calgary, Calgary, Alberta, Canada; Department of Computer Engineering, TOBB Economics and Technology University, Ankara, Turkey; Department of Computer Science, Global University, Beirut, Lebanon

Abstract: In this paper, we consider genes as actors of a social network, a research area that has not yet received attention in the literature of social network mining and analysis. Even though our research project covers both genes and proteins, we concentrate in this paper on genes. We first describe gene expression data and how gene interactions can be realised as a social network. Then we describe how data mining techniques could reveal important information by identifying disease biomarkers from the social communities of genes. This is possible because genes interact and form communities that are anticipated to have certain effects on the different processes that take place within an organism. Gene communities both contribute to the development of an organism by coding proteins and cause serious diseases. In this paper, we concentrate on genes that act as cancer biomarkers. We apply a multiobjective clustering approach to produce alternative clustering solutions and then derive a matrix that reflects the link between genes based on their common occurrence in the same cluster across the alternative solutions. This matrix leads to the social network of genes, which is then analysed to discover the communities and the central genes within each community. These central genes are studied further as cancer biomarkers. The test results are promising in demonstrating the applicability and effectiveness of the developed mining-based methodology.
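
The pipeline outlined in the abstract (alternative clusterings, then a co-occurrence matrix, then a gene network, then central genes) can be illustrated with a minimal sketch. It is not the authors' implementation. The cluster labels are made-up toy data standing in for the output of the multiobjective clustering step, which is not reproduced here. The function names, the 0.5 link threshold and the use of degree centrality are all assumptions chosen for illustration, since the abstract does not specify how links are thresholded or how centrality is measured.

```python
from itertools import combinations


def co_occurrence_matrix(solutions):
    """Fraction of clustering solutions in which each gene pair shares a cluster.

    solutions[s][g] is the cluster label of gene g in alternative solution s.
    """
    n_genes = len(solutions[0])
    counts = [[0] * n_genes for _ in range(n_genes)]
    for labels in solutions:
        for i, j in combinations(range(n_genes), 2):
            if labels[i] == labels[j]:
                counts[i][j] += 1
                counts[j][i] += 1
    n_solutions = len(solutions)
    return [[c / n_solutions for c in row] for row in counts]


def gene_network(matrix, threshold=0.5):
    """Link two genes when they co-occur in at least `threshold` of the solutions.

    The threshold is a hypothetical parameter, not taken from the paper.
    """
    n = len(matrix)
    return {
        i: {j for j in range(n) if j != i and matrix[i][j] >= threshold}
        for i in range(n)
    }


def degree_centrality(network):
    """Share of the other genes each gene is linked to (an assumed centrality measure)."""
    n = len(network)
    return {gene: len(neighbours) / (n - 1) for gene, neighbours in network.items()}


# Toy data: three alternative clusterings of five genes (indices 0..4).
solutions = [
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 0, 1],
]

matrix = co_occurrence_matrix(solutions)
network = gene_network(matrix, threshold=0.5)
centrality = degree_centrality(network)
most_central = max(centrality, key=centrality.get)
```

On this toy input, gene 2 is linked to genes 0, 1 and 3 and therefore has the highest degree centrality. In the setting the abstract describes, such a gene would be a candidate for further study as a biomarker. Community detection on the resulting network is a separate step that this sketch leaves out.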

Keywords: social communities; social networks; social networking; web based communities; virtual communities; online communities; customer behaviour; data mining; gene expression data; multiobjective optimisation; cancer biomarkers; DNA; deoxyribonucleic acid; social network analysis; gene interactions; organisms; coding proteins; disease; health; clustering; alternative solutions; central genes; common occurrences; networks; virtual organisations; web based organisations; online organisations; open source intelligence; web mining; world wide web; internet.

DOI: 10.1504/IJNVO.2011.037166

International Journal of Networking and Virtual Organisations, 2011 Vol.8 No.1/2, pp.158 - 178

Published online: 30 Nov 2010
