Title: Dynamic visual data mining: biological sequence analysis and annotation using SeqVISTA
Author: Tianhua Niu, Zhenjun Hu
Division of Preventive Medicine, Department of Medicine, Brigham and Women's Hospital, 900 Commonwealth Ave. East, Boston, MA 02215, USA.
Center for Advanced Genomic Technology, Bioinformatics Program, Boston University, 44 Cummington St., Boston, MA 02215, USA
Abstract: In the post-genomic era, public sequence databases are growing exponentially, and visualisation-centric techniques have become increasingly important in biological sequence analysis and annotation. In this paper, we present a methodology called dynamic visual data mining (DVDM), which combines biological object modelling, interactive display, and data analysis tools in one integrated platform. Based on DVDM, we have developed SeqVISTA, an object-oriented software package written with Java Development Kit v1.4. To illustrate the application of SeqVISTA, we show the following examples: regular expression pattern matching; comparative analysis of alternative exon splicing patterns; Fourier analyses; and exon prediction (MZEF and GENSCAN). Overall, we argue that DVDM is an important technique for helping biologists uncover the information hidden in large genomic and proteomic databases, and that SeqVISTA provides a versatile tool that integrates multiple computational algorithms to meet biologists' data mining needs.
Keywords: visualisation; dynamic visual data mining; GenBank; SWISS-PROT; biological sequence analysis; annotation; biological object modelling; interactive display; data analysis; object-oriented software; DNA sequence; protein sequence; computational biology; bioinformatics.
Int. J. of Bioinformatics Research and Applications, 2005 Vol.1, No.1, pp.18 - 30
Available online: 21 Apr 2005