Title: Prediction of DNA-binding residues from sequence information using convolutional neural network

Authors: Jiyun Zhou; Qin Lu; Ruifeng Xu; Lin Gui; Hongpeng Wang

Addresses:
Jiyun Zhou: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Qin Lu: Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Hong Kong
Ruifeng Xu: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China; Shenzhen Engineering Laboratory of Performance Robots at Digital Stage, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China
Lin Gui: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China
Hongpeng Wang: School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, Shenzhen, Guangdong, China

Abstract: Most DNA-binding residue prediction methods overlook motif features, which are important for recognition between proteins and DNA. To use motif features efficiently for prediction, we first propose using a Convolutional Neural Network (CNN), a deep learning model, to extract discriminative motif features. We then propose a neural network classifier, referred to as CNNsite, that combines the extracted motif features with sequence features and evolutionary features. Evaluation on PDNA-62, PDNA-224 and TR-265 shows that motif features perform better than sequence features and evolutionary features. Evaluation on PDNA-62, PDNA-224 and an independent data set shows that CNNsite also outperforms previous methods. We also show that many motif features composed of residues that play important roles in DNA-protein interactions have high discriminative power, which indicates that CNNsite is effective at extracting important motif features for DNA-binding residue prediction.
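The abstract describes CNNsite only at a high level. As a rough illustration of how a convolutional layer can turn a sequence window around a target residue into motif features, the following is a minimal pure-Python sketch. It applies a 1D convolution, ReLU and global max pooling to a one-hot encoded window, concatenates the result with a placeholder PSSM (evolutionary) row, and scores the combined vector with a single sigmoid unit. The window size, one-hot encoding, the toy "RK" filter and all function names (extract_window, one_hot, motif_features, binding_probability) are illustrative assumptions, not the paper's architecture, features or trained parameters.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def extract_window(sequence, center, half_width):
    """Residues around `center`, padded with 'X' beyond the sequence ends (assumed scheme)."""
    return [sequence[i] if 0 <= i < len(sequence) else "X"
            for i in range(center - half_width, center + half_width + 1)]


def one_hot(window):
    """Encode each residue as a 20-dim indicator vector; 'X' padding becomes all zeros."""
    encoded = []
    for residue in window:
        vec = [0.0] * len(AMINO_ACIDS)
        if residue in AMINO_ACIDS:
            vec[AMINO_ACIDS.index(residue)] = 1.0
        encoded.append(vec)
    return encoded


def motif_features(encoded, filters, biases):
    """1D convolution + ReLU + global max pooling: one motif feature per filter."""
    features = []
    for kernel, bias in zip(filters, biases):
        width = len(kernel)
        best = 0.0  # ReLU floor: max(0, max score) equals the max of ReLU(score)
        for start in range(len(encoded) - width + 1):
            score = bias
            for offset in range(width):
                row = encoded[start + offset]
                score += sum(w * x for w, x in zip(kernel[offset], row))
            best = max(best, score)
        features.append(best)
    return features


def binding_probability(features, weights, bias):
    """Single sigmoid unit standing in for the downstream neural network classifier."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Hypothetical hand-set filter that fires on the dipeptide "RK"; a real CNN learns its filters.
    rk_filter = [[1.0 if aa == "R" else 0.0 for aa in AMINO_ACIDS],
                 [1.0 if aa == "K" else 0.0 for aa in AMINO_ACIDS]]
    window = extract_window("MSRKLEQ", center=3, half_width=3)
    motif = motif_features(one_hot(window), [rk_filter], [0.0])
    pssm_row = [0.0] * 20  # placeholder for the target residue's PSSM row
    combined = motif + pssm_row
    print(motif)  # [2.0]
    print(round(binding_probability(combined, [1.0] + [0.0] * 20, -1.0), 3))  # 0.731
```

Global max pooling makes each feature report the strongest match of its filter anywhere in the window, which is one common way to make a motif detector insensitive to the motif's exact position.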

Keywords: DNA; protein; interaction; residue; CNN; motif; sequence; PSSM; evolutionary; binding; neural network.

DOI: 10.1504/IJDMB.2017.084265

International Journal of Data Mining and Bioinformatics, 2017 Vol.17 No.2, pp.132 - 152

Received: 08 Mar 2017
Accepted: 08 Mar 2017

Published online: 22 May 2017
