Prediction of Gene Structures from RNA-seq Data Using Dual Decomposition

Abstract

Numerous computational algorithms for predicting protein-coding genes from genomic sequences have been developed, and hidden Markov models (HMMs) have frequently been used to model gene structures. In eukaryotes, gene structures are more complex: genes contain introns, and alternative splicing can produce several transcript isoforms from one gene, which makes gene prediction much harder. We develop a novel gene prediction method for eukaryotic genomes that extends the traditional HMM-based gene prediction model by incorporating comprehensive transcript evidence obtained with RNA sequencing (RNA-seq) technology. We formulate gene prediction as an integer programming problem and solve it with the dual decomposition technique. To confirm the utility of the proposed algorithm, we conducted computational experiments on benchmark datasets. The results show that our algorithm efficiently and effectively employs RNA-seq data in gene structure prediction.
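
To illustrate the general idea of dual decomposition mentioned above, the following is a minimal sketch on a toy labeling problem, not the formulation used in the paper. It assumes a two-state chain model (0 = non-coding, 1 = coding) scored by Viterbi and a separate per-position evidence term standing in for RNA-seq coverage; the two subproblems are tied together by Lagrange multipliers that are updated by subgradient steps. All names (`viterbi`, `dual_decomposition`, `emit`, `trans`, `cov`) and all numeric scores are hypothetical.

```python
def viterbi(emit, trans):
    """Best 0/1 label path for emission scores emit[i][s] and transition scores trans[p][s]."""
    score = [emit[0][0], emit[0][1]]
    back = []
    for i in range(1, len(emit)):
        new, ptr = [], []
        for s in (0, 1):
            # Best predecessor state for state s at position i.
            p = max((0, 1), key=lambda q: score[q] + trans[q][s])
            new.append(score[p] + trans[p][s] + emit[i][s])
            ptr.append(p)
        score = new
        back.append(ptr)
    s = max((0, 1), key=lambda q: score[q])
    best = score[s]
    path = [s]
    for ptr in reversed(back):
        s = ptr[s]
        path.append(s)
    path.reverse()
    return path, best


def hmm_score(y, emit, trans):
    """Score of labeling y under the chain model."""
    total = sum(emit[i][y[i]] for i in range(len(y)))
    return total + sum(trans[y[i - 1]][y[i]] for i in range(1, len(y)))


def evidence_score(z, cov):
    """Score of labeling z under the (assumed) per-position evidence model."""
    return sum(cov[i] * z[i] for i in range(len(z)))


def dual_decomposition(emit, trans, cov, iters=50):
    """Maximize hmm_score(y) + evidence_score(z) subject to y == z."""
    n = len(emit)
    u = [0.0] * n  # Lagrange multipliers for the constraints y[i] == z[i]
    best_labels, best_primal, best_dual = None, float("-inf"), float("inf")
    for t in range(iters):
        # Subproblem 1: chain model with multipliers added to the state-1 emissions.
        shifted = [[e0, e1 + u[i]] for i, (e0, e1) in enumerate(emit)]
        y, v1 = viterbi(shifted, trans)
        # Subproblem 2: evidence model with multipliers subtracted, solved per position.
        z = [1 if cov[i] - u[i] > 0 else 0 for i in range(n)]
        v2 = sum(max(0.0, cov[i] - u[i]) for i in range(n))
        # v1 + v2 is an upper bound on the optimum for any u (weak duality).
        best_dual = min(best_dual, v1 + v2)
        # Either subproblem solution is a feasible labeling; keep the best one seen.
        for cand in (y, z):
            val = hmm_score(cand, emit, trans) + evidence_score(cand, cov)
            if val > best_primal:
                best_labels, best_primal = cand, val
        if y == z:  # agreement: the bound is met, so the labeling is optimal
            break
        step = 1.0 / (t + 1)
        u = [u[i] - step * (y[i] - z[i]) for i in range(n)]  # subgradient step
    return best_labels, best_primal, best_dual


# Hypothetical toy scores for six positions.
emit = [(0.5, -0.5), (0.2, 0.1), (-0.3, 0.4), (-0.2, 0.3), (0.1, 0.0), (0.6, -0.6)]
trans = [[0.2, -0.4], [-0.4, 0.2]]
cov = [-1.0, 0.8, 1.2, 0.9, -0.2, -1.5]
labels, primal, dual = dual_decomposition(emit, trans, cov)
```

In this toy setting the evidence subproblem is separable, so the combined problem could also be solved directly by a single Viterbi pass; the sketch only shows the mechanics of splitting an objective into independently solvable parts and using the dual value as a bound on the best labeling found.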

Publication
IPSJ Transactions on Bioinformatics, 9:1-6