Published online 4 February 2004
Nucleic Acids Research, 2004, Vol. 32, No. 2 776-783
© 2004 Oxford University Press
Gene structure conservation aids similarity based gene prediction
The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SA, UK
*To whom correspondence should be addressed at Oxford Centre for Gene Function, South Parks Road, Oxford OX1 3QB, UK. Tel: +44 1865 285365; Fax: +44 1865 285384; Email: meyer{at}stats.ox.ac.uk
One of the primary tasks in deciphering the functional contents of a newly sequenced genome is the identification of its protein-coding genes. Existing computational methods for gene prediction include ab initio methods, which use the DNA sequence itself as the only source of information; comparative methods, which use multiple genomic sequences; and similarity-based methods, which employ the cDNA or protein sequences of related genes to aid the gene prediction. We present here an algorithm, implemented in a computer program called Projector, which combines the comparative and similarity-based approaches. Projector employs similarity information at the genomic DNA level by directly using known genes annotated on one DNA sequence to predict the corresponding related genes on another DNA sequence. It therefore makes explicit use of the conservation of the exon-intron structure between two related genes, in addition to the similarity of their encoded amino acid sequences. We evaluate the performance of Projector by comparing it with the program Genewise on a test set of 491 pairs of independently confirmed mouse and human genes. Projector is more accurate than Genewise for genes whose proteins are <80% identical, and is suitable for use in a combined gene prediction system in which other methods identify well-conserved and non-conserved genes, and pseudogenes.
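The 80% figure above refers to the identity between the two encoded proteins. As a rough illustration only, and not code from Projector or Genewise, the sketch below computes percent identity from a pairwise protein alignment. The function names, the `-` gap character, and the choice to ignore gapped columns are assumptions made for this example; the paper may define identity differently.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over columns where neither sequence has a gap.

    Assumption: both inputs are rows of the same pairwise alignment,
    so they have equal length and use `gap` for insertions/deletions.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0   # columns with a residue in both sequences
    identical = 0  # compared columns where the residues match
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1

    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


def below_identity_threshold(aligned_a: str, aligned_b: str,
                             threshold: float = 80.0) -> bool:
    """True if the protein pair falls under the (hypothetical) routing threshold."""
    return percent_identity(aligned_a, aligned_b) < threshold


if __name__ == "__main__":
    # Toy alignment, not real data: 3 of 5 ungapped columns match.
    print(percent_identity("MKVLA", "MRVLT"))
    print(below_identity_threshold("MKVLA", "MRVLT"))
```

In a combined prediction system of the kind the abstract describes, a check like `below_identity_threshold` could in principle decide which gene pairs are handed to which method, although the abstract does not say how such routing would be done.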



