Nucleic Acids Research Advance Access originally published online on March 5, 2009
Nucleic Acids Research 2009 37(7):e52; doi:10.1093/nar/gkp052
Nucleic Acids Research, 2009, Vol. 37, No. 7 e52
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Methods Online
Finding genes in Schistosoma japonicum: annotating novel genomes with help of extrinsic evidence
a Brejová1
Vina
1Department of Computer Science, 2Department of Applied Informatics, Faculty of Mathematics, Physics, and Informatics, Comenius University, Mlynska Dolina, 84248 Bratislava, Slovakia, 3Shanghai-MOST Key Laboratory of Health and Disease Genomics, Chinese National Human Genome Center at Shanghai, Shanghai 201203, 4Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China, 5David R. Cheriton School of Computer Science, University of Waterloo, Waterloo, ON, Canada and 6School of Life Sciences, Fudan University, Shanghai 200433, China
*To whom correspondence should be addressed. Tel: +86 21 50804801; Fax: +86 21 50801922; Email: zhouy{at}chgc.sh.cn
Received September 13, 2008. Revised January 15, 2009. Accepted January 15, 2009.
We have developed a novel method for estimating the parameters of hidden Markov models for gene finding in newly sequenced species. Our approach does not rely on curated training data sets, but instead uses extrinsic evidence (including paired-end ditags, which have not previously been used in gene finding) and iterative training. This new method is particularly suitable for annotating species that are at a large evolutionary distance from the closest annotated species. We have used our approach to produce an initial annotation of more than 16 000 genes in the newly sequenced Schistosoma japonicum draft genome. We established the high quality of our predictions by comparison to full-length cDNAs (withheld from the extrinsic evidence) and to CEGMA core genes. We also evaluated the effectiveness of the new training procedure on the Caenorhabditis elegans genome. ExonHunter and the newest parameter files for the S. japonicum genome are available for download at www.bioinformatics.uwaterloo.ca/downloads/exonhunter
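To make the idea of iterative training without curated data concrete, the following is a minimal sketch of Viterbi training (hard EM) on a toy two-state hidden Markov model. It is not the ExonHunter procedure: the two states, the function names, the pseudocount and the starting parameters are all assumptions chosen for illustration, and extrinsic evidence is left out entirely. The loop decodes the sequence with the current parameters, re-estimates the parameters from that decoding, and stops when the decoding no longer changes.

```python
import math

# Hypothetical toy model: two states only. A real gene finder uses many
# more states (exons, introns, splice sites, and so on).
STATES = ("noncoding", "coding")
ALPHABET = "ACGT"


def viterbi(seq, init, trans, emit):
    """Return the most probable state path for seq (computed in log space).

    All probabilities passed in must be strictly positive.
    """
    n, k = len(seq), len(STATES)
    score = [[-math.inf] * k for _ in range(n)]
    back = [[0] * k for _ in range(n)]
    for s in range(k):
        score[0][s] = math.log(init[s]) + math.log(emit[s][seq[0]])
    for i in range(1, n):
        for s in range(k):
            prev = max(range(k),
                       key=lambda p: score[i - 1][p] + math.log(trans[p][s]))
            score[i][s] = (score[i - 1][prev] + math.log(trans[prev][s])
                           + math.log(emit[s][seq[i]]))
            back[i][s] = prev
    # Trace back from the best final state.
    state = max(range(k), key=lambda s: score[n - 1][s])
    path = [state]
    for i in range(n - 1, 0, -1):
        state = back[i][state]
        path.append(state)
    path.reverse()
    return path


def reestimate(seq, path, pseudo=1.0):
    """Re-estimate transitions and emissions by counting along a path.

    The pseudocount (an assumed value) keeps every probability positive.
    """
    k = len(STATES)
    trans_counts = [[pseudo] * k for _ in range(k)]
    emit_counts = [{c: pseudo for c in ALPHABET} for _ in range(k)]
    for i, (c, s) in enumerate(zip(seq, path)):
        emit_counts[s][c] += 1
        if i > 0:
            trans_counts[path[i - 1]][s] += 1
    trans = [[x / sum(row) for x in row] for row in trans_counts]
    emit = [{c: v / sum(d.values()) for c, v in d.items()}
            for d in emit_counts]
    return trans, emit


def iterative_training(seq, init, trans, emit, max_iters=20):
    """Alternate decoding and re-estimation until the path stops changing."""
    path = None
    for _ in range(max_iters):
        new_path = viterbi(seq, init, trans, emit)
        if new_path == path:
            break
        path = new_path
        trans, emit = reestimate(seq, path)
    return trans, emit, path
```

A call such as `iterative_training(seq, [0.5, 0.5], trans0, emit0)` takes a rough initial guess for `trans0` and `emit0` and returns refined parameters together with the final decoding. In the setting the abstract describes, extrinsic evidence would additionally inform the process; this sketch does not model how.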
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.