Nucleic Acids Research Advance Access originally published online on May 8, 2009
Nucleic Acids Research 2009 37(11):e80; doi:10.1093/nar/gkp319
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Methods Online
Sim4cc: a cross-species spliced alignment program
¹Department of Computer Science, George Washington University, Washington, DC 20052 and ²Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA
*To whom correspondence should be addressed. Tel: +1 301 405 9901; Fax: +1 301 314 1341; Email: florea{at}umiacs.umd.edu
Received January 6, 2009. Revised March 24, 2009. Accepted April 20, 2009.
Advances in sequencing technologies have accelerated the sequencing of new genomes, far outpacing the generation of the gene and protein resources needed to annotate them. Directly comparing and aligning existing cDNA sequences from a related species is an effective and readily available means of determining genes in the new genomes. Current spliced alignment programs are inadequate for comparing sequences between different species, owing to their low sensitivity and splice junction accuracy. A new spliced alignment tool, sim4cc, overcomes the problems of earlier tools by incorporating three new features: universal spaced seeds, which increase sensitivity and allow comparisons between species at various evolutionary distances; powerful splice signal models; and evolutionarily-aware alignment techniques, the latter two improving the accuracy of gene models. When tested on vertebrate comparisons at diverse evolutionary distances, sim4cc had significantly higher sensitivity than existing alignment programs, more than 10% higher than the closest competitor for some comparisons, while remaining comparable in speed to its predecessor, sim4. Sim4cc can be used in one-to-one or one-to-many comparisons of genomic and cDNA sequences, and can also be effectively incorporated into a high-throughput annotation engine, as demonstrated by the mapping of 64 000 Fagus grandifolia 454 ESTs and unigenes to the poplar genome.
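To illustrate why spaced seeds can raise sensitivity in cross-species comparisons, the following is a minimal Python sketch of spaced-seed hit detection. It is not sim4cc's implementation: the seed pattern `1101011` and the function names `masked_key`, `index_genome` and `seed_hits` are hypothetical and chosen only for illustration. A spaced seed requires matches at the positions marked `1` and tolerates substitutions at the positions marked `0`, so a diverged cDNA window can still anchor to the genome where a contiguous exact match of the same length would fail.

```python
from collections import defaultdict

# Hypothetical seed pattern: '1' = position must match, '0' = "don't care".
# Illustrative only; this is NOT the seed used by sim4cc.
SEED = "1101011"


def masked_key(s, i, seed=SEED):
    """Return the characters of s[i:i+len(seed)] at the seed's '1' positions."""
    return "".join(s[i + k] for k, bit in enumerate(seed) if bit == "1")


def index_genome(genome, seed=SEED):
    """Map each masked key to the genome offsets where it occurs."""
    index = defaultdict(list)
    for i in range(len(genome) - len(seed) + 1):
        index[masked_key(genome, i, seed)].append(i)
    return index


def seed_hits(cdna, genome, seed=SEED):
    """Return (cdna_pos, genome_pos) pairs whose windows agree at all '1' positions."""
    index = index_genome(genome, seed)
    hits = []
    for j in range(len(cdna) - len(seed) + 1):
        for i in index.get(masked_key(cdna, j, seed), []):
            hits.append((j, i))
    return hits


if __name__ == "__main__":
    # Toy sequences (made up): the genome window at offset 3 differs from the
    # cDNA at exactly the two "don't care" positions of the seed.
    cdna = "ACGTACG"
    genome = "GGGACTTCCGAAA"
    print(seed_hits(cdna, genome))
    print(cdna in genome)  # a contiguous exact match would miss this hit
```

In this toy case the two substitutions fall on the seed's `0` positions, so the spaced seed still reports a hit that an exact 7-mer lookup would miss. A real aligner would then extend and chain such hits into exons, which this sketch does not attempt.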
Present addresses: Leming Zhou, Department of Health Information Management, University of Pittsburgh, Pittsburgh, PA 15260, USA.
Liliana Florea, Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD 20742, USA.