Nucleic Acids Research Advance Access originally published online on August 20, 2007
Nucleic Acids Research 2007 35(17):e108; doi:10.1093/nar/gkm495
Nucleic Acids Research, 2007, Vol. 35, No. 17 e108
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Methods Online
Transcriptome annotation using tandem SAGE tags
1Laboratoire d'Informatique, de Robotique et de Microélectronique, UMR 5506 CNRS – Université de Montpellier II, 161 rue Ada, 34392 Montpellier 05, 2Institut de Génétique Humaine, CNRS UPR 1142, 141 rue de la Cardonille, 34396 Montpellier 05, France, 3Helsinki University of Technology, P.O. Box 5400, FI-02015 HUT, Finland and 4Skuld-Tech, 134, rue du Curat – Bat. Amarante, 34090 Montpellier, France
*To whom correspondence should be addressed. Tel: +33 4 67 14 42 36; Fax: +33 4 67 14 42 36; Email: commes{at}univ-montp2.fr
Correspondence may also be addressed to Jacques Marti. Tel: +33 4 67 14 42 41; Email: jmarti{at}univ-montp2.fr
Received April 23, 2007. Revised June 5, 2007. Accepted June 6, 2007.
Analysis of several million expressed gene signatures (tags) has revealed an increasing number of distinct sequences, far exceeding the number of annotated genes in mammalian genomes. Serial analysis of gene expression (SAGE) can reveal new poly(A) RNAs transcribed from previously unrecognized chromosomal regions. However, conventional SAGE tags are too short to unambiguously identify unique sites in large genomes. Here, we design a novel strategy with tags anchored on two different restriction sites of cDNAs. New transcripts are then tentatively defined by the two SAGE tags in tandem and by the spanning sequence read on the genome between these tagged sites. Having developed a new algorithm to locate these tag-delimited genomic sequences (TDGS), we first validated its capacity to recognize known genes and its ability to reveal new transcripts with two SAGE libraries built in parallel from a single RNA sample. Our algorithm proves fast enough to apply this strategy on a large scale. We then collected and processed the complete sets of human SAGE tags to predict as yet unknown transcripts. A cross-validation with tiling array data shows that 47% of these TDGS overlap transcriptionally active regions. Our method provides a new and complementary approach for complex transcriptome annotation.
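To make the notion of a tag-delimited genomic sequence concrete, the following minimal Python sketch pairs occurrences of two tags on a genomic string and reports the sequence they delimit. It is an illustration only and not the algorithm developed in this work: it uses brute-force string search on a single strand, and the function names, the `max_span` parameter and the toy sequences are all hypothetical.

```python
from typing import List, Tuple


def find_all(genome: str, pattern: str) -> List[int]:
    """Return every start index at which pattern occurs in genome."""
    hits = []
    start = genome.find(pattern)
    while start != -1:
        hits.append(start)
        start = genome.find(pattern, start + 1)  # allow overlapping hits
    return hits


def tag_delimited_sequences(
    genome: str, tag_a: str, tag_b: str, max_span: int = 50
) -> List[Tuple[int, int, str]]:
    """Pair occurrences of tag_a and tag_b lying within max_span bases.

    Hypothetical helper: returns (start, end, sequence) for each pair,
    where the sequence runs from the first tag to the end of the second.
    Only the forward strand is searched in this sketch.
    """
    results = []
    hits_b = find_all(genome, tag_b)
    for pos_a in find_all(genome, tag_a):
        for pos_b in hits_b:
            if pos_a == pos_b:
                continue  # same site, nothing is delimited
            start = min(pos_a, pos_b)
            end = max(pos_a + len(tag_a), pos_b + len(tag_b))
            if end - start <= max_span:
                results.append((start, end, genome[start:end]))
    return results


# Toy data (invented): two tags, each starting with a 4-base anchoring site.
toy_genome = "TTTT" + "CATGAAACCC" + "GGGGG" + "GATCTTTGGG" + "AAAA"
for start, end, seq in tag_delimited_sequences(
    toy_genome, "CATGAAACCC", "GATCTTTGGG"
):
    print(start, end, seq)
```

A tag that occurs at many genomic sites yields many candidate pairs, which is why requiring two tags in tandem within a bounded distance reduces ambiguity compared with a single short tag. A genome-scale search would need an indexed data structure rather than repeated `str.find` calls.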