Nucleic Acids Research, 2003, Vol. 31, No. 8, 2217–2226
© 2003 Oxford University Press
Transcript identification by analysis of short sequence tags: influence of tag length, restriction site and transcript database
Department of Biotechnology, Royal Institute of Technology (KTH), Roslagsvägen 30B, S-106 91 Stockholm, Sweden and ¹Department of Biosciences, Karolinska Institute, Novum, S-141 57 Huddinge, Sweden
*To whom correspondence should be addressed. Tel: +46 8 5537 8347; Fax: +46 8 5537 8481; Email: peru{at}biotech.kth.se
Several gene expression profiling techniques use restriction enzymes to generate short expressed sequence tags. We have studied how the choice of restriction enzyme influences various characteristics of the tags generated in an experiment, and we have investigated several aspects of the in silico transcript identification on which these profiling methods rely. First, analysis of 14 248 mRNA sequences derived from the RefSeq transcript database showed that 1–30% of the sequences lack the recognition site of a given restriction enzyme. Moreover, 1–5% of the transcripts have recognition sites located less than 10 bases from the poly(A) tail. The uniqueness of 10 bp tags lies in the range 90–95% and increases only slightly with longer tags, owing to the existence of closely related transcripts. Furthermore, 3–30% of upstream 10 bp tags are identical to 3' tags, which introduces a risk of misclassification if upstream tags are present in a sample. Second, we found that a sequence length of 16–17 bp, including the recognition site, is sufficient for unique transcript identification by BLAST-based sequence alignment to the UniGene Human non-redundant database. Third, we constructed a tag-to-gene mapping for UniGene and compared it with an existing mapping database. The two mappings agreed to 79–83%, and the selection of representative sequences in the UniGene clusters is the main cause of the disagreement. The results of this study may help improve the interpretation of sequence-based expression studies and the design of hybridization arrays by identifying short tags with high reliability and separating them from tags whose capacity to discriminate between genes is inherently ambiguous. To this end, supplementary information is available in a web companion to this paper at http://biobase.biotech.kth.se/tagseq.
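The tag statistics summarised in the abstract (transcripts lacking a site, sites too close to the poly(A) tail, 3' tag uniqueness, and upstream tags that coincide with 3' tags) can be illustrated with a small Python sketch. It is not the authors' pipeline and rests on several assumptions: the NlaIII site `CATG` (the SAGE anchoring enzyme) stands in for "a given restriction enzyme"; a tag is the `tag_len` bases immediately 3' of the site, excluding the site itself; sequences are written 5'→3' with the poly(A) tail removed; and "uniqueness" is taken to mean the fraction of tagged transcripts whose 3' tag no other transcript shares. The function names `analyse_tags`, `site_positions` and `tag_at` are hypothetical.

```python
from collections import Counter

# Assumption: "CATG" (NlaIII, the SAGE anchoring enzyme) is only an
# illustrative choice; the study compares several restriction enzymes.
DEFAULT_SITE = "CATG"
DEFAULT_TAG_LEN = 10  # bases read 3' of the site, not counting the site itself


def site_positions(seq, site):
    """Start indices of every occurrence of `site` in `seq` (overlaps allowed)."""
    positions = []
    i = seq.find(site)
    while i != -1:
        positions.append(i)
        i = seq.find(site, i + 1)
    return positions


def tag_at(seq, pos, site, tag_len):
    """The `tag_len` bases following the site at `pos`, or None if the
    sequence ends before that many bases are available."""
    start = pos + len(site)
    tag = seq[start:start + tag_len]
    return tag if len(tag) == tag_len else None


def analyse_tags(transcripts, site=DEFAULT_SITE, tag_len=DEFAULT_TAG_LEN):
    """Summarise tag statistics for a {transcript_id: sequence} mapping.

    Hypothetical helper. Sequences are assumed to be 5'->3' without the
    poly(A) tail, so the 3' tag comes from the site nearest the string end.
    """
    no_site, too_close = [], []
    three_prime = {}  # transcript id -> 3' tag
    upstream = []     # tags from every site other than the 3'-most one

    for tid, seq in transcripts.items():
        seq = seq.upper()
        positions = site_positions(seq, site)
        if not positions:
            no_site.append(tid)
            continue
        for pos in positions[:-1]:
            tag = tag_at(seq, pos, site, tag_len)
            if tag is not None:
                upstream.append(tag)
        tag = tag_at(seq, positions[-1], site, tag_len)
        if tag is None:
            # Fewer than tag_len bases between the 3'-most site and the
            # end of the sequence, i.e. the start of the poly(A) tail.
            too_close.append(tid)
        else:
            three_prime[tid] = tag

    # A 3' tag counts as unique if no other transcript yields the same tag.
    counts = Counter(three_prime.values())
    n_unique = sum(1 for tag in three_prime.values() if counts[tag] == 1)
    # Upstream tags equal to some transcript's 3' tag could be misassigned
    # to that transcript if they turn up in a sample.
    three_prime_set = set(three_prime.values())
    n_colliding = sum(1 for tag in upstream if tag in three_prime_set)

    return {
        "no_site": no_site,
        "too_close": too_close,
        "uniqueness": n_unique / len(three_prime) if three_prime else 0.0,
        "upstream_collision": n_colliding / len(upstream) if upstream else 0.0,
    }
```

Calling `analyse_tags` once per candidate recognition site and per value of `tag_len` mirrors the kind of comparison described in the abstract. The sketch uses exact string matching throughout; the BLAST-based identification against UniGene, which tolerates mismatches, is not modelled here.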

