Transterm: extended search facilities and improved integration with other databases
Department of Biochemistry and Centre for Gene Research, University of Otago, PO Box 56, Dunedin, New Zealand and 1Bioinfotools, PO Box 6129, Dunedin, New Zealand
*To whom correspondence should be addressed. Email: chris.brown{at}otago.ac.nz
Received September 16, 2005. Revised October 31, 2005. Accepted October 31, 2005.
| ABSTRACT |
|---|
Transterm has now been publicly available for >10 years. Major changes have been made since its last description in this database issue in 2002. The current database provides data for key regions of mRNA sequences, a curated database of mRNA motifs and tools to allow users to investigate their own motifs or mRNA sequences. The key mRNA regions database is derived computationally from Genbank. It contains 3' and 5' flanking regions, the initiation and termination signal context and coding sequence for annotated CDS features from Genbank and RefSeq. The database is non-redundant, enabling summary files and statistics to be prepared for each species. Advances include extended search facilities and the inclusion of RefSeq data. The database may now be searched by BLAST in addition to regular expressions (patterns), allowing users to search for motifs such as known miRNA sequences. The database contains >40 motifs or structural patterns important for translational control. In this release, patterns from UTRsite and Rfam are also incorporated with cross-referencing. Users may search their sequence data with Transterm or user-defined patterns. The system is accessible at http://uther.otago.ac.nz/Transterm.html.
| INTRODUCTION |
|---|
The fate of a large number of mRNAs is determined by motifs or structures encoded within them. These motifs are often located in the 3'-untranslated region (3'-UTR) or 5'-UTR but may also be located in coding regions. Non-coding regions have been the focus of much research, reviewed in (1–3), and are implicated in the regulation of gene expression by microRNAs (4).
| RELEVANT MRNA REGIONS EXTRACTED FROM GENBANK AND REFSEQ |
|---|
The 5'-UTR, CDS and 3'-UTRs were extracted from all CDS entries that have a termination codon in Genbank (5) and were analysed using our previously described methods (6 and references therein). As most CDS do not have known and annotated 3' or 5' ends, we extract 1000 bases before the initiation codon and 3000 bases after the termination codon for sequences from eukaryotic species, and 200 before and 600 after for bacterial sequences. Entries are truncated at the next annotated feature if it overlaps (e.g. the next CDS in bacteria). This results in files that include the 3'- and 5'-UTRs but may extend beyond them. A small proportion of long UTRs will be truncated by this method: our analysis of 17 048 non-redundant human RefSeq mRNAs shows that only 3% were >3000 bases in length. Because of the redundancy in Genbank, the extraction gives a redundant set (e.g. 94 791 human 3'-UTRs). A non-redundant set is then derived (e.g. 33 332 sequences for humans) according to our published methods (6). These non-redundant datasets are analysed by species to give summary files, e.g. the frequency of bases around the termination codon for these 33 332 genes, analysed by several means (*.termnrttmatrix, *.termnrttbit, *.termnrttchi and *.termnrttcvs files; see also Figure 1 legend) (6). As expected, these show a bias toward A and G in the position immediately after the termination codon. Purines in this position have previously been shown to enhance termination (7). These summary files represent the most commonly used codons or initiation and termination contexts for each species.
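The flank lengths and the truncation rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the Transterm extraction code: the function name `flank_windows`, the 0-based half-open coordinate convention, the forward-strand-only handling and the symmetric treatment of a preceding feature are all assumptions made for the example.

```python
from typing import Optional, Tuple

# Flank lengths quoted in the text: (bases before start codon, bases after stop codon).
FLANKS = {"eukaryote": (1000, 3000), "bacteria": (200, 600)}


def flank_windows(
    cds_start: int,
    cds_end: int,
    seq_len: int,
    kind: str,
    prev_feature_end: Optional[int] = None,
    next_feature_start: Optional[int] = None,
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Return ((up_start, up_end), (down_start, down_end)).

    Coordinates are 0-based and half-open on the forward strand (an assumption
    of this sketch). cds_start is the first base of the initiation codon and
    cds_end is one past the last base of the termination codon.
    """
    upstream, downstream = FLANKS[kind]

    # Clip the windows to the ends of the sequence record.
    up_start = max(0, cds_start - upstream)
    down_end = min(seq_len, cds_end + downstream)

    # Truncate at a neighbouring annotated feature that falls inside a window.
    if prev_feature_end is not None:
        up_start = max(up_start, min(prev_feature_end, cds_start))
    if next_feature_start is not None:
        down_end = min(down_end, max(next_feature_start, cds_end))

    return (up_start, cds_start), (cds_end, down_end)


if __name__ == "__main__":
    # Eukaryotic CDS with no neighbouring features.
    print(flank_windows(5000, 6200, 20000, "eukaryote"))
    # Bacterial CDS whose downstream window is cut short by the next CDS.
    print(flank_windows(500, 1400, 5000, "bacteria", next_feature_start=1650))
```

In the second call the 600-base downstream window would end at position 2000, but the hypothetical next CDS starting at 1650 truncates it, which mirrors the bacterial case mentioned in the text.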
| PATTERN/MOTIF DESCRIPTIONS |
|---|
The Transterm database also contains descriptions of experimentally defined motifs from mRNAs. These are derived from the literature or from other databases [UTRdb (8) and Rfam (9)], then reviewed, updated and integrated into the Transterm database. An example of a Transterm motif description is shown in Table 1. The element described promotes read-through of a termination codon, hindering termination in 5% of ribosome passes. The entry contains the pattern, a description of its function, key references and cross-references to other databases (in this case Recode, 13). An interesting feature of this pattern is that it contains a C in the position immediately after the stop codon; a C at this position is both less frequent and less efficient in eukaryotic termination (7). These files represent features important for particular mRNAs.
| ACCESS TO THE DATABASE |
|---|
Processed sequence data and the programs used to make them can be obtained from the website. The interface has been redesigned for this release. Subsets of the database can be searched for putative motifs with regular expressions and matrices, using the program scan_for_matches (10), or with BLAST (11). Subsets may be user-chosen regions of a gene (5'- or 3'-UTR, CDS, translation start and stop context) for specified Genbank divisions or species (patterns only).
User-defined pattern searches can include a wide range of elements including simple sequences, gaps, reverse complemented sequences, palindromes, mismatches, n mismatches in a pattern, range of gap sizes, weight patterns and repeats. The on-line Help Browser that is part of Transterm contains detailed notes under help on Motif patterns (scan-for-matches).
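As a rough illustration of one of these pattern elements, a range of gap sizes between two fixed sequences, the following sketch uses Python's standard `re` module. It deliberately does not use scan_for_matches syntax, and the function name `find_gapped_motif` and the example motif are hypothetical, chosen only to show the idea.

```python
import re
from typing import List, Tuple


def find_gapped_motif(
    seq: str, left: str, right: str, min_gap: int, max_gap: int
) -> List[Tuple[int, int, str]]:
    """Find `left`, then min_gap..max_gap arbitrary bases, then `right`.

    Returns (start, end, matched_text) tuples with 0-based, half-open
    coordinates. Matches are non-overlapping, as produced by re.finditer.
    RNA input is accepted by converting U to T before searching.
    """
    dna = seq.upper().replace("U", "T")
    pattern = re.compile(
        re.escape(left) + "[ACGT]{%d,%d}" % (min_gap, max_gap) + re.escape(right)
    )
    return [(m.start(), m.end(), m.group()) for m in pattern.finditer(dna)]


if __name__ == "__main__":
    # Hypothetical motif: two ATTTA boxes separated by 3 to 6 bases.
    print(find_gapped_motif("GGATTTACCCCATTTAGG", "ATTTA", "ATTTA", 3, 6))
```

Mismatches, palindromes and weight matrices need more than a plain regular expression, which is one reason a dedicated pattern-matching program is used for those elements.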
We have added the facility to search using longer query sequences with BLAST, using empirically altered defaults to make it suitable for finding motifs. This approach will be useful to users with sequences of 50–100 bases that they expect to contain a conserved motif. The motif must have retained at least seven identical bases, but elsewhere in the motif sequence it may have undergone the insertions, deletions and substitutions that are common in UTRs. For such long motifs, regular expression-based algorithms are usually impractical, as they would need to include a high tolerance for mismatches, insertions and deletions, which makes them inefficient.
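The requirement for at least seven identical bases corresponds to the seed word that a BLAST-style search needs before it can extend an alignment. The sketch below illustrates only that idea by listing exact shared words between two sequences; it is not BLAST's implementation, and the function name `shared_seeds` and the example sequences are hypothetical.

```python
from typing import Dict, List, Tuple


def shared_seeds(query: str, subject: str, word_size: int = 7) -> List[Tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where both sequences contain
    the same exact word of length word_size (0-based positions)."""
    query = query.upper()
    subject = subject.upper()

    # Index every word of the query by its start position(s).
    words: Dict[str, List[int]] = {}
    for i in range(len(query) - word_size + 1):
        words.setdefault(query[i:i + word_size], []).append(i)

    # Scan the subject and report positions whose word occurs in the query.
    hits: List[Tuple[int, int]] = []
    for j in range(len(subject) - word_size + 1):
        for i in words.get(subject[j:j + word_size], []):
            hits.append((i, j))
    return hits


if __name__ == "__main__":
    query = "AAACCGGTTAGC"
    subject = "TTTTCCGGTTATTT"
    print(shared_seeds(query, subject, word_size=7))   # one shared 7-base word
    print(shared_seeds(query, subject, word_size=11))  # none at word size 11
```

With a word size of 7 the two example sequences share one seed, whereas at a word size of 11 no seed exists and the similarity would not be found at all.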
The additional BLAST parameters, presented in the 'Other advanced options' section of the BLAST search form, are -W 7 -G 2 -E 1 -q -2 -r 2 -e 100 -S 1. In order, with the default value for blastn in square brackets, these are: W, initial (seed) word size [11]; G, gap opening penalty [5]; E, gap extension penalty [2]; q, nucleotide mismatch score [-3]; r, score for a nucleotide match [1]; e, threshold expectation value for keeping an alignment [10]; and S, strand to search (1 searches only the top strand). These parameters are suitable for matching small motifs, which may contain gaps and substitutions and may occur fairly frequently.
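For users who want to try comparable settings locally, the sketch below assembles a command line for the legacy NCBI `blastall` program, whose option letters match those listed above. How the Transterm server itself invokes BLAST is not described here, so the helper names and the file and database names are hypothetical. A second helper shows the effect of the altered match and mismatch scores on a simple ungapped alignment.

```python
from typing import List

# Motif-oriented options quoted in the text.
MOTIF_OPTIONS = ["-W", "7", "-G", "2", "-E", "1", "-q", "-2",
                 "-r", "2", "-e", "100", "-S", "1"]


def build_blastall_command(query_fasta: str, database: str) -> List[str]:
    """Build (but do not run) a legacy blastall blastn command.

    query_fasta and database are placeholder names for this sketch.
    """
    return ["blastall", "-p", "blastn", "-i", query_fasta, "-d", database] + MOTIF_OPTIONS


def ungapped_score(matches: int, mismatches: int,
                   match_score: int = 2, mismatch_score: int = -2) -> int:
    """Raw score of an ungapped alignment under the given scoring scheme."""
    return matches * match_score + mismatches * mismatch_score


if __name__ == "__main__":
    print(" ".join(build_blastall_command("my_motif.fa", "my_utr_db")))
    # A 20-base alignment with 17 matches and 3 mismatches:
    print(ungapped_score(17, 3))          # motif settings (+2 / -2)
    print(ungapped_score(17, 3, 1, -3))   # blastn defaults (+1 / -3)
```

Under the motif settings the example alignment scores 28, against 8 under the blastn defaults, which illustrates why the altered scores are more tolerant of substitutions within a short motif.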
| COMPARISON WITH OTHER TRANSLATIONAL CONTROL DATABASES |
|---|
Databases of mRNA sequences
Transterm sequence files are provided for all CDS sequences in Genbank, making it the most comprehensive of the available UTR databases. UTRdb and UTRsite focus on those eukaryotic UTRs that are well annotated in the sequence databases (e.g. complete mRNAs rather than genomic sequences).
Databases that include translational control elements
Several specialized databases that include translational control elements are available and are referenced on our website. Examples include ARED, a database of putative AU-rich element-containing mRNAs (12), the Recode database of recoding data (13) and the Rfam database of RNA families (9). Elements and motifs that are described in these databases and are relevant to mRNA biology have been included in Transterm where it was possible to create an accurate pattern file, and they complement the Transterm data.
Alternative approaches to identifying regulatory motifs in mRNAs include phylogenetic footprinting (14). The Ancient Conserved UnTranslated Sequence (ACUTS) database is available, but has not been recently updated. However, it contains descriptions of several hundred phylogenetically conserved elements in 3'- and 5'-UTRs (14). On the Transterm website access is also provided to search the conserved 5'- and 3'-UTRs from ACUTS.
| FURTHER INFORMATION |
|---|
Extensive help is available on the website. This includes an outline of approaches to finding motifs in mRNAs that may affect gene expression and links to other resources that facilitate such investigations.
| ACKNOWLEDGEMENTS |
|---|
The work was supported by a NZ Marsden fund grant to C.M.B., and NZ Health Research Council grant to W.P.T., Elisabeth Poole and C.M.B. Funding to pay the Open Access publication charges for this article was provided by the Health Research Council of New Zealand.
Conflict of interest statement. None declared.
| REFERENCES |
|---|
1. Mazumder, B., Seshadri, V., Fox, P.L. (2003) Translational control by the 3'-UTR: the ends specify the means. Trends Biochem. Sci., 28, 91–98.
2. Waggoner, S.A. and Liebhaber, S.A. (2003) Regulation of alpha-globin mRNA stability. Exp. Biol. Med., 228, 387–395.
3. Kuersten, S. and Goodwin, E.B. (2003) The power of the 3' UTR: translational control and development. Nat. Rev. Genet., 4, 626–637.
4. Pasquinelli, A.E. (2002) MicroRNAs: deviants no longer. Trends Genet., 18, 171–173.
5. Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Wheeler, D.L. (2003) GenBank. Nucleic Acids Res., 31, 23–27.
6. Jacobs, G.H., Rackham, O., Stockwell, P.A., Tate, W., Brown, C.M. (2002) Transterm: a database of mRNAs and translational control elements. Nucleic Acids Res., 30, 310–311.
7. McCaughan, K.K., Brown, C.M., Dalphin, M.E., Berry, M.J., Tate, W.P. (1995) Translational termination efficiency in mammals is influenced by the base following the stop codon. Proc. Natl Acad. Sci. USA, 92, 5431–5435.
8. Pesole, G., Liuni, S., Grillo, G., Licciulli, F., Mignone, F., Gissi, C., Saccone, C. (2002) UTRdb and UTRsite: specialized databases of sequences and functional elements of 5' and 3' untranslated regions of eukaryotic mRNAs. Update 2002. Nucleic Acids Res., 30, 335–340.
9. Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A., Eddy, S.R. (2003) Rfam: an RNA family database. Nucleic Acids Res., 31, 439–441.
10. Dsouza, M., Larsen, N., Overbeek, R. (1997) Searching for patterns in genomic data. Trends Genet., 13, 497–498.
11. Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
12. Bakheet, T., Frevel, M., Williams, B.R., Greer, W., Khabar, K.S. (2001) ARED: human AU-rich element-containing mRNA database reveals an unexpectedly diverse functional repertoire of encoded proteins. Nucleic Acids Res., 29, 246–254.
13. Baranov, P.V., Gurvich, O.L., Hammer, A.W., Gesteland, R.F., Atkins, J.F. (2003) RECODE 2003. Nucleic Acids Res., 31, 87–89.
14. Duret, L. and Bucher, P. (1997) Searching for regulatory elements in human noncoding sequences. Curr. Opin. Struct. Biol., 7, 399–406.
15. Jacobs, G.H., Stockwell, P.A., Schrieber, M.J., Tate, W.P., Brown, C.M. (2000) Transterm: a database of messenger RNA components and signals. Nucleic Acids Res., 28, 293–295.