Nucleic Acids Research Pages 335-337  


The translational signal database, TransTerm, is now a relational database


Mark E. Dalphin*, Chris M. Brown, Peter A. Stockwell, Warren P. Tate

Department of Biochemistry and Centre for Gene Research, University of Otago, PO Box 56, Dunedin, New Zealand

Received October 8, 1997; Accepted October 15, 1997

ABSTRACT

TransTerm-97 contains more than 97 500 non-redundant coding-sequence initiation and termination contexts compiled from GenBank, release 101 (15-June-1997). In addition, several coding sequence parameters are available: coding sequence length, Nc, GC3, and, when it is computable, codon adaptation index (CAI). Codon usage tables and summaries of start and stop codon contexts are also included. The information covers more than 325 species and organelles, including seven complete bacterial genomes and one complete eukaryotic genome. To promote research in translational control of protein synthesis, TransTerm has been converted into a relational database to ease the process of making queries. The relational database manager, PostgreSQL, gives access to the database using SQL (Structured Query Language). A World Wide Web interface using forms is being completed to allow the casual user access to the database. Extensions are planned to include the full 5′-UTR, full coding sequence and 3′-UTR. TransTerm-97 is available on the World Wide Web at: http://biochem.otago.ac.nz:800/Transterm/homepage.html

INTRODUCTION

TransTerm-97 is a major departure from the previous versions of the database as we have converted it into a relational database (1,2) under the relational database manager, PostgreSQL (3). In addition, we are in the process of extending the sequence data currently incorporated into TransTerm, to include the full sequences of the 5′-untranslated region (UTR), 3′-UTR and coding sequence. Figure 1, showing an idealized mRNA, indicates which additional data are to be incorporated into TransTerm.


Figure 1 An idealized mRNA, showing the coding sequence bracketed by a start codon (AUG) and a stop codon (UAG). The current release of TransTerm contains the sequence data: UTR5p, Start, CDS5p, CDS3p, Stop and UTR3p. These sequences do not include the complete UTRs or the complete coding sequence. A version of TransTerm that should be available within the year will include these as new tables.

A comparison of the old flat-file formatted TransTerm with the new relational database TransTerm is presented in Table 1. All the data initially present in TransTerm are available in this new design. However, because of the relational model and SQL, it is now much easier to create queries than in the past. For example, queries which compare groups of species are now greatly simplified.
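As an illustration only, a query comparing groups of species might look like the sketch below. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The table names Species, Locus and CDS come from the paper, but every column name (species_id, grp, gc3 and so on) and every value is an assumption invented for this example, not the actual TransTerm schema or data.

```python
import sqlite3

# In-memory toy database; sqlite3 stands in for PostgreSQL here.
conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified schema (column names are assumptions).
conn.executescript("""
CREATE TABLE Species (species_id INTEGER PRIMARY KEY, name TEXT, grp TEXT);
CREATE TABLE Locus   (locus_id INTEGER PRIMARY KEY,
                      species_id INTEGER NOT NULL REFERENCES Species(species_id));
CREATE TABLE CDS     (cds_id INTEGER PRIMARY KEY,
                      locus_id INTEGER NOT NULL REFERENCES Locus(locus_id),
                      gc3 REAL);
""")

# Toy rows with made-up GC3 values, only to make the query runnable.
conn.executemany("INSERT INTO Species VALUES (?, ?, ?)", [
    (1, "Escherichia coli", "bacteria"),
    (2, "Bacillus subtilis", "bacteria"),
    (3, "Mus musculus", "mammal"),
])
conn.executemany("INSERT INTO Locus VALUES (?, ?)", [(1, 1), (2, 2), (3, 3)])
conn.executemany("INSERT INTO CDS VALUES (?, ?, ?)", [
    (1, 1, 50.0), (2, 1, 60.0), (3, 2, 40.0), (4, 3, 70.0), (5, 3, 80.0),
])

# Compare groups of species: number of coding sequences and mean GC3 per group.
rows = conn.execute("""
    SELECT s.grp, COUNT(*), AVG(c.gc3)
    FROM CDS c
    JOIN Locus l   ON l.locus_id = c.locus_id
    JOIN Species s ON s.species_id = l.species_id
    GROUP BY s.grp
    ORDER BY s.grp
""").fetchall()

for group, n_cds, mean_gc3 in rows:
    print(group, n_cds, mean_gc3)
```

With a flat file, the same comparison would require parsing every record and accumulating the per-group statistics by hand; here the joins and the GROUP BY clause express it in one statement.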


Table 1. A comparison of data available in the old flat-file TransTerm and the new relational database, TransTerm-97. Items which are ‘being added’ to the relational database vary in ‘time to completion’. Some items, such as protein-translation, should be completed long before publication of this paper. Other items, such as the full 3′-UTR, are the subject of a grant we have just received and we expect them to be added during 1998. Consult our WWW site for the latest progress reports.


DATABASE ORGANIZATION

Figure 2 contains the Entity Relationship Diagram (ERD) for TransTerm-97. This figure shows most of the tables present along with their relationships using Martin's notation for optionality and cardinality (4). A list of the contents of each table is given in Table 2. The central table in the database is the CDS table, which contains the sequence contexts around the start and stop codons, plus the coding sequence parameters: length, Nc (effective number of codons), GC3 (percent G+C in codon third position) and CAI (codon adaptation index) (5,6). As indicated in the diagram by the ‘crow’s feet’ and the short slashes, each tuple (record) in the CDS table has exactly one Locus tuple associated with it, while each Locus tuple has one or more CDS tuples associated with it. Similarly, each Locus tuple has exactly one Species tuple related to it, while each Species tuple has one or more related Locus tuples (40 or more for most species).
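Of the coding sequence parameters listed above, GC3 is the simplest to make concrete. The following is a minimal sketch of one way to compute it, assuming an in-frame DNA sequence and counting every codon passed in. Whether TransTerm excludes any codons (for example the start or stop codon) from its GC3 value is not stated here, so the function name and that choice are assumptions.

```python
def gc3_percent(cds: str) -> float:
    """Percent G+C at the third position of each codon.

    Assumes `cds` is an in-frame DNA coding sequence; every codon is
    counted, which may differ from the convention used in TransTerm.
    """
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("expected a non-empty sequence whose length is a multiple of 3")
    third_positions = seq[2::3]  # bases at index 2, 5, 8, ...
    gc_count = sum(base in "GC" for base in third_positions)
    return 100.0 * gc_count / len(third_positions)


# Codons ATG GCC AAA TAA have third bases G, C, A, A.
print(gc3_percent("ATGGCCAAATAA"))
```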


Figure 2 A simplified Entity Relationship Diagram (ERD) for the TransTerm relational database. Martin's notation for cardinality and optionality is used to show the relationships between the tables [open circles indicate an optional relationship, while vertical slashes indicate a required relationship. The ‘crow’s feet’ indicate that many tuples (i.e. many rows) can take part in the relationship, while a single slash indicates that only one tuple (i.e. one row) may take part]. Dotted lines are used to indicate relationships that are not completed. The descriptions of the column contents of these tables are specified in Table 2.

For some organisms, the database contains entries from the major species together with a few entries from related strains. The relationship between the tables Strains and Species represents our method of including related strains in the database. We group organisms with identical genus and species as one entry in the Species table, and any additional information is entered in the Strains table. For example, Mus musculus in the Species table comprises all entries in GenBank in which the ORGANISM is Mus musculus, Mus musculus castaneus, Mus musculus domesticus, Mus musculus molossinus, Mus musculus musculus or Mus musculus wagneri. An entry is made in the Strains table for each of these additional strains.
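The grouping rule described above can be sketched roughly as follows. This is a naive illustration that treats the first two words of a GenBank ORGANISM name as genus and species and anything after them as strain information. The function names are hypothetical, and the rule actually applied when building TransTerm may handle more complicated names differently.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


def split_organism(organism: str) -> Tuple[str, Optional[str]]:
    """Split an ORGANISM name into ('Genus species', strain or None)."""
    parts = organism.split()
    if len(parts) < 2:
        raise ValueError(f"expected at least genus and species: {organism!r}")
    species = " ".join(parts[:2])
    strain = " ".join(parts[2:]) or None  # empty remainder means no strain
    return species, strain


def group_strains(organisms: List[str]) -> Dict[str, List[str]]:
    """Map each species to the distinct strains seen for it, in input order."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name in organisms:
        species, strain = split_organism(name)
        strains = grouped[species]  # creates the species entry even without a strain
        if strain is not None and strain not in strains:
            strains.append(strain)
    return dict(grouped)


print(group_strains([
    "Mus musculus",
    "Mus musculus castaneus",
    "Mus musculus domesticus",
]))
```

In relational terms, each key of the returned dictionary would correspond to one Species tuple and each list element to one Strains tuple.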

To provide a cross-reference back to the original GenBank data files, the table InputFiles has been prepared. This is linked to the table Locus, so each Locus tuple points to the file it is in and where in that file it can be found. This permits users to ‘drill-down’ and check the original annotation in GenBank.

In order to answer questions about highly expressed mammalian genes, we have prepared a table of genes, previously identified as highly expressed (7). Users may use the table MamHiExpr to select this set of coding sequences for further queries.
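A hedged sketch of how such a selection could be written is shown below, again using Python's sqlite3 module in place of PostgreSQL. The table names CDS and MamHiExpr come from the paper. How MamHiExpr actually refers to coding sequences is not described, so the cds_id key, the cai column and all values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical columns: MamHiExpr is assumed to list CDS identifiers.
conn.executescript("""
CREATE TABLE CDS       (cds_id INTEGER PRIMARY KEY, cai REAL);
CREATE TABLE MamHiExpr (cds_id INTEGER NOT NULL REFERENCES CDS(cds_id));
""")

# Toy rows with made-up CAI values.
conn.executemany("INSERT INTO CDS VALUES (?, ?)", [(1, 0.81), (2, 0.35), (3, 0.77)])
conn.executemany("INSERT INTO MamHiExpr VALUES (?)", [(1,), (3,)])

# Restrict a CDS query to the highly expressed set.
high_expr = conn.execute("""
    SELECT cds_id, cai
    FROM CDS
    WHERE cds_id IN (SELECT cds_id FROM MamHiExpr)
    ORDER BY cds_id
""").fetchall()

print(high_expr)
```

The subquery acts as a reusable filter: any further condition on the CDS table can be added to the outer WHERE clause without touching the gene list itself.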

We are presently extending the data in TransTerm to include more than just the start- and stop-codon sequence contexts. We will be including the full 5′-UTR, the full coding sequence and the full 3′-UTR. This is indicated by the dotted lines in Figure 2. We plan to make separate tables for these sequences due to a limit on tuple size of 8000 bytes. These new tables will be linked back to the table CDS so they are readily available for queries.


Table 2 A simplified description of the tables in the TransTerm-97 relational database


A World Wide Web interface based on forms is being completed to allow the casual user many of the benefits of this relational format, without the requirement of knowing the computer language SQL. That forms-based interface will also be extended to permit the more advanced user of our database full access to the power of queries in SQL. Further information about these developments and TransTerm-97's data is available on the World Wide Web at http://biochem.otago.ac.nz:800/Transterm/homepage.html

REFERENCES

1. Brown,C.M., Dalphin,M.E., Stockwell,P.A. and Tate,W.P. (1993) Nucleic Acids Res. 21, 3119-3123.

2. Dalphin,M.E., Brown,C.M., Stockwell,P.A. and Tate,W.P. (1997) Nucleic Acids Res. 25, 246-247.

3. PostgreSQL Homepage. http://www.postgresql.org/ (Oct 1997).

4. Finkelstein,C. (1989) An Introduction to Information Engineering. From Strategic Planning to Information Systems. Addison-Wesley Publishing, Reading, MA.

5. Sharp,P.M. and Li,W.H. (1987) Nucleic Acids Res. 15, 1281-1295.

6. Wright,F. (1990) Gene 87, 23-29.

7. Brown,C.M., Stockwell,P.A., Dalphin,M.E. and Tate,W.P. (1994) Nucleic Acids Res. 22, 3620-3624.


*To whom correspondence should be addressed. Tel: +64 3 479 7841; Fax: +64 3 479 7866; Email: mdalphin@sanger.otago.ac.nz


Last modification: 17 Dec 1997
Copyright © Oxford University Press, 1998.
