Nucleic Acids Research
The translational signal database, TransTerm, is now a relational database
ABSTRACT
INTRODUCTION
TransTerm-97 is a major departure from previous versions of the database: we have converted it into a relational database (1,2) under the relational database manager PostgreSQL (3). In addition, we are extending the sequence data currently incorporated into TransTerm to include the full sequences of the 5′-untranslated region (UTR), the 3′-UTR and the coding sequence. Figure 1, which shows an idealized mRNA, indicates which additional data are to be incorporated into TransTerm.
[Figure 1: idealized mRNA, indicating the additional data to be incorporated into TransTerm]
Table 1 compares the old flat-file TransTerm with the new relational TransTerm. All the data originally present in TransTerm are available in the new design. Because of the relational model and SQL, however, queries are now much easier to write than before. For example, queries that compare groups of species are greatly simplified.
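As an illustration of the kind of cross-species comparison that a relational design simplifies, the following is a minimal sketch rather than the actual TransTerm-97 schema. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the table names, column names, the `grp` grouping column and all data values are assumptions made for this example only.

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL here; table and column names
# are hypothetical, not the actual TransTerm-97 schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE species (
        species_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        grp        TEXT NOT NULL      -- assumed grouping label, e.g. 'mammal'
    );
    CREATE TABLE cds (
        cds_id     INTEGER PRIMARY KEY,
        species_id INTEGER NOT NULL REFERENCES species(species_id),
        gc3        REAL NOT NULL      -- percent G+C at codon third positions
    );
""")

# Invented illustrative values, not real TransTerm data.
conn.executemany("INSERT INTO species VALUES (?, ?, ?)", [
    (1, "Mus musculus", "mammal"),
    (2, "Homo sapiens", "mammal"),
    (3, "Escherichia coli", "bacterium"),
])
conn.executemany("INSERT INTO cds VALUES (?, ?, ?)", [
    (1, 1, 60.0), (2, 2, 70.0), (3, 3, 50.0), (4, 3, 54.0),
])
conn.commit()


def mean_gc3_by_group(connection):
    """Return {group: mean GC3} using one grouped join over both tables."""
    rows = connection.execute("""
        SELECT s.grp, AVG(c.gc3)
        FROM cds AS c
        JOIN species AS s ON s.species_id = c.species_id
        GROUP BY s.grp
        ORDER BY s.grp
    """).fetchall()
    return dict(rows)


group_means = mean_gc3_by_group(conn)
```

With a flat file, the same comparison would require parsing every record and accumulating the per-group statistics by hand; here the grouping and averaging are expressed in a single SQL statement.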
[Table 1: comparison of the flat-file TransTerm with the relational TransTerm]

DATABASE ORGANIZATION
Figure 2 shows the Entity Relationship Diagram (ERD) for TransTerm-97. The figure shows most of the tables, along with their relationships, using Martin's notation for optionality and cardinality (4). The contents of each table are listed in Table 2. The central table in the database is the CDS table, which contains the sequence contexts around the start and stop codons, plus the coding sequence parameters: length, Nc (effective number of codons), GC3 (percent G+C in codon third position) and CAI (codon adaptation index) (5,6). As indicated in the diagram by the 'crow's feet' and the short slashes, each tuple (record) in the CDS table has exactly one Locus tuple associated with it, while each Locus tuple has one or more CDS tuples associated with it. Similarly, each Locus tuple has one Species tuple related to it, while each Species tuple has one or more related Locus tuples (40 or more for most species).
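The Species, Locus and CDS relationships described above can be sketched with foreign keys. This is a hedged illustration only: the table names come from the text, but every column name, the locus names and the inserted rows are assumptions, and SQLite (via Python's `sqlite3`) is used in place of PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Species (
        species_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL UNIQUE
    );
    -- Each Locus tuple points to exactly one Species tuple.
    CREATE TABLE Locus (
        locus_id   INTEGER PRIMARY KEY,
        species_id INTEGER NOT NULL REFERENCES Species(species_id),
        locus_name TEXT NOT NULL
    );
    -- Each CDS tuple points to exactly one Locus tuple.
    CREATE TABLE CDS (
        cds_id   INTEGER PRIMARY KEY,
        locus_id INTEGER NOT NULL REFERENCES Locus(locus_id),
        length   INTEGER,   -- coding sequence length
        nc       REAL,      -- effective number of codons
        gc3      REAL,      -- percent G+C in codon third position
        cai      REAL       -- codon adaptation index
    );
""")

# Hypothetical rows: one species, two loci, three coding sequences.
conn.execute("INSERT INTO Species VALUES (1, 'Mus musculus')")
conn.executemany("INSERT INTO Locus VALUES (?, ?, ?)",
                 [(1, 1, "LOCUS_A"), (2, 1, "LOCUS_B")])
conn.executemany("INSERT INTO CDS (cds_id, locus_id) VALUES (?, ?)",
                 [(1, 1), (2, 1), (3, 2)])
conn.commit()


def cds_per_species(connection):
    """Walk CDS -> Locus -> Species and count coding sequences per species."""
    rows = connection.execute("""
        SELECT s.name, COUNT(c.cds_id)
        FROM Species AS s
        JOIN Locus AS l ON l.species_id = s.species_id
        JOIN CDS   AS c ON c.locus_id   = l.locus_id
        GROUP BY s.name
    """).fetchall()
    return dict(rows)


def orphan_cds_rejected(connection):
    """Return True if a CDS row without a matching Locus row is refused."""
    try:
        connection.execute("INSERT INTO CDS (cds_id, locus_id) VALUES (99, 999)")
    except sqlite3.IntegrityError:
        return True
    connection.rollback()  # undo the insert if it was unexpectedly accepted
    return False
```

The `NOT NULL` foreign keys capture the "exactly one" side of each relationship. The "one or more" side (every Locus having at least one CDS) is not enforced by a plain foreign key and would need an additional check in any real implementation.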
[Figure 2: Entity Relationship Diagram (ERD) for TransTerm-97]
For some species, the database holds entries from the major species together with a few entries from related strains. The relationship between the tables Strains and Species represents our method of including related strains in the database. We group organisms with identical genus and species as one entry in the Species table, and any additional information is entered in the Strains table. For example, the Mus musculus entry in the Species table comprises all GenBank entries in which the ORGANISM is Mus musculus, Mus musculus castaneus, Mus musculus domesticus, Mus musculus molossinus, Mus musculus musculus or Mus musculus wagneri. An entry is made in the Strains table for each of these additional strains.
To provide a cross-reference back to the original GenBank data files, the table InputFiles has been prepared. It is linked to the table Locus, so each Locus tuple points to the file it came from and to its position in that file. This permits users to 'drill down' and check the original annotation in GenBank.
To answer questions about highly expressed mammalian genes, we have prepared a table of genes previously identified as highly expressed (7). Users may use the table MamHiExpr to select this set of coding sequences for further queries.
We are presently extending the data in TransTerm beyond the start- and stop-codon sequence contexts to include the full 5′-UTR, the full coding sequence and the full 3′-UTR. This is indicated by the dotted lines in Figure 2. We plan to hold these sequences in separate tables because of the 8000-byte limit on tuple size. The new tables will be linked back to the table CDS so that they are readily available for queries.
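The grouping of GenBank ORGANISM names into a Species entry plus Strains entries can be illustrated as follows. This is a sketch under a stated assumption, not the procedure actually used to build TransTerm: it treats the first two words of the ORGANISM string as genus and species and anything after them as the strain, a rule that will not hold for every GenBank organism name. The function names are hypothetical.

```python
def split_organism(organism):
    """Split an ORGANISM string into (species, strain).

    Assumption: the first two words are genus and species; any remaining
    words name the strain. Returns strain=None when there are none.
    """
    words = organism.split()
    if len(words) < 2:
        raise ValueError(f"expected at least genus and species: {organism!r}")
    species = " ".join(words[:2])
    strain = " ".join(words[2:]) or None
    return species, strain


def group_by_species(organisms):
    """Map each species name to the list of distinct strains seen for it."""
    groups = {}
    for organism in organisms:
        species, strain = split_organism(organism)
        strains = groups.setdefault(species, [])
        if strain is not None and strain not in strains:
            strains.append(strain)
    return groups


# ORGANISM values taken from the Mus musculus example in the text.
mouse_groups = group_by_species([
    "Mus musculus",
    "Mus musculus castaneus",
    "Mus musculus domesticus",
    "Mus musculus castaneus",
])
```

Under this rule each key of the returned dictionary would correspond to one Species tuple, and each listed strain to one Strains tuple linked to it.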
[Table 2: contents of each table in TransTerm-97]
A forms-based World Wide Web interface is being completed to give the casual user many of the benefits of the relational format without needing to know SQL. The interface will also be extended to give more advanced users full access to the power of SQL queries. Further information about these developments and about TransTerm-97's data is available on the World Wide Web at http://biochem.otago.ac.nz:800/Transterm/homepage.html

REFERENCES
Last modification: 17 Dec 1997
Copyright © Oxford University Press, 1998.



