Nucleic Acids Research, 2000, Vol. 28, No. 1 372-373
© 2000 Oxford University Press
Update of KEYnet: a gene and protein names database for biosequences functional organisation
Area di Ricerca, CNR, 70126 Bari, Italy and 1Department of Biochemistry and Molecular Biology, Faculty of Sciences, University of Bari, 70126 Bari, Italy
Received October 4, 1999; Revised and Accepted October 13, 1999.
| ABSTRACT |
|---|
KEYnet is a database where gene and protein names are hierarchically structured. Particular care has been devoted to the search and organisation of synonyms. The structuring is based on biological criteria in order to assist the user in data search and to minimise the risk of information loss. Links to the EMBL data library by the entry name and the accession number are implemented. KEYnet is available through the WWW at the following site: http://www.ba.cnr.it/keynet.html
| INTRODUCTION |
|---|
The most common interrogation criteria for bio-databases are gene and protein names but, so far, the majority of these names have been incorrectly annotated in the nucleic acid sequence databases, which makes data retrieval inconsistent. To target retrieval properly with such criteria, gene and protein names need to be correctly coded. Here we present the KEYnet database (1,2), in which gene and protein names are organised in a hierarchical structure according to the biological function of the associated sequence. Links among lexical or biological synonyms are implemented.
| DATABASE DESCRIPTION |
|---|
Each entry in the KEYnet database is related to a gene or protein name. The whole database is hierarchically structured according to the scheme previously reported (1,2) and visible at http://bio-www.ba.cnr.it:8000/Tutorials/KEYnet/network.html. In particular, the KEYnet structure is made up of a set of elements (nodes) linked in father-son relationships. At the highest level is the root, which links all the branches of the tree. The most important branches are the nodes Protein, DNA and RNA. Each leaf in the tree is composed of several elements linked by synonymy. Two side branches are implemented: the RAT Gene Names Tree and the Mitochondrial Genome Tree [the Mitochondrion Gene names classification has been structured as a contribution to the MitBASE project (3)]. Gene and protein names are extracted from the EMBL data library (4).
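The tree described above can be pictured with a minimal Python sketch. It is only an illustration of the father-son structure with synonym sets at the leaves, not the KEYnet implementation: the `Node` class, the `find` helper, the intermediate `Enzyme` node and the synonym set shown are all assumptions introduced for this example.

```python
from dataclasses import dataclass, field
from typing import Iterator, Optional


@dataclass
class Node:
    """One element of the hierarchy; `father` is None only for the root."""
    name: str
    synonyms: set[str] = field(default_factory=set)  # names linked by synonymy
    father: Optional["Node"] = field(default=None, repr=False, compare=False)
    sons: list["Node"] = field(default_factory=list)

    def add_son(self, name: str, synonyms: Optional[set[str]] = None) -> "Node":
        """Create a son node, link it to this node and return it."""
        son = Node(name, set(synonyms or ()), father=self)
        self.sons.append(son)
        return son

    def path(self) -> list[str]:
        """Names from the root down to this node."""
        names = []
        node: Optional[Node] = self
        while node is not None:
            names.append(node.name)
            node = node.father
        return names[::-1]

    def walk(self) -> Iterator["Node"]:
        """Depth-first traversal of this node and all its descendants."""
        yield self
        for son in self.sons:
            yield from son.walk()


def find(root: Node, query: str) -> Optional[Node]:
    """Return the first node whose name or synonym matches, ignoring case."""
    q = query.casefold()
    for node in root.walk():
        if q in {n.casefold() for n in {node.name, *node.synonyms}}:
            return node
    return None


# The three main branches named in the text.
root = Node("root")
protein = root.add_son("Protein")
root.add_son("DNA")
root.add_son("RNA")

# Hypothetical intermediate node and illustrative synonyms (assumptions).
enzyme = protein.add_son("Enzyme")
gs = enzyme.add_son("glutamine synthetase", {"glutamate-ammonia ligase", "GS"})

hit = find(root, "gs")
print(hit.path())
```

Keeping the synonyms on the node means that a search on any variant of a name reaches the same position in the hierarchy, which is the property the synonym links are meant to provide.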
Biological information about the associated sequences is extracted from the same primary databases [EMBL data library (4) and GenBank (5)] and from specialised databases such as SWISS-PROT (6), ENZYME (7) or any other suitable database. MEDLINE is also consulted whenever the above-mentioned databases do not contain the information needed to classify a gene or protein name. The KEYnet database is updated at each EMBL data library release, at which time the link between KEYnet and the EMBL data library is established.
One of the major problems encountered during data classification concerns the gene names branch. Gene naming is recognised worldwide as a difficult problem, owing to the freedom with which users assign a name to a gene whenever it is discovered. Several attempts to address this problem are in progress (8,9; see http://www.ebi.ac.uk:7081/docs/nomenclature and http://www.gene.ucl.ac.uk/nomenclature).
We have organised gene names by establishing a starting set of main ancestor keywords relevant to their primary biological functions. At present KEYnet contains 66 219 gene and protein names, as reported in detail in the table at http://bio-www.ba.cnr.it:8000/Tutorials/KEYnet/Table1.html
| KEYnet QUERY SYSTEMS |
|---|
The KEYnet database can be queried through the RETKEY program, written in FORTRAN and C, available on the server of the CNR Research Area of Bari. A slightly different version is KEYnetWWW (http://www.ba.cnr.it/keynet.html), which is more powerful because it can be accessed worldwide and the retrievable information is more complete.
The usage of KEYnetWWW is illustrated by the following examples. Searching the KEYnet database for glutamine synthetase nucleotide sequences (http://bio-www.ba.cnr.it:8000/Tutorials/KEYnet/example1) retrieves 257 entries from release 58 of the EMBL data library. Searching for the same protein starting from the ENZYME database through the SRS (10) retrieval system (http://bio-www.area.ba.cnr.it:8000/Tutorials/KEYnet/example2) gives 148 entries from the same EMBL data library release. The retrieved data have been carefully revised, and the numbers refer to entries that actually relate to nucleotide sequences coding for glutamine synthetase.
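One plausible reason why a synonym-aware search retrieves more entries than a search on a single name can be sketched in Python. The sketch below is a toy illustration under stated assumptions, not the RETKEY or KEYnetWWW code: the entry identifiers are made up, the synonym group is illustrative, and the functions `literal_search` and `synonym_search` are hypothetical names introduced here.

```python
# Illustrative synonym group, as a KEYnet leaf might hold it (assumption).
SYNONYMS = {
    "glutamine synthetase": {
        "glutamine synthetase",
        "glutamate-ammonia ligase",
        "glnA",
    },
}

# Toy annotated entries: (made-up entry ID, name as annotated in the entry).
ENTRIES = [
    ("TOY0001", "glutamine synthetase"),
    ("TOY0002", "Glutamate-ammonia ligase"),
    ("TOY0003", "glnA"),
    ("TOY0004", "glutamate synthase"),  # a different enzyme; must not match
]


def literal_search(name, entries):
    """Return IDs of entries annotated with exactly this name (case-insensitive)."""
    key = name.casefold()
    return [eid for eid, annotated in entries if annotated.casefold() == key]


def synonym_search(name, entries, synonyms):
    """Expand the query to its whole synonym group before matching."""
    key = name.casefold()
    group = {key}  # fall back to the bare query if no group contains it
    for names in synonyms.values():
        folded = {s.casefold() for s in names}
        if key in folded:
            group = folded
            break
    return [eid for eid, annotated in entries if annotated.casefold() in group]


print(literal_search("glutamine synthetase", ENTRIES))
print(synonym_search("glutamine synthetase", ENTRIES, SYNONYMS))
```

In this toy data the literal search finds one entry and the synonym-expanded search finds three, while the unrelated enzyme is excluded in both cases. The real difference between the two retrieval counts reported above may of course have additional causes.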
Users of KEYnet are kindly invited to cite the present article.
| ACKNOWLEDGEMENTS |
|---|
This work has been partially supported by the EU-Biotechnology Programme (Contracts n. BIO4-CT95-0037 and BIO4-CT97-0), by Programma Biotecnologie legge 95/95 (MURST 5%), by MPI (Italy) and by CNR Research Area of Bari (IT).
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +39 080 548 2130; Fax: +39 080 548 4467; Email: marcella@area.ba.cnr.it
| REFERENCES |
|---|
1 Tullo,A., Liuni,S. and Attimonelli,M. (1990) Protein Seq. Data Anal., 3, 327-334.
2 Licciulli,F., Catalano,D., D'Elia,D., Lorusso,V. and Attimonelli,M. (1999) Nucleic Acids Res., 27, 365-367.
3 Attimonelli,M., Altamura,N., Benne,R., Boyen,C., Brennicke,A., Carone,A., Cooper,J.M., D'Elia,D., de Montalvo,A., de Pinto,B., De Robertis,M., Golik,P., Grienenberger,J.M., Knoop,V., Lanave,C., Lazowska,J., Lemagnen,A., Malladi,B.S., Memeo,F., Monnerot,M., Pilbout,S., Schapira,A.H.V., Sloof,P., Slonimski,P., Stevens,K. and Saccone,C. (1999) Nucleic Acids Res., 27, 128-133. Updated article in this issue: Nucleic Acids Res. (2000), 28, 148-152.
4 Stoesser,G., Tuli,M.A., Lopez,R. and Sterk,P. (1999) Nucleic Acids Res., 27, 18-24. Updated article in this issue: Nucleic Acids Res. (2000), 28, 19-23.
5 Benson,D.A., Boguski,M.S., Lipman,D.J., Ostell,J., Ouellette,B.F.F., Rapp,B.A. and Wheeler,D.L. (1999) Nucleic Acids Res., 27, 12-17. Updated article in this issue: Nucleic Acids Res. (2000), 28, 15-18.
6 Bairoch,A. and Apweiler,R. (1999) Nucleic Acids Res., 27, 49-54. Updated article in this issue: Nucleic Acids Res. (2000), 28, 45-48.
7 Bairoch,A. (1999) Nucleic Acids Res., 27, 310-311.
8 Lonsdale,D.M. and Leaver,C.J. (1988) Plant Mol. Biol., 6, 14-21.
9 Hallick,R.B. (1989) Plant Mol. Biol., 7, 266-275.
10 Etzold,T., Ulyanov,A. and Argos,P. (1996) Methods Enzymol., 266, 114-128.