Skip Navigation

This Article
Right arrow Abstract Freely available
Right arrow Print PDF (17K) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (9)
Right arrowRequest Permissions
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Riley, M
Right arrow Articles by Space, D.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Riley, M
Right arrow Articles by Space, D.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 1996 Oxford University Press 40-40

Footnote

Genes and proteins of Escherichia coli (GenProtEc)

Genes and proteins of Escherichia coli (GenProtEc) Monica Riley* and David B. Space

Marine Biology Laboratory, Woods Hole , MA 02540, USA

Received August 18, 1995 ; Revised and Accepted October 16, 1995

ABSTRACT

GenProtEc is a database of Escherichia coli genes and their gene products, classified by type of function and physiological role and with citations to the literature for each. Also present are data on sequence similarities among E.coli proteins with PAM values, percent identity of amino acids, length of alignment and percent aligned. The database is available as a PKZip file by ftp from mbl.edu/pub/ecoli.exe. The program runs under MS-DOS on IBM-compatible machines. GenProtEc can also be accessed through the World Wide Web atURL http://mbl.edu/html/ecoli.html.

GenProtEc (Genes and Proteins of Escherichia coli ) is a dBase style database and associated query program for IBM-compatible PCs which centers around the gene products of E.coli . As of July 1993 the database contained 1717 gene products whose physiological function was known to some degree ( 1 ), some better understood than others. As of Fall 1994 the number had risen to 1894 ( 2 ). Additions continue to be made.

There are two main sets of data, one for genes, gene products and their physiological role, the other identifying E.coli proteins of similar sequence. The gene/gene product database contains the full gene product name, the Enzyme Commision (EC) number for enzymes, the gene name and synonyms, the type of gene product and the category of physiological function of the gene product. Up to three literature references are supplied for each entry.

All gene products have been classified as to type, as either an enzyme, a regulator, RNA, part of the membrane, a member of a transport system, a protein factor, a carrier or a part of the structure of the cell other than membrane. The distribution of 1894 E.coli gene products by type has been summarized ( 3 ). The gene products have also been assigned to one or up to four of 118 categories of physiological function. An early version of the classification of genes and gene products by function is in Riley ( 1 ), a more recent version will be available soon ( 3 ).

One can search this database by gene name or a synonym of the gene name, by gene product string or by physiological category. Complete pick lists are available for each of these. The search can be refined by adding more terms with the logical relationships AND, OR and AND/OR.

Information on sequence similarities within the E.coli genome is also available. When a sequence similarity exists between the amino acid sequence of any chosen gene product and the sequence of another E.coli protein information about the similarity is available. The database contains the results of similarity analyses ( 2 ) that used AllAllDB of the Darwin suite at Zurich ( 4 ), requiring an alignment of at least 100 amino acids and a PAM (accepted point mutations) score ( 5 ) of <250. The 1894 E.coli K-12 chromosomally encoded proteins of known sequence formed 2126 pairs with sequence similarity as defined above. The major characteristics of the 2126 pairs reside in GenProtEc. Again, pick lists are available for the SWISS-PROT mnemonic names. After selecting the starting sequence the names of other E.coli proteins that have a sequence similar to it are shown. Selecting any one, the length of the alignment of the two proteins is shown together with the percent of the protein aligned, the percent identical amino acids and the PAM score. As new sequences are provided by the SWISS-PROT database ( 6 ) additional information on seqence relationships will be incorporated into the database.

The database and associated query program are available at no charge as a self-extracting compressed binary file on the anonymous site pub/ecoli/ecoli.exe through the Marine Biological Laboratory's server hoh.mbl.edu. The user name is `anonymous' and the password is the user's email address. Once downloaded, GenProtEc may be expanded by the command `ecoli'. Following expansion, the query program may be run by entering the command `eco'. The database can be queried directly on the World Wide Web, accessing through the home page at URL http://www.mbl.edu/html/Riley/Monica.html.

Feedback will be gratefully received. Users kindly cite this article.

REFERENCES

1 Riley,M. (1993) Microbiol. Rev., 57, 862-952. MEDLINE Abstract

2 Labedan,B. and Riley,M. (1995) Mol. Biol. Evol., 12, 980-987.

3 Riley,M. and Labedan,B. (1996) In Curtiss,R., Lin,E.C.C., Ingraham,J., Low,K.B., Magasanik,B., Neidhardt,F., Reznikoff,W., Riley,M., Schaechter,M. and Umbarger,H.E. (eds), Escherichia coli and Salmonella. American Society for Microbiology, Washington, DC, in press.

4 Gonnet,G.H., Cohen,M.A. and Benner,S.A. (1992) Science, 256, 1443-1445. MEDLINE Abstract

5 Dayhoff,M.O., Schwartz,R.M. and Orcutt,B.C. (1978) In Dayhoff,M.O.(ed.), Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, Washington, DC, Vol. 5, suppl. 3, pp. 345-358.

6 Bairoch,A. and Boeckman,B. (1993) Nucleic Acids Res., 21, 3093-3096. MEDLINE Abstract


Return

* To whom correspondence should be addressed
Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
H. Yu, R. Jansen, G. Stolovitzky, and M. Gerstein
Total ancestry measure: quantifying the similarity in tree-like classification, with genomic applications
Bioinformatics, August 15, 2007; 23(16): 2163 - 2173.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
D. P. Zimmer, O. Paliy, B. Thomas, P. Gyaneshwar, and S. Kustu
Genome Image Programs: Visualization and Interpretation of Escherichia coli Microarray Experiments
Genetics, August 1, 2004; 167(4): 2111 - 2119.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. H. Serres, S. Goswami, and M. Riley
GenProtEC: an updated and improved analysis of functions of Escherichia coli K-12 proteins
Nucleic Acids Res., January 1, 2004; 32(90001): D300 - 302.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow Print PDF (17K) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (9)
Right arrowRequest Permissions
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Riley, M
Right arrow Articles by Space, D.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Riley, M
Right arrow Articles by Space, D.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?