
Nucleic Acids Research Pages 55-58  



EcoCyc: Encyclopedia of Escherichia coli genes and metabolism

Peter D. Karp*, Monica Riley1, Suzanne M. Paley, Alida Pellegrini-Toole1 and Markus Krummenacker

Pangea Systems Inc., 4040 Campbell Avenue, Menlo Park, CA 94025, USA and 1Marine Biological Laboratory, Woods Hole, MA 02543, USA

Received October 2, 1998; Revised October 8, 1998; Accepted October 14, 1998

ABSTRACT

The EcoCyc database describes the genome and gene products of Escherichia coli, its metabolic and signal-transduction pathways, and its tRNAs. The database describes 4391 genes of E.coli, 695 enzymes encoded by a subset of these genes, 904 metabolic reactions that occur in E.coli, and the organization of these reactions into 129 metabolic pathways. The EcoCyc graphical user interface allows scientists to query and explore the EcoCyc database using visualization tools such as genomic-map browsers and automatic layouts of metabolic pathways. EcoCyc has many references to the primary literature, and is a (qualitative) computational model of E.coli metabolism. EcoCyc is available at URL http://ecocyc.PangeaSystems.com/ecocyc/

INTRODUCTION

The Encyclopedia of Escherichia coli genes and metabolism (EcoCyc) is a database (DB) that describes all known genes of E.coli K-12, the enzymes of small-molecule metabolism that are encoded by these genes, the reactions catalyzed by each enzyme, and the organization of these reactions into metabolic pathways. EcoCyc also describes E.coli signal-transduction pathways, and E.coli tRNAs. EcoCyc can be viewed as an electronic review article because it is a carefully sifted collection of information drawn largely from (and containing 1834 citations to) the primary literature. The EcoCyc graphical user interface (GUI) allows scientists to query, explore and visualize the EcoCyc DB. EcoCyc integrates genomic and functional data to allow scientists to investigate a broad range of questions (1).

EcoCyc is employed for the following tasks by the scientific community. (i) EcoCyc is a resource for analysis of microbial genomes at the level of individual genes and entire pathways. Because the E.coli genome has a high fraction of genes whose functions were determined experimentally, it is an accurate reference for inferring gene function by sequence similarity. The metabolic pathways within EcoCyc have been used to predict the metabolic pathways of Haemophilus influenzae (2) and of Helicobacter pylori (3). (ii) Because of its links to sequence DBs such as Swiss-Prot, EcoCyc can be used to perform function-based retrieval of DNA or protein sequences, for example to prepare datasets for studies of protein structure-function relationships. (iii) Scientists who study the evolution of metabolism can use EcoCyc to search out examples of duplication and divergence of enzymes and pathways. (iv) EcoCyc provides a foundation for performing simulations of the metabolism, although it currently lacks the kinetics data used by most simulation techniques. (v) The DB is used as an aid in teaching biochemistry.

This article describes recent enhancements to EcoCyc and how to access EcoCyc. We request that users of EcoCyc cite this article in publications related to its use.

Two new versions of EcoCyc were released in 1998: version 4.0 (released in April, 1998) and version 4.5 (released in September, 1998).

THE EcoCyc DATA

The EcoCyc data are stored within a frame knowledge representation system (FRS) called Ocelot (4,5). FRSs use an object-oriented data model that organizes information within classes: collections of objects that share similar properties and attributes. Table 1 shows the current size of several EcoCyc classes.


Table 1. The number of objects in EcoCyc version 4.5

For more information on the contents of EcoCyc and the data validation procedures we employ, see ref. 6. The retrieval operations supported by the DB are described in ref. 6 and in the EcoCyc User's Guide at http://ecocyc.PangeaSystems.com/ecocyc/doc/ecocyc-uguide/paper.html. The EcoCyc software architecture is described in ref. 4.

Two-component signal-transduction pathways

Pathways of two-component signal transduction in E.coli were added to EcoCyc version 4.0. The two components, the sensor protein and the response regulator protein, interact to convert an environmental signal (either internal or external) into regulation of gene expression. In EcoCyc version 4.5 we extended the pathway-layout capabilities of EcoCyc to support drawing of signal-transduction pathways, as shown in Figure 1.


Figure 1. The NarX nitrate/nitrite-dependent two-component regulatory system. The diagram shows a number of linked phosphorylation reactions of the NarX, NarL and NarP proteins. The circled minus signs indicate that inhibitors of those reactions are known; clicking on a minus sign will list the inhibitors.

Incorporation of the full E.coli nucleotide sequence

Version 4.0 of EcoCyc was the first to contain the full E.coli genome as determined by the Blattner laboratory (7). The main challenge we faced in incorporating the Blattner-lab data into EcoCyc was how to add new information from the full genomic sequence without losing the unique information about 3030 E.coli genes that was already contained within EcoCyc version 3.7. The EcoCyc gene objects contained information such as gene-name synonyms, unique IDs in use by external DBs that linked to EcoCyc (such as Swiss-Prot), and links to polypeptide objects within EcoCyc that describe the products of these genes.

We therefore proceeded to determine as many correspondences as possible between genes within the Blattner-lab GenBank entry (accession number U00096) and previously existing EcoCyc genes. We did so by matching the names of genes within U00096 against the names of EcoCyc genes; we confirmed those matches by cross-checking the Swiss-Prot IDs included in U00096 against the Swiss-Prot links stored within EcoCyc. This procedure found 2257 matches. For each matched gene we imported from U00096 the starting and ending base-pair positions of the gene within the chromosome, the unique ID (b-number) assigned to the gene by the Blattner lab, and synonyms for the gene name. We also compared the product name assigned in U00096 with that assigned within EcoCyc.
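The matching step can be sketched roughly as follows. This is a minimal illustration, not the procedure actually used by the EcoCyc project: the record layout, the field names and the placeholder records are all hypothetical, and the article does not say how name matches that failed the Swiss-Prot cross-check were handled, so the sketch simply reports them separately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GeneRecord:
    """Hypothetical minimal gene record; field names are illustrative only."""
    gene_id: str                        # b-number (U00096) or internal ID (EcoCyc)
    name: str
    swissprot_id: Optional[str] = None


def match_genes(genbank_genes, ecocyc_genes):
    """Pair genes by name, then cross-check the Swiss-Prot IDs of each pair.

    Returns (confirmed, unconfirmed, unmatched_genbank, unmatched_ecocyc).
    """
    by_name = {g.name.lower(): g for g in ecocyc_genes}
    confirmed, unconfirmed, unmatched_genbank = [], [], []
    matched_ecocyc_ids = set()

    for gb in genbank_genes:
        eco = by_name.get(gb.name.lower())
        if eco is None:
            unmatched_genbank.append(gb)    # candidate for a new gene object
            continue
        matched_ecocyc_ids.add(eco.gene_id)
        ids_agree = (gb.swissprot_id is not None
                     and gb.swissprot_id == eco.swissprot_id)
        (confirmed if ids_agree else unconfirmed).append((gb, eco))

    unmatched_ecocyc = [g for g in ecocyc_genes
                        if g.gene_id not in matched_ecocyc_ids]
    return confirmed, unconfirmed, unmatched_genbank, unmatched_ecocyc


# Placeholder records (not real genes or accession numbers).
genbank = [GeneRecord("b-x1", "geneA", "SP_A"),
           GeneRecord("b-x2", "geneB", "SP_B"),
           GeneRecord("b-x3", "geneC")]
ecocyc = [GeneRecord("E-1", "genea", "SP_A"),
          GeneRecord("E-2", "geneB", "SP_Z"),
          GeneRecord("E-3", "geneD", "SP_D")]
confirmed, unconfirmed, new_from_genbank, ecocyc_only = match_genes(genbank, ecocyc)
```

In this toy input, geneA is a confirmed match, geneB matches by name only, geneC would become a new gene object, and geneD stays in the DB without a chromosomal location.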

A further 1874 genes in U00096 did not match any EcoCyc gene; we therefore created 1874 new gene objects in EcoCyc containing the same imported information, plus the gene product assigned by the Blattner lab. Centisome positions for all genes were recomputed from the base-pair positions in U00096. Conversely, 552 genes within EcoCyc did not match any gene from U00096. These genes were retained within EcoCyc; they represent genes reported in the literature that had not been associated with a particular ORF within the E.coli genome. Over time we have identified many correspondences between those genes and E.coli ORFs, and we have merged the corresponding genes, so that now only 279 of those 552 genes remain without a chromosomal location.
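A centisome position expresses a map location as a percentage of the chromosome length, so the recomputation amounts to one division per gene. The sketch below assumes the 4,639,221 bp length reported for the sequence in ref. 7 and assumes that a single coordinate (for example the gene start) is converted; the article does not state which coordinate EcoCyc uses, and the function name is hypothetical.

```python
# Assumed chromosome length in base pairs (the figure reported in ref. 7);
# verify against the version of the U00096 entry actually in use.
GENOME_LENGTH_BP = 4_639_221


def centisome(position_bp: int, genome_length_bp: int = GENOME_LENGTH_BP) -> float:
    """Convert a 1-based base-pair coordinate to a centisome position (0-100)."""
    if not 1 <= position_bp <= genome_length_bp:
        raise ValueError("position lies outside the chromosome")
    return 100.0 * position_bp / genome_length_bp
```

For example, `centisome(500, 1000)` places a gene starting at position 500 of a 1000 bp toy chromosome at centisome 50.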

Whenever a gene-merging event is performed within EcoCyc, the event is recorded in the resulting gene object. Some current EcoCyc gene objects were derived from multiple merging operations as we determined that genes reported under different names in the literature were in fact one and the same gene. For example, the two history entries shown below for the gene glnU reflect two merging events undergone by this gene. The earlier history entry indicates the merging of a gene in U00096 with an EcoCyc gene whose internal id is EG30028; the resulting gene was later merged with the EcoCyc gene whose id was G791 when we determined that these two objects within EcoCyc described the same gene.

7/10/1998 Merged genes G791/trnA and EG30028/glnU.

10/20/1997 Gene b0670 from Blattner lab Genbank (vM52) entry merged into EcoCyc gene EG30028.
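One way to keep such a history is to append a dated note to the surviving gene object at every merge. The sketch below is an illustration only: the `Gene` class, its slots and the choice of which object survives a merge are assumptions, not a description of how Ocelot stores this information. The two notes are the glnU entries quoted above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Gene:
    """Hypothetical gene object with a synonym set and a merge history."""
    gene_id: str
    name: str
    synonyms: set = field(default_factory=set)
    history: list = field(default_factory=list)   # (date, note) pairs


def merge_genes(kept: Gene, absorbed: Gene, when: date) -> Gene:
    """Fold `absorbed` into `kept` and record the event on `kept`."""
    kept.synonyms |= absorbed.synonyms | {absorbed.name}
    kept.synonyms.discard(kept.name)
    kept.history.extend(absorbed.history)
    note = (f"Merged genes {absorbed.gene_id}/{absorbed.name} "
            f"and {kept.gene_id}/{kept.name}.")
    kept.history.append((when, note))
    return kept


def render_history(gene: Gene) -> list:
    """Format history entries newest first, as month/day/year plus note."""
    events = sorted(gene.history, key=lambda event: event[0], reverse=True)
    return [f"{d.month}/{d.day}/{d.year} {note}" for d, note in events]


glnU = Gene("EG30028", "glnU", history=[
    (date(1997, 10, 20),
     "Gene b0670 from Blattner lab Genbank (vM52) entry "
     "merged into EcoCyc gene EG30028."),
])
merge_genes(glnU, Gene("G791", "trnA"), date(1998, 7, 10))
lines = render_history(glnU)
```

Sorting at display time rather than at insertion keeps the stored list append-only, which makes it easy to carry over the history of the absorbed object as well.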

The nucleotide sequence of E.coli genes is accessible to EcoCyc users in two ways. Within a gene window, the user may click on the button Show Sequence to retrieve the nucleotide sequence of that gene from U00096 (EcoCyc does not currently provide access to non-coding DNA). Alternatively, the user may click on the button Query Genbank to query the NCBI Entrez server for all GenBank entries for E.coli containing a gene with the same name as the current gene.

The evolving annotation of the E.coli genome

We seek to make the annotation of the E.coli genome within EcoCyc as complete and up-to-date as possible based on new functional characterizations of E.coli genes in the literature, and based on an ongoing sequence analysis of the E.coli genome by the EcoCyc project.

EcoCyc carefully distinguishes genes whose functions have been determined experimentally from those whose functions have been determined through sequence analysis. The names of gene products whose functions have been determined through sequence analysis always contain the word 'putative' or, very occasionally, for a high-certainty hit, 'probable'. The level of assurance for each functional assignment is conveyed by the specificity of the name. If an ORF sequence shows similarity to a number of hydrolases, all of which act on sugars, the ORF is identified as, for instance, a 'putative sugar hydrolase'. In other cases, an assignment may be 'putative amidotransferase', 'putative aminotransferase' or 'putative formyl acetyltransferase'. More often a degree of uncertainty is signified by confining the assignment to a general class such as 'putative transferase', or sometimes only to 'an enzyme' if the ORF can be identified as an enzyme but the kind of enzyme cannot be determined. The same gradations are used for other types of gene products such as regulators, transport components and RNAs.

Some assignments of putative function have been made on the basis of similarity among paralogous sequences of proteins within E.coli (8). The advantage of using paralogous groups is that, within one genome, functions are the same or closely related even when sequence similarity within these sets is weak.

In the initial annotation of the E.coli genome published by Blattner et al. in September 1997, 38% of the open reading frames had no attributed function. In EcoCyc 4.5, there are 1400 ORFs with no attributed function (32%), and 939 genes whose attributed function is marked as putative (21%).

Because EcoCyc combines the E.coli genome with experimentally derived information about E.coli gene products, we can assess the degree of correspondence between these two bodies of knowledge. For example, we can write a query to EcoCyc to retrieve all enzymes within EcoCyc for which the gene has not yet been determined (in the case of enzymes known to have multiple subunits, we require that none of the subunits have a gene assigned). That list of enzymes is given at URL http://ecocyc.PangeaSystems.com/ecocyc/enzymes.html , and represents a challenge to both experimentalists and bioinformaticians.
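The query described above can be sketched as a simple filter. The classes and slot names below are hypothetical stand-ins for the EcoCyc schema, which this article does not spell out, and the enzyme and gene names are placeholders; a monomeric enzyme is modelled here as an enzyme with a single subunit.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Polypeptide:
    """Hypothetical subunit record; `gene` is None when no gene is assigned."""
    name: str
    gene: Optional[str] = None


@dataclass
class Enzyme:
    """Hypothetical enzyme record listing its polypeptide subunits."""
    name: str
    subunits: list = field(default_factory=list)


def enzymes_without_genes(enzymes):
    """Return enzymes for which none of the subunits has a gene assigned."""
    return [e for e in enzymes if not any(s.gene for s in e.subunits)]


# Placeholder data (not real E.coli enzymes or genes).
enzymes = [
    Enzyme("enzyme-1", [Polypeptide("enzyme-1 monomer", gene="geneA")]),
    Enzyme("enzyme-2", [Polypeptide("enzyme-2 alpha"),
                        Polypeptide("enzyme-2 beta")]),
    Enzyme("enzyme-3", [Polypeptide("enzyme-3 alpha", gene="geneB"),
                        Polypeptide("enzyme-3 beta")]),
]
orphans = [e.name for e in enzymes_without_genes(enzymes)]
```

Only enzyme-2 is returned here: enzyme-3 has one subunit with a known gene, so it is excluded under the multi-subunit rule stated above.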

EcoCyc taxonomies


Table 2. A taxonomy of genes according to the physiological role of the gene product
Each line in the table indicates a single class. A new level of indentation indicates a subclass of the class above. The numbers in the right hand column indicate the number of genes within each class.

The EcoCyc project has developed several taxonomies for the different types of biological information within EcoCyc. These include the taxonomy of gene products developed by Riley (9), which has been adopted by a number of other genome projects. The most recent version of that taxonomy is shown in Tables 2, 3, 4 and 5. EcoCyc also includes a taxonomy of biochemical pathways, shown in Table 6.


Table 3. A taxonomy of genes continued


Table 4. A taxonomy of genes continued


Table 5. A taxonomy of genes continued

These taxonomies are accessible within EcoCyc for taxonomic querying of EcoCyc objects. For example, the user can easily navigate to the pathways within any of the classes in Table 6, using the query page at URL http://ecocyc.PangeaSystems.com/ecocyc/server.html
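Taxonomic querying of this kind amounts to collecting the members of a class together with the members of all of its subclasses. The sketch below assumes a simple tree representation; the `TaxonClass` structure and the pathway names are hypothetical placeholders, and only the class names 'Degradation' and 'Amino acids, amines' echo the categories of Table 6.

```python
from dataclasses import dataclass, field


@dataclass
class TaxonClass:
    """Hypothetical taxonomy node: direct members plus nested subclasses."""
    name: str
    members: list = field(default_factory=list)
    subclasses: list = field(default_factory=list)


def find_class(root, name):
    """Depth-first search for the class with the given name, or None."""
    if root.name == name:
        return root
    for sub in root.subclasses:
        hit = find_class(sub, name)
        if hit is not None:
            return hit
    return None


def all_members(cls):
    """Members of a class and, recursively, of all of its subclasses."""
    found = list(cls.members)
    for sub in cls.subclasses:
        found.extend(all_members(sub))
    return found


# Placeholder pathway names under two class names taken from Table 6.
taxonomy = TaxonClass("Pathways", subclasses=[
    TaxonClass("Degradation", members=["pathway-1"], subclasses=[
        TaxonClass("Amino acids, amines", members=["pathway-2", "pathway-3"]),
    ]),
])
degradation = all_members(find_class(taxonomy, "Degradation"))
```

Querying the general class returns its own pathways followed by those of each subclass, which mirrors the indentation-based reading of the tables.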

THE EcoCyc GRAPHICAL USER INTERFACE

The EcoCyc GUI provides graphical tools for visualizing and navigating through an integrated metabolic/genomic DB. For each type of biological object in the EcoCyc DB, the GUI provides a corresponding visualization tool that dynamically queries the underlying DB.

Version 4.5 includes a number of extensions to the query operations of the Overview diagram that displays the full metabolic map of E.coli. For example, the Overview can be used to highlight those reaction steps that are activated or inhibited by a specified compound, those steps for which E.coli has multiple isozymes, or those steps that are shared or not shared with other organisms for which a metabolic DB is available [such as the HinCyc DB (2)]. The diagram can also be used to visualize gene-expression data.
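Each of these Overview queries can be thought of as a filter over the reaction steps of the diagram. The following sketch illustrates that idea under stated assumptions: the `ReactionStep` record, its fields and the function names are hypothetical, and the reaction, enzyme and compound names are placeholders rather than EcoCyc data.

```python
from dataclasses import dataclass, field


@dataclass
class ReactionStep:
    """Hypothetical record for one reaction step on the Overview diagram."""
    reaction: str
    enzymes: list = field(default_factory=list)
    activators: set = field(default_factory=set)
    inhibitors: set = field(default_factory=set)


def regulated_by(steps, compound):
    """Steps that are activated or inhibited by the given compound."""
    return [s.reaction for s in steps
            if compound in s.activators or compound in s.inhibitors]


def with_isozymes(steps):
    """Steps catalyzed by more than one enzyme."""
    return [s.reaction for s in steps if len(s.enzymes) > 1]


def shared_with(steps, other_reactions):
    """Split steps into those present in another organism's DB and the rest."""
    shared = [s.reaction for s in steps if s.reaction in other_reactions]
    not_shared = [s.reaction for s in steps if s.reaction not in other_reactions]
    return shared, not_shared


steps = [
    ReactionStep("rxn-1", enzymes=["enz-a", "enz-b"], inhibitors={"cpd-x"}),
    ReactionStep("rxn-2", enzymes=["enz-c"], activators={"cpd-x"}),
    ReactionStep("rxn-3", enzymes=["enz-d"]),
]
highlight_regulated = regulated_by(steps, "cpd-x")
highlight_isozymes = with_isozymes(steps)
shared, not_shared = shared_with(steps, {"rxn-2", "rxn-3"})
```

The lists returned by each filter are the steps a display layer would highlight; the drawing itself is outside the scope of this sketch.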


Table 6. A taxonomy of pathways
Each line in the table indicates a single class. A new level of indentation indicates a subclass of the class above. For example, degradation of 'Amino acids, amines' is a subclass of the general category of degradation pathways. The numbers in the right hand column indicate the number of individual EcoCyc pathways within each class.

The EcoCyc pathway-layout algorithms have been extended to draw lines that indicate feedback inhibition and activation of an enzyme by substrates of the pathway containing the enzyme.

DISTRIBUTION

EcoCyc is available under license from Pangea Systems in two forms: (i) EcoCyc is accessible online through the WWW (this version supports a subset of the GUI functionality of the X-windows version). (ii) An X-windows version of EcoCyc for the Sun workstation bundles together the EcoCyc GUI and the EcoCyc DB.

Access is free to academic institutions for research use; a fee applies to other forms of use. The EcoCyc WWW pages describe both types of access to EcoCyc; they also provide links to the EcoCyc User's Guide, and to the publications produced by the EcoCyc project. The URL for the EcoCyc home page is http://ecocyc.PangeaSystems.com/ecocyc/

ACKNOWLEDGEMENTS

This work was supported by grant 1-R01-RR07861-01 from the National Center for Research Resources. The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the National Institutes of Health.

REFERENCES

1. Karp,P. and Mavrovouniotis,M. (1994) IEEE Expert, 9, 11-21.

2. Karp,P.D., Ouzounis,C. and Paley,S.M. (1996) In States,D.J., Agarwal,P., Gaasterland,T., Hunter,L. and Smith,R. (eds), Proceedings of the Fourth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp. 116-124.

3. Tomb,J.-F., White,O., Kerlavage,A.R., Clayton,R.A., Sutton,G.G., Fleischmann,R.D., Ketchum,K.A., Klenk,H.P., Gill,S., Dougherty,B.A. et al. (1997) Nature, 388, 539-547.

4. Karp,P. and Paley,S. (1996) J. Computat. Biol., 3, 191-212.

5. Karp,P.D., Chaudhri,V.K. and Paley,S.M. (1999) J. Intelligent Information Syst., in press.

6. Karp,P., Riley,M., Paley,S., Pellegrini-Toole,A. and Krummenacker,M. (1997) Nucleic Acids Res., 25, 43-50.

7. Blattner,F.R., Plunkett,G.,III, Bloch,C.A., Perna,N.T., Burland,V., Riley,M., Collado-Vides,J., Glasner,J.D., Rode,C.K., Mayhew,G.F. et al. (1997) Science, 277, 1453-1462.

8. Riley,M. and Labedan,B. (1997) J. Mol. Biol., 268, 857-868.

9. Riley,M. (1993) Microbiol. Rev., 57, 862-952.


*To whom correspondence should be addressed. Tel: +1 650 614 7066; Fax: +1 650 324 9313; Email: pkarp@pangeasystems.com


Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.
