Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
Received September 8, 1998; Revised September 22, 1998; Accepted October 14, 1998
ABSTRACT
Kyoto Encyclopedia of Genes and Genomes (KEGG) is a knowledge base for systematic analysis of gene functions in terms of the networks of genes and molecules. The major component of KEGG is the PATHWAY database, which consists of graphical diagrams of biochemical pathways, including most of the known metabolic pathways and some of the known regulatory pathways. The pathway information is also represented by the ortholog group tables, which summarize orthologous and paralogous gene groups among different organisms. KEGG maintains the GENES database, containing the gene catalogs of all organisms with complete genomes and of selected organisms with partial genomes, which are continuously re-annotated, as well as the LIGAND database for chemical compounds and enzymes. Each gene catalog is associated with a graphical genome map of chromosomal locations that is displayed by a Java applet. In addition to the data collection efforts, KEGG develops and provides various computational tools, such as tools for reconstructing biochemical pathways from the complete genome sequence and for predicting gene regulatory networks from gene expression profiles. The KEGG databases are updated daily and made freely available (http://www.genome.ad.jp/kegg/).
Progress in structural genomics will soon uncover the complete genome sequences of hundreds to thousands of organisms. However, roughly half of the genes identified in every genome sequenced thus far still remain unknown in terms of their biological functions. New experimental and informatics technologies in functional genomics are urgently required for the systematic identification of gene functions. Kyoto Encyclopedia of Genes and Genomes (KEGG) is an effort to computerize the current knowledge of biochemical pathways and other types of molecular interactions so that it can be used as a reference for the systematic interpretation of sequence data (1). At the same time, KEGG attempts to standardize the functional annotation of genes and proteins and to maintain gene catalogs for all complete genomes and some partial genomes, including mouse and human.
The basic concepts of KEGG (1) and underlying informatics technologies (2,3) have already been published. KEGG is tightly integrated with the LIGAND chemical database for enzyme reactions (4,5) as well as with most of the major molecular biology databases by the DBGET/LinkDB system (6) under the Japanese GenomeNet service (7). The database organization efforts require extensive analyses of completely sequenced genomes, as exemplified by the analyses of metabolic pathways (8) and ABC transport systems (9). In this article, we describe the current status of the KEGG databases and discuss the use of KEGG for functional genomics.
In May 1995, we initiated the KEGG project under the Human Genome Program of the Ministry of Education, Science, Sports and Culture in Japan. We wish to automate human reasoning steps for interpreting biological meaning encoded in the sequence data. We consider the problem of predicting gene functions as a process of reconstructing a functioning biological system from the complete set of genes and gene products. Thus, it is critical to understand how genes and molecules are networked to form a biological system. Specifically, the objectives of KEGG are summarized in the following four points.
First, KEGG aims to computerize the current knowledge of genetics, biochemistry, and molecular and cellular biology in terms of pathways of interacting molecules or genes. The KEGG pathway database contains information on how molecules or genes are networked. This complements most of the existing molecular biology databases, which contain information on individual molecules or individual genes. During the first two years of the KEGG project we focused on the metabolic pathways, but since July 1997 we have also been collecting a large body of knowledge on the regulatory aspects of cellular functions.
Second, KEGG maintains gene catalogs for all organisms with completely sequenced genomes and for selected organisms with partial genomes. Because different authors apply different criteria when interpreting sequence similarity, the quality of gene function annotations varies significantly in GenBank (10). KEGG's gene catalogs are intended to provide consistent, standardized annotations by linking individual genes to components of the KEGG biochemical pathways.
Third, KEGG maintains a catalog of chemical elements, compounds, and other substances in living cells as the LIGAND database (4,5), and these entries are likewise linked to pathway components. This reflects our view that both the genetic information encoded in the genome and the chemical information encoded in the cell are required for understanding cellular functions.
Fourth, in addition to the database efforts, KEGG aims at providing new informatics technologies that combine genomic and functional information towards predicting biological systems and designing further experiments. For example, the KEGG reference pathways can be used to uncover molecular interactions and pathways that underlie gene expression profiles obtained by microarray experiments (11).
All the data in KEGG can be accessed from the KEGG table of contents page (http://www.genome.ad.jp/kegg/kegg2.html), which is divided into two broad categories: the pathway (functional) information and the genomic (structural) information. The user may enter the KEGG system top-down, starting from the pathway information, or bottom-up, starting from the genomic information. In either case it should be easy to navigate through KEGG, since all the information is highly integrated. The KEGG data are also linked with many other molecular biology databases by the DBGET/LinkDB system. A summary of KEGG data contents is shown in Table 1.
The KEGG/PATHWAY database is a collection of graphical diagrams (pathway maps) of biochemical pathways. Table 2 shows the current list of KEGG biochemical pathways, which is heavily biased toward metabolism. There are about 90 reference maps of metabolic pathways, which are drawn manually and updated continuously according to biochemical evidence. The organism-specific pathway maps are then generated automatically by matching the EC numbers in the gene catalogs against those in the reference maps. The regulatory pathway maps are drawn separately for each organism, because regulatory pathways are too divergent to be represented in a single reference map.
Table 2. The KEGG biochemical pathways (release 8.0, October 1998)
Figure 1. The KEGG pathway map of the citrate (TCA) cycle for (a) Haemophilus influenzae and (b) Helicobacter pylori. A rectangle and a circle represent, respectively, an enzyme and a compound. The enzymes whose genes are identified in the genome are shown by colored (shaded) rectangles.
Figure 1 shows an example of the KEGG metabolic pathway map: the citrate (TCA) cycle for Haemophilus influenzae and Helicobacter pylori. A box is an enzyme (gene product) with its EC number inside, and a circle is a metabolic compound. Both are clickable objects that retrieve detailed molecular information. The enzymes (boxes) whose genes are identified in the genome are colored green (shaded in Fig. 1) by matching the gene catalog against the reference pathway according to EC numbers. Interestingly, the two organisms seem to have only the lower and the upper half of the TCA cycle, respectively, although a missing enzyme in H. pylori still needs to be identified to make the pathway continuous.
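The EC-number matching that colors the boxes can be illustrated with a short Python sketch. It is not KEGG's implementation: the reference map is reduced to a set of EC numbers for the TCA cycle enzymes (in current EC nomenclature), and the gene identifiers and their EC assignments are hypothetical.

```python
from collections import defaultdict

# Simplified reference map: EC numbers of TCA cycle enzymes.
TCA_REFERENCE = {
    "2.3.3.1",   # citrate synthase
    "4.2.1.3",   # aconitate hydratase
    "1.1.1.42",  # isocitrate dehydrogenase (NADP+)
    "1.2.4.2",   # 2-oxoglutarate dehydrogenase (E1 component)
    "6.2.1.5",   # succinyl-CoA synthetase
    "1.3.5.1",   # succinate dehydrogenase
    "4.2.1.2",   # fumarate hydratase
    "1.1.1.37",  # malate dehydrogenase
}


def color_reference_map(reference_ecs, gene_catalog):
    """Match a gene catalog (gene -> set of EC numbers) against a reference map.

    Returns (present, missing): `present` maps each EC number on the map to the
    genes assigned to it (the boxes to color); `missing` lists uncolored boxes.
    """
    present = defaultdict(list)
    for gene, ecs in gene_catalog.items():
        for ec in ecs:
            if ec in reference_ecs:
                present[ec].append(gene)
    missing = sorted(reference_ecs - set(present))
    return dict(present), missing


# Hypothetical gene catalog for an imaginary organism.
catalog = {
    "geneA": {"1.1.1.37"},
    "geneB": {"4.2.1.2"},
    "geneC": {"1.3.5.1"},
    "geneD": {"2.7.1.1"},  # assigned an EC number that is not on this map
}
present, missing = color_reference_map(TCA_REFERENCE, catalog)
```

Genes whose EC numbers do not appear on the map, such as `geneD` here, are simply ignored, while each entry in `missing` corresponds to an uncolored box that breaks the continuity of the pathway.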
The PATHWAY database is most conveniently accessed from the top menu of the KEGG table of contents page, where the metabolic and regulatory pathways are categorized according to the hierarchical classification of Table 2. Alternatively, the PATHWAY database can be searched with the DBGET/LinkDB system.
The KEGG/GENES database is a collection of genes for all organisms in KEGG, organized as a flat-file database of textual information. In fact, each organism is a primary database, and GENES is defined as a composite database of all organisms under the DBGET/LinkDB system. An entry of the GENES database contains the following information: organism name, gene name, functional description, functional hierarchy (KEGG pathway classification), chromosomal position, codon usage, amino acid sequence, and nucleotide sequence (Table 1). The genes in each organism are hierarchically classified in the gene catalog according to the KEGG pathways (Table 2), and they can be viewed and searched as what we call hierarchical texts. The entire GENES database, or each organism separately, can also be searched with the DBGET/LinkDB system.
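Because GENES entries are plain text, they are easy to process programmatically. The sketch below parses flat-file entries assuming a DBGET-style layout: the field name occupies the first 12 columns, continuation lines start with blanks, and a line containing only `///` closes each entry. These layout details and the two sample entries are assumptions for illustration, not a specification of the KEGG format.

```python
def parse_flatfile(text):
    """Parse DBGET-style flat-file text into a list of entries.

    Each entry is a dict mapping a field name to its list of value lines.
    Assumes a 12-column field-name area, blank-prefixed continuation lines,
    and '///' as the entry terminator.
    """
    entries, current, field = [], {}, None
    for line in text.splitlines():
        if line.strip() == "///":
            if current:
                entries.append(current)
            current, field = {}, None
            continue
        if not line.strip():
            continue  # skip blank lines
        key, value = line[:12].strip(), line[12:].strip()
        if key:  # a new field starts
            field = key
            current.setdefault(field, []).append(value)
        elif field is not None:  # continuation of the previous field
            current[field].append(value)
    if current:  # tolerate a missing final terminator
        entries.append(current)
    return entries


# Two hypothetical entries in the assumed layout.
SAMPLE = """\
ENTRY       hypo0001          CDS
NAME        hypA
DEFINITION  hypothetical malate dehydrogenase
///
ENTRY       hypo0002          CDS
NAME        hypB
DEFINITION  hypothetical fumarate hydratase,
            class II
///
"""
entries = parse_flatfile(SAMPLE)
```

Keeping each field as a list of lines preserves multi-line values such as long definitions or sequences, which can then be joined as needed.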
The GENES database is maintained as follows. First, information on all genes in an organism is generated automatically from the complete genomes section of the GenBank database (10). Then EC numbers are assigned by GFIT (8) and other programs, with manual verification. The gene function annotations are continuously re-evaluated against the KEGG/PATHWAY database and by comparison with SWISS-PROT (12) and other databases.
Figure 2. The correlation of a physical unit and a functional unit of genes. (a) Starting from the KEGG genome map for Escherichia coli, (b) a chromosomal region is selected in the zoom-up window. (c) By clicking on the 'Pathway' button, the user can examine whether a biochemical pathway can be formed from the set of genes in the zoom-up window.
The genome map represents a one-dimensional network of genes that are physically located in a circular or linear genome. The gene order turns out to be extremely valuable information in functional annotation, especially for bacteria and archaea. As shown in Figure 2, the genome map is linked to the pathway map by the Pathway button. The user can check if a physical unit of closely located genes (e.g., genes in an operon) would form a functional unit of related proteins that appear at close positions in the pathway. By clicking on the List button in the genome map, the user can also invoke a sequence similarity search to see if a stretch of genes would match a functional unit in the pathway.
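The check behind the Pathway button, namely whether physically adjacent genes also share a pathway, can be approximated as follows. This is a simplified sketch: gene order is taken from start coordinates, adjacency ignores strand and intergenic distance, the wrap-around of a circular chromosome is not handled, and the gene and pathway identifiers are hypothetical.

```python
def colocated_pathway_runs(genes, pathway_of, min_len=2):
    """Find maximal runs of chromosomally adjacent genes that share a pathway.

    genes: list of (gene_id, start_position) tuples.
    pathway_of: gene_id -> set of pathway identifiers.
    Returns a list of (pathway, [gene_id, ...]) tuples.
    """
    ordered = [gene for gene, _ in sorted(genes, key=lambda item: item[1])]
    pathways = sorted(set().union(*pathway_of.values()))
    runs = []
    for pathway in pathways:
        current = []
        for gene in ordered + [None]:  # trailing None flushes the final run
            if gene is not None and pathway in pathway_of.get(gene, set()):
                current.append(gene)
                continue
            if len(current) >= min_len:
                runs.append((pathway, current))
            current = []
    return runs


# Hypothetical genes with start coordinates and pathway assignments.
genes = [("g1", 100), ("g2", 900), ("g3", 1900), ("g4", 3000), ("g5", 4200)]
pathway_of = {
    "g1": {"his_biosynthesis"},
    "g2": {"his_biosynthesis"},
    "g3": {"his_biosynthesis"},
    "g4": {"tca_cycle"},
    "g5": {"his_biosynthesis"},
}
runs = colocated_pathway_runs(genes, pathway_of)
```

Here `g1` to `g3` form a candidate functional unit (perhaps an operon), whereas `g5` belongs to the same pathway but is separated from them by `g4`.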
Figure 3. The comparative genome map in KEGG. (a) An overview of comparing Mycoplasma genitalium (horizontal) and Mycoplasma pneumoniae (vertical) genomes is shown where each dot represents significant amino acid sequence similarity of two genes. (b) The zoom-up window for the boxed area in (a) is shown where each gene name can be identified and pre-calculated homologous gene strings can optionally be displayed in color.
The co-linearity of genes between two genomes is quite useful for identification of clusters of orthologous genes. KEGG provides the comparative genome map for identification of such clusters and for functional annotation of newly sequenced genomes (Fig. 3). The comparative genome map is analogous to the dot matrix for comparing two nucleotide or amino acid sequences. Here, a dot represents significant sequence similarity between two genes at the amino acid sequence level. To present this map, the sequence similarity scores are pre-computed by SSEARCH for all pairs of organisms and homologous gene clusters are pre-identified so that they can be optionally highlighted in the map.
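Reading co-linear clusters off the comparative genome map, much as one reads diagonals in a dot matrix, can be sketched in Python. The sketch assumes the significant hits (for example, from pre-computed SSEARCH comparisons) are already available as pairs of gene indices, and it only accepts strictly consecutive hits along a diagonal or anti-diagonal; practical tools usually tolerate gaps. The hits in the example are hypothetical.

```python
def colinear_clusters(hits, min_len=3):
    """Return runs of similarity hits lying on a diagonal of the gene-by-gene dot plot.

    hits: iterable of (i, j) pairs meaning gene i of genome 1 is significantly
    similar to gene j of genome 2 (genes indexed in chromosomal order).
    A run follows (i + k, j + k) for conserved orientation or (i + k, j - k)
    for an inverted segment.
    """
    hit_set = set(hits)
    clusters = []
    for step in (1, -1):
        for i, j in sorted(hit_set):
            if (i - 1, j - step) in hit_set:
                continue  # (i, j) extends an earlier run, so it is not a start
            run = [(i, j)]
            while (run[-1][0] + 1, run[-1][1] + step) in hit_set:
                run.append((run[-1][0] + 1, run[-1][1] + step))
            if len(run) >= min_len:
                clusters.append(run)
    return clusters


# Hypothetical hits between two small genomes.
hits = [(0, 10), (1, 11), (2, 12), (3, 12), (5, 20), (6, 19), (7, 18), (9, 3)]
clusters = colinear_clusters(hits)
```

The first cluster is a conserved block in the same orientation; the second runs along an anti-diagonal, as expected for an inverted segment. Isolated dots such as `(9, 3)` are left for individual inspection.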
The ortholog group table is a summary table that represents functional correlations in the pathway, physical (positional) correlations in the genome, and evolutionary (sequence similarity) correlations among species. Currently there are about 4800 genes that belong to about 140 groups of functional units. The ortholog group table is most useful as a reference data set for functional annotations. KEGG provides a computational tool to search against the ortholog group tables for sequence similarity of a set of query sequences (see below).
A case in point is the annotation of ABC transporters. A typical ABC transporter consists of a substrate-binding protein, two membrane proteins, and two ATP-binding proteins. The genes for these molecular components usually form an operon, and there are many paralogous genes that are responsible for the transport of different substances. If a set of query sequences that are located at physically close positions in the genome match against all components, it is a good indication that the transporter unit is correctly reconstructed. Since the KEGG ortholog group table for ABC transporters is functionally categorized (9), the pattern of matches and their similarity scores can be used to deduce substrate specificity.
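The reasoning in this ABC transporter example can be expressed as a small scoring function. The roles, substrate categories, and scores below are hypothetical, and the score-weighted vote is an illustrative choice rather than the method used in KEGG. The completeness test only requires each component type to be matched at least once, since the two membrane or two ATP-binding subunits may be encoded by a single gene whose product forms a homodimer.

```python
from collections import Counter

REQUIRED_ROLES = ("substrate-binding", "membrane", "ATP-binding")


def assess_transporter(matches):
    """Judge a candidate ABC transporter unit from ortholog-table matches.

    matches: list of (query_gene, role, substrate_category, score) tuples
    for physically clustered query genes.
    Returns (is_complete, predicted_substrate).
    """
    roles = {role for _, role, _, _ in matches}
    is_complete = all(role in roles for role in REQUIRED_ROLES)
    votes = Counter()
    for _, _, substrate, score in matches:
        votes[substrate] += score  # weight each vote by similarity score
    predicted = votes.most_common(1)[0][0] if votes else None
    return is_complete, predicted


# Hypothetical matches for five adjacent query genes.
matches = [
    ("q1", "substrate-binding", "oligopeptide", 310.0),
    ("q2", "membrane", "oligopeptide", 220.0),
    ("q3", "membrane", "oligopeptide", 205.0),
    ("q4", "ATP-binding", "sugar", 260.0),
    ("q5", "ATP-binding", "oligopeptide", 250.0),
]
result = assess_transporter(matches)
```

Because ATP-binding components are highly conserved across ABC transporters, one might down-weight their scores when inferring substrate specificity, relying more on the substrate-binding and membrane components.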
The KEGG molecular catalogs are intended to provide functional and structural classifications of proteins, RNAs, other biological macromolecules, chemical substances and molecular assemblies. However, the organization is still rudimentary except for enzymes. The information of chemical compounds, enzyme molecules, and enzymatic and non-enzymatic reactions is stored in the LIGAND database as described in the accompanying paper (5). Based on this database KEGG provides several molecular catalogs for classifications of enzymes and a preliminary classification of chemical compounds.
The KEGG reference maps for metabolic pathways represent biochemical knowledge, containing all chemically identified reaction pathways. Applying the constraint of the genome, i.e., the list of enzymes encoded in it, reconstructs organism-specific pathways, which are represented by coloring the boxes in the KEGG pathway maps (Fig. 1). Furthermore, the constraint of operons will often predict functional units or conserved pathway motifs, which are represented by additional coloring in the KEGG pathway maps (Fig. 2). Because an operon probably reflects a regulatory unit of transcription, gene expression profiles can naturally serve as yet another constraint on the KEGG reference pathway maps. In fact, KEGG provides a tool to color the pathway maps in order to visualize, for example, microarray gene expression profiles.
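One way to color pathway boxes by an expression profile is sketched below. The red/green scheme, the log-ratio scaling, the saturation point, and averaging the log-ratios of genes that share a box are all assumptions for illustration; the text does not specify how the KEGG coloring tool maps values to colors. The box-to-gene mapping and ratios are hypothetical.

```python
import math


def expression_color(ratio, saturation=4.0):
    """Map an expression ratio (sample / reference) to an RGB hex color.

    White means no change; the color deepens toward red (induced) or green
    (repressed) and saturates at a `saturation`-fold change in either direction.
    """
    if ratio <= 0:
        raise ValueError("expression ratio must be positive")
    level = math.log2(ratio) / math.log2(saturation)
    level = max(-1.0, min(1.0, level))  # clamp to [-1, 1]
    fade = round(255 * (1 - abs(level)))  # 255 = white, 0 = fully saturated
    if level >= 0:
        return f"#ff{fade:02x}{fade:02x}"
    return f"#{fade:02x}ff{fade:02x}"


def color_pathway_boxes(box_to_genes, expression, saturation=4.0):
    """Color each pathway box by the mean log2 ratio of its measured genes.

    Boxes with no measured gene are omitted (left uncolored on the map).
    """
    colors = {}
    for box, genes in box_to_genes.items():
        logs = [math.log2(expression[g]) for g in genes if g in expression]
        if logs:
            colors[box] = expression_color(2 ** (sum(logs) / len(logs)), saturation)
    return colors


# Hypothetical boxes (EC numbers) with their genes and expression ratios.
boxes = {"1.1.1.37": ["gA"], "4.2.1.2": ["gB", "gC"], "1.3.5.1": ["gD"]}
ratios = {"gA": 4.0, "gB": 0.25, "gC": 0.25}
colors = color_pathway_boxes(boxes, ratios)
```

Averaging on the log scale treats a two-fold induction and a two-fold repression symmetrically, which is why the mean is taken over log-ratios rather than raw ratios.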
Table 3 shows the list of currently available tools for search and analysis of KEGG pathway maps and genome maps. The user-interfaces for these tools can be accessed from the KEGG table of contents page. The pathway reconstruction tools in the category of prediction tools are based on sequence similarity search that involves a set of query sequences at a time to see if a pathway is correctly reconstructed. At the moment, there are two versions of the reconstruction tools. One is to search against the KEGG pathway maps: http://www.genome.ad.jp/kegg-bin/mk_homology_pathway_html , and the other is to search against the ortholog group tables: http://www.genome.ad.jp/kegg-bin/srch_orth_html . The former contains a larger data set of pathways but the search is made against single organisms. The latter is limited to selected portions of the pathways (pathway motifs or functional units), but the search is made against multiple alignments of organisms and tends to produce better results.
It is often the case that no homology can be found when searching against the KEGG reference pathways, which suggests that the current biochemical knowledge is not sufficient to predict a pathway. If there is a missing portion in the reconstructed metabolic pathway, KEGG provides a tool to predict alternative paths with alternative enzymes. The tool actually computes all possible reaction pathways between two compounds from a set of substrate-product relations, i.e., from a set of enzymes (5): http://www.genome.ad.jp/kegg-bin/mk_pathcomp_html
This tool also has a feature called query relaxation to incorporate grouping or hierarchy of relations. Whenever any member of the group is identified in the genome by sequence similarity (e.g., an enzyme in the same hierarchy of EC numbers), the entire group is incorporated for computation (e.g., to represent wider substrate specificity). This effectively increases the number of substrate-product relations and expands possible reaction pathways.
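Both the enumeration of reaction paths between two compounds and query relaxation can be sketched in a few lines of Python. This is a toy model, not the LIGAND-based KEGG tool: relations are (substrate, product, enzyme) triples, grouping is by the first three EC fields (the `depth` parameter is a hypothetical choice), and the compounds `A` to `D` and EC labels such as `1.1.1.a` are invented placeholders. Exhaustive depth-first enumeration grows quickly with network size, so it is only practical for small networks.

```python
from collections import defaultdict


def enumerate_paths(relations, source, target, max_steps=6):
    """Yield every simple reaction path from source to target.

    relations: iterable of (substrate, product, enzyme) triples.
    Each path is a list of such triples; paths longer than max_steps are pruned.
    """
    graph = defaultdict(list)
    for substrate, product, enzyme in relations:
        graph[substrate].append((product, enzyme))

    def dfs(node, visited, path):
        if node == target:
            yield list(path)
            return
        if len(path) >= max_steps:
            return
        for nxt, enzyme in graph[node]:
            if nxt not in visited:  # keep paths simple (no repeated compounds)
                visited.add(nxt)
                path.append((node, nxt, enzyme))
                yield from dfs(nxt, visited, path)
                path.pop()
                visited.remove(nxt)

    yield from dfs(source, {source}, [])


def relax(relations, identified_enzymes, depth=3):
    """Query relaxation: if any enzyme of an EC group (first `depth` fields) is
    identified in the genome, admit every relation catalyzed by that group."""
    def group(ec):
        return ".".join(ec.split(".")[:depth])

    groups = {group(ec) for ec in identified_enzymes}
    return [rel for rel in relations if group(rel[2]) in groups]


# Toy substrate-product network; compounds and EC labels are placeholders.
RELATIONS = [
    ("A", "B", "1.1.1.a"),
    ("B", "D", "2.6.1.a"),
    ("A", "C", "1.1.1.b"),
    ("C", "D", "4.1.1.a"),
    ("B", "C", "1.1.1.c"),
]
IDENTIFIED = {"1.1.1.a", "2.6.1.a", "4.1.1.a"}  # enzymes found in the genome

strict = [rel for rel in RELATIONS if rel[2] in IDENTIFIED]
strict_paths = list(enumerate_paths(strict, "A", "D"))
relaxed_paths = list(enumerate_paths(relax(RELATIONS, IDENTIFIED), "A", "D"))
```

With only the identified enzymes there is a single path from `A` to `D`; relaxation admits `1.1.1.b` and `1.1.1.c` because they share the `1.1.1` group with an identified enzyme, which raises the number of candidate paths to three.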
This type of computation, which we call pathway reconstruction from binary relations, can be performed in a more general way. A binary relation can be a substrate-product relation in metabolic pathways, a gene-gene interaction observed in gene expression profiles, or a protein-protein interaction observed in yeast two-hybrid experiments. In practice, reconstruction from binary relations works better when the problem size is small. For example, if most of a pathway is reconstructed from the reference but a few enzymes are still missing, computation with binary relations may fill the missing links of the fragmented pathway. In addition to the substrate-product relations, KEGG will provide tools to integrate and compute different types of binary relations. This is perhaps the most challenging area of computational problems that the KEGG project has made accessible.
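Filling a missing link from mixed binary relations can be approached as a bounded shortest-path search. In the sketch below, relations of any kind (for example, co-expression or two-hybrid interactions) are treated as undirected edges, and a breadth-first search returns the shortest chain of at most `max_links` relations joining the ends of two pathway fragments. Treating every relation as undirected, the link limit, and the gene names are assumptions; substrate-product relations would normally need to keep their direction.

```python
from collections import defaultdict, deque


def fill_gap(relations, start, end, max_links=3):
    """Shortest chain of binary relations joining start to end.

    relations: iterable of (a, b, kind) triples, treated as undirected.
    Returns a list of (from, to, kind) links, or None if no chain of at most
    max_links relations exists.
    """
    neighbours = defaultdict(list)
    for a, b, kind in relations:
        neighbours[a].append((b, kind))
        neighbours[b].append((a, kind))

    previous = {start: None}  # node -> (predecessor, kind of linking relation)
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if node == end:  # walk back through predecessors to rebuild the chain
            chain = []
            while previous[node] is not None:
                prev, kind = previous[node]
                chain.append((prev, node, kind))
                node = prev
            return chain[::-1]
        if depth == max_links:
            continue  # do not extend chains beyond the link limit
        for nxt, kind in neighbours[node]:
            if nxt not in previous:
                previous[nxt] = (node, kind)
                queue.append((nxt, depth + 1))
    return None


# Hypothetical relations: g1 ends one pathway fragment, g4 starts another.
RELS = [
    ("g1", "g2", "coexpression"),
    ("g2", "g3", "two-hybrid"),
    ("g3", "g4", "coexpression"),
    ("g1", "g5", "two-hybrid"),
]
link = fill_gap(RELS, "g1", "g4")
```

Breadth-first search guarantees that the first chain found uses the fewest relations, and the `max_links` bound keeps the search small, in line with the observation that such reconstruction is most useful for closing short gaps.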
For strictly academic purposes at academic institutions the KEGG mirror server package may be installed. The package, which also includes a minimal set of DBGET/LinkDB, can be obtained from the KEGG anonymous FTP site:
The mirror package runs on a Solaris or IRIX machine. The individual databases PATHWAY, GENES, and LIGAND can also be obtained from this FTP site.
A CD version of KEGG was once distributed, and a copy still exists at the FTP site. However, because the database has become so large and the system is undergoing frequent revisions, the CD version is not supported at the moment. We plan to redefine the role of the CD and hope to resume distributing it. Finally, some of the search tools are also available at the KEGG mail server:
This work was supported by the Grant-in-Aid for Scientific Research on the Priority Area 'Genome Science' from the Ministry of Education, Science, Sports and Culture of Japan. The computation time was provided by the Supercomputer Laboratory, Institute for Chemical Research, Kyoto University.
B. Snel and M. A. Huynen Quantifying Modularity in the Evolution of Biomolecular Systems
Genome Res.,
March 1, 2004;
14(3):
391 - 397.
[Abstract][Full Text][PDF]
J. B. German, M.-A. Roberts, and S. M. Watkins Personal Metabolomics as a Next Generation Nutritional Assessment
J. Nutr.,
December 1, 2003;
133(12):
4260 - 4266.
[Abstract][Full Text][PDF]
T. E. Allen, M. J. Herrgard, M. Liu, Y. Qiu, J. D. Glasner, F. R. Blattner, and B. O. Palsson Genome-Scale Analysis of the Uses of the Escherichia coli Genome: Model-Driven Analysis of Heterogeneous Data Sets
J. Bacteriol.,
November 1, 2003;
185(21):
6392 - 6399.
[Abstract][Full Text][PDF]
J. M. Stuart, E. Segal, D. Koller, and S. K. Kim A Gene-Coexpression Network for Global Discovery of Conserved Genetic Modules
Science,
October 10, 2003;
302(5643):
249 - 255.
[Abstract][Full Text][PDF]
R. Gil, F. J. Silva, E. Zientz, F. Delmotte, F. Gonzalez-Candelas, A. Latorre, C. Rausell, J. Kamerbeek, J. Gadau, B. Holldobler, et al. The genome sequence of Blochmannia floridanus: Comparative analysis of reduced genomes
PNAS,
August 5, 2003;
100(16):
9388 - 9393.
[Abstract][Full Text][PDF]
S. AlSairafi, F.-S. Emmanouil, M. Ghanem, N. Giannadakis, Y. Guo, D. Kalaitzopoulos, M. Osmond, A. Rowe, J. Syed, and P. Wendel The Design of Discovery Net: Towards Open Grid Services for Knowledge Discovery
International Journal of High Performance Computing Applications,
August 1, 2003;
17(3):
297 - 315.
[Abstract][PDF]
T E Raevaara, T Timoharju, K E Lonnqvist, R Kariola, M Steinhoff, R M W Hofstra, E Mangold, Y J Vos, and M Nystrom-Lahti Description and functional analysis of a novel in frame mutation linked to hereditary non-polyposis colorectal cancer
J. Med. Genet.,
October 1, 2002;
39(10):
747 - 750.
[Full Text][PDF]
D. R. Rhodes, T. R. Barrette, M. A. Rubin, D. Ghosh, and A. M. Chinnaiyan Meta-Analysis of Microarrays: Interstudy Validation of Gene Expression Profiles Reveals Pathway Dysregulation in Prostate Cancer
Cancer Res.,
August 1, 2002;
62(15):
4427 - 4433.
[Abstract][Full Text][PDF]
H. Akashi and T. Gojobori Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis
PNAS,
March 19, 2002;
99(6):
3695 - 3700.
[Abstract][Full Text][PDF]
E. M. Panina, A. A. Mironov, and M. S. Gelfand Comparative analysis of FUR regulons in gamma-proteobacteria
Nucleic Acids Res.,
December 15, 2001;
29(24):
5195 - 5206.
[Abstract][Full Text][PDF]
Y. Pouliot, J. Gao, Q. J. Su, G. G. Liu, and X. B. Ling DIAN: A Novel Algorithm for Genome Ontological Classification
Genome Res.,
October 1, 2001;
11(10):
1766 - 1779.
[Abstract][Full Text][PDF]
M. A. Florczyk, L. A. McCue, R. F. Stack, C. R. Hauer, and K. A. McDonough Identification and Characterization of Mycobacterial Proteins Differentially Expressed under Standing and Shaking Culture Conditions, Including Rv2623 from a Novel Class of Putative ATP-Binding Proteins
Infect. Immun.,
September 1, 2001;
69(9):
5777 - 5785.
[Abstract][Full Text][PDF]
R. Ramakrishna, J. S. Edwards, A. McCulloch, and B. O. Palsson Flux-balance analysis of mitochondrial energy metabolism: consequences of systemic stoichiometric constraints
Am J Physiol Regulatory Integrative Comp Physiol,
March 1, 2001;
280(3):
R695 - R704.
[Abstract][Full Text][PDF]
T. Sicheritz-Ponten and S. G. E. Andersson A phylogenomic approach to microbial evolution
Nucleic Acids Res.,
January 15, 2001;
29(2):
545 - 552.
[Abstract][Full Text][PDF]
A. Manson McGuire and G. M. Church Predicting regulons and their cis-regulatory motifs by comparative genomics
Nucleic Acids Res.,
November 15, 2000;
28(22):
4523 - 4530.
[Abstract][Full Text][PDF]
W. Fujibuchi, H. Ogata, H. Matsuda, and M. Kanehisa Automatic detection of conserved gene clusters in multiple genomes by graph comparison and P-quasi grouping
Nucleic Acids Res.,
October 15, 2000;
28(20):
4029 - 4036.
[Abstract][Full Text][PDF]
M. Huynen, B. Snel, W. Lathe III, and P. Bork Predicting Protein Function by Genomic Context: Quantitative Evaluation and Qualitative Inferences
Genome Res.,
August 1, 2000;
10(8):
1204 - 1210.
[Abstract][Full Text]
D. Kihara and M. Kanehisa Tandem Clusters of Membrane Proteins in Complete Genome Sequences
Genome Res.,
June 1, 2000;
10(6):
731 - 743.
[Abstract][Full Text]
A. M. McGuire, J. D. Hughes, and G. M. Church Conservation of DNA Regulatory Motifs and Discovery of New Motifs in Microbial Genomes
Genome Res.,
June 1, 2000;
10(6):
744 - 757.
[Abstract][Full Text]
M. Kanehisa and S. Goto KEGG: Kyoto Encyclopedia of Genes and Genomes
Nucleic Acids Res.,
January 1, 2000;
28(1):
27 - 30.
[Abstract][Full Text][PDF]
M. Ringwald, J. T. Eppig, J. A. Kadin, J. E. Richardson, and the Gene Expression Database Group GXD: a Gene Expression Database for the laboratory mouse: current status and recent enhancements
Nucleic Acids Res.,
January 1, 2000;
28(1):
115 - 119.
[Abstract][Full Text][PDF]
R. Overbeek, N. Larsen, G. D. Pusch, M. D'Souza, E. S. Jr, N. Kyrpides, M. Fonstein, N. Maltsev, and E. Selkov WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction
Nucleic Acids Res.,
January 1, 2000;
28(1):
123 - 125.
[Abstract][Full Text][PDF]
T. Kawashima, S. Kawashima, M. Kanehisa, H. Nishida, and K. W. Makabe MAGEST: MAboya Gene Expression patterns and Sequence Tags
Nucleic Acids Res.,
January 1, 2000;
28(1):
133 - 135.
[Abstract][Full Text][PDF]
I. Xenarios, D. W. Rice, L. Salwinski, M. K. Baron, E. M. Marcotte, and D. Eisenberg DIP: the Database of Interacting Proteins
Nucleic Acids Res.,
January 1, 2000;
28(1):
289 - 291.
[Abstract][Full Text][PDF]