Nucleic Acids Research, 2007, Vol. 35, Database issue D533-D537
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
SYSTOMONAS: an integrated database for systems biology analysis of Pseudomonas
1 Institut für Mikrobiologie, Technische Universität Braunschweig Spielmannstraße 7, D-38106 Braunschweig, Germany 2 Institut für Biochemie, Universität zu Köln Zülpicher Straße 47, D-50674 Köln, Germany 3 Institut für Bioverfahrenstechnik, Technische Universität Braunschweig Gaußstraße 17, D-38106 Braunschweig, Germany 4 Fachbereich Informatik, Fachhochschule Wolfenbüttel Am Exer 2, D-38302 Wolfenbüttel, Germany
*To whom correspondence should be addressed. Tel: +49 531 391 5801; Fax: +49 531 391 5854; Email: d.jahn{at}tu-bs.de
Received August 11, 2006. Revised October 5, 2006. Accepted October 5, 2006.
| ABSTRACT |
|---|
To provide an integrated bioinformatics platform for a systems biology approach to the biology of pseudomonads in infection and biotechnology, the database SYSTOMONAS (SYSTems biology of pseudOMONAS) was established. Besides our own experimental metabolome, proteome and transcriptome data, various additional predictions of cellular processes, such as gene-regulatory networks, are stored. Metabolic networks in SYSTOMONAS were reconstructed via comparative genomics. Broad data integration is realized using SOAP interfaces for the well-established databases BRENDA, KEGG and PRODORIC. Several tools for the analysis of stored data and for the visualization of the corresponding results are provided, enabling a quick understanding of metabolic pathways, genomic arrangements or promoter structures of interest. The focus of SYSTOMONAS is on pseudomonads and in particular Pseudomonas aeruginosa, an opportunistic human pathogen. With this database we would like to encourage the Pseudomonas community to elucidate cellular processes of interest using an integrated systems biology strategy. The database is accessible at http://www.systomonas.de.
| MOTIVATION |
|---|
Traditionally, metabolic and gene-regulatory networks were analysed separately. There are various tools for metabolic network reconstruction [e.g. (1–3)] and for the generation of gene-regulatory networks (4), i.e. the prediction of the regulation of certain genes by specific transcription factors. However, only a few tools combine both networks, such as the Pathway Tools Omics Viewer (5). This poor connectivity between the two approaches might be due to the fact that the required information is stored in different databases. Information on transcription factor binding sites can be found, for example, in RegulonDB (6) or PRODORIC (7), while metabolic reactions or pathways need to be retrieved from other databases, such as BRENDA (8), BioCyc (5), KEGG (9), PseudoCyc (10) and UM-BDD (11). Combining knowledge from multiple disciplines and data resources will drive our understanding of cellular processes and lead to the prediction of cellular behaviour in its entirety.
Consequently, we constructed the database SYSTOMONAS, which provides the basis for a systems biology approach. Here we focus on data integration for the biotechnologically and medically relevant group of bacteria, the pseudomonads.
| CONTENT OF SYSTOMONAS |
|---|
The complexity of a systems biology approach requires a focus on a single well-investigated organism. We have chosen the Gram-negative proteobacterium Pseudomonas aeruginosa. This organism is a versatile soil bacterium and an important opportunistic pathogen causing persistent infection in immunocompromised patients (12). Our long-term goal is the development of a dynamic model simulating the behaviour of P.aeruginosa during infection. The basis of our approach is SYSTOMONAS, a comprehensive database that includes data from all levels of analysis, such as microarray and proteomics data, metabolite measurements, sequence data, gene-regulatory networks and corresponding enzyme data.
Our database comprises information on eight different Pseudomonas species and strains whose genomes have been completely sequenced and functionally annotated. Besides the medically relevant P.aeruginosa, the genus Pseudomonas contains various important plant pathogens as well as biotechnologically and ecologically interesting species.
Our initial focus was on metabolomics. However, all current information coming from genomics, transcriptomics and proteomics (13) is also either stored in our database as a data warehouse (14) or dynamically accessible via web services using SOAP interfaces (15), a platform-independent data transfer protocol (see section Database Techniques). Besides experimental results from other research groups and our own, further data are retrieved from major general data resources. Major external sources of SYSTOMONAS are KEGG [Kyoto Encyclopedia of Genes and Genomes (9)], Pseudomonas Genome Database v2 [PGDv2 (16)], PRODORIC [PROcaryotIC Database Of gene-Regulation (4)] and BRENDA (8). KEGG provided metabolic reactions, compounds, glycans and pathways; PGDv2 and PRODORIC supplied protein, gene annotation, gene-regulatory and genome structure data. BRENDA supplies kinetic and disease information. ENZYME (17) and BioCyc (5) provide further functional characterization of proteins.
Currently, SYSTOMONAS contains 10 034 proteins identified as enzymes, 195 transcription factor–gene relations and 14 250 measuring points from three independent metabolome experiments. Moreover, 11 exemplary protein spots from one proteome experiment were entered. Transcriptome data are provided by PRODORIC via SOAP (see section Database Techniques). For P.aeruginosa PAO1, 1509 unique proteins were annotated in SYSTOMONAS as enzymes. These 1509 annotated enzymes were retrieved from KEGG (1003), PGDv2 (1017), BioCyc (493), ENZYME (393) and from our own annotation (241); the per-source counts overlap and therefore sum to more than 1509. The corresponding annotation process is described in the following sections. By comparison, PseudoCyc contains 738 enzymes (version 9.6, http://v2.pseudomonas.com:1555/).
| COMPARATIVE GENOMICS AND REGULATORY NETWORK PREDICTION |
|---|
Comparing a Pseudomonas protein of interest with other well-characterized proteins may deliver useful insights into its evolution, distribution and species-specific function. Therefore, we searched all deduced proteins in the SYSTOMONAS database for orthologous proteins in other Pseudomonas species to obtain orthologous protein clusters. First, a restricted BLAST analysis (18) was performed on the protein sequences, followed by a pairwise global alignment using the tool stretcher of the EMBOSS package (19). The homologous protein pairs can be obtained from the SYSTOMONAS protein table and visualized as multiple alignments, which are produced by MUSCLE [MUltiple Sequence Comparison by Log-Expectation (20)]. A more dynamic and flexible tool for the visualization of multiple alignments is provided by Jalview (21), which is also accessible from SYSTOMONAS. This tool not only displays multiple alignments but can also generate a phylogenetic tree for the protein group using different algorithms. The E-value of BLAST and the identity calculated by stretcher can be retrieved by activating the two multiple alignment tools.
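The grouping of homologous protein pairs into clusters can be illustrated with a minimal sketch. It assumes that the pairs are already available as tuples of protein identifiers (the identifiers below are hypothetical) and that clusters are formed by single linkage; the text does not state which clustering rule SYSTOMONAS actually applies.

```python
from collections import defaultdict


def cluster_pairs(pairs):
    """Group proteins into clusters by single linkage over homologous pairs."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(set)
    for protein in list(parent):
        clusters[find(protein)].add(protein)
    # Sorted output keeps the result deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical protein identifiers, for illustration only.
example_pairs = [
    ("PA0001", "PP0010"),
    ("PP0010", "PSPTO0001"),
    ("PA0002", "PP0011"),
]
example_clusters = cluster_pairs(example_pairs)
```

With single linkage, two proteins end up in the same cluster as soon as a chain of pairwise hits connects them, which is simple but can merge families through a single spurious hit.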
Graphical maps of corresponding gene regions can be retrieved via a hyperlink to the BRENDA Genome Explorer, a tool within the BRENDA package (8). BRENDA Genome Explorer visualizes orthologous gene regions, which have a sequence identity of at least 50% in different organisms.
If the user is interested in the prediction of transcription factor binding sites and the deduction of corresponding regulons, the tool Virtual Footprint (4) can be employed. We adapted this tool to SYSTOMONAS by limiting the analysis to Pseudomonas species.
| METABOLIC NETWORK RECONSTRUCTION |
|---|
Genes and proteins of eight Pseudomonas species and strains are carefully annotated by the Pseudomonas community and the genome projects involved (16). PGDv2 is the resource for the continually updated P.aeruginosa PAO1 genome annotation. It also refers to genome annotation web sites of other Pseudomonas genome projects for the most up-to-date information. To reconstruct metabolic networks, we transferred known EC numbers of each Pseudomonas protein to its homologous partners. The enzyme designation of proteins was determined in three steps. First, the external databases KEGG (9), PGDv2 (16), ENZYME (17) and BioCyc (5) provide EC numbers for the proteins. Second, homology analyses lead to putative orthologous protein pairs (see section Comparative Genomics). Third, if the identity of the global alignment (stretcher) equals or exceeds 60%, all EC numbers assigned to a protein are transferred to its orthologous protein partners. The source of each EC number is indicated on the website. EC numbers newly identified by our method are marked as predicted. To improve the metabolic network reconstruction with a second, independent method, we also used the tool metaSHARK (1). This tool is able to identify potential enzyme-encoding genes in raw DNA sequences that have not yet been annotated. All EC numbers newly detected by metaSHARK are also marked as predicted in SYSTOMONAS. Table 1 lists parts of the corresponding database content.
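The three-step EC number assignment can be sketched as follows. This is an illustration under stated assumptions rather than the SYSTOMONAS implementation: the data structures, the function name and the protein identifiers are hypothetical, and only EC numbers that come directly from a database are propagated (no chained transfers). The 60% identity threshold is taken from the text.

```python
IDENTITY_THRESHOLD = 60.0  # percent identity of the global alignment, as stated in the text


def transfer_ec_numbers(known_ec, ortholog_pairs, threshold=IDENTITY_THRESHOLD):
    """Return {protein: {ec_number: source}}, where source is 'database' or 'predicted'.

    known_ec:        {protein: set of EC numbers from external databases}
    ortholog_pairs:  iterable of (protein_a, protein_b, percent_identity)
    """
    annotations = {
        protein: {ec: "database" for ec in ecs}
        for protein, ecs in known_ec.items()
    }
    for protein_a, protein_b, identity in ortholog_pairs:
        if identity < threshold:
            continue  # pair is too dissimilar for a transfer
        # Transfer in both directions; an existing 'database' entry is never overwritten.
        for source, target in ((protein_a, protein_b), (protein_b, protein_a)):
            for ec in known_ec.get(source, ()):
                annotations.setdefault(target, {}).setdefault(ec, "predicted")
    return annotations


# Hypothetical identifiers and identities, for illustration only.
example_known = {"PA0001": {"1.1.1.1"}}
example_pairs = [
    ("PA0001", "PP0010", 72.5),  # above the threshold: EC number is transferred
    ("PA0001", "PS0005", 41.0),  # below the threshold: no transfer
    ("PA0001", "PF0007", 60.0),  # exactly at the threshold: transferred
]
example_annotations = transfer_ec_numbers(example_known, example_pairs)
```

Keeping the source label next to each EC number mirrors the distinction the website makes between database-derived and predicted assignments.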
Next, we imported KEGG-pathway data and enabled links to pathway maps via the SOAP interface provided by KEGG (9), in which P.aeruginosa enzymes are highlighted. A metabolic network including all involved enzymes, metabolic reactions and metabolites of all pseudomonads is delivered by our own adapted tool, which is based on GraphViz (22) and creates clickable image maps (Figure 1). The user immediately recognizes enzymatic reactions unique to one Pseudomonas species. All necessary information concerning this reaction is provided by mouse click.
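How a GraphViz-based clickable map can be produced is sketched below. The snippet only builds a DOT description in which every reaction edge carries a `URL` attribute; the reaction records, the field names and the base URL are hypothetical and do not reflect the actual SYSTOMONAS tool.

```python
def reactions_to_dot(reactions, base_url):
    """Build a GraphViz DOT string with a URL attribute on every reaction edge."""
    lines = ["digraph metabolism {", "  node [shape=ellipse];"]
    for reaction in reactions:
        # Hypothetical link target: one detail page per reaction.
        url = f"{base_url}?reaction={reaction['id']}"
        lines.append(
            f'  "{reaction["substrate"]}" -> "{reaction["product"]}" '
            f'[label="{reaction["ec"]}", URL="{url}"];'
        )
    lines.append("}")
    return "\n".join(lines)


# Placeholder reaction, for illustration only.
example_dot = reactions_to_dot(
    [{"id": "R1", "substrate": "A", "product": "B", "ec": "1.1.1.1"}],
    "http://example.org/reaction.php",
)
```

Saved to a file such as `network.dot`, the description can be rendered with `dot -Tcmapx -onetwork.map -Tpng -onetwork.png network.dot`, which writes an image together with a client-side image map for embedding in a web page.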
| METABOLOMICS DATA |
|---|
The database structure of SYSTOMONAS is suitable for the simple storage of various types of experimental data. All currently available transcriptome and proteome data are deposited in the database. As a start, we included our own experimental data obtained for the P.aeruginosa strain PAO1 measured under different growth conditions. Our raw data, analysed by GC/MS, can be accessed via the query form 'omics data'. As an extra feature, the metabolomics data obtained for one specific growth condition can be plotted against another dataset using gnuplot (www.gnuplot.info). Data points are clickable and lead to the corresponding metabolite in the database (Figure 2). If the levels of specific metabolites differ significantly between conditions, the corresponding data points lie away from an imaginary diagonal line. Experimental conditions and methods are indicated along with the raw data.
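The idea behind the diagonal comparison can be expressed numerically. The sketch below flags metabolites whose levels differ between two conditions by at least a given log2 ratio, which corresponds to points lying away from the diagonal in the scatter plot. The fold-change cutoff, the function name and the example values are assumptions made for illustration; a fixed cutoff is not a statistical significance test, and the text does not specify the procedure used in SYSTOMONAS.

```python
import math


def off_diagonal(condition_a, condition_b, min_abs_log2_ratio=1.0):
    """Return {metabolite: log2(b / a)} for metabolites far from the diagonal."""
    flagged = {}
    # Only metabolites measured under both conditions can be compared.
    for metabolite in condition_a.keys() & condition_b.keys():
        a, b = condition_a[metabolite], condition_b[metabolite]
        if a <= 0 or b <= 0:
            continue  # a log ratio is undefined for non-positive values
        ratio = math.log2(b / a)
        if abs(ratio) >= min_abs_log2_ratio:
            flagged[metabolite] = ratio
    return flagged


# Invented intensities, for illustration only.
example_a = {"alanine": 10.0, "citrate": 4.0, "glucose": 5.0}
example_b = {"alanine": 11.0, "citrate": 16.0, "glucose": 0.0}
example_flagged = off_diagonal(example_a, example_b)
```

On a log-log plot, a constant log2 ratio corresponds to a line parallel to the diagonal, so the cutoff defines a band around the diagonal outside of which points are reported.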
| DATABASE TECHNIQUES |
|---|
We have chosen the open source object-relational database management system PostgreSQL 8.0.3 (www.postgresql.org) for our database. The database is accessed with the scripting language PHP (www.php.net), which also allows the dynamic generation of the web interface. The web server is Apache 2.0 (http://httpd.apache.org).
Data integration with SYSTOMONAS combines two different principal concepts: the data warehouse concept (14) and dynamic web services via SOAP. The main advantage of a data warehouse is its fast data retrieval. For this purpose, a major portion of the data of SYSTOMONAS, such as KEGG compounds and reactions or PRODORIC transcription factor–gene interactions, is stored locally. The major advantage of SOAP is that the transmitted information is always up to date, since it corresponds to the most recent data of the consulted database. Several databases provide web services via SOAP, such as the major sequence databases (23–25). Several other databases, such as Atlas (26), are organized as data warehouses. The main data sources of SYSTOMONAS are stored and matched in an intermediate data container, metabold, which supplies data to SYSTOMONAS (Figure 3). The SOAP web services used on the SYSTOMONAS websites are listed in Table 2. Whenever a webpage with these web services is accessed, the data are retrieved from the respective external database and supplement the locally stored data of SYSTOMONAS. The API (application programming interface) is built with the SOAP extension of PHP.
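The combination of both concepts can be sketched in a few lines. The example is written in Python for brevity, although SYSTOMONAS itself uses PHP; the function names and record fields are hypothetical, and the remote call is represented by a plain callable instead of a real SOAP client. Falling back to the local copy when the service is unreachable is an assumption of this sketch, not a documented behaviour.

```python
def get_record(key, local_store, fetch_remote):
    """Return the warehouse record for key, supplemented with live remote fields."""
    record = dict(local_store.get(key, {}))  # copy so the warehouse stays unchanged
    try:
        remote = fetch_remote(key)  # stands in for a SOAP request
    except OSError:
        remote = None  # assumed fallback: serve the local copy only
    if remote:
        record.update(remote)  # live data supplements the stored fields
    return record


# Hypothetical warehouse content and remote services, for illustration only.
example_store = {"PA0001": {"name": "example protein", "ec": "1.1.1.1"}}


def example_live_service(key):
    return {"km": "0.5 mM"}


def example_failing_service(key):
    raise ConnectionError("service unreachable")


merged = get_record("PA0001", example_store, example_live_service)
fallback = get_record("PA0001", example_store, example_failing_service)
```

The local lookup keeps page generation fast, while the remote call adds whatever the external database currently holds at the time of the request.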
| AVAILABILITY |
|---|
Currently, the data of SYSTOMONAS along with its visualization tools and web services can be accessed freely via a web-based user interface (http://www.systomonas.de). Additional information, such as kinetic data, operon structures or transcriptomics data, is retrieved on the fly via web services from PRODORIC, BRENDA and KEGG (Table 2). We provide SBML-formatted (27) files of our metabolic and gene-regulatory networks for download, along with a database copy, at the SYSTOMONAS website.
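A downloaded SBML file can be inspected with standard XML tooling. The following sketch lists the identifiers of species and reactions using only the Python standard library. The embedded document is a toy fragment written for this example, not a validated SBML model and not an excerpt from a SYSTOMONAS download.

```python
import xml.etree.ElementTree as ET

# Toy fragment for illustration only; real files contain many more attributes.
EXAMPLE_SBML = """<sbml xmlns="http://www.sbml.org/sbml/level2" level="2" version="1">
  <model id="toy">
    <listOfSpecies>
      <species id="A" compartment="cell"/>
      <species id="B" compartment="cell"/>
    </listOfSpecies>
    <listOfReactions>
      <reaction id="R1"/>
    </listOfReactions>
  </model>
</sbml>"""


def list_sbml_ids(sbml_text, element_name):
    """Return the id attributes of all elements with the given local tag name."""
    root = ET.fromstring(sbml_text)
    ids = []
    for element in root.iter():
        local_name = element.tag.rsplit("}", 1)[-1]  # drop any "{namespace}" prefix
        if local_name == element_name and "id" in element.attrib:
            ids.append(element.attrib["id"])
    return ids


species_ids = list_sbml_ids(EXAMPLE_SBML, "species")
reaction_ids = list_sbml_ids(EXAMPLE_SBML, "reaction")
```

Matching on the local tag name keeps the helper independent of the exact SBML level and version declared in the namespace.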
| ACKNOWLEDGEMENTS |
|---|
The authors are much obliged to Frank Klawonn for advising in the statistical part of metabolomics comparison. Many thanks go to Mathias Krull and Barbara Schulz, who proof-read this paper. This work was funded by the German Bundesministerium für Bildung und Forschung (BMBF) for the National Genome Research Network (NGFN2-EP, grant no. 0313398A), BMBF for the Bioinformatics Competence Center Intergenomics (Grant No. 031U110A/031U210A) and the Volkswagen Foundation.
Conflict of interest statement. None declared.
| REFERENCES |
|---|
1. Pinney, J.W., Shirley, M.W., McConkey, G.A., Westhead, D.R. (2005) metaSHARK: software for automated metabolic network prediction from DNA sequence and its application to the genomes of Plasmodium falciparum and Eimeria tenella. Nucleic Acids Res., 33, 1399–1409.
2. Karp, P.D., Paley, S., Romero, P. (2002) The Pathway Tools software. Bioinformatics, 18, S225–S232.
3. Goesmann, A., Haubrock, M., Meyer, F., Kalinowski, J., Giegerich, R. (2002) PathFinder: reconstruction and dynamic visualization of metabolic pathways. Bioinformatics, 18, 124–129.
4. Münch, R., Hiller, K., Grote, A., Scheer, M., Klein, J., Schobert, M., Jahn, D. (2005) Virtual Footprint and PRODORIC: an integrative framework for regulon prediction in prokaryotes. Bioinformatics, 21, 4187–4189.
5. Karp, P.D., Ouzounis, C.A., Moore-Kochlacs, C., Goldovsky, L., Kaipa, P., Ahrén, D., Tsoka, S., Darzentas, N., Kunin, V., López-Bigas, N. (2005) Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res., 33, 6083–6089.
6. Salgado, H., Gama-Castro, S., Peralta-Gil, M., Díaz-Peredo, E., Sánchez-Solano, F., Santos-Zavaleta, A., Martínez-Flores, I., Jiménez-Jacinto, V., Bonavides-Martínez, C., Segura-Salazar, J., et al. (2006) RegulonDB (version 5.0): Escherichia coli K-12 transcriptional regulatory network, operon organization and growth conditions. Nucleic Acids Res., 34, D394–D397.
7. Münch, R., Hiller, K., Barg, H., Heldt, D., Linz, S., Wingender, E., Jahn, D. (2003) PRODORIC: prokaryotic database of gene regulation. Nucleic Acids Res., 31, 266–269.
8. Schomburg, I., Chang, A., Ebeling, C., Gremse, M., Heldt, C., Huhn, G., Schomburg, D. (2004) BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Res., 32, D431–D433.
9. Kanehisa, M., Goto, S., Hattori, M., Aoki-Kinoshita, K.F., Itoh, M., Kawashima, S., Katayama, T., Araki, M., Hirakawa, M. (2006) From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res., 34, D354–D357.
10. Romero, P. and Karp, P. (2003) PseudoCyc, a pathway-genome database for Pseudomonas aeruginosa. J. Mol. Microbiol. Biotechnol., 5, 230–239.
11. Ellis, L.B.M., Roe, D., Wackett, L.P. (2006) The University of Minnesota Biocatalysis/Biodegradation Database: the first decade. Nucleic Acids Res., 34, D517–D521.
12. Lyczak, J.B., Cannon, C.L., Pier, G.B. (2000) Establishment of Pseudomonas aeruginosa infection: lessons from a versatile opportunist. Microbes Infect., 2, 1051–1060.
13. Schreiber, K., Bös, N., Eschbach, M., Jänsch, L., Wehland, J., Bjarnsholt, T., Givskov, M., Hentzer, M., Schobert, M. (2006) Anaerobic survival of Pseudomonas aeruginosa by pyruvate fermentation requires an Usp-type stress protein. J. Bacteriol., 188, 659–668.
14. Stein, L.D. (2003) Integrating biological databases. Nature Rev. Genet., 4, 337–345.
15. Stein, L.D. (2002) Creating a bioinformatics nation. Nature, 417, 119–120.
16. Winsor, G.L., Lo, R., Sui, S.J.H., Ung, K.S.E., Huang, S., Cheng, D., Ching, W.-K.H., Hancock, R.E.W., Brinkman, F.S.L. (2005) Pseudomonas aeruginosa Genome Database and PseudoCAP: facilitating community-based, continually updated, genome annotation. Nucleic Acids Res., 33, D338–D343.
17. Bairoch, A. (2000) The ENZYME database in 2000. Nucleic Acids Res., 28, 304–305.
18. McGinnis, S. and Madden, T.L. (2004) BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res., 32, W20–W25.
19. Rice, P., Longden, I., Bleasby, A. (2000) EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet., 16, 276–277.
20. Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res., 32, 1792–1797.
21. Clamp, M., Cuff, J., Searle, S.M., Barton, G.J. (2004) The Jalview Java alignment editor. Bioinformatics, 20, 426–427.
22. Gansner, E.R. and North, S.C. (2000) An open graph visualization system and its applications to software engineering. Softw. Pract. Exper., 30, 1203–1233.
23. Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Wheeler, D.L. (2006) GenBank. Nucleic Acids Res., 34, D16–D20.
24. Okubo, K., Sugawara, H., Gojobori, T., Tateno, Y. (2006) DDBJ in preparation for overview of research activities behind data submissions. Nucleic Acids Res., 34, D6–D9.
25. Cochrane, G., Aldebert, P., Althorpe, N., Andersson, M., Baker, W., Baldwin, A., Bates, K., Bhattacharyya, S., Browne, P., van den Broek, A., et al. (2006) EMBL Nucleotide Sequence Database: developments in 2005. Nucleic Acids Res., 34, D10–D15.
26. Shah, S.P., Huang, Y., Xu, T., Yuen, M.M.S., Ling, J., Ouellette, B.F.F. (2005) Atlas – a data warehouse for integrative bioinformatics. BMC Bioinformatics, 6, 34.
27. Finney, A. and Hucka, M. (2003) Systems biology markup language: level 2 and beyond. Biochem. Soc. Trans., 31, 1472–1473.