Nucleic Acids Research, 2003, Vol. 31, No. 1 266-269
© 2003 Oxford University Press
PRODORIC: prokaryotic database of gene regulation
1 Institut für Mikrobiologie, Technische Universität Braunschweig, Spielmannstrasse 7, D-38106 Braunschweig, Germany 2 Gesellschaft für Biotechnologische Forschung mbH, Mascheroder Weg 1, D-38124 Braunschweig, Germany 3 BIOBASE GmbH, Halchtersche Strasse 33, D-38304 Wolfenbüttel, Germany
*To whom correspondence should be addressed. Tel: +49 5313915801; Fax: +49 5313915854; Email: d.jahn@tu-bs.de
Received August 14, 2002; Revised September 4, 2002; Accepted September 12, 2002
ABSTRACT
The database PRODORIC aims to systematically organize information on prokaryotic gene expression, and to integrate this information into regulatory networks. The present version focuses on pathogenic bacteria such as Pseudomonas aeruginosa. PRODORIC links data on environmental stimuli with trans-acting transcription factors, cis-acting promoter elements and regulon definition. Interactive graphical representations of operon, gene and promoter structures including regulator-binding sites, transcriptional and translational start sites, supplemented with information on regulatory proteins are available at varying levels of detail. The data collection provided is based on exhaustive analyses of scientific literature and computational sequence prediction. Included within PRODORIC are tools to define and predict regulator binding sites. It is accessible at http://prodoric.tu-bs.de.
INTRODUCTION
The last decade has witnessed the successful completion of numerous bacterial genome-sequencing projects accompanied by their detailed annotation. Specialized databases on such widely studied model organisms as Escherichia coli (1,2) and Bacillus subtilis (3,4), amongst others, reflect the added understanding of gene structure, expression and regulation. One central future target of bioinformatics is the integration of these data into regulatory networks. As yet, such integrated data and interpretative software are not widely available. For the future understanding of the fine-tuned interaction between a bacterial pathogen and its host in particular, it is necessary to store the existing knowledge in a structured database and to develop tools for modeling.
We hereby describe a universal, genome-based database that describes and depicts prokaryotic gene regulation in great detail with a special emphasis on pathogenic bacteria. For this purpose, DNA binding sites of transcriptional regulators have been correlated with information on interacting proteins, promoter structures, operon and regulon organization by screening the original literature. The information on pathogenic bacteria is mapped onto their complete genomes. We additionally provide a set of tools to predict DNA binding sites and to graphically depict genomic data.
Users are asked to cite this article when results were obtained using the database or tools therein.
STRUCTURE OF THE DATABASE
PRODORIC uses a relational database model. A modified TRANSFAC database structure (5) was gradually adapted to bacterial requirements (6). Figure 1 schematically depicts the relational structure of the main tables. The genomic sequence of the organisms included represents the structural backbone of the database. It is stored in a separate, numeric table. All DNA sequences described are linked to a fixed position within these genomes. Other genome-based tables such as Gene, Transcriptional Unit, Promoter and Binding Site similarly refer to fixed numeric positions in the appropriate genome. DNA sequence information, including ORF annotation and gene nomenclature, conforms to the standard style of the NCBI server (http://www.ncbi.nlm.nih.gov/PMGifs/Genomes/micr.html). The protein table, a major part of PRODORIC, contains functional and structural information on the proteome. The distinctive use of the terms polypeptide and (functional) protein permits the unambiguous discrimination of oligomeric protein complexes or heteromers from their constituent subunits. Proteins are classified according to the cluster of orthologous groups (COG) classification (7). External databases are accessible through a direct link to the SWISS-PROT database (8).
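The idea of anchoring every annotated feature to numeric genome coordinates can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module rather than the database system behind PRODORIC, and all table names, column names and rows are hypothetical stand-ins, not the actual PRODORIC schema or content.

```python
import sqlite3

# Hypothetical, simplified schema; names are illustrative assumptions only.
SCHEMA = """
CREATE TABLE genome (
    genome_id INTEGER PRIMARY KEY,
    organism  TEXT NOT NULL,
    sequence  TEXT NOT NULL
);
CREATE TABLE binding_site (
    site_id   INTEGER PRIMARY KEY,
    genome_id INTEGER NOT NULL REFERENCES genome(genome_id),
    regulator TEXT NOT NULL,
    start_pos INTEGER NOT NULL,   -- 1-based, inclusive
    end_pos   INTEGER NOT NULL    -- 1-based, inclusive
);
"""


def site_sequence(conn, site_id):
    """Recover a binding site's sequence from its genome coordinates."""
    row = conn.execute(
        "SELECT substr(g.sequence, s.start_pos, s.end_pos - s.start_pos + 1) "
        "FROM binding_site AS s JOIN genome AS g ON g.genome_id = s.genome_id "
        "WHERE s.site_id = ?",
        (site_id,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Toy genome: 4 flanking bases, a 14-base site, 3 flanking bases.
toy_genome = "ACGT" + "TTGATCCCCCTCAA" + "GGA"
conn.execute("INSERT INTO genome VALUES (1, 'toy organism', ?)", (toy_genome,))
conn.execute("INSERT INTO binding_site VALUES (1, 1, 'toy regulator', 5, 18)")

print(site_sequence(conn, 1))
```

Because the site stores only coordinates, its sequence is never duplicated; it is always read back from the single genome record.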
To distinguish protein-DNA, protein-protein and other interactions, we employ a central linking table. Numerous aspects of regulatory networks including external stimuli, promoter structures, regulatory factors and related metabolic pathways may thus be freely combined and used to analyse DNA binding sites, regulatory proteins, signal molecules or relevant metabolites. The future incorporation of regulatory circuits from pathogen-host interactions will further improve this powerful tool.
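One way such a central linking table might look is sketched below, again with Python's sqlite3 module. The table layout, the type labels and the rows are assumptions made for illustration; the source does not specify the actual columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical linking table: each row connects two entities of any type.
CREATE TABLE interaction (
    interaction_id INTEGER PRIMARY KEY,
    source_type TEXT NOT NULL,      -- e.g. 'protein', 'stimulus'
    source_id   INTEGER NOT NULL,
    target_type TEXT NOT NULL,      -- e.g. 'binding_site', 'protein'
    target_id   INTEGER NOT NULL,
    effect      TEXT                -- e.g. 'activation', 'repression'
);
""")

# Toy rows only; identifiers do not refer to real database entries.
conn.executemany(
    "INSERT INTO interaction VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "protein", 1, "binding_site", 10, "activation"),
        (2, "protein", 1, "protein", 2, None),
        (3, "stimulus", 5, "protein", 1, None),
        (4, "protein", 2, "binding_site", 11, "repression"),
    ],
)


def interactions_of_kind(conn, source_type, target_type):
    """List (source_id, target_id, effect) for one class of interaction."""
    return conn.execute(
        "SELECT source_id, target_id, effect FROM interaction "
        "WHERE source_type = ? AND target_type = ? ORDER BY interaction_id",
        (source_type, target_type),
    ).fetchall()


protein_dna = interactions_of_kind(conn, "protein", "binding_site")
protein_protein = interactions_of_kind(conn, "protein", "protein")
```

Keeping all interaction classes in one table means a new class (for example stimulus-protein) needs no schema change, only a new pair of type labels.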
At present, we have restricted our annotational work to the structural definition of regulons. Information on established transcription factor-DNA binding site interactions was extended to include genes activated or repressed by these factors. Co-transcribed genes were designated a transcriptional unit. Experimental evidence was included through the Experiment table, with each entry rated by a confidence level to reflect its reliability. In the description of protein-DNA interactions, for example, data from DNase I footprinting experiments were rated more highly than data from reporter gene fusion analyses. As indicated, regulatory DNA motifs (sites) involved in transcription factor binding are linked to a fixed genomic location, extending the annotated genome. To fully describe the promoter structure and transcriptional unit of each operon, this information was combined with features like transcriptional initiation sites and RNA polymerase binding sites and included in a separate promoter table. Where necessary, links to the PubMed database are provided.
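The confidence rating of experimental evidence can be pictured with a small sketch. The numeric ratings, method labels and record fields below are hypothetical; the text states only that footprinting data rank above reporter gene fusion data.

```python
from dataclasses import dataclass

# Hypothetical numeric ratings (higher = more reliable). Only the relative
# order of these two methods is taken from the text.
CONFIDENCE = {
    "dnase_i_footprint": 2,
    "reporter_gene_fusion": 1,
}


@dataclass(frozen=True)
class Evidence:
    site: str        # label of the binding site the experiment supports
    method: str      # key into CONFIDENCE
    reference: str   # placeholder label, not a real literature identifier


def best_evidence(records):
    """Keep, for each site, the record with the highest confidence rating."""
    best = {}
    for rec in records:
        current = best.get(rec.site)
        if current is None or CONFIDENCE[rec.method] > CONFIDENCE[current.method]:
            best[rec.site] = rec
    return best


records = [
    Evidence("site-1", "reporter_gene_fusion", "ref-A"),
    Evidence("site-1", "dnase_i_footprint", "ref-B"),
    Evidence("site-2", "reporter_gene_fusion", "ref-C"),
]
summary = best_evidence(records)
```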
WORLD WIDE WEB INTERFACE AND ANALYSIS TOOLS
PRODORIC is available through a web interface (http://prodoric.tu-bs.de). Apart from the features outlined above, the website offers links to other molecular biology databases and bioinformatics tools. Forms and indexes facilitate user queries of the database. User requests are transformed into SQL queries by PHP scripts to generate dynamic web pages (9). The interface provides four forms reflecting the major biological questions: Genes/Operons, Proteins, Binding Sites and Matrices/Consensi. Each contains common search fields such as name, organism and description. Combined searches, use of wildcards and degenerate searches are also possible. Search results are tabulated. Links to external databases are provided where possible. Alternatively, an alphabetical list of regulons is available.
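How form fields with wildcards might be turned into a combined SQL query is sketched below. The site itself uses PHP scripts; this sketch substitutes Python with sqlite3, and the function name, the table, the `*` wildcard syntax and the rows are all assumptions for illustration.

```python
import sqlite3


def build_gene_query(name=None, organism=None):
    """Build a parameterized query from optional form fields.

    A '*' in a field is treated as a wildcard and mapped to SQL '%'.
    Values are passed as parameters, never spliced into the SQL text.
    """
    clauses, params = [], []
    for column, value in (("name", name), ("organism", organism)):
        if value:
            clauses.append(f"{column} LIKE ?")
            params.append(value.replace("*", "%"))
    sql = "SELECT name, organism FROM gene"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY name", params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (name TEXT, organism TEXT)")
# Toy rows, not database content.
conn.executemany(
    "INSERT INTO gene VALUES (?, ?)",
    [("toyA1", "organism X"), ("toyA2", "organism Y"), ("otherB", "organism X")],
)

sql, params = build_gene_query(name="toyA*", organism="*Y")
rows = conn.execute(sql, params).fetchall()
```

Leaving a field empty simply drops its clause, which is how a single form can serve both simple and combined searches.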
We have developed a new genome browser tool to provide an accessible overview of entire bacterial genomes. A subset of the genome may be displayed either as a schematic map in its genomic context or as a formatted, colour-coded sequence. In addition to the overall operon structure, the schematic map highlights sequence motifs and promoter information. Borders between coding and non-coding regions as well as transcription factor binding sites and other DNA sequences are clearly visible. Individual sequences may be exported by copy and paste. Navigation through the application is possible through a context-sensitive control panel.
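The formatted sequence view can be approximated in plain text. The sketch below is not the browser's implementation: in place of colour coding it assumes upper case for coding bases, lower case for non-coding bases and square brackets around binding sites, and the function name and coordinates are hypothetical.

```python
def render_region(sequence, start, end, coding, sites):
    """Format sequence[start..end] (1-based, inclusive) as marked-up text.

    coding and sites are lists of (first, last) genome coordinates.
    Coding bases are upper-cased, non-coding bases lower-cased, and runs of
    binding-site bases are wrapped in square brackets.
    """
    def covered(pos, intervals):
        return any(first <= pos <= last for first, last in intervals)

    out = []
    in_site = False
    for pos in range(start, end + 1):
        base = sequence[pos - 1]
        base = base.upper() if covered(pos, coding) else base.lower()
        site_here = covered(pos, sites)
        if site_here and not in_site:
            out.append("[")
        elif in_site and not site_here:
            out.append("]")
        out.append(base)
        in_site = site_here
    if in_site:
        out.append("]")
    return "".join(out)


# Toy region: a binding site at positions 2-4, coding sequence from 7 to 12.
view = render_region("AAACCCGGGTTT", 1, 12, coding=[(7, 12)], sites=[(2, 4)])
```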
Regulator binding sites in promoter sequences may be predicted by scanning the sequence for putative binding motifs using a weight matrix representation. Alignment of binding sites allows the nucleotide distribution for every position to be determined (Fig. 2). The resulting distribution is used to locate additional DNA binding sites conforming to a user-defined threshold using our software tool Matrix Search, which is based on published algorithms (10,11). Where possible, we have defined our own weight matrices for transcription factor binding sites. For several bacterial transcriptional regulators, only a single or a few individual DNA binding sites are known. Here a matrix representation is meaningless. The number of DNA sequences for matrix definition may be increased using orthologous binding sites of analogous regulators from other bacteria, though slight differences in the binding consensi would introduce significant ambiguity. For example, alignments of 15, 8 and 7 sequences from E. coli, Pseudomonas aeruginosa and B. subtilis, respectively, indicate that the binding site of the fumarate and nitrate reduction regulatory protein (Fnr, Fig. 2) is clearly different in these organisms. The binding consensi in IUPAC 15-letter code are TTGMYNNNNRTCAR (E. coli), TTGATNNNNNWCAA (P. aeruginosa), and TGTGANNNNNNTCACA (B. subtilis). Our tool Consensus Search, which uses simple IUPAC strings to define a binding motif consensus sequence, is useful for those cases where only a few binding sequences are known. We provide a large library of weight matrices and binding consensi for bacterial transcription regulators. User-defined matrices and consensi may also be used.
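Both search strategies can be sketched in a few lines. This is not the code of Matrix Search or Consensus Search: the matrix part assumes a log-odds score with a pseudocount and a uniform background, which may differ from the scoring in the cited algorithms (10,11), and all function names and toy sequences are hypothetical. The P. aeruginosa consensus is taken from the text.

```python
import math
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def consensus_search(consensus, sequence):
    """Return 0-based offsets of all matches, including overlapping ones."""
    body = "".join(f"[{IUPAC[code]}]" for code in consensus.upper())
    pattern = re.compile(f"(?=({body}))")  # lookahead permits overlaps
    return [m.start() for m in pattern.finditer(sequence.upper())]


def weight_matrix(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds matrix (one dict per position) from aligned sites."""
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("aligned sites must have equal length")
    total = len(sites) + 4 * pseudocount
    matrix = []
    for i in range(width):
        column = [site[i].upper() for site in sites]
        matrix.append({
            base: math.log2((column.count(base) + pseudocount) / total / background)
            for base in "ACGT"
        })
    return matrix


def matrix_search(matrix, sequence, threshold):
    """Return (offset, score) for every window scoring >= threshold."""
    seq = sequence.upper()
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases
        score = sum(column[base] for column, base in zip(matrix, window))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy alignment and toy target sequences, not real binding sites.
toy_matrix = weight_matrix(["TTGAT", "TTGAT", "TTGAC"])
matrix_hits = matrix_search(toy_matrix, "AATTGATAA", threshold=0.0)

fnr_pa = "TTGATNNNNNWCAA"  # P. aeruginosa Fnr consensus quoted in the text
toy_target = "GG" + "TTGAT" + "C" * fnr_pa.count("N") + "TCAA" + "GG"
consensus_hits = consensus_search(fnr_pa, toy_target)
```

The contrast matches the argument in the text: the matrix needs enough aligned sites to estimate per-position frequencies, whereas the consensus string can be written down from a handful of known sites.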
DATA CONTENT
We currently provide annotated genomes of five organisms. The data for the pathogenic bacteria P. aeruginosa, Listeria monocytogenes and Helicobacter pylori are most advanced, while those for E. coli and B. subtilis were collected essentially to verify the tools provided. The data are accessible through web forms and the genome browser. PRODORIC is designed to be easily extensible, allowing the facile incorporation of any microbial genome. The total number of entries is summarized in Table 1 (release September 2002), though this is expected to double by the time of publishing.
FUTURE PROSPECTS
Extensive screening of relevant literature is currently in progress to complete an up-to-date annotation of the P. aeruginosa genome. Other pathogenic bacteria are soon to follow. Additional transcription factor sites predicted by weight matrix searches will further extend the data. We have begun to include data provided by high throughput genomics and proteomics analyses to complement the published experimental data. We will shortly implement additional software tools and interconnected data tables containing information on signal transduction cascades and metabolic networks. The final step will be the description of pathogen-host interactions. We, therefore, intend to link our database to databases of eukaryotic regulators including TRANSFAC (5) and TRANSPATH (12).
ACKNOWLEDGEMENTS
We would like to thank our colleagues Dirk Budke, Christoph Grunwald, Jörgen Haneke, Claudia Hundertmark and Johannes Klein for support in data annotation, literature scanning and programming, and Dr Wolf-Dieter Schubert for critical reading of the manuscript. This work was funded by the German Bundesministerium für Bildung und Forschung (BMBF) for the bioinformatics competence center Intergenomics (grant no. 031U110A/031U210A).
REFERENCES
1. Salgado,H., Santos-Zavaleta,A., Gama-Castro,S., Millan-Zarate,D., Diaz-Peredo,E., Sanchez-Solano,F., Perez-Rueda,E., Bonavides-Martinez,C. and Collado-Vides,J. (2001) RegulonDB (version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res., 29, 72-74.
2. Robison,K., McGuire,A.M. and Church,G.M. (1998) A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 Genome. J. Mol. Biol., 284, 241-254.
3. Moszer,I., Jones,L.M., Moreira,S., Fabry,C. and Danchin,A. (2002) Subtilist: the reference database for the Bacillus subtilis genome. Nucleic Acids Res., 30, 62-65.
4. Ishii,T., Yoshida,K., Terai,G., Fujita,Y. and Nakai,K. (2001) DBTBS: a database of Bacillus subtilis promoters and transcription factors. Nucleic Acids Res., 29, 278-280.
5. Wingender,E., Chen,X., Fricke,E., Geffers,R., Hehl,R., Liebich,I., Krull,M., Matys,V., Michael,H., Ohnhäuser,R., Prüß,M., Schacherer,F., Thiele,S. and Urbach,S. (2001) The TRANSFAC system on gene expression regulation. Nucleic Acids Res., 29, 281-283.
6. Falb,M. (2001) Etablierung einer Datenbank für prokaryontische Transcriptionsfaktoren (TRANSFACmicro). Diploma thesis, Technical University Braunschweig.
7. Tatusov,R.L., Natale,D.A., Garkavtsev,I.V., Tatusova,T.A., Shankavaram,U.T., Rao,B.S., Kiryutin,B., Galperin,M.Y., Fedorova,N.D. and Koonin,E.V. (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res., 29, 22-28.
8. Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45-48.
9. Stoll,R.D. and Leierer,G.A. (2001) PHP 4+MySQL. Data Becker, Düsseldorf.
10. Schneider,T.D., Stormo,G.D., Gold,L. and Ehrenfeucht,A. (1986) Information content of binding sites on nucleotide sequences. J. Mol. Biol., 188, 415-431.
11. Quandt,K., Frech,K., Karas,H., Wingender,E. and Werner,T. (1995) MatInd and MatInspector: new fast and versatile tools for detection of consensus matches in nucleotide sequence data. Nucleic Acids Res., 23, 4878-4884.
12. Schacherer,F., Choi,C., Goetze,U., Krull,M., Pistor,S. and Wingender,E. (2001) The TRANSPATH signal transduction database: a knowledge base on signal transduction networks. Bioinformatics, 17, 1053-1057.