PredictRegulon: a web server for the prediction of the regulatory protein binding sites and operons in prokaryote genomes
1 Computational & Functional Genomics Group and 2 Molecular Oncology Laboratory, Centre for DNA Fingerprinting and Diagnostics, EMBnet India Node, Hyderabad 500076, India
* To whom correspondence should be addressed. Tel: +9140 27171454; Fax: +9140 27155610; Email: akash{at}cdfd.org.in
Received February 15, 2004; Revised and Accepted March 4, 2004
| ABSTRACT |
|---|
An interactive web server has been developed for predicting the potential binding sites and target operons of a given regulatory protein in prokaryotic genomes. The program allows users to submit known or experimentally determined binding sites of a regulatory protein as an ungapped multiple sequence alignment. It analyses the upstream regions of all genes in a user-selected prokaryote genome and returns the potential binding sites along with the downstream co-regulated genes (operons). The known binding sites of a regulatory protein can also be used to identify orthologous binding sites in phylogenetically related genomes where the trans-acting regulator protein and cognate cis-acting DNA sequences could be conserved. PredictRegulon can be freely accessed from a link on our world wide web server: http://www.cdfd.org.in/predictregulon/.
| INTRODUCTION |
|---|
With over 100 bacterial genomes sequenced, a key challenge of post-genomic research is to dissect the complex transcription regulatory network that controls the metabolic and physiological processes of a cell. A first step towards this goal is to identify the genes within a genome that are controlled by a specific transcription regulatory protein. This paper describes a web server tool, PredictRegulon, for genome-wide prediction of potential binding sites and target operons of a regulatory protein for which a few experimentally identified binding sites are known. This technique can use the available experimental data on binding sites of transcription regulatory proteins from various bacterial species (1–3) to identify regulons in phylogenetically related species.
| PREDICTREGULON METHOD |
|---|
The program, PredictRegulon, first constructs the binding site recognition profile based on ungapped multiple sequence alignment of known binding sites. This profile is calculated using Shannon's positional relative entropy approach (4). The positional relative entropy Qi at position i in a binding site is defined as
[Equations defining Qi and Wb,i appeared as images in the original and are not available here.]
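For orientation, the standard relative-entropy form that matches the description above can be written as follows. This is a hedged reconstruction, not the authors' typeset equations: the symbols $f_{b,i}$ (observed frequency of base $b$ at position $i$ of the aligned sites) and $p_b$ (background frequency of base $b$ in the genome) are notation assumed here, and the weighting $W_{b,i} = f_{b,i}\,Q_i$ is an assumption about how each base's contribution is derived from $Q_i$.

```latex
Q_i = \sum_{b \in \{A,C,G,T\}} f_{b,i}\,\log\frac{f_{b,i}}{p_b},
\qquad
W_{b,i} = f_{b,i}\,Q_i
```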
The profile, encoded as a matrix, is used to scan the upstream sequences of all the genes of the user-selected genome. The entropy score of each site is calculated as the sum of the respective positional nucleotide entropies (Wb,i). A maximally scoring site is selected from the upstream sequence of each gene. The score may represent the strength of interaction between the regulatory protein and the binding site (5). The lowest score among the input sites is taken as the cut-off score. Sites scoring higher than the cut-off value are reported as potential binding sites conforming to the consensus profile.
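The scanning procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the server's implementation: the function names are hypothetical, the weights use the assumed form W(b,i) = f(b,i) · Q(i), a pseudocount and a uniform background are added for convenience, only the given strand is scanned, and `>=` is used at the cut-off so that the weakest input site would itself still be reported.

```python
import math

BASES = "ACGT"


def build_profile(sites, background=None, pseudocount=0.5):
    """Return per-position weights W[i][b] from an ungapped alignment."""
    # Assumption: uniform background unless genome base frequencies are given.
    background = background or {b: 0.25 for b in BASES}
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("sites must form an ungapped alignment of equal length")
    profile = []
    for i in range(length):
        column = [s[i].upper() for s in sites]
        total = len(column) + pseudocount * len(BASES)
        # Pseudocounts (an assumption) keep every frequency above zero.
        freq = {b: (column.count(b) + pseudocount) / total for b in BASES}
        # Positional relative entropy Q_i (Kullback-Leibler form).
        q_i = sum(freq[b] * math.log2(freq[b] / background[b]) for b in BASES)
        # Assumed weighting: W_{b,i} = f_{b,i} * Q_i.
        profile.append({b: freq[b] * q_i for b in BASES})
    return profile


def score_site(profile, site):
    """Sum the positional weights of the bases in a candidate site."""
    # Unknown characters such as N contribute nothing to the score.
    return sum(profile[i].get(base, 0.0) for i, base in enumerate(site.upper()))


def cutoff_score(profile, sites):
    """The lowest score among the input sites serves as the cut-off."""
    return min(score_site(profile, s) for s in sites)


def best_site(profile, upstream):
    """Return (site, score) for the maximally scoring window, or None."""
    width = len(profile)
    windows = [upstream[k:k + width] for k in range(len(upstream) - width + 1)]
    if not windows:
        return None
    best = max(windows, key=lambda w: score_site(profile, w))
    return best, score_site(profile, best)


def scan_genome(profile, sites, upstream_by_gene):
    """Report, per gene, the best upstream site if it reaches the cut-off."""
    cutoff = cutoff_score(profile, sites)
    hits = {}
    for gene, upstream in upstream_by_gene.items():
        result = best_site(profile, upstream)
        if result is not None and result[1] >= cutoff:
            hits[gene] = result
    return hits
```

A call such as `scan_genome(profile, sites, {"geneA": "TTACGTTT"})` returns a dictionary mapping each gene that passes the cut-off to its best site and score. A fuller version would presumably also scan the reverse complement of each upstream region.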
Co-directionally transcribed genes downstream of a predicted binding site are selected as potential co-regulated genes (operons) if they meet any one of the following criteria: (i) they form co-directionally transcribed orthologous gene pairs conserved in at least three genomes (6); (ii) they belong to the same cluster of orthologous groups (COG) functional category and the intergenic distance is <200 bp (7); (iii) the first three letters of their gene names are identical (gene names for all bacterial species were assigned using the COG annotation); (iv) the intergenic distance is <90 bp (8).
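The four operon criteria can be illustrated with a short Python sketch. The `Gene` record, the function names and the `conserved_pairs` input (a precomputed set of gene pairs found co-directional in at least three genomes) are hypothetical and introduced only for illustration; how the server computes them is not described here.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Gene:
    name: str                 # COG-derived gene name, e.g. "recA"
    start: int                # leftmost coordinate
    end: int                  # rightmost coordinate
    strand: str               # "+" or "-"
    cog_category: Optional[str] = None


def same_operon(up: Gene, down: Gene,
                conserved_pairs: FrozenSet[FrozenSet[str]]) -> bool:
    """Apply criteria (i)-(iv); any single criterion is sufficient."""
    if up.strand != down.strand:
        return False  # only co-directionally transcribed genes qualify
    # Intergenic distance in bp; negative when the genes overlap.
    gap = max(up.start, down.start) - min(up.end, down.end) - 1
    if frozenset((up.name, down.name)) in conserved_pairs:
        return True   # (i) conserved co-directional orthologous pair
    if up.cog_category and up.cog_category == down.cog_category and gap < 200:
        return True   # (ii) same COG functional category, <200 bp apart
    if len(up.name) >= 3 and up.name[:3] == down.name[:3]:
        return True   # (iii) identical first three letters of gene names
    return gap < 90   # (iv) short intergenic distance


def predict_operon(genes: List[Gene], first: int,
                   conserved_pairs: FrozenSet[FrozenSet[str]] = frozenset()
                   ) -> List[Gene]:
    """Extend an operon downstream from genes[first].

    `genes` must be sorted by coordinate; downstream means increasing
    index on the "+" strand and decreasing index on the "-" strand.
    """
    step = 1 if genes[first].strand == "+" else -1
    operon = [genes[first]]
    j = first + step
    while 0 <= j < len(genes) and same_operon(operon[-1], genes[j], conserved_pairs):
        operon.append(genes[j])
        j += step
    return operon
```

Here `first` is the index of the gene immediately downstream of a predicted binding site; the walk stops at the first neighbouring gene that satisfies none of the criteria.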
This method has two specific requirements: a few experimentally determined regulatory protein binding sites should be available for developing the binding site recognition profile, and the profile should be applicable to the genome where the regulator or its homologue is present. In the absence of any experimental information on the regulatory sites in a given genome, one may look up the known regulatory motifs from other related species in one of the four online databases that host information about known transcription regulatory protein binding sites in prokaryote genomes (1–3).
A limitation of this approach is that it may predict a few false positive sites as candidates. However, this limitation can be overcome by experimental validation, either by in vitro binding studies with the regulatory protein and double-stranded oligonucleotides containing the binding sites (designed on the basis of the prediction) or by real-time PCR analysis of candidate co-regulated genes.
| EXAMPLE: PREDICTION OF LEXA REGULON IN MYCOBACTERIUM TUBERCULOSIS |
|---|
To demonstrate a typical usage of PredictRegulon, we predicted the LexA binding sites and LexA regulon of M.tuberculosis using the LexA binding sites of Bacillus subtilis. LexA regulators from B.subtilis and M.tuberculosis share a high sequence identity (45%) at the protein level (data not shown). Table 1 lists the known LexA binding sites from B.subtilis given as input to the program (2) and Table 2 shows the output of predicted LexA binding sites in M.tuberculosis. The site column in Table 2 represents the predicted binding sites of LexA in M.tuberculosis. In a typical output, perfect matches to the known binding sites and their downstream genes are highlighted with a yellow background, and the remaining sites scoring above the cut-off are shown with a blue background (colours not shown in the table). Eighteen of these genes (indicated by a) belonging to the LexA regulon were also observed in experimental data obtained by others (9–12). The rest of the matches are potential novel regulatory sites which could be confirmed experimentally.
The web output of PredictRegulon also contains the hyperlinked gene synonym and COG number. A click on the former shows the predicted operon context of the regulatory motif, while a click on the latter opens a new page showing a description of this gene in the NCBI Conserved Domain Database, which is in turn linked to PubMed for published information on this gene. These additional links provide users with a simple way to browse and understand the functional and physiological implications of the genes that are part of the predicted regulon.
| Notes |
|---|
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated.
| REFERENCES |
|---|
1. Salgado,H., Santos-Zavaleta,A., Gama-Castro,S., Millan-Zarate,D., Diaz-Peredo,E., Sanchez-Solano,F., Perez-Rueda,E., Bonavides-Martinez,C. and Collado-Vides,J. (2001) RegulonDB (Version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Res., 29, 72–74.
2. Munch,R., Hiller,K., Barg,H., Heldt,D., Linz,S., Wingender,E. and Jahn,D. (2003) PRODORIC: prokaryotic database of gene regulation. Nucleic Acids Res., 31, 266–269.
3. Ishii,T., Yoshida,K., Terai,G., Fujita,Y. and Nakai,K. (2001) DBTBS: a database of Bacillus subtilis promoters and transcription factors. Nucleic Acids Res., 29, 278–280.
4. Shannon,C.E. (1948) A mathematical theory of communication. Bell Sys. Tech. J., 27, 379–423 and 623–656.
5. Benos,P.V., Bulyk,M.L. and Stormo,G.D. (2002) Additivity in protein-DNA interactions: how good an approximation is it? Nucleic Acids Res., 30, 4442–4451.
6. Ermolaeva,M.D., White,O. and Salzberg,S.L. (2001) Prediction of operons in microbial genomes. Nucleic Acids Res., 29, 1216–1221.
7. Salgado,H., Moreno-Hagelsieb,G., Smith,T.F. and Collado-Vides,J. (2000) Operons in Escherichia coli: genomic analyses and predictions. Proc. Natl Acad. Sci. USA, 97, 6652–6657.
8. Strong,M., Mallick,P., Pellegrini,M., Thompson,M.J. and Eisenberg,D. (2003) Inference of protein function and protein linkages in Mycobacterium tuberculosis based on prokaryotic genome organization: a combined computational approach. Genome Biol., 4, R59.
9. Durbach,S.I., Andersen,S.J. and Mizrahi,V. (1997) SOS induction in mycobacteria: analysis of the DNA-binding activity of a LexA-like repressor and its role in DNA damage induction of the recA gene from Mycobacterium smegmatis. Mol. Microbiol., 26, 643–653.
10. Brooks,P.C., Movahedzadeh,F. and Davis,E.O. (2001) Identification of some DNA damage-inducible genes of Mycobacterium tuberculosis: apparent lack of correlation with LexA binding. J. Bacteriol., 183, 4459–4467.
11. Dullaghan,E.M., Brooks,P.C. and Davis,E.O. (2002) The role of multiple SOS boxes upstream of the Mycobacterium tuberculosis lexA gene – identification of a novel DNA-damage-inducible gene. Microbiology, 148, 3609–3615.
12. Boshoff,H.I., Reed,M.B., Barry,C.E. and Mizrahi,V. (2003) DNAE2 polymerase contributes to in vivo survival and the emergence of drug resistance in Mycobacterium tuberculosis. Cell, 113, 183–193.