Nucleic Acids Research
Computer analysis of transcription regulatory patterns in completely sequenced bacterial genomes
Introduction
Materials And Methods
Results
Identification of candidate regulator-binding sites
Evolution of regulons
Discussion
Acknowledgements
References
Received November 18, 1998; Revised March 12, 1999; Accepted May 26, 1999
ABSTRACT Recognition of transcription regulation sites (operators) is a hard problem in computational molecular biology. In most cases, small sample size and low degree of sequence conservation preclude the construction of reliable recognition rules. We suggest an approach to this problem based on simultaneous analysis of several related genomes. It appears that as long as a gene coding for a transcription regulator is conserved in the compared bacterial genomes, the regulation of the respective group of genes (regulon) also tends to be maintained. Thus a gene can be confidently predicted to belong to a particular regulon if not only the gene itself but also its orthologs in other genomes have candidate operators in their regulatory regions. This provides greater sensitivity of operator identification, as even relatively weak signals are likely to be functionally relevant when conserved. We use this approach to analyze the purine (PurR), arginine (ArgR) and aromatic amino acid (TrpR and TyrR) regulons of Escherichia coli and Haemophilus influenzae. Candidate binding sites in regulatory regions of the respective H.influenzae genes are identified, a new family of purine transport proteins predicted to belong to the PurR regulon is described, and probable regulation of arginine transport by ArgR is demonstrated. Differences in the regulation of some orthologous genes in E.coli and H.influenzae, in particular the apparent lack of autoregulation of the purine repressor gene in H.influenzae, are demonstrated.
INTRODUCTION
With the sequencing of multiple complete bacterial and archaeal genomes, computational biology entered a new era. The availability of the sequences of all genes in several prokaryotic species created the opportunity of perceiving the relationships between prokaryotic genomes in a comprehensive and precise fashion, which was unattainable previously. Initially, the main efforts were directed at large-scale comparison of proteomes with the aim of reconstructing the metabolism and other cellular functions in poorly characterized organisms and clarifying distant evolutionary relationships, particularly those between the three primary divisions of life: bacteria, archaea and eukaryotes (1-4). One unexpected result that became immediately obvious was the lack of long-range conservation of the gene order in bacterial genomes, with the exception of species within the same genus (5-7). In fact, in distantly related bacteria, such as Proteobacteria and Cyanobacteria, there are only a few conserved operons that encode primarily, if not exclusively, genes whose products physically interact (8). At intermediate phylogenetic distances, however, for example in Escherichia coli and Haemophilus influenzae, a large number of operons are conserved, although their order is not (8,9).
An important further step in the functional annotation of genomes is the identification of regulatory signals, particularly binding sites for transcription factors. Although the problem of prediction of regulatory sites has been addressed for over 15 years (reviewed in 10), it is still far from being solved (11). One reason for this is that the learning sample rarely contains more than 20-30 sites. However, even for large samples, it has proved extremely difficult to construct a good recognition rule. The physics of protein-DNA interaction is poorly understood, making it virtually impossible to derive a proper set of features for statistical or pattern recognition algorithms. Furthermore, the latter type of algorithms cannot take into account context effects, in particular, interactions between different regulatory sites, and structural properties of DNA. Nevertheless, in many cases, simple profile methods perform reasonably well, in the sense that they can correctly identify true sites if the number of alternatives is not too large (for benchmarking of several of the most popular algorithms, see 12).
Good results in computer-assisted functional annotation of nucleotide sequences frequently have been obtained by combination of statistical analysis of DNA and comparative analysis of the protein sequences encoded by the respective genes. To a varying extent, this approach is used in the analysis of all genomic sequences. In more systematic efforts, it was employed in the construction of reliable gene recognition algorithms (13-15) and in the prediction of the specificity of new restriction-modification systems (16). Here we apply this methodology to the analysis of bacterial transcription regulation in the context of a comparison of complete genomes.
The approach is based on the assumption that groups of genes subject to a specific mode of regulation (regulons) are at least partially conserved in evolution. This assumption generally seems to hold provided that the cognate regulatory factor is present in all compared genomes. Preliminary analyses have shown that in these cases, the regulatory signal also is conserved, and accordingly, a recognition rule derived for the most thoroughly studied genome can also be applied to other genomes (17). Under this approach, the assignment of a gene to a particular regulon is reinforced if not only this gene itself but also its orthologs in other genomes have candidate regulatory sites in the appropriate regions.
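The reinforcement idea described above can be illustrated with a minimal sketch. It assumes that the best candidate site score for each gene and the list of ortholog pairs have already been computed; the function name, the labels, the toy gene names and the single shared threshold are illustrative assumptions, not part of the original procedure.

```python
def classify_pairs(best_a, best_b, orthologs, threshold):
    """Label each ortholog pair by whether a candidate site is present.

    best_a, best_b: {gene name: best candidate site score} per genome
    (hypothetical precomputed inputs).
    orthologs: list of (gene_in_a, gene_in_b) pairs.
    threshold: score cutoff, assumed here to be the same in both genomes.
    """
    labels = {}
    for gene_a, gene_b in orthologs:
        # A gene without any scored site is treated as having no site.
        in_a = best_a.get(gene_a, float("-inf")) >= threshold
        in_b = best_b.get(gene_b, float("-inf")) >= threshold
        if in_a and in_b:
            labels[(gene_a, gene_b)] = "conserved"
        elif in_a:
            labels[(gene_a, gene_b)] = "site only in first genome"
        elif in_b:
            labels[(gene_a, gene_b)] = "site only in second genome"
        else:
            labels[(gene_a, gene_b)] = "no site"
    return labels


# Toy data (invented for illustration only).
best_a = {"a1": 5.2, "a2": 4.8, "a3": 1.0}
best_b = {"b1": 4.9, "b2": 0.5}
orthologs = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]
labels = classify_pairs(best_a, best_b, orthologs, threshold=4.0)
```

Pairs labelled "conserved" are those for which the regulon assignment is reinforced; the one-sided labels flag candidates for loss or gain of regulation.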
We applied this comparative approach to the analysis of purine, arginine, and aromatic amino acid regulons in E.coli and H.influenzae. Among the completely sequenced genomes, this is a natural choice for the first attempt of such a study since, first, E.coli gene regulation is by far the best understood among all bacteria, and second, H.influenzae is the only complete bacterial genome that is close enough to E.coli so that many operons are conserved but distant enough for significant differences to be apparent. Recognition rules derived from samples of known E.coli regulatory sites were used to predict sites in the H.influenzae genome and to detect likely new members of the three regulons in both species. We describe the general conservation of the three regulons in E.coli and H.influenzae along with differences in the regulation of some of the orthologous genes.
MATERIALS AND METHODS
Complete genome sequences of E.coli (18) and H.influenzae (19) as well as partial sequences from other Proteobacteria were extracted from GenBank.
Three regulons were analyzed; the purine regulon (set of genes regulated by PurR) (20) and the arginine regulon (regulated by ArgR) (21) were considered separately, whereas the genes controlled by TrpR and TyrR were considered to comprise one aromatic amino acid regulon since some of them are subject to regulation by both factors (22). Known E.coli transcription factor binding sites were collected from the literature (20-23). Each site was considered in the orientation that corresponds to the coding strand of the regulated operon. Positional nucleotide weight matrices (profiles) were derived using the following formula for positional nucleotide weights:
$$W(b,k) = \log[N(b,k) + 0.5] - 0.25 \sum_{i \in \{A,C,G,T\}} \log[N(i,k) + 0.5]$$
where N(b,k) is the count of nucleotide b in position k. The site score is the sum of the respective positional nucleotide weights. The base of the logarithm was chosen such that the standard deviation of the site score distribution on random oligomers equals 1. The site score defined by this formula is linearly related to the discrimination energy used in a number of other papers.
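The weight formula and the normalization of the score can be sketched as follows. This is a minimal illustration, not the code of the programs used in this work. It assumes that "random oligomers" means independent, equiprobable nucleotides, and it implements the choice of logarithm base as a rescaling by a constant, which is equivalent. The function names and the toy sites are hypothetical.

```python
import math

BASES = "ACGT"


def count_matrix(sites):
    """N(b,k): count of base b at position k over aligned, equal-length sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    return [{b: sum(1 for s in sites if s[k] == b) for b in BASES}
            for k in range(length)]


def weight_matrix(sites):
    """W(b,k) = log[N(b,k)+0.5] - 0.25 * sum_i log[N(i,k)+0.5], rescaled so
    that the score of a uniformly random oligomer has standard deviation 1."""
    raw = []
    for column in count_matrix(sites):
        logs = {b: math.log(column[b] + 0.5) for b in BASES}
        mean_log = 0.25 * sum(logs.values())
        raw.append({b: logs[b] - mean_log for b in BASES})
    # Each column of W sums to zero, so a random oligomer has mean score 0
    # and variance equal to the sum over positions of the mean squared weight.
    variance = sum(0.25 * sum(w * w for w in col.values()) for col in raw)
    if variance == 0:
        raise ValueError("the sample carries no positional information")
    sd = math.sqrt(variance)
    return [{b: w / sd for b, w in col.items()} for col in raw]


def site_score(matrix, oligomer):
    """Sum of the positional weights of the nucleotides in the oligomer."""
    return sum(col[base] for col, base in zip(matrix, oligomer))


# Toy learning sample (invented for illustration only).
toy_sites = ["ACGT", "ACGA", "ACGT", "TCGT"]
toy_matrix = weight_matrix(toy_sites)
consensus_score = site_score(toy_matrix, "ACGT")
```

Dividing the natural-log weights by the standard deviation corresponds to taking logarithms to the base exp(sd), so the two formulations give the same scores.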
Candidate sites (PUR, ARG, TRP and TYR boxes) were identified in upstream regions of annotated E.coli and H.influenzae genes, including predicted ones. Thresholds and region boundaries in each case were selected so that none of the known sites were missed. Sets of potentially co-regulated genes were constructed from genes that have candidate regulatory sites in their upstream regions and genes that are located downstream of them if they are transcribed in the same direction and the intergenic distances do not exceed a certain threshold (normally 100 nucleotides).
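The construction of sets of potentially co-regulated genes can be sketched as below. This is a hedged illustration: the `Gene` record, its field names and the toy coordinates are assumptions introduced here, and the intergenic distance is taken as the number of nucleotides between the end of one gene and the start of the next.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int              # leftmost coordinate
    end: int                # rightmost coordinate
    strand: str             # "+" or "-"
    has_site: bool = False  # candidate site found in the upstream region


def coregulated_sets(genes, max_gap=100):
    """Extend every gene that has a candidate site with the genes downstream
    of it, as long as they lie on the same strand and each intergenic
    distance does not exceed max_gap nucleotides."""
    ordered = sorted(genes, key=lambda g: g.start)
    sets = []
    for i, gene in enumerate(ordered):
        if not gene.has_site:
            continue
        # Downstream means increasing coordinates on "+", decreasing on "-".
        step = 1 if gene.strand == "+" else -1
        members = [gene]
        j = i + step
        while 0 <= j < len(ordered):
            nxt, prev = ordered[j], members[-1]
            if step == 1:
                gap = nxt.start - prev.end - 1
            else:
                gap = prev.start - nxt.end - 1
            if nxt.strand != gene.strand or gap > max_gap:
                break
            members.append(nxt)
            j += step
        sets.append([g.name for g in members])
    return sets


# Toy genome fragment (coordinates invented for illustration only).
toy_genes = [
    Gene("a", 100, 400, "+", has_site=True),
    Gene("b", 450, 900, "+"),
    Gene("c", 1200, 1500, "+"),
    Gene("d", 2000, 2300, "-"),
    Gene("e", 2350, 2800, "-", has_site=True),
]
toy_sets = coregulated_sets(toy_genes)
```

In the toy fragment, gene c is excluded because it lies 299 nucleotides downstream of b, and gene d joins e because both are on the reverse strand and 49 nucleotides apart.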
Orthologous genes in E.coli and H.influenzae were identified by comparing the complete sets of protein sequences from the two species using the gapped BLASTP program or the Smith-Waterman algorithm as implemented in the GENOME program (A.A.Mironov, unpublished), selecting pairs of proteins with the greatest similarity to each other and checking for the conservation of domain architecture (6,24). The upstream regions of genes that are orthologous to genes containing regulatory sites were examined for candidate sites, even if these were not detected automatically. Site recognition was performed using the DNA-SUN (25) and GENOME programs (A.A.Mironov, unpublished). The non-redundant protein and nucleotide databases at the NCBI were searched using the gapped BLAST programs (26). Multiple sequence alignments were constructed using the CLUSTALX program (27). Phylogenetic trees were constructed using the PHYLIP package programs NEIGHBOR (the neighbor-joining method) and PROTPARS (maximum parsimony method) (28). Sequence logos were constructed using the MAKELOGO program (29) as implemented on the World Wide Web by Stephen E. Brenner (http://www.bio.cam.ac.uk/cgi-bin/seqlogo/logo.cgi).
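Selecting "pairs of proteins with the greatest similarity to each other" is commonly implemented as a reciprocal (bidirectional) best-hit search. The sketch below assumes that similarity scores from both search directions are already available as dictionaries; it does not call BLASTP or GENOME, it does not perform the domain architecture check, and all names and scores are hypothetical.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring subject.

    scores: {(query, subject): similarity score}.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs in which each protein is the other's best hit."""
    forward = best_hits(a_vs_b)
    backward = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Toy similarity scores (invented for illustration only).
a_vs_b = {("ecoA", "hin1"): 250, ("ecoA", "hin2"): 90,
          ("ecoB", "hin2"): 80, ("ecoC", "hin2"): 200}
b_vs_a = {("hin1", "ecoA"): 250, ("hin2", "ecoC"): 200,
          ("hin2", "ecoB"): 80}
toy_orthologs = reciprocal_best_hits(a_vs_b, b_vs_a)
```

Here ecoB is left without an ortholog: its best hit, hin2, is more similar to ecoC. Such cases are where paralogs would need separate inspection.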
RESULTS
Identification of candidate regulator-binding sites
Sequence logos for the PUR, ARG, TRP and TYR boxes are shown in Figure 1. The boxes vary strongly in terms of information content, with the PUR and TRP boxes being stronger, and the ARG and TYR boxes being weaker. The latter sites are often present in a regulatory zone of a gene in several copies that are recognized co-operatively. The recognition weight matrices are shown in Table 1.
Figure 1. Sequence logos for the PUR, ARG, TRP and TYR boxes. Horizontal axis, position in the binding site; vertical axis, information in bits. The height of each stack of letters is proportional to the positional information content in the given position; the height of each individual letter reflects its prevalence in the given position. The logos were constructed from the aligned sequences of the known E.coli regulatory sites (Table 2).
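The quantities plotted in a sequence logo can be computed as in the following sketch. It uses the standard definition of positional information content, 2 bits minus the Shannon entropy of the column, and omits any small-sample correction; whether the MAKELOGO program applies such a correction is not stated here, so this is an assumption. The function name and the toy sites are hypothetical.

```python
import math
from collections import Counter


def logo_heights(sites):
    """For each position return {base: letter height in bits}.

    The stack height is the positional information content,
    2 - H(column), and each letter takes a share proportional
    to its frequency in the column.
    """
    columns = []
    for k in range(len(sites[0])):
        counts = Counter(site[k] for site in sites)
        total = sum(counts.values())
        freqs = {b: counts.get(b, 0) / total for b in "ACGT"}
        entropy = -sum(f * math.log2(f) for f in freqs.values() if f > 0)
        info = 2.0 - entropy
        columns.append({b: f * info for b, f in freqs.items()})
    return columns


# Toy alignment (invented): position 0 is invariant, position 1 is uniform.
toy_logo = logo_heights(["AC", "AG", "AT", "AA"])
```

An invariant position reaches the maximum of 2 bits, whereas a position with all four nucleotides equally frequent contributes nothing to the logo.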
Table 1. Positional nucleotide weight matrices (profiles) for PUR, ARG, TRP and TYR boxes (a)
(a) Each column shows the weights of the given nucleotide in the consecutive positions of the respective binding site.
(b) Cns, consensus derived for each position by majority rule.
The distributions of candidate site scores for the four boxes are shown in Figure 2. Scores of the sites from the learning sample and their positions relative to the gene starts are given in Table 2. Comparison of this table with Figure 2 shows that, even for strong signals with a relatively large learning sample (the PUR box), the use of a statistical recognition rule is not sufficient to reliably predict operators.
Figure 2. Histograms and distribution functions of candidate site scores for PUR, ARG, TRP and TYR boxes. Horizontal axis, score; vertical axis, percentage of genes whose candidate binding sites (highest scoring sites in upstream non-coding regions) for the given regulatory factor have a score greater than the respective value. Solid curves, E.coli; broken curves, H.influenzae.
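The curves of this kind can be reproduced in outline by taking the highest-scoring window in each upstream region and then computing, for each cutoff, the percentage of genes whose best score exceeds it. The sketch below assumes a weight matrix given as a list of per-position dictionaries, sequences containing only A, C, G and T, and a search on the coding strand only; the names and the toy two-position matrix are hypothetical.

```python
def best_site(matrix, upstream):
    """Return (score, offset) of the highest-scoring window, or None if the
    region is shorter than the matrix."""
    width = len(matrix)
    best = None
    for i in range(len(upstream) - width + 1):
        window = upstream[i:i + width]
        score = sum(col[base] for col, base in zip(matrix, window))
        if best is None or score > best[0]:
            best = (score, i)
    return best


def percent_above(best_scores, cutoff):
    """Percentage of genes whose best candidate site scores above cutoff."""
    hits = sum(1 for score in best_scores if score > cutoff)
    return 100.0 * hits / len(best_scores)


# Toy two-position matrix and upstream regions (invented for illustration).
toy_matrix = [{"A": 1, "C": -1, "G": 0, "T": 0},
              {"A": 0, "C": 0, "G": 1, "T": -1}]
toy_regions = {"gene1": "TTAGC", "gene2": "CTCT"}
toy_best = {name: best_site(toy_matrix, seq) for name, seq in toy_regions.items()}
toy_percent = percent_above([score for score, _ in toy_best.values()], cutoff=1)
```

Evaluating `percent_above` over a grid of cutoffs gives one curve per genome and per box type.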
Table 2. Scores and positions relative to the gene start of sites from the learning samples
We attempted to take into account co-operative binding of ArgR to tandemly repeated ARG boxes. A procedure that searched for pairs of ARG boxes performed quite well in the sense that it clearly separated all sites from the learning sample from all other sequences (data not shown). However, since ArgR can bind to single ARG boxes, albeit with low specificity (30), we used the single box recognizer for further analysis.
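The details of the pair-search procedure are not given here, so the following is only one plausible sketch of how tandem boxes could be paired. It assumes that single-box hits have already been found on one strand, that two boxes form a pair when the spacer between them does not exceed a chosen length, and that the pair score is the sum of the two single-box scores. The box width, spacer limit, threshold and toy hits are all hypothetical parameters.

```python
def site_pairs(hits, width, max_spacer, min_pair_score):
    """Find tandem pairs among single-box hits.

    hits: list of (position, score) for single boxes on one strand.
    width: box length in nucleotides.
    Returns (position1, position2, combined score) for non-overlapping boxes
    separated by at most max_spacer nucleotides.
    """
    ordered = sorted(hits)
    pairs = []
    for i, (pos1, score1) in enumerate(ordered):
        for pos2, score2 in ordered[i + 1:]:
            spacer = pos2 - (pos1 + width)
            if spacer < 0:
                continue  # the two boxes overlap
            if spacer > max_spacer:
                break     # later hits are even further away
            if score1 + score2 >= min_pair_score:
                pairs.append((pos1, pos2, score1 + score2))
    return pairs


# Toy hits (invented for illustration only).
toy_hits = [(10, 3.0), (31, 2.5), (80, 4.0), (20, 1.0)]
toy_pairs = site_pairs(toy_hits, width=18, max_spacer=5, min_pair_score=5.0)
```

A pair can pass the combined threshold even when neither box would pass a stringent single-box threshold, which is the intuition behind recognizing co-operatively bound sites.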
Evolution of regulons
The purine regulon. Haemophilus influenzae retains the regulation of the PurR regulon genes directly responsible for purine biosynthesis, and the structure of the operons purEK, cvpApurF, purC, purMN, purL is the same in E.coli and H.influenzae (Table 3). Other genes of the core regulon also retain the regulation, although with some modifications (see below). Orthologs of several genes that in E.coli belong to the purine regulon, namely codBA, pyrC, gcvTHP, speAB, purT are missing in H.influenzae. Of these genes, only purT is directly involved in purine synthesis, but its function is redundant with that of purN (20). Finally and most interestingly, orthologs of some genes of the E.coli PurR regulon, namely pyrD, prsA, glnB, purA and purR itself, are present in H.influenzae but apparently have lost the PurR regulation. [The regulation of E.coli purA by PurR binding to the two rather weak PUR boxes in its upstream region is in fact questionable (31,32).] The E.coli purR gene is autoregulated through two PUR boxes. However, no sequence resembling a PUR box can be found upstream of purR in H.influenzae, and it seems that direct autoregulation in this case can be ruled out.
Table 3. Haemophilus influenzae operons predicted to belong to the purine regulon
Several operons of the purine regulon have different gene organization and/or mode of regulation in E.coli and H.influenzae. Two E.coli operons, purHD and glyA, both regulated by PurR, correspond to a single H.influenzae gene string HI0887-HI0889, and a PUR box is found upstream of HI0887 (Fig. 3a). Thus these three H.influenzae genes are confidently predicted to form a single PurR-regulated operon. The E.coli purB gene is the ortholog of the H.influenzae gene HI0639. In E.coli, this gene is regulated by PurR via the roadblock mechanism (33), which explains the unusual location of the PUR box within the coding region (around codon 60). In H.influenzae the PUR box is found upstream of the first gene in the operon-like gene string HI0638-HI0639. Notably, HI0638 is the ortholog of the uncharacterized E.coli gene ycfC, which is located upstream of purB (Fig. 3b).
Figure 3. Some Proteobacterial operons with variations in gene organization and/or mode of regulation. (a) The purine regulon, the purHD operon. (b) The purine regulon, the purB operon. (c) The arginine regulon, the argECDH operon. (d) The arginine regulon, the art operon. (e) The aromatic amino acid regulon, the trp operon. (f) The aromatic amino acid regulon, the mtr operon. (g) The aromatic amino acid regulon, the aroF,G,H operons. The candidate binding sites are indicated by a double dotted line.
The regulation status of the guaBA (HI0221-HI0222) operon of H.influenzae is unclear since the only candidate PUR box is within the second gene of the operon in position (+260) and is weak (score = 3.90). Although it could be another case of a distinct regulation mechanism, it is more likely that this operon is not regulated by PurR.
The arginine regulon. Of this E.coli regulon, H.influenzae retains only the repressor and two genes, namely argG and argH, which encode enzymes that catalyze the conversion of citrulline into arginine (Table 4). Orthologs of the other genes of the argCBH operon, as well as the single-gene operon argE (that in E.coli is transcribed in the opposite direction and is regulated by the same operator), are all missing in H.influenzae (Fig. 3c). The argR and argH genes of H.influenzae have single ARG boxes, and thus the regulatory effect is predicted to be weak.
Table 4. Haemophilus influenzae operons predicted to belong to the arginine regulon
The aromatic amino acid regulon. This case is the most complicated, and the analysis has been supplemented by consideration of the available genome fragments from other Proteobacteria. The autoregulation is conserved for the orthologs of trpR and tyrR genes in H.influenzae, as well as for trpR of Enterobacter cloacae and Salmonella typhimurium (Table 5) and tyrR of Citrobacter braakii (Table 6). The main tryptophan operon trpLEDCBA is conserved in the gamma-proteobacterium Vibrio parahaemolyticus but is broken into two parts in H.influenzae. The first part, which includes the HI1430-HI1432 genes (orthologs of E.coli ydfG-trpBA), contains an additional gene ydfG, which encodes a predicted oxidoreductase. This gene may be a relatively recent addition to the operon since it is not present in the trpBA operon of the closely related species Pasteurella multocida (Table 5; Fig. 3e). In Pseudomonas aeruginosa, the trpBA operon is regulated by an unrelated transcription factor, TrpI, and accordingly, no TRP boxes are found upstream of this operon.
Table 5. Operons of various bacteria predicted to be regulated by TrpR
(a) Species: Hin, H.influenzae; Ecl, E.cloacae; Sty, S.typhimurium; Vpa, V.parahaemolyticus; Pmu, P.multocida.
Table 6. Operons of various bacteria predicted to be regulated by TyrR
(a) Species: Hin, H.influenzae; Cbr, C.braakii; Sty, S.typhimurium.
In E.coli, the aroLM and mtr operons are regulated by both TrpR and TyrR. There are no orthologs of aroL and aroM genes in H.influenzae; the ortholog of the mtr gene has only the TRP box (Fig. 3f). Other operons that have no orthologs in H.influenzae are tyrB and aroP. By contrast, H.influenzae has two paralogous tyrP genes (HI0477 and HI0528). The former has three candidate TYR boxes, whereas the latter has only one; the single E.coli tyrP gene has two binding sites for TyrR.
The most interesting case is that of the unique H.influenzae 3-deoxy-D-arabino-heptulosonate 7-phosphate (DAHP) synthase. There are three DAHP-synthases in E.coli, which are encoded by aroH, aroG and aroF and feedback-inhibited by tryptophan, phenylalanine and tyrosine, respectively (34). The gene HI1547 is confidently identified as the ortholog of aroG (data not shown) and thus is predicted to encode DAHP-synthase-PHE. However, unlike aroG, which is regulated by TyrR (with phenylalanine and tryptophan acting as co-repressors), this H.influenzae gene has a TRP box, but no TYR boxes, similarly to the E.coli tryptophan-regulated gene aroH, which encodes the DAHP-synthase-TRP (Fig. 3g). Two alternative explanations of this evolutionary conundrum seem possible: (i) the H.influenzae DAHP-synthase-PHE is regulated by tryptophan at the transcriptional level, the functional implications of which are unclear, and (ii) the H.influenzae DAHP-synthase, although in phylogenetic terms orthologous to aroG, has changed the specificity of allosteric inhibition and is feedback-inhibited by tryptophan. A final solution can be reached only by experimental analysis of the H.influenzae enzyme.
Finally, catabolic operons tutBA in Erwinia herbicola and tpl in Citrobacter freundii also are regulated by TyrR and their regulatory regions contain multiple TYR boxes (data are not shown).
Transport proteins: new members of known regulons. Our analysis of the PurR regulon resulted in the identification of a family of transport proteins that is represented in E.coli and H.influenzae, as well as a number of other bacteria (Fig. 4). The family consists of two subfamilies. The known members of one subfamily are uracil and xanthine transporters (35), whereas the other subfamily does not include any transporters with a known specificity. Escherichia coli has representatives in both subfamilies, and notably, they form pairs of closely related paralogs (yicO and yieG, yjcD and ygfQ/R, yicE and ygfO). In each case, the first member of a pair has a strong PUR box and thus is likely to be regulated by PurR, whereas the second member has no PUR boxes (Table 7 and Fig. 4). All close relatives of the yicE-ygfO pair and one additional gene with a PUR box, ygfU, encode H+/purine (xanthine) symporters, and thus purine transport is the most likely function for these genes. The other two pairs, yicO-yieG and yjcD-ygfQ/R, as well as the H.influenzae gene HI0125, which is the ortholog of the latter pair, can be assigned only an unspecified transport function.
Figure 4. A phylogenetic tree of purine and uracil permeases. The tree was constructed using the neighbor-joining method. The numbers at forks indicate the percentage of bootstrap replications (out of 1000) in which the given grouping was observed. The putative permeases from E.coli and H.influenzae that were predicted to belong to the pur regulon as a result of our analysis and their apparently unregulated paralogs (broken lines) are shown in bold type. PyrP, UraA, uracil permeases; UapA, uric acid-xanthine permease; UapC, broad specificity purine permease; PbuX, xanthine permease. The remaining proteins are functionally uncharacterized gene products that are indicated either by provisional gene name (starting with the letter Y) or by Gene Identification number. Species abbreviations: Bb, Borrelia burgdorferi; Bc, Bacillus caldolyticus; Bs, B.subtilis; Ec, E.coli; Ef, Enterococcus faecalis; En, Emericella nidulans; Hi, H.influenzae; Hp, H.pylori; Mj, Methanococcus jannaschii.
Table 7. Transport operons predicted to belong to the purine and arginine regulons
(a) Species: Eco, E.coli; Hin, H.influenzae; Eae, E.aerogenes; Kpn, K.pneumoniae; Sty, S.typhimurium.
In addition, PUR boxes were found upstream of the tsx gene, which encodes an outer membrane nucleoside-specific channel in E.coli, Enterobacter aerogenes, Klebsiella pneumoniae and S.typhimurium (36,37).
The analysis of the ArgR regulon resulted in the identification of ARG boxes upstream of the operons that encode arginine-specific ABC transport systems (artPIQM and artJ from E.coli; HI1180-HI1177 from H.influenzae); thus these operons belong to the arginine regulon (Table 7). In this system, ArtP is the ATPase, ArtQ and ArtM are transmembrane proteins, and ArtI and ArtJ are periplasmic arginine-binding proteins. The orthologous operon of H.influenzae has the same gene order. The E.coli artJ gene is located immediately downstream of the artPIQM operon, but is transcribed independently and has its own ARG box (Fig. 3d); H.influenzae has no ortholog for this gene. The regulatory regions of each of the transport operons contain a single ARG box, which suggests that the regulatory effect caused by ArgR binding is likely to be low.
DISCUSSION
Computer analysis has been used for prediction of bacterial transcription signals for more than 15 years (10,38-44), and on many occasions the results have served as the basis for further experimental work (e.g. 43). Co-evolution of regulons and regulators has also been examined (45). However, to the best of our knowledge, this study is the first attempt to systematically characterize regulatory sites in two or more genomes by comparing the respective complete gene sets.
This comparative approach involves three main components: (i) prediction of transcription factor binding sites, (ii) delineation of orthologous relationship between genes by comparing their protein products and (iii) comparison and, when necessary, prediction of protein functions. The use of complete genomes facilitates the identification of orthologs and thus increases the reliability of inferences regarding identical or similar cellular roles of proteins. However, in spite of potential uncertainty in terms of orthology, identification of homologous genes in all bacterial species, including those whose genome sequences have not been completed yet, using similarity search in GenBank is a useful supplement to this analysis.
All sites considered in this paper are approximately palindromic. However, we used the sites in the orientation corresponding to the direction of transcription and did not symmetrize the profiles. There were two reasons for this. First, we were interested in designing a general procedure for site recognition, rather than one that is applicable to symmetrical sites only. Second, it is not guaranteed that even the dimeric factors bind their operators in the symmetric manner. This possibility has been raised in the case of TrpR based on the crystallographic data (46) and chemical modification of natural sites (47), and in the case of AraC based on mutational analysis (48). The Lrp binding signal derived from the SELEX data is not symmetrical either (49).
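How closely a site approximates a palindrome can be quantified by comparing it with its own reverse complement. The short sketch below is an illustration only; the function names are introduced here, and the example sequences are generic rather than sites from the learning samples.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence over the alphabet A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def palindrome_mismatches(site):
    """Number of positions at which a site differs from its reverse
    complement; 0 means a perfect palindrome."""
    return sum(1 for a, b in zip(site, reverse_complement(site)) if a != b)


perfect = palindrome_mismatches("GAATTC")
imperfect = palindrome_mismatches("GAATTA")
```

Scoring sites only in the orientation of transcription, as done here, avoids forcing this symmetry onto the profile.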
The comparative analysis of the E.coli and H.influenzae genomes revealed three principal types of differences between operons that are subject to the same mode of regulation. The differences of the first type are limited to the presence or absence of individual genes in otherwise conserved operons. The examples in H.influenzae are operons ycfCpurB (purB in E.coli, Fig. 3b), argH (argCBH in E.coli, Fig. 3c), ydfGtrpBA (trpBA in P.multocida, Fig. 3e) and tyrA (aroFtyrA in E.coli, Fig. 3g).
The second type of changes involves breaking of an operon into two parts, both of which retain the regulation. Two E.coli operons, purHD and glyA, both regulated by PurR, correspond, in H.influenzae, to the gene string HI0887-HI0889 with a PUR box upstream of HI0887 (Fig. 3a). Similarly, the tryptophan operon is broken in H.influenzae into two parts, trpEDC and trpBA, both of which have strong TRP boxes in the regulatory regions.
Finally, some operons lose or switch regulation. The most interesting case in this category is the elimination of purR autoregulation in H.influenzae. The loss of 'regulation of regulators' appears to be a more general phenomenon: in E.coli, the repressor IlvY regulates both its own gene ilvY and the adjacent ilvC gene, which are transcribed from divergent promoters. By contrast, in H.influenzae, although the overall location of these genes is the same, the distance between them is much larger, and a candidate binding site is close to ilvC, but too distant from ilvY to expect autoregulation (M.Gelfand, unpublished observation). The elimination of this higher level of regulation may be linked to the evolution of the parasitic lifestyle of H.influenzae that requires much less versatility in the response of the bacterium to environmental changes than its free-living relatives, such as E.coli. Another clear case of simplification in regulation includes the loss of the TYR box by the H.influenzae mtr operon, which in E.coli is regulated by both TrpR and TyrR. The roadblock mechanism of repression of purB by PurR in E.coli is not conserved in H.influenzae, although the repression itself seems to exist. Finally, it is possible that the gene aroG of H.influenzae has switched its regulation from TyrR to TrpR.
The conservation of a regulatory DNA-binding protein in an uncharacterized bacterial genome seems to be a reliable predictor of the conservation of the binding sites in at least some operons, even if most of the regulon is missing. For example, there are only three known genes in the arginine regulon of H.influenzae, including the repressor ArgR itself (but not counting the transport proteins predicted to belong to the arginine regulon in this work), but the ARG boxes are conserved. The E.coli ARG box recognition matrix seems capable of detecting the relevant signals even in the distantly related Bacillus subtilis genome, which also encodes an ortholog of ArgR (A.A.Mironov and M.S.Gelfand, unpublished observations). Conversely, there are no strong PUR boxes in the Helicobacter pylori genome that does not encode a PurR ortholog. Similarly, although there is a purine repressor in B.subtilis, it is unrelated to the E.coli PurR, and indeed, the type of regulation (mostly by attenuation) and regulatory sites (in a few genes regulated at the transcription level) of the B.subtilis purine regulon differ from those of E.coli. The P.aeruginosa operon trpBA is regulated by the repressor TrpI, which is unrelated to TrpR of E.coli and H.influenzae, and predictably, there are no TRP boxes in the region upstream of this operon.
This study allowed us to make several predictions that appear to be readily experimentally testable. One group of such predictions includes inferences about changes in regulation patterns, namely the loss of autoregulation in the H.influenzae ortholog of PurR, different mode of repression of purB, and the apparent change in the regulation of aroG. The second group of predictions extends the purine and arginine regulons both in E.coli and H.influenzae by inclusion of transport proteins (purine and arginine transporters). It is somewhat surprising that these transport systems, especially the large family of H+/purine symporters, have not been identified as part of the purine regulon by genetic analysis. A possible explanation is that all genes from this family that are predicted to be under the PurR regulation have close non-regulated paralogs, and thus the effect of mutations in the regulated genes might be manifest only under very specific conditions.
Further research directions will include analysis of global regulatory systems, such as SOS, CRP, Fur and Fnr regulons, and multiple interacting systems, for example the interaction between purine and pyrimidine regulation or the interaction between the regulation by repression and by attenuation in the aromatic amino acid regulon, as well as comparisons between more distant genomes, such as E.coli and B.subtilis. As a more distant goal, we envisage development of techniques for systematic characterization of regulatory pathways in newly sequenced genomes.
ACKNOWLEDGEMENTS
This work was partially supported by grants from the Russian Fund of Basic Research and the US Department of Energy (FG02-94ER61919).
REFERENCES
*To whom correspondence should be addressed at present address: State Center of Biotechnology, NIIGenetika, Moscow 113545, Russia. Tel: +7 095 948 82 19; Fax: +7 095 315 05 01; Email: misha{at}imb.imb.ac.ru
Copyright© Oxford University Press, 1999.
||||
![]() |
J. Liu, K. Tan, and G. D. Stormo Computational identification of the Spo0A-phosphate regulon that is essential for the cellular differentiation and development in Gram-positive spore-forming bacteria Nucleic Acids Res., December 1, 2003; 31(23): 6891 - 6903. [Abstract] [Full Text] [PDF] |
||||
![]() |
O. D. King and F. P. Roth A non-parametric model for transcription factor binding sites Nucleic Acids Res., October 1, 2003; 31(19): e116 - e116. [Abstract] [Full Text] [PDF] |
||||
![]() |
V. A. Schmidt, L. Scudder, C. E. Devoe, A. Bernards, L. D. Cupit, and W. F. Bahou IQGAP2 functions as a GTP-dependent effector protein in thrombin-induced platelet cytoskeletal reorganization Blood, April 15, 2003; 101(8): 3021 - 3028. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Ghochikyan, I. M. Karaivanova, M. Lecocq, P. Vusio, M.-C. Arnaud, M. Snapyan, P. Weigel, L. Guevel, M. Buckle, and V. Sakanyan Arginine Operator Binding by Heterologous and Chimeric ArgR Repressors from Escherichia coli and Bacillus stearothermophilus J. Bacteriol., December 1, 2002; 184(23): 6602 - 6614. [Abstract] [Full Text] [PDF] |
||||
![]() |
D. A. Rodionov, A. A. Mironov, and M. S. Gelfand Conservation of the Biotin Regulon and the BirA Regulatory Signal in Eubacteria and Archaea Genome Res., October 1, 2002; 12(10): 1507 - 1516. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. A. McCue, W. Thompson, C. S. Carmack, and C. E. Lawrence Factors Influencing the Identification of Transcription Factor Binding Sites by Cross-Species Comparison Genome Res., October 1, 2002; 12(10): 1523 - 1532. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. M. Panina, A. A. Mironov, and M. S. Gelfand Comparative analysis of FUR regulons in gamma-proteobacteria Nucleic Acids Res., December 15, 2001; 29(24): 5195 - 5206. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. M. Bacher and A. D. Ellington Selection and Characterization of Escherichia coli Variants Capable of Growth on an Otherwise Toxic Tryptophan Analogue J. Bacteriol., September 15, 2001; 183(18): 5414 - 5425. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Reitzer and B. L. Schneider Metabolic Context and Possible Physiological Themes of {sigma}54-Dependent Genes in Escherichia coli Microbiol. Mol. Biol. Rev., September 1, 2001; 65(3): 422 - 444. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Tan, G. Moreno-Hagelsieb, J. Collado-Vides, and G. D. Stormo A Comparative Genomics Approach to Prediction of New Members of Regulons Genome Res., April 1, 2001; 11(4): 566 - 584. [Abstract] [Full Text] |
||||
![]() |
L. A. McCue, W. Thompson, C. S. Carmack, M. P. Ryan, J. S. Liu, V. Derbyshire, and C. E. Lawrence Phylogenetic footprinting of transcription factor binding sites in proteobacterial genomes Nucleic Acids Res., February 1, 2001; 29(3): 774 - 782. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. M. McGuire, J. D. Hughes, and G. M. Church Conservation of DNA Regulatory Motifs and Discovery of New Motifs in Microbial Genomes Genome Res., June 1, 2000; 10(6): 744 - 757. [Abstract] [Full Text] |
||||
![]() |
M. S. Gelfand, E. V. Koonin, and A. A. Mironov Prediction of transcription regulatory sites in Archaea by a comparative genomic approach Nucleic Acids Res., February 1, 2000; 28(3): 695 - 705. [Abstract] [Full Text] [PDF] |
||||
![]() |
E. P.C. Rocha, A. Danchin, and A. Viari Evolutionary Role of Restriction/Modification Systems as Revealed by Comparative Genome Analysis Genome Res., |

















