Nucleic Acids Research 2006 34(Web Server issue):W366-W368; doi:10.1093/nar/gkl069
© The Author 2006. Published by Oxford University Press. All rights reserved
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org


Article

MicroFootPrinter: a tool for phylogenetic footprinting in prokaryotic genomes

Shane Neph and Martin Tompa*

Department of Computer Science and Engineering and Department of Genome Sciences, University of Washington Box 352350, Seattle, WA 98195-2350, USA

*To whom correspondence should be addressed. Tel: +1 206 543 9263; Fax: +1 206 543 8331; Email: tompa@cs.washington.edu

Received February 11, 2006. Revised February 19, 2006. Accepted March 1, 2006.


    ABSTRACT
 
Phylogenetic footprinting is a method for the discovery of regulatory elements in a set of homologous regulatory regions, usually collected from multiple species. It does so by identifying the most conserved motifs in those homologous regions. This note describes web software that has been designed specifically for this purpose in prokaryotic genomes, making use of the phylogenetic relationships among the homologous sequences in order to make more accurate predictions. The software is called MicroFootPrinter and is available at http://bio.cs.washington.edu/software.html.


    INTRODUCTION
 
One of the current challenges facing biologists is the discovery of novel functional elements in noncoding genomic sequence. With the rapidly increasing number of genomes being sequenced, a comparative genomics approach called ‘phylogenetic footprinting’ has become a favored method for such discovery. The idea underlying phylogenetic footprinting is that selective pressure causes functional elements to evolve at a slower rate than the nonfunctional surrounding sequence. Therefore the most conserved motifs in a collection of homologous regions are excellent candidates as functional elements.

This note focuses on phylogenetic footprinting for the discovery of novel cis-regulatory elements in prokaryotic genomes. A web tool for this purpose has been implemented in a program called MicroFootPrinter, available at http://bio.cs.washington.edu/software.html. One reason to focus on prokaryotes is that over 300 prokaryotic genomes are completely sequenced at the time of this writing, making this by far the richest current medium for phylogenetic footprinting. MicroFootPrinter gives the user automatic, full access to all these genomes.


    USER INPUTS
 
MicroFootPrinter is actually a front end for the FootPrinter phylogenetic footprinting program (1), but specifically tailored to prokaryotic genomes. The user simply supplies a prokaryotic species and gene of interest. MicroFootPrinter automatically takes care of the laborious tasks of (i) finding homologous genes in related prokaryotes, (ii) inferring their phylogenetic gene tree, (iii) extracting the noncoding cis-regulatory regions of each of these homologous genes, (iv) setting the most difficult of FootPrinter's parameters and (v) running FootPrinter on these regulatory regions. The result is the identification of motifs that are well conserved across the cis-regulatory regions of these homologous genes. [The reader is referred to earlier work (1,2) for details on FootPrinter and examples of its applications to biological data].

MicroFootPrinter's ‘Search’ feature is very useful for quickly finding species and genes of interest. The user enters any search terms, separated by spaces. All search fields are considered, and any partial or complete match found is included in the results. For instance, if the user enters ‘coli’ for the species search, MicroFootPrinter offers the list of all Escherichia coli strains available. After choosing a species, if the user enters ‘pyrim’ for the gene search, MicroFootPrinter offers a list of all genes with this text in their gene product descriptions, notably genes involved in processing of pyrimidines.
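
The matching behaviour described above can be sketched roughly as follows. This is an illustration only, not MicroFootPrinter's code: the record layout, the field name `name`, case-insensitive matching, and the choice to accept a record when any one term matches are all assumptions, since the text does not say how multiple terms are combined.

```python
def search(records, query):
    """Return records in which at least one space-separated term occurs,
    as a case-insensitive substring, in at least one field.
    (Assumed semantics; the real tool may combine terms differently.)"""
    terms = query.lower().split()
    hits = []
    for record in records:
        fields = [str(value).lower() for value in record.values()]
        if any(term in field for term in terms for field in fields):
            hits.append(record)
    return hits


# Hypothetical species records, for illustration only.
species = [
    {"name": "Escherichia coli K12"},
    {"name": "Escherichia coli O157:H7"},
    {"name": "Bacillus subtilis"},
]

coli_strains = [record["name"] for record in search(species, "coli")]
```

Here `coli_strains` holds the two Escherichia coli entries, mirroring the ‘coli’ example in the text.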

After choosing a species and gene, the user is asked to supply a few simple parameters (or leave them at their default values). These are the length of the desired motif (in base pairs), the target number of motifs for MicroFootPrinter to display, the target number of species in which to locate homologous genes, and the maximum parsimony score (number of mutations) to allow among the instances of each displayed motif. If desired, the search for other species can also be restricted to any taxonomic clade containing the user's chosen species, for instance, restricted to just γ-proteobacteria.
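
To make the ‘parsimony score’ parameter concrete, the sketch below counts the minimum number of mutations needed to explain a set of motif instances on a tree, using Fitch's small-parsimony algorithm. It is not a description of how FootPrinter computes its score: the binary tree encoded as nested tuples, the requirement that every leaf carries an instance, and the species labels and sequences are all assumptions made for illustration.

```python
def fitch_column(tree, leaf_base):
    """Fitch's algorithm for one motif position.
    Returns (candidate bases at this node, mutations in this subtree)."""
    if isinstance(tree, str):  # a leaf, identified by its species label
        return {leaf_base[tree]}, 0
    left_set, left_cost = fitch_column(tree[0], leaf_base)
    right_set, right_cost = fitch_column(tree[1], leaf_base)
    common = left_set & right_set
    if common:  # children agree on some base: no extra mutation needed
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, instances):
    """Sum the per-position mutation counts over the whole motif."""
    motif_length = len(next(iter(instances.values())))
    total = 0
    for i in range(motif_length):
        column = {sp: seq[i] for sp, seq in instances.items()}
        total += fitch_column(tree, column)[1]
    return total


# Hypothetical tree and motif instances (length 4) for four species.
tree = (("A", "B"), ("C", "D"))
instances = {"A": "ACGT", "B": "ACGT", "C": "ACGA", "D": "TCGA"}
score = parsimony_score(tree, instances)
```

In this toy example one mutation is needed at the first position and one at the last, so a motif like this would be displayed only if the maximum parsimony score were set to 2 or more.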

For each of these user inputs there are links marked ‘?’ that lead to further description. These include explanations of the input parameters and advice on adjusting them.

After the user has set the parameters, it typically takes 1 to 2 min of elapsed time for MicroFootPrinter to perform all its computations and display FootPrinter's output. For a description and interpretation of FootPrinter's output, the reader is referred to earlier work (1).


    METHODS USED BY MicroFootPrinter
 
MicroFootPrinter uses protein-level BLAST to find the closest homologs to the user's chosen gene. Specifically, it uses NCBI's BLink facility, which provides the results of BLAST searches that have been done for every protein sequence in the Entrez Proteins data domain. If there are close homologs in multiple sequenced strains of the same species, MicroFootPrinter will select only the single strain whose homolog's protein sequence is most similar to the query sequence.
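
The one-strain-per-species rule can be sketched as below. This is a minimal illustration under stated assumptions: the hit records, the field names and the numeric `score` values are hypothetical, and the code does not call BLink or BLAST.

```python
def best_strain_per_species(hits):
    """Keep, for each species, only the hit whose protein is most
    similar to the query (highest score)."""
    best = {}
    for hit in hits:
        current = best.get(hit["species"])
        if current is None or hit["score"] > current["score"]:
            best[hit["species"]] = hit
    return list(best.values())


# Hypothetical homolog hits; scores are made up for illustration.
hits = [
    {"species": "Escherichia coli", "strain": "K12", "score": 950},
    {"species": "Escherichia coli", "strain": "O157:H7", "score": 940},
    {"species": "Salmonella enterica", "strain": "LT2", "score": 870},
]

selected = [hit["strain"] for hit in best_strain_per_species(hits)]
```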

FootPrinter requires as input a phylogeny relating the homologous sequences. MicroFootPrinter infers this phylogeny by using ClustalW (3) to align the homologous protein sequences. The guide tree returned by ClustalW is used as a reasonable approximation of the true gene tree.

For each of these homologous genes, MicroFootPrinter next extracts the cis-regulatory regions in which FootPrinter will report conserved motifs. Each of these regions consists of up to 500 bp of noncoding sequence upstream of the start codon. (It may be shorter, if there is another coding region fewer than 500 bp upstream.) Note that these regulatory regions typically contain both 5' untranslated region (5' UTR) and promoter sequences. The fact that 5' UTR is included makes MicroFootPrinter useful for discovery of cis-regulatory mRNA elements such as riboswitches. Indeed, it has already proven useful in this role (4).
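
A minimal sketch of this extraction step, assuming a forward-strand gene, 0-based coordinates and an exclusive end coordinate for the upstream coding region. The function and parameter names are hypothetical; a reverse-strand gene would additionally need reverse complementing, which is omitted here.

```python
MAX_UPSTREAM = 500  # bp limit stated in the text


def upstream_region(genome, gene_start, prev_coding_end, max_len=MAX_UPSTREAM):
    """Noncoding sequence immediately upstream of a forward-strand gene.
    The region is cut short if another coding region ends fewer than
    max_len bp upstream of the start codon."""
    region_start = max(prev_coding_end, gene_start - max_len, 0)
    return genome[region_start:gene_start]


# Toy genome of 1000 bp, for illustration only.
genome = "ACGT" * 250
full = upstream_region(genome, gene_start=800, prev_coding_end=100)
short = upstream_region(genome, gene_start=800, prev_coding_end=650)
```

`full` spans the whole 500 bp window, whereas `short` covers only the 150 bp between the upstream coding region and the start codon.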

The prevalence of operons in prokaryotic genomes complicates the extraction of the regulatory regions. Operons are contiguous collections of genes on the same DNA strand that are transcribed together. Typically the intergenic distance between consecutive genes in an operon is extremely small. The complication in this case is that the desired regulatory region may be upstream of the entire operon rather than immediately upstream of the selected gene. For most prokaryotes, it is not known which genes comprise operons.

To handle this complication in a conservative manner, MicroFootPrinter extracts and concatenates the noncoding sequences upstream of the gene and upstream of its plausible operon. Specifically, if the next coding region upstream is in the same orientation and fewer than 100 bp upstream, this short intergenic sequence is concatenated with the result of applying this same procedure to the upstream gene. This process continues until interrupted by either a coding region in the opposite orientation or an intergenic region longer than 100 bp. Up to 500 bp of this final intergenic region are also concatenated to the result. These concatenated noncoding sequences are actually separated from each other by the sequence NNNNNNNNNN so that, when inspecting the ultimate FootPrinter output, the user can identify when such concatenation has taken place.
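
The procedure just described can be sketched as a loop that walks upstream from the chosen gene. This is an illustration under assumptions, not MicroFootPrinter's implementation: it handles forward-strand targets only, takes genes as a list of hypothetical records sorted by position with 0-based `start` and exclusive `end`, emits the pieces in genomic order (the text does not state the order), and drops empty pieces.

```python
SEPARATOR = "N" * 10   # NNNNNNNNNN, as stated in the text
OPERON_GAP = 100       # bp: shorter same-strand gaps suggest an operon
MAX_UPSTREAM = 500     # bp taken from the final intergenic region


def regulatory_region(genome, genes, index):
    """Concatenate the noncoding sequences upstream of genes[index]
    and upstream of its plausible operon."""
    pieces = []
    i = index
    while True:
        upstream = genes[i - 1] if i > 0 else None
        gap_start = upstream["end"] if upstream is not None else 0
        gap = genome[gap_start:genes[i]["start"]]
        in_operon = (
            upstream is not None
            and upstream["strand"] == genes[i]["strand"]
            and len(gap) < OPERON_GAP
        )
        if not in_operon:
            # Final intergenic region: keep at most the 500 bp nearest the gene.
            pieces.append(gap[-MAX_UPSTREAM:])
            break
        pieces.append(gap)
        i -= 1  # repeat the procedure for the upstream gene
    pieces.reverse()  # assumed ordering: upstream-most piece first
    return SEPARATOR.join(piece for piece in pieces if piece)


# Toy genome and gene annotations, for illustration only.
genome = "ACGT" * 250
genes = [
    {"start": 100, "end": 300, "strand": "-"},
    {"start": 700, "end": 800, "strand": "+"},
    {"start": 850, "end": 950, "strand": "+"},
]
region = regulatory_region(genome, genes, 2)
```

For the third gene, the 50 bp gap to its upstream neighbour marks a plausible operon, so `region` is the 400 bp upstream of the second gene, the separator, and then the 50 bp gap.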

In addition to providing the user with FootPrinter's output, MicroFootPrinter also provides the protein sequences, cis-regulatory sequences and gene tree. With these, the user can rerun FootPrinter directly, adjusting FootPrinter's parameters if desired, or use another motif discovery tool.


    DISCUSSION
 
There are many programs available for motif discovery. Most of these are not intended for phylogenetic footprinting, as they implicitly assume that the input sequences are independent rather than homologous. The traditional approach to phylogenetic footprinting has been via multiple sequence alignment. We believe that, for sequences as diverged as the prokaryotes that are currently sequenced, this approach is less effective than the use of FootPrinter, which searches for conserved motifs directly in unaligned sequences.

MicroFootPrinter provides the microbiologist with a convenient front end for FootPrinter, whereby specification of only the species and gene of interest is sufficient for the extraction of all the data necessary for phylogenetic footprinting on that gene. Ultimately, we would like to extend this service to the eukaryotes, but this is still premature. For the few eukaryotes that are currently completely sequenced, a static catalog of all regulatory elements discovered by phylogenetic footprinting (5–8) is probably more appropriate at this time.

Another extension that could be very helpful is the ability to analyze multiple genes from a single species for common regulatory elements, using the homologs of each gene as well. This is a more difficult problem than simple phylogenetic footprinting, one for which FootPrinter was not intended. For discussion of what makes this problem more difficult and some approaches to its solution, the reader is referred to recent work (9–12).


    ACKNOWLEDGEMENTS
 
The authors thank Jieyang Hu, Martha Mercaldi, Scott Rose, Larry Ruzzo, Travis Wright and the NCBI User Service for advice and assistance in this project. This material is based upon work supported in part by the National Science Foundation under grant DBI-0218798 and by the National Institutes of Health under grant R01 HG02602. The Open Access publication charges for this article were waived by Oxford University Press.

Conflict of interest statement. None declared.


    REFERENCES
 

  1. Blanchette, M. and Tompa, M. (2003) FootPrinter: a program designed for phylogenetic footprinting. Nucleic Acids Res., 31, 3840–3842.

  2. Blanchette, M. and Tompa, M. (2002) Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res., 12, 739–748.

  3. Chenna, R., Sugawara, H., Koike, T., Lopez, R., Gibson, T.J., Higgins, D.G., Thompson, J.D. (2003) Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res., 31, 3497–3500.

  4. Yao, Z., Weinberg, Z., Ruzzo, W.L. (2006) CMfinder–a covariance model based RNA motif finding algorithm. Bioinform., 22, 445–452.

  5. Cliften, P., Sudarsanam, P., Desikan, A., Fulton, L., Fulton, B., Majors, J., Waterston, R., Cohen, B.A., Johnston, M. (2003) Finding functional features in Saccharomyces genomes by phylogenetic footprinting. Science, 301, 71–76.

  6. Kellis, M., Patterson, N., Endrizzi, M., Birren, B., Lander, E.S. (2003) Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature, 423, 241–254.

  7. Xie, X., Lu, J., Kulbokas, E.J., Golub, T.R., Mootha, V., Lindblad-Toh, K., Lander, E.S., Kellis, M. (2005) Systematic discovery of regulatory motifs in human promoters and 3' UTRs by comparison of several mammals. Nature, 434, 338–345.

  8. Prakash, A. and Tompa, M. (2005) Discovery of regulatory elements in vertebrates through comparative genomics. Nat. Biotechnol., 23, 1249–1256.

  9. Wang, T. and Stormo, G.D. (2003) Combining phylogenetic data with coregulated genes to identify regulatory motifs. Bioinform., 19, 2369–2380.

  10. Moses, A.M., Chiang, D.Y., Eisen, M.B. (2004) Phylogenetic motif detection by expectation-maximization on evolutionary mixtures. In Altman, R.B., Dunker, A.K., Hunter, L., Jung, T.A., Klein, T.E. (Eds.). Pacific Symposium on Biocomputing, World Scientific Publishing Co., pp. 324–335.

  11. Sinha, S., Blanchette, M., Tompa, M. (2004) PhyME: a probabilistic algorithm for finding motifs in sets of orthologous sequences. BMC Bioinform., 5, 170.

  12. Siddharthan, R., Siggia, E.D., van Nimwegen, E. (2005) PhyloGibbs: a Gibbs sampling motif finder that incorporates phylogeny. PLoS Comput. Biol., 1, e67.




