Nucleic Acids Research, 2003, Vol. 31, No. 13 3840-3842
© 2003 Oxford University Press

FootPrinter: a program designed for phylogenetic footprinting

Mathieu Blanchette and Martin Tompa1

Center for Biomolecular Science and Engineering, University of California, Santa Cruz, 1156 High St, Santa Cruz, CA 95064, USA
1 Department of Computer Science and Engineering, University of Washington, Box 352350, Seattle, WA 98195-2350, USA

*To whom correspondence should be addressed. Tel. +1 2065439263; Fax: +1 2065438331; Email: tompa@cs.washington.edu

Received February 13, 2003; Revised and Accepted March 25, 2003


    ABSTRACT
 
Phylogenetic footprinting is a method for the discovery of regulatory elements in a set of homologous regulatory regions, usually collected from multiple species. It does so by identifying the best conserved motifs in those homologous regions. This note describes web software that has been designed specifically for this purpose, making use of the phylogenetic relationships among the homologous sequences in order to make more accurate predictions. The software is called FootPrinter and is available at http://bio.cs.washington.edu/software.html.


    DESCRIPTION
 
One of the current challenges facing biologists is the discovery of novel functional elements in non-coding genomic sequence. With the rapidly increasing number of genomes being sequenced, a comparative genomics approach called ‘phylogenetic footprinting’ (1) has become a favored method for such discovery.

This note focuses on the discovery of novel regulatory elements. The idea underlying phylogenetic footprinting is that selective pressure causes regulatory elements to evolve at a slower rate than the non-functional surrounding sequence. Therefore the best conserved motifs in a collection of homologous regulatory regions are excellent candidates as regulatory elements.

The traditional method that has been used for phylogenetic footprinting is to construct a global multiple alignment of the homologous regulatory sequences and then identify well conserved aligned regions (2). However, this approach fails if the regulatory regions considered are too diverged to be accurately aligned.

In earlier work (3,4), we described an algorithm designed specifically for phylogenetic footprinting. Instead of relying on multiple alignment, it treats the problem as one of motif discovery. Given a set of homologous input sequences and the phylogenetic tree T relating them, the algorithm identifies every set of k-mers, one from each input sequence, that has parsimony score at most d with respect to T, where k and d are parameters specified by the user. (The parsimony score is the minimum number of nucleotide substitutions along the branches of T that explain the set of identified k-mers.) This algorithm has been implemented in a program called FootPrinter, available at http://bio.cs.washington.edu/software.html, both in source code and through a web interface. This note describes the web interface to FootPrinter. The reader is referred to earlier work (3–5) for details on FootPrinter's algorithm, its applications to biological data and its comparison to other phylogenetic footprinting tools.
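To make the parsimony score concrete, the following is a minimal sketch that scores one given set of k-mers on a rooted binary tree, using Fitch's small-parsimony rule one column at a time. It is an illustration only, not FootPrinter's own code: FootPrinter searches over all k-mers in the input, whereas this sketch only evaluates a single candidate set. The nested-tuple tree encoding, the function names and the example k-mers are assumptions made for the illustration.

```python
# Sketch only: parsimony score of ONE candidate set of k-mers on a rooted
# binary tree, via Fitch's rule. The tree encoding (leaf = species name,
# internal node = pair of subtrees) is an assumption of this example.

def fitch_column(tree, base_at_leaf):
    """Return (candidate ancestral bases, substitutions) for one column."""
    if isinstance(tree, str):                      # leaf: its base is fixed
        return {base_at_leaf[tree]}, 0
    left_set, left_cost = fitch_column(tree[0], base_at_leaf)
    right_set, right_cost = fitch_column(tree[1], base_at_leaf)
    common = left_set & right_set
    if common:                                     # children can agree: no new substitution
        return common, left_cost + right_cost
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, kmers):
    """Minimum number of substitutions on `tree` explaining `kmers`.

    `kmers` maps each species name at a leaf to its k-mer (equal lengths).
    """
    lengths = {len(s) for s in kmers.values()}
    if len(lengths) != 1:
        raise ValueError("all k-mers must have the same length")
    k = lengths.pop()
    total = 0
    for i in range(k):                             # columns are scored independently
        column = {species: s[i] for species, s in kmers.items()}
        total += fitch_column(tree, column)[1]
    return total


# Hypothetical example: three species, k = 8.
tree = (("rat", "mouse"), "human")
kmers = {"rat": "ACGTACGT", "mouse": "ACGTACGA", "human": "ACGAACGT"}
print(parsimony_score(tree, kmers))  # 2
```

With d = 2 this candidate set would be within the threshold; with d = 1 it would not.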


    BASIC USER INPUTS
 
The simple web form asks the user to supply the homologous input sequences in FASTA format. The first word of each sequence annotation line following ‘>’ must correspond to the name of a species in the phylogenetic tree. The user also supplies the phylogenetic tree relating the sequences; if no tree is supplied, FootPrinter uses a default species tree containing many of the most commonly used eukaryotic species. If the user chooses to enter a phylogeny, it is given in the usual bracket form. For example, the tree for Figure 1 is ((salmon,(lates,fugu)),(chicken,(((rat,mouse),human),(dog, sheep)))).
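Before submitting, it can help to check that every FASTA annotation line starts with a species name that appears in the tree. The sketch below does this under the assumption that the bracket-form tree carries no branch lengths or internal node labels, as in the example above. The function names and the FASTA records are hypothetical and are not part of FootPrinter.

```python
import re

# Sketch only: check FASTA species names against a bracket-form tree.
# Assumes the tree has no branch lengths and no internal node labels.

def tree_species(bracket_tree):
    """Return the set of leaf names in a bracket-form tree."""
    return set(re.findall(r"[^(),;\s]+", bracket_tree))


def fasta_species(fasta_text):
    """Return the first word after '>' on each annotation line, in order."""
    names = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            words = line[1:].split()
            names.append(words[0] if words else "")
    return names


def unmatched_species(fasta_text, bracket_tree):
    """Return FASTA species names that are not leaves of the tree."""
    leaves = tree_species(bracket_tree)
    return [name for name in fasta_species(fasta_text) if name not in leaves]


TREE = "((salmon,(lates,fugu)),(chicken,(((rat,mouse),human),(dog, sheep))))"

# Hypothetical input; the sequences are placeholders, not real data.
FASTA = """>human growth hormone upstream region
ACGTACGTAC
>mouse
ACGTACGAAC
>zebrafish
ACGAACGTAC
"""

print(unmatched_species(FASTA, TREE))  # ['zebrafish']
```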



Figure 1. Sample FootPrinter output on the upstream regions of nine growth hormone genes. We searched for motifs of size 8 with at most two mutations. Motif losses were allowed, at a cost of one mutation. The subregion size was set to 100 bp with a subregion change cost of one mutation. The motif list reported on the right has been trimmed to fit on the page.

 
The user also chooses a few parameter values that specify the type of motif FootPrinter should report. These include the motif size (k in the description above) and the maximum number of mutations (maximum parsimony score d in the description above).


    BASIC FOOTPRINTER OUTPUT
 
FootPrinter's results are made available in three formats: HTML, PostScript and plain text. The HTML format is assumed in what follows, as it provides the most information in a graphical, interactive form. See Figure 1 for an example of the results obtained on a set of homologous growth hormone regulatory regions.

Each of the input sequences is repeated on the results page, with FootPrinter's discovered motifs highlighted in color and in larger fonts. Instances of the same motif appear in the same color. The font size indicates the parsimony score, with larger fonts corresponding to solutions with smaller parsimony score.

At the top of the results page the user can see a schematic representation of all the motifs at a glance. Each sequence is represented by a horizontal line labeled at one end by the species name. The phylogenetic tree relating the sequences is also shown. Above each horizontal line colored bars indicate the positions of FootPrinter's discovered motifs. The bar colors correspond to the font colors used in the sequences themselves. A more classical text representation is also available in the lower right panel.

In the example of Figure 1, the motif size was set to 8 and the maximum parsimony score set to 2. The reader will notice, however, that reported motifs are sometimes longer than the prescribed length (for example, the yellow motif in fishes is of length 9). This is because FootPrinter merges together overlapping motifs that it discovers, provided every instance overlaps in exactly the same way. Conversely, if motifs overlap differently in different sequences, each will be assigned its own color. Nucleotides that belong to several motifs are colored according to the motif with the most significant degree of conservation (see, for example, the overlapping green and purple motifs).
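The merging rule can be sketched as follows, under a simplifying assumption: two motifs of length k are merged only if they occur in the same species and the second is shifted by the same offset, smaller than k, in every one of them. This is our reading of "overlaps in exactly the same way" for the purpose of illustration, and FootPrinter's actual rule may differ in detail. The function name and the positions are hypothetical.

```python
# Sketch only: merge two length-k motifs when every instance overlaps
# in the same way. Each motif is a dict: species -> 0-based start position.

def try_merge(starts_a, starts_b, k):
    """Return (merged_starts, merged_length), or None if not mergeable."""
    if not starts_a or starts_a.keys() != starts_b.keys():
        return None                        # must occur in the same species
    offsets = {starts_b[sp] - starts_a[sp] for sp in starts_a}
    if len(offsets) != 1:
        return None                        # overlap differs between species
    offset = next(iter(offsets))
    if abs(offset) >= k:
        return None                        # instances do not overlap at all
    merged_starts = {sp: min(starts_a[sp], starts_b[sp]) for sp in starts_a}
    return merged_starts, k + abs(offset)


# Hypothetical positions: two motifs of size 8 shifted by one base everywhere.
motif_a = {"salmon": 40, "lates": 52, "fugu": 47}
motif_b = {"salmon": 41, "lates": 53, "fugu": 48}
print(try_merge(motif_a, motif_b, 8))
```

Here the two size-8 motifs combine into a single motif of length 9, which is how a reported motif can exceed the prescribed size.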


    ADDITIONAL USER INPUTS
 
There are two further sets of input parameters that deserve mention. The first has to do with conservation of motif position. If the user does not want FootPrinter to report motifs whose locations within the input sequences vary too much, parameters called ‘subregion size’ (with units in bp) and ‘subregion change cost’ can be used. In this case, FootPrinter subdivides the input sequences into subregions of the given size and the given cost is charged every time a motif changes subregion during its evolution. This cost is added to the motif's parsimony score, so that, in order to be reported by FootPrinter, such motifs must be better conserved if they are not to exceed the maximum parsimony score specified by the user. A ‘soft boundary’ approach ensures that nearby motifs separated by a subregion boundary are not penalized (4).

The final set of input parameters is very important in practice. In sufficiently diverged sequences, some regulatory elements may occur in only a subset of the input sequences. This could happen because the regulatory element is functional in only a subset of the sequences, or because some of the input sequences chosen happen to be too short to contain the regulatory element. In either case, it is useful for FootPrinter to allow for the loss of regulatory elements in some of the input sequences. To do so, FootPrinter starts by estimating the length of each branch of the tree based on the input sequences. The motifs reported are those whose parsimony score is unexpectedly low considering the amount of divergence of the subset of species containing the motif. If the user chooses this option, another parameter called the ‘motif loss cost’ is added to the parsimony score for every branch on which the motif is lost.
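The way these two costs add to the parsimony score can be illustrated with simple arithmetic. The sketch below only adds up the charges described above and compares the total with the maximum parsimony score. It does not model the soft boundary, the branch length estimation or the significance test applied when losses are allowed. The function names and the event counts are hypothetical; the costs are those quoted in the caption of Figure 1.

```python
# Sketch only: how subregion changes and motif losses add to the score.
# Counts of events are hypothetical inputs, not computed from sequences.

def adjusted_score(substitutions,
                   subregion_changes=0, subregion_change_cost=0,
                   branches_with_loss=0, motif_loss_cost=0):
    """Substitutions plus the charges for subregion changes and motif losses."""
    return (substitutions
            + subregion_changes * subregion_change_cost
            + branches_with_loss * motif_loss_cost)


def within_budget(score, max_parsimony_score):
    """True if the adjusted score does not exceed the user's maximum."""
    return score <= max_parsimony_score


# Figure 1 settings: at most two mutations, loss cost 1, subregion change cost 1.
lost_once = adjusted_score(1, branches_with_loss=1, motif_loss_cost=1)
lost_and_moved = adjusted_score(1, subregion_changes=1, subregion_change_cost=1,
                                branches_with_loss=1, motif_loss_cost=1)
print(lost_once, within_budget(lost_once, 2))            # 2 True
print(lost_and_moved, within_budget(lost_and_moved, 2))  # 3 False
```

A motif with one substitution and one loss still fits within a maximum of two mutations, but the same motif no longer fits if it also changes subregion once.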


    ADDITIONAL FOOTPRINTER OUTPUT
 
Referring again to Figure 1, we have already discussed two motifs (green and yellow) that occur in only a subset of species. Even though these motifs are absent from many species, their level of conservation and span of the tree are judged significant by FootPrinter according to criteria previously described (3). The light blue motif, found only in fishes, may be a false positive, as it is clear from the schematic representation that its position is inconsistent with respect to the other two motifs in fishes. FootPrinter leaves this judgment call to the user.

The HTML output format has a few more features that are useful to the user. If the mouse cursor is moved to a colored instance in one of the sequences (without clicking the mouse button), information about that motif is shown at the bottom of the browser screen: the position of this instance, the parsimony score of this motif, and the total branch length spanned by the input sequences containing instances of this motif. If the mouse button is then clicked, the corresponding colored bars in the schematic at the top of the page are highlighted and the subtree containing instances of the motif is colored in the phylogeny at the top of the page. This allows for quick visual identification of motif occurrences. A textual summary of the motif clicked is also reported (lower right panel in Fig. 1).


    ONLINE HELP
 
At every step of the process there are links to relevant information to help the user. These include definitions of input parameters, examples of input format, guidance on parameter choices and advice on parameters to change if the user would like more or fewer motifs to be reported.


    ACKNOWLEDGEMENTS
 
We thank Saurabh Sinha for his help in developing the web interface and an anonymous referee who tested the interface carefully. This material is based upon work supported in part by a Natural Sciences and Engineering Research Council of Canada (NSERC) fellowship, by a Fonds Québécois de la Recherche sur la Nature et les Technologies fellowship, by the Howard Hughes Medical Institute, by the National Science Foundation under grants DBI-9974498 and DBI-0218798, and by the National Institutes of Health under grant HG02602-01.


    REFERENCES
 

  1. Tagle,D., Koop,B., Goodman,M., Slightom,J., Hess,D. and Jones,R. (1988) Embryonic ε and γ globin genes of a prosimian primate (Galago crassicaudatus); nucleotide and amino acid sequences, developmental regulation and phylogenetic footprints. J. Mol. Biol., 203, 439–455.

  2. Duret,L. and Bucher,P. (1997) Searching for regulatory elements in human noncoding sequences. Curr. Op. Struct. Biol., 7, 399–405.

  3. Blanchette,M., Schwikowski,B. and Tompa,M. (2002) Algorithms for phylogenetic footprinting. J. Comput. Biol., 9, 211–223.

  4. Blanchette,M. and Tompa,M. (2002) Discovery of regulatory elements by a computational method for phylogenetic footprinting. Genome Res., 12, 739–748.

  5. Blanchette,M., Kwong,S. and Tompa,M. (2003) An empirical comparison of tools for phylogenetic footprinting. In Third IEEE Symposium on Bioinformatics and Bioengineering, IEEE Press, Los Alamitos, CA, pp. 69–78.

