


Nucleic Acids Research Advance Access published online on May 30, 2007

Nucleic Acids Research, doi:10.1093/nar/gkm362

© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Web Server Issue

Melina II: a web tool for comparisons among several predictive algorithms to find potential motifs from promoter regions

Toshiyuki Okumura1, Hiroki Makiguchi1, Yuko Makita2, Riu Yamashita3 and Kenta Nakai3,*

1Mitsui Knowledge Industry Co. Ltd, 2RIKEN Genomic Sciences Center and 3Human Genome Center, Institute of Medical Science, University of Tokyo, Japan

*To whom correspondence should be addressed: Tel: +81-3-5449-5131; Fax: +81-3-5449-5133; Email: knakai{at}ims.u-tokyo.ac.jp

Received January 30, 2007. Revised April 13, 2007. Accepted April 25, 2007.


    ABSTRACT
We present the second version of Melina, a web-based tool for promoter analysis. Melina II shows potential DNA motifs in promoter regions by combining several available programs (Consensus, MEME, Gibbs sampler, MDscan and Weeder) with several parameter settings. It runs up to four programs simultaneously and compares their results with graphical representations. In addition, users can build a weight matrix from a predicted motif and apply it to upstream sequences of several typical genomes (human, mouse, S. cerevisiae, E. coli, B. subtilis or A. thaliana) or to public motif databases (JASPAR or DBTBS) in order to find similar motifs. Melina II is a client/server system developed using Adobe (Macromedia) Flash and is accessible over the web at http://melina.hgc.jp.


    INTRODUCTION
Transcription factor binding sites (TFBSs) play important roles in the regulation of gene expression. Extracting a common TFBS from a set of DNA sequences is a problem of practical importance. Although a number of algorithms have been released to address this problem, none of them seems to be perfect (1–4). Thus, to avoid missing important motifs by relying on only one algorithm, or to check the effect of changing parameter values, it is useful to compare the prediction results obtained from different algorithms and parameter values. To support such comparisons, we previously released a web tool named Melina (5). It has now been updated to its second version, Melina II, in which some of the integrated algorithms are replaced with more modern ones and the graphical representation is extensively improved. Melina II enables users to compare the results of promoter analysis more efficiently and easily.


    OVERVIEW
Melina II runs at most four out of the five external algorithms [Consensus (6), MEME (7), Gibbs sampler (8), MDscan (9) and Weeder (10)] with user-specified parameter values, to avoid missing important motifs. MDscan and Weeder are newly added in this release. MDscan is a hybrid of two motif search strategies, word enumeration and position-specific weight matrices. Weeder adopts an enumerative pattern discovery algorithm that carries out an almost exhaustive search. The integration of algorithms based on different principles should help detect subtle motifs and reduce the number of false positives. Combining different algorithms and/or parameter values may also help to narrow down motif candidates or to detect alternative motifs. The results of these algorithms are displayed side by side with intuitive graphics (Figure 1).


Figure 1. Basic usage of Melina II.
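The enumerative strategy mentioned in the overview can be illustrated with a deliberately naive Python sketch. It is not Weeder's actual algorithm, which is far more efficient; it only shows the idea of trying every possible word and counting the sequences in which it occurs with a bounded number of mismatches. The function names and parameters are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length words."""
    return sum(x != y for x, y in zip(a, b))


def enumerate_motifs(sequences, k, max_mismatches, min_sequences):
    """Return k-mers occurring (with mismatches) in >= min_sequences sequences.

    Illustrative brute force: all 4**k candidate words are tested, so this is
    only practical for small k.
    """
    results = {}
    for candidate in map("".join, product("ACGT", repeat=k)):
        # Count the sequences that contain at least one approximate occurrence.
        support = sum(
            any(hamming(candidate, seq[i:i + k]) <= max_mismatches
                for i in range(len(seq) - k + 1))
            for seq in sequences
        )
        if support >= min_sequences:
            results[candidate] = support
    return results
```

For example, `enumerate_motifs(["AATGACA", "CCTGACC", "GTGTCGG"], k=4, max_mismatches=0, min_sequences=2)` reports only the word `TGAC`, which is shared exactly by the first two toy sequences.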

 
As shown in Figure 1, three simple steps are sufficient to use Melina II:

Step 1: Input query sequences (Figure 1a)

In the Query input panel, multiple input sequences are entered in FASTA format.

Step 2: Select predictive algorithms and their parameters (Figure 1a and b)

Although defaults are provided, users can choose the prediction algorithms and their specific parameter values at this step. Some of the defaults were set specifically for Melina II so that the search conditions are as similar as possible across algorithms. They are: (1) the motif length is around 10 bases (‘6–10’ for MEME and Weeder; otherwise, ‘10’); (2) both strands are searched; and (3) multiple occurrences are allowed for each sequence. The same algorithm can be selected more than once with different parameter values.

Step 3: Submit a query and get results (Figure 1c)

After submitting a query, a job ID is displayed on the screen while the job is running. Users can later access the results by using this job ID.
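The three steps can be summarized as a small data model. The following Python sketch is illustrative only: the names `Run`, `default_run` and `validate_job` are hypothetical and do not correspond to Melina II's actual code or interface. Only the limit of four runs and the default settings described in Step 2 are taken from the text.

```python
from dataclasses import dataclass

ALGORITHMS = {"Consensus", "MEME", "Gibbs sampler", "MDscan", "Weeder"}
MAX_RUNS = 4  # the text states that at most four programs run at once


@dataclass(frozen=True)
class Run:
    """One algorithm plus one parameter set (hypothetical structure)."""
    algorithm: str
    motif_length: str              # "10" or a range such as "6-10"
    both_strands: bool = True
    multiple_occurrences: bool = True


def default_run(algorithm: str) -> Run:
    """Return the defaults described in Step 2 for the given algorithm."""
    if algorithm not in ALGORITHMS:
        raise ValueError(f"unknown algorithm: {algorithm}")
    length = "6-10" if algorithm in {"MEME", "Weeder"} else "10"
    return Run(algorithm, length)


def validate_job(sequences: dict, runs: list) -> None:
    """Check a job: named query sequences plus one to four runs."""
    if not sequences:
        raise ValueError("at least one query sequence is required")
    if not 1 <= len(runs) <= MAX_RUNS:
        raise ValueError(f"select between 1 and {MAX_RUNS} runs")
    for run in runs:
        if run.algorithm not in ALGORITHMS:
            raise ValueError(f"unknown algorithm: {run.algorithm}")
```

In this model, a job such as `[default_run("MEME"), Run("MEME", "8")]` is valid, which mirrors the statement that the same algorithm may be selected with different parameter values.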

After Melina II finishes the motif detection, the results of each prediction are integrated and displayed graphically (Figure 1c). Detected motif candidates are drawn as colored arrows in the summarized view (upper-right corner of the result view). When users click a motif candidate in the summarized view, more information is shown in the detailed view (lower-right corner), and the predicted motif is displayed as a Sequence Logo (11) [the drawing script was taken from WebLogo (12)] or as a weight matrix. This integrated result helps users find motif candidates and outline cis-regulatory modules. The ‘PDF’ button saves the output as a PDF file, which is useful for further manipulation, for inclusion in publications or for viewing the entire result by adjusting the scale. The ‘FIT’ button fits the entire view along its horizontal axis and hides the detailed information in the lower half.
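As background for the Sequence Logo display, the height of each logo column reflects its information content in bits. The Python sketch below computes that quantity from a set of aligned sites. It is a simplified illustration, not the WebLogo script used by Melina II: it assumes equal-length DNA sites, omits the small-sample correction, and the function name is hypothetical.

```python
import math
from collections import Counter


def column_information(sites):
    """Return the information content (bits) of each column of aligned DNA sites."""
    info = []
    for column in zip(*sites):
        counts = Counter(column)
        n = len(column)
        # Shannon entropy of the observed base frequencies in this column.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - entropy)  # 2 bits is the maximum for four bases
    return info
```

A fully conserved column yields 2 bits, whereas a column in which all four bases are equally frequent yields 0 bits.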

Furthermore, users can build a weight matrix from a predicted motif and apply it to upstream sequences of several typical genomes (human, mouse, A. thaliana, S. cerevisiae, E. coli or B. subtilis) or to public motif databases [JASPAR (13) or DBTBS (14)] in order to find similar motifs. For the former search, we used the HMMER package by Sean Eddy (http://hmmer.janelia.org/). More details are available from the help document.
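To make the idea of building a weight matrix and applying it to sequences concrete, here is a minimal Python sketch. Note that Melina II itself relies on the HMMER package for the genome search; the code below instead uses a plain log-odds position weight matrix, and the pseudocount of 1, the uniform background of 0.25 and the function names are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds weight matrix from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(s) != length for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for i in range(length):
        column = [s[i] for s in sites]
        pwm.append({
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def scan(pwm, sequence, threshold):
    """Return (position, score) for every window scoring >= threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue  # skip windows with ambiguous bases such as N
        score = sum(pwm[i][b] for i, b in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits
```

A matrix built from a handful of predicted sites can then be passed to `scan` together with an upstream sequence and a score threshold; a suitable threshold depends on the motif and would need to be chosen by the user.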


    EXAMPLES AND DISCUSSION
To illustrate how Melina II works, we give two examples. The first is a set of artificial DNA sequences containing several known motifs. The second consists of upstream sequences of functionally related genes.

Example 1: Embedded motifs in artificial sequences
In this example, the dataset consists of three 250-bp long DNA sequences (Figure 2a). Each DNA sequence was randomly generated by the Random Sequence Generator, which is a function of Melina II. Three known consensus motifs were inserted into each sequence (Figure 2b). Motifs were set in random order to check the influence of their location. In general, it is difficult for multiple alignment programs to detect all motifs from this kind of dataset.


Figure 2. Embedded motifs in artificial sequences.
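A dataset of this kind can be produced with a few lines of Python. The sketch below is not the Random Sequence Generator of Melina II; it is an assumed reconstruction that draws bases uniformly at random and overwrites background bases with the motifs, in random order and without overlap. The function name and any motif strings passed to it are placeholders, not the motifs of Figure 2b.

```python
import random


def embed_motifs(motifs, length=250, rng=None):
    """Return (sequence, placements) with motifs planted in random order."""
    rng = rng or random.Random()
    order = list(motifs)
    rng.shuffle(order)  # random left-to-right order of the motifs
    free = length - sum(len(m) for m in order)
    if free < 0:
        raise ValueError("motifs do not fit into the sequence")
    # Sorted offsets into the free background guarantee non-overlapping sites.
    offsets = sorted(rng.randint(0, free) for _ in order)
    seq = [rng.choice("ACGT") for _ in range(length)]
    placements, shift = [], 0
    for offset, motif in zip(offsets, order):
        start = offset + shift
        seq[start:start + len(motif)] = motif  # overwrite background bases
        placements.append((start, motif))
        shift += len(motif)
    return "".join(seq), placements
```

Calling `embed_motifs` three times with the same motif list gives three 250-bp sequences in which the motif order and positions differ, while `placements` records where each motif was planted so that predictions can be checked against it.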

 
In this case, we used four algorithms (Consensus, MEME, Gibbs sampler and Weeder) with their original default parameters. The result shows that no single algorithm correctly detects all motifs. However, all the inserted motifs are recovered if we take the motifs detected by at least two algorithms, as illustrated in Figure 3.


Figure 3. Result view of example 1.
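The rule of keeping motifs detected by at least two algorithms can be expressed as a simple voting procedure. Melina II presents this comparison visually; the Python sketch below is only one assumed way to automate it, and the input format (lists of sites with exclusive end coordinates) and the function name are hypothetical.

```python
from collections import defaultdict


def supported_regions(predictions, min_support=2):
    """Return {seq_id: [(start, end), ...]} covered by >= min_support algorithms.

    predictions maps an algorithm name to a list of (seq_id, start, end)
    sites, with end exclusive.
    """
    coverage = defaultdict(lambda: defaultdict(set))
    for algorithm, sites in predictions.items():
        for seq_id, start, end in sites:
            for pos in range(start, end):
                coverage[seq_id][pos].add(algorithm)

    regions = {}
    for seq_id, by_pos in coverage.items():
        kept = sorted(p for p, algs in by_pos.items() if len(algs) >= min_support)
        merged = []
        for pos in kept:
            if merged and pos == merged[-1][1]:
                merged[-1][1] = pos + 1        # extend the current region
            else:
                merged.append([pos, pos + 1])  # start a new region
        regions[seq_id] = [tuple(r) for r in merged]
    return regions
```

Because each position stores a set of algorithm names, several overlapping sites from the same algorithm count only once toward the support.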

 
For the same dataset, we show another result in Figure 4. In this case, we used Consensus with default parameters and Gibbs sampler with three different sets of parameter values. The result clearly shows that parameter values such as motif size and cut-off value can significantly influence motif detection. Because Melina II allows fine-grained specification of parameters, expert users can analyze datasets from multiple angles.


Figure 4. Another result of example 1 using different parameter values.

 
Example 2: Upstream sequences of functionally related genes
We present here an example of real promoters containing a common motif. This dataset consists of the 300-bp sequences upstream of the translational start sites of five Bacillus subtilis genes known to be regulated by a well-known global regulator, CcpA. As shown in Figure 5, a common motif is identified, and a search against DBTBS confirms that it corresponds to the CcpA motif (Figure 5b and c).


Figure 5. Real promoters and motif database search.

 

    FUTURE PROSPECTS
One future direction is to endow Melina with a function that guides users toward favorable parameter values to improve detection accuracy. This is not an easy task because the optimal parameter values for each algorithm could depend on, say, the length and number of input sequences as well as the nature of the pattern sought. Nevertheless, it seems possible, to some extent, to categorize typical cases and suggest optimal parameter values for each (15).


    IMPLEMENTATION
Melina II was developed as a web-based tool by using Adobe (Macromedia) Flash. You may need to install the Flash Plug-in beforehand.


    ACKNOWLEDGEMENTS
 
We would like to thank all the groups and authors, including Gary Stormo, Bill Thompson, Charles E. Lawrence, Timothy L. Bailey, Douglas L. Brutlag, Xiaole S. Liu, Giulio Pavesi, Graziano Pesole, Boris Lenhard, Sean Eddy, Thomas D. Schneider and Steven E. Brenner, who made the following algorithms and resources freely available: Consensus, Gibbs sampler, MEME, MDscan, Weeder, RefSeq, JASPAR, HMMER and Sequence Logo/WebLogo. We also thank Nicolas Sierro for critically reading the manuscript. This work was partly supported by a Grant-in-Aid for Scientific Research on Priority Areas ‘Comprehensive Genomics’ from the Ministry of Education, Culture, Sports, Science and Technology of Japan. Funding to pay the Open Access publication charges for this article was provided by a budget from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

Conflict of interest statement. None declared.


    REFERENCES

  1. GuhaThakurta D. Computational identification of transcriptional regulatory elements in DNA sequence. Nucleic Acids Res. (2006) 34:3585–3598.

  2. Tompa M, Li N, Bailey TL, Church GM, Moor BD, Eskin E, Favorov AV, Frith MC, Fu Y, et al. Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol. (2005) 23:137–144.

  3. Stormo GD. DNA binding sites: representation and discovery. Bioinformatics (2000) 16:16–23.

  4. Kel A, Kel-Margoulis O, Borlak J, Tchekmenev D, Wingender E. Databases and tools for in silico analysis of regulation of gene expression. In: Borlak J, ed. Handbook of Toxicogenomics. (2005) VCH Weinheim. 253–290.

  5. Poluliakh N, Takagi T, Nakai K. Melina: motif extraction from promoter regions of potentially co-regulated genes. Bioinformatics (2003) 19:423–424.

  6. Stormo GD, Hartzell GW. Identifying protein-binding sites from unaligned DNA fragments. Proc. Natl Acad. Sci. USA (1989) 86:1183–1187.

  7. Bailey TL, Elkan C. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the 2nd International Conference on Intelligent Systems for Molecular Biology (1994) 28–36.

  8. Lawrence CE, Altschul SF, Boguski MS, Neuwald AF, Liu JS, Wootton JC. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science (1993) 262:208–214.

  9. Liu XS, Brutlag DL, Liu JS. An algorithm for finding protein-DNA binding sites with applications to chromatin-immunoprecipitation. Nat. Biotechnol. (2002) 20:835–839.

  10. Pavesi G, Mauri G, Pesole G. An algorithm for finding signals of unknown length in DNA sequences. Bioinformatics (2001) 17(Suppl. 1):S207–S214.

  11. Schneider TD, Stephens RM. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. (1990) 18:6097–6100.

  12. Crooks GE, Hon G, Chandonia JM, Brenner SE. WebLogo: a sequence logo generator. Genome Res. (2004) 14:1188–1190.

  13. Vlieghe D, Sandelin A, De Bleser PJ, Vleminckx K, Wasserman WW, van Roy F, Lenhard B. A new generation of JASPAR, the open-access repository for transcription factor binding site profiles. Nucleic Acids Res. (2006) 34:D95–D97.

  14. Makita Y, Nakao M, Ogasawara N, Nakai K. DBTBS: database of transcriptional regulation in Bacillus subtilis and its contribution to comparative genomics. Nucleic Acids Res. (2004) 32:D75–D77.

  15. Poluliakh N, Konno M, Horton P, Nakai K. Parameter landscape analysis for common motif discovery programs. Lecture Notes in Computer Science (2005) 3318:79–87. Springer.

