Nucleic Acids Research 2006 34(Web Server issue):W400-W404; doi:10.1093/nar/gkl223
© The Author 2006. Published by Oxford University Press. All rights reserved
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org
CrossLink: visualization and exploration of sequence relationships between (micro) RNAs
Tobias Dezulian1,*,
Martin Schaefer1,
Roland Wiese2,
Detlef Weigel3 and
Daniel H. Huson1
1 Department of Algorithms in Bioinformatics, Center for Bioinformatics Tübingen, Tübingen University D-72076 Tübingen, Germany
2 yWorks GmbH, Research and Development D-72070 Tübingen, Germany
3 Department of Molecular Biology, Max-Planck-Institute for Developmental Biology D-72076 Tübingen, Germany
*To whom correspondence should be addressed. Tel: +49 7071 2970454; Fax: +49 7071 295148; Email: dezulian{at}informatik.uni-tuebingen.de
Received February 11, 2006. Revised March 25, 2006. Accepted March 27, 2006.
 |
ABSTRACT
|
|---|
CrossLink is a versatile tool for the exploration of relationships
between RNA sequences. After a parametrization phase, CrossLink
delegates the determination of sequence relationships to established
tools (BLAST, Vmatch and RNAhybrid) and then constructs a network.
Each node in this network represents a sequence and each link
represents a match or a set of matches. Match attributes are
reflected by graphical attributes of the links and corresponding
alignments are displayed on a mouse-click. The distributions
of match attributes such as
E-value, match length and proportion
of identical nucleotides are displayed as histograms. Sequence
sets can be highlighted and visibility of designated matches
can be suppressed by real-time adjustable thresholds for attribute
combinations. Powerful network layout operations (such as spring-embedding
algorithms) and navigation capabilities complete the exploration
features of this tool. CrossLink can be especially useful in
a microRNA context since Vmatch and RNAhybrid are suitable tools
for determining the antisense and hybridization relationships,
which are decisive for the interaction between microRNAs and
their targets. CrossLink is available both online and as a standalone
version at
http://www-ab.informatik.uni-tuebingen.de/software.
 |
INTRODUCTION
|
|---|
Explicitly visualizing sequences and their relationships as
a network provides concise and intuitive exploration possibilities.
In this respect, CrossLink nicely complements the software CLANS
(
1) which uses a network to visualize sequence similarity between
amino acid sequences. CrossLink delegates the determination
of sequence relationships to the established tools BLAST (
2),
Vmatch (
3) and RNAhybrid (
4). Users versed with these tools
will appreciate that (almost) all tool-specific parameters may
be set from within CrossLink. Furthermore, CrossLink allows
relationships determined by distinct tools to be visualized
within the same network. Both BLAST and Vmatch can detect local
sequence similarity in both sense and antisense directions and
are suitable for a wide range of scenarios. BLAST is a standard
tool using a fast seed-and-extend strategy. Vmatch employs a
suffix array-based approach that permits constraints on the
match length and on the number of mismatched bases within a
match. RNAhybrid is a specialized tool that can predict potential
binding sites of microRNAs in large target RNAs using an extension
of the classical RNA secondary structure prediction algorithm
(
5). In general, RNAhybrid finds the energetically most favorable
hybridization sites of a small RNA in a large RNA.
Although CrossLink can be put to use in many scenarios amenable to the above tools, it can be especially useful in a microRNA context: microRNAs interact with target transcripts by complementary base-pairing and can be classified into families on the basis of sequence similarityrelationships that can be detected by using Vmatch/RNAhybrid and BLAST, respectively (cf. examples below).
 |
DESIGN
|
|---|
Balancing flexibility and complexity, CrossLink allows the user
to independently specify three different kinds of relationship
searches, each with its own strategy (BLAST, Vmatch and RNAhybrid)
and a set of parameters. To this end, CrossLink's input consists
of two sets of RNA, A and B, each provided in the FASTA format.
The first kind of similarity search, S
AA, is performed between
all sequences of set A, yielding the set of matches M
AA. Likewise,
similarity searches S
AB and S
BB are performed to yield the set
of matches M
AB (between all sequences of set A and all sequences
of set B) and the set of matches M
BB (between all sequences
of set B), respectively. For clarity, a color scheme is associated
with each kind of match: reddish colors frame the parameter
input controls for S
AA as well as the match representations
of M
AA in the network, corresponding alignment windows and histograms.
Similarly, S
AB and M
AB are associated with greenish colors and
S
BB and M
BB are associated with bluish colors (
Figure 1). Within
each color scheme, shades indicate the orientation of each match:
a dark shade is associated with matches in sense orientation
and a light shade is associated with matches in antisense orientation.

View larger version (11K):
[in this window]
[in a new window]
|
Figure 1 Network of sequence sets A (red nodes) and B (blue nodes) with corresponding matches of set MAA, MAB and MBB represented by links in reddish, greenish and bluish colors, respectively.
|
|
In addition to orientation, each match has the following attributes:
E-value, length and the proportion of identical nucleotides
within the alignment when the match was determined by using
BLAST or Vmatch; minimal free energy (MFE), length and the proportion
of paired nucleotides within the alignment when the match was
determined by using RNAhybrid. For each match set, a visualization
option panel (
Figure 2) is provided that uses a histogram for
each match attribute to display the corresponding value distribution.
Sense matches and antisense matches are tallied separately in
each histogram. Note that the
E-value and MFE attribute histograms
run on a logarithmic scale and the length and identity/paired
proportion histograms run on a linear scale. Serving a 2-fold
purpose, the visualization option panel also allows manipulation
of the network: a threshold may be set for each attribute and
a specified combination of thresholds then determines which
matches will be considered for analysis and represented as links
in the network and which will be suppressed. This feature allows
the user to rapidly focus on matches with interesting characteristics.
A threshold is set by adjusting a slider for each attribute
and selecting a combination mode. Two combination modes are
available: in conjunction mode (logical AND) only
matches that pass all thresholds will be displayed. In disjunction
mode (logical OR) only matches that pass at least
one of the thresholds will be displayed. Whether the threshold
acts as a cutoff for smaller or higher values of an attribute
is specified by a radio button setting located on the left and
right of each attribute histogram. In addition, all sense and/or
antisense matches may be suppressed for a given match set. Exploration
can further be focused on an arbitrary selection of sequences
by removing all remaining sequences (along with their relationships)
from the exploration session using the menu bar (

View

Remove
all unselected nodes). All histograms are accordingly recalculated
on the basis of the remaining relationships.

View larger version (65K):
[in this window]
[in a new window]
|
Figure 2 The visualization options panel for matches in set MAA displaying the histograms associated with each match attribute.
|
|
An exploration session involves three phases that occur in order:
first, during a parametrization phase, the two input files are
chosen and for each of the three relationship searches a strategy
(BLAST, Vmatch or RNAhybrid) is selected and corresponding parameters
are specified. Next, in the search phase, CrossLink uploads
all necessary information to the server and the search is performed
remotely. Upon completion the results are passed back. During
the final exploration phase the resulting network is visualized
and relationships can be explored. A reset button permits the
user to jump back to the parametrization phase with the current
parameters.
Any two sequences can give rise to several distinct local sequence similarities. Representing each match by its own link may clutter up the network visualization when many sequence pairs each yield a multitude of local matches. Therefore, each match set can independently be displayed in either single match representative mode or multiple match representative mode. In single match representative mode each link between two network nodes represents a single match between the corresponding sequences. In the case of several matches between this pair of sequences each is represented by its own link running side by side between the two nodes. In multiple match representative mode a link between a pair of sequences represents all corresponding matches. One can select whether the representative of this match set should be the one with the smallest E-value/MFE, greatest length or highest identity/paired proportionas this may be relevant for the mentioned attribute histograms.
Clicking on a node or link of the network spawns a separate window displaying detail information about the corresponding sequence or match(es) (Figure 3), respectively. Note that the alignment is displayed in text form exactly as output by the originating tool. Clicking on a subset of selected nodes spawns a separate window displaying the corresponding sequences in the FASTA format. This enables export of sequence subsets for further scrutiny using other tools.
By default, sequences of set A and set B are displayed as red
and blue nodes, respectively, in the network. Arbitrary colors
may be assigned to subsets of sequences using the following
strategy: a color can be associated with a text pattern. Each
sequence, which contains the text pattern literally as a substring
in its FASTA header, will be colored accordingly. Optionally,
the pattern may contain a regular expression that is matched
accordingly. Any number of such patterncolor associations
may be specified (
Figure 4). A sequence thus associated with
several colors will appear multicolored.
To facilitate repeated exploration runs, the current parameter
set can be named and saved as a configuration template. Any
subsequent exploration task can be based on such a configuration
template either as is or after modification. Each
configuration template contains the following parameters: each
of the three sequence similarity search strategies including
all parameters, the custom patterncolor associations
and the two sequence input file names (as associated
default file names). Note that, for consistency, selecting a different
configuration template does not change the currently stated
input file names. However, the default file names associated
with the current template can be chosen explicitly.
A visualization window offers fast and powerful navigation of the network shown in the main view area (Figure 5): an overview area displays the currently visible clipping as a gray rectangle, which can be dragged, focussing the main view area accordingly. The mouse wheel permits rapid zooming. Network nodes can be selected and moved. Double-clicking on a sequence in the sequence selection pane (Figure 5, lower left) centers the view onto this sequence. Dragging the mouse cursor over a sequence displays its FASTA header.

View larger version (43K):
[in this window]
[in a new window]
|
Figure 5 The visualization window, with an overview area on the top left and a sequence selection panel on the lower left.
|
|
Several algorithms are available for network layout. The default
layout algorithm is a FruchtermanReingold (
6) spring-embedding,
similar to the one used in the BioLayout (
7) library, where
each link acts as a spring pulling at the sequences it is attached
to. A Reset node positions Button undoes all node
movements performed since the last application of a layout algorithm.
CrossLink's visualization component is based on the yFiles (
8)
graph library which provides the spring-embedding implementation.
 |
EXAMPLES
|
|---|
CrossLink provides three example configuration templates along
with the corresponding sequence files. To try out CrossLink,
one merely has to select one of the examples and press the Run
button. The following example scenarios are provided:
- Example 1: Sequence set A consists of all rice microRNAs of families 440446 available from miRBase (9). Sequence set B contains a subset of repetitive rice sequences downloaded from the TIGR Rice Genome Annotation Database. It is immediately visible that, for example, the rice microRNA family 445 exhibits very close sequence similarity to a family of repetitive rice sequences. Initially displaying a multitude of links in a tangle, this example demonstrates the power of the interactive histograms to focus on relevant relationships.
- Example 2: Sequence set A consists of all Arabidopsis microRNA precursors available at miRBase. Sequence set B contains all (
2000) sequences contained in the Arabidopsis Small RNA Project Database (10) to date. Setting these two sets in relationship with each other allows one to assess which microRNA families have been sequenced by the ASRP project. This example also demonstrates CrossLink's ability to handle large sets of sequences and also shows the power of the spring-embedding algorithm in clustering microRNAs into families.
- Example 3: Sequence set A consists of the Drosophila microRNAs dme-miR-3, dme-miR-4 and dme-miR-5. Sequence set B contains all corresponding targets which have been predicted (with an E-value < 1) in a study by Rehmsmeier et al. (4), plus some randomly picked sequences from the same study that have not been predicted as potential targets of these microRNAs. This example demonstrates the use of RNAhybrid, for example, revealing that one sequence (accession no. CG15125) is simultaneously targeted by two different microRNAs. Furthermore, the capability of custom patterncolor associations is shown as each predicted target set of the Rehmsmeier et al. (4) study is associated with its own color (yellow, magenta and cyan for the targets of dme-miR-3, dme-miR-4 and dme-miR-5, respectively) and the non-targets are shown in blue.
 |
AVAILABILITY
|
|---|
CrossLink is available both online and as a downloadable local
version. Both versions require an installed Java Runtime Environment
(JRE1.4.2 or later). To prevent overload of our server, the
online version restricts the size of the two input files to
1 MB. The local version requires locally installed NCBI BLAST,
Vmatch and RNAhybrid tools and a TCSH command line. The CrossLink
website at
http://www-ab.informatik.uni-tuebingen.de/software provides a user manual including a quick start guide plus detailed
descriptions of the example input data that CrossLink supplies.
 |
ACKNOWLEDGEMENTS
|
|---|
We thank Norman Warthmann, Rebecca Schwab, Heike Wollmann and
Matthias Zschunke for helpful suggestions. Furthermore, we are
grateful to Stefan Kurtz for his Vmatch software (
www.vmatch.de)
and to Marc Rehmsmeier for supplying us with the
Drosophila sequences. We especially appreciate that yWorks (
www.yworks.com)
provided us with yFiles library components. We would like to
thank all users who helped to improve our software with their
questions and feedback as well as two anonymous reviewers for
their helpful comments. Funding to pay the Open Access publication
charges for this article was provided by the Deutsche Forschungsgemeinschaft
(DFG).
Conflict of interest statement. None declared.
 |
REFERENCES
|
|---|
- Frickey, T. and Lupas, A. (2004) CLANS: a Java application for visualizing protein families based on pairwise similarity Bioinformatics, 20, 37023704[Abstract/Free Full Text]
.
- Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs Nucleic Acids Res, . 25, 33893402[Abstract/Free Full Text]
.
- Kurtz, S., Choudhuri, J.V., Ohlebusch, E., Schleiermacher, C., Stoye, J., Giegerich, R. (2001) REPuter: the manifold applications of repeat analysis on a genomic scale Nucleic Acids Res, . 29, 46334642[Abstract/Free Full Text]
.
- Rehmsmeier, M., Steffen, P., Hochsmann, M., Giegerich, R. (2004) Fast and effective prediction of microRNA/target duplexes RNA, 10, 15071517[Abstract/Free Full Text]
.
- Zuker, M. and Stiegler, P. (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information Nucleic Acids Res, 9, 133148[Abstract/Free Full Text]
.
- Fruchterman, T.M. and Reingold, E.M. (1991) Force directed placement Softw. Pract. Exp, 21, 11291164
.
- Enright, A.J. and Ouzounis, C.A. (2001) BioLayoutan automatic graph layout algorithm for similarity visualization Bioinformatics, 17, 853854[Abstract/Free Full Text]
.
- Wiese, R., Eiglsperger, M., Kaufmann, M. (2001) yFiles: visualization and automatic layout of graphs In Mutzel, P., Jünger, M., Leipert, S. (Eds.). Proceedings of the 9th International Symposium on Graph Drawing, Berlin Springer pp. 453454
.
- Griffiths-Jones, S., Grocock, R.J., van Dongen, S., Bateman, A., Enright, A.J. (2006) miRBase: microRNA sequences, targets and gene nomenclature Nucleic Acids Res, 34, D140D144[Abstract/Free Full Text]
.
- Gustafson, A.M., Allen, E., Givan, S., Smith, D., Carrington, J.C., Kasschau, K.D. (2005) ASRP: the Arabidopsis small RNA project database Nucleic Acids Res, 33, D637D640[Abstract/Free Full Text]
.

CiteULike
Connotea
Del.icio.us What's this?