Nucleic Acids Research Advance Access published online on October 8, 2008
Nucleic Acids Research, doi:10.1093/nar/gkn709
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
AthaMap, integrating transcriptional and post-transcriptional data
Lorenz Bülow1, Stefan Engelmann2, Martin Schindler2 and Reinhard Hehl1,*
1Institut für Genetik, Technische Universität Braunschweig, Spielmannstr. 7, D-38106 Braunschweig and 2Software Systems Engineering Institute, Technische Universität Braunschweig, Mühlenpfordtstr. 23, D-38106 Braunschweig, Germany
*To whom correspondence should be addressed. Tel: +49 531 391 5772; Fax: +49 531 391 5765; Email: r.hehl{at}tu-braunschweig.de
Received September 1, 2008. Revised September 29, 2008. Accepted September 29, 2008.
ABSTRACT
The AthaMap database generates a map of predicted transcription factor binding sites (TFBS) for the whole Arabidopsis thaliana genome. AthaMap has now been extended to include data on post-transcriptional regulation. A total of 403 173 genomic positions of small RNAs have been mapped in the A. thaliana genome. These identify 5772 putative post-transcriptionally regulated target genes. AthaMap tools have been modified to improve the identification of common TFBS in co-regulated genes by subtracting post-transcriptionally regulated genes from such analyses. Furthermore, AthaMap was updated to the TAIR7 genome annotation, a graphic display of gene analysis results was implemented, and the TFBS data content was increased. AthaMap is freely available at http://www.athamap.de/.
INTRODUCTION
A large number of different databases are available for database-assisted gene-expression analysis (1). The first level of gene-expression regulation is transcription, which is controlled by the synchronized binding of transcription factors (TFs) to adjacent cis-regulatory sequences. The bioinformatic identification of cis-regulatory sequences is an important tool to predict target genes of specific TFs (2). To this end, the AthaMap database was developed. AthaMap is a database that generates a genome-wide map of predicted transcription factor binding sites (TFBS) and cis-regulatory elements for Arabidopsis thaliana (3,4). Compared to similar databases such as AGRIS, Athena and ATTED-II (5–8), AthaMap covers the whole-genome sequence and includes predicted TFBS that were identified with positional weight matrices. Recently, plant-related contents of the transcription and promoter databases TRANSFAC and TRANSPRO (9,10) were integrated with plant proteome and pathway data into the platform BKL Plant (BIOBASE Knowledge Library). This was combined with the previously reported ExPlain tool, which screens promoter regions with positional weight matrices for TFBS and evaluates results using the Composite Module Analyst (CMA) as its core component (11,12). This commercial product integrates promoter and pathway analysis of gene-expression data (BIOBASE, Wolfenbüttel, Germany).
In contrast, AthaMap is in the public domain and provides online tools to display TFBS in user-selected genes or at specific genomic positions (3). The detection of combinatorial elements and their target genes allows the prediction of co-regulated genes (13). The gene analysis function detects common TFBS in user-provided genes (14). A short user manual has been published recently (15), and all tools are also explained on the Description page of the AthaMap website. AthaMap has been linked with PathoPlant, a database on plant–pathogen interactions (16). Arabidopsis thaliana microarray experiments in PathoPlant can be screened for co-regulated genes that respond to up to three different stimuli (17). A list of co-regulated genes can be exported directly to AthaMap for identification of common TFBS.
However, not all differentially expressed genes are transcriptionally regulated (18). One important factor in post-transcriptional regulation is the expression of small RNAs such as miRNA, siRNA and ta-siRNA (19). Although distinct pathways generate these types of small RNAs, the resulting molecules are very similar in size and together represent the small RNA transcriptome of the organism (20). Using a massively parallel sequencing approach, small RNA transcriptome data became available for seedlings and inflorescence tissue of A. thaliana (21). The genome-wide nature of AthaMap and the availability of small RNA data provide a unique opportunity to combine transcriptional and post-transcriptional data in a single database. This may add significantly to the quality of the identification of cis-regulatory sequences involved in transcriptional regulation.
ANNOTATION OF GENOMIC POSITIONS OF SMALL RNAS
Sequence signatures (17-mers) derived from a small RNA transcriptome analysis of A. thaliana inflorescence tissue and seedlings were used for genomic screenings (21). The complete lists of screening sequences (accession numbers GSM65747 and GSM65750) were downloaded from NCBI's Gene Expression Omnibus (GEO) repository (22). Genomic positions were determined using a Perl script that screens for perfect matches of all 109 590 small RNA 17-mer screening sequences within the five chromosomes of A. thaliana. Absolute positions and orientations of small RNA matches from inflorescence tissue and seedlings were annotated to AthaMap, resulting in a total of 403 173 genomic matches. For screening sequences yielding more than one genomic match, the corresponding loci were determined. A total of 5772 genes were predicted to be post-transcriptionally regulated by small RNAs because their transcribed regions are targets of at least one small RNA in antisense orientation. A text file with the genome identifiers of the 5772 predicted target genes of small RNAs can be downloaded from the documentation page at AthaMap.
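The screening step described above can be illustrated with a minimal Python sketch. It is not the Perl script used for AthaMap. All function names, the tuple layouts and the rule that a match on the strand opposite to the gene counts as "antisense" are assumptions made for illustration only.

```python
from collections import defaultdict

# Translation table for complementing DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def index_signatures(signatures):
    """Map each signature and its reverse complement to (signature, strand)."""
    index = defaultdict(list)
    for sig in signatures:
        sig = sig.upper()
        index[sig].append((sig, "+"))
        index[reverse_complement(sig)].append((sig, "-"))
    return index


def find_perfect_matches(chromosome, signatures, k=17):
    """Return (start, strand, signature) for every perfect k-mer match.

    start is 1-based on the forward strand; strand tells which strand
    carries the signature sequence. Signatures are assumed to be unique
    and of length k.
    """
    index = index_signatures(signatures)
    chromosome = chromosome.upper()
    matches = []
    for i in range(len(chromosome) - k + 1):
        window = chromosome[i:i + k]
        for sig, strand in index.get(window, ()):
            matches.append((i + 1, strand, sig))
    return matches


def antisense_targets(matches, genes, k=17):
    """Return ids of genes whose transcribed region contains an antisense match.

    genes: iterable of (gene_id, start, end, strand) with 1-based inclusive
    coordinates of the transcribed region (hypothetical layout).
    Assumption: "antisense" means the match lies on the strand opposite
    to the gene.
    """
    targets = set()
    for start, strand, _sig in matches:
        end = start + k - 1
        for gene_id, g_start, g_end, g_strand in genes:
            if strand != g_strand and start >= g_start and end <= g_end:
                targets.add(gene_id)
    return targets
```

Scanning the chromosome once and looking each window up in a dictionary keeps the run time proportional to the chromosome length rather than to the number of signatures. For the full genome, the nested loop over genes would need to be replaced by an interval lookup.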
Genomic positions of small RNAs are displayed in AthaMap analogously to TFBSs and are symbolized as xxxxx>. The arrowhead gives the orientation of the small RNA. A tool tip box appears when the mouse pointer moves over the arrow, indicating the absolute genomic position and screening library of the small RNA. Selecting the name adjacent to this symbol opens a new window giving additional information. Figure 1 shows a partial screen shot of position 11 911 on chromosome 1 with a small RNA from the inflorescence library, the tool tip box and the associated pop-up window. This new window shows the screening sequence, the corresponding genomic positions for this particular small RNA and the reference.

Figure 1. Small RNA binding sites in the Arabidopsis thaliana genome. Partial screen shot of the sequence display window with a small RNA binding site at position 11 911 on chromosome 1. The tool tip box indicates the absolute genomic position and screening library. A pop-up window with additional information on the small RNA is also shown.
Putative post-transcriptionally regulated genes are identified within the Colocalization and Gene Analysis functions. These genes are tagged on the result pages with an italicized genome identifier. They can be subtracted in the Colocalization and Gene Analysis functions by activating the checkbox "exclude genes regulated by smallRNA" in order to restrict the analyses exclusively to transcriptionally regulated genes.
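The effect of the exclusion option can be reproduced offline with the downloadable list of predicted target genes. The following Python sketch assumes that the text file holds one genome identifier per line. This format, the function names and the example identifiers used as "targets" are hypothetical and serve only as illustration.

```python
def load_target_ids(lines):
    """Collect gene identifiers from an iterable of text lines.

    Assumption: one identifier per line; blank lines are ignored.
    Identifiers are upper-cased so that At2g42540 and AT2G42540 compare equal.
    """
    return {line.strip().upper() for line in lines if line.strip()}


def exclude_small_rna_targets(gene_ids, target_ids, exclude=True):
    """Return gene_ids in their original order.

    When exclude is True, genes listed in target_ids (predicted small RNA
    targets) are dropped, mimicking the checkbox described in the text.
    """
    if not exclude:
        return list(gene_ids)
    targets = {g.upper() for g in target_ids}
    return [g for g in gene_ids if g.upper() not in targets]
```

With a file on disk, the list could be read via `load_target_ids(open(path))`, where `path` points to the downloaded file.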
UPDATE TO TAIR7
The recent publication of the TAIR7 A. thaliana genome release motivated the implementation of this genome annotation in AthaMap (23). The annotation of the gene structure is based on five chromosomal XML flatfiles downloaded from the TAIR web site (release 7). These files were parsed using a Perl script, and positional information for 5'- and 3'-UTRs, exons and introns was annotated to AthaMap. These regions are displayed in AthaMap with a colour code similar to the one used by TAIR. Due to the significantly increased number of genes with an annotated transcription start site (TSS) in TAIR7, the Gene Analysis and Colocalization functions of AthaMap have been changed to show positions of TFBS relative to the TSS of the nearest gene. This applies to 23 222 (73.1%) genes, while for the remaining 8540 (26.9%) genes results are still displayed relative to the translation start site. In earlier versions of AthaMap, all positions were shown relative to the translation start site as the point of reference. Compared to TAIR5, the previous version annotated to AthaMap, the nucleotide sequence of the A. thaliana genome was not changed in TAIR7. Therefore, the positional information of all previously determined TFBS remained constant, except for TATA boxes. Because of the larger number of genes with an annotated TSS, the number of annotated TATA boxes decreased from 16 277 (13) to currently 15 955. The decrease occurred because, for genes lacking a TSS, a larger upstream region was screened for putative TATA boxes than for genes with an annotated TSS (3). The lower number of TATA boxes therefore results from the elimination of false positives.
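The choice of reference point described above can be sketched in Python as follows. The `Gene` record, its field names and the simple difference used as offset are assumptions for illustration. The exact coordinate convention used by AthaMap (for example, how the position directly at the reference point is counted) is not stated here and may differ.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Gene:
    gene_id: str
    strand: str                 # "+" or "-"
    translation_start: int      # absolute position of the translation start site
    tss: Optional[int] = None   # absolute TSS position, None if not annotated


def reference_point(gene):
    """Prefer the annotated TSS; fall back to the translation start site."""
    if gene.tss is not None:
        return gene.tss, "TSS"
    return gene.translation_start, "translation start"


def relative_position(gene, site_position):
    """Return (offset, reference label) for a binding site.

    Negative offsets are upstream of the reference point. For genes on
    the minus strand the sign is flipped, so upstream stays negative.
    """
    ref, label = reference_point(gene)
    offset = site_position - ref
    if gene.strand == "-":
        offset = -offset
    return offset, label
```

For example, a site at absolute position 930 in a plus-strand gene with a TSS at 1000 is reported at -70 relative to the TSS, whereas a gene without an annotated TSS is reported relative to its translation start site.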
GRAPHIC DISPLAY OF GENE ANALYSIS RESULTS
The Gene Analysis function of AthaMap generates long lists with positional information on TFBSs in all genes investigated (14). Although overviews or summaries of the data can be displayed, the positional information is difficult to perceive. Therefore, a graphic display of TFBS in the analysed gene region was implemented that enables easy comparison between genes and visual identification of common binding site patterns. Every TF family, as well as the small RNAs and combinatorial elements, is identified with a different colour, and their display can be selected individually. Figure 2 shows the web interface with the buttons to select the TF families and a graphic display of TFBS for selected TF family members in the Arabidopsis genes At2g42530 and At2g42540. Also shown is a tool tip box that opens when the mouse pointer moves over the colour-coded TFBS. The tool tip box gives additional information on the TF that identified this particular TFBS. Factor (RAV1) and factor family (AP2/EREBP) are identified, as well as the position relative to the TSS (-70). For TFBS identified with positional weight matrices, the threshold score, maximum score and score of the binding site are given (3).

Figure 2. Graphic display of transcription factor and small RNA binding sites. Partial screen shot of the gene analysis tool with the checkboxes for TF families included in a graphic display and the graphic display of the upstream region of the genes At2g42530 and At2g42540. A tool tip box with additional information on one of the TFBS is also shown.
DATA INCREASE
Recently published binding sites for the Arabidopsis TFs TAC1, RAP2.2 and MYB98 were annotated to AthaMap (24–26). These factors belong to the C2H2(Zn), AP2/EREBP and MYB TF families, respectively. Detection and annotation of single binding sites were done as described earlier (4). Binding sites for two TFs for which positional weight matrices could be generated were annotated as well. These are the factors STF1 and SPL1, which belong to the bZIP and SBP TF families, respectively (27,28). Detection and annotation of matrix-based binding sites were done as described earlier (3). AthaMap now harbours 9 998 736 predicted TFBSs.
FUNDING
German Federal Ministry for Education and Research through GABI-ADVANCIS
(BMBF 0315037B). Funding for open access charge: Technical University
of Braunschweig.
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
We would like to thank Anne-Kareen Blechert for help implementing
the TAIR7 genome annotation and for TFBS screenings.
REFERENCES
1. Hehl R, Bülow L. Internet resources for gene expression analysis in Arabidopsis thaliana. Curr. Genomics (2008) 9:375–380.
2. Hehl R, Wingender E. Database-assisted promoter analysis. Trends Plant Sci. (2001) 6:251–255.
3. Steffens NO, Galuschka C, Schindler M, Bülow L, Hehl R. AthaMap: an online resource for in silico transcription factor binding sites in the Arabidopsis thaliana genome. Nucleic Acids Res. (2004) 32:D368–D372.
4. Bülow L, Steffens NO, Galuschka C, Schindler M, Hehl R. AthaMap: from in silico data to real transcription factor binding sites. In Silico Biol. (2006) 6:0023.
5. Davuluri RV, Sun H, Palaniswamy SK, Matthews N, Molina C, Kurtz M, Grotewold E. AGRIS: Arabidopsis Gene Regulatory Information Server, an information resource of Arabidopsis cis-regulatory elements and transcription factors. BMC Bioinformatics (2003) 4:25.
6. O'Connor TR, Dyreson C, Wyrick JJ. Athena: a resource for rapid visualization and systematic analysis of Arabidopsis promoter sequences. Bioinformatics (2005) 21:4411–4413.
7. Palaniswamy SK, James S, Sun H, Lamb RS, Davuluri RV, Grotewold E. AGRIS and AtRegNet. A platform to link cis-regulatory elements and transcription factors into regulatory networks. Plant Physiol. (2006) 140:818–829.
8. Obayashi T, Kinoshita K, Nakai K, Shibaoka M, Hayashi S, Saeki M, Shibata D, Saito K, Ohta H. ATTED-II: a database of co-expressed genes and cis elements for identifying co-regulated gene groups in Arabidopsis. Nucleic Acids Res. (2007) 35:D863–D869.
9. Matys V, Fricke E, Geffers R, Gossling E, Haubrock M, Hehl R, Hornischer K, Karas D, Kel AE, Kel-Margoulis OV, et al. TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res. (2003) 31:374–378.
10. Chen X, Wu JM, Hornischer K, Kel A, Wingender E. TiProD: the Tissue-specific Promoter Database. Nucleic Acids Res. (2006) 34:D104–D107.
11. Kel A, Voss N, Jauregui R, Kel-Margoulis O, Wingender E. Beyond microarrays: finding key transcription factors controlling signal transduction pathways. BMC Bioinformatics (2006) 7(Suppl 2):S13.
12. Kel A, Konovalova T, Waleev T, Cheremushkin E, Kel-Margoulis O, Wingender E. Composite Module Analyst: a fitness-based tool for identification of transcription factor binding site combinations. Bioinformatics (2006) 22:1190–1197.
13. Steffens NO, Galuschka C, Schindler M, Bülow L, Hehl R. AthaMap web tools for database-assisted identification of combinatorial cis-regulatory elements and the display of highly conserved transcription factor binding sites in Arabidopsis thaliana. Nucleic Acids Res. (2005) 33:W397–W402.
14. Galuschka C, Schindler M, Bülow L, Hehl R. AthaMap web-tools for the analysis and identification of co-regulated genes. Nucleic Acids Res. (2007) 35:D857–D862.
15. Hehl R. In: Kahl G, Meksem K, eds. The Handbook of Plant Functional Genomics: Concepts and Protocols. (2008) Weinheim, Germany: Wiley and Sons Ltd. 337–346.
16. Bülow L, Schindler M, Choi C, Hehl R. PathoPlant®: a database on plant-pathogen interactions. In Silico Biol. (2004) 4:529–536.
17. Bülow L, Schindler M, Hehl R. PathoPlant®: a platform for microarray expression data to analyze co-regulated genes involved in plant defense responses. Nucleic Acids Res. (2007) 35:D841–D845.
18. Cheadle C, Fan J, Cho-Chung YS, Werner T, Ray J, Do L, Gorospe M, Becker KG. Control of gene expression during T cell activation: alternate regulation of mRNA transcription and mRNA stability. BMC Genomics (2005) 6:75.
19. Jones-Rhoades MW, Bartel DP, Bartel B. MicroRNAs and their regulatory roles in plants. Annu. Rev. Plant Biol. (2006) 57:19–53.
20. Vaucheret H. Post-transcriptional small RNA pathways in plants: mechanisms and regulations. Genes Dev. (2006) 20:759–771.
21. Lu C, Tej SS, Luo S, Haudenschild CD, Meyers BC, Green PJ. Elucidation of the small RNA component of the transcriptome. Science (2005) 309:1567–1569.
22. Barrett T, Troup DB, Wilhite SE, Ledoux P, Rudnev D, Evangelista C, Kim IF, Soboleva A, Tomashevsky M, Edgar R. NCBI GEO: mining tens of millions of expression profiles--database and tools update. Nucleic Acids Res. (2007) 35:D760–D765.
23. Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al. The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. (2008) 36:D1009–D1014.
24. Ren S, Mandadi KK, Boedeker AL, Rathore KS, McKnight TD. Regulation of telomerase in Arabidopsis by BT2, an apparent target of TELOMERASE ACTIVATOR1. Plant Cell (2007) 19:23–31.
25. Welsch R, Maass D, Voegel T, Dellapenna D, Beyer P. Transcription factor RAP2.2 and its interacting partner SINAT2: stable elements in the carotenogenesis of Arabidopsis leaves. Plant Physiol. (2007) 145:1073–1085.
26. Punwani JA, Rabiger DS, Drews GN. MYB98 positively regulates a battery of synergid-expressed genes encoding filiform apparatus localized proteins. Plant Cell (2007) 19:2557–2568.
27. Song YH, Yoo CM, Hong AP, Kim SH, Jeong HJ, Shin SY, Kim HJ, Yun DJ, Lim CO, Bahk JD, et al. DNA-binding study identifies C-box and hybrid C/G-box or C/A-box motifs as high-affinity binding sites for STF1 and LONG HYPOCOTYL5 proteins. Plant Physiol. (2008) 146:1862–1877.
28. Liang X, Nazarenus TJ, Stone JM. Identification of a consensus DNA-binding site for the Arabidopsis thaliana SBP domain transcription factor, AtSPL14, and binding kinetics by surface plasmon resonance. Biochemistry (2008) 47:3645–3653.
