Nucleic Acids Research Advance Access published online on September 23, 2008
Nucleic Acids Research, doi:10.1093/nar/gkn611
Database Issue
GabiPD: the GABI primary database—a plant integrative omics database
¹GabiPD team, Bioinformatics group, Max Planck Institute of Molecular Plant Physiology, Am Mühlenberg 1, ²Department of Molecular Biology, University of Potsdam, Karl-Liebknecht-Strasse 24-25, Haus 20, 14476 Potsdam-Golm and ³Former RZPD German Resource Center for Genome Research GmbH, Berlin, Germany
*To whom correspondence should be addressed. Tel: +49-(0)331-567-8750; Fax: +49-(0)331-567-898-750; Email: kersten{at}mpimp-golm.mpg.de
Received August 6, 2008. Accepted September 9, 2008.
ABSTRACT
The GABI Primary Database, GabiPD (http://www.gabipd.org/), was established in the frame of the German initiative for Genome Analysis of the Plant Biological System (GABI). The goal of GabiPD is to collect, integrate, analyze and visualize primary information from GABI projects. GabiPD constitutes a repository and analysis platform for a wide array of heterogeneous data from high-throughput experiments in several plant species. Data from different omics fronts are incorporated (i.e. genomics, transcriptomics, proteomics and metabolomics), originating from 14 different model or crop species. We have developed the concept of GreenCards for text-based retrieval of all data types in GabiPD (e.g. clones, genes, mutant lines). All data types point to a central Gene GreenCard, where gene information is integrated from genome projects or NCBI UniGene sets. The centralized Gene GreenCard allows visualizing ESTs aligned to annotated transcripts as well as displaying identified protein domains and gene structure. Moreover, GabiPD makes available interactive genetic maps from potato and barley, and protein 2DE gels from Arabidopsis thaliana and Brassica napus. Gene expression and metabolic-profiling data can be visualized through MapManWeb. By the integration of complex data in a framework of existing knowledge, GabiPD provides new insights and allows for new interpretations of the data.
INTRODUCTION
Experimental studies in the post-genomic era generate very large amounts of data from high-throughput experiments on biological systems. Current studies include, among others, expression and metabolite profiles, and proteome and interaction data (e.g. DNA–protein and protein–protein interactions), collected at different spatial and temporal scales. This increasing flow of data requires computational systems that, besides managing the enormous quantity of data efficiently, are capable of integrating and displaying these disparate data collections in a meaningful and user-friendly way. We have developed the GABI primary database, GabiPD, to fulfill these requirements.
GabiPD is a web-accessible database that was developed in the frame of the German initiative for Genome Analysis of the Plant Biological System (Genomanalyse im biologischen System Pflanze, GABI). GabiPD allows seamless integration of varied omics data types obtained from plant systems and will follow the MIAME (1) and MIAMET (2) standards for storing gene expression and metabolic-profiling data, respectively. Its flexible design allows for a high level of data integration and eases cross-referencing among the different GabiPD data types (e.g. mapping information, sequences and single nucleotide polymorphisms (SNPs), 2DE gel images and protein information) as well as access to public gene/protein-specific information. This in turn gives users a comprehensive overview of the information available for their particular gene or protein of interest. The integration with genome databases like TAIR (3) and general nucleotide databases like GenBank, as well as cross-links to secondary databases such as ARAMEMNON (4), PlnTFDB (5), GABI-KAT (6), PhosPhAt (7) and ProMEX (8), further increases the usefulness of GabiPD.
METHODS AND CONTENTS
Design and implementation
GabiPD's web interface was developed using Perl and Java in combination with template processing to separate the visualization from the application logic. Our applications are database driven, which means that the application interface logic is derived directly from the database structure (shown in blue in Figure 1). To achieve this, we deploy reverse engineering methods in combination with template processing to generate interfaces for programming languages like Perl or Java, thus supporting all the database-specific actions such as insert, update, delete and select. These object-oriented interfaces are generated automatically from the database schema and support inheritance, automated key generation and advanced exception handling. The separation of the database application interface from the application and visualization logic (shown in yellow in Figure 1) facilitates fast adjustment to modifications of the data structure and reduces the effort needed to fix existing application logic during larger database changes.
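The idea of deriving an access layer from the database structure can be illustrated with a minimal sketch. It is not the GabiPD implementation (which uses Perl and Java and is not shown in this article); it assumes an SQLite database, Python's `string.Template` as the template processor, and a hypothetical `gene` table. The names `CLASS_TEMPLATE`, `reverse_engineer` and `generate_interfaces` are invented for illustration.

```python
import sqlite3
from string import Template

# Hypothetical template for one generated accessor class (illustration only).
CLASS_TEMPLATE = Template('''\
class $class_name:
    """Generated accessor for table '$table'."""
    table = "$table"
    columns = $columns

    def __init__(self, conn):
        self.conn = conn

    def insert(self, **values):
        cols = [c for c in self.columns if c in values]
        sql = "INSERT INTO $table ({}) VALUES ({})".format(
            ", ".join(cols), ", ".join("?" for _ in cols))
        # The database assigns the primary key; return it to the caller.
        return self.conn.execute(sql, [values[c] for c in cols]).lastrowid

    def select(self, **where):
        unknown = set(where) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        sql = "SELECT * FROM $table"
        if where:
            sql += " WHERE " + " AND ".join(f"{k} = ?" for k in where)
        return self.conn.execute(sql, list(where.values())).fetchall()
''')


def reverse_engineer(conn):
    """Read table and column names from the live schema."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
    return {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}


def generate_interfaces(conn):
    """Fill the template once per table and compile the generated classes."""
    namespace = {}
    for table, columns in reverse_engineer(conn).items():
        source = CLASS_TEMPLATE.substitute(
            class_name=table.title().replace("_", ""),
            table=table,
            columns=repr(columns))
        exec(source, namespace)
    return namespace


# Toy schema; the table and its columns are assumptions, not the GabiPD schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (id INTEGER PRIMARY KEY, name TEXT, species TEXT)")
api = generate_interfaces(conn)
genes = api["Gene"](conn)
new_id = genes.insert(name="FLOWERING LOCUS T", species="Arabidopsis thaliana")
rows = genes.select(id=new_id)
```

Because the classes are regenerated from the schema, adding a column to `gene` only requires rerunning `generate_interfaces`; code that uses the accessor does not need to be edited by hand, which is the benefit the paragraph above describes.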
GabiPD content and gene-centric views
Currently, GabiPD includes data originating from 14 different angiosperm species representing the most important lineages in the flowering plants (Figure 2). Arabidopsis thaliana is the most widely represented model species, followed by the crop plants Solanum tuberosum (potato) and Hordeum vulgare (barley). GabiPD integrates genomic, transcriptomic, proteomic and metabolomic data from these species. Genomic data comprise mapping information, sequences and SNP/InDel information. Transcriptomics is represented by a large number of ESTs and corresponding sequence trace files. ESTs are further characterized by BLAST and ORF analysis. For barley, in addition, EST clustering results and corresponding information on a new 27K unigene set are accessible and downloadable. As a type of proteomic data, annotated 2DE gel images from Arabidopsis thaliana and Brassica napus are integrated. Moreover, transcript and metabolite-profiling data are provided via MapManWeb.
Most entries of all GabiPD data types point to the central Gene GreenCard and vice versa (see Figure 3 and next section). In the Gene GreenCard, gene information from genome annotation projects or NCBI UniGene sets is integrated and useful links to secondary databases are provided. Currently, the genome annotation (TAIR version 7.0) for A. thaliana (3) is integrated. Annotations for other sequenced species will follow. To ease the transfer of knowledge from sequenced to non-sequenced species (crop plants), we have performed similarity-based mappings between closely related species, i.e. Arabidopsis and Brassica spp.
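A similarity-based mapping of this kind can be sketched as a best-hit selection over tabular BLAST output. The sketch below assumes the standard 12-column tabular format, an E-value cutoff chosen only for illustration, and made-up hit lines; the query identifiers are hypothetical, and the actual mapping criteria used for GabiPD are not described in this article.

```python
import csv
import io

# Column order of the standard 12-column tabular BLAST output.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-10):
    """Map each query to its highest-scoring subject below the E-value cutoff."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        evalue = float(row["evalue"])
        bitscore = float(row["bitscore"])
        if evalue > max_evalue:
            continue  # too weak to support a mapping
        current = best.get(row["qseqid"])
        if current is None or bitscore > current[1]:
            best[row["qseqid"]] = (row["sseqid"], bitscore)
    return {query: subject for query, (subject, _) in best.items()}


# Made-up hit lines for two hypothetical Brassica ESTs (not GabiPD data).
example = "\n".join([
    "brassica_est_1\tAT1G65480.1\t92.5\t400\t30\t0\t1\t400\t10\t409\t1e-150\t520",
    "brassica_est_1\tAT4G20370.1\t85.0\t380\t57\t0\t5\t384\t20\t399\t1e-90\t330",
    "brassica_est_2\tAT5G03840.1\t60.0\t50\t20\t0\t1\t50\t1\t50\t0.5\t30",
])
mapping = best_hits(io.StringIO(example))
```

Here `brassica_est_1` is assigned to the subject with the higher bit score, while `brassica_est_2` receives no mapping because its only hit fails the cutoff.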
Querying the database
We have developed the concept of GreenCards as a central entry point for text-based data queries and visualization, granting public as well as credentials-based access to the integrated data in GabiPD. GreenCards enable users to comprehensively query GabiPD by genotype name, marker or gene name, keyword or GenBank sequence accession number. Searches can be restricted to selected species or data types, while wildcards can be used to broaden the scope of the query. The result of a GreenCard search is presented as a list of hits with links to the complete descriptions, i.e. the GreenCards. Figure 3 shows an example of this type of search, in which the user entered the gene name FLOWERING LOCUS T as the search term. This search retrieves, among others, an Arabidopsis Gene GreenCard (gene: AT1G65480.1) corresponding to the genome annotation project, and Plant GreenCards representing T-DNA insertion lines (e.g. plant 290E08) whose flanking sequences have BLAST hits to the gene AT1G65480.1 and for which seeds are available from GABI-KAT (6). Moreover, the search retrieves several Clone GreenCards of cDNA clones (e.g. clone: MPMGp2011E01215) in which the keyword is found. A stricter relationship between the Gene GreenCard and the Plant and Clone GreenCards is established by similarity-based searches. The best BLAST hit of the sequence, which represents the relationship to, e.g., a cDNA clone or a mutant plant line, appears in the section Related with, and users can go from the clone or the plant line to the associated gene or vice versa.
Furthermore, the Gene GreenCard, which displays information from genome annotation projects and NCBI UniGene sets, has been extended to include links to secondary databases, such as ARAMEMNON (4), GABI-KAT (6) and ProMEX (8). Additionally, schematic representations of gene sequence features are provided to highlight protein domains identified using the latest PFAM library (9), as well as exon–exon borders and untranslated regions (UTRs) identified by the genome annotation projects (Figure 3). These features are displayed on a representation of the cDNA sequence.
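The query behaviour described here (wildcards to broaden a search, optional restriction by species or data type) can be sketched over a few in-memory records. This is a toy model, not the GabiPD query engine: the `GreenCard` fields, the `search` function and the third record are assumptions made for illustration, and shell-style wildcards from Python's `fnmatch` stand in for whatever wildcard syntax GabiPD accepts.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class GreenCard:
    card_type: str      # e.g. "Gene", "Plant", "Clone"
    identifier: str
    species: str
    keywords: tuple     # searchable names and cross-references


# Toy records; the third entry is entirely hypothetical.
CARDS = [
    GreenCard("Gene", "AT1G65480.1", "Arabidopsis thaliana",
              ("FLOWERING LOCUS T",)),
    GreenCard("Plant", "290E08", "Arabidopsis thaliana", ("AT1G65480.1",)),
    GreenCard("Clone", "EXAMPLE_CLONE_1", "Solanum tuberosum",
              ("flowering locus t-like",)),
]


def search(cards, term, species=None, card_types=None):
    """Case-insensitive wildcard search with optional restrictions."""
    pattern = term.lower()
    hits = []
    for card in cards:
        if species and card.species != species:
            continue
        if card_types and card.card_type not in card_types:
            continue
        texts = (card.identifier, *card.keywords)
        if any(fnmatchcase(text.lower(), pattern) for text in texts):
            hits.append(card)
    return hits


broad = search(CARDS, "flowering locus*")
restricted = search(CARDS, "flowering locus*", species="Arabidopsis thaliana")
plants = search(CARDS, "AT1G65480*", card_types={"Plant"})
```

In this toy data the unrestricted wildcard query returns the Gene and Clone records, the species restriction narrows it to the Gene record, and the type restriction returns only the Plant record that cross-references the gene.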
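To show how annotated gene structure translates into features on a cDNA representation, the following sketch converts exon coordinates into exon–exon border positions and UTR intervals on the spliced sequence. It assumes 1-based inclusive genomic coordinates, a plus-strand gene and invented example coordinates; the function names are hypothetical and this is not the code behind the Gene GreenCard display.

```python
def to_cdna(exons, genomic_pos):
    """Convert a genomic position inside an exon to a 1-based cDNA position."""
    offset = 0
    for start, end in exons:
        if start <= genomic_pos <= end:
            return offset + genomic_pos - start + 1
        offset += end - start + 1
    raise ValueError("position is not inside an exon")


def cdna_features(exons, cds_start, cds_end):
    """Locate exon-exon borders, UTRs and the coding region on the cDNA.

    exons: sorted (start, end) genomic intervals, 1-based inclusive,
           plus strand only (an assumption of this sketch).
    cds_start, cds_end: genomic positions of the first and last coding base.
    """
    lengths = [end - start + 1 for start, end in exons]
    total = sum(lengths)
    borders, running = [], 0
    for length in lengths[:-1]:
        running += length
        borders.append(running)  # last cDNA base before each exon-exon border
    first = to_cdna(exons, cds_start)
    last = to_cdna(exons, cds_end)
    return {
        "length": total,
        "exon_borders": borders,
        "five_prime_utr": (1, first - 1) if first > 1 else None,
        "cds": (first, last),
        "three_prime_utr": (last + 1, total) if last < total else None,
    }


# Invented three-exon gene model used only to exercise the functions.
features = cdna_features([(101, 200), (301, 350), (401, 500)], 151, 450)
```

For this invented model the cDNA is 250 bases long, the exon–exon borders fall after bases 100 and 150, and the coding region spans cDNA positions 51 to 200 with UTRs on either side.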
Alternatively, users can enter their own amino acid or nucleotide sequence to identify, by a BLAST search (10), similar sequences integrated in GabiPD.
In addition to the GreenCard and BLAST search functionality, users can browse and search the genetic maps and 2DE gels stored in GabiPD through specifically designed visualization tools: (i) 2DEGelViewer displays 2DE gel images interactively and allows retrieval of additional information on 2D spots identified by mass spectrometry (Figure 3); (ii) YAMB (Yet Another Map Browser; Figure 4) visualizes genetic mapping data, with the possibility to view details on all mapped elements (11); (iii) MapManWeb (Figure 5) allows the visualization and extraction of relevant information from transcript and metabolite-profiling data and the graphical mapping of such data onto diagrams of metabolic pathways and other biological processes (12); and (iv) an extended version of JTrev (13) displays sequence traces with integrated SNP information.
The GabiPD project page serves as an additional gateway to specific data by providing project-specific views, such as BreedCAM or PoMaMo (11), where potato genomic data and Solanaceae function maps for pathogen resistance are accessible.
ADDITIONAL TOOLS AVAILABLE FROM GABIPD
In addition to the data and data visualization available from our site, the newest versions of the following tools are made available for download:
MapMan desktop version: a user-driven software tool that displays large datasets (e.g. gene expression data from Arabidopsis Affymetrix arrays) onto diagrams of biological processes, such as metabolic pathways (12).
SATlotyper: a software tool designed for inferring haplotypes and phased genotypes from unphased SNP data for polyploid and polyallelic heterozygous populations (14).
FUTURE DIRECTIONS
The presentation of a wide spectrum of different plant species in GabiPD paves the way for cross-species comparisons that are facilitated by the availability of BLAST hits between the GabiPD sequences and plant NCBI UniGene sets. To ease the transfer of knowledge from sequenced to non-sequenced plant species, the genome annotations of Oryza sativa, Populus trichocarpa and Vitis vinifera will be added and mapped to closely related species in the near future. Furthermore, information about orthologous genes will be included for cross-species studies. Moreover, we will extend our WebServices to provide programmatic access to multiple data types for all plant species in GabiPD.
FUNDING
German Ministry for Education and Research (BMBF) (GABI I: 0312272, GABI II: 0313112 and GABI-FUTURE: 0315046); the former German Resource Center for Genome Research (RZPD) GmbH; Max Planck Society. Funding for open access charge: Max Planck Society.
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
The GabiPD team thanks the GABI and the WPG (Wirtschaftsverbund Pflanzengenomforschung GABI e.V.) community for providing data and supporting the continuous development of the database. Dr Björn Usadel is acknowledged for helpful discussions and his support in the development of MapManWeb. We wish to thank Dr Patrick Schweitzer, Dr Lothar Altschmied, Dr Uwe Scholz and Dr Nils Stein for the collaboration in the generation of the new 27K barley unigene set and Dr Kathryn F. Beal for sharing the source code of the trace viewer JTrev. Özgür Demir and Sebastian Köhler are acknowledged for further developments of user interfaces and for data integration.
REFERENCES
1. Ball CA, Brazma A. MGED standards: work in progress. Omics (2006) 10:138–144.
2. Jenkins H, Hardy N, Beckmann M, Draper J, Smith AR, Taylor J, Fiehn O, Goodacre R, Bino RJ, Hall R, et al. A proposed framework for the description of plant metabolomics experiments and their results. Nat. Biotechnol. (2004) 22:1601–1606.
3. Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al. The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. (2008) 36:D1009–D1014.
4. Schwacke R, Schneider A, van der Graaff E, Fischer K, Catoni E, Desimone M, Frommer WB, Flugge UI, Kunze R. ARAMEMNON, a novel database for Arabidopsis integral membrane proteins. Plant Physiol. (2003) 131:16–26.
5. Riaño-Pachón DM, Ruzicic S, Dreyer I, Mueller-Roeber B. PlnTFDB: an integrative plant transcription factor database. BMC Bioinformatics (2007) 8:42.
6. Li Y, Rosso MG, Viehoever P, Weisshaar B. GABI-Kat SimpleSearch: an Arabidopsis thaliana T-DNA mutant database with detailed information for confirmed insertions. Nucleic Acids Res. (2007) 35:D874–D878.
7. Heazlewood JL, Durek P, Hummel J, Selbig J, Weckwerth W, Walther D, Schulze WX. PhosPhAt: a database of phosphorylation sites in Arabidopsis thaliana and a plant-specific phosphorylation site predictor. Nucleic Acids Res. (2008) 36:D1015–D1021.
8. Hummel J, Niemann M, Wienkoop S, Schulze W, Steinhauser D, Selbig J, Walther D, Weckwerth W. ProMEX: a mass spectral reference database for proteins and protein phosphorylation sites. BMC Bioinformatics (2007) 8:216.
9. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, et al. The Pfam protein families database. Nucleic Acids Res. (2008) 36:D281–D288.
10. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. (1990) 215:403–410.
11. Meyer S, Nagel A, Gebhardt C. PoMaMo-a comprehensive database for potato genome data. Nucleic Acids Res. (2005) 33:D666–D670.
12. Usadel B, Nagel A, Thimm O, Redestig H, Blaesing OE, Palacios-Rojas N, Selbig J, Hannemann J, Piques MC, Steinhauser D, et al. Extension of the visualization tool MapMan to allow statistical analysis of arrays, display of corresponding genes, and comparison with known responses. Plant Physiol. (2005) 138:1195–1204.
13. Bonfield JK, Beal KF, Betts MJ, Staden R. Trev: a DNA trace editor and viewer. Bioinformatics (2002) 18:194–195.
14. Neigenfind J, Gyetvai G, Basekow R, Diehl S, Achenbach U, Gebhardt C, Selbig J, Kersten B. Haplotype inference from unphased SNP data in heterozygous polyploids based on SAT. BMC Genomics (2008) 9:356.
15. Soltis PS, Soltis DE. The origin and diversification of angiosperms. Am. J. Bot. (2004) 91:1614–1626.
16. Angiosperm Phylogeny Group. An update of the Angiosperm Phylogeny Group classification for the orders and families of flowering plants: APG II. Bot. J. Linn. Soc. (2003) 141:399–436.
17. Knapp S. Tobacco to tomatoes: a phylogenetic perspective on fruit diversity in the Solanaceae. J. Exp. Bot. (2002) 53:2001–2022.
18. Stein N, Prasad M, Scholz U, Thiel T, Zhang H, Wolf M, Kota R, Varshney RK, Perovic D, Grosse I, et al. A 1,000-loci transcript map of the barley genome: new anchoring points for integrative grass genomics. Theor. Appl. Genet. (2007) 114:823–839.