Nucleic Acids Research Advance Access published online on November 14, 2007

Nucleic Acids Research, doi:10.1093/nar/gkm981

© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Database Issue

ToxoDB: an integrated Toxoplasma gondii database resource

Bindu Gajria1, Amit Bahl1, John Brestelli2, Jennifer Dommer2, Steve Fischer2, Xin Gao2, Mark Heiges3, John Iodice2, Jessica C. Kissinger3,4,*, Aaron J. Mackey1, Deborah F. Pinney2, David S. Roos1, Christian J. Stoeckert, Jr2, Haiming Wang3 and Brian P. Brunk2

1Department of Biology, 2Department of Genetics, Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA, 3Center for Tropical & Emerging Global Diseases and 4Department of Genetics, University of Georgia, Athens, GA, USA

*To whom correspondence should be addressed: Tel: +1 706 542 6562; Fax: +1 706 542 3582; Email: jkissing{at}uga.edu

Received September 17, 2007. Revised October 17, 2007. Accepted October 18, 2007.


    ABSTRACT
 
ToxoDB (http://ToxoDB.org) is a genome and functional genomic database for the protozoan parasite Toxoplasma gondii. It incorporates the sequence and annotation of the T. gondii ME49 strain, as well as genome sequences for the GT1, VEG and RH (Chr Ia, Chr Ib) strains. Sequence information is integrated with various other genomic-scale data, including community annotation, ESTs, gene expression and proteomics data. ToxoDB has matured significantly since its initial release. Here we outline the numerous updates with respect to the data and increased functionality available on the website.


    INTRODUCTION
 
Toxoplasma gondii is an intracellular apicomplexan parasite capable of infecting humans. Infection is typically asymptomatic in healthy individuals, but it may lead to congenital birth defects and, in immunosuppressed individuals, to encephalitis (1,2). ToxoDB, initially released in May 2001, has been substantially updated in both content and functionality since it was last described in January 2003 (3). ToxoDB provides access to the genome sequence and annotation of the T. gondii ME49 strain. It also incorporates the genomic sequence of multiple other strains. The parasite genome is ~63 Mb in size and consists of 14 chromosomes (4).

The initial ToxoDB release was not supported by a relational database and thus the site had restricted functionality and little capability to integrate diverse data types such as gene expression data and single nucleotide polymorphism data (SNPs) with genomic sequence. Since initial publication, ToxoDB has been completely rebuilt using a common architecture similar to another apicomplexan database project, PlasmoDB (5). Both sites, along with CryptoDB, are component sites of ApiDB, the Apicomplexan Bioinformatics Resource Center (6). Many of the new methods of data loading, querying and presentation that are mentioned here have been applied to all of the ApiDB sites to provide a common research platform and facilitate data access among this group of related organisms. ApiDB (http://apidb.org/) serves as an ‘umbrella’ site for cross-species comparisons. Researchers can mine for Toxoplasma genes at ApiDB directly or via their orthologous relationship(s) to genes in other apicomplexan species.


    CONTENT OF THE CURRENT RELEASE
 
Data
ToxoDB provides access to the genome sequence and annotation of T. gondii (ME49 strain) and the genomic sequence of the GT1, VEG and RH (Chr Ia and Chr Ib) strains. Annotation is also available for the apicoplast genome. The current database version (Release 4.2) also contains manual annotation (solicited in the initial genome annotation and entered by users as user comments), ESTs, TIGR Gene Indices clustered ESTs, SAGE tags, SNPs, cosmid and BAC ends, microarray and proteomics studies, all of which have been mapped to the genome (7,8). The database contains the results of automated analyses including gene predictions (using various algorithms), open reading frames (ORFs) greater than 50 aa and protein feature predictions [signal peptides, transmembrane domains, hydrophobicity plots, AA content and InterPro domains (9)], Gene Ontology function predictions, and BLAST similarities to the NCBI non-redundant protein database (Table 1).
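
The ORF length filter mentioned above can be illustrated with a minimal Python sketch. It is not the pipeline used to build ToxoDB. It assumes that an ORF runs from an ATG codon to the next in-frame stop codon and that length is counted in codons excluding the stop; the function names and the coordinate convention (0-based, relative to the scanned strand) are hypothetical choices made for this illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_aa=50):
    """Scan all six reading frames for ATG..stop ORFs longer than min_aa.

    Returns tuples (strand, frame, start, end, length_aa). Coordinates are
    0-based and relative to the strand being scanned (an assumption of this
    sketch); length_aa excludes the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    length_aa = (i - start) // 3
                    if length_aa > min_aa:
                        orfs.append((strand, frame, start, i + 3, length_aa))
                    start = None  # look for the next ORF in this frame
    return orfs
```

Calling find_orfs(sequence) with the default threshold keeps only ORFs longer than 50 codons; a different cutoff can be passed as min_aa.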


Table 1. Data and analyses that have been integrated into ToxoDB and the number of genes that are impacted

 
In addition, we have used the OrthoMCL algorithm to group genes from T. gondii with orthologous genes from 86 other eukaryotic and prokaryotic genomes (10). A mapping of immune epitopes identified in Toxoplasma provided by the Immune Epitope Database and Analysis Resource (IEDB) (11) has been integrated. Affymetrix probes mapped to the genome are visible in GBrowse, as are SNPs generated from nucmer alignments of sequences from the GT1, VEG and RH (Chr Ia and Ib) strains against the reference ME49 sequence. Two expression experiments utilizing a Toxoplasma Affymetrix array have also been deposited in ToxoDB. Users gain access to these new data types in record pages and by queries using the powerful query interface (see Data-Mining section).
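
As a rough illustration of how SNPs can be read off an alignment of a strain sequence against the reference, the sketch below compares two sequences that have already been aligned and padded with gap characters. It does not reproduce nucmer or its output format; the function name and the input representation are assumptions made for this example.

```python
def call_snps(ref_aln, query_aln):
    """Report single-base mismatches between two pre-aligned sequences.

    Both inputs must have the same length and use '-' for gaps. Returns
    tuples (reference_position, reference_base, query_base), with positions
    1-based and counted on the ungapped reference. Gap columns (indels)
    are skipped because they are not single nucleotide polymorphisms.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    snps = []
    ref_pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            ref_pos += 1  # advance only on real reference bases
        if r == "-" or q == "-":
            continue
        if r != q and r in "ACGT" and q in "ACGT":
            snps.append((ref_pos, r, q))
    return snps
```

In this representation the reference row would correspond to ME49 and the query row to one of the other strains, with one call per aligned region.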

Database architecture
As part of the complete restructuring of the ToxoDB resource, the use of flat files for data storage was abandoned in early 2006. We now use GUS 3.5 and load data systematically into an underlying Oracle database. GUS (Genomics Unified Schema) is an open source project (www.gusdb.org) with a rich relational schema that covers sequence annotation, expression data and proteomics and uses controlled vocabularies and ontologies (12).
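
The benefit of relational storage for integrating data types can be sketched with a toy example. The code below uses Python's built-in sqlite3 module as a stand-in for Oracle, and the table names, column names, gene identifiers and coordinates are all hypothetical; none of them are taken from the GUS schema or from ToxoDB data.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with toy gene and SNP tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (
            gene_id    TEXT PRIMARY KEY,
            chromosome TEXT,
            start_pos  INTEGER,
            end_pos    INTEGER
        );
        CREATE TABLE snp (
            snp_id     INTEGER PRIMARY KEY,
            chromosome TEXT,
            position   INTEGER,
            strain     TEXT
        );
        """
    )
    # Hypothetical records, invented for illustration only.
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?)",
        [("GENE_A", "Ia", 100, 500),
         ("GENE_B", "Ia", 800, 1200),
         ("GENE_C", "Ib", 100, 400)],
    )
    conn.executemany(
        "INSERT INTO snp (chromosome, position, strain) VALUES (?, ?, ?)",
        [("Ia", 150, "GT1"), ("Ia", 450, "VEG"),
         ("Ia", 900, "GT1"), ("Ib", 700, "RH")],
    )
    return conn


def snps_per_gene(conn):
    """Count SNPs that fall inside each gene, including genes with none."""
    return conn.execute(
        """
        SELECT g.gene_id, COUNT(s.snp_id)
        FROM gene AS g
        LEFT JOIN snp AS s
          ON s.chromosome = g.chromosome
         AND s.position BETWEEN g.start_pos AND g.end_pos
        GROUP BY g.gene_id
        ORDER BY g.gene_id
        """
    ).fetchall()
```

A single join answers a question that would need custom parsing code for every pair of flat files, which is the kind of integration a relational schema makes routine.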

ToxoDB also employs the GUS WDK (Web Development Kit, www.gusdb.org/wdk) to access the database from the internet, which dramatically improves the way the website operates. This transformation has considerably increased the functionality available to database users and conforms to the model used by all ApiDB projects, making it possible for us to generate future database releases in short cycles.


    DATA-MINING TOOLS
 
ToxoDB currently provides 40 different queries of the data and several ancillary tools for analyzing, retrieving or viewing the data, such as BLAST, Pathway Tools and an installation of the GMOD project Genome Browser (13). The ToxoDB ‘Query & Tools’ page has been restructured to make all queries available at a glance. Most of the individual queries have been reorganized into categories such as ‘Position’, ‘Expression’ and ‘Function’ to make them more intuitive to the average researcher. The queries themselves have also gained functionality. For example, the ToxoDB keyword search has been significantly improved and now offers the user control over which fields in the database are searched, including the official annotation, synonyms, user-supplied comments, domain names, BLAST similarities, etc. Many queries, such as ‘Find SNPs based on Gene ID’, now accept a gene ID list as input (typed or pasted by hand, or uploaded from a file), facilitating analyses on large groups of genes. The results from all queries can be sorted on any column of the returned data set, and users can add further columns for display (e.g. protein features, GO annotation or expression characteristics for gene results) and sort on them as well. Once the appropriate selection of data types to display has been achieved, users can integrate these search results with other search results using the ‘Query History’ page, or the data can be downloaded in multiple formats for further analysis by the researcher (Figure 1).
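
Two of the features described above, accepting a list of gene IDs and sorting a result set on a chosen column, can be mimicked offline with a short Python sketch. The separators accepted by parse_gene_ids and the dictionary-per-row representation are assumptions made for this example, not a description of how the ToxoDB site handles input.

```python
import re


def parse_gene_ids(text):
    """Split pasted or uploaded text into unique gene IDs, keeping order.

    Commas, semicolons and any whitespace (including newlines) are treated
    as separators; this choice of separators is an assumption.
    """
    seen = set()
    ids = []
    for token in re.split(r"[,;\s]+", text.strip()):
        if token and token not in seen:
            seen.add(token)
            ids.append(token)
    return ids


def sort_results(rows, column, descending=False):
    """Return the result rows (dictionaries) sorted on one column."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)
```

For a list stored in a file, the file contents can be read into a string and passed to parse_gene_ids in the same way as typed input.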


Figure 1. Screenshots showing the flow of a query in ToxoDB. From the Query & Tools page, users can go to particular queries for expression evidence (EST or Mass Spec Evidence), to the Results page where they can sort, manage (add or delete) columns of data and open gene pages. The Query History page permits users to manipulate previous queries including combining them and/or downloading the resulting data. Individual genes are listed on the Gene Results page and each gene has its own gene page, illustrated here by the gene encoding elongation factor 1-alpha. The gene page summarizes all information that is available for a gene including gene model predictions, SNPs, BLAST similarities, protein domains, ESTs, proteomic evidence of expression and microarray expression analyses.

 
ToxoDB uses the GBrowse genome browser (www.gmod.org) (13) to display gene models, EST alignments, SNPs, SAGE tags, etc. GBrowse enables visualization of the parasite genome and gene models, custom restriction-site identification, open reading frame identification, and facilitates download of data in various formats. Different data sets or analyses are displayed as individual tracks within the genome browser. There are approximately 50 GBrowse tracks available in the current version of ToxoDB. All genome sequences [ME49, GT1, VEG and RH (Chr Ia, Ib)] are also available in BLAST-searchable databases and for download in FASTA, GenBank and EMBL formats.
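
Restriction-site identification of the kind offered in the browser amounts to locating a recognition sequence in the genomic sequence. The following is a minimal sketch of that idea, not GBrowse code; the function name is hypothetical, and only exact matches on the given strand are reported.

```python
def find_sites(sequence, site):
    """Return 1-based start positions of every exact occurrence of site.

    Overlapping occurrences are included. Only the strand supplied is
    searched, which is sufficient for palindromic recognition sequences
    such as GAATTC (EcoRI).
    """
    if not site:
        raise ValueError("recognition site must not be empty")
    sequence = sequence.upper()
    site = site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + 1)  # convert to 1-based coordinates
        start = sequence.find(site, start + 1)
    return positions
```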

ToxoDB users may now register and log in to the site. Doing so enables a researcher to add comments to genes and genomic sequences. It also lets users save query results permanently. Queries in the Query History page can be organized (renamed or deleted) as well as combined with other results (Figure 1). Combining results in this way lets users refine a search step by step until it returns a precise set of genes.
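
Combining saved results is, in essence, set arithmetic on lists of gene IDs. The sketch below shows three plausible operations; the operation names are assumptions chosen for this illustration, and the text above does not specify which combinations the Query History page supports.

```python
def combine_results(first, second, operation):
    """Combine two lists of gene IDs and return a sorted list.

    operation is one of 'intersect', 'union' or 'minus' (IDs in first
    but not in second). These names are hypothetical.
    """
    operations = {
        "intersect": set.intersection,
        "union": set.union,
        "minus": set.difference,
    }
    if operation not in operations:
        raise ValueError(f"unknown operation: {operation}")
    return sorted(operations[operation](set(first), set(second)))
```

For example, intersecting a list of genes with EST evidence and a list of genes with mass spectrometry evidence would leave only the genes supported by both.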

The results may be downloaded using ToxoDB's improved reporting facility. It supports summary reports (Excel-compatible tab-delimited text), GFF, FASTA and a detailed report that includes almost all available data for each gene in the user's result table. Use of this facility, as well as many others on the site, is now described in short video tutorials that are accessible from the database home page.
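
Two of the report formats named above are simple enough to sketch in a few lines of Python. The code below writes tab-delimited text and FASTA from in-memory records; the column names, the 60-character line width and the function names are assumptions made for this example and do not describe ToxoDB's reporting code.

```python
import csv
import io
import textwrap


def to_tab_delimited(rows, columns):
    """Render result rows (dictionaries) as tab-delimited text with a header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # Missing values are written as empty cells.
        writer.writerow([row.get(column, "") for column in columns])
    return buffer.getvalue()


def to_fasta(records, width=60):
    """Render (identifier, sequence) pairs as FASTA, wrapping sequence lines."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        lines.extend(textwrap.wrap(sequence, width))
    return "\n".join(lines) + "\n"
```

Tab-delimited output of this kind opens directly in a spreadsheet program, which is what makes it suitable for summary reports.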


    FUTURE DIRECTIONS
 
The last two years were spent on major infrastructure and design elements for ToxoDB. Our future growth will be in data acquisition and in integration with existing and future data sets. Specifically, we are planning to load and integrate many expression data sets (RNA expression and protein expression) that are just becoming available. We also expect to load and integrate other array-based data sets such as ChIP on Chip and array CGH. As new data are added, we will add queries and tools to view these data. A significant area of future development will be improving the ability of users to compare the sequenced parasite strains visually and to download sequence alignments between them.


    ACKNOWLEDGEMENTS
 
The authors would like to acknowledge the significant contributions of data to the database by members of the Toxoplasma research community and the genome sequencing centers. Without their generous contributions, often of pre-publication data, this integrated database resource would not be possible. We also thank the numerous staff and students, both past and present, of the ApiDB-BRC and our research laboratories whose contributions have facilitated the creation and maintenance of this database resource. This project has been funded in whole or in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN266200400037C. Funding to pay the Open Access publication charges for this article was provided by this contract.

Conflict of interest statement. None declared.


    Footnotes
 
Present address: Aaron J. Mackey, GlaxoSmithKline, Collegeville, PA, USA


    REFERENCES
 

  1. Remington JS, Desmonts G. Toxoplasmosis. In: Infectious Diseases of the Fetus and Newborn Infant. Remington JS, Klein JO, eds. (1989) Philadelphia, PA: W. B. Saunders. 89–195.

  2. Luft BJ, Remington JS. Toxoplasmic encephalitis in AIDS patients. Clin. Infect. Dis. (1992) 15:211–222.

  3. Kissinger JC, Gajria B, Li L, Paulsen IT, Roos DS. ToxoDB: accessing the Toxoplasma gondii genome. Nucleic Acids Res. (2003) 31:234–236.

  4. Khan A, Taylor S, Su C, Mackey AJ, Boyle J, Cole R, Glover D, Tang K, Paulsen IT, et al. Composite genome map and recombination parameters derived from three archetypal lineages of Toxoplasma gondii. Nucleic Acids Res. (2005) 33:2980.

  5. Bahl A, Brunk B, Crabtree J, Fraunholz MJ, Gajria B, Grant GR, Ginsburg H, Gupta D, Kissinger JC, et al. PlasmoDB. The Plasmodium genome resource: a database integrating experimental and computational data. Nucleic Acids Res. (2003) 31:212–215.

  6. Aurrecoechea C, Heiges M, Wang H, Wang Z, Fischer S, Rhodes P, Miller J, Kraemer E, Stoeckert CJ, et al. ApiDB: integrated resources for the apicomplexan bioinformatics resource center. Nucleic Acids Res. (2007) 35:D427–D430.

  7. Hu K, Johnson J, Florens L, Fraunholz M, Suravajjala S, DiLullo C, Yates J, Roos DS, Murray JM. Cytoskeletal components of an invasion machine – the apical complex of Toxoplasma gondii. PLoS Pathog. (2006) 2:e13.

  8. Bradley PJ, Ward C, Cheng SJ, Alexander DL, Coller S, Coombs GH, Dunn JD, Ferguson DJ, Sanderson SJ, et al. Proteomic analysis of rhoptry organelles reveals many novel constituents for host-parasite interactions in Toxoplasma gondii. J. Biol. Chem. (2005) 280:34245–34258.

  9. Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, Bork P, Buillard V, Cerutti L, et al. New developments in the InterPro database. Nucleic Acids Res. (2007) 35:D224–D228.

  10. Chen F, Mackey AJ, Stoeckert CJ Jr, Roos DS. OrthoMCL-DB: querying a comprehensive multi-species collection of ortholog groups. Nucleic Acids Res. (2006) 34:D363–D368.

  11. Peters B, Sidney J, Bourne P, Bui HH, Buus S, Doh G, Fleri W, Kronenberg M, Kubo R, et al. The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol. (2005) 3:e91.

  12. Davidson S, Crabtree J, Brunk BP, Schug J, Tannen V, Overton GC, Stoeckert CJ Jr. K2/Kleisli and GUS: experiments in integrated access to genomic data sources. IBM Syst. J. (2001) 40:512–531.

  13. Stein LD, Mungall C, Shu S, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, et al. The generic genome browser: a building block for a model organism system database. Genome Res. (2002) 12:1599–1610.

  14. Radke JR, Behnke MS, Mackey AJ, Radke JB, Roos DS, White MW. The transcriptome of Toxoplasma gondii. BMC Biol. (2005) 3:26.



