
Nucleic Acids Research 2006 34(Database Issue):D419-D422; doi:10.1093/nar/gkj078

Nucleic Acids Research, 2006, Vol. 34, Database issue D419-D422
© The Author 2006. Published by Oxford University Press. All rights reserved
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org


Article

CryptoDB: a Cryptosporidium bioinformatics resource update

Mark Heiges, Haiming Wang, Edward Robinson, Cristina Aurrecoechea, Xin Gao1,2, Nivedita Kaluskar1, Philippa Rhodes1, Sammy Wang1, Cong-Zhou He1, Yanqi Su1, John Miller1, Eileen Kraemer1 and Jessica C. Kissinger2,*

Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA, USA; 1Department of Computer Science, University of Georgia, Athens, GA, USA; 2Department of Genetics, University of Georgia, Athens, GA, USA

*To whom correspondence should be addressed. Tel: +1 706 542 6562; Fax: +1 706 542 3910; Email: jkissing@uga.edu

Received September 15, 2005. Accepted October 11, 2005.


    ABSTRACT
 
CryptoDB (http://CryptoDB.org) is a community bioinformatics resource for the AIDS-related apicomplexan parasite Cryptosporidium. CryptoDB integrates whole genome sequence and annotation with expressed sequence tag and genome survey sequence data and provides supplemental bioinformatics analyses and data-mining tools. A simple yet comprehensive web interface is available for mining and visualizing the data. CryptoDB is allied with the databases PlasmoDB and ToxoDB via ApiDB, an NIH/NIAID-funded Bioinformatics Resource Center. Recent updates to CryptoDB include the deposition of annotated genome sequences for Cryptosporidium parvum and Cryptosporidium hominis, migration to a relational database (GUS), a new query and visualization interface and the introduction of Web services.


    INTRODUCTION
 
The apicomplexan parasite Cryptosporidium is a global causative agent of severe and chronic diarrheal disease in humans and other animals. As no reliable chemo- or immunotherapy is currently available, infections can be life threatening for people with a compromised immune system, such as AIDS patients. The pathogen is typically spread via contaminated drinking water and is resistant to water chlorination and filtration (1). Because of this threat to water safety and public health, Cryptosporidium is ranked as a Category B Biodefense Pathogen by the National Institutes of Health. Bioinformatics analysis plays an important role in understanding the biology of this medically important parasite and in identifying potential drug targets. To aid the research community in this line of inquiry, the online database CryptoDB continues to update and expand its role in warehousing and providing access to Cryptosporidium genome sequence, annotation, sequence analysis and other Cryptosporidium-related information.


    UPDATED DATASET
 
Version 3.0 of CryptoDB was released in April 2005 and contains the published genome sequence and annotation for Cryptosporidium hominis, strain TU502 (2) and Cryptosporidium parvum, strain IOWA (3). The database houses copies of assembled genome contigs and gene annotations deposited in GenBank (4) by the sequence generators. The C.parvum genome sequence is represented by 18 contigs ranging from 17 kb to 1.2 Mb in length and is annotated with 3885 genes in total. The C.hominis genome sequence is represented by 1422 contigs ranging from a few hundred base pairs to 90 kb in length and is annotated with 3956 genes in total. C.parvum chromosome 6 has been independently sequenced and annotated (5) and is also represented in the database.

In addition to the data provided by genome sequencing efforts, ~6 Mb of genome survey sequence (GSS) and expressed sequence tag (EST) (6) data are incorporated. The ESTs are clustered into RNA transcripts and aligned to the genome using the methodology applied at ApiEST-DB (7).

Gene annotations provided by the genome sequencing centers are augmented with supplemental analyses. Pre-computed BLASTX analyses of Cryptosporidium contigs versus the GenBank non-redundant protein database and EST alignments to contig sequences offer supporting evidence for gene predictions. Potential syntenic relationships of the C.hominis and C.parvum contig sequences are calculated and graphically displayed. Protein feature predictions of signal peptides and transmembrane domains are provided. Open reading frames >50 and >100 amino acids in length have been calculated for all nucleic acid sequences in all six reading frames.
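The six-frame open reading frame calculation described above can be illustrated with a short Python sketch. It is not the CryptoDB pipeline: it assumes that an ORF is any stop-to-stop stretch of codons (the article does not say whether a start codon is required), it uses the standard genetic code, and the function names are hypothetical.

```python
from typing import Iterator, Tuple

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def find_orfs(seq: str, min_aa: int = 50) -> Iterator[Tuple[str, int, str]]:
    """Yield (strand, frame_offset, peptide) for stop-to-stop ORFs longer
    than min_aa residues in all six reading frames (assumed ORF definition).
    frame_offset is 0, 1 or 2, counted from the 5' end of the given strand."""
    seq = seq.upper()
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            for peptide in translate(strand_seq[frame:]).split("*"):
                if len(peptide) > min_aa:
                    yield strand, frame, peptide


if __name__ == "__main__":
    toy = "ATG" + "GCT" * 60 + "TAA"  # made-up sequence, not from CryptoDB
    for strand, frame, peptide in find_orfs(toy, min_aa=50):
        print(strand, frame, len(peptide))
```

Calling `find_orfs` once with `min_aa=50` and once with `min_aa=100` would give the two length classes mentioned in the text.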

All sequence datasets are available for bulk download in FASTA format. Programmatic access to selected resources is provided via Web service interfaces.
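A downloaded FASTA file can be summarized with a few lines of Python. The sketch below is illustrative only: the record identifiers and sequences are invented, and it assumes that the first whitespace-delimited token of each header line is the record identifier.

```python
import io
from typing import Dict, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Read FASTA records into a dict mapping identifier -> sequence."""
    chunks: Dict[str, List[str]] = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            # Assumption: the identifier is the first token of the header.
            name = header.split()[0] if header else ""
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def length_range(records: Dict[str, str]) -> Tuple[int, int]:
    """Return the (shortest, longest) sequence length in the record set."""
    lengths = [len(seq) for seq in records.values()]
    return min(lengths), max(lengths)


if __name__ == "__main__":
    # Invented example data standing in for a downloaded contig file.
    example = io.StringIO(">contig_A example\nACGTAC\nGT\n>contig_B\nGGCC\n")
    contigs = read_fasta(example)
    print(len(contigs), length_range(contigs))
```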


    IMPROVED DATABASE AND WEB INTERFACE
 
CryptoDB 3.0 is backed by a relational database utilizing the Genomics Unified Schema (GUS, GUSdb.org) (8) and Oracle 10g. Migration to a relational database architecture marks a major improvement over previous releases because of the new services and resources that can now be offered.

The CryptoDB web interface provides a set of forms through which users can easily query the annotation and pre-computed analysis data (Figure 1A). Queries for contig sequence, gene and protein features are possible and can be restricted to either or both of the hosted species genomes. At the gene level, users can conduct text searches of gene product descriptions, search for genes by RNA type (mRNA, rRNA, snoRNA and tRNA) and find genes having alignments to C.parvum ESTs. For protein features, users can select genes predicted by SignalP (9) to encode a signal peptide or predicted by TMHMM (10) to contain transmembrane domains. Users may also retrieve a specific gene by locus tag or a contig sequence by accession number. Ad hoc data selections not obtainable via the provided queries may be requested by email to help@cryptodb.org.



Figure 1 Database functionality. (A) Searches are initiated via queries provided on the web site's front page. (B) The results are returned as a summary table with links to detailed record pages. (C) Detailed record page with summary of all available data/information for this gene or contig sequence. (D) GBrowse of genomic region of interest provides a graphical view of annotations and similarity analyses. (E) Search results, such as BLAST reports, are linked to detailed records and to a sequence retrieval utility.

 
Gene pages and contig sequence pages provide a detailed view of annotation and analysis for a given record in the database. Gene pages contain a text overview of the gene, including its coordinate position on the contig and a product description when available (Figure 1C). GBrowse (11) is used to provide a graphic display of annotated gene features and of data mapped to the genome, such as BLAST hits and ESTs (Figure 1D).

The web interface includes a mechanism to allow users to readily download the sequences and other attributes associated with their query result set. A query history permits users to track their searches and combine them into more complex queries across data types (e.g. ‘list all genes on chromosome 3 that contain transmembrane domains’).
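Combining earlier searches from a query history amounts to set operations on their result lists. The Python sketch below illustrates the idea under the assumption that each stored result is a set of gene identifiers. The identifiers, query names and the `combine` helper are hypothetical and do not reflect how CryptoDB implements its history.

```python
from typing import Dict, Set

# Hypothetical query history: query name -> set of gene identifiers returned.
history: Dict[str, Set[str]] = {
    "genes_on_chromosome_3": {"gene_001", "gene_002", "gene_003"},
    "genes_with_tm_domains": {"gene_002", "gene_003", "gene_104"},
}


def combine(left: Set[str], right: Set[str], operation: str) -> Set[str]:
    """Combine two stored result sets with a boolean operation."""
    if operation == "intersect":
        return left & right
    if operation == "union":
        return left | right
    if operation == "minus":
        return left - right
    raise ValueError(f"unknown operation: {operation}")


if __name__ == "__main__":
    # 'List all genes on chromosome 3 that contain transmembrane domains'
    result = combine(
        history["genes_on_chromosome_3"],
        history["genes_with_tm_domains"],
        "intersect",
    )
    print(sorted(result))
```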


    ANALYSIS AND RESEARCH TOOLS
 
Several tools for data mining augment the published annotations and pre-computed analyses. Users may BLAST their own sequences against the genomic contig, annotated protein, GSS and EST sequence databases (Figure 1E). A motif search tool finds protein sequences that match PROSITE (12) or user-defined amino acid patterns. Keywords from the BLASTX results of Cryptosporidium genomic sequences versus the GenBank non-redundant protein database (NRDB) are indexed and searchable. In each case, the results contain links back to detailed gene, protein or contig record pages or to external databases (e.g. GenBank) as appropriate (Figure 1B).
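A motif search of this kind can be sketched by converting a PROSITE-style pattern into a regular expression. The Python example below is a simplified illustration, not the CryptoDB tool: it handles the common pattern elements (`x`, `[...]`, `{...}`, repeat counts and the `<` and `>` anchors at the pattern ends) and ignores rarer cases such as anchors placed inside brackets. The function names and the test peptide are invented; the pattern shown is the PROSITE N-glycosylation site pattern.

```python
import re
from typing import List, Tuple


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # {ABC} means 'any residue except A, B or C'.
        element = element.replace("{", "[^").replace("}", "]")
        # (n) or (n,m) are repeat counts; regex uses braces for these.
        element = element.replace("(", "{").replace(")", "}")
        # Lower-case x matches any residue.
        element = element.replace("x", ".")
        # < and > anchor the pattern to the N- or C-terminus.
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_motif(pattern: str, protein: str) -> List[Tuple[int, str]]:
    """Return (0-based start, matched text) for non-overlapping matches."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(protein)]


if __name__ == "__main__":
    n_glycosylation = "N-{P}-[ST]-{P}."
    print(prosite_to_regex(n_glycosylation))
    print(find_motif(n_glycosylation, "MNGSANLTQ"))  # invented peptide
```

Because `re.finditer` reports non-overlapping matches only, overlapping motif occurrences would need a lookahead-based search instead.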

To facilitate tracking of the latest literature, PubCrawler (13) is used to poll NCBI's PubMed and GenBank each weekday for new Cryptosporidium-related updates.


    AFFILIATIONS
 
CryptoDB is a member of ApiDB.org, an NIH/NIAID-funded Bioinformatics Resource Center (BRC) for Biodefense and Emerging or Re-Emerging Infectious Diseases (www.niaid.nih.gov/dmid/genomes/brc/default.htm). Other ApiDB members include the genome databases for Plasmodium (PlasmoDB) (14,15) and Toxoplasma (ToxoDB) (16) and the apicomplexan EST database, ApiEST-DB (7). CryptoDB and the other member databases are linking to ApiDB in a coordinated effort to promote comparative studies and ease of access across these apicomplexan genomes.


    WEB SERVICES AND DATABASE NEWS
 
To facilitate database integration with ApiDB and other NIAID BRCs, and to give others programmatic access to CryptoDB, web services for CryptoDB have been implemented. Web services are pieces of software that communicate across the Internet to build distributed applications. They can do so regardless of the software used to implement them, as long as they share a common protocol such as SOAP (17). CryptoDB uses SOAP and provides published WSDL files and sample client software in Java (using Axis) and Perl (using SOAP::Lite). Currently, one service, which retrieves FASTA sequence files, is active. Additional services and infrastructure (18) are planned.
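To give a feel for what a SOAP client sends, the Python sketch below builds a SOAP 1.1 request envelope with the standard library. Only the SOAP envelope namespace is real; the operation name, service namespace, parameter name and accession value are all hypothetical placeholders, since the actual names are defined by the WSDL files published by CryptoDB. The sketch builds the message only and does not contact any server.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# SOAP 1.1 envelope namespace (part of the SOAP specification).
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def build_soap_request(operation: str, service_ns: str,
                       params: Dict[str, str]) -> str:
    """Return a SOAP 1.1 envelope wrapping a single operation call."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{service_ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = value
    return ET.tostring(envelope, encoding="unicode")


if __name__ == "__main__":
    # All names below are hypothetical; consult the service WSDL for real ones.
    message = build_soap_request(
        operation="getFastaSequence",
        service_ns="urn:example:sequence-service",
        params={"accession": "EXAMPLE_ACCESSION"},
    )
    print(message)
```

In practice a SOAP toolkit such as the Axis and SOAP::Lite clients mentioned above generates these envelopes from the WSDL, so hand-built messages are mainly useful for understanding or debugging the exchange.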

To facilitate the dissemination of news and updates related to CryptoDB, we have established a Really Simple Syndication (RSS) news feed that is displayed on the home pages of CryptoDB and ApiDB and can be read by any RSS news aggregator.


    FUTURE PLANS
 
CryptoDB is fully funded and staffed with biologists and software developers who have close ties to the developers of GUS, ToxoDB and PlasmoDB. This collaboration will support frequent database updates and expansions with new data types, analyses, data-mining tools and visual displays. Gene ontology terms and protein feature signatures from InterProScan (19) analyses will be included with gene records. Improvements to the visualization of genome-wide synteny are planned. SRI International's Pathway Tools software (20) is being added to facilitate analyses of metabolic pathways in both annotated Cryptosporidium genome sequences. Future releases of CryptoDB will publish this information as ‘CryptoCyc’ for querying and visualization in a graphical display.

To facilitate data sharing, CryptoDB has the capacity to activate a Distributed Annotation Server (DAS) (21) via a DAS-GUS adapter if needed.

CryptoDB will continue to work with the ApiDB consortium to further integrate its resources with other apicomplexan genome sites. Database federating technologies, web services and portal designs are currently being implemented toward this end. Data exchange and interoperability with other NIAID Bioinformatics Resource Centers will be a continued effort.


    ACKNOWLEDGEMENTS
 
CryptoDB is one component of the ApiDB project co-administered by David Roos, Christian Stoeckert and Jessica Kissinger. The architecture and infrastructure development applied to CryptoDB has greatly benefited from collaborative ApiDB efforts directed toward database integration, GUS schema development and the new web development kit. The authors thank our database development collaborators: Steve Fischer, Bindu Gajria, Thomas Gan, John Iodice, Aaron Mackey, Jules Milgram, Debbie Pinney and Michael Saffitz. This project has been funded in whole or in part with Federal funds from the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN266200400037C. Funding to pay the Open Access publication charges for this article was provided by the National Institute of Allergy and Infectious Diseases, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN266200400037C.

Conflict of interest statement. None declared.


    REFERENCES
 

  1. Fayer, R. (1997) Cryptosporidium and Cryptosporidiosis. CRC Press, Inc., Boca Raton, FL.

  2. Xu, P., Widmer, G., Wang, Y., Ozaki, L.S., Alves, J.M., Serrano, M.G., Puiu, D., Manque, P., Akiyoshi, D., Mackey, A.J., et al. (2004) The genome of Cryptosporidium hominis. Nature, 431, 1107–1112.

  3. Abrahamsen, M.S., Templeton, T.J., Enomoto, S., Abrahante, J.E., Zhu, G., Lancto, C.A., Deng, M., Liu, C., Widmer, G., Tzipori, S., et al. (2004) Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science, 304, 441–445.

  4. Xu, P., Widmer, G., Wang, Y., Ozaki, L.S., Alves, J.M., Serrano, M.G., Puiu, D., Manque, P., Akiyoshi, D., Mackey, A.J., et al. (2004) Corrigendum: the genome of Cryptosporidium hominis. Nature, 432, 415.

  5. Bankier, A.T., Spriggs, H.F., Fartmann, B., Konfortov, B.A., Madera, M., Vogel, C., Teichmann, S.A., Ivens, A., Dear, P.H. (2003) Integrated mapping, chromosomal sequencing and sequence analysis of Cryptosporidium parvum. Genome Res., 13, 1787–1799.

  6. Strong, W.B. and Nelson, R.G. (2000) Preliminary profile of the Cryptosporidium parvum genome: an expressed sequence tag and genome survey sequence analysis. Mol. Biochem. Parasitol., 107, 1–32.

  7. Li, L., Crabtree, J., Fischer, S., Pinney, D., Stoeckert, C.J., Jr, Sibley, L.D., Roos, D.S. (2004) ApiEST-DB: analyzing clustered EST data of the apicomplexan parasites. Nucleic Acids Res., 32, D326–D328.

  8. Davidson, S.B., Crabtree, J., Brunk, B., Schug, J., Tannen, V., Overton, G.C., Stoeckert, J.C.J. (2001) K2/Kleisli and GUS: experiments in integrated access to genomic data sources. IBM Systems Journal, 40, 512–531.

  9. Bendtsen, J.D., Nielsen, H., von Heijne, G., Brunak, S. (2004) Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol., 340, 783–795.

  10. Krogh, A., Larsson, B., von Heijne, G., Sonnhammer, E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol., 305, 567–580.

  11. Stein, L.D., Mungall, C., Shu, S., Caudy, M., Mangone, M., Day, A., Nickerson, E., Stajich, J.E., Harris, T.W., Arva, A., et al. (2002) The generic genome browser: a building block for a model organism system database. Genome Res., 12, 1599–1610.

  12. Sigrist, C.J., Cerutti, L., Hulo, N., Gattiker, A., Falquet, L., Pagni, M., Bairoch, A., Bucher, P. (2002) PROSITE: a documented database using patterns and profiles as motif descriptors. Brief. Bioinform., 3, 265–274.

  13. Hokamp, K. and Wolfe, K.H. (2004) PubCrawler: keeping up comfortably with PubMed and GenBank. Nucleic Acids Res., 32, W16–W19.

  14. Bahl, A., Brunk, B., Crabtree, J., Fraunholz, M.J., Gajria, B., Grant, G.R., Ginsburg, H., Gupta, D., Kissinger, J.C., Labo, P., et al. (2003) PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res., 31, 212–215.

  15. Kissinger, J.C., Brunk, B.P., Crabtree, J., Fraunholz, M.J., Gajria, B., Milgram, A.J., Pearson, D.S., Schug, J., Bahl, A., Diskin, S.J., et al. (2002) The Plasmodium genome database. Nature, 419, 490–492.

  16. Kissinger, J.C., Gajria, B., Li, L., Paulsen, I.T., Roos, D.S. (2003) ToxoDB: accessing the Toxoplasma gondii genome. Nucleic Acids Res., 31, 234–236.

  17. Stein, L. (2002) Creating a bioinformatics nation. Nature, 417, 119–120.

  18. Sivashanmugam, K., Miller, J.A., Sheth, A.P., Verma, K. (2004) Framework for semantic web process composition. International Journal of Electronic Commerce, 9, 71–106.

  19. Quevillon, E., Silventoinen, V., Pillai, S., Harte, N., Mulder, N., Apweiler, R., Lopez, R. (2005) InterProScan: protein domains identifier. Nucleic Acids Res., 33, W116–W120.

  20. Karp, P.D., Paley, S., Romero, P. (2002) The Pathway Tools software. Bioinformatics, 18 (Suppl. 1), S225–S232.

  21. Dowell, R.D., Jokerst, R.M., Day, A., Eddy, S.R., Stein, L. (2001) The distributed annotation system. BMC Bioinformatics, 2, 7.




This article has been cited by other articles:

C. Aurrecoechea, J. Brestelli, B. P. Brunk, S. Fischer, B. Gajria, X. Gao, A. Gingle, G. Grant, O. S. Harb, M. Heiges, et al.
EuPathDB: a portal to eukaryotic pathogen databases
Nucleic Acids Res., November 13, 2009; (2009) gkp941v1.

G. Ostlund, T. Schmitt, K. Forslund, T. Kostler, D. N. Messina, S. Roopra, O. Frings, and E. L. L. Sonnhammer
InParanoid 7: new algorithms and tools for eukaryotic orthology analysis
Nucleic Acids Res., November 5, 2009; (2009) gkp931v1.

J. W. Wanyiri, P. Techasintana, R. M. O'Connor, M. J. Blackman, K. Kim, and H. D. Ward
Role of CpSUB1, a Subtilisin-Like Protease, in Cryptosporidium parvum Infection In Vitro
Eukaryot. Cell, April 1, 2009; 8(4): 470 - 477.

C. Aurrecoechea, J. Brestelli, B. P. Brunk, J. M. Carlton, J. Dommer, S. Fischer, B. Gajria, X. Gao, A. Gingle, G. Grant, et al.
GiardiaDB and TrichDB: integrated genomic resources for the eukaryotic protist pathogens Giardia lamblia and Trichomonas vaginalis
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D526 - D530.

U. Pieper, N. Eswar, B. M. Webb, D. Eramian, L. Kelly, D. T. Barkan, H. Carter, P. Mankoo, R. Karchin, M. A. Marti-Renom, et al.
MODBASE, a database of annotated comparative protein structure models and associated resources
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D347 - D354.

J. W. Whitaker, I. Letunic, G. A. McConkey, and D. R. Westhead
metaTIGER: a metabolic evolution resource
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D531 - D538.

C.-H. Kuo, J. P. Wares, and J. C. Kissinger
The Apicomplexan Whole-Genome Phylogeny: An Analysis of Incongruence among Gene Trees
Mol. Biol. Evol., December 1, 2008; 25(12): 2689 - 2698.

C. Aurrecoechea, M. Heiges, H. Wang, Z. Wang, S. Fischer, P. Rhodes, J. Miller, E. Kraemer, C. J. Stoeckert Jr., D. S. Roos, et al.
ApiDB: integrated resources for the apicomplexan bioinformatics resource center
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D427 - D430.

H. Wang, Y. Su, A. J. Mackey, E. T. Kraemer, and J. C. Kissinger
SynView: a GBrowse-compatible approach to visualizing comparative genome data
Bioinformatics, September 15, 2006; 22(18): 2308 - 2309.
