Nucleic Acids Research, 2003, Vol. 31, No. 1 212-215
© 2003 Oxford University Press
PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data
Department of Biology, University of Pennsylvania, Philadelphia, PA 19104, USA
1 Center for Bioinformatics, University of Pennsylvania, Philadelphia, PA 19104, USA
2 Department of Biological Chemistry, Institute of Life Sciences, The Hebrew University, Jerusalem 91904, Israel
3 International Center for Genetic Engineering and Biotechnology, Delhi 110067, India
4 Department of Genetics, and Center for Tropical and Emerging Global Diseases, University of Georgia, Athens, GA 30602, USA
5 BBN, 10 Moulton Street, Cambridge, MA 02138, USA
*To whom correspondence should be addressed. Tel: +1 2158982118; Fax: +1 2157466697; Email: droos{at}sas.upenn.edu
Received September 24, 2002; Revised and Accepted October 11, 2002
ABSTRACT
PlasmoDB (http://PlasmoDB.org) is the official database of the Plasmodium falciparum genome sequencing consortium. This resource incorporates the recently completed P. falciparum genome sequence and annotation, as well as draft sequence and annotation emerging from other Plasmodium sequencing projects. PlasmoDB currently houses information from five parasite species and provides tools for intra- and inter-species comparisons. Sequence information is integrated with other genomic-scale data emerging from the Plasmodium research community, including gene expression analysis from EST, SAGE and microarray projects and proteomics studies. The relational schema used to build PlasmoDB, GUS (Genomics Unified Schema), employs a highly structured format to accommodate the diverse data types generated by sequence and expression projects. A variety of tools allow researchers to formulate complex, biologically based queries of the database. A stand-alone version of the database is also available on CD-ROM (P. falciparum GenePlot), facilitating access to the data in situations where internet access is difficult (e.g. by malaria researchers working in the field). The goal of PlasmoDB is to facilitate utilization of the vast quantities of genomic-scale data produced by the global malaria research community. The software used to develop PlasmoDB has been used to create a second apicomplexan parasite genome database, ToxoDB (http://ToxoDB.org).
CONTENTS OF THE CURRENT RELEASE
Data: complete sequence for Plasmodium falciparum, annotation and functional genomics datasets
Integrated efforts at The Institute for Genomic Research, the Sanger Institute, and the Stanford University Genome Technology Center have produced an effectively complete genome sequence for P. falciparum strain 3D7 (1). The finished sequence is featured in PlasmoDB (2) version 4.0, released October 2002 (3). New sequence data are also available for P. yoelii and other Plasmodium species.
Also new to PlasmoDB 4.0 are large-scale datasets derived from proteomics analysis of several life cycle stages (4), expression profiling results from throughout the intraerythrocytic cycle (oligonucleotide-based microarrays in both glass slide and Affymetrix formats), and single nucleotide polymorphism (SNP) analysis for P. falciparum strains HB3, DD2, D10, 7G8 and 3D7 (5). Protein and RNA data can be used to identify genes expressed in different life cycle stages. Expression data can be analyzed with clustering software, e.g. XCluster (http://genome-www.stanford.edu/~sherlock/cluster.html), or used to look for differentially expressed genes using PaGE (6). Genes containing SNPs can be retrieved and assessed for non-synonymous amino acid changes.
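To illustrate what assessing a SNP for a non-synonymous change involves, the following is a minimal Python sketch. It is not PlasmoDB code: the function name `classify_snp`, the 0-based coordinate convention and the toy coding sequence are assumptions made for illustration, and the sketch assumes the standard genetic code and an in-frame coding sequence.

```python
from itertools import product

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_snp(cds, position, alt_base):
    """Classify a single-base substitution in an in-frame coding sequence.

    `position` is 0-based (an assumption of this sketch). Returns a tuple
    (kind, reference_amino_acid, alternate_amino_acid).
    """
    cds = cds.upper()
    codon_start = position - position % 3
    ref_codon = cds[codon_start:codon_start + 3]
    if len(ref_codon) != 3:
        raise ValueError("position does not fall within a complete codon")
    offset = position % 3
    alt_codon = ref_codon[:offset] + alt_base.upper() + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    kind = "synonymous" if ref_aa == alt_aa else "non-synonymous"
    return kind, ref_aa, alt_aa


if __name__ == "__main__":
    toy_cds = "ATGAAAGGT"  # hypothetical sequence: Met-Lys-Gly
    print(classify_snp(toy_cds, 5, "G"))  # AAA -> AAG, lysine is unchanged
    print(classify_snp(toy_cds, 3, "G"))  # AAA -> GAA, lysine becomes glutamate
```

A change in the third codon position often leaves the amino acid unchanged, which is why both the codon context and the substituted base are needed to make the call.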
The urgent need to identify potential drug and vaccine targets has driven the P. falciparum genome sequencing project from its inception (7). In addition to DNA sequence data and analyses of predicted genes and proteins, Gene Ontology (GO) assignments have been provided by the sequencing centers and others in the malaria research community, and this curated annotation greatly facilitates drug target discovery. To assist in the identification of potential vaccine targets, annotated genes have been scored for potential T-cell epitopes using the SVMHC method (8).
Tools and queries
The PlasmoDB 4.0 web interface incorporates several improvements over previous releases. A new version of the gene display page makes more extensive use of graphical elements to present a concise single-page summary for each annotated gene in the database. This summary page allows users to quickly examine both the genomic arrangement of a gene (intron/exon structure, placement and identity of neighboring genes, etc.) and its likely function, based on graphical summaries of predicted protein features (signal peptides, protein motifs, transmembrane domains, etc.) and pre-computed database search results. Other pages linked to the gene present specialized views of relevant data, such as predicted mRNA and protein sequences, detailed protein motif predictions and microarray expression results. PlasmoDB now provides direct links to external data sources and sites, including the Malaria Parasite Metabolic Pathways database (http://sites.huji.ac.il/malaria/) and the Malaria Research and Reference Reagent Resource Center, MR4 (9).
In addition to supporting new queries, release 4.0 makes it easier to combine queries run against the GUS relational database with other data analysis tools and with previously run queries. For example, one can now query for all genes in a subtelomeric region of chromosome 4 that contain a user-defined protein motif and, with an additional click or two, download the results as a FASTA-formatted list. The first part of this query uses a (new) SQL query against the relational database, while the second part uses the Amino Acid Motif Search tool; the two are combined through the query history feature of the web interface.
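The logic of combining two result sets and exporting the intersection can be sketched in a few lines of Python. This is an illustration only, not the PlasmoDB implementation: the function names, the gene identifiers and the toy protein sequences are all hypothetical, and a regular expression stands in for the motif search.

```python
import re


def motif_search(proteins, pattern):
    """Return the IDs of proteins whose sequence matches a regex motif."""
    return {gene_id for gene_id, seq in proteins.items() if re.search(pattern, seq)}


def intersect_results(first, *others):
    """Return the sorted gene IDs present in every result set (logical AND)."""
    common = set(first)
    for other in others:
        common &= set(other)
    return sorted(common)


def to_fasta(gene_ids, sequences, width=60):
    """Format the selected sequences as FASTA text, wrapped at `width` residues."""
    lines = []
    for gene_id in gene_ids:
        seq = sequences[gene_id]
        lines.append(f">{gene_id}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Hypothetical data: IDs and sequences are invented for this sketch.
    proteins = {
        "gene_A": "MKNLSTRRLLE",
        "gene_B": "MSTAAA",
        "gene_C": "MRRLXE",
    }
    region_hits = ["gene_A", "gene_B"]           # e.g. result of a location query
    motif_hits = motif_search(proteins, "RRL")   # e.g. result of a motif query
    selected = intersect_results(region_hits, motif_hits)
    print(to_fasta(selected, proteins), end="")
```

Treating each query result as a set of gene identifiers keeps the combination step independent of how each individual result was produced, which is the property a query history relies on.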
PlasmoDB 4.0 enables several queries that exploit the new data types present in this release (Table 1). For example, one can quickly retrieve all predicted genes for which at least two lines of experimental evidence (e.g. mass spectrometric analysis of proteins and oligonucleotide-based microarray data) suggest expression in merozoites. Polymorphism queries permit identification of (for example) all genes with non-synonymous amino acid changes in the HB3 strain relative to the 3D7 strain. Queries on pre-computed BLAST results have been extended to incorporate taxonomic information, supporting (for example) queries for P. falciparum genes that are closely conserved in at least one other Plasmodium species but have no apparent homolog in the human genome.
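Two of the query types described here reduce to simple filters over tabular results, sketched below in Python. Nothing in this block reflects the actual GUS schema or PlasmoDB query code: the tuple layouts, function names, gene identifiers and E-value cut-offs are assumptions chosen for illustration.

```python
def genes_with_evidence(evidence, stage, min_sources=2):
    """Return genes supported by at least `min_sources` distinct data sources.

    `evidence` is an iterable of (gene_id, source, stage) tuples, where
    `source` names the kind of experiment (hypothetical labels).
    """
    sources_by_gene = {}
    for gene_id, source, evidence_stage in evidence:
        if evidence_stage == stage:
            sources_by_gene.setdefault(gene_id, set()).add(source)
    return sorted(
        gene_id
        for gene_id, sources in sources_by_gene.items()
        if len(sources) >= min_sources
    )


def conserved_without_homolog(blast_hits, related_taxa, excluded_taxon,
                              conserved_evalue=1e-30, homolog_evalue=1e-5):
    """Return genes with a strong hit in a related taxon and no hit in another.

    `blast_hits` is an iterable of (gene_id, taxon, evalue) tuples. Both
    E-value thresholds are arbitrary assumptions for this sketch.
    """
    conserved, excluded = set(), set()
    for gene_id, taxon, evalue in blast_hits:
        if taxon in related_taxa and evalue <= conserved_evalue:
            conserved.add(gene_id)
        if taxon == excluded_taxon and evalue <= homolog_evalue:
            excluded.add(gene_id)
    return sorted(conserved - excluded)


if __name__ == "__main__":
    # Hypothetical records, invented for this sketch.
    evidence = [
        ("g1", "proteomics", "merozoite"),
        ("g1", "microarray", "merozoite"),
        ("g2", "microarray", "merozoite"),
        ("g3", "proteomics", "sporozoite"),
    ]
    print(genes_with_evidence(evidence, "merozoite"))

    blast_hits = [
        ("g1", "P. yoelii", 1e-50),
        ("g1", "H. sapiens", 1e-8),
        ("g2", "P. yoelii", 1e-40),
    ]
    print(conserved_without_homolog(blast_hits, {"P. yoelii"}, "H. sapiens"))
```

Counting distinct sources rather than records means repeated measurements from one experiment type do not count as independent lines of evidence.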
Plasmodium falciparum GenePlot
GenePlot was designed to provide researchers around the world with access to the genome sequence and annotations for the malaria parasite P. falciparum. Access is available online, or through a stand-alone CD-ROM that does not require high-speed internet connectivity (10). The rewritten and greatly enhanced release of GenePlot contains the complete P. falciparum genome sequence, all annotations provided by the sequencing centers, gene predictions from three applications (trained on P. falciparum data), DNA sequence repeats, pre-computed TBLASTX analysis of the entire genome sequence, BLASTP similarities of all predicted genes and protein feature predictions for all predicted and annotated genes.
As illustrated in Figure 1, a graphical interface permits browsing the genome and genome annotations, including predicted and annotated protein features. Search capabilities allow compound text-based queries of curated gene/protein annotations, automated analyses and the results from pre-computed comparisons with GenBank/EMBL and other relevant databases. Annotated and predicted gene sequences can be selectively retrieved using the genome search interface and sequences can be retrieved in multiple formats from many different contexts. A tutorial on GenePlot usage is also provided.
P. falciparum GenePlot can be accessed directly or downloaded from the PlasmoDB web site. The CD-ROM version is also available free of charge (along with other materials of interest to malaria researchers) from helpcd{at}plasmodb.org or via the Malaria Research and Reference Reagent Resource Center (MR4; malaria{at}atcc.org). Email requests should include 'Nature malaria CD-ROM' in the subject line and a full postal address in the body of the message.
FUTURE PLANS
With the official release of the finished P. falciparum genome sequence, a large influx of new data is anticipated from functional genomics studies, including expression profiling, proteomics, population genetics and other projects. Redesign of the display and query infrastructure will allow users to set and save preferences defining how DNA sequences, genes, proteins and expression data should be viewed and downloaded. Users will also be able to store queries for use in future sessions and share lists of genes with other interested PlasmoDB users. The underlying GUS architecture (11; http://www.gusdb.org) has already been exploited to develop a database for the related parasite Toxoplasma gondii (12), and other organism-specific applications can be envisaged.
ACKNOWLEDGEMENTS
Financial support for PlasmoDB was provided by the Burroughs Wellcome Fund, and the database was developed using computational infrastructure from the Liniac project at the University of Pennsylvania Genomics Institute. We thank the numerous researchers who have collaborated with and contributed to PlasmoDB by depositing both published and unpublished data, by making software available and by making useful suggestions on how to improve this community resource. We wish to thank the scientists and funding agencies comprising the international Malaria Genome Project for making sequence data from the genome of P. falciparum (3D7) public prior to publication of the completed sequence. The Sanger Institute provided sequence for chromosomes 1, 3–9 and 13, with financial support from the Wellcome Trust. A consortium involving The Institute for Genomic Research and the Naval Medical Research Center sequenced chromosomes 2, 10, 11 and 14, with support from NIAID/NIH, the Burroughs Wellcome Fund and the Department of Defense. The Stanford Genome Technology Center sequenced chromosome 12, with support from the Burroughs Wellcome Fund.
REFERENCES
1. Gardner,M.J., Hall,N., Fung,E., White,O., Berriman,M., Hyman,R.W., Carlton,J.M., Pain,A., Nelson,K.E., Bowman,S. et al. (2002) The genome sequence of the human malaria parasite Plasmodium falciparum. Nature, 419, 498–511.
2. Bahl,A., Brunk,B., Coppel,R.L., Crabtree,J., Diskin,S.J., Fraunholz,M.J., Grant,G.R., Gupta,D., Huestis,R.L., Kissinger,J.C. et al. (2002) PlasmoDB: The Plasmodium genome resource. Nucleic Acids Res., 30, 87–90.
3. Kissinger,J.C., Brunk,B.P., Crabtree,J., Fraunholz,M.J., Gajria,B., Milgram,A.J., Pearson,D.S., Schug,J., Bahl,A., Diskin,S.J. et al. (2002) PlasmoDB: The Plasmodium genome resource. Nature, 419, 490–492.
4. Florens,L., Washburn,M.P., Raine,J.D., Anthony,R.M., Grainger,M., Haynes,J.D., Moch,J.K., Muster,N., Sacci,J.B., Tabb,D.L. et al. (2002) A proteomic view of Plasmodium falciparum life cycle. Nature, 419, 520–526.
5. Mu,J., Duan,J., Makova,K.D., Joy,D.A., Huynh,C.Q., Branch,O.H., Li,W.H. and Su,X.Z. (2002) Chromosome-wide SNPs reveal an ancient origin for Plasmodium falciparum. Nature, 418, 323–326.
6. Manduchi,E., Grant,G.R., McKenzie,S.E., Overton,G.C., Surrey,S. and Stoeckert,C.J.,Jr (2000) Generation of patterns from gene expression data by assigning confidence to differentially expressed genes. Bioinformatics, 16, 685–698.
7. Fletcher,C. (1998) The Plasmodium falciparum genome project. Parasitol. Today, 14, 342–344.
8. Donnes,P. and Elofsson,A. (2002) Prediction of MHC class I binding peptides using SVMHC. BMC Bioinformatics, 3, 25.
9. Wu,Y. and Rogers,M.J. (2002) Shared knowledge can combat malaria. Nature, 419, 15.
10. Milgram,A.J., Gajria,B., Kissinger,J.C., Pearson,D.S. and Roos,D.S. (2002) Plasmodium falciparum GenePlot (CD-ROM). Nature, 419, in press.
11. Davidson,S., Crabtree,J., Brunk,B.P., Schug,J., Tannen,V., Overton,G.C. and Stoeckert,C.J.,Jr (2001) K2/Kleisli and GUS: Experiments in integrated access to genomic data sources. IBM Systems J., 40, 512–531.
12. Kissinger,J.C., Gajria,B., Li,L., Paulsen,I. and Roos,D.S. (2003) ToxoDB: Accessing the Toxoplasma gondii genome. Nucleic Acids Res., 31, 234–236.