Nucleic Acids Research Advance Access originally published online on October 25, 2008
Nucleic Acids Research 2009 37(Database issue):D531-D538; doi:10.1093/nar/gkn826
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article appears in the following Nucleic Acids Research issue: Database issue.
metaTIGER: a metabolic evolution resource
¹Institute of Molecular and Cellular Biology, Garstang Building, University of Leeds, Leeds, W. Yorks, LS2 9JT, UK and ²EMBL Heidelberg, Meyerhofstraße 1, 69117 Heidelberg, Germany
*To whom correspondence should be addressed. Tel: +44 113 34 33116; Fax: +44 113 34 33167; Email: d.r.westhead{at}leeds.ac.uk
Received June 10, 2008. Revised October 7, 2008. Accepted October 14, 2008.
ABSTRACT
Metabolic networks are a subject that has received much attention, but existing web resources do not include extensive phylogenetic information. Phylogenomic approaches (phylogenetics on a genomic scale) have been shown to be effective in the study of evolution and processes like horizontal gene transfer (HGT). To address the lack of phylogenomic information relating to eukaryotic metabolism, metaTIGER (www.bioinformatics.leeds.ac.uk/metatiger) has been created, using genomic information from 121 eukaryotes and 404 prokaryotes and sensitive sequence search techniques to predict the presence of metabolic enzymes. These enzyme sequences were used to create a comprehensive database of 2257 maximum-likelihood phylogenetic trees, some containing over 500 organisms. The trees can be viewed using iTOL, an advanced interactive tree viewer, enabling straightforward interpretation of large trees. Complex high-throughput tree analysis is also available through user-defined queries, allowing the rapid identification of trees of interest, e.g. containing putative HGT events. metaTIGER also provides novel and easy-to-use facilities for viewing and comparing the metabolic networks in different organisms via highlighted pathway images and tables. metaTIGER is demonstrated through evolutionary analysis of Plasmodium, including identification of genes horizontally transferred from chlamydia.
INTRODUCTION
The volume and diversity of eukaryotic sequence data have grown exponentially over the past decade, and recent advances in sequencing will ensure that this trend continues. With this data comes increased potential for comparative genomics of the eukaryotes and for studies of eukaryotic evolution. The metabolic network is the simplest biomolecular system to predict from sequence data, because many core processes have ancient origins and are conserved across all kingdoms of life. Thus, reliable identification of orthologues is easier for core metabolic enzymes than for genes involved in less conserved processes, making them attractive components for phylogenetic studies. Equally, metabolic processes remain key drug targets, particularly for eukaryotic parasites like Plasmodium falciparum, so the study of pathogen metabolic networks and their comparison with those of their eukaryotic hosts is an important aspect of drug target discovery. Evolutionary information is important in this process because it can indicate the distance from host enzymes and hence the likelihood of inhibitor cross-reactivity.
There are a variety of web resources for studying metabolic networks. The Kyoto Encyclopedia of Genes and Genomes (KEGG) (1) offers a set of reference pathways that have been automatically annotated for 120 eukaryotes and 592 prokaryotes (based on KEGG 44.0). Comparing the metabolic profiles of organisms in KEGG is a manual process. BioCyc (2) pathway/genome databases are organized into three tiers of annotation accuracy (based on BioCyc 11.5) and cover a total of 360 organisms, with only Escherichia coli in tier 1 (detailed manual annotation), another 20 organisms in tier 2 (manually checked automatic annotation) and the remainder in tier 3 (automatic annotation). BioCyc provides facilities for comparing the metabolic networks of organisms, but no phylogenetic information is provided. PUMA2 (3) contains chromosomal sequence from 369 prokaryotes and 33 eukaryotes, which can be compared in terms of their metabolic networks. The evolution of different protein families can be examined, but only to a limited extent, because the trees are produced interactively, which limits the number of sequences that can be included. Reactome (4) is an expert-annotated, predominantly human database that also contains some other highly annotated organisms. It offers facilities for comparing organisms, but these do not focus on evolution, and comparisons are limited to the 23 organisms covered (based on release 22.0).
None of the above-mentioned databases brings together a broad spectrum of eukaryotic organisms with the facilities to examine the evolution of their metabolic networks on a large scale. Comparing the enzymes present in different organisms allows the gain and loss of pathways over evolution to be observed. The construction of phylogenetic trees on a genome scale is termed phylogenomics and allows the evolution of individual genes, as well as of whole genomes, to be considered. In particular, it allows investigation of the extent to which horizontal gene transfer (HGT) has occurred in eukaryotes. HGT has for some time been recognized as an important influence on the evolution of prokaryotes (5). It is now being realized that HGT also takes place in eukaryotes, particularly involving the gain of genes from bacteria.
In this paper we present metaTIGER, a metabolic resource that focuses upon aspects of metabolism that are not addressed elsewhere. In particular, in-depth evolutionary information about enzymes is provided in the form of 2257 maximum-likelihood phylogenetic trees, some of which contain over 500 organisms and more than 100 eukaryotes. The trees can be viewed interactively with iTOL (6) which produces intelligible displays of even the largest trees. Complex high-throughput analysis of the trees can be carried out with PhyloGenie's PHAT program (7), allowing users to define their own tree queries, which are then submitted to a Beowulf cluster for processing. Additionally, metaTIGER offers facilities that permit comparisons between eukaryotic metabolic networks in a variety of formats. The metabolic enzymes within metaTIGER are predicted using SHARKhunt (8), which operates with raw nucleic acid sequence data, including unannotated/unassembled sequence, meaning that metaTIGER can offer information on organisms that are not annotated by other facilities. As SHARKhunt's predictions are based upon sensitive sequence profile comparison techniques, enzyme assertions are likely to be more specific, and highly divergent homologues are more likely to be found than would be the case for simpler BLAST-based methods.
DATABASE CONSTRUCTION
Metabolic profiles
The sequence database behind the metaTIGER metabolic profiles and phylogenetic trees was constructed using SHARKhunt (8). The genomic sequences of the organisms covered in metaTIGER were downloaded from a number of resources (9–17) (see SI 1 for complete details) and include a wide variety of eukaryotes with poor metabolic characterization and low levels of genome annotation. In particular, eukaryotic taxonomic coverage was broadened by using expressed sequence tag data from TBestDB (18). SHARKhunt scans the sequences with PSI-BLAST (19) and hidden Markov models, looking for the presence of enzyme sequence profiles obtained from PRIAM (20). SHARKhunt was updated to use the latest version of PRIAM, which contains 2908 profiles for 2192 different E.C. (Enzyme Commission) numbers. Each profile hit is assigned an E-value, and the results are stored in the metaTIGER database.
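To make the notion of a metabolic profile concrete, the sketch below shows one way profile hits could be turned into a per-organism set of predicted E.C. numbers. It is a minimal illustration, not SHARKhunt's code: the `ProfileHit` record, the profile identifiers, the profile-to-E.C. mapping and the `max_evalue` threshold are all hypothetical, since the text does not state which E-value cut-off defines enzyme presence. Because PRIAM provides more than one profile for some E.C. numbers, hits are collapsed to the best (lowest) E-value per organism and E.C. number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileHit:
    """One (hypothetical) profile match reported for an organism's sequence."""
    organism: str
    profile_id: str
    sequence_id: str
    evalue: float


def build_metabolic_profiles(hits, profile_to_ec, max_evalue=1e-10):
    """Return {organism: {ec_number: best_evalue}}.

    max_evalue is an assumed presence threshold, not a value from the paper.
    Several profiles may map to one E.C. number, so only the lowest E-value
    per (organism, E.C. number) pair is kept.
    """
    profiles = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # treat weak hits as "enzyme not predicted"
        ec = profile_to_ec[hit.profile_id]
        org_profile = profiles.setdefault(hit.organism, {})
        if ec not in org_profile or hit.evalue < org_profile[ec]:
            org_profile[ec] = hit.evalue
    return profiles


# Toy data: profile IDs and hits are invented for illustration.
profile_to_ec = {"prof_1": "1.1.1.25", "prof_2": "1.1.1.25", "prof_3": "2.7.1.71"}
hits = [
    ProfileHit("org_A", "prof_1", "seq_1", 1e-40),
    ProfileHit("org_A", "prof_2", "seq_2", 1e-55),  # better hit, same E.C. number
    ProfileHit("org_A", "prof_3", "seq_3", 1e-3),   # too weak under the assumed threshold
]
profiles = build_metabolic_profiles(hits, profile_to_ec)
print(profiles)  # {'org_A': {'1.1.1.25': 1e-55}}
```

Keeping the best E-value, rather than just a presence flag, leaves room to apply a stricter cut-off later without re-scanning the sequences.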
Phylogenetic trees
Owing to the diverse taxonomic range of organisms sampled in metaTIGER, phylogenetic trees with broad taxonomic sampling could be produced, which increases the potential for new insights to be gained from exploring the tree data. For each enzyme profile, a phylogenetic tree is produced from the amino acid sequences of the hits. It is advantageous to use profile-hit sequences rather than whole genes, because a hit covers the conserved region of a protein, which reduces the proportion of the alignment made up of unconserved regions. Excluding non-conserved regions from alignments is important if an accurate phylogenetic tree is to be produced (21). To ensure that the trees produced are of high quality, only sequences with profile-match E-values <10⁻³⁰ were included. If more than one sequence from the same organism fell below this cut-off for a particular enzyme profile, only the sequence with the lowest E-value was used in the tree reconstruction, reducing the chance of including paralogues. The sequences were aligned using MUSCLE (22) with default settings. Trees were then produced with PhyML (23) using the JTT substitution model, gamma-distributed rate variation with four rate categories and a proportion of invariant positions. The JTT model was chosen because it has most frequently been found to be the best-fitting model in other phylogenomic reconstructions (24). The gamma shape parameter and the fraction of invariant positions were estimated from the data. The trees were optimized for topology, branch lengths and rate parameters, and each tree was subjected to 100 bootstrap replicates. To speed up the construction of the larger trees, an MPI version of PhyML was used (25); for 12 of these larger trees the number of sequences had to be reduced (see SI 2 for details) owing to memory issues with the MPI version of PhyML. This pipeline produced 2257 maximum-likelihood phylogenetic trees.
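The per-tree preparation steps described above can be sketched as follows. This is a hedged illustration rather than the metaTIGER pipeline: the `Hit` record is hypothetical, the selection assumes the one-sequence rule applies per organism within each enzyme profile, and the command lines assume MUSCLE 3.x (`-in`/`-out`) and PhyML 3.x option syntax; the paper does not state which versions or invocations were used. PhyML reads PHYLIP alignments, so a FASTA-to-PHYLIP conversion between the two commands is assumed and not shown.

```python
from collections import namedtuple

# Hypothetical record for a profile hit that could enter a tree.
Hit = namedtuple("Hit", ["organism", "sequence_id", "evalue", "sequence"])

EVALUE_CUTOFF = 1e-30  # from the text: only hits with E-value < 10^-30 are used


def select_tree_sequences(hits, cutoff=EVALUE_CUTOFF):
    """Keep at most one hit per organism: the lowest-E-value hit below the cut-off.

    Keeping a single sequence per organism reduces the chance of including paralogues.
    """
    best = {}
    for hit in hits:
        if hit.evalue >= cutoff:
            continue
        current = best.get(hit.organism)
        if current is None or hit.evalue < current.evalue:
            best[hit.organism] = hit
    return sorted(best.values(), key=lambda h: h.organism)


def to_fasta(hits):
    """Write the selected profile-hit sequences as FASTA text."""
    return "".join(f">{h.organism}\n{h.sequence}\n" for h in hits)


def build_commands(fasta_in, aligned_fasta, aligned_phylip):
    """Return argument lists for alignment and tree building (assumed CLI syntax)."""
    muscle = ["muscle", "-in", fasta_in, "-out", aligned_fasta]  # default settings
    phyml = [
        "phyml", "-i", aligned_phylip,
        "-d", "aa",    # amino acid data
        "-m", "JTT",   # JTT substitution model
        "-c", "4",     # four gamma rate categories
        "-a", "e",     # estimate the gamma shape parameter
        "-v", "e",     # estimate the proportion of invariant sites
        "-o", "tlr",   # optimise topology, branch lengths and rate parameters
        "-b", "100",   # 100 bootstrap replicates
    ]
    return muscle, phyml


hits = [
    Hit("org_A", "a1", 1e-45, "MKVLA"),
    Hit("org_A", "a2", 1e-60, "MKILA"),  # better hit from the same organism
    Hit("org_B", "b1", 1e-20, "MKLLA"),  # fails the 1e-30 cut-off
    Hit("org_C", "c1", 1e-31, "MRVLA"),
]
selected = select_tree_sequences(hits)
muscle_cmd, phyml_cmd = build_commands("profile.fa", "profile.aln.fa", "profile.aln.phy")
```

The argument lists could be passed to `subprocess.run` once the tools are installed; building them in one function keeps the stated parameter choices (JTT, four rate categories, estimated gamma and invariant fraction, 100 bootstraps) visible in one place.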
Users of the resource should note that each phylogenetic tree contains orthologous sequences for a specific E.C. number and is, as far as possible, free of paralogues. The trees are not intended for the study of gene families containing paralogues with a variety of different functions. Rather, they allow the study of the evolutionary origin of specific metabolic functions and pathways in particular species or species groups. They are suitable for detecting functional gain by HGT (as illustrated below) and for assessing the degree of evolutionary divergence between orthologous enzymes in different species. The latter application is relevant to drug target discovery, where a good drug target in a pathogen should be as divergent as possible from the host orthologue to ensure specificity. Because paralogues are excluded, some of the trees are suitable for estimating species phylogenies, but this should be approached with care, because not all enzyme sequences contain sufficient phylogenetic signal to resolve species, particularly at deep branches. In these cases the trees should be viewed as starting points for detailed manual study and verification. Another potential application of the trees, when used with the search procedures described below, is to identify sets of candidate sequences for constructing species phylogenies by gene concatenation or consensus methods (26), where the avoidance of HGTs is of paramount importance.
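One simple way to quantify the divergence between a pathogen enzyme and its host orthologue on a tree with branch lengths is the patristic distance: the sum of branch lengths on the path between the two leaves. The paper does not prescribe a particular divergence measure, so this is an assumed, illustrative choice; the tree is a hypothetical toy encoded as a child-to-parent map.

```python
def path_to_root(node, parent):
    """List (ancestor, distance from node) pairs from the node up to the root."""
    path = [(node, 0.0)]
    total = 0.0
    while node in parent:
        node, branch_length = parent[node]
        total += branch_length
        path.append((node, total))
    return path


def patristic_distance(leaf_a, leaf_b, parent):
    """Sum of branch lengths between two leaves, via their lowest common ancestor."""
    dist_from_a = dict(path_to_root(leaf_a, parent))
    for ancestor, dist_b in path_to_root(leaf_b, parent):
        if ancestor in dist_from_a:  # the first shared ancestor is the LCA
            return dist_from_a[ancestor] + dist_b
    raise ValueError("leaves are not in the same tree")


# Hypothetical rooted tree: child -> (parent, branch length); "root" has no parent.
parent = {
    "pathogen": ("root", 0.5),
    "inner": ("root", 0.25),
    "host": ("inner", 0.25),
    "other": ("inner", 0.125),
}
print(patristic_distance("pathogen", "host", parent))  # 1.0
```

Patristic distance does not depend on where the root is placed, which suits trees that are arbitrarily rooted.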
DATABASE INTERFACE
Exploring metabolic networks
The metabolic networks in metaTIGER can be explored in four ways: (i) with a simple search facility; (ii) by viewing KEGG map images [produced using the KEGG SOAP API (1)] that highlight the enzymes present in each organism; (iii) by comparing the enzymes present in a particular pathway between two organisms via a coloured KEGG pathway image; and (iv) by comparing two or more organisms in a table format. A detailed description of the metaTIGER search facilities, along with the other metaTIGER facilities, can be found at: http://www.bioinformatics.leeds.ac.uk/metatiger/help.html. Additionally, all of the SHARKhunt metabolic profile predictions can be downloaded from the site.
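The two-organism pathway comparison in (iii) amounts to labelling each enzyme of a pathway by which organism is predicted to have it. The sketch assumes a pathway is given as a set of E.C. numbers (for example, read from a KEGG reference map) and each organism's prediction as a set of E.C. numbers; the function name, labels and organism data are hypothetical, and drawing the coloured map is left to whatever renders it.

```python
def compare_pathway(pathway_ecs, ecs_first, ecs_second):
    """Label each pathway enzyme as present in both, one, or neither organism."""
    labels = {}
    for ec in sorted(pathway_ecs):
        in_first, in_second = ec in ecs_first, ec in ecs_second
        if in_first and in_second:
            labels[ec] = "both"
        elif in_first:
            labels[ec] = "first only"
        elif in_second:
            labels[ec] = "second only"
        else:
            labels[ec] = "neither"
    return labels


# Last three shikimate-pathway enzymes; the two organism profiles are invented.
pathway = {"2.7.1.71", "2.5.1.19", "4.2.3.5"}
org_a = {"2.7.1.71", "2.5.1.19", "4.2.3.5", "1.1.1.25"}
org_b = {"2.5.1.19"}
labels = compare_pathway(pathway, org_a, org_b)
print(labels)
```

Each label could then be mapped to a colour on the pathway image.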
Viewing and searching phylogenetic trees
The phylogenetic trees can be viewed interactively using the web-based viewer iTOL (6). An interactive tree viewer is necessary because some of the trees contain over 500 taxa and would not be displayed clearly by less sophisticated tree viewers. iTOL colours taxon labels according to kingdom, so events such as prokaryote-to-eukaryote HGTs can be easily identified on the large trees. Because the trees are large and arbitrarily rooted, users can redefine the root and collapse branches to suit their needs. Images can be exported to files in a range of formats. An example tree for 4-hydroxy-3-methylbut-2-enyl diphosphate reductase is shown in Figure 2; this enzyme was identified as a putative case of HGT into Plasmodium (see below).
It is not feasible to search each of the 2257 phylogenetic trees in metaTIGER manually for trees of interest, such as those containing putative HGT events or those suitable for concatenation to obtain a consensus tree depicting the evolution of species. To overcome this problem, metaTIGER has a high-throughput tree searching facility that allows users to submit their own custom tree queries. The tree queries use the phylogenetic analysis tool PHAT, which is part of the PhyloGenie package (7). PHAT has its own tree query language and employs a sophisticated re-rooting technique to ensure that the clades being tested do not cross the root of the tree, which is important when asserting potential HGTs. Along with a tree query, the user provides a minimum bootstrap value; branches with bootstrap support below this value are ignored during the analysis. The queries can be computationally demanding and are sent to a 440-core (Opteron/Linux) Beowulf cluster; the results are sent by e-mail to a user-specified address.
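PHAT's own handling of the bootstrap threshold is not described here, so the following is only an assumed illustration of what ignoring weakly supported branches can mean: internal branches whose bootstrap value is below the user's minimum are collapsed into polytomies, and only the clades that survive are available to a query. The `Node` class and the toy tree are hypothetical. Setting the minimum to 0 keeps every branch, which corresponds to a query with no bootstrap cut-off.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    support: Optional[float] = None  # bootstrap value of the branch above this node
    children: List["Node"] = field(default_factory=list)


def collapse_weak_branches(node, min_support):
    """Return a copy of the tree with weakly supported internal branches collapsed."""
    new_children = []
    for child in node.children:
        child = collapse_weak_branches(child, min_support)
        weak = bool(child.children) and child.support is not None and child.support < min_support
        if weak:
            new_children.extend(child.children)  # splice the grandchildren upwards
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)


def leaf_names(node):
    if not node.children:
        return {node.name}
    return set().union(*(leaf_names(child) for child in node.children))


def clades(node):
    """Leaf sets of all internal nodes, i.e. the clades a query could test."""
    if not node.children:
        return []
    found = [frozenset(leaf_names(node))]
    for child in node.children:
        found.extend(clades(child))
    return found


tree = Node(children=[
    Node(support=95, children=[Node("P1"), Node("P2")]),
    Node(support=40, children=[
        Node("A1"),
        Node(support=80, children=[Node("A2"), Node("A3")]),
    ]),
])
strict = collapse_weak_branches(tree, min_support=70)
print(sorted(sorted(c) for c in clades(strict)))
```

With a threshold of 70, the clade {A1, A2, A3} (support 40) disappears, while the well-supported clades {P1, P2} and {A2, A3} remain testable.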
ILLUSTRATIVE ANALYSES
Organism profiles comparison
Unlike other apicomplexans, including P. falciparum, Cryptosporidium parvum is not capable of de novo pyrimidine synthesis, owing to the absence of a six-enzyme pathway. To compensate, C. parvum has three salvage enzymes (27), one of which is bifunctional. Figure 1 illustrates how the metaTIGER pathway comparison facility can be used to identify such differences. If more than two organisms are to be compared, the list comparison facility can be used (see Table 1). List comparisons allow rapid identification of organisms that contain all, part or none of a pathway of interest. In eukaryotes the shikimate pathway is present to varying degrees of completeness (28), which makes it a good test case for comparative genomics using metaTIGER. The shikimate pathway is known to be completely present in fungi, plants and red algae, heterokonts and the alveolates Toxoplasma gondii (29,30) and Tetrahymena thermophila (28), but absent in metazoans and Cryptosporidium (31), while P. falciparum is thought to have only the last three enzymes (32,33). Table 1 shows that metaTIGER's results agree with what is already known about the distribution of the shikimate pathway in eukaryotes. These two examples of metaTIGER's pathway comparison facilities illustrate the quality of results that can be obtained.
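The all/part/none classification behind a list comparison can be sketched with set operations. In this hedged illustration, each organism's metabolic profile is assumed to be a set of predicted E.C. numbers; the E.C. numbers are those of the seven shikimate-pathway enzymes, but the organism names and profiles are invented and do not reproduce Table 1.

```python
# E.C. numbers of the seven shikimate-pathway enzymes (DAHP to chorismate).
SHIKIMATE_PATHWAY = {
    "2.5.1.54",  # 3-deoxy-7-phosphoheptulonate synthase (DAHP synthase)
    "4.2.3.4",   # 3-dehydroquinate synthase
    "4.2.1.10",  # 3-dehydroquinate dehydratase
    "1.1.1.25",  # shikimate dehydrogenase
    "2.7.1.71",  # shikimate kinase
    "2.5.1.19",  # 3-phosphoshikimate 1-carboxyvinyltransferase (EPSP synthase)
    "4.2.3.5",   # chorismate synthase
}


def pathway_coverage(pathway_ecs, profiles):
    """Classify each organism as having all, part or none of a pathway."""
    result = {}
    for organism, ecs in sorted(profiles.items()):
        present = len(pathway_ecs & ecs)
        if present == len(pathway_ecs):
            status = "all"
        elif present == 0:
            status = "none"
        else:
            status = "part"
        result[organism] = (status, present)
    return result


# Invented organism profiles, not metaTIGER predictions.
profiles = {
    "org_complete": set(SHIKIMATE_PATHWAY) | {"1.1.1.1"},
    "org_last_three": {"2.7.1.71", "2.5.1.19", "4.2.3.5"},
    "org_absent": {"1.1.1.1"},
}
for organism, (status, n) in pathway_coverage(SHIKIMATE_PATHWAY, profiles).items():
    print(f"{organism}: {status} ({n}/{len(SHIKIMATE_PATHWAY)})")
```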
Horizontal gene transfer analysis
In unicellular eukaryotes there are many examples of metabolic genes that have been acquired via HGT (30,34). For example, an ancestor of Plasmodium is thought to have gained a specialist organelle, the apicoplast, by secondary endosymbiosis, involving the gain of a red algal plastid that was originally acquired from a cyanobacterium. This secondary endosymbiosis would have brought with it the opportunity for endosymbiotic gene transfer (EGT), a type of HGT, of genes from the red alga to the ancestor of Plasmodium. It has also recently been suggested that the cyanobacterium's transition from a free-living organism to the endosymbiont that became the plant/algal chloroplast may have been aided by a chlamydial endosymbiont or parasite (35). Thus, genes of plant origin in Plasmodium may originate from cyanobacteria or chlamydia. To assess how many metabolic enzymes acquired via EGT are still present in Plasmodium, we carried out a high-throughput assessment of Plasmodium genes of putative plant, cyanobacterial and chlamydial origin. Many variations of select statements can be written when creating queries to identify trees of interest. After optimization, the following select statement was found to be the most accurate at identifying genes acquired from plants.
Tree select statement for plant to Plasmodium EGT tree selection
(Plasmodium{>1}) & ((Viridiplantae | Rhodophyta | Glaucocystophyceae){>1}) & !(Fungi | Fungi/Metazoa group | Pelobiontida | Malawimonadidae | Mycetozoa | Entamoebidae | Acanthamoebidae | Lobosea | Archaea) & ((Bacteria){<10})
This statement identifies clades that contain more than one Plasmodium sequence (to avoid false identification of HGT caused by bacterial contamination of a single genome) and more than one plant sequence. Sequences from eukaryotes that have never contained a plastid were not allowed, because their presence in the clade would suggest that the gene was a general eukaryotic gene rather than one gained by EGT. Selection statements that allowed one sequence from eukaryotes that have never contained a plastid were also tried: these identified 12 trees, five of which were new, although manual inspection indicated that only one additional enzyme was likely to have been acquired by Plasmodium via EGT from plants (this extra tree was included in the analysis). Up to nine bacterial sequences were allowed, because bacteria exhibit high rates of HGT and a large number of bacterial genomes (375 out of 525) are in metaTIGER; when trees were rejected on the basis of clades containing bacteria, the number of trees found was reduced from seven to three. A bootstrap threshold of 70 was used, as this has been shown to correspond to a 95% probability that the clade is real (36). Similar selection statements (shown in SI 3) were used to select trees relating to cyanobacterial and chlamydial EGT and found two and three genes, respectively. The selection statements shown in SI 3 were the most accurate of those tested, although manual inspection of other results yielded one additional enzyme of each type. This gives a total of 11 genes: eight of plant, three of cyanobacterial and four of chlamydial origin (NB. enzymes can fall into more than one group). Full details are given in Table 2, and an example is shown in Figure 2. These results were compared with those of Huang et al. (37), who had previously carried out HGT analysis in P. falciparum. We found that four of our 10 plant and cyanobacterial predictions were shared with theirs.
It is not surprising that not all predictions are shared, as only 28% of the P. falciparum genes that Huang et al. analysed reached the stage of phylogenetic analysis. Because some enzymes gained by EGT may have been missed owing to the bootstrap cut-off, tree selection was also carried out with no bootstrap cut-off, to estimate an upper limit on the number of enzymes potentially gained by EGT. This found a total of 29 enzymes: 25 of plant, six of cyanobacterial and eight of chlamydial origin. These results show evidence of genes of all three origins and of the transfer of genes from cyanobacteria and chlamydia into Plasmodium via the plants. Earlier studies have not included chlamydia in similar analyses or covered more than one Plasmodium species, but Gardner and co-workers (38) found some 30 P. falciparum genes (not restricted to metabolic enzymes) of probable plastid origin, which is consistent with our results and with an over-representation of metabolic enzymes among EGT genes.
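The logic of the plant-to-Plasmodium select statement can be made explicit by restating its four conditions as a predicate over a single clade. This sketch is not PHAT and does not reproduce its query language, re-rooting or bootstrap handling; it assumes each leaf of a clade is described by its taxonomic lineage (a list of group names, invented here), and it reads `{>1}` and `{<10}` as "more than one" and "fewer than ten" matching sequences.

```python
PLASMODIUM = {"Plasmodium"}
PLANTS = {"Viridiplantae", "Rhodophyta", "Glaucocystophyceae"}
# Groups excluded by the statement (eukaryotes never known to have held a plastid, plus Archaea).
EXCLUDED = {
    "Fungi", "Fungi/Metazoa group", "Pelobiontida", "Malawimonadidae",
    "Mycetozoa", "Entamoebidae", "Acanthamoebidae", "Lobosea", "Archaea",
}
BACTERIA = {"Bacteria"}


def count_members(clade, groups):
    """Number of leaves whose lineage includes any of the given groups."""
    return sum(1 for lineage in clade if groups & set(lineage))


def matches_plant_to_plasmodium(clade):
    """True if a clade satisfies all four conditions of the select statement."""
    return (
        count_members(clade, PLASMODIUM) > 1      # Plasmodium{>1}
        and count_members(clade, PLANTS) > 1      # (Viridiplantae | ...){>1}
        and count_members(clade, EXCLUDED) == 0   # !(Fungi | ... | Archaea)
        and count_members(clade, BACTERIA) < 10   # (Bacteria){<10}
    )


plasmodium = ["Eukaryota", "Alveolata", "Apicomplexa", "Plasmodium"]
plant = ["Eukaryota", "Viridiplantae"]
cyanobacterium = ["Bacteria", "Cyanobacteria"]
fungus = ["Eukaryota", "Fungi/Metazoa group", "Fungi"]

clade = [plasmodium, plasmodium, plant, plant, cyanobacterium]
print(matches_plant_to_plasmodium(clade))  # True
print(matches_plant_to_plasmodium(clade + [fungus]))  # False
```

Relaxing a condition, for example allowing one sequence from a never-plastid group, would only require changing `== 0` to `<= 1`, which mirrors the variant statements the text describes trying.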
The enzyme of Figure 2 (EC 1.17.1.2) illustrates the difficulties of this type of analysis. This enzyme is part of the non-mevalonate isoprenoid biosynthesis pathway, which is known to be localized in the apicoplast of Plasmodium (39), so the pathway would be expected, with some confidence, to be of plastid evolutionary origin. On this tree, 13 apicomplexans form a sister clade to 11 Chlamydiales with 98% bootstrap support, whereas 12 members of the Plantae, two diatoms and two protozoan algae are located within another clade together with 18 cyanobacteria, with 70% bootstrap support. This suggests an EGT origin of this gene in plants, with the possibility of a later orthologous replacement of the gene in the Apicomplexa by HGT from Chlamydiales, although this effect may instead be caused by low taxon sampling in the Plantae. It is equally revealing that only two of the genes in this pathway were identified as of plastid origin in our analysis. Inspection of the trees for other genes in this pathway indicates that four are potential EGTs that were excluded by our query because their bootstrap support fell below our stringent threshold. In light of this observation, our suggestion that up to 29 Plasmodium metabolic enzymes may have an EGT origin, based on a query with no bootstrap threshold, seems reasonable. Interestingly, the tree for the one remaining enzyme in this pathway (1-deoxy-D-xylulose-5-phosphate reductoisomerase, EC 1.1.1.267) indicates another possible orthologous replacement, this time from a bacterium of the order Rickettsiales (see SI 4 for the tree).
CONCLUDING REMARKS
We have presented a new resource for examining the predicted metabolic networks of a large number of organisms, including more than 100 eukaryotes. Because our automated enzyme annotation software depends only on nucleic acid sequences, and not on the existence of accurate gene models and predicted protein-coding sequences, the resource can cover new genome sequences before detailed annotation is available. This will be a significant advantage, given the expected increase in eukaryotic sequence production over the next few years. The sensitive method of conserved motif prediction detects highly divergent proteins, which experimental scientists can use to identify proteins missing from current annotations. A significant addition to the capabilities of existing resources is the extensive phylogenetic information available in the form of trees generated for enzyme-coding genes, together with the ability to visualize these trees with state-of-the-art methods and to query them to identify trees of interest and enzymes suitable for multi-gene phylogenetic tree construction. The resource also provides highly convenient facilities for comparing networks across multiple organisms. Currently, enzyme predictions are based on the most up-to-date set of high-quality enzyme-sequence profiles from PRIAM (20), but a future improvement will be to add profiles for enzymes that do not yet have full E.C. numbers, or whose profiles are absent from PRIAM for other reasons.
FUNDING
The BBSRC; BBSRC Research Development Fellowship (BB/C52101X/1 to D.R.W.). Funding for open access charge: University of Leeds.
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
Comparative genomics on this scale would not be possible without the large quantities of genomic data that are currently publicly available, and for this reason the authors would like to thank all the sequencing centres and sequence repositories from which data used in this study were obtained. In particular, we thank the US Department of Energy (DOE) Joint Genome Institute (http://www.jgi.doe.gov/) for many genome sequences. Thanks to KEGG for their repository of reference metabolic pathways. Thanks to John W. Pinney for creating the SHARKhunt software, which made it possible to conduct such large-scale genomic analysis. Thanks to Tancred Frickey for advising us on the use of PHAT. Thanks to Simon Kenworthy for helping with the aesthetics of the metaTIGER site. Finally, we thank the editor and three anonymous reviewers whose inputs have led to significant improvements in this work.
REFERENCES
- Kanehisa M, Goto S, Hattori M, Aoki-Kinoshita KF, Itoh M, Kawashima S, Katayama T, Araki M, Hirakawa M. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Res. (2006) 34:D354–D357.
- Caspi R, Foerster H, Fulcher CA, Kaipa P, Krummenacker M, Latendresse M, Paley S, Rhee SY, Shearer AG, Tissier C, et al. The MetaCyc Database of metabolic pathways and enzymes and the BioCyc collection of Pathway/Genome Databases. Nucleic Acids Res. (2008) 36:D623–D631.
- Maltsev N, Glass E, Sulakhe D, Rodriguez A, Syed MH, Bompada T, Zhang Y, D'Souza M. PUMA2–grid-based high-throughput analysis of genomes and metabolic pathways. Nucleic Acids Res. (2006) 34:D369–D372.
- Joshi-Tope G, Gillespie M, Vastrik I, D'Eustachio P, Schmidt E, de Bono B, Jassal B, Gopinath GR, Wu GR, Matthews L, et al. Reactome: a knowledgebase of biological pathways. Nucleic Acids Res. (2005) 33:D428–D432.
- Koonin EV, Makarova KS, Aravind L. Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol. (2001) 55:709–742.
- Letunic I, Bork P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics (2007) 23:127–128.
- Frickey T, Lupas AN. PhyloGenie: automated phylome generation and analysis. Nucleic Acids Res. (2004) 32:5231–5238.
- Pinney JW, Shirley MW, McConkey GA, Westhead DR. metaSHARK: software for automated metabolic network prediction from DNA sequence and its application to the genomes of Plasmodium falciparum and Eimeria tenella. Nucleic Acids Res. (2005) 33:1399–1409.
- Aguero F, Zheng W, Weatherly DB, Mendes P, Kissinger JC. TcruziDB: an integrated, post-genomics community resource for Trypanosoma cruzi. Nucleic Acids Res. (2006) 34:D428–D431.
- Arnaud MB, Costanzo MC, Skrzypek MS, Shah P, Binkley G, Lane C, Miyasato SR, Sherlock G. Sequence resources at the Candida Genome Database. Nucleic Acids Res. (2007) 35:D452–D456.
- Bahl A, Brunk B, Crabtree J, Fraunholz MJ, Gajria B, Grant GR, Ginsburg H, Gupta D, Kissinger JC, Labo P, et al. PlasmoDB: the Plasmodium genome resource. A database integrating experimental and computational data. Nucleic Acids Res. (2003) 31:212–215.
- Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. (2008) 36:D25–D30.
- Chisholm RL, Gaudet P, Just EM, Pilcher KE, Fey P, Merchant SN, Kibbe WA. dictyBase, the model organism database for Dictyostelium discoideum. Nucleic Acids Res. (2006) 34:D423–D427.
- Gajria B, Bahl A, Brestelli J, Dommer J, Fischer S, Gao X, Heiges M, Iodice J, Kissinger JC, Mackey AJ, et al. ToxoDB: an integrated Toxoplasma gondii database resource. Nucleic Acids Res. (2008) 36:D553–D556.
- Heiges M, Wang H, Robinson E, Aurrecoechea C, Gao X, Kaluskar N, Rhodes P, Wang S, He C.-Z, Su Y, et al. CryptoDB: a Cryptosporidium bioinformatics resource update. Nucleic Acids Res. (2006) 34:D419–D422.
- Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. (2007) 35:D61–D65.
- Sherman D, Durrens P, Iragne F, Beyne E, Nikolski M, Souciet J.-L. Genolevures complete genomes provide data and tools for comparative genomics of hemiascomycetous yeasts. Nucleic Acids Res. (2006) 34:D432–D435.
- O'Brien EA, Koski LB, Zhang Y, Yang L, Wang E, Gray MW, Burger G, Lang BF. TBestDB: a taxonomically broad database of expressed sequence tags (ESTs). Nucleic Acids Res. (2007) 35:D445–D451.
- Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389–3402.
- Claudel-Renard C, Chevalet C, Faraut T, Kahn D. Enzyme-specific profiles for genome annotation: PRIAM. Nucleic Acids Res. (2003) 31:6633–6639.
- Talavera G, Castresana J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst. Biol. (2007) 56:564–577.
- Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. (2004) 32:1792–1797.
- Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. (2003) 52:696–704.
- Huerta-Cepas J, Dopazo H, Dopazo J, Gabaldon T. The human phylome. Genome Biol. (2007) 8:R109.
- Torres M, Vieira C, Gonçalves G, Junior Z. Brazilian Symposium on Bioinformatics 2007 (2007) Brazil. 115–127.
- Ciccarelli FD, Doerks T, von Mering C, Creevey CJ, Snel B, Bork P. Toward automatic reconstruction of a highly resolved tree of life. Science (2006) 311:1283–1287.
- Striepen B, Pruijssers AJ, Huang J, Li C, Gubbels MJ, Umejiego NN, Hedstrom L, Kissinger JC. Gene transfer in the evolution of parasite nucleotide biosynthesis. Proc. Natl Acad. Sci. USA (2004) 101:3154–3159.
- Richards TA, Dacks JB, Campbell SA, Blanchard JL, Foster PG, McLeod R, Roberts CW. Evolutionary origins of the eukaryotic shikimate pathway: gene fusions, horizontal gene transfer, and endosymbiotic replacements. Eukaryot. Cell (2006) 5:1517–1531.
- Campbell SA, Richards TA, Mui EJ, Samuel BU, Coggins JR, McLeod R, Roberts CW. A complete shikimate pathway in Toxoplasma gondii: an ancient eukaryotic innovation. Int. J. Parasitol. (2004) 34:5–13.
- Nosenko T, Bhattacharya D. Horizontal gene transfer in chromalveolates. BMC Evol. Biol. (2007) 7:173.
- Abrahamsen MS, Templeton TJ, Enomoto S, Abrahante JE, Zhu G, Lancto CA, Deng M, Liu C, Widmer G, Tzipori S, et al. Complete genome sequence of the apicomplexan, Cryptosporidium parvum. Science (2004) 304:441–445.
- McConkey GA, Pinney JW, Westhead DR, Plueckhahn K, Fitzpatrick TB, Macheroux P, Kappes B. Annotating the Plasmodium genome and the enigma of the shikimate pathway. Trends Parasitol. (2004) 20:60–65.
- McRobert L, Jiang S, Stead A, McConkey GA. Plasmodium falciparum: Interaction of shikimate analogues with antimalarial drugs. Exp. Parasitol. (2005) 111:178.
- Huang J, Mullapudi N, Lancto CA, Scott M, Abrahamsen MS, Kissinger JC. Phylogenomic evidence supports past endosymbiosis, intracellular and horizontal gene transfer in Cryptosporidium parvum. Genome Biol. (2004) 5:R88.
- Huang J, Gogarten JP. Did an ancient chlamydial endosymbiosis facilitate the establishment of primary plastids? Genome Biol. (2007) 8:R99.
- Hillis DM, Bull JJ. An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Syst. Biol. (1993) 42:182.
- Huang J, Mullapudi N, Sicheritz-Ponten T, Kissinger JC. A first glimpse into the pattern and scale of gene transfer in Apicomplexa. Int. J. Parasitol. (2004) 34:265–274.
- Gardner MJ, Hall N, Fung E, White O, Berriman M, Hyman RW, Carlton JM, Pain A, Nelson KE, Bowman S, et al. Genome sequence of the human malaria parasite Plasmodium falciparum. Nature (2002) 419:498.
- Roos DS, Crawford MJ, Donald RG, Fraunholz M, Harb OS, He CY, Kissinger JC, Shaw MK, Striepen B. Mining the Plasmodium genome database to define organellar function: what does the apicoplast do? Philos. Trans. R Soc. Lond. B Biol. Sci. (2002) 357:35–46.
- Reyes P, Rathod PK, Sanchez DJ, Mrema JE, Rieckmann KH, Heidrich HG. Enzymes of purine and pyrimidine metabolism from the human malaria parasite, Plasmodium falciparum. Mol. Biochem. Parasitol. (1982) 5:275–290.
- Walsh CJ, Sherman IW. Isolation, characterization and synthesis of DNA from a malaria parasite. J. Protozool. (1968) 15:503–508.