Nucleic Acids Research Advance Access originally published online on November 9, 2009
Nucleic Acids Research 2010 38(Database issue):D190-D195; doi:10.1093/nar/gkp951
This Article
Right arrow Abstract Freely available
Right arrow Print PDF (1607K) Freely available
Right arrow Screen PDF (904K) Freely available
Right arrowOA All Versions of this Article:
38/suppl_1/D190    most recent
gkp951v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Citing Articles
Right arrowScopus Links
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Muller, J.
Right arrow Articles by Bork, P.
PubMed
Right arrow PubMed Citation
Right arrow Articles by Muller, J.
Right arrow Articles by Bork, P.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Nucleic Acids Research, 2010, Vol. 38, Database issue D190-D195
© The Author(s) 2009. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article appears in the following Nucleic Acids Research issue: Database issue

Articles

eggNOG v2.0: extending the evolutionary genealogy of genes with enhanced non-supervised orthologous groups, species and functional annotations

J. Muller1, D. Szklarczyk1,2, P. Julien3, I. Letunic1, A. Roth4, M. Kuhn1, S. Powell1, C. von Mering4, T. Doerks1, L. J. Jensen2 and P. Bork1,5,*

1European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, 2Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200 Copenhagen N, Denmark, 3The Center for Integrative Genomics, University of Lausanne, Lausanne, 4University of Zurich and Swiss Institute of Bioinformatics, Winterthurerstrasse 190, 8057 Zurich, Switzerland and 5Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany

*To whom correspondence should be addressed. Tel: +49 6221 3878526; Fax: +49 6221 387519; Email: bork{at}embl.de

Received September 15, 2009. Revised October 9, 2009. Accepted October 12, 2009.


    ABSTRACT
 
The identification of orthologous relationships forms the basis for most comparative genomics studies. Here, we present the second version of the eggNOG database, which contains orthologous groups (OGs) constructed through identification of reciprocal best BLAST matches and triangular linkage clustering. We applied this procedure to 630 complete genomes (529 bacteria, 46 archaea and 55 eukaryotes), a 2-fold increase relative to the previous version. The pipeline yielded 224 847 OGs, including 9723 extended versions of the original COGs and KOGs. We computed OGs for different levels of the tree of life; in addition to the species groups included in our first release (i.e. fungi, metazoa, insects, vertebrates and mammals), we have now constructed OGs for archaea, fishes, rodents and primates. We automatically annotate the non-supervised orthologous groups (NOGs) with functional descriptions, protein domains and functional categories as originally defined for the COG/KOG database. In-depth analysis is facilitated by precomputed high-quality multiple sequence alignments and maximum-likelihood trees for each of the available OGs. Altogether, eggNOG covers 2 242 035 proteins (out of the 2 590 259 proteins in the data set) and provides a broad functional description for at least 1 966 709 (88%) of them. Users can access the complete set of orthologous groups via a web interface at: http://eggnog.embl.de.


    INTRODUCTION
 
Next-generation sequencing technologies are now generating vast amounts of sequence data. This leads to a dramatic increase in the number of predicted protein sequences, which serve as a starting point for structural, functional and phylogenomic studies. Such studies often require high-throughput comparative analyses to transfer information between organisms, for which the concept of orthology is crucial. The original definition by Fitch (1) describes orthologs as genes that diverged through a speciation event, as opposed to paralogs, which diverged after a duplication event. This definition has been extended and refined by introducing the concepts of orthologous groups (OGs) (2), in-paralogs and out-paralogs (3,4). In practice, however, the identification and classification of homologous genes remain very difficult and rely on operational definitions. Considerable effort is being put into the development of approaches for establishing orthologous relationships between genes from different genomes. These include simple graph-based methods, such as the reciprocal-best-hit approach (5), the identification of best-hit triangles (2,6–8) and clustering-based approaches (9–11), as well as tree-based methods (12–16).

In addition to the quality of the grouping of genes, the practical usability of OGs is determined by the ability to provide a robust functional annotation. Thus, newer projects not only aggregate orthology information from various sources to allow comparison between methods but also aim to provide annotation tools (17,18). Nevertheless, evolutionary genealogy of genes: non-supervised OGs (eggNOG) (19) and the COG/KOG/arCOG resources (2,6,7) are still the only databases that provide explicit functional annotations for OGs at different hierarchical levels. The COG/KOG resource is based on robust manual expert annotation, which eggNOG uses and automatically extends (19).

Here, we describe the new features of the second version of eggNOG, a resource that provides OGs from the three domains of life at several levels of resolution. eggNOG v2 contains twice as many species and proteins as the previous version, additional hierarchical levels allowing higher resolution for a number of taxonomic groups, new annotation sources and an extended interface for an in-depth analysis of orthologous relationships.


    CONSTRUCTION OF HIERARCHICAL OGs
 
The automated procedure described previously (19) has been used to assemble proteins into OGs from 630 complete genomes (529 bacteria, 46 archaea and 55 eukaryotes). Complete proteomes were downloaded from the RefSeq (20), Ensembl (21), GiardiaDB (22) or TAIR (23) databases. This particular data set also forms the basis for STRING v8 (24) and STITCH v2 (25), allowing for easy integration across these databases.

Altogether, the protein data set covers 2 590 259 proteins of which 2 242 035 (87%) were included in at least one of 224 847 OGs generated by eggNOG. The growing number of species and proteins included in this release drastically increased the computational time. All-against-all similarity searches have therefore been performed using Basic Local Alignment Search Tool (BLAST) (26) instead of the Smith–Waterman algorithm (27).
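The graph-based grouping named in the abstract (reciprocal best BLAST matches followed by triangular linkage clustering) can be illustrated with a small sketch in the spirit of the COG-style procedure: reciprocal best hits are collected, triangles of reciprocal best hits spanning three different species are identified, and triangles that share an edge are merged. This is a simplified illustration under stated assumptions, not the eggNOG pipeline described in (19). The input layout (a dictionary mapping each (protein, target species) pair to its best-scoring hit, as might be parsed from all-against-all BLAST output), the function names and the toy data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def reciprocal_best_hits(best_hit, species_of):
    """Return unordered protein pairs that are each other's best hit.

    best_hit maps (protein, target_species) -> best-scoring protein in that
    species (hypothetical layout, e.g. parsed from BLAST output).
    """
    pairs = set()
    for (query, _target_species), target in best_hit.items():
        # Reciprocal if the target's best hit back in the query's species is the query.
        if best_hit.get((target, species_of[query])) == query:
            pairs.add(frozenset((query, target)))
    return pairs


def triangle_linkage(pairs, species_of):
    """Merge triangles of reciprocal best hits that share an edge into groups."""
    neighbours = defaultdict(set)
    for pair in pairs:
        a, b = tuple(pair)
        neighbours[a].add(b)
        neighbours[b].add(a)

    # A triangle: three mutually linked proteins from three different species.
    triangles = set()
    for pair in pairs:
        a, b = tuple(pair)
        for c in neighbours[a] & neighbours[b]:
            if len({species_of[a], species_of[b], species_of[c]}) == 3:
                triangles.add(frozenset((a, b, c)))
    triangles = sorted(triangles, key=sorted)

    # Union-find over triangles; triangles sharing an edge end up in one set.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    owner_of_edge = {}
    for idx, tri in enumerate(triangles):
        for edge in combinations(sorted(tri), 2):
            if edge in owner_of_edge:
                parent[find(idx)] = find(owner_of_edge[edge])
            else:
                owner_of_edge[edge] = idx

    groups = defaultdict(set)
    for idx, tri in enumerate(triangles):
        groups[find(idx)] |= tri
    return sorted(groups.values(), key=sorted)


# Toy example: a1/b1/c1 and a1/b1/d1 form two triangles sharing the edge a1-b1;
# a2/b2 is a reciprocal pair without a third species, so it forms no group.
species_of = {"a1": "A", "b1": "B", "c1": "C", "d1": "D", "a2": "A", "b2": "B"}
links = [("a1", "b1"), ("a1", "c1"), ("b1", "c1"), ("a1", "d1"), ("b1", "d1"), ("a2", "b2")]
best_hit = {}
for x, y in links:
    best_hit[(x, species_of[y])] = y
    best_hit[(y, species_of[x])] = x

pairs = reciprocal_best_hits(best_hit, species_of)
print(triangle_linkage(pairs, species_of))
```

Merging triangles only when they share an edge, rather than a single protein, keeps groups from chaining together through one promiscuous protein. The sketch ignores paralogs, fragments and multidomain proteins entirely.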

Compared to the 4873 COGs and the 4850 KOGs, which are constructed across all three domains of life and across all eukaryotes, respectively, this procedure assembles additional proteins into NOGs (440 359 proteins into 59 497 NOGs and 181 427 proteins into 17 845 euNOGs). These complement the published COGs and KOGs, which were built for 66 and seven species, respectively (6), and which eggNOG extends to all 630 species, encompassing 1 547 381 and 483 043 proteins, respectively.

To provide higher resolution of OGs for frequently used taxonomic groupings, we applied our procedure separately to several subsets of organisms. We updated the previously computed, more fine-grained NOGs at the level of fungi (fuNOGs), metazoans (meNOGs), insects (inNOGs), vertebrates (veNOGs) and mammals (maNOGs), and added groups for archaea (arNOGs), fishes (fiNOGs), rodents (roNOGs) and primates (prNOGs).

Extending the automated annotation of protein function
An important feature of eggNOG is the functional annotation of its OGs. Our original pipeline, which provides functional descriptions for the NOGs, is now complemented by an automatic inference of functional categories (FCs) taken from the COG database (2). The 25 FCs available from the COG resource have been widely used in comparative genomics studies and will enable higher-order analyses of OGs identified in any data set.

We use two complementary methods to infer FCs of OGs, based on the 4617 COGs (used for NOGs and arNOGs) and 4381 KOGs (used for all other OGs). The first method uses Support Vector Machines (SVMs) trained on the COGs and KOGs to classify NOGs into the 25 FCs based on feature vectors. Two feature vectors were created for each OG. The first was built from functional information mapped onto the eggNOG protein data set, including KEGG pathways and modules (28), GO terms (29), SMART domains (30), PFAM domains (31), UniProt keywords (32) and words from UniProt/RefSeq (20) description lines. The second feature vector additionally includes words from MEDLINE abstracts referring to a particular protein (24). Each attribute in a feature vector encodes the fraction of proteins in the group that have the feature in question.
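The fraction-based encoding of a feature vector can be sketched in a few lines. This is a minimal sketch assuming each protein's annotations have already been collected into a list of feature labels (KEGG, GO, SMART, PFAM, UniProt keywords or description words); the function name, the input layout and the feature labels in the example are illustrative assumptions, and the SVM training itself is not shown.

```python
from collections import Counter


def feature_vector(group, annotations, vocabulary):
    """Fraction of proteins in `group` that carry each feature in `vocabulary`.

    annotations maps protein -> iterable of feature labels (hypothetical layout).
    """
    if not group:
        raise ValueError("an orthologous group must contain at least one protein")
    counts = Counter()
    for protein in group:
        # set(): a protein annotated twice with the same feature counts once
        counts.update(set(annotations.get(protein, ())))
    return [counts[feature] / len(group) for feature in vocabulary]


# Illustrative group of four proteins and three candidate features.
group = ["p1", "p2", "p3", "p4"]
annotations = {
    "p1": ["pfam:Pkinase", "go:protein_kinase_activity"],
    "p2": ["pfam:Pkinase"],
    "p3": ["pfam:Pkinase", "go:protein_kinase_activity"],
    "p4": [],
}
vocabulary = ["pfam:Pkinase", "go:protein_kinase_activity", "kw:Transport"]
print(feature_vector(group, annotations, vocabulary))  # [0.75, 0.5, 0.0]
```

Each entry lies between 0 and 1, so vectors from groups of very different sizes are directly comparable. Such vectors could then be passed to any SVM implementation (for example scikit-learn's `sklearn.svm.SVC`) trained on COGs/KOGs with known FCs; which implementation and kernel were used for eggNOG is not stated here.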

The second method for assigning FCs makes use of the hierarchical structure of eggNOG, namely that the same proteins can be assigned to OGs at several levels in the tree of life (e.g. a KOG and a meNOG). If an FC could not be assigned to a NOG by the SVM method, we check whether most of the proteins in the NOG belong to a common functionally annotated COG or KOG; if so, we transfer the FCs from the coarse-grained level (COGs or KOGs) to the more fine-grained one (e.g. arNOGs or meNOGs). Whether an FC is assigned to a given NOG is decided on the basis of a coverage value: the number of proteins in the NOG that carry that FC (via the proteins shared with the reference level), relative to the total number of proteins in that NOG.
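The hierarchical transfer can be sketched as follows, assuming a mapping from each protein to its COG/KOG and from each COG/KOG to its FC letters. The majority test, the `min_coverage` threshold (0.5 here), the function name and the identifiers `KOG_X`/`KOG_Y` are hypothetical choices for illustration; the text does not give the cut-off used in eggNOG.

```python
from collections import Counter


def transfer_fcs(nog_proteins, ref_og_of, ref_fcs, min_coverage=0.5):
    """Transfer FCs from a coarse-grained reference level (COG/KOG) to a NOG.

    ref_og_of: protein -> reference OG; ref_fcs: reference OG -> FC letters.
    Returns {fc: coverage} for FCs whose coverage reaches min_coverage.
    """
    n = len(nog_proteins)
    # Reference OG of every NOG member that lies in a functionally annotated COG/KOG.
    annotated = [ref_og_of[p] for p in nog_proteins if ref_og_of.get(p) in ref_fcs]
    if not annotated:
        return {}
    _, top_count = Counter(annotated).most_common(1)[0]
    if top_count <= n / 2:
        return {}  # most proteins do not share one annotated reference OG

    # Coverage of an FC = NOG proteins reaching it via the reference level / all NOG proteins.
    fc_counts = Counter()
    for og in annotated:
        fc_counts.update(set(ref_fcs[og]))
    return {fc: c / n for fc, c in fc_counts.items() if c / n >= min_coverage}


# Hypothetical NOG of four proteins; three fall into KOG_X, one into KOG_Y.
# Single-letter COG category codes are used here only as labels.
nog = ["p1", "p2", "p3", "p4"]
ref_og_of = {"p1": "KOG_X", "p2": "KOG_X", "p3": "KOG_X", "p4": "KOG_Y"}
ref_fcs = {"KOG_X": ["O"], "KOG_Y": ["O", "T"]}
print(transfer_fcs(nog, ref_og_of, ref_fcs))  # {'O': 1.0}
```

In this example "O" reaches all four proteins (coverage 1.0) and is transferred, while "T" reaches only one protein (coverage 0.25) and falls below the illustrative threshold.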


    ANNOTATION RESULTS
 
In addition to providing functional annotations via description lines for many NOGs (19), we are now able to predict functional categories as well. At the universal level, our function annotation pipeline provides description lines for 14 956 (25%) and an FC for 6262 (11%) of the 59 497 coarse-grained NOGs. At the eukaryotic level, 7566 (42%) of the 17 845 euNOGs have a description line and 4120 (23%) have an FC. In addition, eggNOG contains 137 782 more fine-grained OGs, of which 100 750 (73%) and 89 232 (65%) have been annotated with a description line and an FC, respectively (Table 1).
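Each percentage above is the number of annotated OGs divided by the number of OGs at that level. A quick check for the universal and fine-grained levels, using the counts quoted in the text (the helper name `percent` is illustrative):

```python
def percent(annotated, total):
    """Share of OGs carrying an annotation, rounded to a whole percentage."""
    return round(100 * annotated / total)


# (annotated, total) counts quoted in the text
counts = {
    "universal NOGs, description line": (14956, 59497),
    "universal NOGs, functional category": (6262, 59497),
    "fine-grained OGs, description line": (100750, 137782),
    "fine-grained OGs, functional category": (89232, 137782),
}
for label, (annotated, total) in counts.items():
    print(f"{label}: {percent(annotated, total)}%")
```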



 
Table 1. Annotation statistics at different taxonomic levels

 
Altogether, this enables us to assign 2 242 035 of the 2 590 259 genes (87% of the genes in the analyzed genomes) to an OG and to provide at least a broad functional description or FC for 1 966 709 of them (88% of the genes that could be assigned to an OG). The corresponding numbers for each set of OGs, as well as for each individual genome, are summarized in Figure 1.


Figure 1
 
Figure 1. Statistics on the content of the eggNOG database. The eggNOG assignments for 630 complete genomes were mapped onto the tree of life. The stacked bar charts outside the tree show the proportion of genes from each genome that can be assigned to a functionally annotated orthologous group (green), an unannotated orthologous group (orange) or no orthologous group (gray). The length of each bar is proportional to the logarithm of the number of genes in the respective genome. The pie charts inside the tree show the fractions of orthologous groups at each level in the hierarchy that could be annotated with a functional category (green for NOGs, light green for extended COGs and KOGs) or not (orange for NOGs, light orange for extended COGs and KOGs). An interactive version is available in the ‘Overview’ section at: http://eggnog.embl.de. This figure was made using iTOL.

 
Extended features in eggNOG v2.0
To facilitate the in-depth analysis of the orthologous relationships within the groups of proteins, we now provide precomputed high-quality Multiple Sequence Alignments (MSAs) and maximum-likelihood trees via the web interface (Figure 2).


Figure 2
 
Figure 2. Screenshot of the detailed results page. The eggNOG database was queried for the term ‘mTERF’, the mitochondrial precursor of the transcription termination factor 1. The navigation tree at the top of the page allows the user to change the view to more coarse-grained orthologous groups, for example, the mammalian orthologous groups. The tab menu, shown here, enables several in-depth interactions with the new data (i.e. MSA or phylogenetic trees, here displayed with SMART domains).

 
Numerous methods are available to build MSAs [e.g. ClustalW (33), Muscle (34), MAFFT (35) and PRANK (36)] but some programs appear to be more suitable for particular protein families than others (37). Thus, we applied a new approach, named Automated QUality improvement for multiple sequence Alignments (AQUA) (Muller et al., submitted for publication), which combines existing tools to deliver high-quality MSAs.
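The text does not detail how AQUA combines its component tools. One common way to combine several aligners is to score each candidate alignment and keep the best one; the sketch below illustrates only that general idea, using a simple sum-of-pairs score as a stand-in. This scoring scheme is not AQUA's criterion, and the candidate names, sequences and score parameters are hypothetical.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=0, gap=-1):
    """Simple sum-of-pairs score of an MSA given as equal-length strings."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    score = 0
    for col in range(length):
        for a, b in combinations((row[col] for row in alignment), 2):
            if a == "-" and b == "-":
                continue  # gap-gap pairs are ignored
            if a == "-" or b == "-":
                score += gap
            elif a == b:
                score += match
            else:
                score += mismatch
    return score


def pick_best(candidates):
    """Return the name of the highest-scoring candidate alignment."""
    return max(candidates, key=lambda name: sum_of_pairs(candidates[name]))


# Two hypothetical alignments of the same three sequences from different aligners.
candidates = {
    "aligner_1": ["MKV-L", "MKVAL", "MR--L"],
    "aligner_2": ["MKVL-", "MKVAL", "MR-L-"],
}
print(pick_best(candidates))  # aligner_1
```

A plain sum-of-pairs score is easy to compute but is known to favour some alignment artefacts; an actual quality-control pipeline would typically use a more robust objective function.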

The construction of the different phylogenetic trees was carried out using the following steps. One hundred bootstrap replicates were created from the MSA using the SEQBOOT program from the Phylip package (38). Following this, PhyML (39) was used to find the maximum-likelihood tree for each of the 100 bootstrap replicates and for the original alignment using default parameters. Finally, a consensus tree was constructed, using the CONSENSE program from the Phylip package. We used ReadSeq (40) to convert between the different sequence file formats used by those programs.
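The resampling and consensus steps can be illustrated in a few lines of Python. This is a sketch of the underlying ideas only, not a wrapper around SEQBOOT, PhyML or CONSENSE: alignment columns are drawn with replacement to build each replicate, and a plain majority-rule consensus keeps the splits (bipartitions of the taxa) found in more than half of the replicate trees (PHYLIP's CONSENSE offers this among several consensus variants). Trees are represented directly as lists of splits, a hypothetical input that would in practice be derived from the PhyML output; all names and the toy data are illustrative.

```python
import random
from collections import Counter


def bootstrap_replicates(alignment, n_replicates=100, seed=1):
    """Resample alignment columns with replacement (the idea behind SEQBOOT).

    alignment: dict name -> aligned sequence (all of equal length).
    """
    length = len(next(iter(alignment.values())))
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_replicates):
        columns = [rng.randrange(length) for _ in range(length)]
        replicates.append({name: "".join(seq[i] for i in columns)
                           for name, seq in alignment.items()})
    return replicates


def split(side, taxa):
    """Orientation-free representation of a bipartition of `taxa`."""
    side = frozenset(side)
    return frozenset((side, frozenset(taxa) - side))


def majority_rule(trees, threshold=0.5):
    """Splits present in more than `threshold` of the trees, with their frequencies."""
    counts = Counter(s for tree in trees for s in set(tree))
    n = len(trees)
    return {s: c / n for s, c in counts.items() if c / n > threshold}


msa = {"A": "MKVLA", "B": "MKVLS", "C": "MRVIS", "D": "LRIIS", "E": "LRIIT"}
reps = bootstrap_replicates(msa, n_replicates=100)

# Hypothetical replicate trees, each given as its set of non-trivial splits.
taxa = set(msa)
trees = [
    [split({"A", "B"}, taxa), split({"D", "E"}, taxa)],
    [split({"A", "B"}, taxa), split({"C", "D"}, taxa)],
    [split({"A", "B"}, taxa), split({"D", "E"}, taxa)],
]
consensus = majority_rule(trees)  # AB|CDE in 3/3 trees, DE|ABC in 2/3
```

Splits that each occur in more than half of the trees must co-occur in at least one tree, so they are always mutually compatible and define a consensus tree; writing that tree out in Newick format is omitted here.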


    ACCESS OPTIONS
 
The eggNOG resource can be queried via a web interface; data can be downloaded under the Creative Commons Attribution 3.0 License at: http://eggnog.embl.de or via FTP at: ftp://eggnog.embl.de/eggNOG/2.0/. Gene and protein names, database identifiers, amino acid sequences or OG names can be used to query the database. By default, the most fine-grained OGs available are displayed for maximal resolution. Using a guide tree of organisms, the user can navigate among the different levels of orthology to find the desired balance between phylogenetic coverage and functional specificity within our hierarchy of OGs. Through the new interface, users can access different information panels, including the detailed list of proteins belonging to a particular OG as well as the corresponding MSA and phylogenetic tree. The MSA can be displayed interactively using the Jalview applet (41) or downloaded in FASTA format. The phylogenetic trees can be viewed in a dedicated iTOL (42) viewer together with mapped PFAM and SMART domains or in the ATV applet (43), or downloaded in Newick format.
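Alignments downloaded in FASTA format can be read with a few lines of standard-library Python. This is a generic sketch: it assumes an ordinary multi-line FASTA file, keeps only the first word of each header as the identifier, and uses placeholder identifiers rather than real eggNOG ones.

```python
def read_fasta(text):
    """Parse FASTA text into a dict {identifier: sequence}, preserving file order."""
    records = {}
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            header = line[1:].strip()
            name, chunks = (header.split()[0] if header else ""), []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


example = """>seq1 placeholder member 1
MKV-LA
GT
>seq2 placeholder member 2
MKVALA
G-
"""
msa = read_fasta(example)
# In an MSA every row must have the same length; report gap fraction per row.
for name, seq in msa.items():
    print(name, len(seq), seq.count("-") / len(seq))
```

For a downloaded file, `read_fasta(open(path).read())` would be used, with `path` pointing to the saved alignment. Biopython's `Bio.SeqIO.parse(handle, "fasta")` offers the same parsing with more validation if an external dependency is acceptable.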


    CONCLUSIONS/PERSPECTIVES
 
With 630 genomes covered, an expanded OG hierarchy and high coverage of functional annotations, including the newly assigned functional categories, the new version of eggNOG is one of the most comprehensive resources for deciphering the orthologous relationships between proteins from various species. The changes and improvements to the interface and the availability of the OGs for download will facilitate not only the daily use of the database but also the integration of eggNOG into high-throughput comparative genomics studies. Our future plans include the addition of more complete genomes and the development of a more scalable and flexible pipeline for generating the groups.


    FUNDING
 
EMBL, the European Commission Programme, Eurasnet EU [Grant LSHG-CT-2005-518238 (FP6), IMPACT 213037 (FP7)]; the Novo Nordisk Foundation Center for Protein Research, the Swiss Institute of Bioinformatics; and the University of Zurich (partial, through its Research Priority Program ‘Systems Biology and Functional Genomics’). This work was supported in part by the Bundesministerium fuer Bildung und Forschung (Nationales Genomforschungsnetz Foerderkennzeichen 01GS08169). Funding for open access charge: European Molecular Biology Laboratory.

Conflict of interest statement. None declared.


    Footnotes
 
Present address: M. Kuhn, Biotechnology Center, TU Dresden, 01062 Dresden, Germany.


    REFERENCES
 

  1. Fitch WM. Distinguishing homologous from analogous proteins. Syst. Zool. (1970) 19:99–113.

  2. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science (1997) 278:631–637.

  3. Koonin EV. Orthologs, paralogs, and evolutionary genomics. Annu. Rev. Genet. (2005) 39:309–338.

  4. Sonnhammer EL, Koonin EV. Orthology, paralogy and proposed classification for paralog subtypes. Trends Genet. (2002) 18:619–620.

  5. Berglund AC, Sjolund E, Ostlund G, Sonnhammer EL. InParanoid 6: eukaryotic ortholog clusters with inparalogs. Nucleic Acids Res. (2008) 36:D263–D266.

  6. Tatusov RL, Fedorova ND, Jackson JD, Jacobs AR, Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, et al. The COG database: an updated version includes eukaryotes. BMC Bioinformatics (2003) 4:41.

  7. Makarova KS, Sorokin AV, Novichkov PS, Wolf YI, Koonin EV. Clusters of orthologous genes for 41 archaeal genomes and implications for evolutionary genomics of archaea. Biol. Direct. (2007) 2:33.

  8. Kriventseva EV, Rahman N, Espinosa O, Zdobnov EM. OrthoDB: the hierarchical catalog of eukaryotic orthologs. Nucleic Acids Res. (2008) 36:D271–D275.

  9. Roth AC, Gonnet GH, Dessimoz C. Algorithm of OMA for large-scale orthology inference. BMC Bioinformatics (2008) 9:518.

  10. Li L, Stoeckert CJ Jr, Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. (2003) 13:2178–2189.

  11. Uchiyama I. MBGD: a platform for microbial comparative genomics based on the automated construction of orthologous groups. Nucleic Acids Res. (2007) D343–D346.

  12. van der Heijden RT, Snel B, van Noort V, Huynen MA. Orthology prediction at scalable resolution by phylogenetic tree analysis. BMC Bioinformatics (2007) 8:83.

  13. Wapinski I, Pfeffer A, Friedman N, Regev A. Automatic genome-wide reconstruction of phylogenetic gene trees. Bioinformatics (2007) 23:i549–i558.

  14. Huerta-Cepas J, Bueno A, Dopazo J, Gabaldon T. PhylomeDB: a database for genome-wide collections of gene phylogenies. Nucleic Acids Res. (2008) 36:D491–D496.

  15. Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E. EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. (2009) 19:327–335.

  16. Datta RS, Meacham C, Samad B, Neyer C, Sjolander K. Berkeley PHOG: PhyloFacts orthology group prediction web server. Nucleic Acids Res. (2009) W84–W89.

  17. Eyre TA, Wright MW, Lush MJ, Bruford EA. HCOP: a searchable database of human orthology predictions. Brief Bioinform. (2007) 8:2–5.

  18. Kuzniar A, Lin K, He Y, Nijveen H, Pongor S, Leunissen JA. ProGMap: an integrated annotation resource for protein orthology. Nucleic Acids Res. (2009) 37:W428–W434.

  19. Jensen LJ, Julien P, Kuhn M, von Mering C, Muller J, Doerks T, Bork P. eggNOG: automated construction and annotation of orthologous groups of genes. Nucleic Acids Res. (2008) 36:D250–D254.

  20. Pruitt KD, Tatusova T, Maglott DR. NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. (2007) 35:D61–D65.

  21. Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K, Bragin E, Brent S, Chen Y, Clapham P, Clarke L, et al. Ensembl 2009. Nucleic Acids Res. (2009) 37:D690–D697.

  22. Aurrecoechea C, Brestelli J, Brunk BP, Carlton JM, Dommer J, Fischer S, Gajria B, Gao X, Gingle A, Grant G, et al. GiardiaDB and TrichDB: integrated genomic resources for the eukaryotic protist pathogens Giardia lamblia and Trichomonas vaginalis. Nucleic Acids Res. (2009) 37:D526–D530.

  23. Swarbreck D, Wilks C, Lamesch P, Berardini TZ, Garcia-Hernandez M, Foerster H, Li D, Meyer T, Muller R, Ploetz L, et al. The Arabidopsis Information Resource (TAIR): gene structure and function annotation. Nucleic Acids Res. (2008) 36:D1009–D1014.

  24. Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, et al. STRING 8–a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. (2009) 37:D412–D416.

  25. Kuhn M, von Mering C, Campillos M, Jensen LJ, Bork P. STITCH: interaction networks of chemicals and proteins. Nucleic Acids Res. (2008) 36:D684–D688.

  26. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389–3402.

  27. Saebo PE, Andersen SM, Myrseth J, Laerdahl JK, Rognes T. PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology. Nucleic Acids Res. (2005) 33:W535–W539.

  28. Kanehisa M, Araki M, Goto S, Hattori M, Hirakawa M, Itoh M, Katayama T, Kawashima S, Okuda S, Tokimatsu T, et al. KEGG for linking genomes to life and the environment. Nucleic Acids Res. (2008) 36:D480–D484.

  29. Harris MA, Clark J, Ireland A, Lomax J, Ashburner M, Foulger R, Eilbeck K, Lewis S, Marshall B, Mungall C, et al. The Gene Ontology (GO) database and informatics resource. Nucleic Acids Res. (2004) 32:D258–D261.

  30. Letunic I, Doerks T, Bork P. SMART 6: recent updates and new developments. Nucleic Acids Res. (2009) 37:D229–D232.

  31. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, et al. The Pfam protein families database. Nucleic Acids Res. (2008) 36:D281–D288.

  32. The Universal Protein Resource (UniProt). Nucleic Acids Res. (2009) 37:D169–D174.

  33. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.

  34. Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. (2004) 32:1792–1797.

  35. Katoh K, Toh H. Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform. (2008) 9:286–298.

  36. Loytynoja A, Goldman N. Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science (2008) 320:1632–1635.

  37. Thompson JD, Koehl P, Ripp R, Poch O. BAliBASE 3.0: latest developments of the multiple sequence alignment benchmark. Proteins (2005) 61:127–136.

  38. Felsenstein J. PHYLIP – Phylogeny Inference Package (Version 3.2). Cladistics (1989) 5:164–166.

  39. Guindon S, Gascuel O. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. (2003) 52:696–704.

  40. Gilbert D. Sequence file format conversion with command-line readseq. Curr. Protoc. Bioinformatics (2003) Appendix 1, Appendix 1E.

  41. Waterhouse AM, Procter JB, Martin DM, Clamp M, Barton GJ. Jalview Version 2–a multiple sequence alignment editor and analysis workbench. Bioinformatics (2009) 25:1189–1191.

  42. Letunic I, Bork P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics (2007) 23:127–128.

  43. Zmasek CM, Eddy SR. ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics (2001) 17:383–384.

