Nucleic Acids Research Advance Access originally published online on February 19, 2008
Nucleic Acids Research 2008 36(7):2230-2239; doi:10.1093/nar/gkn038
Nucleic Acids Research, 2008, Vol. 36, No. 7 2230-2239
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Computational Biology
Phylogenetic classification of short environmental DNA fragments
1Center for Biotechnology (CeBiTec), 2Bioinformatics Resource Facility (BRF), Bielefeld University, D-33594 Bielefeld, Germany, 3Department of Biology, San Diego State University, 4Center for Microbial Sciences, San Diego, CA, USA, 5Applied Neuroinformatics Group, Bielefeld University, D-33594 Bielefeld, Germany, 6Department of Computer Science, San Diego State University, San Diego, CA, 7Mathematics & Computer Science Division, Argonne National Laboratory, Argonne, IL, USA and 8AG Genominformatik, Faculty of Technology, Bielefeld University, D-33594 Bielefeld, Germany
*To whom correspondence should be addressed. Tel: +49 521 106 4823; Fax: +49 521 106 6419; Email: lutz.krause{at}cebitec.uni-bielefeld.de
Received November 30, 2007. Revised January 18, 2008. Accepted January 22, 2008.
Metagenomics is providing striking insights into the ecology of microbial communities. The recently developed massively parallel 454 pyrosequencing technique makes it possible to obtain metagenomic sequences rapidly, at low cost and without cloning bias. However, the phylogenetic analysis of the short reads it produces represents a significant computational challenge. Here, the phylogenetic algorithm CARMA for predicting the source organisms of environmental 454 reads is described. The algorithm searches for conserved Pfam domains and protein families in the unassembled reads of a sample. These gene fragments (environmental gene tags, EGTs) are classified into a higher-order taxonomy based on the reconstruction of a phylogenetic tree for each matching Pfam family. The method exhibits high accuracy for a wide range of taxonomic groups, and EGTs as short as 27 amino acids can be phylogenetically classified up to the rank of genus. The algorithm was applied in a comparative study of three aquatic microbial samples obtained by 454 pyrosequencing, which clearly revealed profound differences in the taxonomic composition of these samples.
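To make the idea of classifying an EGT "into a higher-order taxonomy" concrete, the following is a minimal sketch and not the CARMA implementation. It assumes that the reference sequences placed closest to the EGT in a family tree have already been identified and that their taxonomic lineages are known. The assignment rule used here (report the deepest rank on which all neighbours agree), the rank list `RANKS`, the function name `classify_egt` and the example lineages are all illustrative assumptions rather than details given in the abstract.

```python
# Illustrative sketch only: the rank list, function name and assignment rule
# are assumptions, not taken from the CARMA publication.
RANKS = ["superkingdom", "phylum", "class", "order", "family", "genus"]


def classify_egt(neighbor_lineages):
    """Return the taxonomy prefix shared by all neighbouring reference lineages.

    Each lineage is a list of taxon names ordered as in RANKS. The result is a
    list of (rank, taxon) pairs; it is empty if the neighbours already disagree
    at the first rank or if no neighbours are given.
    """
    if not neighbor_lineages:
        return []
    shared = []
    # zip(*...) walks the lineages rank by rank and stops at the shortest one.
    for rank, names in zip(RANKS, zip(*neighbor_lineages)):
        if len(set(names)) != 1:
            break  # neighbours disagree here, so stop at the previous rank
        shared.append((rank, names[0]))
    return shared


if __name__ == "__main__":
    # Hypothetical neighbours of one EGT in a Pfam family tree.
    neighbours = [
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
         "Enterobacterales", "Enterobacteriaceae", "Escherichia"],
        ["Bacteria", "Proteobacteria", "Gammaproteobacteria",
         "Enterobacterales", "Enterobacteriaceae", "Salmonella"],
    ]
    print(classify_egt(neighbours))
```

Under this rule an EGT whose neighbours all belong to one genus is assigned at genus level, whereas an EGT whose neighbours span several genera of one family is assigned only at family level, so the resolution of the assignment reflects how consistently the tree places the fragment.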