Nucleic Acids Research Advance Access originally published online on May 5, 2009
Nucleic Acids Research 2009 37(10):e76; doi:10.1093/nar/gkp285
Nucleic Acids Research, 2009, Vol. 37, No. 10 e76
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Methods Online
ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences
1Interdisciplinary Center for Biotechnology Research, 2Department of Electrical and Computer Engineering, University of Florida, Gainesville, FL 32610-3622 and 3Materials Technology Directorate, Air Force Technical Applications Center, 1030 S. Highway A1A, Patrick AFB, FL 32925-3002, USA
*To whom correspondence should be addressed. Tel: +352-273-8065; Fax: +352-273-8070; Email: sunyijun{at}biotech.ufl.edu
Received January 28, 2009. Revised April 14, 2009. Accepted April 15, 2009.
Recent metagenomics studies of environmental samples suggest that microbial communities are much more diverse than previously reported, and that deep sequencing will significantly increase the estimate of total species diversity. Massively parallel pyrosequencing technology enables ultra-deep sequencing of complex microbial populations rapidly and inexpensively. However, computational methods for analyzing large collections of 16S ribosomal sequences are limited. We propose a new algorithm, referred to as ESPRIT, which addresses several computational issues with prior methods. We developed two versions of ESPRIT, one for personal computers (PCs) and one for computer clusters (CCs). The PC version is intended for small- and medium-scale data sets and can process several tens of thousands of sequences within a few minutes, while the CC version is intended for large-scale problems and can analyze several hundreds of thousands of reads within one day. Large-scale experiments are presented that demonstrate the effectiveness of the proposed algorithm. The source code and user guide are freely available at http://www.biotech.ufl.edu/people/sun/esprit.html.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.