Nucleic Acids Research 2006 34(Web Server issue):W394-W399; doi:10.1093/nar/gkl244

© The Author 2006. Published by Oxford University Press. All rights reserved
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org


Article

NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes

T. Z. DeSantis, Jr1,4,*, P. Hugenholtz2, K. Keller5,4, E. L. Brodie1, N. Larsen3, Y. M. Piceno1, R. Phan1,4 and G. L. Andersen1,4,*

1 Lawrence Berkeley National Laboratory, Center for Environmental Biotechnology Berkeley, CA, USA 2 DOE Joint Genome Institute, Microbial Ecology Program Walnut Creek, CA, USA 3 Danish Genome Institute Aarhus, Denmark 4 Lawrence Berkeley National Laboratory, Virtual Institute for Microbial Stress and Survival Berkeley, CA, USA 5 University of California, Quantitative Biomedical Research Berkeley, CA, USA

*To whom correspondence should be addressed. Email: tdesantis{at}lbl.gov

Received February 14, 2006. Revised March 8, 2006. Accepted March 29, 2006.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 NAST ALGORITHM
 NAST WEB SERVER
 APPLICATIONS
 FUTURE DEVELOPMENT
 CONCLUSIONS
 REFERENCES
 
Microbiologists conducting surveys of bacterial and archaeal diversity often require comparative alignments of thousands of 16S rRNA genes collected from a sample. The computational resources and bioinformatics expertise required to construct such an alignment have inhibited high-throughput analysis. It was hypothesized that an online tool could be developed to efficiently align thousands of 16S rRNA genes via the NAST (Nearest Alignment Space Termination) algorithm for creating multiple sequence alignments (MSA). The tool was implemented with a web-interface at http://greengenes.lbl.gov/NAST. Each user-submitted sequence is compared with Greengenes' ‘Core Set’, comprising ~10 000 aligned non-chimeric sequences representative of the currently recognized diversity among bacteria and archaea. User sequences are oriented and paired with their closest match in the Core Set to serve as a template for inserting gap characters. Non-16S data (sequence from vector or surrounding genomic regions) are conveniently removed in the returned alignment. From the resulting MSA, distance matrices can be calculated for diversity estimates and organisms can be classified by taxonomy. The ability to align and categorize large sequence sets using a simple interface has enabled researchers with various experience levels to obtain bacterial and archaeal community profiles.


    INTRODUCTION
 
DNA sequence information from the 1.5 kb small subunit 16S ribosomal RNA (rRNA) gene has been used to successfully identify and phylogenetically classify microorganisms from environmental and medical samples (1–3). In more ambitious efforts, the relative abundance of bacterial groups has been estimated by sequencing hundreds to thousands of 16S rRNA genes derived from a sample (4–9). However, a bottleneck in data analysis is encountered in creating multiple sequence alignments (MSA). The MSA is a common means of communicating a proposed positional homology among many genes using a column-by-column format. It can be stored and presented in a variety of formats but in all cases it represents a two-dimensional matrix with each row describing a gene and each column holding the nucleotide found at a certain position along the gene. Alignments are useful when gaps have been appropriately added to mark an inference of an insertion or deletion event where one sequence has a base while another sequence lacks a base at the corresponding position. This process yields sequence strings occupying an equal number of columns, allowing the matrix to form a true rectangle. MSAs are desirable for annotation of conserved versus variable gene loci by observing heterogeneity along the columns, recruiting columns with sufficient data for inter-row (sequence) comparison, and calculating distance matrices for row clustering. ClustalW (10) is a commonly used progressive MSA method for inserting gaps into sequences to achieve perfect rectangles. Hundreds of diverse sequences can be aligned using this approach to establish a ‘profile’ alignment. Later, new sequences can be added to this profile without re-computing the optimal gap placements for the entire alignment.
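
The matrix view of an MSA described above can be illustrated with a short Python sketch. It is only an illustration: the function name `column_profiles`, the toy alignment and the choice of `-` and `.` as gap characters are assumptions made here, not part of NAST or Greengenes. For each column it reports the fraction of rows holding a base and the fraction of those bases that agree with the most common one, which is one simple way of observing heterogeneity along the columns.

```python
from collections import Counter

GAP_CHARS = "-."  # assumed gap/place-holder characters


def column_profiles(msa):
    """Return one (occupancy, conservation) pair per alignment column.

    occupancy    = fraction of rows that hold a base in the column
    conservation = fraction of those bases equal to the most common base
    """
    lengths = {len(row) for row in msa}
    if len(lengths) != 1:
        raise ValueError("all rows of an MSA must have the same length")
    profiles = []
    for column in zip(*msa):  # iterate over columns of the rectangle
        bases = [c.upper() for c in column if c not in GAP_CHARS]
        occupancy = len(bases) / len(column)
        if bases:
            top_count = Counter(bases).most_common(1)[0][1]
            conservation = top_count / len(bases)
        else:
            conservation = 0.0
        profiles.append((occupancy, conservation))
    return profiles


# Toy three-row alignment, six columns wide (made-up data).
toy_msa = [
    "ACG-TA",
    "ACGGTA",
    "AC--TT",
]
profiles = column_profiles(toy_msa)
```

Columns with low occupancy could then be excluded before inter-row comparison, and columns with low conservation flagged as variable loci.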

Frequently, when adding a candidate sequence to an MSA profile, one or more internal insertions will be discovered which cannot be accommodated in the profile. This event requires a researcher to make one of two choices: (i) allow the column count to grow whenever an insertion is required, which requires each sequence to gain more characters or (ii) allow a local misalignment within a sequence (row) so that the insertion does not disrupt the entire alignment format of the profile. Until now, the choice readily available has been the former as implemented by ClustalW. Certain objectives are left unsatisfied by this approach. In some instances, the apparent need to create new columns in the MSA owes to the presence of poor quality sequences. If allowed, the MSA could expand to a cumbersome collection of unsubstantiated insertion inferences (gaps). Ongoing comparative sequence analysis projects benefit from having a fixed column count in the MSA, enabling unchanging annotation of position-dependent features such as primer annealing locations, secondary structures and column masks. Furthermore, collaborative MSA construction becomes problematic when copies of a single profile diverge in column content as individual researchers add their own unique data. To enable fixed column counts, allow piecemeal MSA curation and support collaborations in comparative genomics, the local misalignment approach is now available and implemented via NAST (Nearest Alignment Space Termination).

We have established a web service for creating NAST MSAs from user data which is intended to facilitate comparison of 16S rRNA gene sequences from bacteria and archaea. This service has performed well in aligning thousands of user-supplied sequences into a single MSA while optionally intercalating genes from reference organisms. It was created to handle large datasets produced in exploratory microbial ecology, medical microbiology and metagenomics. One unique feature is that NAST can output the MSA in a standard, consistent format of 7682 characters per sequence so that similar loci are located at dependable positions from batch to batch (necessary for large, ongoing projects). An optional pre-processing of data based on chromatogram quality scores is allowed and post-processing options include distance matrix creation and taxonomic classification using five independent curators' nomenclature.

We have received considerable positive feedback from diverse users who have collectively submitted over 1600 jobs.


    NAST ALGORITHM
 
The first version of the alignment compression algorithm, NAST (11), was designed to produce uniform MSAs of 16S rRNA genes obtained from public repositories. The current version contains improvements and has been made available via a web-interface for alignment of pre-published collections.

A current set of >80 000 16S rDNA genes gleaned from GenBank is maintained in aligned format on the Greengenes server (12). From this collection, a smaller, high-quality reference group was sought. The size of this group is a consequence of balancing both the need to encompass the full set's diversity and the application's requirement of rapid searching. Similarity comparison for one sequence against 10^5 can occur in a reasonable time period for an online tool. After clustering the full set with a sliding scale of similarity, it was found that a 96% identity threshold produced a cluster count of the ideal magnitude. From each cluster, one sequence record was chosen by favoring long gene sequences with low nucleotide ambiguity from published microbial isolates according to the default weighting in the de-replication tool (http://greengenes.lbl.gov/Derep). The resulting ‘Core Set’ of 10 270 aligned records were non-chimeric (13) and >1250 nt in length. The terminal ends of incomplete sequences (with lengths between 1250 and 1500 nt) were imputed from known sequence data of near-neighbors. The projected termini of the sequences in the Core Set are only used as a reference for the NAST alignment tool and are not entered into the Greengenes database. The Core Set is considered the profile MSA and consists of the template sequences aligned into 7682 columns.

An unaligned sequence is termed the ‘candidate’ and is matched to templates by comparison of 7mers in common (Figure 1). A BLAST (14) alignment with parameter ‘q = –1’ is performed to pair bases from candidate to template. BioPerl (15) parsers are used for extracting detail from the BLAST report. To eliminate extra-16S rRNA sequence, the candidate sequence is trimmed to that which is bound by the beginning and end points of a single BLAST alignment span. Although this process will fractionate chimeras where the candidate is composed of sequence fragments from vastly dissimilar organisms, NAST should not be regarded as a substitution for dedicated chimera detection software. Finally, the trimmed candidate is reverse complemented whenever opposite strands from the subject and query are paired.


 
Figure 1 Locating a NAST alignment template for a user-supplied candidate sequence. Candidate sequence in green is matched to a near-neighbor aligned template in Greengenes' Core Set (grey) by tallying 7mers in common. The alignment ‘template’ is BLAST aligned to the candidate using parameter q = –1 (favors long match). The candidate is then trimmed of flanking sequence data such as tRNA, intergenic spacer regions, vector sequence, 23S rDNA and sequence outside of the high-scoring pair (HSP) boundaries. If the HSP pairs opposite strands, then the candidate is reverse complemented.
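
The 7mer tally used to pick a template can be sketched in Python as follows. This is a minimal illustration rather than the server's implementation: the function names, the toy templates and the decision to test both orientations of the candidate at the 7mer stage are assumptions made for this sketch, and the subsequent BLAST alignment and trimming are not reproduced.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA/RNA string; ambiguity codes are left unchanged."""
    complement = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")
    return seq.translate(complement)[::-1]


def kmers(seq, k=7):
    """Set of all k-long substrings of seq (empty if seq is shorter than k)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def strip_gaps(aligned):
    """Remove alignment place holders so 7mers span only real bases."""
    return aligned.replace("-", "").replace(".", "")


def best_template(candidate, templates, k=7):
    """Pick the template sharing the most k-mers with the candidate.

    templates maps an identifier to an aligned sequence. Returns
    (template_id, shared_kmer_count, candidate_in_matching_orientation).
    """
    forward = candidate.upper()
    reverse = reverse_complement(forward)
    best = None
    for template_id, aligned in templates.items():
        template_kmers = kmers(strip_gaps(aligned), k)
        for oriented in (forward, reverse):
            shared = len(kmers(oriented, k) & template_kmers)
            if best is None or shared > best[1]:
                best = (template_id, shared, oriented)
    return best


# Made-up templates; the candidate is submitted on the opposite strand.
toy_templates = {
    "t1": "ACGT--ACGGTTCA-GGATCC",
    "t2": "TTTTTTTTGGGGGGGGCCCC",
}
toy_candidate = reverse_complement("ACGTACGGTTCAGGATCC")
match = best_template(toy_candidate, toy_templates)
```

In this toy case the candidate only shares 7mers with `t1` after reverse complementing, so the returned sequence is the re-oriented one.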

 
As a result of the pairwise alignment performed by BLAST, new alignment gaps (hyphens) are introduced between the bases of the template whenever the candidate contains additional internal bases (insertions) compared with the template (Figure 2A, B). Any pairwise alignment algorithm must do this to compensate for nucleotides not shared by both sequences. This expansion, when intercalated with the original template spacing, results in candidates occupying more columns (characters) than the original template format (Figure 2C). Since a consistent column count may be an option chosen by the user, the candidate-template alignment is compressed back to 7682 characters with NAST. After insertion bases are identified (Figure 2C), a bi-directional search for the nearest alignment space (hyphen) relative to the insertion results in character deletion of the proximal place holders. Ultimately, local misalignments, spanning from the insertion base to the deleted alignment space, are permitted to preserve the global MSA format.


 
Figure 2 Example of NAST compression of a BLAST pairwise alignment using a 38 character aligned template. Template and candidate are extended to 40 characters after (A) BLAST gap insertion and (B) retention of original template spacing. (C) Nucleotide insertions in the candidate relative to the template which force additional characters to be added in the template are identified at positions α and β. (D) A bi-directional search for the nearest alignment space (hyphen) relative to the insertion terminates at the positions indicated by the black arrows. The leftward search from the α position was shorter in distance compared with the rightward, thus the space to the left of ‘GT’ was removed. (E) The search from the β position encountered the alignment edge on the right, thus the position to the left of ‘AC’ was removed. (F) Lastly, the two template-extending spaces are deleted from the template. Notice that sequence data are not added to or overwritten in the candidate. The NAST removal of two characters from both sequences allowed local misalignments (underlined) while preserving the 38 character format of the global MSA.
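
The compression step can be sketched in Python under a few stated assumptions. The sketch expects the expanded pairwise alignment as two equal-length strings in which every template-extending space is written as `*` (a hypothetical marker chosen here so it can be told apart from the original `-` spacing). Insertions are handled left to right, and when the nearest candidate hyphens on both sides are equally distant the left one is removed; the paper does not state a tie-breaking rule, so that choice is an assumption. The function names and the toy sequences are also illustrative only.

```python
INSERT = "*"  # hypothetical marker for a template space added by the pairwise aligner


def nearest_gap(candidate, start):
    """Index of the '-' in candidate closest to start, or None if there is none.

    The search widens in both directions; on a tie the left side wins
    (an assumption of this sketch).
    """
    for distance in range(len(candidate)):
        for index in (start - distance, start + distance):
            if 0 <= index < len(candidate) and candidate[index] == "-":
                return index
    return None


def nast_compress(template, candidate):
    """Shrink an expanded pairwise alignment back to the template's original width.

    For each template-extending space, the nearest alignment space in the
    candidate is deleted, then the extending space itself is deleted from
    the template. No candidate bases are added or overwritten.
    """
    if len(template) != len(candidate):
        raise ValueError("template and candidate must have equal length")
    tmpl, cand = list(template), list(candidate)
    while INSERT in tmpl:
        insert_at = tmpl.index(INSERT)
        gap_at = nearest_gap(cand, insert_at)
        if gap_at is None:
            raise ValueError("candidate has no alignment space left to remove")
        del cand[gap_at]
        del tmpl[insert_at]
    return "".join(tmpl), "".join(cand)


# Made-up example: a 10-column template expanded to 12 columns by two insertions.
expanded_template = "AC-GT*ACGT*A"
expanded_candidate = "A--GTCACGTTA"
compressed = nast_compress(expanded_template, expanded_candidate)
```

In the toy input the first insertion finds its nearest hyphen on the left, while the search from the second insertion runs off the right edge and falls back to the remaining hyphen on the left, mirroring the two cases in Figure 2. Both strings return to 10 characters and the candidate's bases are unchanged, at the cost of a local misalignment between each insertion and the removed space.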

 

    NAST WEB SERVER
 
Submitting Jobs
Instructions for using the NAST aligner as well as other Greengenes tools are available in the tutorial at http://greengenes.lbl.gov/Download/Tutorial/. In brief, batches of sequences are submitted in FASTA format. Users can select the minimum candidate-template similarity in terms of length and percent identity to constrain the NAST alignment from attempting to align records that may not be 16S rRNA gene sequences. Users may also opt to have near-neighbor sequences from the Core Set added to the resulting MSA. Formatting options allow either all columns to be returned (each returned sequence will be exactly 7682 characters long) or removal of common alignment gap characters (returned sequences will contain an equal number of characters ≤7682, all columns containing only place holders will be removed). Lastly, users choose the output format from a list that includes FASTA, ClustalW, MEGA (16), PHYLIP (17) and others.
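
The second formatting option, removal of columns that contain only place holders, can be sketched as follows. This is an illustration only; the function name, the toy rows and the use of `-` and `.` as place-holder characters are assumptions made here.

```python
def drop_gap_only_columns(msa, gap_chars="-."):
    """Remove every column in which all rows hold a place-holder character.

    All returned rows keep an equal length, which is at most the input length.
    """
    if len({len(row) for row in msa}) > 1:
        raise ValueError("all rows of an MSA must have the same length")
    kept = [col for col in zip(*msa) if any(c not in gap_chars for c in col)]
    if not kept:
        return ["" for _ in msa]
    return ["".join(row) for row in zip(*kept)]


# Made-up five-column alignment; columns 2 and 5 hold only hyphens.
toy_rows = ["A--C-", "A-GC-", "T--C-"]
trimmed_rows = drop_gap_only_columns(toy_rows)
```

Because shared empty columns are dropped, positions in the trimmed output no longer correspond to the fixed 7682-column numbering.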

Result Files
Two files are returned to the user by Email. The MSA is delivered as a compressed document and a summary table is sent as a tab-delimited text file (Table 1). The summary describes the fate of each sequence in the submitted batch. If a sequence diverges from the Core Set beyond the user's thresholds, then an informative error message is reported. Otherwise, the following information is returned: candidate's sequence length as submitted, a Greengenes sequence identifier for the template, the longest nucleotide insertion relative to template and the post-NAST nucleotide count. Comparing the submitted length with the post-NAST length can alert the user of unexpected sequence truncation. Minor truncation of one to five bases occurs when terminal bases cannot be accurately aligned. Large truncations indicate that either non-gene data were in the record or that BLAST found matches distributed to multiple Core Set sequences, possibly suggesting chimeric content. Long insertions are reported for identification of sequences divergent from the Core Set.


 
Table 1 Example summary of NAST output describing the fate of each sequence in the submitted batch. This report is delivered with the MSA
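
A user could screen the summary for truncation with a few lines of Python. The sketch below only encodes the rule of thumb given in the text (one to five lost bases is minor, more is large); the function name and the returned labels are hypothetical, and reading the two lengths out of the summary file is left to the user because the exact column layout is not reproduced here.

```python
def truncation_note(submitted_length, post_nast_length, minor_limit=5):
    """Label the difference between submitted and post-NAST nucleotide counts."""
    lost = submitted_length - post_nast_length
    if lost < 0:
        raise ValueError("post-NAST length cannot exceed the submitted length")
    if lost == 0:
        return "no truncation"
    if lost <= minor_limit:
        return "minor truncation"  # terminal bases that could not be aligned
    return "large truncation"      # possible non-gene data or chimeric content


# Made-up lengths for three sequences.
notes = [truncation_note(1480, 1480), truncation_note(1480, 1477), truncation_note(2100, 1450)]
```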

 
Performance
With the current Greengenes hardware configuration using Intel Xeon 2.4GHz processors, NAST is able to align ~10 16S rRNA gene sequences per minute. Since each sequence in a batch does not require comparisons with all other batched sequences, the method scales linearly. Pre-release users from five research institutions verified that the NAST web server reliably returned aligned sequences in a timely manner, but cautioned that returned alignment files can be large and may not be accepted by some Email servers. In all cases, breaking the batch into sub-sets overcame the constraint.
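
Splitting a large submission into sub-sets, and estimating turnaround from the quoted rate of ~10 sequences per minute, can be sketched as follows. The function names and the example records are assumptions for illustration; the estimate simply applies the linear scaling described above and ignores queueing or network time.

```python
def read_fasta_records(text):
    """Split FASTA text into a list of records, each a header line plus sequence lines."""
    records, current = [], []
    for line in text.splitlines():
        if line.startswith(">"):
            if current:
                records.append("\n".join(current))
            current = [line]
        elif line.strip() and current:
            current.append(line)
    if current:
        records.append("\n".join(current))
    return records


def split_into_batches(records, batch_size):
    """Group records into consecutive sub-sets of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [records[i:i + batch_size] for i in range(0, len(records), batch_size)]


def estimated_minutes(n_sequences, sequences_per_minute=10):
    """Linear estimate of alignment time at the rate quoted in the text."""
    return n_sequences / sequences_per_minute


# Made-up five-record submission split into sub-sets of two.
example_fasta = ">seq1\nACGT\n>seq2\nGGCC\nTTAA\n>seq3\nA\n>seq4\nC\n>seq5\nG\n"
records = read_fasta_records(example_fasta)
batches = split_into_batches(records, 2)
```

At the quoted rate, a 4023-sequence set such as the one mentioned under Applications would take roughly 400 minutes, which `estimated_minutes(4023)` reproduces.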

Suggested strategy for microbial community assessment using NAST
Greengenes not only supplies an aligned 16S rRNA gene reference database but also maintains a suite of sequence analysis tools (Figure 3). In the typical scenario, a researcher obtains a complex pool of 16S rRNA genes from a variety of bacterial genomes present in an environmental or medical sample. The DNA is serially sampled by cloning and sequencing. The raw sequencing reads can be trimmed of low quality terminal fragments using the ‘Trim’ tool following phred (18) chromatogram scoring. The NAST tool is then used to create the MSA and maintain the 7682-character format. Once aligned, the entire batch can be classified by Greengenes using taxonomic nomenclature proposed by independent curators [NCBI, RDP/Bergey's (19), Ludwig (20), Hugenholtz (21) and Pace (22)]. Since more than one estimation of phylogenetic descent is returned, the user can implement a balanced approach for node nomenclature when generating project-specific dendrograms. In addition, Greengenes is able to calculate a distance matrix using PHYLIP's DNADIST, providing a suitable input format for construction of collector's and rarefaction curves using DOTUR (23) while also permitting de novo tree plotting. Combining pertinent public records into a project can be accomplished automatically at the NAST step or advanced users can export individual records or the entire Greengenes database in multiple popular formats including ARB (20), MEGA and FASTA.
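
To show what a distance matrix computed from an MSA looks like, the sketch below calculates the uncorrected proportion of differing positions (p-distance) between every pair of rows. This is deliberately simpler than the substitution models offered by PHYLIP's DNADIST and is not a substitute for it; the function names, the toy rows and the rule of skipping any column where either row holds a place holder are assumptions of this sketch.

```python
def p_distance(row_a, row_b, gap_chars="-."):
    """Fraction of mismatches over columns where both rows hold a base."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = mismatches = 0
    for x, y in zip(row_a.upper(), row_b.upper()):
        if x in gap_chars or y in gap_chars:
            continue  # skip columns lacking a base in either row
        compared += 1
        if x != y:
            mismatches += 1
    if compared == 0:
        return float("nan")  # no shared columns to compare
    return mismatches / compared


def distance_matrix(msa):
    """Square, symmetric matrix of pairwise p-distances between MSA rows."""
    return [[p_distance(a, b) for b in msa] for a in msa]


# Made-up three-row alignment.
toy_alignment = ["ACGT-A", "ACGTTA", "AC-TTT"]
matrix = distance_matrix(toy_alignment)
```

Because every NAST-aligned row shares the same column positions, any two rows can be compared directly in this way without re-aligning them.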


 
Figure 3 Greengenes pre-processing and post-processing tools for use with the NAST aligner. ‘Trim’ can be used to remove poor quality DNA data before alignment. ‘Classify’ and ‘Distance’ receive NAST MSAs as input. ‘Export’ and ‘Download’ allow advanced users to append their MSA with select sequences from the public repositories.

 

    APPLICATIONS
 
The NAST web service has already been used to process data for a recent publication and several manuscripts currently in preparation or review. Exploring the diversity of bacterial symbionts found in vertebrate digestive tracts has required massive MSAs. The NAST web server was used to align and compare 16S rRNA genes recovered from the gut of fish, humans and mice to those found in the open ocean (24). An extended investigation covering the intestinal microbiota of 30 mammalian species aligned ~20 000 16S rRNA gene sequences (R. E. Ley, P. Turnbaugh and J. I. Gordon, unpublished data). The NAST approach was also utilized to align and subsequently categorize 4023 feces-derived 16S rDNA sequences obtained as part of a longitudinal study of the human neonatal GI tract (C. Palmer, P. O. Brown, E. M. Bik and D. A. Relman, unpublished data).

Environmental microbial sampling has produced sizable sets of 16S rRNA gene sequences as well. An 1800 sequence set from uranium contaminated soil, deep sub-surface water, and urban aerosols was NAST aligned, allowing analysis of sample diversity as well as evaluation of parallel 16S rRNA microarray results (T. Z. DeSantis, E. L. Brodie, Y. M. Piceno and G. L. Andersen, manuscript in review). To uncover novel 16S rRNA types in an environmental sample, short ‘miniprimers’ (~10 nt) were tested to expand the scope of sequences recoverable by PCR. Although the amplicons were distinct from existing Greengenes database entries, NAST alignment was successful, facilitating import into ARB for phylogenetic categorization (T. A. Isenbarger, M. Finney, J. Handelsman and G. Ruvkun, in preparation). A comprehensive annotation of 18 000 GenBank 16S rRNA genes from natural environments has also been made possible with NAST (C. Lozupone and R. Knight, unpublished data).


    FUTURE DEVELOPMENT
 
Since the NAST approach relies on a well-aligned profile of diverse sequences, we endeavor to make manual improvements to the Core Set as needed. We are aware that research groups who rely upon accurate 16S rRNA gene comparisons are regularly curating individual sequence alignments. Since these efforts can benefit other users, we envision a conduit for transmitting the improvements to interested parties. Once it is in place, users will be able not only to suggest gap-placement alterations in the Core Set or other publicly distributed sequences but also to recommend sequence additions to the Core Set as new levels of microbial diversity are unearthed.

To extend the applicability of the NAST aligner for use with other standard 16S rRNA gene alignment formats we plan to add options for building MSAs with Ludwig or RDP column positioning. Also, a stand-alone version of the NAST software is in development. This will allow installation of NAST on a laboratory computer/server for more rapid analysis and customization and will enable the local curation of MSAs in Greengenes' or other formats. In theory, NAST's utility is not limited to 16S rRNA data. Sizeable MSAs of other genes or proteins, such as those encoding 18S rRNA, rpoB or recA can be built and maintained. Sequences can be merged into any existing MSA profile providing the trade-off between fixed total alignment string length and the extent of local misalignment is acceptable.


    CONCLUSIONS
 
The NAST web server is available for creating MSAs of small and large 16S rRNA gene sequencing projects. NAST allows retention of fixed MSA column counts regardless of the quantity of records added to a profile alignment. This permits ongoing MSA curation, and supports collaborative efforts in comparative genomics. NAST and supporting tools at the Greengenes website enable microbiologists to rapidly compare sampled sequences to publicly available reference sequences as well as to each other.


    ACKNOWLEDGEMENTS
 
The computational infrastructure was provided in part by the Virtual Institute for Microbial Stress and Survival (http://VIMSS.lbl.gov) supported by the US Department of Energy, Office of Science, Office of Biological and Environmental Research, Genomics:GTL Program and the Natural and Accelerated Bioremediation Research Program through contract DE-AC02-05CH11231 between Lawrence Berkeley National Laboratory and the US Department of Energy. Web application development was funded in part by the Department of Homeland Security under grant number HSSCHQ04X00037. Funding to pay the Open Access publication charges for this article was provided by the U.S. Department of Homeland Security.

Conflict of interest statement. None declared.


    Footnotes
 
*Correspondence may also be addressed to G. L. Andersen. Tel: +1 510 495 2795; Fax: +1 510 486 7152; Email: GLAndersen{at}lbl.gov


    REFERENCES
 

  1. Fox, G.E., Stackebrandt, E., Hespell, R.B., Gibson, J., Maniloff, J., Dyer, T.A., Wolfe, R.S., Balch, W.E., Tanner, R.S., Magrum, L.J., et al. (1980) The phylogeny of prokaryotes. Science, 209, 457–463.

  2. Woese, C.R., Fox, G.E., Zablen, L., Uchida, T., Bonen, L., Pechman, K., Lewis, B.J., Stahl, D. (1975) Conservation of primary structure in 16S ribosomal RNA. Nature, 254, 83–86.

  3. Kong, Y., Ong, S.L., Ng, W.J., Liu, W.T. (2002) Diversity and distribution of a deeply branched novel proteobacterial group found in anaerobic–aerobic activated sludge processes. Environ. Microbiol., 4, 753–757.

  4. Hughes, J.B., Hellmann, J.J., Ricketts, T.H., Bohannan, B.J. (2001) Counting the uncountable: statistical approaches to estimating microbial diversity. Appl. Environ. Microbiol., 67, 4399–4406.

  5. Eckburg, P.B., Bik, E.M., Bernstein, C.N., Purdom, E., Dethlefsen, L., Sargent, M., Gill, S.R., Nelson, K.E., Relman, D.A. (2005) Diversity of the human intestinal microbial flora. Science, 308, 1635–1638.

  6. Ley, R.E., Backhed, F., Turnbaugh, P., Lozupone, C.A., Knight, R.D., Gordon, J.I. (2005) Obesity alters gut microbial ecology. Proc. Natl Acad. Sci. USA, 102, 11070–11075.

  7. Radosevich, J.L., Wilson, W.J., Shinn, J.H., DeSantis, T.Z., Andersen, G.L. (2002) Development of a high-volume aerosol collection system for the identification of air-borne micro-organisms. Lett. Appl. Microbiol., 34, 162–167.

  8. Tringe, S.G., von Mering, C., Kobayashi, A., Salamov, A.A., Chen, K., Chang, H.W., Podar, M., Short, J.M., Mathur, E.J., Detter, J.C., et al. (2005) Comparative metagenomics of microbial communities. Science, 308, 554–557.

  9. Griffiths, R.I., Whiteley, A.S., O'Donnell, A.G., Bailey, M.J. (2003) Physiological and community responses of established grassland bacterial populations to water stress. Appl. Environ. Microbiol., 69, 6961–6968.

  10. Thompson, J.D., Higgins, D.G., Gibson, T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

  11. DeSantis, T.Z., Dubosarskiy, I., Murray, S.R., Andersen, G.L. (2003) Comprehensive aligned sequence construction for automated design of effective probes (CASCADE-P) using 16S rDNA. Bioinformatics, 19, 1461–1468.

  12. DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, D., Hu, P., Andersen, G.L. (2006) Greengenes: Chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol., accepted.

  13. Huber, T., Faulkner, G., Hugenholtz, P. (2004) Bellerophon: a program to detect chimeric sequences in multiple sequence alignments. Bioinformatics, 20, 2317–2319.

  14. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

  15. Stajich, J.E., Block, D., Boulez, K., Brenner, S.E., Chervitz, S.A., Dagdigian, C., Fuellen, G., Gilbert, J.G., Korf, I., Lapp, H., et al. (2002) The Bioperl toolkit: Perl modules for the life sciences. Genome Res., 12, 1611–1618.

  16. Kumar, S., Tamura, K., Nei, M. (2004) MEGA3: Integrated software for Molecular Evolutionary Genetics Analysis and sequence alignment. Brief Bioinform., 5, 150–163.

  17. Felsenstein, J. (1989) PHYLIP—phylogeny inference package (Version 3.65). Cladistics, 5, 164–166.

  18. Ewing, B. and Green, P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res., 8, 186–194.

  19. Cole, J.R., Chai, B., Farris, R.J., Wang, Q., Kulam, S.A., McGarrell, D.M., Garrity, G.M., Tiedje, J.M. (2005) The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res., 33, D294–D296.

  20. Ludwig, W., Strunk, O., Westram, R., Richter, L., Meier, H., Yadhukumar, H., Buchner, A., Lai, T., Steppi, S., Jobb, G., et al. (2004) ARB: a software environment for sequence data. Nucleic Acids Res., 32, 1363–1371.

  21. Hugenholtz, P. (2002) Exploring prokaryotic diversity in the genomic era. Genome Biol., 3, 1–8.

  22. Pace, N.R. (1997) A molecular view of microbial diversity and the biosphere. Science, 276, 734–740.

  23. Schloss, P.D. and Handelsman, J. (2005) Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl. Environ. Microbiol., 71, 1501–1506.

  24. Ley, R.E., Peterson, D.A., Gordon, J.I. (2006) An extended view of ourselves: ecological and evolutionary forces that shape microbial diversity and genome content in the human intestine. Cell, 124, 837–848.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Genome ResHome page
M. Hamady and R. Knight
Microbial community profiling for human microbiome projects: Tools, techniques, and challenges
Genome Res., July 1, 2009; 19(7): 1141 - 1152.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
J. P. Davis, N. H. Youssef, and M. S. Elshahed
Assessment of the Diversity, Abundance, and Ecological Distribution of Members of Candidate Division SR1 Reveals a High Level of Phylogenetic Diversity but Limited Morphotypic Diversity
Appl. Envir. Microbiol., June 15, 2009; 75(12): 4139 - 4148.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
Y. Sun, Y. Cai, L. Liu, F. Yu, M. L. Farrell, W. McKendree, and W. Farmerie
ESPRIT: estimating species richness using large collections of 16S rRNA pyrosequences
Nucleic Acids Res., June 1, 2009; 37(10): e76 - e76.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
T. J. Hamp, W. J. Jones, and A. A. Fodor
Effects of Experimental Choices and Analysis Noise on Surveys of the "Rare Biosphere"
Appl. Envir. Microbiol., May 15, 2009; 75(10): 3263 - 3270.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
C. E. Robertson, J. R. Spear, J. K. Harris, and N. R. Pace
Diversity and Stratification of Archaea in a Hypersaline Microbial Mat
Appl. Envir. Microbiol., April 1, 2009; 75(7): 1801 - 1810.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
E. J. Biers, S. Sun, and E. C. Howard
Prokaryotic Genomes and Diversity in Surface Ocean Waters: Interrogating the Global Ocean Sampling Metagenome
Appl. Envir. Microbiol., April 1, 2009; 75(7): 2221 - 2229.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
H. Zhang, J. K. DiBaise, A. Zuccolo, D. Kudrna, M. Braidotti, Y. Yu, P. Parameswaran, M. D. Crowell, R. Wing, B. E. Rittmann, et al.
Human gut microbiota in obesity and after gastric bypass
PNAS, February 17, 2009; 106(7): 2365 - 2370.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
E. K. Costello, S. R. P. Halloy, S. C. Reed, P. Sowell, and S. K. Schmidt
Fumarole-Supported Islands of Biodiversity within a Hyperarid, High-Elevation Landscape on Socompa Volcano, Puna de Atacama, Andes
Appl. Envir. Microbiol., February 1, 2009; 75(3): 735 - 747.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
N. Fierer, M. Hamady, C. L. Lauber, and R. Knight
The influence of sex, handedness, and washing on the diversity of hand surface bacteria
PNAS, November 18, 2008; 105(46): 17994 - 17999.

G. D. Wiens, D. D. Rockey, Z. Wu, J. Chang, R. Levy, S. Crane, D. S. Chen, G. R. Capri, J. R. Burnett, P. S. Sudheesh, et al.
Genome Sequence of the Fish Pathogen Renibacterium salmoninarum Suggests Reductive Evolution away from an Environmental Arthrobacter Ancestor
J. Bacteriol., November 1, 2008; 190(21): 6970 - 6982.

E. S. Balakirev, V. A. Pavlyuchkov, and F. J. Ayala
DNA variation and symbiotic associations in phenotypically diverse sea urchin Strongylocentrotus intermedius
PNAS, October 21, 2008; 105(42): 16218 - 16223.

S. A. Crowe, C. Jones, S. Katsev, C. Magen, A. H. O'Neill, A. Sturm, D. E. Canfield, G. D. Haffner, A. Mucci, B. Sundby, et al.
Photoferrotrophs thrive in an Archean Ocean analogue
PNAS, October 14, 2008; 105(41): 15938 - 15943.

F. Godoy-Vitorino, R. E. Ley, Z. Gao, Z. Pei, H. Ortiz-Zuazaga, L. R. Pericchi, M. A. Garcia-Amado, F. Michelangeli, M. J. Blaser, J. I. Gordon, et al.
Bacterial Community in the Crop of the Hoatzin, a Neotropical Folivorous Flying Bird
Appl. Envir. Microbiol., October 1, 2008; 74(19): 5905 - 5912.

Z. Liu, T. Z. DeSantis, G. L. Andersen, and R. Knight
Accurate taxonomy assignments from 16S rRNA sequences produced by highly parallel pyrosequencers
Nucleic Acids Res., October 1, 2008; 36(18): e120 - e120.

C. R. Jackson and A. Q. Weeks
Influence of Particle Size on Bacterial Community Structure in Aquatic Sediments as Revealed by 16S rRNA Gene Sequence Analysis
Appl. Envir. Microbiol., August 15, 2008; 74(16): 5237 - 5240.

J. R. Hall, K. R. Mitchell, O. Jackson-Weaver, A. S. Kooser, B. R. Cron, L. J. Crossey, and C. D. Takacs-Vesbach
Molecular Characterization of the Diversity and Distribution of a Thermal Spring Microbial Community by Using rRNA and Metabolic Genes
Appl. Envir. Microbiol., August 1, 2008; 74(15): 4910 - 4922.

E. A. Grice, H. H. Kong, G. Renaud, A. C. Young, NISC Comparative Sequencing Program, G. G. Bouffard, R. W. Blakesley, T. G. Wolfsberg, M. L. Turner, and J. A. Segre
A diversity profile of the human skin microbiota
Genome Res., July 1, 2008; 18(7): 1043 - 1050.

M. O. Mendez, J. W. Neilson, and R. M. Maier
Characterization of a Bacterial Community in an Abandoned Semiarid Lead-Zinc Mine Tailing Site
Appl. Envir. Microbiol., June 15, 2008; 74(12): 3899 - 3907.

R. T. Jones, K. F. McCormick, and A. P. Martin
Bacterial Communities of Bartonella-Positive Fleas: Diversity and Community Assembly Patterns
Appl. Envir. Microbiol., March 1, 2008; 74(5): 1667 - 1670.

T. A. Isenbarger, M. Finney, C. Rios-Velazquez, J. Handelsman, and G. Ruvkun
Miniprimer PCR, a New Lens for Viewing the Microbial World
Appl. Envir. Microbiol., February 1, 2008; 74(3): 840 - 849.

N. Fierer, Z. Liu, M. Rodriguez-Hernandez, R. Knight, M. Henn, and M. T. Hernandez
Short-Term Temporal Variability in Airborne Bacterial and Fungal Populations
Appl. Envir. Microbiol., January 1, 2008; 74(1): 200 - 207.

J. K. Harris, M. A. De Groote, S. D. Sagel, E. T. Zemanick, R. Kapsner, C. Penvari, H. Kaess, R. R. Deterding, F. J. Accurso, and N. R. Pace
Molecular identification of bacteria in bronchoalveolar lavage fluid from children with cystic fibrosis
PNAS, December 18, 2007; 104(51): 20529 - 20533.

J. R. Spear, H. A. Barton, C. E. Robertson, C. A. Francis, and N. R. Pace
Microbial Community Biofabrics in a Geothermal Mine Adit
Appl. Envir. Microbiol., October 1, 2007; 73(19): 6172 - 6180.

D. M. Cook, E. DeCrescenzo Henriksen, R. Upchurch, and J. B. D. Peterson
Isolation of Polymer-Degrading Bacteria and Characterization of the Hindgut Bacterial Community from the Detritus-Feeding Larvae of Tipula abdominalis (Diptera: Tipulidae)
Appl. Envir. Microbiol., September 1, 2007; 73(17): 5683 - 5686.

J.-S. Kim and D. E. Crowley
Microbial Diversity in Natural Asphalts of the Rancho La Brea Tar Pits
Appl. Envir. Microbiol., July 15, 2007; 73(14): 4579 - 4591.

A. M. Bojesen, J. Larsen, A. G. Pedersen, T. Morner, R. Mattson, and M. Bisgaard
IDENTIFICATION OF A NOVEL MANNHEIMIA GRANULOMATIS LINEAGE FROM LESIONS IN ROE DEER (CAPREOLUS CAPREOLUS)
J. Wildl. Dis., July 1, 2007; 43(3): 345 - 352.

J. L. Flanagan, E. L. Brodie, L. Weng, S. V. Lynch, O. Garcia, R. Brown, P. Hugenholtz, T. Z. DeSantis, G. L. Andersen, J. P. Wiener-Kronish, et al.
Loss of Bacterial Diversity during Antibiotic Treatment of Intubated Patients Colonized with Pseudomonas aeruginosa
J. Clin. Microbiol., June 1, 2007; 45(6): 1954 - 1962.

J. J. Walker and N. R. Pace
Phylogenetic Composition of Rocky Mountain Endolithic Microbial Ecosystems
Appl. Envir. Microbiol., June 1, 2007; 73(11): 3497 - 3504.

S. K. Fagervold, H. D. May, and K. R. Sowers
Microbial Reductive Dechlorination of Aroclor 1260 in Baltimore Harbor Sediment Microcosms Is Catalyzed by Three Phylotypes within the Phylum Chloroflexi
Appl. Envir. Microbiol., May 1, 2007; 73(9): 3009 - 3018.

Z. Gao, C.-h. Tseng, Z. Pei, and M. J. Blaser
Molecular analysis of human forearm superficial skin bacterial biota
PNAS, February 20, 2007; 104(8): 2927 - 2932.

E. L. Brodie, T. Z. DeSantis, J. P. M. Parker, I. X. Zubietta, Y. M. Piceno, and G. L. Andersen
Urban aerosols harbor diverse and dynamic bacterial populations
PNAS, January 2, 2007; 104(1): 299 - 304.

