
© 1996 Oxford University Press 2857-2868

Conservation patterns in angiosperm rDNA ITS2 sequences

Mark A. Hershkovitz* and Elizabeth A. Zimmer

Laboratory of Molecular Systematics, MRC534, Smithsonian Institution, Washington, DC 20560, USA

Received December 18, 1995; Revised and Accepted June 18, 1996

ABSTRACT

The two internal transcribed spacers (ITS1 and ITS2) of nuclear ribosomal DNA have become commonly exploited sources of informative variation for interspecific-/intergeneric-level phylogenetic analyses among angiosperms and other eukaryotes. We present an alignment in which one-third to one-half of the ITS2 sequence is alignable above the family level in angiosperms and a phenetic analysis showing that ITS2 contains information sufficient to diagnose lineages at several hierarchical levels. Base compositional analysis shows that angiosperm ITS2 is inherently GC-rich, and that the proportion of T is much more variable than that for other bases. We propose a general model of angiosperm ITS2 secondary structure that shows common pairing relationships for most of the conserved sequence tracts. Variations in our secondary structure predictions for sequences from different taxa indicate that compensatory mutation is not limited to paired positions.

INTRODUCTION

The two internal transcribed spacers (ITS1 and ITS2) of nuclear ribosomal DNA (rDNA) have emerged from the status of `alternative' phylogenetic marker ( 1 ) to become one of the more widely applied DNA sequences in angiosperm molecular systematics (reviewed in 2 ). At this writing, GenBank contains ITS sequences from ~800 angiosperm species, a figure likely to increase dramatically over the next few years. Likewise, ITS is increasingly used in lower-level phylogenetics in other organisms, e.g. fungi ( 3 ), algae ( 4 , 5 ) and insects ( 6 - 9 ).

These spacer sequences have been reputed to contain inadequate signal for phylogenetic analyses at the interfamilial and deeper levels ( 1 ). Notwithstanding, sequence motifs conserved throughout angiosperms have been recognized ( 10 ), and Hershkovitz and Lewis ( 11 ) have demonstrated that ITS2 has retained sufficient phylogenetic signal to discriminate among sequences from green algae, fungi and seed plants (conifers and angiosperms). The purpose of the present paper is to document ITS2 sequence conservation among angiosperms, and to consider the implications for phylogenetic inference, secondary structure determination, functional constraints and molecular evolution. ITS1 will be considered in a separate paper (Hershkovitz and Zimmer, in preparation) because it is structurally and functionally unrelated to ITS2 and exhibits a more complex pattern of sequence conservation.

MATERIALS AND METHODS

Taxon sampling

In order to evaluate ITS2 sequence/structural conservation and the degree of phylogenetic signal at familial levels and deeper, we compared sequences from 75 species representing available phylogenetic diversity among angiosperms and including multiple members of several well-represented orders and families (Table 1 ; 12 - 17 ). Sequences reported for Mimulus (Scrophulariaceae; 18 ) were examined but determined to be green algal in origin ( 11 ). Sequences from Arceuthobium (Viscaceae) were not included because, like other DNA sequences from parasitic plants, they were exceptionally divergent from those of angiosperms in general ( 19 ). The Arceuthobium sequences do not appear to be fungal or algal contaminants, however; when included in parsimony and distance analyses of eukaryotic 5.8S sequences (data not shown), they emerge as a highly divergent branch among the seed plants.

DNA extraction, amplification and sequencing

Sequences for members of the angiosperm order Caryophyllales were determined as part of an ongoing analysis of relationships within the order. The ITS regions were amplified from CTAB-extracted (reviewed in 20 ) genomic DNA using the primer pairs: (i) ITS4 ( 21 ) and ITS5 modified for plants (GGAAGGAGAAGTCGTAACAAGG); or (ii) 26A ( 22 ) and Nnc18S10 (bases 4-21 of the modified ITS5). Cloned or direct amplification products were sequenced in both directions using either manual or automated protocols. For the manual protocol, single-stranded DNA (both coding and non-coding) was derived via asymmetric amplification of 2 µl crude double-stranded amplification product using the ITS4 or N18L18 (AAGTCGTAACAAGGTTTC) primer. 7-deaza-dGTP was substituted for dGTP in order to relax secondary structure in the single-stranded sequencing template. Asymmetric amplification products were purified according to the Gene Clean protocol (Bio 101, Inc.) and sequenced with the primers ITS3 ( 21 ), C58S ( 22 ), ITS4 and N18L18 following the Sequenase version 2.0 protocol (US Biochemicals) for the 7-deaza-dGTP reagent kit and 35S-dATP labeling. Sequencing reactions were electrophoresed and the gels exposed to radiographic film according to standard protocols. Automated sequencing used the same primer set and followed the dye-terminator cycle-sequencing protocol for the ABI model 373A sequencer (Applied Biosystems, Inc.). Chromatograms were analyzed using Sequencher (Gene Codes, Inc.). For apparently polymorphic taxa, double-stranded ITS amplification product was ligated and cloned following the T/A cloning protocol (Invitrogen, Inc.) and double-stranded products amplified directly from recombinant colonies were sequenced as described above.

The ITS region of CTAB-extracted Ravenala madagascariensis genomic DNA was amplified with restriction-site-tagged primers CLOA (KpnI-26A) and CLOE (XbaI-N18L18). The purified product was ligated into doubly-digested pBluescript II KS+ phagemid vector (Stratagene, Inc.) and transformed into Escherichia coli XL1B cells (Stratagene, Inc.) rendered heat-shock competent ( 23 ). Transformations were cultured on LB/X-gal/IPTG/ampicillin agar plates and white colonies cultured in LB/ampicillin broth ( 23 ). The cloned vector was purified following the RPM Rapid Pure miniprep protocol (Bio 101, Inc.), and the insert amplified and sequenced following the automated sequencing protocol described above.

Sequence alignment

A taxonomically representative subset of the ITS2 sequences was aligned manually with the aid of the GDE version 2.2 ( 24 ) multiple sequence editor. Regions of conserved versus variable sequence were approximated by eye. Similarities among all angiosperms or recognized major groups therein were considered as evidence for conservation.

Analysis

The objective of this analysis was to evaluate, on the basis of alignability, the degree to which assorted angiosperm ITS2 sequences clustered according to independently derived phylogenetic groupings. Alignability was estimated using the `guide-tree' feature in CLUSTAL W ( 25 ), which yields a neighbor-joining tree based on similarity scores for pairs of sequences aligned optimally for given gap-opening and gap-extension penalties. The tree is not a phylogram, but a phenogram. Nominally, the guide tree topology functions as a `phylogenetic template' for successive sequence alignment, but it also provides a means for comparing, without imposition of a fixed multiple alignment, all base positions in a set of highly divergent and length-variable (i.e. poorly alignable) sequences. The most similar sequences form terminal bifurcations in the tree, and the internal branches cluster mutually more similar sequences. The branch lengths, however, are distance-distorted relative to evolutionary divergence. For example, an inferred optimal multiple alignment of the sequence tracts AGAGAA, AGGAA and AAGAA might not maintain the optimal pairwise alignments. Each shorter sequence differs from the longer by a single insertion/deletion (indel), but not at the same position. This implies that the shorter sequences differ from each other by two indels rather than a single substitution. The guide tree branch lengths, however, reflect the optimal pairwise alignments, even if these are evolutionarily illogical. Nonetheless, the procedure will cluster sequences sharing alignable motifs relative to those that do not. A guide tree generated using randomly-generated sequences of comparable, and comparably variable, length would be starlike, lacking significant internal branching. 
We generated trees using the `slow, accurate' algorithm, and high (15-50) gap-opening penalties in order to detect conserved motifs combined with low (0-1) gap-extension penalties in order to extend unconserved sequence as necessary to accommodate conserved motif alignment.
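As a rough check of the three-tract example above (AGAGAA, AGGAA, AAGAA), the following sketch computes unit-cost edit distances for every pair. It is only an illustration: CLUSTAL W scores pairs with substitution matrices and separate gap-opening and gap-extension penalties, so the unit-cost scheme and the function name `edit_distance` are assumptions introduced here, not part of the published procedure.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance: each substitution or single-base indel costs 1."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # delete x from a
                curr[j - 1] + 1,         # insert y into a
                prev[j - 1] + (x != y),  # match (0) or substitution (1)
            ))
        prev = curr
    return prev[-1]


# The three sequence tracts discussed in the text.
tracts = ["AGAGAA", "AGGAA", "AAGAA"]
pairwise = {(a, b): edit_distance(a, b) for a, b in combinations(tracts, 2)}

for (a, b), d in pairwise.items():
    print(f"{a} vs {b}: {d} event(s)")
```

Under these assumptions every pair is a single event apart: each shorter tract is one indel from the longer, and the two shorter tracts are one substitution from each other. A single multiple alignment that honors both indels cannot also preserve that one-substitution relationship, which is the distortion the text describes.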

CLUSTAL W does not output the pairwise percent similarity scores to a data file, but these can be recovered (rounded to two decimal places) by capturing the screen output as text, which can then be transferred to a word processing or spreadsheet program.

Base composition of aligned conserved versus variable regions was calculated using PAUP* ( 26 ).
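The base-composition calculation itself is simple enough to sketch outside PAUP*. The example below is a minimal illustration, assuming sequences are plain strings in which alignment gaps and ambiguity codes are ignored; the function names and the two short example tracts are hypothetical and are not taken from the alignment in this paper.

```python
from collections import Counter

BASES = "ACGT"


def base_composition(seq: str) -> dict:
    """Return the proportion of each of A, C, G and T, ignoring gaps and ambiguity codes."""
    counts = Counter(b for b in seq.upper() if b in BASES)
    total = sum(counts.values())
    if total == 0:
        return {b: 0.0 for b in BASES}
    return {b: counts[b] / total for b in BASES}


def gc_content(seq: str) -> float:
    """Proportion of unambiguous bases that are G or C."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


# Hypothetical conserved and variable tracts, for illustration only.
conserved = "GGCGCC-AAGG"
variable = "TTATCGTTAT"
print(f"conserved GC = {gc_content(conserved):.2f}")
print(f"variable  GC = {gc_content(variable):.2f}")
```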

Secondary structure prediction

Secondary structure was explored using the minimum free-energy (MFE) program MFOLD ( 27 ) in GCG version 8.1 ( 28 ). Unlike its predecessor (FOLD, see 27 ), MFOLD provides all suboptimal foldings up to a user-specified free-energy limit. Sequences including up to 50 bases of each of the flanking coding regions were folded at 25, 37 and 42°C. Structures within 5.7 kcal/mol of the optimal structure and differing from each other by at least three window units were recovered using the CONNECT option in PLOTFOLD ( 28 ), visualized using the Olsen format in LoopDLoop version 1.2a64 ( 29 ), and redrawn to emphasize similarities using CANVAS (Deneba Software, Inc.). From among the multiple foldings generated for each sequence, a set of shared structural features was inferred (see Results), and the sequences were resubmitted to MFOLD constraining each as necessary to include the complete set. All sequences were constrained, e.g. to force specified pairing relations between the 5'-end of the 26S and 3'-end of the 5.8S regions. Only simple, canonical base pairings, including G·U, were considered.
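The pairing rule stated above (Watson-Crick pairs plus G·U) can be expressed as a small check. The sketch below is only an illustration of that rule, not of MFOLD's energy model; the function names and the example strands are hypothetical.

```python
# Pairs allowed under the stated rule: Watson-Crick pairs plus the G.U wobble pair.
CANONICAL_PAIRS = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def can_pair(x: str, y: str) -> bool:
    """True if bases x and y form a canonical or G.U pair (DNA T is read as RNA U)."""
    x = x.upper().replace("T", "U")
    y = y.upper().replace("T", "U")
    return (x, y) in CANONICAL_PAIRS


def stem_is_valid(five_prime: str, three_prime: str) -> bool:
    """Check an uninterrupted stem; both strands are written 5'->3' and pair antiparallel."""
    if len(five_prime) != len(three_prime):
        return False
    return all(can_pair(a, b) for a, b in zip(five_prime, reversed(three_prime)))


# Hypothetical four-base stem: G-U, G-C, A-U and C-G pairs.
print(stem_is_valid("GGAC", "GUCU"))
```

A stem containing bulges or interior loops would need an explicit list of paired positions rather than two equal-length strands; that case is deliberately left out of this sketch.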

RESULTS

Angiosperm ITS2 sequence alignment

Figure 1 identifies six regions of ITS2 (c1-c6) that are conserved in all sampled angiosperms or at least in major groups therein. The six alternating variable regions (v1-v6) are, except among obviously similar sequences, arbitrarily aligned to the longest sequence. Although c1 is variable across angiosperms, a 5' RYYR motif, especially ATCG, is common among higher dicots or `eudicots' ( 30 , 31 ). Canella and monocots differ from eudicots at the 5'-end, notably in sharing a Y at the first position, but all angiosperms share a C-rich 3'-end. The G (or A) at position 7 of some taxa, including Caryophyllales, might align better with the G at position 10 of other angiosperms. This would maintain Y-richness in the 3'-end of region c1. Region v1 becomes C- to G-rich moving 5' to 3', but is highly length variable (13-41 bases).
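Degenerate motifs such as RYYR (R = purine, Y = pyrimidine) can be located in a sequence by translating the IUPAC codes into a regular expression. The sketch below assumes only the codes used in this paper plus N; the function name `find_motif` and the example sequences are hypothetical.

```python
import re

# IUPAC codes needed for the motifs mentioned in the text (R = A/G, Y = C/T), plus N.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def find_motif(motif: str, seq: str) -> list:
    """Return 0-based start positions of every (possibly overlapping) motif match."""
    body = "".join(IUPAC[code] for code in motif.upper())
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile(f"(?={body})")
    return [m.start() for m in pattern.finditer(seq.upper())]


print(find_motif("RYYR", "ATCG"))            # ATCG is one instance of RYYR
print(find_motif("RYRYYRYRY", "GCACTGTAC"))  # hypothetical instance of the c5 motif
```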


Figure 1 . Alignment of representative angiosperm ITS2 regions. Taxon acronyms and GenBank accession numbers are given in Table 1. The alignment is numbered beginning with the first position of aligned ITS2. Upper/lower case letters denote ITS2/flanking sequence as indicated in GenBank documentation. The limit of available sequence is denoted with an asterisk. Conserved regions are numbered from 5' to 3', c1-c6, and delimited by lines above the alignment. The variable regions, v1-v6, are arbitrarily aligned to the longest sequence, except among evidently similar sequences, e.g. those of portulacaceous Caryophyllales.

The entire sequence spanning c2-c4 might be regarded as a single conserved region, but we distinguished the most conserved motifs from the short, variable regions separating them. Fourteen of 102 positions in c2-c4 are universally shared across the sampled taxa. The 17- and 5-base motifs previously recognized as conserved by Liu and Schardl ( 10 ) occur within c3 and c4, respectively.

Table 1 Taxa sampled, their GenBank accession number or literature reference for ITS2 sequences, and the acronyms used in this paper a
Acronym    Species                    Family/subfamily   Accession

DICOTYLEDONS

Asteraceae
AMBE_MOSC  Amberboa moschata          Cardueae           L35883
ARGY_CALI  Argyroxiphium caliginis    Heliantheae        M93788
ARNI_MOLL  Arnica mollis              Heliantheae        M93789
CENT_TRIC  Centaurea trichocephala    Cardueae           L35874
ECHI_RITR  Echinops ritro             Cardueae           L35882
JURI_HUMI  Jurinea humilis            Cardueae           L35868
KRIG_VIRG  Krigia virginica           Lactuceae          L13949
SENE_ATRO  Senecio atropurpureus      Senecioneae        L33214
STEP_PAUC  Stephanomeria pauciflora   Lactuceae          L13956
TRAG_PRAT  Tragopogon pratensis       Lactuceae          L35855

Rosaceae
ARIA_ALNI  Aria alnifolia             Maloideae          U16185
CORM_DOME  Cormus domestica           Maloideae          U16187
COTO_LACT  Cotoneaster lacteus        Maloideae          U16188
CRAT_MOLL  Crataegus mollis           Maloideae          U16190
ERIO_JAPO  Eriobotrya japonica        Maloideae          U16192
PRUN_CERA  Prunus cerasifera          Amygdaloideae      U16200
ROSA_XXXX  Rosa sp.                   Rosoideae          U16206
SPIR_VANH  Spiraea x vanhouttei       Spiraeoideae       U16205
VAUQ_CALI  Vauquelinia californica    Maloideae          U16191

Fabaceae
CLIA_PUNI  Clianthus puniceus                            L10801
VICI_FABA  Vicia faba                                    X17535
VIGN_RADI  Vigna radiata                                 X14337

Capparales
ARAB_THAL  Arabidopsis thaliana       Brassicaceae       X52320
BRAD_NAPU  Brassica napus             Brassicaceae       D10840
ISOM_ARBO  Isomeris arborea           Capparidaceae      - b

Caryophyllales
ALLU_PROC  Alluaudia procera          Didiereaceae       L49486
AMAR_RETR  Amaranthus retroflexus     Amaranthaceae      L48798
ANRE_CORD  Anredera cordifolia        Basellaceae        L49487
BOUG_SPEC  Bougainvillea spectabilis  Nyctaginaceae      L49488
CHEN_ALBU  Chenopodium album          Chenopodiaceae     L49489
CIST_TWEE  Cistanthe tweedyi          Portulacaceae      L49490
GYPS_ELEG  Gypsophila elegans         Caryophyllaceae    L49491
MAIH_POEP  Maihuenia poeppigii        Cactaceae          L49492
MIRA_JALA  Mirabilis jalapa           Nyctaginaceae      L49493
MOLL_VERT  Mollugo verticillata       Molluginaceae      L48799
PHYT_AMER  Phytolacca americana       Phytolaccaceae     L48800
SILE_FRUT  Silene fruticosa           Caryophyllaceae    X83839
STEL_MEDE  Stellaria media            Caryophyllaceae    L49495
TETR_TETR  Tetragonia tetragonioides  Aizoaceae          L48802
VACC_HISP  Vaccaria hispanica         Caryophyllaceae    X83847

Ranunculales/Papaverales
BERB_DICT  Berberis dictyophylla      Berberidaceae      X83829
COPT_TRIF  Coptis trifolia            Ranunculaceae      X83846
CORY_BRAC  Corydalis bracteata        Fumariaceae        X85474
GLAU_PALM  Glaucidium palmatum        Glaucidiaceae      X83837
HYDR_CANA  Hydrastis canadensis       Ranunculaceae      X83840
KING_UNIF  Kingdonia uniflora         Circaeasteraceae   X83844
PAPA_RHOE  Papaver rhoeas             Papaveraceae       X85482
PODO_PELT  Podophyllum peltatum       Podophyllaceae     X83831
PTER_RACE  Pteridophyllum racemosum   Pteridophyllaceae  X83832
RANU_ENYS  Ranunculus enysii          Ranunculaceae      X83848
TROL_RANU  Trollius ranunculoides     Ranunculaceae      X83849

Misc. dicots
AEON_VISC  Aeonium viscosum           Crassulaceae       X80587
BETU_PEND  Betula pendula             Betulaceae         X68136
CANE_WINT  Canella winterana          Canellaceae        L03844
CUCU_MELO  Cucurbita melo             Cucurbitaceae      M36377
DAUC_CARO  Daucus carota              Apiaceae           X17534
GAYO_NUTT  Gayophytum nuttallii       Onagraceae         L28022
GODD_DAVI  Gossypium davidsonii       Malvaceae          U12729
LYCO_ESCU  Lycopersicon esculentum    Solanaceae         X52265
PAEO_JAPO  Paeonia japonica           Paeoniaceae        U27681
POPU_DELT  Populus deltoides          Salicaceae         X64764

MONOCOTYLEDONS

Asparagales
AGAV_LECH  Agave lechuguilla          Agavaceae          U24000
ASPA_OFFI  Asparagus officinalis      Asparagaceae       U24004
CALI_HOOK  Calibanus hookeri          Nolinaceae         U24009
CAMA_SCIL  Camassia scilloides        Hyacinthaceae      U24010
CORD_TERM  Cordyline terminalis       Asteliaceae        U24016
DRAC_MARG  Dracaena marginata         Dracaenaceae       U24036
MAIA_RACE  Maianthemum racemosum      Convallariaceae    U24041
XANT_XXXX  Xanthorrhoea sp.           Xanthorrhoeaceae   U24051

Poaceae
BROM_DIAN  Bromus diandrus            Pooideae           L36509
ORYZ_SATI  Oryza sativa               Bambusoideae       M16845
SORG_BICO  Sorghum bicolor            Panicoideae        U04789
TRIT_URAR  Triticum urartu            Pooideae           X66108

Misc. monocots
ARIS_INER  Arisaema inermis           Araceae            - c
RAVE_MADA  Ravenala madagascariensis  Zingiberaceae      L49494

a Taxa are arranged by major taxonomic group, with familial/subfamilial classification provided for multiply-sampled orders/families. Familial classification is according to GenBank documentation. Subfamilial classification is according to: Bremer (12) for Asteraceae; Campbell et al. (13) for Rosaceae; Chase et al. (14) and Dahlgren et al. (15) for Asparagales; and Kellogg and Linder (16) for Poaceae.
b Not accessioned; sequence provided by W. J. Hahn, Columbia University, NY.
c Not accessioned; transcribed manually from Ko et al. (17).

ITS2 downstream of c4 is highly variable in length (67-111 bases) and sequence, but two common elements occur. Our recognition of c5 is based in part on its pairing with c4 in our secondary structure models (see below). Alternative alignments of a given sequence in c5 might seem equally likely, but most sequences contain an RYRYYRYRY motif in this region. Region c6 is only tentatively designated because of its small size, the occurrence in several sequences of an additional ACCC, RCCC or AYYY upstream or downstream of the aligned motif, and its variable pairing relations in our secondary structure models (see below). Establishing that the indicated motif is conserved will require additional sampling and evaluation at lower phylogenetic levels.

Despite poor alignability over all angiosperms, regions v1-v6 show conservation above the traditional familial level. For example, v4 aligned in the sequences representing Portulacaceae (CIST_TWEE), Cactaceae (MAIH_POEP), Didiereaceae (ALLU_PROC) and Basellaceae (ANRE_CORD; cf. 1 ), and the 3'-end of v6 is more similar among Caryophyllales (Table 1 ) than among angiosperms as a whole.

Phenetic analysis

Figure 2 illustrates a guide tree generated with gap-opening/-extension penalties of 17.5/0.2. Adjusting the parameters away from these values yielded more starlike trees.


Figure 2 . Guide tree for angiosperm ITS2 sequences produced by CLUSTAL W using the slow/accurate algorithm and gap-opening/-extension penalties of 17.5/0.2. Taxon acronyms are defined in Table 1.

Several natural suprafamilial groupings are evident in Figure 2 , including the orders Caryophyllales, Ranunculales, Papaverales, Asparagales and Capparales, and the supraordinal groups of Ranunculales+Papaverales, eudicots, `higher' eudicots ( 30 ), commelinid monocots, and Asparagales plus commelinid monocots ( 14 , 33 ). Except for the position of the Nyctaginaceae representatives, the Caryophyllales topology is similar to the rbcL tree for the order ( 34 ). Likewise, the Asparagales topology is consistent with relationships evidenced from rbcL, although not with morphology or combined morphology/rbcL ( 14 , 33 ). Familial and subfamilial clustering in Figure 2 is also generally consistent with either independent evidence or more rigorous analysis of these and additional ITS data: Maloideae (the apple subfamily) are clustered and separated from other Rosaceae as they appeared in the formal ITS analysis of these taxa ( 13 ); carduid (thistles), helianthid (sunflowers) and lactucid (lettuces) Asteraceae are each clustered, as are (barely) Asteroideae with lactucids, although Senecio erroneously clusters with lactucids ( 12 ); Brassicaceae (mustards) are paired among the Capparales; and Poaceae cluster among the commelinid monocots, as do the pooid with panicoid Poaceae (cf. 16 ). Within Ranunculales/Papaverales, Pteridophyllum groups with Papaverales. Paradoxically, Lidén ( 35 ), while describing morphological evidence for this relationship, claimed that his own unpublished ITS data indicated a closer relationship between Pteridophyllum and Ranunculaceae. The remoteness of Paeonia from Ranunculales suggested by rbcL ( 30 ) and 18S rDNA ( 36 ) sequences is also evident in ITS2 sequences.

The positions of taxa of singly-represented families, e.g. Cucurbita (Cucurbitaceae), Betula (Betulaceae), Gayophytum (Onagraceae), do not appear correlated with well-corroborated independent evidence, although the relationships among many of these taxa cannot be considered firmly established. The similarity apparent between the Populus and Betula sequences is greater than that for some of the interfamilial similarities, e.g. between Vigna and Vicia .

For perspective on branch lengths, the CLUSTAL W pairwise similarity percentages [mean (range)] were 80 (76-85) among both the portulacaceous taxa and maloid Rosaceae, 64 (53-85) among Caryophyllales, 47 (36-60) between Caryophyllales and other angiosperms, and 83 between ARNI_MOLL and ARGY_CALI. For comparison, average uncorrected ITS2 similarity among the maloid Rosaceae is reported as ~85% ( 13 ), and Baldwin ( 21 ) reported 86.5% similarity between ARNI_MOLL and ARGY_CALI. These similarities appear higher because ambiguously aligned regions, hence the mismatches contained therein, were excluded from the calculations.
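The effect of excluding ambiguously aligned regions can be seen in a small calculation. The sketch below computes uncorrected percent similarity for two pre-aligned sequences, optionally skipping a set of excluded columns. It is an illustration only: the function name, the treatment of gaps and the two short sequences are assumptions made here and do not reproduce CLUSTAL W's own scoring.

```python
def percent_similarity(a: str, b: str, exclude=frozenset()) -> float:
    """Uncorrected percent identity of two aligned sequences.

    Columns whose 0-based index is in `exclude`, and columns where both
    sequences have a gap, are skipped. A gap opposite a base is a mismatch.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for i, (x, y) in enumerate(zip(a.upper(), b.upper())):
        if i in exclude or (x == "-" and y == "-"):
            continue
        compared += 1
        matches += (x == y)
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned pair; columns 4 and 5 stand in for an ambiguous region.
seq1 = "ACGTACGTAC"
seq2 = "ACGTTTGTAC"
print(percent_similarity(seq1, seq2))                  # all columns
print(percent_similarity(seq1, seq2, exclude={4, 5}))  # ambiguous columns dropped
```

Because the mismatches fall in the excluded columns, the second figure is higher than the first, which is the direction of the difference noted above.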

Base compositional patterns

High GC content among angiosperm ITS sequences has been noted previously ( 2 ). Figure 3 (data in Appendix 1) shows that the conserved regions are inherently GC-rich. GC content in the variable regions appears correlated with that in the conserved regions but is more extreme and more erratic (Fig. 3 ). In 24/30 samples, GC content is lower in the variable regions than in the conserved.


Figure 3 . GC-content in conserved versus variable regions of angiosperm ITS2. Taxon acronyms are defined in Table 1.

The T content is most variable relative to the other nucleotides, perhaps related to the dual base-pairing relations of U in RNA secondary structure. Across our sample, G, C and A contents vary from 1.5-1.8-fold in conserved regions to 2.3-2.7-fold in the variable regions. In contrast, T content in these regions varies 3.7- and 9.8-fold, respectively.
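One plausible reading of "varies n-fold" is the ratio of the largest to the smallest proportion observed across taxa. The sketch below makes that assumption explicit; the function name and the example values are hypothetical and are not the data of Appendix 1.

```python
def fold_variation(proportions) -> float:
    """Ratio of the largest to the smallest base proportion across taxa."""
    values = list(proportions)
    if not values:
        raise ValueError("no proportions supplied")
    lowest = min(values)
    if lowest <= 0:
        raise ValueError("fold variation is undefined when a proportion is zero or negative")
    return max(values) / lowest


# Hypothetical T proportions in the variable regions of four taxa.
t_content = [0.05, 0.12, 0.20, 0.08]
print(f"{fold_variation(t_content):.1f}-fold")
```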

Secondary structure

Preliminary inputs into MFOLD yielded up to 16 secondary structures for each of the nine analyzed taxa. These varied markedly in the disposition of the coding and conserved ITS2 regions. In order to infer a `consensus' structure, the substructural disposition of the conserved regions was scored in all structures generated for all taxa. The criteria for deriving a set of mutually compatible consensus substructural features were, in order of preference: (i) common presence among all structures generated for all taxa; (ii) presence in at least one of the structures generated for each taxon; and (iii) presence in at least one of the structures for a majority of taxa. The consensus features are as follows: (i) pairing of the 3'-end of the 5.8S sequence with the 5'-end of the 26S sequence, as inferred experimentally in yeast ( 37 ) and believed common to eukaryotes ( 11 ); (ii) pairing of the 3'-end of c1 with the 5'-end of c2; (iii) pairing of the 3'-end of c2 with the 5'-end of c3; (iv) a long stem structure with c4 subterminal along the 5' flank; and (v) the 5'-end of c4 pairing with c5. All of these features co-occurred in only one MFOLD-generated structure, that for tomato (LYCO_ESCU). At least one of the MFOLD-generated structures for each taxon included three or four of the five substructural features, and we found that manual adjustments easily yielded the remainder. Thus, sequences were resubmitted to MFOLD using constraints for plausible base-pairing relations that would generate all five consensus substructural features.
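The three ordered criteria for consensus features can be written as a simple tiered test. The sketch below is one possible formalization, assuming each predicted structure is reduced to a set of feature labels; the function name, the labels and the taxon names are hypothetical, and the original scoring was done by inspection rather than by program.

```python
def classify_feature(feature: str, structures_by_taxon: dict):
    """Return the strongest criterion ('i', 'ii' or 'iii') a feature satisfies, or None.

    structures_by_taxon maps a taxon name to a list of structures, each
    structure being the set of substructural feature labels it contains.
    """
    in_all = [all(feature in s for s in structs) for structs in structures_by_taxon.values()]
    in_any = [any(feature in s for s in structs) for structs in structures_by_taxon.values()]
    if all(in_all):
        return "i"    # present in every structure of every taxon
    if all(in_any):
        return "ii"   # present in at least one structure of each taxon
    if sum(in_any) > len(in_any) / 2:
        return "iii"  # present in at least one structure for a majority of taxa
    return None


# Hypothetical foldings for three taxa, for illustration only.
foldings = {
    "TAXON_A": [{"5.8S-26S", "c1-c2", "c4-c5"}, {"5.8S-26S", "c2-c3"}],
    "TAXON_B": [{"5.8S-26S", "c1-c2"}, {"5.8S-26S", "c1-c2", "c2-c3", "c4-c5"}],
    "TAXON_C": [{"5.8S-26S", "c4-c5"}],
}
for label in ("5.8S-26S", "c4-c5", "c1-c2", "c3-c6"):
    print(label, classify_feature(label, foldings))
```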

Figure 4 illustrates nine putative angiosperm ITS2 secondary structures. The pairing relations of c1-5' and c6 vary. The former includes pairings with the ITS2-3' end (v6) in Figure 4 C, G and I. Figure 4 B and E includes c2-c6 pairing, but in Figure 4 E, this c2 tract also shows pairing potential with the unpaired region of c3. Figure 4 E is the only structure lacking a stem in v6.


Figure 4 . Secondary structural models for angiosperm ITS2 sequences (upper case letters) with portions of flanking coding regions (lower case letters). Taxon acronyms are defined in Table 1. Conserved regions (c1-c6; Table 2) are indicated.

Figure 4 A-I exaggerates structural similarities by depicting all stems at right angles and camouflaging differences in the size, asymmetry and disposition of bulges. The structural differences were more apparent in the LoopDLoop format ( 29 ), in which bulges affect stem orientation. The LoopDLoop diagrams of the structures in Figure 4 A-I show a prominent central loop in the region of the c1-c2-c3-c6 intersection, with variation in the orientation and angles of the emerging stems and also in the geometry of the long stem in the c4-c5 vicinity. Most of the LoopDLoop diagrams, however, orient the c2-c3 stem parallel to the v6 stem, whereas these are depicted at right angles in Figure 4 A-I.

DISCUSSION

The present results reveal that ITS2 sequence is more conserved among angiosperms than previously appreciated. Regions conserved at least among eudicots averaged 47% of the ITS2 length (conserved/total bases; Appendix 1), whereas 40% (c2-c4/total bases) is alignable across angiosperms. The previously-detected conserved motifs ( 10 ) included only ~10% of ITS2 length. Some variable-sequence regions exhibit structural conservation, e.g. the C- to G-rich transition in v1. At the same time, the present results do not challenge the notion that ITS spacers are hypervariable relative to other molecules. For example, the 5.8S sequence is essentially completely alignable and typically only 10% divergent across angiosperms (Hershkovitz and Lewis, submitted).

While we do not suggest that ITS is optimal or even suitable for angiosperm-wide phylogenetics, the present results do indicate that it contains sequence tracts diagnostic at more, and more inclusive, phylogenetic levels than previously demonstrated (cf. 2 ). The clustering of higher monocots, eudicots and Caryophyllales in Figure 2 suggests not only that conserved, diagnostic motifs exist, but also that ITS2 is not mutationally saturated at such divergence levels. Thus, ITS sequences might provide phylogenetic evidence auxiliary to that from other genes, and at a relatively small cost considering its short length.

The principal shortcoming of ITS for deeper-level investigation is the lack of a rigorous, adequately quantifiable analytical method applicable to highly length-variable sequence regions. The best developed methods (parsimony, minimum evolution, maximum likelihood; 38 ) all rely on a priori assumed homology of aligned sequence positions. The ITS2 data are not amenable to such methods partly because of the paucity of phylogenetically informative sites in the most confidently alignable portion of the conserved regions. The guide tree method circumvents the requirement for a priori alignment, but it is a clustering rather than phylogenetic method. Moreover, it yields no overall statistic for comparison to alternative trees, the data from different genes cannot be directly combined in a single analysis, and the robustness of individual groupings cannot be evaluated using such techniques as bootstrap or decay analyses (cf. 38 ). The value of the guide tree approach, however, is in its unique ability to exploit alignability information in a rapid sequence similarity comparison. This value might be optimized by the incorporation of corrections for empirically-estimated substitution and base compositional biases and dynamic gap-opening and extension weights based on sequence divergence.

We also deferred conventional analyses of the ITS2 data because of the sparse and sporadic sampling, which in turn affected confidence in the alignment. In fact, inadequate sampling can obfuscate any phylogenetic method using any data source ( 39 ). For example, we noted that the guide tree did not correctly cluster subf. Asteroideae (the two Heliantheae plus Senecio) among the 10 Asteraceae sequences; indeed, neither did parsimony analysis of seven Asteraceae rbcL sequences ( 40 ). The rbcL taxon sampling was smaller, but the sequence is ca. seven times longer than ITS2 and perfectly alignable. Therefore, the guide tree method might have performed better with broader sampling from among the ~20 000 Asteraceae species ( 12 ). Thus, while phylogenetically unsatisfactory guide tree results for members of singly-represented groups, e.g. Cucurbita and Betula, might result from phylogenetic signal loss via mutational saturation or convergence, they also simply may represent inadequate sampling.

For analyses at interspecific/intergeneric levels for which ITS sequences are typically exploited, the elucidation of conservation patterns will facilitate weighting and optimization of character-state reconstruction. Here, DNA sequence data provide a relatively large number of characters and discrete character states, but they have not afforded much `circumspection' for phylogenetic analysis, i.e., interpretability in view of ample data from the broader group of interest and in the context of obvious and potentially correlated developmental, physiological and/or environmental variables. Such potential for circumspection is especially limiting for ITS data given their high divergence, sporadic sampling, and the lack of even such obvious variability correlates as codon position. While these problems can be mitigated using techniques for estimating appropriate stochastic models of sequence evolution (e.g. Kimura-2-parameter; summarized in 38 ), empirically-observed patterns such as those elucidated here can be used to examine site-specific constraints on sequence evolution, e.g. sites where C-T transitions are less likely to occur than at others.
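For reference, the Kimura two-parameter distance mentioned above is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the observed proportions of transitions and transversions between two aligned sequences. The sketch below applies that standard formula; the function name and the handling of gaps and ambiguity codes (such columns are skipped) are assumptions made for illustration and were not part of the analyses reported here.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1    # purine<->purine or pyrimidine<->pyrimidine
        else:
            transversions += 1  # purine<->pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    w1 = 1 - 2 * p - q
    w2 = 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1 * math.sqrt(w2))


# Hypothetical pair differing by one transition (C<->T) in ten sites.
print(round(k2p_distance("ACGTACGTAC", "ACGTACGTAT"), 4))
```

The model treats every site alike, which is why the text contrasts it with empirically observed, site-specific constraints.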

The utility of the generalized model of angiosperm ITS2 secondary structure is 3-fold: (i) it provides a means for evaluating evidence for correlates of RNA sequence evolution, e.g. compensatory mutation; (ii) it provides an empirically testable model for functional analysis of the angiosperm ITS2; and (iii) it underscores the limitations of predicting RNA secondary structure using computer programs and single sequences. We will consider these points in turn.

Compensatory mutations, or secondary mutations that maintain RNA base-pairing relations consequential to an initial mutation, are regarded as correlated characters that should be downweighted in phylogenetic analyses ( 41 ). Differences among our (admittedly unproved) ITS2 models evidence structural evolution not involving paired-position compensation, e.g. in the c4-c5 stem, which consistently recurred among MFOLD structures of different taxa, but which varied in specific stem/bulge pattern and relative disposition of the conserved motifs. In interspecific-level comparisons, Baldwin et al. ( 2 , 42 ) reported that mutations in Calycadenia (Asteraceae) also showed no evidence of paired-position compensation in the putative ITS2 secondary structure. These results suggested that either compensatory mutation was not a significant evolutionary force, or that compensation had involved mutations at non-paired positions (`cryptic [mutational] non-independence'), or that the secondary structural models were incorrect ( 2 ). Olsthoorn et al. ( 43 ) provided evidence for compensatory mutation at non-paired positions (i.e. cryptic non-independence) during in vivo viral RNA evolution.

There are no clearly conserved primary ITS2 sequence motifs shared between angiosperms, green algae and yeast ( 11 ). Likewise, our predicted angiosperm ITS2 secondary structures differ from those experimentally deduced in yeasts ( 37 , 44 ) and proposed for green algae ( 4 ), as well as mosquitoes ( 7 , 45 ). In particular, the ITS2-3' sequence, highly variable in angiosperms, is conserved in yeasts, and it pairs with a 5' conserved sequence structurally analogous to the angiosperm c2 region. The yeast model, however, does include a long stem with a distal conserved pairing region, analogous to the stem formed between c3 and c6 in the angiosperm model and the c4-c5 pairing. Functional analyses of ITS2 in yeasts indicate that evolutionarily conserved secondary structural motifs are critical for rDNA processing ( 44 ). Thus, the evidence for secondary structural evolution suggests that the rDNA processing mechanism has also been phylogenetically labile.

Our attempt to reconcile secondary structure with angiosperm-wide ITS2 sequence conservation illustrates the problem inherent in inferring secondary structure from a single MFE folding of a single sequence. For example, our models differ markedly from Yeh and Lee's ( 37 ) computer-generated MFE folding of mung bean ( Vigna radiata ) ITS2, which they characterized as `surprisingly similar' to their biochemically-deduced yeast model. The putative similarity (and others claimed elsewhere among ITS2 secondary structures) is an illusory effect of comparing `Squiggle' diagrams ( 28 ), which force foldings to conform to limited geometric shapes and condense interior bulges of any length to fit within the fixed length of a single paired-base space. Yeh and Lee's ( 37 ) mung bean model pairs c4 and c5, but it also pairs the conserved ITS2 5'-end (including c1-c2) with the hypervariable 3'-end. We submitted to MFOLD the Vigna radiata sequence, but did not recover a structure similar to Yeh and Lee's model. Baldwin et al .'s computer-generated MFE folding of Calycadenia (Asteraceae; 2 , 42 ) is similar to our model for rice (Fig. 4 G) in pairing c1 with c2, c2 with c3, c4 with c5, and c6 with c2. The pairing indicated in Calycadenia between the c1-5' and v6-3', however, is not reflected in any of our models. We found that the Calycadenia sequence can be folded readily to include all of the consensus substructural elements, and the resulting structure (not shown) closely resembles our tomato model (Fig. 4 A). While we found the MFE criterion to be useful for developing our model, in only one case did we adopt a computer-generated MFE structure, and we found that multiple and often radically different structures have similar minimum free-energy (cf. 2 ). The limitation of MFE as a sole criterion in secondary structure prediction ( 46 , 47 ) is also underscored by experimental evidence for rDNA 5.8S structures having subminimal free-energy in Chlamydomonas and yeast ( 48 ). Thus, constraints imposed by as yet unknown intermolecular interactions and the cellular environment likely have a significant influence on secondary structure.

The prospects for advancing ITS secondary structural resolution and incorporation of this information into phylogenetic analysis will depend upon progress using biochemical, simulation ( 46 ) and phylogenetic/statistical approaches ( 47 , 49 ). The last offer the advantage of detecting character correlations independent of an a priori presumed secondary structural model. The limitation, however, is the requirement for an adequate number of phylogenetically independent correlated substitutions for a given pair of bases, i.e., from rare evolutionary events, one cannot distinguish between inevitable, necessary compensation and phylogenetic coincidence. We emphasize that the secondary structures presented in Figure 4 are for heuristic purposes, but, pending application of more advanced methods of secondary structural analyses, this simple correlation of structural with sequence conservation provides a preliminary basis for broader consideration of secondary structure in angiosperm ITS2 evolutionary analyses.
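To illustrate how character correlations might be detected without a presumed structural model, the sketch below computes the mutual information (in bits) between two alignment columns. This is a generic covariation measure chosen for the example; it is not the method of the cited phylogenetic/statistical studies, and the function name and handling of gaps are assumptions.

```python
import math
from collections import Counter


def column_mutual_information(alignment, i, j):
    """Mutual information (bits) between columns i and j of an alignment.

    `alignment` is a list of equal-length sequences; rows with a gap or
    ambiguity code in either column are ignored.
    """
    pairs = [
        (seq[i].upper(), seq[j].upper())
        for seq in alignment
        if seq[i].upper() in "ACGTU" and seq[j].upper() in "ACGTU"
    ]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi
```

For the toy alignment `["GAAC", "GAAC", "AAAU", "AAAU"]`, columns 0 and 3 covary perfectly and score 1 bit, yet the pattern could result from a single evolutionary event shared by two related taxa. A raw score of this kind treats every sequence as independent, which is exactly the limitation noted above: without an adequate number of phylogenetically independent substitutions, necessary compensation cannot be distinguished from coincidence.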

Besides phylogenetic and functional implications, the present results also enhance the value of ITS as a paradigm for DNA sequence evolution. The previously cited advantages of ITS included its high information content at lower phylogenetic levels and ease of amplification in diverse eukaryotes ( 2 ). The present analysis demonstrates that ITS2 also exhibits conserved sequence patterns diagnostic at many hierarchical levels and substantial alignability across angiosperms. The combination of angiosperm-wide sequence conservation with species-level sequence variability renders ITS a unique window for examining the behavior of a rapidly-evolving, homologous, non-coding DNA sequence through divergence times spanning relatively ancient (90-130 million years; 50 ) to the most contemporary.

ACKNOWLEDGEMENTS

We are indebted to Dave Swofford for access to and permission to publish results using prerelease versions of PAUP* 4.0, and to NIH-National Cancer Research Institute (Frederick, MD) for GCG access and computer support. Caleb Gordon (Department of Ecology and Evolutionary Biology, University of Arizona, Tucson, AZ) generated the Ravenala ITS2 sequence as part of an ongoing phylogenetic study of Zingiberales with EAZ and John Kress (Smithsonian Institution). William J. Hahn (Columbia University, NY) generously provided the Isomeris ITS2 sequence. We thank Bruce Baldwin and Louise Lewis for critical comments. MAH was supported by a Smithsonian Postdoctoral Fellowship in Molecular Evolution.

REFERENCES

1 Soltis,P.S. and Soltis,D.E. (1995) Ann. Missouri Bot. Gard., 82, 147.

2 Baldwin,B.G., Sanderson,M.J., Porter,J.M., Wojciechowski,M.F., Campbell,C.S. and Donoghue,M.J. (1995) Ann. Missouri Bot. Gard., 82, 247-277.

3 Vilgalys,R. and Sun,B.L. (1994) Proc. Natl. Acad. Sci. USA, 91, 4599-4603.

4 Bakker,F.T., Olsen,J.L. and Stam,W.T. (1995) J. Mol. Evol., 40, 640-651.

5 Coleman,A.W., Suarez,A. and Goff,L.J. (1994) J. Phycol., 30, 80-90.

6 Campbell,B.C., Steffen-Campbell,J.D. and Werren,J.H. (1993) Insect Mol. Biol., 2, 225-237.

7 Fritz,G.N., Conn,J., Cockburn,A. and Seawright,J. (1994) Mol. Biol. Evol., 11, 406-416.

8 Schlötterer,C., Hauser,M.-T., von Haeseler,A. and Tautz,D. (1994) Mol. Biol. Evol., 11, 513-522.

9 Vogler,A.P. and DeSalle,R. (1994) Mol. Biol. Evol., 11, 393-405.

10 Liu,J.-S. and Schardl,C.L. (1994) Plant Mol. Biol., 26, 775-778.

11 Hershkovitz,M.A. and Lewis,L.A., submitted.

12 Bremer,K. (1994) Asteraceae: Cladistics and Classification. Timber Press, OR.

13 Campbell,C.S., Donoghue,M.J., Baldwin,B.G. and Wojciechowski,M.F. (1995) Am. J. Bot., 82, 903-918.

14 Chase,M.W., Duvall,M.R., Hills,H.G., Conran,J.G., Cox,A.V., Eguiarte,L.E., Hartwell,J., Fay,M.F., Caddick,L.R., Cameron,K.G. and Hoot,S.B. (1995) In Rudall,P.J., Cribb,P.J., Cutler,D.F. and Humphries,C.J. (eds), Monocotyledons: Systematics and Evolution. Royal Botanic Gardens, Kew, London, pp. 109-137.

15 Dahlgren,R.M.T., Clifford,H.T. and Yeo,P.F. (1985) The Families of the Monocotyledons. Springer-Verlag, Berlin.

16 Kellogg,E.A. and Linder,H.P. (1995) Phylogeny of the Poales. In Rudall,P.J., Cribb,P.J., Cutler,D.F. and Humphries,C.J. (eds), Monocotyledons: Systematics and Evolution. Royal Botanic Gardens, Kew, London, pp. 511-542.

17 Ko,S.C., O'Kane,S.L.,Jr and Schaal,B.A. (1993) Rhodora, 95, 254-277.

18 Ritland,C.E., Ritland,K. and Straus,N.A. (1993) Mol. Biol. Evol., 10, 1273-1278.

19 Nickrent,D.L., Schuette,K.P. and Starr,E.M. (1994) Amer. J. Bot., 81, 1149-1160.

20 Milligan,B.G. (1992) In Hoelzel,A.R. (ed.), Molecular Analysis of Populations. IRL Press, Oxford, pp. 59-88.

21 Baldwin,B.G. (1992) Mol. Phyl. Evol., 1, 3-16.

22 Suh,Y., Thien,L.B., Reeve,H. and Zimmer,E.A. (1993) Amer. J. Bot., 80, 1042-1055.

23 Sambrook,J., Fritsch,E.F. and Maniatis,T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Ed. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

24 Smith,S. (1994) GDE version 2.2. University of Illinois, Urbana, IL.

25 Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673-4680.

26 Swofford,D.L. PAUP* version 4.0. Sinauer Assoc., Sunderland, MA, in press.

27 Zuker,M. (1989) Science, 244, 48-52.

28 Genetics Computer Group (1994) Program Manual for the Wisconsin Package, version 8. Genetics Computer Group, Madison, WI.

29 Gilbert,D.G. (1992) LoopDloop. Available via anonymous ftp to ftp.bio.indiana.edu

30 Chase,M.W., Soltis,D.E., Olmstead,R.G., Morgan,D., Les,D.H., Mishler,B.D., Duvall,M.R., Price,R.A., Hills,H.G., Qiu,Y.-L. et al. (1993) Ann. Missouri Bot. Gard., 80, 528-580.

31 Doyle,J.A., Donoghue,M.J. and Zimmer,E.A. (1994) Ann. Missouri Bot. Gard., 81, 419-450.

32 Hershkovitz,M.A. (1993) Ann. Missouri Bot. Gard., 80, 333-365.

33 Chase,M.W., Stevenson,D.W., Wilkin,P. and Rudall,P.J. (1995) In Rudall,P.J., Cribb,P.J., Cutler,D.F. and Humphries,C.J. (eds), Monocotyledons: Systematics and Evolution. Royal Botanic Gardens, Kew, London, pp. 685-730.

34 Manhart,J.R. and Rettig,J.H. (1994) In Behnke,H.-D. and Mabry,T.J. (eds), Caryophyllales, Evolution and Systematics. Springer-Verlag, Berlin, pp. 235-246.

35 Lidén,M. (1993) In Kubitzki,K., Rohwer,J.G. and Bittrich,V. (eds), The Families and Genera of Flowering Plants. Springer-Verlag, Berlin, Vol. 2, pp. 556-557.

36 Nickrent,D.L. and Soltis,D.E. (1995) Ann. Missouri Bot. Gard., 82, 208-234.

37 Yeh,L.-C.C. and Lee,J.C. (1990) J. Mol. Biol., 211, 699-712.

38 Swofford,D.L., Olsen,G.J., Waddell,P.J. and Hillis,D.M. (1996) In Hillis,D.M. and Moritz,C. (eds), Molecular Systematics. Sinauer Assoc., Sunderland, Massachusetts, 2nd Ed., pp. 407-514.

39 Penny,D., Lockhart,P.S., Steel,M.A. and Hendy,M.D. (1994) In Scotland,R.W., Siebert,D.J. and Williams,D.M. (eds), Models in Phylogeny Reconstruction, Systematics Association Special, Vol. 52. Clarendon Press, Oxford, pp. 211-230.

40 Michaels,H.J., Scott,K.M., Olmstead,R.G., Szaro,T., Jansen,R.K. and Palmer,J.D. (1993) Ann. Missouri Bot. Gard., 80, 742-751.

41 Wheeler,W.C. and Honeycutt,R.L. (1988) Mol. Biol. Evol., 5, 90-96.

42 Baldwin,B.G., Sanderson,M.J. and Donoghue,M.J. (1996) Ann. Missouri Bot. Gard., 83, 151.

43 Olsthoorn,R.C.L., Licis,N. and van Duin,J. (1994) EMBO J., 13, 2660-2668.

44 van Nues,R.W., Rientjes,J.M.J., Morré,S.A., Mollee,E., Planta,R.J., Venema,J. and Raué,H.A. (1995) J. Mol. Biol., 250, 24-36.

45 Wesson,D.M., Porter,C.H. and Collins,F.H. (1992) Mol. Phyl. Evol., 1, 253-269.

46 Gultyaev,A.P., van Batenburg,F.H.D. and Pleij,C.W.A. (1995) J. Mol. Biol., 250, 37-51.

47 Gutell,R.R., Larsen,N. and Woese,C.R. (1994) Microbiol. Rev., 58, 10-26.

48 Thompson,A.J. and Herrin,D.L. (1994) J. Mol. Biol., 236, 455-468.

49 Muse,S.V. (1995) Genetics, 139, 1429-1439.

