ABSTRACT
A method is described for measuring the diversity of combinatorial oligonucleotide libraries that entails extrapolating the base composition of a co-synthesized model library (dNC, N = A, C, G, T) to that of a multibase library template. The base composition of dNC was measured by HPLC. The ability of dNC to predict the base composition of a multibase library template was corroborated by measuring the composition of a 12 base combinatorial library. The base composition of the 12 base library was determined by several template dependent incorporation assays: measurement of restriction fragment specific activities from polymerase incorporation/restriction enzyme digests, template directed radionucleotide primer extension and quantitative dideoxynucleotide sequencing. Additionally, a convention for describing oligomeric combinatorial library (OCL) diversity is proposed. The convention uses a quantity termed the diversity quotient (Qd) to describe library breadth and the mole fraction of the least represented monomeric unit of the OCL to calculate minimum library quantity requirements. Similar methods/conventions could presumably be developed/adopted for non-nucleic acid libraries.
Random sequence oligodeoxyribo and ribonucleic acid libraries have been used to isolate and identify sequences that bind to sequence specific ligands and less obvious targets (recently reviewed in 1-6). Such libraries are composed of random sequence blocks flanked by defined/primer sequences. Defined/primer sequences are for the amplifications used during the selection process. In practice, a DNA library is prepared from a synthetic template using the polymerase chain reaction. RNA libraries are prepared by transcription from such DNA libraries. In either case, the first step in preparing an oligonucleotide library is solid phase synthesis of a library template. The diversity of a library's amplification/transcription products can at best equal the parent synthetic template's diversity. Development of sequence independent amplification/transcription protocols will ensure that an aptamer's diversity is equal to its parent template's diversity. Such protocols would also benefit selections by eliminating target independent sequence biasing.
The random sequence section of the template can be prepared by allowing a DNA synthesizer to mix the building blocks or by filling an auxiliary bottle with a building block mixture (7,8). In planning an in vitro selection experiment, one must consider the amount of library initially being screened for molecules that display the property of interest. The initial amount of library is dependent upon combinatorial sequence length, desired pool complexity and practicality. Due to the reaction volumes necessary for inclusion of all possible sequences, it may be impractical to begin a selection experiment with a library pool that contains all possible sequences. In any case, assaying a library's population distribution before the commencement of a selection protocol is desirable. This is sometimes accomplished by sequencing a number of the individual library members (9,10; M.Yarus, personal communication). Here we describe methods for measuring the diversity of a library's synthetic template, henceforth referred to interchangeably as the library or template.
The probability of finding a particular sequence in a library sample will depend upon the breadth of library diversity and the number of molecules sampled. Adequate breadth will serve to enhance the probability that failed or missed selections are not due to library deficiencies. It was recently reported that a first attempt at selecting binding sequences for a 19 base triplex forming oligonucleotide (TFO) failed due to deficiencies in the composition of the initially prepared template (9). Selections starting from a subsequently synthesized template successfully identified binding sites for the TFO. At the other end of the library length extreme, a class I ribozyme ligase has been isolated starting from a 220 nucleotide library (11). It was fortuitous that the ligase evolved from the pool since statistically such molecules should be isolated only once in 2000 selections (11). Intuitively, a deficient or biased library could make the odds even worse. Since library deficiencies can result in failed selection experiments and likely reduce the probability of fortuitous isolations, knowledge of a library's diversity prior to selection should enable one to design experiments that minimize library deficiencies.
To facilitate the discovery of novel structures using combinatorial techniques, it is hoped that the burden of library preparation will shift from user to secondary and/or commercial sources. In this event, it will be imperative that the user know the breadth of a supplier's library. It is anticipated that conventions for reporting library diversities and accepted methods for their measurement will be essential. Our immediate interest in combinatorial oligonucleotide libraries is derived from the preparation of duplex libraries for combinatorial type IIS restriction enzyme footprinting experiments (9,12). It occurred to us that having a quantity that describes a library's diversity, a quantity used for calculating minimal sequence representation and less demanding methods for their measurement would be desirable.
It was our hypothesis that the base composition of a template could be conveniently measured by determining the base composition of a co-synthesized one base library (dNC). To test this hypothesis, we showed that the composition of dNC was consistent with that of an identically prepared multibase library (dN12 lib). The base composition of the model was measured by HPLC; that of dN12 lib was measured by several template dependent incorporation assays (TDIA). The quantitative TDIA included measurement of restriction fragment specific activities from polymerase incorporation/restriction enzyme digests, template directed radionucleotide primer extension and quantitative dideoxynucleotide sequencing. Quantitative dideoxy sequencing revealed that base addition during solid phase synthesis is approximately sequence independent. These data support our contention that easily prepared and characterized one base model libraries should be useful for the characterization of any length combinatorial template. Similar methods could presumably be developed for non-nucleic acid oligomeric libraries.
To report library diversity, we propose that a quantity termed the diversity quotient (Qd) be used. Qd is the mole fraction of the least abundant combinatorial monomer (l) divided by that of the most abundant (m). By this convention a completely random library has the maximum Qd of 1. Libraries with Qds that are <1 have reduced molar diversities (i.e. fewer individual sequences per mole of library). The quantity of library that achieves minimal sequence representation for an initial selection experiment is calculated using the quantity l. Since l is the mole fraction of the least represented combinatorial monomer (L), the homo L oligomer is the rarest library member; if it is present, all other oligomers are statistically at least as likely to be represented. Since these two quantities are independent of oligomer structure, they could be used to describe the diversity of any oligomeric library.
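As a minimal sketch of these definitions, Qd and l can be computed from a table of monomer mole fractions. The function name and the composition values below are illustrative assumptions, not measurements reported in this work.

```python
def diversity_quotient(mole_fractions):
    """Return (Qd, l, m) for a mapping of monomer -> mole fraction."""
    total = sum(mole_fractions.values())
    if total <= 0:
        raise ValueError("mole fractions must sum to a positive value")
    # Normalize so the fractions sum to 1 before taking the extremes.
    normalized = {base: x / total for base, x in mole_fractions.items()}
    l = min(normalized.values())  # least represented monomer
    m = max(normalized.values())  # most represented monomer
    return l / m, l, m


# Hypothetical composition, chosen only for illustration.
example = {"A": 0.27, "C": 0.26, "G": 0.24, "T": 0.23}
qd, l, m = diversity_quotient(example)
print(f"Qd = {qd:.3f}, l = {l:.2f}, m = {m:.2f}")
```

Normalizing first means the same function accepts raw peak-area-derived ratios as well as fractions that already sum to 1.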
Unless otherwise noted, all materials and reagents were from Sigma. Radionucleotides were from Amersham. Oligonucleotides were prepared on a Biosearch 8600 DNA synthesizer using standard synthesis programs. The mixed base sequences were prepared by filling the auxiliary fifth base bottle ('U') with a solution prepared from a 3:3:2:2 molar ratio of dA:dC:dG:dT CED phosphoramidites (2,10).
The dimer standards were purified using conditions similar to the analytical separations. Dimer purity was assayed by HPLC and electrophoresis. Chromatograms of the individual dimers contained single, homogeneous peaks. Homogeneity was demonstrated by comparison of the up and down slope spectra with that of the apex. In all cases the up slope, down slope and apex spectra agreed, with correlation coefficients ≥0.999. 5'-32P-end-labeled dimers were resolved among themselves and shown to be single-banded by 20% PAGE. The concentrations of the individual dimers that were used for the preparation of the equimolar (1 mM each) HPLC standard were measured optically assuming the extinction coefficients for dAC, dCC, dGC and dTC are 10.6, 7.3, 8.8 and 8.1 mM^-1 cm^-1, respectively (13). The ratios of the dimer standards were also assayed by HPLC analysis of the exhaustively hydrolyzed (overnight digestion using snake venom phosphodiesterase from Crotalus durissus terrificus) dimer mixture (14,15). The analysis revealed that the dimers were quantitatively hydrolyzed to nucleosides. The ratios of the nucleosides, and hence the dimer ratios, were calculated assuming the 260 nm extinction coefficients are 15.4, 7.4, 11.5 and 8.7 mM^-1 cm^-1 for dA, dC, dG and dT, respectively (13).
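The optical concentration measurement reduces to the Beer-Lambert relation, concentration = A/(extinction coefficient × path length). The sketch below uses the dimer extinction coefficients quoted above; the absorbance reading, the 1 cm path length and the function name are assumptions made only for illustration.

```python
# Dimer extinction coefficients in mM^-1 cm^-1, as quoted in the text.
DIMER_EXTINCTION = {"dAC": 10.6, "dCC": 7.3, "dGC": 8.8, "dTC": 8.1}


def concentration_mM(absorbance, dimer, path_cm=1.0):
    """Beer-Lambert: concentration (mM) = A / (epsilon * path length)."""
    if path_cm <= 0:
        raise ValueError("path length must be positive")
    return absorbance / (DIMER_EXTINCTION[dimer] * path_cm)


# Hypothetical absorbance reading of a diluted dAC stock.
print(f"{concentration_mM(0.53, 'dAC'):.3f} mM")
```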
Analytical HPLC was performed using Shimadzu hardware (LC-10AT liquid chromatograph, SCL-10A system controller, SPD-M10AV diode array detector) and software (EZChrom v.3) with a 250 × 4.6 mm Vydac 218TP C18 reversed phase column. The mobile phases were 10% aqueous acetonitrile (A), 1 M triethylammonium acetate (B) (Glen Research) and water (C). The gradient was from 30 to 70% A (1 ml/min, 20 min); B remained at 10% throughout. Dilute acetonitrile (Buffer A) was used because the separations were not reproducible when a 3-7% acetonitrile gradient was attempted using neat acetonitrile.
Using standard methods (16), 15 µl of a 1:100 dilution of the crude template was 5'-32P-end-labeled and purified on a 12% denaturing polyacrylamide gel. The full length band was excised and eluted by agitating the gel slice in 1 mM EDTA overnight. The oligonucleotide was concentrated by repeated extraction with sec-butanol followed by ethanol precipitation. The pellet was suspended in 15 µl of water (1× library) and stored at -70°C between uses.
Modified T7 DNA polymerase/BglII. To seven 0.5 ml microcentrifuge tubes were aliquoted 6 µl of library mix [0.1× library, 8.4 µM P2 and 104 µM dNTPs (N = A, C, G, T)]. To tubes 1-7 were added respectively 1 µl of H2O, H2O, [α-32P]dATP, [α-32P]dATP, [α-32P]dCTP, [α-32P]dGTP and [α-32P]dTTP (3000 Ci/mmol, 10 Ci/l). To each tube was added 3 µl of diluted polymerase [3.33× reaction buffer (Amersham), 25 mM DTT, 0.32 U/µl modified T7 DNA polymerase (Amersham)]. The reactions were incubated at 37°C for 5 min, temperature ramped to 65°C at 2.5°C/min followed by a 30 min 65°C incubation. The tubes were centrifuged (15 s) and equilibrated at 37°C. To reactions 2 and 4-7 were added 1 µl of BglII (10 U/µl). The restriction digests were incubated at 37°C for 30 min and quenched by the addition of 5 µl of gel loading buffer (Amersham). After heat denaturation (2 min, 94°C), 5 µl aliquots were loaded onto a 0.4 mm × 40 cm long 12% denaturing polyacrylamide gel. The gel was run at 85 W (constant power) until the xylene cyanol marker had migrated ~2/3 down the gel. The bands were excised from the gel and quantitated by scintillation methods using a Beckman 3801LS scintillation counter.
Taq DNA polymerase/BglII, NsiI. To four tubes, each containing 4 µl of [α-32P]dNTP (N = A, C, G and T, respectively, at 3000 Ci/mmol, 10 Ci/l) was added 21 µl of PCR mix. The PCR mix was prepared to contain: 2 × 10^-5× dilution of the crude dN12 library, 2.3 µM P1 and P2, 1.2× PCR buffer, 120 µM dNTPs and 0.05 U/µl Taq DNA polymerase. The amplification protocol was 10 cycles of 95°C for 30 s, 65°C for 30 s, 72°C for 60 s, followed by 10 min incubation at 72°C and 12°C soak. To polish the 3'-termini, 10 U of T4 DNA polymerase was added to each reaction and incubated at 12°C for 15 min (17). The amplimers were spin filter purified (Qiagen) and concentrated by rotary evaporation. After adjusting the volumes to 45 µl, 20 µl of each reaction were aliquoted into two sets of four 0.5 ml microcentrifuge tubes. Aliquots of 2.5 µl each of 10× reaction buffer and BglII (10 U/µl) were added to each tube of the first set. 10× buffer and NsiI (20 U/µl) were identically added to the second set. The restriction digests were incubated at 37°C for 1 h. The reactions were quenched by the addition of 10 µl of gel loading buffer (Amersham). Denaturation, gel loading, electrophoresis and restriction fragment quantitation were as described for the modified T7 polymerase/BglII experiment.
Aliquots (9 µl) of extension mix [0.1× library, 43 nM P2, 1.12× reaction buffer (Amersham), 5 mM DTT, 0.2 U/µl modified T7 DNA polymerase (Amersham)] were dispensed into four tubes, each containing 1 µl of 3000 Ci/mmol, 10 Ci/l [α-32P]dATP, [α-32P]dCTP, [α-32P]dGTP and [α-32P]dTTP (Amersham), respectively. To maximize the ability of the polymerase to read through secondary structures, the reactions were incubated at 37°C for 5 min, temperature ramped to 65°C at 2.5°C/min followed by a 5 min incubation at 65°C (B. Ward, unpublished). The reactions were quenched and heat denatured as previously described. Aliquots (4 µl) were loaded in three sets onto a 20% denaturing polyacrylamide gel and electrophoresed at 85 W (constant power). To visualize the less intense bands, several sheets of film (XAR-5) were serially exposed to the gel. The first six bands (i.e. P2+1 through P2+6) were excised from the gel and scintillation counted. The triplicate scintillation data were averaged and the base compositions were calculated for the 72 bands as described in the appendix.
To three sets of sequencing reactions were aliquoted 2 µl of [α-33P]ddNTP (450 mCi/l, 1500 Ci/mmol) and 4 µl of reaction mix [1:6.25 reaction buffer (Amersham), 0.1× library, 0.1 µM P3, 0.6 U/µl modified Taq DNA polymerase (Amersham)]. To each set were added respectively 4 µl of 7.5, 3.75 and 1.87 µM dNTPs. The reactions were layered with 15 µl of mineral oil and ramped to 95°C. The ramp protocol was [temperature (°C)/time (min)]: 50/3, 55/3, 60/4, 65/6, 70/9, 75/9, 80/9, 85/6, 90/3, 95/3, 4/soak. The reactions were quenched, heat denatured and electrophoresed (12% PAGE) as described above. The relative amount of each termination product was measured by microdensitometry of three separate autoradiograms using a Molecular Dynamics Personal Densitometer SI. Band volumes were measured using identical rectangles that encompassed the entire band. Background was compensated for by subtracting the average volumes of five identically sized background rectangles from the band volumes. Base compositions were calculated as described in the appendix.
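The background correction described above amounts to subtracting the mean of the background rectangles from each band volume. A minimal sketch follows; the function name and the densitometer readings are hypothetical.

```python
def corrected_band_volume(band_volume, background_volumes):
    """Subtract the mean background rectangle volume from a band volume."""
    if not background_volumes:
        raise ValueError("at least one background rectangle is required")
    mean_background = sum(background_volumes) / len(background_volumes)
    return band_volume - mean_background


# Hypothetical readings (arbitrary units) for one band and five
# identically sized background rectangles.
backgrounds = [102.0, 98.0, 101.0, 99.0, 100.0]
print(corrected_band_volume(1350.0, backgrounds))
```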
In the search for oligomers that have a desired property, the quantity of library initially present to achieve a desired level of representation is of critical significance. Intuitively, libraries that are deficient in key populations will have a lower probability of evolving sequences demonstrating the desired properties. For experiments aimed at characterizing the consensus binding sites for sequence specific ligands, one would ideally want to begin a selection experiment with a library pool that contained all possible sequences. To avoid missed target sequences due to sequence concentration effects, an ideal experiment would also have all of the possible sequences at equal concentrations. In the search for sequences that display other specific properties, it may be that subsets of the combinatorial sequence be thoroughly represented. In some in vitro evolutions one may want to skew the library away from maximal diversity (18). In any case, knowing the base composition of the library will allow one to begin a selection experiment with a library pool that contains a desired level of sequence representation.
A library's diversity can minimally be described by its length and Qd. Since Qd is defined as l/m, for libraries containing j randomized positions the ratio of the least represented (Lj) to the most abundant (Mj) library member is Qd^j, the library's breadth. From this it follows that the ratio of any two library members is ≥Qd^j. Figure 1A shows that as a function of combinatorial sequence length, a library's breadth decreases rapidly as Qd decreases. Even modestly low Qds (e.g. Qd = 0.68) result in vanishingly small amounts (≤1%) of the minor sequences compared with the major at short library lengths (≥11 bases). Though Qd describes a library's breadth, it alone cannot be used to determine the amount of library necessary for a selection experiment. For this, the mole fraction of the least represented monomeric species suffices. Statistically, a single copy of Lj is expected in l^-j molecules. For libraries that contain an equal distribution of all nucleotides at each randomized position (i.e. Qd = 1), the probability (P) of a library sample containing a unique sequence is ~1 - e^-k, where k, the representation factor, is the number of molecules in the pool (p) divided by the number of possible monomeric species raised to the length of the combinatorial sequence (i.e. k = p/4^j) (19,20). For libraries with Qds <1, it follows that the probability of a library pool containing Lj, and hence all possible sequences, is given when k = p × l^j. To calculate the quantity of library necessary to attain a desired level of representation, one sets P equal to an acceptable probability and solves for p. For example, a pool of 4.6 × l^-j molecules will have a 99% probability (P = 0.99) of having one Lj. In such a pool, all members are represented with a probability that is ≥0.99. From this it is immediately apparent that the amount of library initially needed to achieve minimal sequence representation is critically dependent upon l.
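The relations in this paragraph (breadth as Qd raised to the power j, P = 1 - exp(-k) with k equal to p times l raised to the power j, and the inversion for p) can be sketched as follows. The function names are illustrative assumptions; the formulas are those stated in the text.

```python
import math


def breadth(qd, j):
    """Ratio of least to most abundant library member: Qd**j."""
    return qd ** j


def representation_probability(pool_molecules, l, j):
    """P = 1 - exp(-k), with k = p * l**j, for the rarest sequence."""
    k = pool_molecules * l ** j
    return 1.0 - math.exp(-k)


def pool_for_probability(prob, l, j):
    """Solve P = 1 - exp(-p * l**j) for the pool size p (molecules)."""
    if not 0.0 < prob < 1.0:
        raise ValueError("prob must lie strictly between 0 and 1")
    return -math.log(1.0 - prob) / l ** j


print(f"breadth at Qd = 0.68, j = 12: {breadth(0.68, 12):.4f}")
p = pool_for_probability(0.99, 0.25, 12)
print(f"pool for P = 0.99, l = 0.25, j = 12: {p:.3g} molecules")
```

Setting P = 0.99 gives -ln(0.01), about 4.6, which is the origin of the factor 4.6 in the example pool size.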
For example, it requires 180 µg (4.6 × 0.25^-25 molecules, P = 0.99) of a Qd = 1 (i.e. l = 0.25), 65 base library (25 combinatorial positions and two 20mer primer sites) to have at least one copy of each possible sequence. If l were 0.23, that representation is achieved with 1480 µg of library. Figure 1B shows for several values of l (l = 0.2-0.25) the amount of library necessary to attain singular representation as a function of combinatorial sequence length. As described in the Figure legend, the library amounts were normalized to a Qd = 1 library. From this it is seen that modest reductions in l result in large increases in the amount of library necessary to begin an all inclusive selection experiment. For libraries with combinatorial sequences that are beyond the practical limits of all inclusivity, k = p × l^j is the representation factor for calculating the lowest probability of finding a unique sequence. From this it is evident that Qd and l are adequate for describing any oligomeric library's breadth and for calculating pool requirements. For oligonucleotide libraries, it is sufficient to report the mole fraction of each monomeric unit. However, such a convention does not directly address a library's diversity. Qd intuitively expresses library diversity. This may particularly benefit libraries built from a larger number of monomers.
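The two masses quoted in this paragraph can be checked with a short calculation. The sketch assumes an average mass of 330 g/mol per nucleotide, which is a common rule of thumb and not a value stated in the text; the function and constant names are likewise illustrative.

```python
import math

AVOGADRO = 6.022e23   # molecules per mole
MASS_PER_NT = 330.0   # g/mol per nucleotide; assumed average, not from the text


def library_mass_ug(l, j, total_length, prob=0.99):
    """Mass (µg) of library giving probability `prob` of holding the rarest sequence."""
    molecules = -math.log(1.0 - prob) / l ** j   # p = -ln(1 - P) * l**(-j)
    moles = molecules / AVOGADRO
    return moles * MASS_PER_NT * total_length * 1e6  # g -> µg


for l in (0.25, 0.23):
    mass = library_mass_ug(l, j=25, total_length=65)
    print(f"l = {l}: {mass:.0f} µg")
```

Under this assumed nucleotide mass the results fall within a few percent of the 180 µg and 1480 µg figures, and the roughly eightfold increase between them comes entirely from the factor (0.25/0.23) raised to the 25th power.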
We have described a convenient method for assessing oligonucleotide library diversity. It was our hypothesis that the base composition of a co-synthesized one base library (dNC) would be approximately equal to that of a multibase library. This proposal was tested by comparing the base composition of dNC with that of a 12 base combinatorial library template (dN12 lib). The base composition of dNC was determined by HPLC; that of dN12 lib was measured by several template dependent incorporation assays (TDIA). The TDIA experiments yielded dN12 base compositions that were in excellent agreement with the base composition of dNC. From a quantitative dideoxy sequencing experiment we have shown that base addition during solid phase oligonucleotide synthesis is essentially independent of the growing oligomer. From these data we conclude that dNC should adequately model the diversity of any synthetic oligonucleotide library. Assuming that sequence independent amplification and transcription protocols are used, the model dimer will also predict the diversity of selection libraries. Assuming that building block addition is independent of all previous additions, similar methods could be developed for other types of oligomeric libraries.
A convention for reporting combinatorial oligomeric library diversity was proposed. The diversity quotient (Qd) and the mole fraction of the least represented monomer (l) minimally represent the diversity of such libraries. Qd is l/m, where m is the mole fraction of the most represented monomer. Libraries are most random at the Qd maximum of 1. Libraries with Qds that are <1 contain fewer members per mole. The probability that a library pool of p molecules will contain at least one copy of each possible library member is 1 - e^-k, where k is p × l^j. Qd and l have been presented here in the context of random sequence oligonucleotides. It is not necessary that these quantities be limited to the diversity of randomized nucleic acids. Due to their respective definitions, they can be universally applied to any oligomeric combinatorial library.
We thank, in alphabetical order, Drs A. Ellington, R. Hernon, R. Lirette, M. Van Dyke and M. Yarus for critical manuscript review and helpful suggestions. We also thank Dr J. Szostak for help with reference 10.
The mole fraction ratios calculated for the dimer library were based upon the standard's ratios according to Equation 1.
n, n′, AdNC and AdN′C are respectively the mole fractions and peak areas for bases N and N′. The subscripts lib and std refer to the library and standard. The mole fraction (n) of base N in dNClib was calculated using Equation 2. n, n′, n′′ and n′′′ are the mole fractions of dNC, dN′C, dN′′C and dN′′′C.

Incorporation of [α-32P]dNTPs (N = A, C, G or T) into copies of the library followed by restriction digestion produces restriction fragments with specific activities that are proportional to the number of Ns contained in the fragment. The relationship between the defined sequence fragments and the combinatorial sequence containing fragments is given by Equation 3.
Ndef and Ncomb are the number of known dNs contained in the defined and combinatorial sequence containing restriction fragments, j is the length of the combinatorial sequence and n is the mole fraction of dN in the combinatorial sequence. Bdef and Bcomb are the specific activities of the defined and combinatorial sequence containing bands. After rearrangement, the mole fraction of base dN in the combinatorial sequence is given by Equation 4.

The positional mole fraction (ni) of dN for the ith primer addition product (P2 + i) relative to the first labeled primer extension product (P2 + 1) is given by Equation 5. Bi is the specific activity of the P2 + i band and j is the length of the combinatorial sequence. In practice j = 6 (see below), though strictly the summation should include any bases incorporated after i = j. The summations result from the total P2 + i being the measured P2 + i and its elongated homologs. Similarly, ni was calculated relative to its preceding addition product according to Equation 6. The summation of the i's through j = 6 is the result of only being able to quantitate bands corresponding to P2 + 1 through P2 + 6. This is understandable because the specific activity of P2 + i (Bi) is proportional to the number of labeled nucleotides added to P2 times the mole fraction of dN in the template raised to the number of incorporated nucleotides less the P2 + i homologous addition products (i.e. Equation 7). For an idealized Qd = 1 library, the relative specific activities of P2 + 1 through P2 + 6 are: 1.00, 0.625, 0.250, 0.0859, 0.0273 and 0.00830.
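The six relative specific activities listed for an idealized Qd = 1 library can be reproduced numerically. The expression used in the sketch below, Bi proportional to i × n^i - (i + 1) × n^(i+1), is inferred here from the verbal description and the quoted values; it is an assumption and should not be read as a transcription of Equation 7. The function name is also illustrative.

```python
def relative_specific_activities(n, count=6):
    """Relative specific activities of P2+1 .. P2+count, normalized to P2+1.

    Assumed form (inferred, not copied from the source equation):
    B_i is proportional to i*n**i - (i + 1)*n**(i + 1).
    """
    raw = [i * n ** i - (i + 1) * n ** (i + 1) for i in range(1, count + 1)]
    return [b / raw[0] for b in raw]


# n = 0.25 corresponds to an idealized Qd = 1 library.
for i, b in enumerate(relative_specific_activities(0.25), start=1):
    print(f"P2+{i}: {b:.3g}")
```

With n = 0.25 the computed ratios agree with the listed values to three significant figures, which is the only support offered here for the assumed form.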
To minimize errors due to lane to lane variations, the positional dN mole fractions (ni) were calculated relative to the average of the defined sequence band volumes according to Equation 8 followed by normalization (i.e. ai + ci + gi + ti = 1). Vi and VN are the combinatorial and defined sequence autoradiogram band volumes respectively. N is the number of defined sequence band volumes used for the normalization. The mole fraction of dN in dN12 (n) was calculated according to Equation 9. The summations begin at i = 6 because the first nucleotide added to P3 that is complementary to a combinatorial position corresponds to P3 + 6.
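The normalization step (ai + ci + gi + ti = 1) can be sketched as below. Only the normalization is shown, since it is stated explicitly in the text; the raw values and the function name are hypothetical.

```python
def normalize_position(raw):
    """Scale the four raw positional values so that a + c + g + t = 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("raw values must sum to a positive number")
    return {base: value / total for base, value in raw.items()}


# Hypothetical raw values for one combinatorial position.
raw_position = {"A": 0.31, "C": 0.29, "G": 0.22, "T": 0.24}
for base, fraction in normalize_position(raw_position).items():
    print(f"{base}: {fraction:.3f}")
```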
Nucleic Acids Research
Introduction
Materials And Methods
dNC standard
dNC HPLC analysis
Library labeling and purification
[α-32P]dNTP incorporation/restriction enzyme digestion assay
Template directed radionucleotide primer extension
Quantitative dideoxynucleotide sequencing
Results And Discussion
Conclusion
Acknowledgements
References
APPENDIX
dNC HPLC (Equations 1 and 2)
Restriction fragment quantitation (Equations 3 and 4)
Homologous base addition quantitation (Equations 5-7)
Dideoxynucleotide termination product quantitation (Equations 8 and 9)
*To whom correspondence should be addressed. Tel: +1 800 521 8956; Fax: +1 314 772 6797; Email: bwardhome@rocketmail.com
Last modification: 10 Feb 1998
Copyright© Oxford University Press, 1998.