Nucleic Acids Research, 2002, Vol. 30, No. 23 e133
© 2002 Oxford University Press
Quantitative codon optimisation of DNA libraries encoding sub-random peptides: design and characterisation of a novel library encoding transmembrane domain peptides
Center for Genomics and Bioinformatics (CGB), Karolinska Institutet, 171 77 Stockholm, Sweden
*To whom correspondence should be addressed. Tel: +46 8 7286371; Fax: +46 8 327934; Email: zicai.liang{at}cgb.ki.se
Received June 25, 2002; Revised September 16, 2002; Accepted October 9, 2002
| ABSTRACT |
|---|
Codons for amino acids sharing similar chemical properties seem to cluster on the genetic codon table. Such a geographical distribution of the codons was exploited to create chemically synthesised DNA that encodes peptide libraries containing only a subset of the 20 natural amino acids. The frequency of each amino acid in the subset was further optimised by quantitatively manipulating the ratio of the four phosphoramidites during chemical synthesis of the libraries. Peptides encoded by such libraries show reduced complexity and can be enriched in peptides of a desired property, which makes the libraries more suitable for screening for functional peptides. Proof of concept for the codon-biased design of peptide libraries was shown by the design, synthesis and characterisation of a transmembrane peptide library that contains >80% transmembrane peptides, representing a 160-fold enrichment compared with a fully randomised library.
| INTRODUCTION |
|---|
Researchers have long explored the possible combinations of amino acids, initially by site-directed mutagenesis (1,2) and partial randomisation (3,4), both for the functional annotation of known genes and as a way of screening for peptides with novel properties. The 20 amino acids enable the creation of peptide libraries with very high complexity even when only short stretches of randomised regions are included. Although such libraries should contain peptides of a desired property, the specific peptide population may become extremely diluted, and screening for the peptides of interest may become difficult. One question is whether such a high degree of complexity is necessary in an artificial peptide library. Compared with the astronomical number of possible combinations of the 20 natural amino acids ($20^{400}$ species for peptides with an average size of 40 kDa), the number of natural proteins (100 000 in human cells) is just a fraction. Although the bias towards this fraction of peptides has evolutionary reasons, it nevertheless argues for fully functional systems with limited complexity. Aptamer screening, using random nucleic acids or ribonucleic acids, has demonstrated that an affinity tag can be obtained for almost any bio-molecule from a library composed of only four different nucleotide building blocks (5). Can this be done using peptide libraries of similar complexity?
Peptide libraries can be generated either by direct chemical synthesis of polypeptides or by using peptide-encoding nucleic acid libraries. DNA-encoded peptide libraries are being employed more and more due to their compatibility with peptide panning technologies such as phage display, bacterial display and ribosome display. The philosophy of DNA (mostly oligonucleotide) library construction is still heavily dominated by the concept of total randomisation using all 20 amino acids. However, more limited randomisation involving fewer amino acids is widely used in site-directed optimisation (maturation) during peptide screening. In spite of numerous successes in the application of fully randomised libraries, two obvious drawbacks are the abundance of truncated peptides (due to stop codons) and the conflict between the theoretical library complexity and the physical complexity of the libraries ($10^{11}$ for high-quality phage display libraries and $10^{12}$–$10^{13}$ for ribosome display libraries).
We have started to explore the concept of dissecting the totally randomised peptide library into a series of libraries that each contain only a subset of the amino acids at a predefined frequency. By doing so, we wish to transform the enormous, unstructured, fully randomised peptide library into a set of sub-libraries that are well structured in terms of a desired property, e.g. hydrophilicity, net charge, polarity or side chain size. In this paper we have designed a series of DNA-encoded peptide libraries containing a high percentage of transmembrane (TM) peptides as a proof of concept. Transmembrane receptors are typically anchored to the plasma membrane by a peptide stretch of 15–30 amino acids. These amino acids could function simply as an anchoring part of the receptor. However, recent data suggest that the membrane part of the receptor is important for intra- and inter-molecular interactions (6–13); e.g. it is known that interactions between the seven TM domains in G-protein coupled receptors (GPCRs) are important for receptor dimerisation and signalling. It has also been shown that peptides that perturb the interactions between the TM domains disrupt the dimerisation process and interfere with GPCR function (14–16). Thus, it is likely that the membrane part of the receptor has more functions than just anchoring the receptor to the membrane, e.g. mediation of specific interactions or conformational changes. Our TM peptide libraries can be used to screen for peptides that interfere with membrane–protein interactions.
| MATERIALS AND METHODS |
|---|
The natural clustering of the codons
Codons for the same amino acid (apart from serine, arginine and leucine) differ from each other only in the third nucleotide of the triplet (Table 1). Codons do not, however, cluster only by single amino acid. In our assessment of the codon table (Table 1), we observed that amino acids with similar properties seem to cluster in the same region of the codon table. One striking example is that the first column of the table contains only hydrophobic amino acids. Based on these phenomena, we have designed a series of so-called codon-biased nucleotide libraries that encode random peptides containing only a subset, or a biased selection, of the 20 natural amino acids.
Library construction
TM library 1, TM library 2, TM library 3 (Table 2) (BioAsia, Shanghai, China) and a randomised library [NNN]$_{18}$ (Interactiva, Germany) contain the library part flanked by a 5' start codon followed by an N-glycosylation signal (ATG AAT GCT AGT) and a 3' sequence harbouring a BamHI restriction site (GGA TCC TAT GCA CC). The yield from the synthesis of the TM libraries after PAGE purification was 5 nmol. This indicates a library size of $3 \times 10^{14}$. The construct was PCR amplified using primers 5'-C ACC TAA AGA ATG AAT GCT AGT and 5'-AAC TGA ATT AGG TGC ATA GGA to select for full-length oligos [(94°C 1 min, 39°C 1 min, 72°C 30 s) × 2 cycles; (94°C 30 s, 55°C 30 s, 72°C 30 s) × 32 cycles; 72°C 4 min]. The PCR-selected library was cloned into the pcDNA 3.1D/V5-His-TOPO vector (Invitrogen, Sweden).
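The relationship between the two primers and the flanking sequences can be checked with a short script. The sketch below is illustrative only: the helper names `reverse_complement` and `suffix_prefix_overlap` are hypothetical and not part of the original work, and the check looks for exact matches only, whereas real annealing also depends on temperature and buffer conditions.

```python
# Illustrative check that the 3' ends of the PCR primers match the
# constant flanks of the library oligo. Helper names are hypothetical.

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def suffix_prefix_overlap(primer: str, target: str) -> int:
    """Length of the longest primer 3' suffix that equals a prefix of target."""
    for k in range(min(len(primer), len(target)), 0, -1):
        if primer[-k:] == target[:k]:
            return k
    return 0

# Sequences as given in the text, with spaces removed.
FLANK_5 = "ATGAATGCTAGT"        # start codon + N-glycosylation signal
FLANK_3 = "GGATCCTATGCACC"      # 3' sequence harbouring the BamHI site
FWD_PRIMER = "CACCTAAAGAATGAATGCTAGT"
REV_PRIMER = "AACTGAATTAGGTGCATAGGA"

# The forward primer should end with the 5' flank; the reverse primer
# should end with the start of the reverse complement of the 3' flank.
fwd_match = suffix_prefix_overlap(FWD_PRIMER, FLANK_5)
rev_match = suffix_prefix_overlap(REV_PRIMER, reverse_complement(FLANK_3))

print(f"forward primer 3' match: {fwd_match} nt")
print(f"reverse primer 3' match: {rev_match} nt")
```

Under these assumptions the forward primer matches the whole 12 nt 5' flank and the reverse primer matches 11 nt at the end of the 3' flank, so only oligos carrying both flanks should be amplified.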
Library sequencing
We sequenced TM library 3 clones to verify that the peptide-encoding library had been synthesised as designed. We also sequenced clones from the randomised library to generate negative controls. DH10b cells (Life Technologies, Sweden) were transformed with the library and individual colonies were PCR amplified using the T7 primer (5'-TAA TAC GAC TCA CTA TAG GG) and the BGH reverse primer (5'-TAG AAG GCA CAG TCG AGG). Full-length inserts were selected on an agarose gel for sequencing, which was performed using DYEnamic ET Terminator Cycle Sequencing (Amersham Pharmacia, Sweden) and the T7 primer. Sequences were analysed using DNAtools (www.dnatools.dk).
Cellular localisation
Thirteen TM peptides and three peptides derived from a fully randomised library were selected for localisation studies. These were PCR amplified with primers 5'-CGA TCA GAG AAT TCA TGA ATG CTA GT-3' and 5'-GCT AGT TTA TTA GCT GGT GCA TAG GA-3' and then cloned into pEGFP-N3 (Clontech, USA) using BamHI and EcoRI sites. The TM peptides expressed from these constructs were all attached to the N-terminus of the recombinant green fluorescent protein (GFP). Constructs were sequence verified. The three random peptides were the only stop codon-free peptides out of 20 clones sequenced from a fully randomised library. We did not include additional random peptides as negative controls as it was demonstrated that only 0.1% of such random peptides could have a membrane localisation (17). We did not include the other 17 peptides from the random library in this assay because their stop codons would simply have abolished GFP translation. GFP was used as a negative control and the α2c-adrenergic receptor, fused to GFP, was used as a positive control. All constructs were transfected into Cos7 cells or 293 cells using Trans Fast transfection reagent (Promega, USA) and harvested after 24 h. The Cos7 cells were permeabilised (0.5% Triton X-100, 20 mM Tris–HCl pH 7.4, 50 mM NaCl, 300 mM sucrose, 3 mM MgCl2) for 1 min, washed twice in PBS, fixed in 3.7% formaldehyde (PBS) for 5 min, washed three times in PBS and mounted onto glass slides. The 293 cells were washed once in PBS and mounted. The cellular localisation of the GFP-coupled peptides was observed using a fluorescence microscope (Leica DMRXA).
| RESULTS AND DISCUSSION |
|---|
Library design using the codon bias approach
Because amino acids that have similar properties also share some bases in their triplets and therefore form clusters in the codon table (Table 1), it is possible to design encoded peptide libraries with different properties using codon-biased library design. This is done by defining each base of the triplet as a mixture of all bases: (T$_{A\%}$C$_{B\%}$A$_{C\%}$G$_{D\%}$), where A + B + C + D = 100%. The bases are mixed in the chosen proportions and used during the conventional synthesis of the oligo so that a biased randomness is created. Figure 1 shows a subset of libraries that can be constructed: e.g. a library containing hydrophilic, negatively charged peptides (50% D and 50% E) could be constructed by synthesising a library with the formula [GAN]$_n$, meaning that the first base in the triplet is (G$_{100\%}$), the second is (A$_{100\%}$) and the third is (A$_{25\%}$T$_{25\%}$G$_{25\%}$C$_{25\%}$). In a similar way many different libraries with varying properties can be designed.
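The effect of a given base mixture on the encoded amino acid frequencies can be worked out by enumerating all 64 codons. The following is a minimal sketch of that calculation, not the authors' own software: the function name `residue_frequencies` is hypothetical, the standard genetic code is assumed, and the three positions are assumed to be synthesised independently.

```python
from itertools import product

# Standard genetic code, with bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}

def residue_frequencies(pos1, pos2, pos3):
    """Expected amino acid frequencies for one codon of a biased library.

    Each argument maps a base to its fraction in the phosphoramidite
    mixture at that codon position; '*' in the result denotes a stop.
    Hypothetical helper, assuming independent incorporation per position.
    """
    freqs = {}
    for b1, b2, b3 in product(BASES, repeat=3):
        p = pos1.get(b1, 0.0) * pos2.get(b2, 0.0) * pos3.get(b3, 0.0)
        if p > 0:
            aa = CODON_TABLE[b1 + b2 + b3]
            freqs[aa] = freqs.get(aa, 0.0) + p
    return freqs

uniform = {b: 0.25 for b in BASES}

# [GAN]: G at position 1, A at position 2, equimolar mix at position 3.
gan = residue_frequencies({"G": 1.0}, {"A": 1.0}, uniform)

# [NNN]: fully randomised codon, for comparison.
nnn = residue_frequencies(uniform, uniform, uniform)

print(gan)
print(f"stop codons per NNN codon: {nnn['*']:.4f}")
```

For [GAN] the result contains only D and E in equal proportion, as stated in the text; replacing the three dictionaries with other mixtures gives the expected composition of any other codon-biased design.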
Construction and evaluation of a TM domain peptide library
The amino acid distribution in TM domain peptides shows an amino acid bias that could be compatible with codon-biased library design. The hydrophobic amino acids, which are highly abundant in natural TM domains, appear to form a group that has T as the second base in their triplet. Similarly, the charged amino acids, which are of low abundance and whose existence close to each other in a peptide drastically reduces the likelihood of the peptide to appear in the membrane (18), have A as their second base. Based on such observations three different libraries were designed with each one being a refinement of the previous one (Table 2).
The quality of the TM libraries was evaluated using three different criteria: (i) the percentage of the encoded peptides (18mers) with stop codons; (ii) the occurrence of amino acid compositions that are defined as less likely to appear in a natural TM domain [Gromiha (18) concluded that an 18mer peptide was unlikely to be a TM peptide if it contained three or more amino acids out of lysine, glutamate, arginine, aspartic acid, glutamine, asparagine, histidine and proline]; (iii) how well the library mimics the relative amino acid abundance in naturally occurring TM domains.
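Criteria (i) and (ii) can be expressed as a simple classification rule. The sketch below is an illustration under stated assumptions rather than the procedure used in the paper: the function name `classify_peptide` and the three example peptides are hypothetical, stop codons are written as `*`, and the threshold of three disfavoured residues is taken from the description of Gromiha's rule above.

```python
# Residues regarded as unfavourable in a TM segment: K, E, R, D, Q, N, H, P.
NON_TM_RESIDUES = set("KERDQNHP")

def classify_peptide(peptide: str, threshold: int = 3) -> str:
    """Classify a translated 18mer (one-letter code, '*' = stop codon).

    Hypothetical helper: returns 'truncated' if a stop codon is present,
    'non-TM' if at least `threshold` disfavoured residues occur,
    and 'TM-like' otherwise.
    """
    if "*" in peptide:
        return "truncated"
    disfavoured = sum(residue in NON_TM_RESIDUES for residue in peptide)
    return "non-TM" if disfavoured >= threshold else "TM-like"

# Invented example peptides, not sequences from the library.
examples = [
    "LVLAVILLFVGALLVLIV",   # no disfavoured residues
    "LVKAVIELFVRALLVLIV",   # K, E and R present
    "LVLAVILL*VGALLVLIV",   # internal stop codon
]
for pep in examples:
    print(pep, classify_peptide(pep))
```

Applying such a rule to every translated clone gives the fractions of truncated and non-TM peptides that criteria (i) and (ii) refer to.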
Of the encoded 18mer peptides in a totally randomised library, 84% contained a stop codon at some position. This is improved 20-fold in TM library 3 (the proportion that contains stop codons is calculated as 4%). The occurrence of non-TM peptides, which are defined as peptides that contain three or more non-TM amino acids, was calculated to decrease from 97% in a randomised library to 10% in TM library 3. If all peptides that are either truncated or non-TM are regarded as non-functional, TM library 3 theoretically contains 86% functional peptides that can enter the membrane, in contrast to 0.5% for a totally randomised library (Table 3). Figure 2A shows the theoretical amino acid abundance in the TM libraries and in a totally randomised library, together with the natural occurrence in TM domains (18). In general, the TM libraries do not differ much from each other in the abundance of the different amino acids, and they mimic the natural occurrence well. All three libraries have an over-representation of arginine, valine and leucine, the last two being the most abundant amino acids in the libraries. Glutamine, asparagine, histidine, glycine, cysteine, tyrosine and tryptophan are under-represented in all three libraries but are not depleted and will appear in some of the peptides.
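The functional fractions quoted above can be reproduced from the stop-codon and non-TM percentages. The short calculation below is a sketch that assumes the two defects occur independently, so that the functional fraction is the product of the two complementary fractions; this assumption and the function name `functional_fraction` are ours, chosen because they reproduce the reported values.

```python
def functional_fraction(p_stop: float, p_non_tm: float) -> float:
    """Fraction of peptides that are neither truncated nor non-TM.

    Hypothetical helper; assumes stop codons and unfavourable
    composition are independent events.
    """
    return (1.0 - p_stop) * (1.0 - p_non_tm)

# Percentages taken from the text above.
tm_library_3 = functional_fraction(p_stop=0.04, p_non_tm=0.10)
randomised = functional_fraction(p_stop=0.84, p_non_tm=0.97)

print(f"TM library 3:       {tm_library_3:.1%}")   # reported as 86%
print(f"randomised library: {randomised:.1%}")     # reported as 0.5%
```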
Library sequencing
To verify that the chemically synthesised DNA library can be constructed according to the theoretical design, 50 independent clones from TM library 3 were sequenced. One of the clones was found to contain only 53 nt instead of the 54 nt specified by the formula. This probably represents a truncated product from the chemical synthesis. Of the remaining 49 clones, three contained a stop codon and six encoded peptides that are unlikely to be TM peptides due to their amino acid composition (Fig. 3). Thus the experimental data suggested that approximately 82% (40 of 49) of the clones from library 3 encode TM peptides, which is close to the expected frequency of 86%. Figure 2B shows the amino acid abundance of the 49 sequenced full-length encoded peptides, the theoretically calculated values for TM library 3 and the values for natural TM domains. The sequenced library displays the amino acid abundance that we predicted and mimics the amino acid distribution in natural TM domains.
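The observed frequency follows directly from the clone counts. A minimal sketch of the arithmetic, using only the numbers given in the paragraph above (the variable names are ours):

```python
# Clone counts from the sequencing of TM library 3.
clones_sequenced = 50
clones_truncated = 1      # 53 nt instead of 54 nt
clones_with_stop = 3
clones_non_tm = 6         # unfavourable amino acid composition

full_length = clones_sequenced - clones_truncated
tm_clones = full_length - clones_with_stop - clones_non_tm
observed_tm_fraction = tm_clones / full_length

print(f"{tm_clones} of {full_length} full-length clones "
      f"({observed_tm_fraction:.1%}) encode putative TM peptides")
```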
Cellular localisation
In order to study where the TM peptides localise within the cell, 13 peptides were fused to GFP and expressed in Cos7 or 293 cells. All 13 peptides were highly expressed and non-toxic. Ten of the 13 peptides were able to target the GFP to the plasma membrane, although some cytosol localisation was observed (Fig. 4B–N). The occurrence of functional TM peptides in library 3 (77%) is close to what we predicted (86%). To generate negative controls, 20 clones from a fully randomised library were sequenced. Of these, 17 contained stop codons and were not used in the localisation assay. None of the remaining three peptides were able to target GFP to the outer membrane (Fig. 4O–Q). In summary, we have drastically increased the occurrence of functional TM peptides in TM library 3 (10/13) compared with a fully randomised library (0/20).
| CONCLUSION |
|---|
Peptide ligands are important resources for functional proteomics and drug development. Different partially randomised schemes have been employed for constructing peptide libraries through chemical synthesis (19). By using codon-biased library design it is possible to formulate a systematic strategy to generate different peptide libraries with small subsets of all amino acids or a strong bias towards some amino acids with shared properties. Such libraries, encoded by chemically synthesised oligonucleotides and containing only a fraction of all possible peptides, may provide a shortcut for the screening of peptide ligands. Compared with the chemical synthesis of peptide libraries, the generation and re-generation of codon-biased libraries is much easier and very flexible. The construction of the library involves only standard molecular biology techniques, and the library can be integrated easily into phage display, bacterial display or ribosome display protocols for high throughput screening.
The main limitation of the codon-directed approach is that the flexibility, in terms of amino acid combination, is limited by the geographical distribution of codons in the codon table. This is reflected by either the unwanted incorporation of stop codons and other amino acids, or a partial loss of some amino acids (especially Trp, due to its geographical linkage with the stop codons). Even with a severe bias against Trp in the TM libraries, the abundance of functional TM peptides containing this residue is still higher in our libraries than in a fully randomised library. This is mainly due to the fact that our library has an approximately 160-fold elevated abundance of TM peptides. Similar mathematics could apply to many of the codon-biased libraries.
| ACKNOWLEDGEMENT |
|---|
This work was supported by a grant from the Center for Genomics and Bioinformatics, Karolinska Institutet and Pharmacia Corporation to Z.L.
| REFERENCES |
|---|
1. Ruvkun,G.B. and Ausubel,F.M. (1981) A general method for site-directed mutagenesis in prokaryotes. Nature, 289, 85–88.
2. Traboni,C., Ciliberto,G. and Cortese,R. (1982) A novel method for site-directed mutagenesis: its application to an eukaryotic tRNAPro gene promoter. EMBO J., 1, 415–420.
3. Hubner,P., Iida,S. and Arber,W. (1988) Random mutagenesis using degenerate oligodeoxyribonucleotides. Gene, 73, 319–325.
4. Munir,K.M., French,D.C. and Loeb,L.A. (1993) Thymidine kinase mutants obtained by random sequence selection. Proc. Natl Acad. Sci. USA, 90, 4012–4016.
5. Jayasena,S.D. (1999) Aptamers: an emerging class of molecules that rival antibodies in diagnostics. Clin. Chem., 45, 1628–1650.
6. Bouvier,M. (2001) Oligomerization of G-protein-coupled transmitter receptors. Nat. Rev. Neurosci., 2, 274–286.
7. Overton,M.C. and Blumer,K.J. (2000) G-protein-coupled receptors function as oligomers in vivo. Curr. Biol., 10, 341–344.
8. George,S.R., Fan,T., Xie,Z., Tse,R., Tam,V., Varghese,G. and O'Dowd,B.F. (2000) Oligomerization of mu- and delta-opioid receptors. Generation of novel functional properties. J. Biol. Chem., 275, 26128–26135.
9. Tao,Y.X., Abell,A.N., Liu,X., Nakamura,K. and Segaloff,D.L. (2000) Constitutive activation of G protein-coupled receptors as a result of selective substitution of a conserved leucine residue in transmembrane helix III. Mol. Endocrinol., 14, 1272–1282.
10. Li,S.C., Deber,C.M. and Shoelson,S.E. (1994) An irregularity in the transmembrane domain helix correlates with the rate of insulin receptor internalization. Biochemistry, 33, 14333–14338.
11. Han,M., Smith,S.O. and Sakmar,T.P. (1998) Constitutive activation of opsin by mutation of methionine 257 on transmembrane helix 6. Biochemistry, 37, 8253–8261.
12. Lin,Y., Jian,X., Lin,Z., Kroog,G.S., Mantey,S., Jensen,R.T., Battey,J. and Northup,J. (2000) Two amino acids in the sixth transmembrane segment of the mouse gastrin-releasing peptide receptor are important for receptor activation. J. Pharmacol. Exp. Ther., 294, 1053–1062.
13. Pfeiffer,M., Koch,T., Schroder,H., Klutzny,M., Kirscht,S., Kreienkamp,H.J., Hollt,V. and Schulz,S. (2001) Homo- and heterodimerization of somatostatin receptor subtypes. Inactivation of sst(3) receptor function by heterodimerization with sst(2A). J. Biol. Chem., 276, 14027–14036.
14. Hebert,T.E., Moffett,S., Morello,J.P., Loisel,T.P., Bichet,D.G., Barret,C. and Bouvier,M. (1996) A peptide derived from a beta2-adrenergic receptor transmembrane domain inhibits both receptor dimerization and activation. J. Biol. Chem., 271, 16384–16392.
15. George,S.R., Lee,S.P., Varghese,G., Zeman,P.R., Seeman,P., Ng,G.Y. and O'Dowd,B.F. (1998) A transmembrane domain-derived peptide inhibits D1 dopamine receptor function without affecting receptor oligomerization. J. Biol. Chem., 273, 30244–30248.
16. Zhu,X. and Wess,J. (1998) Truncated V2 vasopressin receptors as negative regulators of wild-type V2 receptor function. Biochemistry, 37, 15773–15784.
17. Peelle,B., Lorens,J., Li,W., Bogenberger,J., Payan,D.G. and Anderson,D.C. (2001) Intracellular protein scaffold-mediated display of random peptide libraries for phenotypic screens in mammalian cells. Chem. Biol., 8, 521–534.
18. Gromiha,M.M. (1999) A simple method for predicting transmembrane alpha helices with better accuracy. Protein Eng., 12, 557–561.
19. Nishikawa,K., Sawasdikosol,S., Fruman,D.A., Lai,J., Songyang,Z., Burakoff,S.J., Yaffe,M.B. and Cantley,L.C. (2000) A peptide library approach identifies a specific inhibitor for the ZAP-70 protein tyrosine kinase. Mol. Cell, 6, 969–974.




