Nucleic Acids Research, 2003, Vol. 31, No. 13 3763-3766
© 2003 Oxford University Press
CODEHOP (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) PCR primer design
Department of Pathobiology, School of Public Health and Community Medicine, University of Washington, Seattle, WA 98195, USA and ¹Howard Hughes Medical Institute, Fred Hutchinson Cancer Research Center, 1100 Fairview Avenue North, Seattle, WA 98109, USA
*To whom correspondence should be addressed. Tel: +1 206 616 2084; Fax: +1 206 543 3873; Email: trose{at}u.washington.edu
Received February 12, 2003; Revised and Accepted March 20, 2003
| ABSTRACT |
|---|
We have developed a new primer design strategy for PCR amplification of distantly related gene sequences based on consensus-degenerate hybrid oligonucleotide primers (CODEHOPs). An interactive program has been written to design CODEHOP PCR primers from conserved blocks of amino acids within multiply-aligned protein sequences. Each CODEHOP consists of a pool of related primers containing all possible nucleotide sequences encoding 3-4 highly conserved amino acids within a 3' degenerate core. A longer 5' non-degenerate clamp region contains the most probable nucleotide predicted for each flanking codon. CODEHOPs are used in PCR amplification to isolate distantly related sequences encoding the conserved amino acid sequence. The primer design software and the CODEHOP PCR strategy have been utilized for the identification and characterization of new gene orthologs and paralogs in different plant, animal and bacterial species. In addition, this approach has been successful in identifying new pathogen species. The CODEHOP designer (http://blocks.fhcrc.org/codehop.html) is linked to BlockMaker and the Multiple Alignment Processor within the Blocks Database World Wide Web site (http://blocks.fhcrc.org).
| INTRODUCTION |
|---|
Sequence comparison of multiple members of protein families has proven to be an important approach in the analysis of protein structure and function. The identification of blocks of conserved amino acid sequences has revealed the presence of protein motifs and domains that play important roles in protein function. However, such sequence analysis is limited by the number of individual protein sequences available for comparison. Even with the recent advances in whole genome sequencing, the acquisition of new members of a targeted protein family is limited. Previous methods to isolate unknown family members by PCR have relied on either degenerate primers consisting of a pool of primers containing most or all of the possible nucleotide sequences encoding a conserved amino acid motif or consensus primers consisting of a single primer containing the most common nucleotide at each codon position within the motif. Although these strategies have been successful in isolating closely related sequences, they have generally failed when sequences were more distantly related or were in low copy number. We have developed the COnsensus-DEgenerate Hybrid Oligonucleotide Primer (CODEHOP) PCR strategy for the identification of new members of protein families, which overcomes problems inherent in both degenerate and consensus methods for primer design (1).
| CODEHOP STRATEGY |
|---|
Short regions of proteins with high levels of conservation can be represented as ungapped blocks of multiply aligned protein sequences (Fig. 1) (2). CODEHOPs are derived from these conserved sequence blocks (http://blocks.fhcrc.org/codehop.html), and are used in PCR to amplify the region between them. A CODEHOP PCR primer consists of a pool of primers, each containing a different sequence in the 3' degenerate core region, where each primer provides one of the possible codon combinations encoding a targeted motif of 3-4 conserved amino acids within the sequence block (Fig. 2). In addition, each primer in the pool has an identical 5' consensus clamp region derived from the most probable nucleotide at each position encoding the conserved amino acids flanking the targeted motif. Amplification initiates by annealing and extension of the primers in the pool with the most similarity in the 3' degenerate core to the target template (Fig. 3). Annealing is stabilized by the 5' consensus clamp, which partially matches the target template. Once the primer is incorporated, it becomes the template for subsequent amplification cycles. Because all primers are identical in the 5' consensus clamp region, they will all anneal at high stringency during subsequent rounds of amplification. This increases the efficiency of the PCR amplification and differentiates the CODEHOP technique from the less efficient consensus PCR or degenerate PCR techniques. The CODEHOP technique has been validated by the successful amplification of new members of protein families that have proven challenging using conventional methods (1).
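The structure of a CODEHOP pool can be illustrated with a minimal Python sketch. It assumes the degenerate core is written with the standard IUPAC ambiguity codes; the function names and the clamp and core sequences are invented for illustration and are not taken from the CODEHOP program.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def core_degeneracy(core):
    """Number of distinct sequences encoded by an IUPAC-coded core."""
    n = 1
    for base in core:
        n *= len(IUPAC[base])
    return n

def expand_codehop(clamp, core):
    """Return every primer in the pool: identical 5' clamp plus one core variant."""
    variants = product(*(IUPAC[base] for base in core))
    return [clamp + "".join(v) for v in variants]

# Hypothetical sequences, invented for this example.
clamp = "GGCTACCTGAAGCTGTTC"  # 5' consensus clamp (non-degenerate)
core = "GAYTGGATGAA"          # 3' core for Asp-Trp-Met-Lys, ending on an invariant base

pool = expand_codehop(clamp, core)
for primer in pool:
    print(primer)
```

Every primer in the pool shares the clamp and differs only in the core, which is why all of them can anneal to the amplified product at high stringency in later cycles.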
| CODEHOP PROGRAM |
|---|
The CODEHOP program consists of the following nine steps.
1. In order to identify target amino acid motifs that are conserved between members of a gene family, a set of related sequences is aligned and conserved regions are determined. This can be done conveniently using the BlockMaker program (http://blocks.fhcrc.org/blockmkr/make_blocks.html) or the Multiple Alignment Processor (http://blocks.fhcrc.org/process_blocks.html) at the Blocks Database website. The resulting conserved blocks of amino acid sequences can be used directly as input for the CODEHOP program through a hyperlink (see Fig. 1). Additionally, blocks derived from annotated protein families can be obtained directly from the Blocks Database (http://blocks.fhcrc.org).
2. A position-specific scoring matrix (PSSM) is computed for each block using the odds-ratio method (3).
3. For each position of the block, a consensus amino acid is chosen as the highest scoring amino acid in the matrix.
4. The most common codon for each consensus amino acid in the block is determined using the selected codon usage table. This selection is used for the default 5' consensus clamp determination in step 8.
5. A DNA PSSM is calculated from the amino acid matrix in step 2 using the selected codon usage table. The score for each amino acid within a position in the matrix is divided among its codons in proportion to their relative frequencies from the codon usage table, and the scores for each of the four different nucleotides are combined in each DNA matrix position.
6. The degeneracy is determined by the number of bases at each position of the DNA matrix.
7. Degenerate core regions are evaluated by scanning the DNA matrix in the 3' to 5' direction to determine the overall codon degeneracy within 11 or 12 positions ending on a codon boundary. The degeneracy of a core region is the product of the number of possible nucleotides in each position. A core region must start on an invariant 3' nucleotide position and have a maximum degeneracy of 128 (current default), which can be changed.
8. A 5' consensus clamp region from step 4 or 5 is added to candidate degenerate core regions. The length of the clamp region is controlled by an annealing temperature calculation (4). The current default is 60°C, which usually provides a clamp of about 20 nucleotides.
9. For primers corresponding to the opposite strand of DNA, steps 7 and 8 are repeated on the reverse complement of the DNA matrix from step 5.
Examples of CODEHOP PCR primers designed from Blocks D and E of the cytosine DNA methyltransferases are provided in Figure 4.
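Steps 5 to 7 can be sketched in a few lines of Python. This is an illustrative reading of the description above, not the program's actual code: the codon frequencies and block weights are invented, any nucleotide with a non-zero score is counted as present, and the window rule (3' end invariant, 5' end on a codon boundary, 11 or 12 positions) is one interpretation of step 7.

```python
from math import prod

# Hypothetical relative codon frequencies for a few amino acids; a real run
# would use the codon usage table selected for the target organism.
CODON_USAGE = {
    "D": {"GAT": 0.46, "GAC": 0.54},
    "E": {"GAA": 0.42, "GAG": 0.58},
    "K": {"AAA": 0.43, "AAG": 0.57},
    "M": {"ATG": 1.0},
    "W": {"TGG": 1.0},
}

def dna_pssm(aa_columns):
    """Step 5 (sketch): spread each amino acid score over its codons.

    aa_columns holds one {amino_acid: score} dict per block position.
    Returns one {nucleotide: score} dict per DNA position (3 per amino acid).
    """
    matrix = []
    for column in aa_columns:
        positions = [dict.fromkeys("ACGT", 0.0) for _ in range(3)]
        for aa, score in column.items():
            for codon, freq in CODON_USAGE[aa].items():
                for i, base in enumerate(codon):
                    positions[i][base] += score * freq
        matrix.extend(positions)
    return matrix

def position_degeneracy(dna_column):
    """Step 6 (sketch): count the nucleotides with a non-zero score."""
    return sum(1 for score in dna_column.values() if score > 0)

def core_degeneracy(matrix, start, end):
    """Step 7 (sketch): product of per-position degeneracies over [start, end)."""
    return prod(position_degeneracy(col) for col in matrix[start:end])

def candidate_cores(matrix, max_degeneracy=128, lengths=(11, 12)):
    """Scan 3' to 5' for cores with an invariant 3' end and a codon-aligned 5' end."""
    found = []
    for end in range(len(matrix), 0, -1):  # end is exclusive
        if position_degeneracy(matrix[end - 1]) != 1:
            continue  # the 3' nucleotide must be invariant
        for length in lengths:
            start = end - length
            if start < 0 or start % 3 != 0:
                continue  # the 5' end must sit on a codon boundary
            deg = core_degeneracy(matrix, start, end)
            if deg <= max_degeneracy:
                found.append((start, end, deg))
    return found

# Toy block of four positions: D or E, then W, M, K (weights are invented).
block = [{"D": 0.7, "E": 0.3}, {"W": 1.0}, {"M": 1.0}, {"K": 1.0}]
matrix = dna_pssm(block)
for start, end, deg in candidate_cores(matrix):
    print(start, end, deg)
```

In this toy block the only candidate spans the first 11 DNA positions: the third base of the D/E codon is fully degenerate and the last base of the lysine codon is excluded because it is not invariant.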
| USING THE CODEHOP PROGRAM |
|---|
The CODEHOP designer can be biased towards selected sequences within a block of multiply aligned sequences. This is useful for targeting specific orthologous or paralogous members of a protein family. The set of blocks returned by BlockMaker for CODEHOP input may be analyzed phylogenetically to permit convenient selection of a subset of related blocks using the ProWeb TreeViewer link from the BlockMaker output. To emphasize or de-emphasize a particular sequence, the weight provided for each sequence segment in the BlockMaker output (5) can be manually altered, thus changing its contribution to the primer design (Fig. 1). Low abundance nucleotides within the 3' degenerate codon positions can be excluded by increasing the degeneracy strictness value in the range from 0 to 1. Also, members of a protein family from a specific organism or genome can be targeted by selecting the applicable codon usage table for that genome. Finally, the most common codon encoding the consensus amino acid determined for the 5' consensus clamp region can replace the most favored nucleotide chosen from the DNA PSSM.
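The effect of the degeneracy strictness value can be sketched as a threshold on one DNA PSSM column. The exact rule used by the program is not given here, so the code below is only one plausible reading: a nucleotide is kept when its share of the column total is at least the strictness value, and the most abundant nucleotide is always kept. The function name and the column scores are hypothetical.

```python
def allowed_nucleotides(dna_column, strictness=0.0):
    """Return the nucleotides kept at one core position, as a sorted string.

    A nucleotide is kept if its share of the column total is at least
    `strictness`; the most abundant nucleotide is always kept (assumed rule).
    """
    total = sum(dna_column.values())
    best = max(dna_column, key=dna_column.get)
    kept = {b for b, s in dna_column.items() if s > 0 and s / total >= strictness}
    kept.add(best)
    return "".join(sorted(kept))

# Invented scores for a single DNA matrix position.
column = {"A": 0.05, "C": 0.40, "G": 0.0, "T": 0.55}
for strictness in (0.0, 0.1, 1.0):
    print(strictness, allowed_nucleotides(column, strictness))
```

Under this reading, raising the strictness drops the rare A first and eventually leaves only T, which lowers the degeneracy of the core at the cost of excluding rarely used codons.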
Other user-defined parameters are available to alter the primer design and output. First, the specificity and function of a CODEHOP PCR primer may be altered by changing the length of the 5' consensus clamp region through an annealing temperature parameter (default=60°C). The presence of nucleotide runs within the 5' consensus clamp region can be limited through a polynucleotide parameter (default=5), and the core/clamp boundary may be restricted to a codon boundary. Finally, the program output can show all possible primers or only the single most degenerate primer in each region.
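How a temperature target and a run limit might shape the clamp can be sketched as follows. Two assumptions are made loudly: the simple Wallace rule (2°C per A/T, 4°C per G/C) stands in for the annealing temperature calculation of reference 4, and the polynucleotide parameter is read as the longest run of a single nucleotide that is allowed. The function names and the consensus sequence are invented.

```python
from itertools import groupby

def longest_run(seq):
    """Length of the longest stretch of one repeated nucleotide."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)

def wallace_tm(seq):
    """Rule-of-thumb melting temperature: 2 degC per A/T plus 4 degC per G/C."""
    at = sum(seq.count(b) for b in "AT")
    gc = sum(seq.count(b) for b in "GC")
    return 2 * at + 4 * gc

def pick_clamp(upstream, target_tm=60, max_run=5):
    """Extend the clamp in the 5' direction from the core boundary until target_tm is reached.

    `upstream` is the consensus sequence 5' of the core, written 5'->3'.
    Returns None if the target is never reached or a run exceeds max_run.
    """
    for length in range(1, len(upstream) + 1):
        clamp = upstream[-length:]
        if wallace_tm(clamp) >= target_tm:
            return clamp if longest_run(clamp) <= max_run else None
    return None

upstream = "ATGGCTACCTGAAGCTGTTCGACAAG"  # invented consensus sequence
clamp = pick_clamp(upstream)
print(clamp, len(clamp), wallace_tm(clamp))
```

With this stand-in formula, a clamp of balanced base composition reaches 60°C at 20 nucleotides, in line with the typical clamp length quoted for the default setting.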
Selection of the optimal CODEHOP PCR primer is primarily based on minimal degeneracy across the 3' degenerate core and secondarily on the clamp score, which indicates the quality of the match between the 5' non-degenerate clamp and the sequence block given a codon usage table. If no primer is identified from an input sequence block, the block may be biased using the methods described above to remove or de-emphasize the most distantly related sequences. Conversely, the default degeneracy limit of 128 may be increased to 256 or higher, which may identify highly conserved motifs that contain amino acids with higher codon degeneracies.
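The selection rule in this paragraph amounts to a two-key sort. The sketch below assumes that a higher clamp score means a better match; the `Candidate` class, its field names and the example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    degeneracy: int     # product over the 3' degenerate core
    clamp_score: float  # higher is assumed to mean a better clamp/block match

def rank_candidates(candidates, max_degeneracy=128):
    """Drop cores above the degeneracy limit, then sort by degeneracy and clamp score."""
    kept = [c for c in candidates if c.degeneracy <= max_degeneracy]
    return sorted(kept, key=lambda c: (c.degeneracy, -c.clamp_score))

# Invented candidates for illustration only.
candidates = [
    Candidate("A", 32, 71.0),
    Candidate("B", 16, 64.0),
    Candidate("C", 16, 80.0),
    Candidate("D", 256, 90.0),
]
print([c.name for c in rank_candidates(candidates)])
print([c.name for c in rank_candidates(candidates, max_degeneracy=256)])
```

Raising `max_degeneracy` to 256 admits candidate D, which mirrors the option of relaxing the default limit when no primer is found.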
| CONCLUSIONS |
|---|
Since publication of the original CODEHOP manuscript (1) and implementation of the CODEHOP designer program on the WWW in 1998, more than 70 studies have been published in which CODEHOP PCR primers have been designed and successfully utilized to amplify distantly related sequences from organisms as diverse as fish, frog, protozoa, plants, viruses and bacteria (http://courses.washington.edu/bioinfo/CODEHOP/Codehop%20Genes.html). The methodology has been further expanded in the exploration of conservation and diversity of gene families in higher plants (6) and the characterization of viral genomes (7,8). Useful features of the CODEHOP designer include reweighting sequences in order to bias the design of CODEHOPs towards targeted protein families and the ProWeb TreeView selection of input sequences based on their phylogenetic relationship. The CODEHOP web site provides information for Getting Started, a detailed Help page, and a description of the CODEHOP Algorithm. We are currently working on enhancements to the CODEHOP PCR strategy and the output of the designer program.
| REFERENCES |
|---|
1. Rose,T.M., Schultz,E.R., Henikoff,J.G., Pietrokovski,S., McCallum,C.M. and Henikoff,S. (1998) Consensus-degenerate hybrid oligonucleotide primers for amplification of distantly related sequences. Nucleic Acids Res., 26, 1628-1635.
2. Henikoff,J.G., Pietrokovski,S., McCallum,C.M. and Henikoff,S. (2000) Blocks-based methods for detecting protein homology. Electrophoresis, 21, 1700-1706.
3. Henikoff,J.G. and Henikoff,S. (1996) Using substitution probabilities to improve position-specific scoring matrices. Comput. Appl. Biosci., 12, 135-143.
4. Rychlik,W., Spencer,W.J. and Rhoads,R.E. (1990) Optimization of the annealing temperature for DNA amplification in vitro. Nucleic Acids Res., 18, 6409-6412.
5. Henikoff,S. and Henikoff,J.G. (1994) Position-based sequence weights. J. Mol. Biol., 243, 574-578.
6. Morant,M., Hehn,A. and Werck-Reichhart,D. (2002) Conservation and diversity of gene families explored using the CODEHOP strategy in higher plants. BMC Plant Biol., 2, 7.
7. Schultz,E.R., Rankin,G.W.,Jr., Blanc,M.P., Raden,B.W., Tsai,C.C. and Rose,T.M. (2000) Characterization of two divergent lineages of macaque rhadinoviruses related to Kaposi's sarcoma-associated herpesvirus. J. Virol., 74, 4919-4928.
8. Rose,T.M., Ryan,J.T., Schultz,E.R., Raden,B.W. and Tsai,C.-C. (2003) Analysis of 4.3 Kb of the divergent locus-B of macaque retroperitoneal fibromatosis-associated herpesvirus (RFHV) reveals close similarity to Kaposi's sarcoma-associated herpesvirus (KSHV) in gene sequence and genome organization. J. Virol., in press.
9. Schneider,T.D. and Stephens,R.M. (1990) Sequence logos: a new way to display consensus sequences. Nucleic Acids Res., 18, 6097-6100.