Nucleic Acids Research Pages 3006-3012  


Primer design for large scale sequencing
Introduction
Material And Methods
   Physical parameters
   Computational methods
   DNA sequencing
   Polymerase chain reaction (PCR)
Results And Discussion
   PRIDE algorithm
   Evaluation of the performance of primers designed with PRIDE
   Primer design in regions of repetitive DNA
   Performance of PRIDE
   Integration into the Staden package
   Availability of PRIDE
Acknowledgements
References


Primer design for large scale sequencing

Stefan Haas1,2,*, Martin Vingron2, Annemarie Poustka1, Stefan Wiemann1

1Department of Molecular Genome Analysis and 2Department of Theoretical Bioinformatics, Deutsches Krebsforschungszentrum, Im Neuenheimer Feld 280, D-69120 Heidelberg, Germany

Received February 6, 1998; Revised and Accepted April 28, 1998

ABSTRACT

We have developed PRIDE, a primer design program that automatically designs primers in single contigs or whole sequencing projects to extend the already known sequence and to double strand single-stranded regions. The program is fully integrated into the Staden package (GAP4) and accessible with a graphical user interface. PRIDE uses a fuzzy logic-based system to calculate primer qualities. The computational performance of PRIDE is enhanced by using suffix trees to store the huge amount of data being produced. A test set of 110 sequencing primers and 11 PCR primer pairs has been designed on genomic templates, cDNAs and sequences containing repetitive elements to analyze PRIDE's success rate. The high performance of PRIDE, combined with its minimal requirement of user interaction and its fast algorithm, make this program useful for the large scale design of primers, especially in large sequencing projects.

INTRODUCTION

In most common large scale genomic sequencing projects the sequencing part is split into a random shotgun phase and a finishing phase (1-3). In the shotgun phase, the sequence of a large number of single-stranded or double-stranded clones is determined from one or both ends using standard vector primers. In the second, the finishing phase, gaps are closed and single-stranded regions are double-stranded by primer walking (4,5). In cDNA sequencing, the directed phase starts right from the beginning due to the small size of the constructs.

For primer walking, specific primers are generated close to the ends of the known sequence to perform consecutive elongation steps. The repeated cycles of sequencing reactions and primer design are continued until all contigs are joined and the complete sequence has been determined unambiguously on both strands. Since a primer constitutes the starting point for the elongation, the design of walking primers plays a crucial role for efficient and successful sequencing.

In order to achieve high success rates, primers need to be selected carefully. Several parameters have been described in the literature to influence primer quality. The melting temperature of the primer and the stability of potential secondary binding sites are essential criteria to measure specificity of primer-template interactions. The existence of thermodynamically stable secondary binding sites leads to ambiguous sequences. The recently described (6) sequencing directly from large vectors (BAC, PAC) with inserts of 100-300 kilobases (7,8) increases the probability of secondary primer binding sites drastically. This is partly due to the increased probability of several copies of the same class of repetitive element being located on such templates. Other parameters affect primer quality by reducing the amount of single-stranded primers that is available in the annealing step (self-complementarity, loop formation) or by modifying the efficiency of primer elongation (stability of the 3′-terminus). In addition, there are a number of further criteria suggested to influence primer quality (e.g. G/C content, 3′-terminal base).

Several primer design programs exist [e.g. OLIGO (9), OSP (10), Primo (11), Primer Master (12)], which use a set of common criteria (e.g. G/C content, melting temperature) to evaluate the quality of primer candidates in a specific target region selected by the user. These criteria are supplemented by other parameters (secondary binding sites in the vector, potential for loop formation, existence of repeats) to improve the quality of primers. The computation of all these parameters, especially the search for possible secondary binding sites for every possible primer candidate, is a time consuming step. Therefore, primer design programs generally restrict the search space to regions where primer candidates are likely to be of high quality. This is done by masking all regions that contain extreme parameter constellations (e.g. a G/C content <20 or >80%). Primer candidates within such regions are excluded from the primer search, although these regions may very well contain primer sites of good quality. If no primer can be found using the default parameters, the user must repeat the search with less restrictive parameters until a primer is finally suggested. Especially for the sequencing of complicated (e.g. repeat-rich) sequences, this strategy may require extensive user interaction and is very time consuming. The strategy of masking accelerates primer searches on sequences of moderate length (several kilobases), but the performance decreases rapidly with increasing size of template and/or vector.

Here we introduce a new primer design program called PRIDE. It uses a different approach to streamline automated primer design in the context of large scale sequencing, with the aim of minimizing user interaction and finding all possible primer sites, even in 'difficult' target regions. We have selected a reduced set of independent criteria to efficiently describe primer quality. These criteria are calculated for every possible primer candidate over the entire target sequence without masking. We assign a quality score to every primer candidate using a flexible fuzzy logic rule system. In this way, PRIDE selects the best primer within any target region in a single run. The user interaction is reduced to a single user-defined parameter, the desired optimal melting temperature. PRIDE has been fully integrated into the Staden package (13) using its graphical user interface to control all features.

MATERIAL AND METHODS

Physical parameters

Thermodynamic stability (melting temperature). Thermodynamic stability of the primer-template duplex has to be fixed within a small range according to the physical conditions present in sequencing reactions. We use a refined nearest neighbor method of Sugimoto et al. (14) to calculate the thermodynamic stability of hybrids. This method is an updated version of the method of Breslauer et al. (15) used in other primer design programs (9,11,12).

In addition to the thermodynamic stability, the melting temperature of a primer also reflects the strength of the primer-template interaction. The discrete melting temperature of a primer can be inferred by combining thermodynamic stability, DNA concentration and salt concentration (16). To provide the user with a more intuitive measure, we also report an estimate of the melting temperature using the rule of Suggs et al. (17), which assigns 2°C to each A/T pair and 4°C to each G/C pair.
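The rule of Suggs et al. is simple enough to state as a short sketch. The function name `suggs_tm` is ours and is not taken from PRIDE; the sketch only covers the 2°C/4°C estimate, not the nearest neighbor calculation of Sugimoto et al. (14).

```python
def suggs_tm(primer: str) -> int:
    """Estimate the melting temperature in deg C: 2 per A/T, 4 per G/C."""
    seq = primer.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        # Ambiguous bases (e.g. N) are outside the scope of this simple rule.
        raise ValueError("primer contains characters other than A, C, G, T")
    return 2 * at + 4 * gc
```

For an 18mer with nine A/T and nine G/C bases this gives 2 × 9 + 4 × 9 = 54°C.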

Secondary binding sites. The non-existence of secondary binding sites is necessary for efficient sequencing. We decided to include all relevant sequences for the calculation of possible secondary binding sites. This includes the sequencing vector and all known sequences of the template, including the sequences in related contigs. Additionally, we have implemented a user-defined database of repeats (Alu, LINE, MER, etc.) that can be checked against.

To our knowledge, there is no exact method to calculate stability of short dsDNA containing mismatches. Therefore, we approximate the stability of primer annealing at every possible secondary binding site using the algorithm of Sugimoto et al. (14). The ratio between the stability of the duplex at the original binding site and that of the most stable secondary binding site yields a relative measure of the uniqueness of a primer binding site. Additionally, we weight the last eight bases at the 3′-terminus to reflect the extendibility of a primer by DNA polymerase.

Self-complementarity. Self-complementarity of a primer affects sequencing efficiency by reducing the concentration of single-stranded primer that is available in the sequencing reaction. The negative effect of self-complementarities would be increased further if the primer dimer could be elongated due to matching 3′-terminal bases and 5′-overhangs. We calculate the relative stability of the primer dimer with the highest number of neighboring complementary bases to reflect the probability of dimer formation. The quality of primers forming primer dimers is decreased.
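As an illustration of the search for the dimer with the highest number of neighboring complementary bases, the following sketch slides a primer against a second, antiparallel copy of itself and reports the longest uninterrupted run of Watson-Crick pairs. It is a simplification under our own assumptions: it returns only a run length, whereas PRIDE derives a relative stability from the dimer, and the function name `longest_dimer_run` is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def longest_dimer_run(primer: str) -> int:
    """Longest run of consecutive base pairs in any primer-primer alignment."""
    seq = primer.upper()
    n = len(seq)
    best = 0
    # In an antiparallel dimer, base i of one copy faces base j of the
    # other copy with i + j = k; each k is one possible alignment.
    for k in range(2 * n - 1):
        run = 0
        for i in range(max(0, k - n + 1), min(n, k + 1)):
            j = k - i
            if COMPLEMENT.get(seq[i]) == seq[j]:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best
```

A fully palindromic sequence such as GAATTC pairs over its whole length (run of 6), while a homopolymer such as AAAAAA cannot pair with itself at all (run of 0).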

Ambiguous bases. During sequencing, it is not always possible to resolve each base correctly. This will subsequently lead to the calling of bases of unknown type. The properties of primers containing such ambiguities cannot be calculated exactly. Therefore, we estimated the quality of these primers by assuming a worst case base composition and assign to primers with ambiguities a low quality.

Words of identical bases. Due to the increased error rate of base calling at the end of reads, PRIDE favors stretches of sequence that do not contain words of three or more identical consecutive bases.
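A check for such words can be sketched with a regular expression. The function name and the `min_len` parameter are our own; how strongly PRIDE penalizes a hit is determined by its rule system and is not modeled here.

```python
import re

def has_identical_word(seq: str, min_len: int = 3) -> bool:
    """True if seq contains min_len or more identical consecutive bases."""
    # ([ACGT]) captures one base; \1{n,} requires it to repeat n more times.
    pattern = r"([ACGT])\1{%d,}" % (min_len - 1)
    return re.search(pattern, seq.upper()) is not None
```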

Position of primers. For efficient sequencing using the walking primer strategy, an optimal primer should be located close to the 3′-terminus of the already known sequence or at the borders of single-stranded regions. The overlap between old and new sequence data has to be large enough to allow correct assembly. Therefore, the software aims at the selection of primers at a distance of 50 nt from the 3′-terminus of the known sequence or in double-stranded regions ~50 nt from the single-stranded region. This distance ensures that an overlap with the known sequence is established while only marginally adding to the redundancy.

Computational methods

Data storage. PRIDE does not reject any primer candidate before having calculated all relevant parameters (thermodynamic stability, secondary binding sites, self-complementarities, etc.) for each possible primer. For efficient handling of the huge amount of data generated during a primer search we store all relevant information in a suffix tree (18). For the first use of this data structure in biology see Martinez (19). Suffix trees provide fast access to sequence parts which are unique in the whole sequence (including the complementary strand). This structure can be built up in a time proportional to the number of positions in the sequence using the on-line-construction algorithm of Ukkonen (20).
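The kind of query that the suffix tree answers, namely which words are unique in the whole sequence including the complementary strand, can be illustrated with a much simpler stand-in. The sketch below uses a hash-based count of fixed-length words instead of a suffix tree; it is not Ukkonen's algorithm and does not share its properties for words of arbitrary length. All names are ours.

```python
from collections import Counter
from typing import List

_COMP = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read 5' to 3'."""
    return seq.translate(_COMP)[::-1]

def unique_words(target: str, k: int) -> List[int]:
    """Start positions of k-mers occurring exactly once on both strands."""
    target = target.upper()
    strands = (target, reverse_complement(target))
    # Count every k-mer of the forward and the complementary strand.
    counts = Counter(
        s[i:i + k] for s in strands for i in range(len(s) - k + 1)
    )
    return [
        i for i in range(len(target) - k + 1)
        if counts[target[i:i + k]] == 1
    ]
```

In ACGTTTGA with k = 4, the word ACGT at position 0 is its own reverse complement and is therefore found on both strands, so only positions 1-4 are reported as unique.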

Fuzzy logic. For the purpose of assigning relative weights to the parameters a fuzzy logic (21,22) system was used. Such a tool takes a list of relevant parameters as input. Their ranges are partitioned into categories according to specifications by the user. For example, thermodynamic stability was divided into 'very low', 'low', 'optimal', 'high' and 'very high'.

These categories may be overlapping and the fuzzy logic tool will use specific functions to describe partial assignment of a value to a category. The user also supplies verbal rules (using the category descriptors) linking the input parameters to output parameters. Table 1 lists a subset of the rules used in our application. The fuzzy logic system designs a mathematical model that maps the input parameters to output parameters in accordance with the user's qualitative description. It is thus not necessary for a user to design a mathematical description by himself. Instead, this description is generated by the fuzzy logic system based on only a verbal description of the relevant rules.
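The idea of overlapping categories with partial assignment can be sketched as follows. This is a generic illustration and not the fuzzyTECH model used in PRIDE: the triangular membership functions, the category boundaries (given for the magnitude of the duplex stability in kcal/mol) and the quality assigned to each category are all hypothetical values chosen for the example.

```python
def triangular(x, left, peak, right):
    """Degree (0..1) to which x belongs to a category peaking at `peak`."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical overlapping categories (left, peak, right), illustrative only.
STABILITY_CATEGORIES = {
    "very low": (6.0, 9.0, 12.0),
    "low": (9.0, 12.0, 15.5),
    "optimal": (12.0, 15.5, 19.0),
    "high": (15.5, 19.0, 22.0),
    "very high": (19.0, 22.0, 25.0),
}

# Hypothetical verbal rules: "if stability is <category> then quality is ...".
RULE_QUALITY = {
    "very low": 0.0, "low": 40.0, "optimal": 100.0,
    "high": 40.0, "very high": 0.0,
}

def fuzzify(x):
    """Partial assignment of x to every category."""
    return {name: triangular(x, *abc)
            for name, abc in STABILITY_CATEGORIES.items()}

def quality(x):
    """Combine the rule outputs, weighted by category membership."""
    degrees = fuzzify(x)
    total = sum(degrees.values())
    if total == 0:
        return 0.0
    return sum(degrees[n] * RULE_QUALITY[n] for n in degrees) / total
```

A value of 15.5 belongs fully to 'optimal' and yields a quality of 100, whereas 13.75 belongs half to 'low' and half to 'optimal' and yields 70. A complete system would combine several input parameters per rule and would normally use open-ended categories at both extremes.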


Table 1. Example of fuzzy logic rules used for the calculation of primer qualities. Transformation of verbal description into numerical values is done by the fuzzy logic system.

We used the fuzzy logic system fuzzyTECH (INFORM GmbH, Aachen, Germany). This system can be handled easily using its graphical user interface. It also provides tools to weight each rule separately.

Quality calculation. Our fuzzy logic rule system contains 37 rules which have been arranged into three different parts (Fig. 1). While the first rule block includes all parameters associated with the primer sequence itself (Table 1), the second rule block is used to combine the resulting quality with the position and the stability of the worst secondary binding site of the primer. In the last rule block we assemble the rules from the first two blocks. In the end, an overall primer quality ranging from 0 to 100% is obtained.


Figure 1. Flow chart of the rule system that is used to calculate primer quality. Input parameters (ellipses) relevant for the design of primers are combined by fuzzy logic (21,22) rules (open rectangles) in several steps, to obtain the final primer quality. Intermediate results (shaded boxes) from the individual rule blocks are exported and taken as input for the next rule block.

Data management. The management of sequencing data, assembly and editing of sequences are done using the Staden package (13). The communication between PRIDE and the Staden package is performed using the recently described scripting language (23), which is supported by the Staden package.

DNA sequencing

Bacteria carrying plasmid vectors were grown in 96-well deep well blocks for 22 h in 2× YT medium supplemented with 150 µg/ml ampicillin. Plasmid DNA was isolated using a Biorobot 9600 with Qiaprep 96 Turbo modules (Qiagen). Bacteria harboring human P1 DNA were grown for 20 h in LB medium supplemented with 40 µg/ml kanamycin. P1 DNA was isolated using alkaline lysis followed by phenol extraction and ethanol precipitation. Sequencing primers were dissolved in water and adjusted to a concentration of 10 pmol/µl. Cycle sequencing reactions were carried out using Big Dye terminator chemistry (ABI) with 1 µg double-stranded plasmid DNA template or 2 µg P1 DNA and 10 pmol primer/reaction. Cycling conditions were: 25 cycles of 15 s at 95°C, followed by 15 s at 55°C and 4 min at 60°C for plasmids, and 40 cycles of 30 s at 95°C, followed by 15 s at 50°C and 4 min at 60°C for P1 DNA. Reaction products were purified according to the supplier's recommendations and analyzed on automated DNA sequencers (Applied Biosystems 377).

For the evaluation, we used PRIDE to generate 110 primers for the finishing of two sequencing projects (P1 with a human genomic insert and human cDNAs). The quality of each primer was evaluated by measuring the length of the related reads. All reads were manually edited to improve quality before assembly. The length of these reads was determined by ambiguities that could not be resolved manually. If there was more than one ambiguity in a stretch of 80 bases, the sequence from the latter ambiguity up to the 3′-terminus was clipped.
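The clipping rule can be sketched as below. We assume that ambiguities are written as N and that the rule is applied at the first position where two ambiguities fall into the same window of 80 bases; the function name `clip_read` is ours.

```python
def clip_read(read: str, window: int = 80) -> str:
    """Clip a read at the second of two Ns lying within `window` bases."""
    previous_n = None
    for pos, base in enumerate(read.upper()):
        if base != "N":
            continue
        # Two ambiguities share a window if they are < window bases apart.
        if previous_n is not None and pos - previous_n < window:
            return read[:pos]
        previous_n = pos
    return read
```

With ambiguities at positions 100, 201 and 212, the first two are too far apart to trigger clipping, while the third lies 11 bases after the second, so the read is cut to 212 bases.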

Low amounts of DNA in the reaction may reduce the effective read length dramatically, due to a low signal-to-noise ratio, independent of primer sequences. Therefore, all reactions with an average peak strength <60 were repeated using the same primer under the same sequencing conditions but with a higher template concentration. In these cases, maximal read length served us as a measure of quality of each primer.

Polymerase chain reaction (PCR)

RNA was prepared from cultured skin fibroblasts with RNAzol (AGS, Heidelberg, Germany) and cDNA synthesized with M-MuLV reverse transcriptase and random hexamers (Perkin Elmer). A multiplex PCR was performed in order to obtain products covering the whole cDNA sequence of procollagen α1(V) (COL5A1, referring to EMBL entry D90279) in fragments of ~400-900 bp. The multiplex amplification was set up in two separate tubes in a total volume of 5 µl with the Clontech Advantage GC™ cDNA PCR kit. In one tube fragments between nucleotide positions 5 and 746, 1148 and 1985, 2491 and 2796, 3049 and 3780, 4216 and 4975, and 5085 and 5617 were amplified. Primers in the second tube were designed to amplify fragments between positions 656 and 1512, 1978 and 2651, 2745 and 3344, 3580 and 4239, and 4559 and 5105.

RESULTS AND DISCUSSION

PRIDE algorithm

The algorithm for searching for optimal primers is divided into two distinct parts. In a first step, PRIDE scans along the complete primary target sequence (e.g. a contig) to find single-stranded regions. For each of these regions PRIDE has to create a primer in the direction suitable to close the single-stranded gap. PRIDE defines subregions for the design of single primers that start at the border of each single-stranded sequence and include 500 bases of the neighboring double-stranded sequence. Additionally, PRIDE also defines subregions at the ends of a contig used for the creation of primers for contig elongation.
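The first step can be sketched as follows, assuming that the contig is available as one flag per consensus position stating whether that position is covered on both strands. This data model, the function name `design_windows` and the decision to report both flanks of every single-stranded region are our assumptions; PRIDE itself works on the GAP4 database and selects the flank according to the required primer direction.

```python
from itertools import groupby

def design_windows(double_stranded, flank=500):
    """Return (start, end) windows of double-stranded sequence that
    border each single-stranded region, up to `flank` bases long."""
    n = len(double_stranded)
    windows = []
    pos = 0
    # groupby yields maximal runs of identical flags along the contig.
    for is_ds, group in groupby(double_stranded):
        length = len(list(group))
        if not is_ds:
            start, end = pos, pos + length
            if start > 0:
                windows.append((max(0, start - flank), start))
            if end < n:
                windows.append((end, min(n, end + flank)))
        pos += length
    return windows
```

For a contig with 600 double-stranded bases, a single-stranded stretch of 50 bases and another 200 double-stranded bases, the sketch returns the windows (100, 600) and (650, 850). It does not check whether a window itself runs into a further single-stranded region.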

In a second step, the consensus sequence of every subregion is used for computation of all relevant parameters (thermodynamic stability, self-complementarity, etc.) of all possible primer candidates. PRIDE builds up a suffix tree containing all sequences necessary in secondary binding site calculations. By default, this includes the entire primary target sequence and a vector, but the user can also extend the tree by adding sequences of repetitive elements or any other sequences that are contained in a user-defined database. After computation of all parameters each primer candidate is weighted using a fuzzy logic rule system which assigns a quality to each candidate. Finally, PRIDE writes a list of the best primers designed for each target region.

Evaluation of the performance of primers designed with PRIDE

We used a test set of 110 primers to analyze the quality of primers designed with the PRIDE program. The average read length of sequences was 589 bases (Fig. 2), applying the parameters described in Materials and Methods. No primers failed. Only a few reads (n = 10) were <400 bases. However, we were unable to determine if the length of these short reads is directly caused by the primers, since other sequencing parameters (template concentration, quality of gels, electrophoresis parameters, etc.) also influence the sequencing results.


Figure 2. Distribution of the read lengths of sequences generated with primers (n = 110) designed with PRIDE. Sequences were analyzed with the PE-ABI semi-adaptive basecaller (Sequencing Analysis v.2.1). Reads were then automatically clipped (3′ quality clipping) if more than one N was detected in a window of 80 nt. The 3′-terminal base of the primers is reflected by shading.

The signal strength of most reads was strong, reflecting a high quality of the designed primers. None of the tested reads contained a stretch of signals that would indicate secondary binding of primers or the extension of possible primer dimers. We can exclude, therefore, that any of these primers bound to potential secondary binding sites, either on the template or the vector.

The distance between primer and target region (the end of the contig or the border between single-stranded and double-stranded regions) is also important for efficient sequencing. All tested primers were positioned 40-87 bases (mean 61 bases) from the target (Fig. 3). This short overlap was in all cases large enough to assemble new sequences correctly. By limiting the overlap between the known, double-stranded sequence and the sequence newly generated with the primers designed with PRIDE, the redundancy is reduced and the amount of useful sequence is maximized.


Figure 3. Distance of the 5′-terminus of the primers to their target. The target is defined as the end of the known sequence (to contiguate) or the breakpoint between single- and double-stranded sequence (to double strand single-stranded regions). Primers were designed to be located close to the target while guaranteeing sufficient overlap between the new sequence generated with that primer and the known sequence.

Thermodynamic stability. The interaction of primers with the complementary template sequence depends on the thermodynamic stability of the duplex and the physical constraints of sequencing reactions. On the one hand, the thermodynamic stability of a primer has to be high enough to allow annealing to the template. On the other hand, the stability should be as low as possible to increase primer specificity and to decrease the probability of secondary binding sites. All primers designed with PRIDE have a thermodynamic stability of -14 to -17 kcal/mol at 60°C. The melting temperatures of these primers estimated using the rule of Suggs et al. (17) (2°C for A/T and 4°C for G/C pairs) would be 50-64°C (data not shown). The calculation of thermodynamic stability used in PRIDE more accurately reflects reality, because all designed primers worked under the reaction conditions applied, namely a temperature of 55°C for the annealing of primers.

Parameters not used for primer calculation. Several additional parameters used by other primer design programs are not used by PRIDE. In the following we analyze the dependencies between these parameters and the criteria evaluated by our algorithm.

Short repeats. Primers containing at least two units of a short repeat sequence show a higher probability of having thermodynamically stable secondary binding sites within this repeat region. Therefore, checking of regions of short repeats is always included implicitly, since we calculate all secondary binding sites for each primer without masking any repeat.

Loop formation. A subset of primers with self-complementarities is also able to form internal loops. The stability of such loops is roughly the same as that of the related primer dimer (9,24). Since formation of loops and dimer formation are not independent, we only consider the more general self-complementarity for further calculations.

G/C content. Most primer design programs select primers with a G/C content of ~50%. Primers outside a user-defined range are rejected explicitly. However, we do not evaluate G/C content because the base composition influences primer quality indirectly by affecting the probability of self-complementarities and secondary binding sites.

Primers with an extreme G/C content (>80 or <20%) can roughly fall into two different classes: primers consisting mostly of one single type of base and primers consisting mostly of two complementary bases. In the first class, the likelihood of a stable secondary binding site is increased compared with primers with moderate G/C content because it is always possible to find a secondary binding site simply by shifting the primer. In the second class, the high content of complementary bases increases the likelihood of formation of self-complementarities leading to primer dimers. For these reasons primers with a G/C content close to 50% will preferentially be of 'good quality'.
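The two classes of extreme primers can be made concrete with a small sketch. PRIDE does not evaluate G/C content, so the code only illustrates the argument; the cut-off of 70% for 'mostly one single type of base' and all names are hypothetical.

```python
def gc_content(primer: str) -> float:
    """G/C content of a primer in percent."""
    seq = primer.upper()
    if not seq:
        raise ValueError("empty primer")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)

def extreme_class(primer: str, low: float = 20.0, high: float = 80.0,
                  single_base_fraction: float = 0.7) -> str:
    """Classify a primer as moderate or as one of the two extreme classes."""
    seq = primer.upper()
    if low <= gc_content(seq) <= high:
        return "moderate"
    # Fraction of the primer made up by its most frequent base
    # (the 0.7 default cut-off is an assumption for illustration).
    dominant = max("ACGT", key=seq.count)
    if seq.count(dominant) / len(seq) > single_base_fraction:
        return "mostly one base"
    return "mostly two complementary bases"
```

AAAAAAAAAT falls into the first class, where shifting the primer by one base yields a near-identical binding site, and ATATATATAT falls into the second class, where the primer can pair with a copy of itself.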

The primers that were used to evaluate the performance of PRIDE have a G/C content between 20 and 71% (mean 41.5%; Fig. 4). This might be a reflection of the G/C content of the DNA on which the primers were designed (human), but the G/C content of the primers designed to amplify the collagen gene (G/C content 65%) did not deviate significantly from the average. Only a few primers had a G/C content of <35 or >65%, but even in these cases we could not find a correlation between G/C content and the quality of the sequences generated.


Figure 4. Histogram showing the G/C content of the tested primers.

Primer length. The length of sequencing primers is dependent on its base composition and the melting temperature set by the user. By default, the optimal melting temperature is set to 60°C in PRIDE. The length of primers that made up the test set for PRIDE varied over a range of 15-25 bases (mean 21 bases; Fig. 5). The difference in length influenced neither the success rate nor the quality in sequencing.


Figure 5. Histogram displaying the length of the tested primers.

Specific base at the 3′-terminus. The elongation of a primer depends strongly on the stability at its 3′-terminus. The higher this stability is, the more likely a successful elongation of the primer will be. Accordingly, some authors (11) suggest the use of primers with a G or C as 3′-terminal base (GC clamp). Kwok et al. (25) postulate that the last base should not be a T because of the ability of thymine to form non-Watson-Crick base pairs, which would lead to an increased probability of secondary binding sites.

PRIDE selects primers regardless of their 3′-terminal base. Figure 2 shows the distribution of the length of reads related to the primers tested, subdivided according to the 3′-terminal base of the primers. We tested the hypothesis that the distribution of the reading length is independent of the 3′-terminal base. The P value of 0.25, using a Kruskal-Wallis test (26), indicates that there is no qualitative difference between primers with different 3′-terminal bases.

Primer design in regions of repetitive DNA

New developments in sequencing chemistry (Big Dye terminators) make the direct sequencing of very large templates (e.g. BAC, PAC) feasible. The probability of such large templates containing complex repeats is rather high, especially when human or other mammalian genomic DNAs are to be sequenced. Because of the high probability of secondary binding, the design of primers in repetitive regions is difficult or, in many cases, even impossible. For the sequencing of large genomic templates, where repetitive elements such as Alu repeats are very likely to be present in several copies, PRIDE includes a database of known repeat sequences (Alu, LINE, MER, etc.) that can be searched for secondary binding sites. Different copies of a type of repetitive element may differ in a few bases. Therefore, it may also be possible to find primers within these repetitive elements which do not have stable secondary binding sites in other members of the same type of element. Additionally, repetitive elements are sometimes separated by short, non-repetitive sequences. These intervening regions are also candidates for successful primer design.

We used COL5A1 to test the quality of our primer design program when handling sequences mostly consisting of repeats. COL5A1 contains a large triple helix region which consists of repetitive, but sometimes slightly different, sequence elements, each 5-10 bases in length. These repeats are spaced by short regions of non-repetitive DNA. This highly repetitive triple helix region makes it difficult to find unique primer binding sites.

The properties of COL5A1 are reflected in the quality of primers computed using PRIDE. Figure 6 shows the distribution of high quality (>50%) primers along the whole COL5A1 sequence. In non-repetitive regions with moderate G/C content (positions 1-1500 and 4800-5676) many good primers were predicted. In contrast, the number of high quality primers in the triple helix region (position 1500-4800) was much lower. The primer sites were restricted to regions that intersect the repeat units (Fig. 6). Because of sequence variations in repeat elements, few primers were calculated within a tandem repeat (27; Fig. 6b, positions 4493-4498). Most important for successful design of primers is the uniqueness of the 3′-terminus of primers, while the 5′-terminus could be part of a repeat unit. We designed 11 primer pairs for PCR to amplify the whole COL5A1 sequence. All target fragments were amplified successfully.

Figure 6. Quality of each possible primer in procollagen α1(V) (COL5A1). For each position in the sequence of COL5A1 the quality of the best primers starting at this position is plotted as a vertical bar. Quality values <50% have been set to 0. The shaded rectangle marks the subset zoomed in (b). (a) Many high quality primers are predicted in non-repetitive regions (e.g. positions 1-1500), as indicated by the large number of vertical bars. The number of high quality primers is much lower in the repetitive triple helix region (positions 1500-4800), reflecting the high probability of secondary binding sites. (b) Subset of (a) showing a more detailed view of the triple helix region. Repetitive sequences are indicated by shaded boxes.

Performance of PRIDE

In contrast to other primer design software, PRIDE evaluates each possible primer candidate. The computational effort is therefore far greater than that of other primer design programs. Nevertheless, PRIDE is fast in calculating optimal primers. It checks for all potential secondary binding sites in template, repetitive element or vector sequences. Calculation time depends linearly on the length of sequence used for secondary binding site checking (Fig. 7). On average, a primer calculation considering all primer candidates on a target sequence of 500 bases and a vector sequence of 5000 bases takes <6 s on a Sun SPARCstation 5. This is only slightly slower than the calculation time of other primer design programs and does not limit the throughput of sequencing. A similar but much more complex primer calculation, including checking of secondary binding sites within 100 kilobases of random ('artificial vector') sequence, takes <30 s. PRIDE's high performance makes it feasible to efficiently search primers on large templates and/or to analyze secondary binding sites in abundant sequences like repetitive elements.


Figure 7. Performance of PRIDE. A Sun SPARCstation 5 was used for analysis of the time (vertical axis) required to compute a single primer on a random sequence ('artificial vector') with varying length (with increments of 1 kb, horizontal axis). The calculation included searches for all possible secondary binding sites in the sequence.


Integration into the Staden package

PRIDE is fully integrated into the widely used Staden package (GAP4 v.4.2), which is used for editing and assembling of sequencing data. The Staden package is a UNIX-based software package that has a user-friendly graphical user interface. PRIDE is directly launched from a pull-down menu within GAP4.

Single contigs or all contigs of a sequencing project can be selected at a time for automated primer design. PRIDE extracts the relevant sequence parts for primer design to elongate and/or to double strand contigs. All newly designed primers are automatically visualized by creating tags in the project database.

Each primer tag is placed on a template which is predicted to be the best template for sequencing in order to obtain the longest read without reading into vector. The tags contain a unique primer identifier and the creation date. The nucleotide sequence of each primer and its name are also written into a separate file, which can be further used for oligonucleotide ordering. PRIDE recognizes existing primer tags from a preceding primer design automatically, to avoid duplication of primers for the same target region.

Additionally, we have developed software that facilitates the handling of tags in large sequencing projects. Single tags or groups of tags can be searched for, visualized and removed using different selection criteria (tag type, primer name, creation date). Because the primers are entered as tags, the correct assembly of new related reads can be checked by opening a contig join editor with the primer tag in one window and the related read in the second window, both sequences in the same orientation.

Availability of PRIDE

PRIDE is available to academic institutions for a fee of 100 German marks (~US$55), which is intended to cover our costs of distribution and maintenance. Commercial users are asked to contact the first author. A freely accessible World Wide Web version of PRIDE can be used to design primers on a single sequence. The address for this service is http://www.dkfz-heidelberg.de/tbi/services/Pride/prideform. This web page also contains detailed information on the availability and hardware requirements of PRIDE.

ACKNOWLEDGEMENTS

The authors would like to thank Caspar Grond for his detailed testing of PRIDE on COL5A1. We would also like to thank Stefan Kurtz for supplying routines for efficient suffix tree construction and Axel Benner for his statistical support.

REFERENCES

1. Hunkapiller,T., Kaiser,R.J., Koop,B.F. and Hood,L.E. (1991) Science, 254, 59-67.

2. Messing,J., Crea,R. and Seeburg,H.P. (1981) Nucleic Acids Res., 9, 309-321.

3. Messing,J., Carlson,J., Hagen,G., Rubenstein,I. and Oleson,A. (1984) DNA, 3, 31-40.

4. Strauss,E.C., Kobori,J.A., Siu,G. and Hood,L.E. (1986) Anal. Biochem., 154, 353-360.

5. Kaiser,R.J., MacKellar,S.L., Vinayak,R.S., Sanders,J.Z., Saavedra,R.A. and Hood,L.E. (1989) Nucleic Acids Res., 17, 6087-6102.

6. Rosenblum,B.B., Lee,L.G., Spurgeon,S.L., Khan,S.H., Menchen,M.S., Heiner,C.R. and Chen,S.M. (1997) Nucleic Acids Res., 25, 4500-4504.

7. Shizuya,H., Birren,B., Kim,U.J., Mancino,V., Slepak,T., Tachiiri,Y. and Simon,M. (1992) Proc. Natl. Acad. Sci. USA, 89, 8794-8797.

8. Ioannou,P.A., Amemiya,C.T., Garnes,J., Kroisel,P.M., Shizuya,H., Chen,C., Batzer,M.A. and de Jong,P.J. (1994) Nature Genet., 6, 84-89.

9. Rychlik,W. and Rhoads,R.E. (1989) Nucleic Acids Res., 17, 8543-8551.

10. Hillier,L. and Green,P. (1991) PCR Methods Applicat., 1, 124-128.

11. Li,P., Kupfer,K.C., Davies,C.J., Burbee,D., Evans,G.A. and Garner,H.R. (1997) Genomics, 40, 476-485.

12. Proutski,V. and Holmes,E.C. (1996) Comput. Applicat. Biol. Sci., 12, 253-255.

13. Bonfield,J., Smith,K.F. and Staden,R. (1995) Nucleic Acids Res., 23, 4992-4999.

14. Sugimoto,N., Nakano,S., Yoneyama,M. and Honda,K. (1996) Nucleic Acids Res., 24, 4501-4505.

15. Breslauer,K.J., Frank,R., Blocker,H. and Marky,L.A. (1986) Proc. Natl. Acad. Sci. USA, 83, 3746-3750.

16. Freier,S.M., Kierzek,R., Jaeger,J.A., Sugimoto,N., Caruthers,M.H., Neilson,T. and Turner,D.H. (1986) Proc. Natl. Acad. Sci. USA, 83, 9373-9377.

17. Suggs,S.V., Hirose,T., Miyake,T., Kawashima,E.H., Johnson,M.J., Itakura,K. and Wallace,R.B. (1981) In Brown,D.D. (ed.), ICN-UCLA Symposia on Developmental Biology Using Purified Genes. Academic Press Inc., New York, NY, Vol. 23, pp. 683-693.

18. Weiner,P. (1973) In Proceedings of the IEEE 14th Annual Symposium on Switching and Automata Theory, Linear Pattern Matching Algorithms. IEEE, New York, pp. 1-11.

19. Martinez,H.M. (1983) Nucleic Acids Res., 11, 4629-4634.

20. Ukkonen,E. (1993) Report A-1993-1, Department of Computer Science, University of Helsinki, Finland.

21. Zadeh,L.A. (1965) Informat. Control, 8, 338-353.

22. Altrock,v.C. (ed.) (1995) Fuzzy Logic, Vols 1-3. R.Oldenbourg, Munich, Germany.

23. Bonfield,J. (1997) Programming with GAP4 v.0.99.2, User Manual. MRC Laboratory of Molecular Biology, Cambridge, UK. (http://www.mrc-lmb.cam.ac.uk/pubseq/scripting_manual/scripting_toc.html)

24. Antao,V.P. and Tinoco,I.,Jr (1992) Nucleic Acids Res., 20, 819-824.

25. Kwok,S., Kellogg,D.E., McKinney,N., Spasic,D., Goda,L., Levenson,C. and Sninsky,J.J. (1990) Nucleic Acids Res., 18, 999-1005.

26. Lehmann,E.L. (1975) Nonparametrics: Statistical Methods Based on Ranks. Holden-Day Inc., New York, NY.

27. Rivals,E., Delgrange,O., Delahaye,J.-P., Dauchet,M., Delorme,M.-O., Henaut,A. and Ollivier,E. (1997) Comput. Applicat. Biol. Sci., 13, 131-136.


*To whom correspondence should be addressed. Tel: +49 6221 422 778; Fax: +49 6221 422 849; Email: s.haas@dkfz-heidelberg.de


Last modification: 4 Jun 1998
Copyright © Oxford University Press, 1998.




