A novel method for making nested deletions and its application for sequencing of a 300 kb region of human APP locus
Masahira Hattori1,*, Fujiko Tsukahara1,2, Yoshiaki Furuhata1, Hiroshi Tanahashi1, Matsumi Hirose1, Masae Saito1, Shiho Tsukuni1 and Yoshiyuki Sakaki1
1Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo 108, Japan and 2Department of Pharmacology, Tokyo Woman's Medical College, Tokyo 162, Japan
Received December 16, 1996; Revised and Accepted March 24, 1997
DDBJ/EMBL/GenBank accession no. D87675
ABSTRACT
We developed a novel in vitro method for making nested deletions and applied it to large-scale DNA sequencing. A DNA fragment to be sequenced (up to 15 kb long) was cloned into a new vector possessing two unique SfiI sites, digested with SfiI and ligated to generate a large head-to-tail concatemer. The concatemer was randomly fragmented by sonication and then redigested with SfiI to separate insert and vector DNAs. The fragments of various lengths were then cloned into other vector(s) specifically designed for selective cloning of insert-derived DNA fragments, generating a library of nested deletions. This method allowed a single person to generate >20 nested deletion libraries, sufficient to cover 100 kb, in a few days. We applied the method to the sequencing of P1 clones and determined the complete sequence of ~300 kb of the human amyloid precursor protein (APP) locus on chromosome 21 with a redundancy of 3.8, at reasonably low cost and with very few gaps remaining to be closed. We also describe new instruments and software that make this method more applicable to large-scale sequencing.
INTRODUCTION
Large-scale sequencing is now one of the central issues in the human genome project (1,2). To sequence the genome with reasonable speed, accuracy and cost, the sequencing strategy is an important factor. So far, three types of strategy have been used or proposed, namely the shot-gun, primer-walk and nested deletion strategies (reviewed in ref. 3). Each has advantages and disadvantages in practical use. The shot-gun strategy has been the most widely used for large-scale sequencing projects (e.g., 4,5). It is simple as a whole and easy to scale up. However, it intrinsically requires high sequencing redundancy and extensive gap closure efforts, which raise the cost and complicate data assembly, respectively. Primer-walk is a directed strategy with minimum redundancy, but it requires the design and synthesis of a very large number of primers, which may be expensive. The use of a library of short oligonucleotides as sequencing primers has been proposed (6,7) but is not yet technically well established. The third strategy is sequential sequencing using nested deletion or transposon-inserted templates (8). This strategy can be carried out with reasonably low redundancy and a simple data assembly process, but it has not been considered applicable to large-scale sequencing because of its complicated procedure for template preparation. However, a relatively simple, transposon-mediated method has been developed and successfully applied to sequencing of the Drosophila genome (9).
We have attempted to develop a simple and reproducible method for making nested deletions on a large scale. We describe here a novel method for making nested deletions in vitro and its successful application to sequencing ~300 kb of the human APP locus on chromosome 21q22.1. We also describe the development of instruments and software that make this method applicable to large-scale, systematic sequencing of the human genome.
MATERIALS AND METHODS
Construction of pSFI vectors
Oligonucleotides for the polylinkers were prepared on a DNA synthesizer (Perkin-Elmer ABI 394). The double-stranded DNAs described below (a, b and c) were prepared by annealing equimolar amounts of the synthesized complementary oligonucleotides at 50°C overnight in 100 µl of 0.1 M NaCl. The three double-stranded DNAs, which have 5'-overhang cohesive ends for HindIII and EcoRI sites, were ligated with HindIII- and EcoRI-double-digested pUC13 to construct pSFI-CV (a), or with pTZ19R to construct pSFI-SV1 (b) and pSFI-SV10 (c). The polylinker of pSFI-CV contains EagI, SalI, HindIII, BamHI and two SfiI sites. The polylinker of pSFI-SV1 contains HindIII, SfiI, BglII, EcoRV and EcoRI sites, and the polylinker of pSFI-SV10 contains HindIII, SfiI, BglII, StuI and EcoRI sites.
Kanamycin-resistant pSFI-SV vectors were further prepared by ligating the BspHI-digested plasmid carrying the new polylinker with the BamHI fragment containing the kanamycin-resistance gene of pBS-Kan2 (10). For preparation of nested deletions, pSFI-SV1 and pSFI-SV10 were double-digested with EcoRV and SfiI or with StuI and SfiI, respectively, treated with CIP and gel-purified. pSFI-CV was digested with appropriate restriction enzymes that cleave the multiple cloning site (MCS), treated with CIP and gel-purified. The MCSs of these three vectors are in-frame with the Escherichia coli lacZ gene, which produces blue colonies in the presence of X-gal. The structure of these vectors is shown in Figure 1.
Preparation of nested deletion library
The overall procedure for the preparation of nested deletions is illustrated in Figure 2. The pSFI-CV clone was cultured overnight in the presence of ampicillin and the plasmid was isolated by the alkaline-SDS method (11). The plasmid DNA (20 µg) was digested with SfiI (NEB, 40 U) in a final volume of 100 µl at 50°C for 1 h, extracted with phenol/chloroform and precipitated with ethanol. The DNA was treated with T4 DNA ligase (Takara, 10 U) and 0.1 mM ATP at a DNA concentration of 0.5-1 µg/µl at 15°C in 40 µl of 1× ligation buffer for 2 h to overnight. An aliquot of the viscous ligation mixture was diluted to ~5 ng/µl in 200 µl TE (10 mM Tris-HCl, 1 mM EDTA, pH 8.0) and sonicated with an Astrason XL sonicator with the pulsar dial set at 1.5-2.0. The sonication time depends on the insert DNA size and was routinely set at 30 s-1 min for 5 kb DNA, which gave a broad smear of fragments. The sonicated DNA was extracted with phenol/chloroform, precipitated with ethanol and dissolved in 30 µl TE. The DNA was treated with T4 DNA polymerase (Toyobo, 0.5 U) at 37°C for 5 min in 20 µl of 50 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 10 mM DTT and 0.2 mM dNTPs. The reaction was stopped by heating at 70°C for 15 min, and the DNA was then treated with T4 polynucleotide kinase (Takara, 5 U) in the presence of 0.1 mM ATP in 30 µl of the appropriate buffer at 37°C for 20 min. The DNA was further digested with SfiI (NEB, 10 U) in 80 µl of the appropriate buffer at 50°C for 1 h. The reaction was stopped by adding EDTA and the DNA was extracted with phenol/chloroform, precipitated with ethanol and dissolved in 20 µl TE. An aliquot of the digested DNA (0.1-0.3 µg) was ligated with pSFI-SV1 or pSFI-SV10 (0.1 µg) in 20 µl of 1× ligation buffer containing 4 U of T4 DNA ligase (Takara) and 0.1 mM ATP at 15°C for 2 h to overnight. The ligation mixture was used to transform E. coli DH5α (Gibco-BRL), and the kanamycin-resistant colonies were obtained as the nested deletion library.
Size measurement and ordering of nested deletions
The kanamycin-resistant colonies were randomly picked and cultured overnight in 5 ml L-broth in the presence of kanamycin (100 µg/ml), and the plasmid DNA was isolated with an automated plasmid isolator, PI100 (KURABO, Japan). Alternatively, the insert DNA was amplified directly from colonies by PCR (long PCR kit from Takara or Gibco-BRL) using specific primers (LR: 5'-TCCGGCTCGTATGTTGTGTGGA-3', LL: 5'-GTGCTGCAAGGCGATTAAGTTGG-3') for 30 cycles of 94°C for 30 s and 68°C for 1-15 min, followed by one cycle at 70°C for 10 min. The SfiI-digested plasmid DNAs or the PCR-amplified products were electrophoresed on a 0.8% agarose gel and the gel image was recorded with a CCD imaging system (ATTO Co. Ltd., Tokyo, Japan). Size measurement and ordering of the clones were performed with Lane Screener, a software program newly developed for this purpose by ATTO. Nested deletion clones were selected at intervals of 250-350 bases for sequencing. The nested deletion plasmids were used for sequencing without further purification; the PCR products were used after treatment with shrimp alkaline phosphatase and E. coli exonuclease I (Amersham) at 37°C for 20 min followed by heating at 85°C for 10 min.
Sequencing and data assembly of nested deletions
Sequencing was done by cycle sequencing with a commercially available fluorescent-labeled forward primer (-21m) and analyzed on a four-color sequencer (Perkin-Elmer ABI 373S). The sequencing reactions were carried out manually at first and later by a sequencing robot (Vistra, Amersham) according to the manufacturer's instructions. The buffers and reagents for sequencing were obtained from the manufacturers. Data assembly was done initially with a commercially available program, ATSQ (Japan Software Inc., Tokyo, Japan), and later with a newly developed system, SAND (see Results).
Subcloning of restriction fragments from P1 DNA
The P1 clone was cultured in 100 ml of L-broth in the presence of kanamycin (100 µg/ml) and the DNA was isolated by the alkaline-SDS method as described (14). The crude P1 DNA was treated with RNase A (Sigma, 20 µg) at 37°C for 1 h and precipitated by adding 0.6 vol of 2.5 M NaCl/20% polyethylene-glycol 6000. The mixture was kept on ice for 15 min and the DNA was collected by centrifugation. The precipitate was rinsed with 75% ethanol, dried and dissolved in 100 µl TE. An aliquot (20 µl) was digested with AvrII, XbaI, BamHI or BglII for 1 h and then treated with Klenow fragment (0.1 U) in the presence of 0.2 mM dCTP/dTTP or dGTP/dATP at 37°C for a further 15 min to partially fill in the restriction ends. The reaction was stopped by adding 1 µl of 0.5 M EDTA and the DNA was extracted with phenol/chloroform, precipitated with ethanol and dissolved in 20 µl TE (pH 8). An aliquot (1-5 µl) was ligated with pSFI-CV (0.1 µg) partially filled in at the HindIII or SalI site in 20 µl of 1× ligation buffer containing 4 U T4 DNA ligase (Takara) and 0.1 mM ATP at 15°C for 2 h to overnight. The ligation mixture was used to transform E. coli DH5α (Gibco-BRL) and the ampicillin-resistant colonies were obtained as subclones. Sixty colonies were randomly picked and cultured overnight in 5 ml of L-broth in the presence of ampicillin (100 µg/ml), and the plasmid DNA was isolated by the alkaline-SDS method and dissolved in 150 µl of TE. These clones were stored as sub-libraries and subjected to the following fingerprinting selection. The cloned DNA (10 µl) was digested with HaeIII (Nippon Gene, 4 U) at 37°C for 1 h and electrophoresed on a 2% NuSieve/1% agarose gel. The HaeIII fingerprint of each clone was used to choose independent subclones. The size of the insert DNA was also estimated by digestion with SfiI. The sequences of both ends of each subclone were determined by cycle sequencing with forward (-21m) and reverse (M13 RP) primers.
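The partial fill-in used for subcloning can be checked with a short sketch. It computes the overhang left after Klenow fill-in with only two dNTPs and tests whether insert and vector ends can still anneal. The enzyme overhangs are the standard ones; the nucleotide pairs used on the vector side (dATP/dGTP for HindIII, dTTP/dCTP for SalI) are not stated in the text and are assumptions here, as are all function names.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# 5' overhangs (written 5'->3') produced by each enzyme.
OVERHANG = {
    "AvrII": "CTAG", "XbaI": "CTAG",
    "BamHI": "GATC", "BglII": "GATC",
    "HindIII": "AGCT", "SalI": "TCGA",
}


def reverse_complement(seq):
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def partial_fill_in(overhang, dntps):
    """Return the overhang (5'->3') left after fill-in with a limited dNTP set.

    The polymerase extends the recessed 3' end, so it copies the overhang
    starting from the base next to the duplex (the last base as written)
    and stops at the first base whose complement is not supplied.
    """
    remaining = overhang
    while remaining and COMPLEMENT[remaining[-1]] in dntps:
        remaining = remaining[:-1]
    return remaining


def can_anneal(end_a, end_b):
    """Two 5' overhangs anneal if one is the reverse complement of the other."""
    return bool(end_a) and end_a == reverse_complement(end_b)


# Insert ends, filled in as described in the text.
xba_end = partial_fill_in(OVERHANG["XbaI"], {"C", "T"})   # also AvrII
bam_end = partial_fill_in(OVERHANG["BamHI"], {"G", "A"})  # also BglII

# Vector ends; the dNTP pairs below are assumptions, not from the text.
hind_end = partial_fill_in(OVERHANG["HindIII"], {"A", "G"})
sal_end = partial_fill_in(OVERHANG["SalI"], {"T", "C"})

print(xba_end, hind_end, can_anneal(xba_end, hind_end))
print(bam_end, sal_end, can_anneal(bam_end, sal_end))
print("self-ligation possible:", can_anneal(xba_end, xba_end))
```

Under these assumptions the two-base ends pair only between insert and vector, which is consistent with the usual rationale for partial fill-in: neither the inserts nor the vector can ligate to themselves.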
RESULTS
Construction of the nested deletion libraries
To construct nested deletion libraries, we designed and constructed three vectors, termed pSFI-CV, pSFI-SV1 and pSFI-SV10, from the commercially available plasmid vectors pUC13 and pTZ19R as described in Materials and Methods. As shown in Figure 1, pSFI-CV has a multiple cloning site (MCS) flanked by two unique SfiI sites that produce 3'-overhang ends of ATT-3' and TAA-3', respectively. The two pSFI-SV vectors each have an MCS that produces a blunt end and an SfiI 3'-overhang end complementary to one of the two SfiI ends of pSFI-CV. These vectors are available upon request. Using these vectors, nested deletion libraries were generated as follows (Fig. 2). A DNA fragment to be sequenced (usually 2-15 kb long) was cloned into the MCS of pSFI-CV. The plasmid was digested with SfiI and then ligated at a high DNA concentration to generate high-molecular-weight DNA in which insert and vector alternate. The ligated DNA was sonicated to generate fragments of various sizes and the ends were made blunt by treatment with T4 DNA polymerase. Sonication of the high-molecular-weight DNA generated fragments of various sizes more efficiently than sonication of the untreated small plasmid DNA. The blunt-ended DNA was digested with SfiI to produce fragments with a blunt end at one end and an SfiI 3'-overhang end at the other. Among these fragments, only the insert-derived fragments have an SfiI end complementary to the SfiI end of either pSFI-SV vector. The fragments were ligated with pSFI-SV1 or pSFI-SV10 that had been double-digested with SfiI and the appropriate blunt-end enzyme, so that the insert-derived fragments were selectively ligated through the complementary SfiI end. Finally, the ligated DNAs were introduced into E. coli to generate a library of nested deletion clones of various lengths. The results of constructing nested deletions from several fragments are summarized in Table 1. The data indicate that ~90% of the clones in a library were nested deletions of various sizes. In general, smaller DNAs were preferentially cloned, but the somewhat uneven size distribution of the deletions caused little problem in practical use.
Table 1. Construction of nested deletions from DNA fragments of various sizes

Clone  Estimated insert size (kb)  Vector  No. of isolated clones  No. of nested deletions  Yield of nested deletions (%)  Size range of insert DNA (kb)
1      2.5                         SV1     40                      36                       90                             0.2-2.3
1      2.5                         SV10    40                      35                       87                             0.3-2.3
2      3.6                         SV1     60                      52                       87                             0.1-3.7
2      3.6                         SV10    60                      43                       71                             0.2-3.3
3      4.0                         SV1     60                      58                       97                             0.3-3.9
3      4.0                         SV10    60                      54                       90                             0.3-3.9
4      6.0                         SV1     90                      58                       64                             0.2-5.9
4      6.0                         SV10    90                      71                       80                             0.4-6.0
5      7.2                         SV1     96                      82                       85                             0.2-6.7
5      7.2                         SV10    96                      65                       68                             0.4-6.8
6      8.5                         SV1     120                     107                      89                             0.2-7.6
6      8.5                         SV10    120                     100                      83                             0.1-8.0
7      11.3                        SV1     120                     113                      86                             0.1-10.4
7      11.3                        SV10    120                     119                      99                             0.4-10.8

Insert DNA was amplified by colony PCR except clones 6 and 7, which were isolated as plasmids. The DNA was subjected to electrophoresis on a 0.8% agarose gel and the size was measured by CCD imaging system. Clones other than nested deletions were un-amplified or had no insert.
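The library construction summarized in Table 1 can be mimicked with a toy simulation, offered only as a sketch of the logic and not as a model of the real size distribution. It places random break points on a head-to-tail concatemer, cuts at every SfiI junction, and keeps insert-derived pieces that have one SfiI end and one sonication end. The vector length, the number of repeat units and the uniform break model are illustrative assumptions.

```python
import random


def simulate_library(insert_len, vector_len, n_units, n_breaks, seed=0):
    """Return (left_anchored, right_anchored) fragment lengths in bp.

    The concatemer is modelled as n_units copies of insert followed by vector.
    """
    rng = random.Random(seed)
    unit = insert_len + vector_len
    total = unit * n_units

    # Cut points along the concatemer: SfiI sites flank every insert copy.
    cuts = {total: "sfi"}
    for k in range(n_units):
        cuts[k * unit] = "sfi"               # left end of insert copy k
        cuts[k * unit + insert_len] = "sfi"  # right end of insert copy k

    # Sonication: uniformly random breaks (a break landing on a site is ignored).
    for _ in range(n_breaks):
        cuts.setdefault(rng.randrange(1, total), "break")

    left, right = [], []
    positions = sorted(cuts)
    for start, end in zip(positions, positions[1:]):
        if start % unit >= insert_len:
            continue  # vector-derived piece: its SfiI end does not match
        ends = (cuts[start], cuts[end])
        if ends == ("sfi", "break"):
            left.append(end - start)   # anchored at the left SfiI end
        elif ends == ("break", "sfi"):
            right.append(end - start)  # anchored at the right SfiI end
        # ("sfi", "sfi") has no blunt end; ("break", "break") has no SfiI end.
    return sorted(left), sorted(right)


left, right = simulate_library(insert_len=5000, vector_len=3000,
                               n_units=20, n_breaks=200)
print(len(left), "left-anchored fragments,", len(right), "right-anchored fragments")
```

Each returned list corresponds to one strand-specific library: every fragment shares a fixed SfiI end and differs only in where sonication cut the insert, which is what makes the set a series of nested deletions.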
Isolation and ordering of nested deletion clones
The kanamycin-resistant white colonies prepared as above were randomly picked and cultured, and the plasmids were isolated with an automated plasmid isolator. Alternatively, the insert DNA was amplified directly from colonies by PCR. The plasmid or PCR-amplified DNA was electrophoresed on an agarose gel for size measurement. Figure 3a shows a typical electrophoresis pattern of PCR-amplified insert DNAs of randomly isolated clones from a nested deletion library. Since size measurement is a laborious step, we developed an automated size-measuring system in collaboration with ATTO Co. Ltd. (Tokyo, Japan). The system allowed us to measure the size of insert DNAs automatically and to re-align the clones by size with the aid of the computer program Lane Screener. The CCD imaging system and Lane Screener can be purchased from ATTO. Figure 3b shows an example of the electrophoresis pattern of the ordered nested deletions. From the ordered clones, clones with an appropriate deletion interval were chosen and subjected to sequencing. Since 400-600 bases of raw data can be obtained consistently from one clone, a set of nested deletions with an interval of 250-350 bases is sufficient to obtain contiguous sequence data for one strand. For example, in the case of a 4 kb DNA fragment, 12 of the 60 ordered clones for one strand were selected for sequencing.
Figure 3. An example of size measurement and ordering of nested deletions. (a) Electrophoresis profiles of the PCR-amplified DNAs. The insert DNA was originally 4.6 kb long and the amplification product of a clone with no insert is ~200 bp. The leftmost lane of the left gel and the rightmost lane of the right gel contain a size marker, StyI-digested λ DNA. The sizes of the marker bands are ~19.3, 7.7, 6.2, 4.3, 3.5, 2.7, 1.9, 1.5, 0.9 and 0.4 kb from the top. (b) The size of each DNA band was measured by the CCD imaging system (ATTO Co. Ltd., Japan), and the bands were ordered by size by the newly developed software program Lane Screener, which can simultaneously measure and order up to 300 fragments and select clones with an appropriate deletion interval for sequencing. The process from size measurement to selection of nested deletions for sequencing can be done automatically in 15 min for 100 fragments.
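The selection step can be illustrated with a minimal greedy sketch: walking up the size-ordered clones, it picks the largest clone that is still within the maximum allowed interval of the previously selected one and reports any interval that cannot be bridged. This is not the Lane Screener algorithm, which is not described here; the function name and the example sizes are hypothetical, and only the 350-base upper interval comes from the text.

```python
def select_clones(sizes_bp, max_step=350):
    """Pick a subset of clone sizes whose neighbours differ by <= max_step.

    Returns (selected, gaps), where gaps lists (smaller, larger) size pairs
    that could not be bridged within max_step.
    """
    ordered = sorted(set(sizes_bp))
    if not ordered:
        return [], []

    selected = [ordered[0]]
    gaps = []
    i = 0
    while i < len(ordered) - 1:
        last = selected[-1]
        best = None
        j = i
        # Advance to the largest clone still within reach of the last pick.
        while j + 1 < len(ordered) and ordered[j + 1] - last <= max_step:
            j += 1
            best = j
        if best is None:
            # Nothing within reach: record the gap and continue from the
            # next larger clone.
            best = i + 1
            gaps.append((last, ordered[best]))
        selected.append(ordered[best])
        i = best
    return selected, gaps


# Hypothetical insert sizes (bp) measured from a gel.
sizes = [200, 350, 500, 620, 900, 1500]
selected, gaps = select_clones(sizes)
print("selected:", selected)
print("gaps:", gaps)
```

A reported gap corresponds to a region where no suitable deletion clone was isolated, which in the study was covered by primer-walk on the original subclone.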
Sequencing and data assembly
The selected clones were subjected to sequencing analysis. The sequencing was carried out as described in Materials and Methods. The data assembly was initially performed with a commercially available software program, ATSQ (Japan Software Inc., Japan), developed for shot-gun sequencing. However, our strategy gives ordered sequence data that can be assembled without the unnecessary matching of positionally unrelated sequences. We thus developed a new data assembly program for the nested deletion strategy in collaboration with Mitsui Knowledge Industry Co., Ltd. (Japan). The system, named Sequence Assembler of Nested Deletion (SAND), enabled us to assemble the data with high speed and accuracy. Details of SAND will be published elsewhere.
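Why ordered reads simplify assembly can be seen in a small sketch. Because the clone order is known from size measurement, each read only has to be compared with the read before it, rather than with every other read as in shot-gun assembly. The code below is not SAND, whose details are not given in this paper; it assumes error-free reads and exact suffix-prefix matches, and the names are hypothetical.

```python
import random


def overlap_length(left, right, min_overlap=20):
    """Length of the longest suffix of `left` that equals a prefix of `right`."""
    longest = min(len(left), len(right))
    for n in range(longest, min_overlap - 1, -1):
        if left.endswith(right[:n]):
            return n
    return 0


def assemble_ordered(reads, min_overlap=20):
    """Merge reads that are already in positional order into one contig."""
    contig = reads[0]
    for read in reads[1:]:
        n = overlap_length(contig, read, min_overlap)
        if n == 0:
            raise ValueError("no overlap with the previous read")
        contig += read[n:]
    return contig


# Toy data: 500-base reads taken every 300 bases along a random 2 kb sequence.
rng = random.Random(1)
reference = "".join(rng.choice("ACGT") for _ in range(2000))
reads = [reference[start:start + 500] for start in range(0, 1501, 300)]

contig = assemble_ordered(reads)
print(len(contig), contig == reference)
```

With n ordered reads this needs n - 1 pairwise comparisons. Real reads contain ambiguities, so a practical assembler would use tolerant alignment rather than exact matching.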
Application of the nested deletion strategy for large-scale sequencing
We applied our nested deletion method to sequencing ~300 kb of the human amyloid precursor protein (APP) locus, mapped on chromosome band 21q22.1. All the exons of the APP gene and their flanking regions were previously isolated and partly sequenced in our laboratory (12). For this study, we isolated five P1 clones covering the entire APP gene (Fig. 4) from a P1 library specific for human chromosome 21q (13).
The overall procedure for sequencing of the P1 clones consisted of: (i) subcloning of restriction fragments of the P1 clone into the pSFI-CV; (ii) construction of the restriction map of the P1 clone; (iii) construction of nested deletions from selected subclones; (iv) sequencing and data assembly of the nested deletions; and (v) assembly of the consensus sequence data of each subclone to generate the final contiguous sequence data.
We first subcloned the restriction fragments from each P1 clone into pSFI-CV. We used BamHI, XbaI, AvrII and BglII because BamHI, XbaI and AvrII sites do not appear, and a BglII site appears only once, in the P1 vector (pAd10SacIIB, ref. 14). The restriction subclones were randomly isolated from a P1 clone and briefly characterized by HaeIII fingerprinting to choose independent clones. Among the 240 subclones, ~80 independent clones ranging from 0.3 to 16 kb in size were chosen. Both ends of the independent clones were sequenced and the data were assembled to obtain a partial map. This process usually produced three to five contiguous regions (contigs) of 10-40 kb. The gaps between the contigs were then filled by re-screening the remaining subclones by PCR using primers designed from the sequences at the edges of each contig. This procedure allowed us to rapidly construct the restriction map and the minimum tiling path of fragments over the five contiguous P1 clones. Finally, we selected 72 restriction fragments from the five P1s. Nested deletion libraries were prepared for 69 restriction fragments and ~1600 nested deletions were subjected to sequencing analysis. The remaining three fragments of <1.5 kb were sequenced by the primer-walk method using dye-terminator chemistry. We finally obtained 301 692 bp of consensus sequence (DDBJ accession number D87675) from the five P1 clones. The results revealed the complete structure of the APP gene spanning 286 722 bp (Fig. 4). Detailed analysis of the sequence data will be reported elsewhere. Table 2 summarizes the efficiency of our method for sequencing the APP locus. It should be noted that the final data were obtained with a redundancy of 3.8, which is about half of that of the shot-gun strategy; a typical shot-gun strategy would require a redundancy of 6 to 8 (15). The redundancy of the primer-walk strategy would be estimated at ~2.5 for both strands, assuming that each reaction produces 500 bases of data and that 100 bases overlap between adjacent reads. However, this value is an underestimate for sequencing of the human genome, because the presence of highly repetitive sequences such as the Alu family makes it difficult to continue the walk. It should also be noted that only six gaps had to be closed for the 300 kb of sequence. These gaps were due to the failure of sequencing of PCR-amplified templates containing poly-purine, poly-pyrimidine or poly-purine-pyrimidine sequences, and were closed by re-sequencing the original subclones as templates by dye-terminator chemistry using specific primers. About 96% double-stranded coverage was obtained at the initial step; the remaining 4% (29 regions) had single-stranded coverage because nested deletions were missing from one strand. These regions were also covered by sequencing the original subclones by the primer-walk method using dye-terminator chemistry. In total, 44 synthetic primers were required to complete the double-stranded coverage of 300 kb.
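The primer-walk estimate above follows from simple arithmetic, sketched here. The function name is hypothetical, and the second figure (an idealized nested deletion series with a uniform 300-base interval) is an illustration added for comparison, not a value from the paper.

```python
def directed_redundancy(read_len, step, strands=2):
    """Redundancy of a directed strategy that advances `step` bases per read."""
    return strands * read_len / step


# Primer-walk: 500-base reads with 100 bases of overlap advance 400 bases.
primer_walk = directed_redundancy(read_len=500, step=500 - 100)

# Idealized nested deletions: 500-base reads every 300 bases (assumption).
nested_ideal = directed_redundancy(read_len=500, step=300)

print(f"primer-walk estimate:        {primer_walk:.2f}")
print(f"nested deletions, idealized: {nested_ideal:.2f}")
print("reported for the APP locus:  3.8")
print("typical shot-gun range:      6-8")
```

The reported 3.8 is higher than the idealized figure because it also counts end-sequencing of subclones and the reads used for gap closure and double-stranded coverage.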
Figure 4. P1 clones covering the human APP locus and exon-intron organization of the APP gene based on the sequence data obtained in this study. The five overlapping P1 clones (S-459, T-1559, S-491, T-1715 and T-364) were used as starting materials for sequencing. The positions of the exons (1-18) are shown arbitrarily by vertical bars. The numbers between the exons indicate the sizes of the introns in bp. The sequence data revealed that the five P1 clones covered 301 692 bp. The first exon starts at position 9001 and the last exon (exon 18) ends at position 295 722, so that the APP gene is 286 722 bp in length. The accession number of the entire sequence of the APP locus is D87675.
DISCUSSION
In the present study, we developed a novel method for making nested deletions and demonstrated that it is applicable to large-scale sequencing. The method consists of simple steps: digestion with SfiI, ligation and sonication. It has not yet been automated but is simple enough for one technician to obtain nested deletion libraries for >100 kb of final sequence data in a few days. The use of asymmetric 3'-overhang ends at the SfiI cleavage sites in the vector provides high efficiency and selectivity in the ligation steps. Indeed, the nested deletion libraries prepared by this method contained almost no vector-derived fragments (see Table 1). The method is applicable to any clonable DNA, although some modifications may be required for DNA fragments that contain SfiI sites with the same 3'-overhang ends as those of the pSFI vectors. On average, an SfiI site appears every 200 kb in the human genome and an SfiI site with the same 3'-overhang end every 13 000 kb or less; thus the frequency of such SfiI sites is low enough for practical use.
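The spacing quoted for SfiI sites with a matching overhang can be reproduced with a back-of-the-envelope calculation. It assumes that the three bases of the SfiI 3' overhang are equally likely to be any nucleotide, which is a simplification and only one plausible reading of how the ~13 000 kb figure was obtained.

```python
# SfiI cuts GGCCNNNN^NGGCC and leaves a 3-base 3' overhang.
SITE_SPACING_KB = 200  # average SfiI spacing in the human genome, from the text
OVERHANG_BASES = 3

# Assumption: each overhang base is A, C, G or T with equal probability.
possible_overhangs = 4 ** OVERHANG_BASES
matching_spacing_kb = SITE_SPACING_KB * possible_overhangs

print(possible_overhangs, "possible overhangs")
print(matching_spacing_kb, "kb between sites with one given overhang")
```

The result, 12 800 kb, is consistent with the rounded value of about 13 000 kb given in the text.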
To apply this method for sequencing of the large insert clones such as P1 and BAC clones, subcloning and mapping of the restriction fragments are required. In the case of P1 clones, this step could be simply performed by one-pass end-sequencing of the restriction fragments followed by the data assembly on the computer, and provided several advantages for practical use. It made the final data assembly easy. It was also useful to find the same fragments present in the overlapped regions between the P1 clones to avoid unnecessary duplication of sequencing. Also, the use of restriction enzyme allowed us to use P1 DNA purified simply by polyethylene-glycol precipitation in the subcloning step. No ultra-centrifugation step was required.
The most labor-intensive and time-consuming step of the nested deletion-based strategy is the isolation and ordering of nested deletions. For example, we isolated and electrophoresed ~8000 clones to select 1600 nested deletions. However, direct amplification of insert DNA by long PCR (16) enabled us to rapidly prepare templates of various lengths. The templates prepared in this manner gave sequence data of a quality sufficient for data assembly. For ordering the clones, we developed a new instrument for automated size measurement and ordering, which allowed one person to handle >2000 nested deletion clones in a day. A software program was also developed for automated selection of nested deletions with an appropriate deletion interval.
Table 2. Summary of sequencing of the APP locus using the nested deletion method

Description                                          Number
P1 clones sequenced                                  5 clones
Independent restriction fragments                    368 fragments
Restriction fragments sequenced                      72 fragments
  Restriction fragments used for ND construction     69 fragments
  Restriction fragments used for primer-walk         3 fragments
ND clones isolated                                   8120 clones
Total sequencing reactions                           2370 reactions
  End-sequencing of restriction fragments            736 reactions
  Sequencing of small fragments by primer-walk       8 reactions
  Sequencing of ND clones                            1582 reactions
  Primer-walk for gap closure                        12 reactions
  Primer-walk for double-stranded coverage           32 reactions
Average edited read base                             486 bases
Total read base                                      1 151 820 bases
Consensus sequence                                   301 692 bases
Redundancy                                           3.8
ND, nested deletions.
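The figures in Table 2 are internally consistent, as the following short check shows. It uses only numbers from the table; the variable names are ours.

```python
# Sequencing reactions by purpose, as listed in Table 2.
reactions = {
    "end_sequencing_of_restriction_fragments": 736,
    "small_fragments_by_primer_walk": 8,
    "nd_clones": 1582,
    "primer_walk_gap_closure": 12,
    "primer_walk_double_stranded_coverage": 32,
}
AVERAGE_READ_BASES = 486
CONSENSUS_BASES = 301_692

total_reactions = sum(reactions.values())
total_read_bases = total_reactions * AVERAGE_READ_BASES
redundancy = total_read_bases / CONSENSUS_BASES

# Each primer-walk reaction for gap closure or coverage used its own primer.
primers = (reactions["primer_walk_gap_closure"]
           + reactions["primer_walk_double_stranded_coverage"])

print(total_reactions, total_read_bases, round(redundancy, 1), primers)
```

The totals reproduce the 2370 reactions, 1 151 820 read bases and redundancy of 3.8 in the table, and the 44 synthetic primers mentioned in Results.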
In contrast to the relatively complicated front-end steps, the steps for sequencing and data assembly are simple. To obtain the final 300 kb of contiguous data for the APP locus, only ~2400 sequencing reactions in total, including end-sequencing of the subclones and gap-closure sequencing, were carried out. The data were assembled quickly in a simple and reliable manner. The results were easily checked by comparing the clone alignment in the assembled data with that obtained from size measurement. It should be emphasized that in our method the direction of each sequence read is known prior to data assembly, so that the data for each strand were assembled first and the two strands were then compared to obtain the final consensus. This two-step assembly procedure eliminated the unnecessary matching of complementary sequence data for each sample, accelerating the data assembly step and making it easier to resolve ambiguities.
The accuracy of the data obtained in this study is estimated to be >99.99%, as discussed below. We compared raw data from 1000 samples and found that the frequency of ambiguity in our data was 0.25% in bases 1-400 and ~10% in bases 401-500. The data assembly for one strand was carried out using up to 500 bases of raw data per read and the ambiguities were resolved by visual inspection. Since we used reads at intervals of ~300 bases, 400 of the 500 bases of raw data overlap with neighboring reads, and the ambiguity in the overlapped region is estimated at 0.005% (0.25% × 2% = 0.005%). The ambiguity of the remaining 100 bases of non-overlapped region is 0.25%, as mentioned above. Therefore, the assembled data for one strand should have an accuracy of at least 99.75%. In practice, however, the ambiguities were finally resolved using the data from both strands over the entire region. Therefore, the error rate should be much less than 0.01% (0.25% × 0.25% = 0.000625%). In addition, the final consensus data were obtained with a redundancy of 3.8, so the accuracy should be higher still. In fact, comparison of the present data with the 3.6 kb APP cDNA sequence showed no inconsistency between them.
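The percentages in the accuracy estimate can be reproduced as follows. The sketch treats ambiguities on the two strands as independent, which is the assumption implied by multiplying the rates; the 2% factor is taken as printed, since the text does not say how it was derived.

```python
SINGLE_READ_AMBIGUITY = 0.25 / 100  # bases 1-400 of a raw read
OVERLAP_FACTOR = 2 / 100            # factor as printed in the text

# Ambiguity where neighbouring reads of one strand overlap.
overlap_ambiguity = SINGLE_READ_AMBIGUITY * OVERLAP_FACTOR

# Worst case for one strand: the non-overlapped 100 bases of each read.
one_strand_accuracy = 1 - SINGLE_READ_AMBIGUITY

# Both strands ambiguous at the same base, assuming independence.
both_strand_ambiguity = SINGLE_READ_AMBIGUITY ** 2

print(f"overlap region:      {overlap_ambiguity * 100:.3f}%")
print(f"one-strand accuracy: {one_strand_accuracy * 100:.2f}%")
print(f"both strands:        {both_strand_ambiguity * 100:.6f}%")
```

The last value, 0.000625%, is well below the 0.01% error rate that corresponds to the claimed accuracy of >99.99%.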
In conclusion, the nested deletion method developed in this study provides a novel strategy for large-scale sequencing characterized by low redundancy, little gap closure effort and rapid data assembly. The procedure can be employed not only for P1 but also for BAC (17) and PAC (18) systems.
ACKNOWLEDGEMENTS
This work was supported in part by Grant-in-Aid for a Creative Basic Research (Human Genome Program) from the Ministry of Education, Science, Sports and Culture, and a Grant of a Special Coordination Funds for Promotion of Science and Technology from the Science and Technology Agency (STA) of Japan.
REFERENCES
3 Hunkapiller, T., Kaiser, R. J., Koop, B. F. and Hood, L. (1991) Science, 254, 59-67.
4 Hodgkin, J., Plasterk, R. H. and Waterston, R. H. (1995) Science, 270, 410-414.
5 Fleischmann, R. D. et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496.
6 Studier, F. W. (1989) Proc. Natl. Acad. Sci. USA, 86, 6917-6921.
7 Kotler, L. E., Zevin-Sonkin, D., Sobolev, I. A., Beskin, A. D. and Ulanovsky, L. E. (1993) Proc. Natl. Acad. Sci. USA, 90, 4241-4245.
8 Strathmann, M., Hamilton, B. A., Mayeda, C. A., Simon, M. I., Meyerowitz, E. M. and Palazzolo, M. J. (1991) Proc. Natl. Acad. Sci. USA, 88, 1247-1250.
9 Martin, C. H., Mayeda, C. A., Davis, C. A., Ericsson, C. L., Knafels, J. D., Mathog, D. R., Celniker, S. E., Lewis, E. B. and Palazzolo, M. J. (1995) Proc. Natl. Acad. Sci. USA, 92, 8398-8402.
10 Kusuda, J., Kameoka, Y., Takahashi, I., Fujiwara, H. and Hashimoto, K. (1989) Nucleic Acids Res., 17, 8890.
11 Maniatis, T., Fritsch, E. F. and Sambrook, J. (1989) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
12 Yoshikai, S., Sasaki, H., Doh-ura, K., Furuya, H. and Sakaki, Y. (1990) Gene, 87, 257-263.
13 Tanahashi, H., Ito, T., Hattori, M., Ohira, M., Ohki, M., Tashiro, K. and Sakaki, Y. (1994) DNA Res., 1, 85-89.
14 Pierce, J. C., Sauer, B. and Sternberg, N. (1992) Proc. Natl. Acad. Sci. USA, 89, 2056-2060.
15 Hodgkin, J., Plasterk, R. H. A. and Waterston, R. H. (1995) Science, 270, 410-414.
16 Barnes, W. M. (1994) Proc. Natl. Acad. Sci. USA, 91, 2216-2220.
17 Shizuya, H., Birren, B., Kim, U. J., Mancino, V., Slepak, T., Tachiiri, Y. and Simon, M. (1992) Proc. Natl. Acad. Sci. USA, 89, 8794-8797.
18 Ioannou, P. A., Amemiya, C. T., Garnes, J., Kroisel, P. M., Shizuya, H., Chen, C., Batzer, M. A. and De Jong, P. J. (1994) Nature Genet., 6, 84-89.