
© 1997 Oxford University Press 1802-1808


A novel method for making nested deletions and its application for sequencing of a 300 kb region of human APP locus

Masahira Hattori1,*, Fujiko Tsukahara1,2, Yoshiaki Furuhata1, Hiroshi Tanahashi1, Matsumi Hirose1, Masae Saito1, Shiho Tsukuni1 and Yoshiyuki Sakaki1

1Human Genome Center, Institute of Medical Science, University of Tokyo, Tokyo 108, Japan and 2Department of Pharmacology, Tokyo Woman's Medical College, Tokyo 162, Japan

Received December 16, 1996; Revised and Accepted March 24, 1997

DDBJ/EMBL/GenBank accession no. D87675

ABSTRACT

We developed a novel in vitro method for making nested deletions and applied it to a large-scale DNA sequencing. A DNA fragment to be sequenced (up to 15 kb long) was cloned with a new vector possessing two unique SfiI sites, digested by SfiI and ligated to generate a large head-to-tail concatemer. The large concatemer was randomly fragmented by sonication and then redigested by SfiI to separate insert and vector DNAs. The fragments of various length were then cloned into the other vector(s) specifically designed for selective cloning of insert-derived DNA fragments to generate a library of nested deletions. This method allowed a single person to generate >20 nested deletion libraries sufficient to cover 100 kb in a few days. We applied the method for sequencing of P1 clones and successfully determined the complete sequence of ~300 kb of the human amyloid precursor protein (APP) locus on chromosome 21 with a redundancy of 3.8, reasonably low cost and very few gaps remaining to be closed. Development of some new instruments and software is also described which makes this method more applicable for large-scale sequencing.

INTRODUCTION

Large-scale sequencing is now one of the central issues in the human genome project (1,2). For sequencing the genome with reasonable speed, accuracy and cost, the sequencing strategy is an important factor to be considered. So far, three types of strategies have been used or proposed, namely the shot-gun, primer-walk and nested deletion strategies (reviewed in ref. 3). Each strategy has advantages and disadvantages in practical use. The shot-gun strategy has been most widely used for large-scale sequencing projects (e.g. 4,5). It is simple as a whole and easy to scale up. However, it intrinsically requires a high redundancy of sequencing and extensive gap closure efforts, which raise the cost and complicate the data assembly process, respectively. Primer-walk is a directed strategy with minimum redundancy, but it requires the design and synthesis of a very large number of primers, which may be expensive. The use of a library of short oligonucleotides as sequencing primers has been proposed (6,7), but it has not been technically well established. The third strategy is sequential sequencing using nested deletion or transposon-inserted templates (8). This strategy can be carried out with reasonably low redundancy and a simple data assembly process, but has not been considered applicable to large-scale sequencing because of its complicated procedure for template preparation. However, a relatively simple, transposon-mediated method has been developed and successfully applied to sequencing of the Drosophila genome (9).

We have attempted to develop a simple and reproducible method for making nested deletions on a large scale. We herein describe a novel method for making nested deletions in vitro and its successful application to sequencing ~300 kb of the human APP locus on chromosome 21q22.1. Development of some instruments and software is also described, which makes this method applicable to large-scale and systematic sequencing of the human genome.

MATERIALS AND METHODS

Construction of pSFI vectors

Oligonucleotides for the polylinkers were prepared on a DNA synthesizer (Perkin-Elmer ABI 394). The double-stranded DNAs described below (a, b and c) were prepared by annealing equimolar amounts of the synthesized complementary oligonucleotides at 50°C overnight in 100 µl of 0.1 M NaCl. The three double-stranded DNAs, which have 5'-overhang cohesive ends for HindIII and EcoRI sites, were ligated with HindIII and EcoRI double-digested pUC13 to construct pSFI-CV (a), or with pTZ19R to construct pSFI-SV1 (b) and pSFI-SV10 (c). The polylinker of pSFI-CV contains EagI, SalI, HindIII, BamHI and two SfiI sites. The polylinker of pSFI-SV1 contains HindIII, SfiI, BglII, EcoRV and EcoRI sites, and the polylinker of pSFI-SV10 contains HindIII, SfiI, BglII, StuI and EcoRI sites.

(a)
5'-AGCTGGCCAAATCGGCCGTCGACAAGCTTGGATCCGGCCATAAGGGCC
       CCGGTTTAGCCGGCAGCTGTTCGAACCTAGGCCGGTATTCCCGGTTAA-5'

(b)
5'-AGCTTGCATGCCAGGCCAAATCGGCCCTAGGAGATCTGATATCAGGCCTGAGCTCG
       ACGTACGGTCCGGTTTAGCCGGCTTCCTCTAGACTATAGTGGGGACTCGAGCTTAA-5'

(c)
5'-AGCTTGCATGCCAGGCCATTAGGGCCGAGATCTGGAGGCCTCCCGGGGAGCTCG
       ACGTACGGTCCGGTAATCCCGGCTCTAGACCTCCGGAGGGCCCCTCGAGCTTAA-5'

Kanamycin-resistant pSFI-SV vectors were further prepared by ligating the BspHI-digested plasmid carrying the new polylinker with the BamHI fragment containing the kanamycin resistance gene of pBS-Kan2 (10). For preparation of nested deletions, pSFI-SV1 and pSFI-SV10 were double-digested with EcoRV and SfiI or with StuI and SfiI, respectively, treated with CIP and gel-purified. pSFI-CV was digested with appropriate restriction enzymes that cleave the multiple cloning site (MCS), treated with CIP and gel-purified. The MCSs of these three vectors are in-frame with the Escherichia coli lacZ gene, so that vectors without inserts produce blue colonies in the presence of X-gal. The structure of these vectors is shown in Figure 1.
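The asymmetric SfiI ends that the vectors rely on can be read directly off the polylinker sequence. The following is a minimal sketch, assuming the standard SfiI recognition pattern GGCCNNNN^NGGCC (cut after the fourth N on each strand, leaving a 3-base 3'-overhang); the function names are hypothetical and only the top strand of polylinker (a) is taken from the text.

```python
import re

# Top strand of the pSFI-CV polylinker (a), copied from the text above.
POLYLINKER_A = "AGCTGGCCAAATCGGCCGTCGACAAGCTTGGATCCGGCCATAAGGGCC"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def sfi_overhangs(top_strand):
    """Find SfiI sites and report the two 3'-overhangs each cut produces.

    Returns a list of (site_start, upstream_overhang, downstream_overhang),
    where the upstream overhang sits on the top strand of the fragment to
    the left of the cut and the downstream overhang sits on the bottom
    strand of the fragment to the right (both written 5'->3').
    """
    sites = []
    # Lookahead so that overlapping candidate sites are not skipped.
    for m in re.finditer(r"(?=(GGCC([ACGT]{5})GGCC))", top_strand):
        spacer = m.group(2)              # the five N bases
        upstream = spacer[1:4]           # N2 N3 N4 stay single-stranded
        downstream = reverse_complement(upstream)
        sites.append((m.start(), upstream, downstream))
    return sites


if __name__ == "__main__":
    for start, up, down in sfi_overhangs(POLYLINKER_A):
        print(f"SfiI site at {start}: upstream {up}-3', downstream {down}-3'")
```

Under these assumptions, the fragment between the two sites (the insert) carries ATT-3' and TAA-3' ends, while the vector backbone keeps AAT-3' and TTA-3' ends, which is consistent with the overhangs quoted for the insert in Results and for the vectors in Figure 1.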

Preparation of nested deletion library

The overall procedure for the preparation of nested deletions is illustrated in Figure 2. The pSFI-CV clone was cultured overnight in the presence of ampicillin and the plasmid was isolated by the alkaline-SDS method (11). The plasmid DNA (20 µg) was digested with SfiI (NEB, 40 U) in a final volume of 100 µl at 50°C for 1 h, extracted with phenol/chloroform and precipitated with ethanol. The DNA was treated with T4 DNA ligase (Takara, 10 U) and 0.1 mM ATP at a DNA concentration of 0.5-1 µg/µl at 15°C in 40 µl of 1× ligation buffer for 2 h to overnight. An aliquot of the viscous ligated mixture was diluted to ~5 ng/µl in 200 µl TE (10 mM Tris-HCl, 1 mM EDTA, pH 8.0) and sonicated with an Astrason XL sonicator with the pulsar dial set at 1.5-2.0. The time for sonication depends on the insert DNA size and was routinely set at 30 s-1 min for 5 kb DNA, which gave a broad smear of bands. The sonicated DNA was extracted with phenol/chloroform, precipitated with ethanol and dissolved in 30 µl TE. The DNA was treated with T4 DNA polymerase (Toyobo, 0.5 U) at 37°C for 5 min in 20 µl of 50 mM Tris-HCl, pH 7.5, 10 mM MgCl2, 10 mM DTT and 0.2 mM dNTPs. The reaction was quenched by heating at 70°C for 15 min, then the DNA was treated with T4 polynucleotide kinase (Takara, 5 U) in the presence of 0.1 mM ATP in 30 µl of appropriate buffer at 37°C for 20 min. The DNA was further digested with SfiI (NEB, 10 U) in 80 µl of appropriate buffer at 50°C for 1 h. The reaction was quenched by adding EDTA and the DNA was extracted with phenol/chloroform, precipitated with ethanol and dissolved in 20 µl TE. An aliquot of the digested DNA (0.1-0.3 µg) was ligated with pSFI-SV1 or pSFI-SV10 (0.1 µg) in 20 µl of 1× ligation buffer containing 4 U of T4 DNA ligase (Takara) and 0.1 mM ATP at 15°C for 2 h to overnight. The ligated mixture was transformed into E.coli DH5α (Gibco-BRL) and the kanamycin-resistant colonies were obtained as the nested deletion library.


Figure 1. Vectors for construction of nested deletions. The pSFI-CV contains a multiple cloning site (MCS) for EagI, SalI, HindIII and BamHI, which is used for cloning of foreign DNA fragments to be sequenced. The MCS is flanked by two SfiI sites having the overhang ends of AAT-3' (a) and TTA-3' (b), respectively. Two pSFI-SV vectors (SV1 and SV10) were constructed for cloning of nested deletions generated from the pSFI-CV clone. The pSFI-SV1 contains a SfiI site producing the overhang end of AAT-3' and a blunt end of EcoRV. The pSFI-SV10 contains a SfiI site producing the overhang end of TTA-3' and a blunt end of StuI. Each SfiI end of the pSFI-SV vectors is complementary to either of the SfiI ends of the insert DNA in the pSFI-CV. AmpR and KmR indicate the ampicillin and kanamycin resistance genes, respectively. Arrows indicate the direction of sequencing.


Figure 2. Procedure for construction of nested deletions. The nested deletions are constructed in five steps. (I) The pSFI-CV clone is digested with SfiI and the fragments are then ligated to obtain high molecular weight DNA in which insert and vector alternate. (II) The ligated DNA is sonicated to obtain fragments of various sizes and then blunt-ended by treatment with T4 DNA polymerase. (III) The sonicated DNA is digested with SfiI to obtain fragments with a SfiI end at one end and a blunt end at the other. (IV) The SfiI-digested fragments are ligated to either of the pSFI-SV vectors and (V) introduced into E.coli to obtain the nested deletion library. Open and closed boxes indicate vector and cloned DNA fragments, respectively. Capital A and K in circles indicate the ampicillin and kanamycin resistance genes, respectively.

Size measurement and ordering of nested deletions

The kanamycin-resistant colonies were randomly picked and cultured overnight in 5 ml L-broth in the presence of kanamycin (100 µg/ml), and the plasmid DNA was isolated with an automated plasmid isolator PI100 (KURABO, Japan). Alternatively, the insert DNA was directly amplified from colonies by PCR (long PCR kit from Takara or Gibco-BRL) using specific primers (LR: 5'-TCCGGCTCGTATGTTGTGTGGA-3', LL: 5'-GTGCTGCAAGGCGATTAAGTTGG-3') for 30 cycles of 94°C for 30 s and 68°C for 1-15 min, followed by one cycle at 70°C for 10 min. The plasmid DNAs digested with SfiI or the PCR-amplified products were electrophoresed on a 0.8% agarose gel and the gel image was recorded using a CCD imaging system (ATTO Co. Ltd., Tokyo, Japan). The size measurement and ordering of the clones were performed with the software program Lane Screener, newly developed for this purpose by ATTO. The nested deletion clones were selected at intervals of 250-350 bases for sequencing. The nested deletion plasmids were used for sequencing without further purification; the PCR products were used after treatment with shrimp alkaline phosphatase and E.coli exonuclease I (Amersham) at 37°C for 20 min followed by heating at 85°C for 10 min.

Sequencing and data assembly of nested deletions

Sequencing was done by cycle sequencing with a commercially available fluorescent-labeled forward primer (-21m) and analyzed on a four color-based sequencer (Perkin-Elmer ABI 373S). The sequencing reaction was carried out manually in the beginning and later by a sequencing robot (Vistra, Amersham) according to the manufacturer's instructions. The buffers and reagents for sequencing were obtained from the manufacturers. Data assembly was done with a commercially available program, ATSQ (Japan Software Inc., Tokyo, Japan), in the beginning and later with a newly developed system, SAND (see Results).

Subcloning of restriction fragments from P1 DNA

The P1 clone was cultured in 100 ml of L-broth in the presence of kanamycin (100 µg/ml) and the DNA was isolated by the alkaline-SDS method according to the literature (14). The crude P1 DNA was treated with RNase A (Sigma, 20 µg) at 37°C for 1 h and precipitated by adding 0.6 vol of 2.5 M NaCl/20% polyethylene-glycol 6000. The mixture was kept on ice for 15 min and the DNA was collected by centrifugation. The precipitate was rinsed with 75% ethanol, dried and dissolved in 100 µl TE. An aliquot (20 µl) was digested with AvrII, XbaI, BamHI or BglII for 1 h and then treated with Klenow fragment (0.1 U) in the presence of 0.2 mM dCTP/dTTP or dGTP/dATP at 37°C for a further 15 min to partially fill in the restriction end. The reaction was quenched by adding 1 µl of 0.5 M EDTA and the DNA was extracted with phenol/chloroform, precipitated with ethanol and dissolved in 20 µl TE (pH 8). An aliquot (1-5 µl) was ligated with pSFI-CV (0.1 µg) partially filled in at the HindIII or SalI site in 20 µl of 1× ligation buffer containing 4 U T4 DNA ligase (Takara) and 0.1 mM ATP at 15°C for 2 h to overnight. The ligation mixture was transformed into E.coli DH5α (Gibco-BRL) and the ampicillin-resistant colonies were obtained as subclones. Sixty colonies were randomly picked and cultured overnight in 5 ml of L-broth in the presence of ampicillin (100 µg/ml), and the plasmid DNA was isolated by the alkaline-SDS method and dissolved in 150 µl of TE. These clones were stored as the sub-libraries and subjected to the following fingerprinting selection. The cloned DNA (10 µl) was digested with HaeIII (Nippon gene, 4 U) at 37°C for 1 h and electrophoresed on a 2% Nusieve/1% agarose gel. The HaeIII fingerprint of each clone was used to choose the independent subclones. The size of the insert DNA was also estimated by digestion with SfiI. The sequences of both ends of each subclone were determined by cycle sequencing with forward (-21m) and reverse (M13 RP) primers.
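The partial fill-in step can be checked on paper by tracking which bases of each 5'-overhang remain single-stranded. The sketch below is an illustration only: the overhang sequences are the standard ones for these enzymes, while the pairing of each enzyme with a particular dNTP mixture is inferred here from which combinations yield complementary 2-base ends and is not stated explicitly in the text. Function names are hypothetical.

```python
_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# 5'-overhangs (written 5'->3') left by each enzyme.
OVERHANGS = {
    "BamHI": "GATC", "BglII": "GATC",
    "XbaI": "CTAG", "AvrII": "CTAG",
    "HindIII": "AGCT", "SalI": "TCGA",
}


def partial_fill_in(overhang, dntps):
    """Return the part of a 5'-overhang left single-stranded after fill-in.

    The polymerase extends the recessed 3' end, copying the overhang from
    its 3'-most base toward its 5' end, and stops at the first template
    base whose complementary dNTP is not supplied.
    """
    remaining = len(overhang)
    while remaining > 0 and _COMPLEMENT[overhang[remaining - 1]] in dntps:
        remaining -= 1
    return overhang[:remaining]


def reverse_complement(seq):
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def compatible(end_a, end_b):
    """True if two single-stranded 5'-overhangs can anneal to each other."""
    return len(end_a) > 0 and end_a == reverse_complement(end_b)


if __name__ == "__main__":
    insert_end = partial_fill_in(OVERHANGS["BamHI"], {"G", "A"})   # insert side
    vector_end = partial_fill_in(OVERHANGS["SalI"], {"C", "T"})    # vector side
    print(insert_end, vector_end, compatible(insert_end, vector_end))
```

With these assumptions, BamHI or BglII ends filled with dGTP/dATP match a SalI end filled with dCTP/dTTP, and XbaI or AvrII ends filled with dCTP/dTTP match a HindIII end filled with dGTP/dATP. In both cases the 2-base ends are no longer self-complementary, which would suppress self-ligation of vector and of insert fragments.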

RESULTS

Construction of the nested deletion libraries

To construct nested deletion libraries, we designed and constructed three vectors, termed pSFI-CV, pSFI-SV1 and pSFI-SV10, from the commercially available plasmid vectors pUC13 and pTZ19R as described in Materials and Methods. As shown in Figure 1, pSFI-CV has a multiple cloning site (MCS) flanked by two unique SfiI sites producing the 3'-overhang ends of ATT-3' and TAA-3', respectively. Each of the two pSFI-SV vectors has an MCS producing a blunt end and a SfiI 3'-overhang end complementary to one of the SfiI ends of the pSFI-CV. These vectors are available upon request. Using these vectors, nested deletion libraries were generated as follows (Fig. 2). A DNA fragment to be sequenced (usually 2-15 kb long) was cloned into the MCS of pSFI-CV. The plasmid was digested with SfiI and then ligated at a high DNA concentration to generate high molecular weight DNA in which insert and vector alternate. The ligated DNA was sonicated to generate fragments of various sizes and the ends were made blunt by treatment with T4 DNA polymerase. Sonication of the high molecular weight DNA generated fragments of various sizes more efficiently than sonication of untreated small plasmid DNA. The blunt-ended DNA was digested with SfiI to produce fragments with a blunt end at one end and a SfiI 3'-overhang end at the other. Among the fragments, only the insert-derived fragments have a SfiI end complementary to the SfiI end of either pSFI-SV vector. The fragments were ligated with pSFI-SV1 or pSFI-SV10 double-digested with SfiI and the appropriate blunt-end enzyme, which enabled us to selectively ligate the insert-derived fragments with pSFI-SV1 or pSFI-SV10 through the complementary SfiI end. Finally, the ligated DNAs were introduced into E.coli to generate a library of nested deletion clones of various lengths. The results of construction of nested deletions from several fragments are summarized in Table 1. The data indicate that ~90% of the clones in the library were nested deletions of various sizes. In general, smaller DNAs were preferentially cloned, but the somewhat uneven size distribution of the deletions caused little problem in practical use.
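Why this procedure yields nested deletions can be seen from a toy model: every clonable fragment keeps the same SfiI end of the insert and stops at a random sonication breakpoint, so all clones in one library share a common start. The following is a minimal sketch under simplifying assumptions (breakpoints drawn uniformly along the insert, which ignores the observed bias toward smaller fragments); the function name and the example sequence are hypothetical.

```python
import random


def simulate_nested_deletions(insert, n_clones, seed=0):
    """Simulate one nested deletion library for one strand.

    Each clone runs from the fixed SfiI end (index 0) to a random
    sonication breakpoint, so every clone is a prefix of the insert.
    """
    rng = random.Random(seed)
    clones = []
    for _ in range(n_clones):
        breakpoint = rng.randint(1, len(insert))  # inclusive on both ends
        clones.append(insert[:breakpoint])
    # Ordering by size is the analogue of ordering lanes on the gel.
    return sorted(clones, key=len)


if __name__ == "__main__":
    toy_insert = "ACGTTGCAAGGCTTAACCGGTTACGATCGA"  # hypothetical 30-base insert
    for clone in simulate_nested_deletions(toy_insert, 5):
        print(len(clone), clone)
```

Once sorted by length, each clone contains all shorter clones, which is what allows the library to be ordered by gel size alone.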

Table 1. Construction of nested deletions from DNA fragments of various sizes

Clone  Estimated         Vector  No. of isolated  No. of nested  Yield of nested  Size range of
       insert size (kb)          clones           deletions      deletions (%)    insert DNA (kb)
1      2.5               SV1     40               36             90               0.2-2.3
                         SV10    40               35             87               0.3-2.3
2      3.6               SV1     60               52             87               0.1-3.7
                         SV10    60               43             71               0.2-3.3
3      4.0               SV1     60               58             97               0.3-3.9
                         SV10    60               54             90               0.3-3.9
4      6.0               SV1     90               58             64               0.2-5.9
                         SV10    90               71             80               0.4-6.0
5      7.2               SV1     96               82             85               0.2-6.7
                         SV10    96               65             68               0.4-6.8
6      8.5               SV1     120              107            89               0.2-7.6
                         SV10    120              100            83               0.1-8.0
7      11.3              SV1     120              113            86               0.1-10.4
                         SV10    120              119            99               0.4-10.8

Insert DNA was amplified by colony PCR except clones 6 and 7, which were isolated as plasmids. The DNA was subjected to electrophoresis on a 0.8% agarose gel and the size was measured by CCD imaging system. Clones other than nested deletions were un-amplified or had no insert.

Isolation and ordering of nested deletion clones

The kanamycin-resistant white colonies prepared as above were randomly picked and cultured, and the plasmids were isolated with an automated plasmid isolator. Alternatively, the insert DNA was directly amplified from colonies by PCR. The plasmid or PCR-amplified DNA was electrophoresed on an agarose gel for size measurement. Figure 3a shows a typical example of the electrophoresis pattern of PCR-amplified insert DNAs of randomly isolated clones from a nested deletion library. Since size measurement is a laborious step, we developed an automated size-measuring system in collaboration with ATTO Co. Ltd. (Tokyo, Japan). The system allowed us to automatically measure the size of insert DNAs and to re-align the clones by size with the aid of the computer program Lane Screener. The CCD imaging system and the program Lane Screener can be purchased from ATTO. Figure 3b shows an example of the electrophoresis pattern of the ordered nested deletions. From the ordered clones, clones with an appropriate deletion interval were chosen and subjected to sequencing. Since 400-600 bases of raw data can be consistently obtained by sequencing one clone, a set of nested deletions with an interval of 250-350 bases is sufficient to obtain contiguous sequence data for one strand. For example, in the case of a 4 kb DNA fragment, 12 clones out of the 60 ordered clones for one strand were selected for sequencing.
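The selection of clones at 250-350 base intervals can be sketched as a simple greedy pass over the measured sizes. This is an illustration only and is not the algorithm used by Lane Screener, which is not described here; the function name, the tie-breaking rule and the fallback for gaps are assumptions.

```python
def select_clones(sizes_bp, min_step=250, max_step=350):
    """Pick a ladder of clone sizes spaced min_step..max_step bases apart.

    Starting from the smallest clone, repeatedly choose the clone whose
    size is closest to the middle of the allowed window above the last
    selected clone. If no clone falls in the window, fall back to the
    next larger clone, which leaves a wider step to be closed later.
    """
    ordered = sorted(sizes_bp)
    if not ordered:
        return []
    selected = [ordered[0]]
    mid_step = (min_step + max_step) / 2
    while True:
        last = selected[-1]
        window = [s for s in ordered if last + min_step <= s <= last + max_step]
        if window:
            nxt = min(window, key=lambda s: abs(s - (last + mid_step)))
        else:
            beyond = [s for s in ordered if s > last + max_step]
            if not beyond:
                break
            nxt = beyond[0]
        selected.append(nxt)
    return selected


if __name__ == "__main__":
    measured = [200, 480, 510, 790, 820, 1100, 1150, 1500, 300, 650]
    print(select_clones(measured))
```

For the hypothetical sizes above, the pass keeps five of the ten clones; the final step from 1100 to 1500 exceeds 350 bases and would be a candidate for picking additional colonies.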


Figure 3. An example of size measurement and ordering of nested deletions. (a) Electrophoresis profiles of the PCR-amplified DNAs. The insert DNA was originally 4.6 kb long and the amplification product of a clone with no insert is ~200 bp. The leftmost lane of the left gel and the rightmost lane of the right gel contain a size marker, StyI-digested λ DNA. The sizes of the marker bands are ~19.3, 7.7, 6.2, 4.3, 3.5, 2.7, 1.9, 1.5, 0.9 and 0.4 kb from the top. (b) The sizes of the DNA bands were measured with the CCD imaging system (ATTO Co. Ltd., Japan), and the bands were ordered by size using the newly developed software program Lane Screener, which can simultaneously measure and order up to 300 fragments and also select the clones with an appropriate deletion interval for sequencing. The process from size measurement to selection of nested deletions for sequencing can be done automatically in 15 min for 100 fragments.

Sequencing and data assembly

The selected clones were subjected to sequencing analysis. The sequencing was carried out as described in Materials and Methods. The data assembly was initially performed with a commercially available software program, ATSQ (Japan Software Inc., Japan), developed for shot-gun sequencing. However, our strategy gives ordered sequence data that can be assembled without the unnecessary process of matching positionally unrelated sequences. We thus developed a new data assembly program for the nested deletion strategy in collaboration with Mitsui Knowledge Industry Co., Ltd. (Japan). The system, named Sequence Assembler of Nested Deletion (SAND), enabled us to assemble the data with high speed and accuracy. Details of SAND will be published elsewhere.

Application of the nested deletion strategy for large-scale sequencing

We applied our nested deletion method for sequencing ~300 kb of the human amyloid precursor protein (APP) locus mapped on chromosome band 21q22.1. All the exons and their flanking regions of the APP gene were previously isolated and partly sequenced in our laboratory (12 ). For this study, we isolated five P1 clones covering the entire APP gene (Fig. 4 ) from a P1 library specific for human chromosome 21q (13 ).

The overall procedure for sequencing of the P1 clones consisted of: (i) subcloning of restriction fragments of the P1 clone into the pSFI-CV; (ii) construction of the restriction map of the P1 clone; (iii) construction of nested deletions from selected subclones; (iv) sequencing and data assembly of the nested deletions; and (v) assembly of the consensus sequence data of each subclone to generate the final contiguous sequence data.

We first subcloned the restriction fragments from each P1 clone into pSFI-CV. We used BamHI, XbaI, AvrII and BglII, because BamHI, XbaI and AvrII sites do not appear and a BglII site appears only once in the P1 vector (pAd10SacIIB, ref. 14). The restriction subclones were randomly isolated from a P1 clone and briefly characterized by HaeIII fingerprinting to choose independent clones. Among the 240 subclones, ~80 independent clones ranging from 0.3 to 16 kb in size were chosen. Both-end sequencing of the independent clones followed by data assembly was performed to obtain a partial map. This process usually produced three to five contiguous regions of 10-40 kb. The gaps between the contigs were then filled by re-screening the remaining subclones by PCR using primers designed from the sequence data at the edges of each contig. This procedure allowed us to rapidly construct the restriction map and the minimum tiling path of the fragments over the five contiguous P1 clones. Finally, we selected 72 restriction fragments from the five P1s. Nested deletion libraries were prepared for 69 restriction fragments and ~1600 nested deletions were subjected to sequencing analysis. The remaining three fragments of <1.5 kb were sequenced by the primer-walk method using dye-terminator chemistry. We finally obtained 301 692 bp of consensus sequence (DDBJ accession number D87675) from the five P1 clones. The results revealed the complete structure of the APP gene spanning 286 722 bp (Fig. 4). Detailed analysis of the sequence data will be reported elsewhere. Table 2 summarizes the efficiency of our method for sequencing of the APP locus. It should be noted that the final data were obtained with a redundancy of 3.8, about half of that required by the shot-gun strategy, which typically requires a redundancy of 6-8 (15). The redundancy of the primer-walk strategy would be estimated at ~2.5, assuming that each reaction produces 500 bases of data and that adjacent reads overlap by 100 bases. However, this value is an underestimate for sequencing of the human genome, because the presence of highly repetitive sequences such as the Alu family makes it difficult to continue the walk. It should also be noted that only six gaps had to be closed for the 300 kb sequencing. These gaps were due to the failure of sequencing of PCR-amplified templates containing poly-purine, poly-pyrimidine or poly purine-pyrimidine sequences and were successfully closed by re-sequencing the original subclones as templates by dye-terminator chemistry using specific primers. About 96% double-stranded coverage was obtained at the initial step, and the remaining 4% (29 regions) of single-stranded coverage was caused by nested deletions missing from either strand. These regions were also covered by sequencing the original subclones by the primer-walk method using dye-terminator chemistry. In total, 44 synthetic primers were required to complete the double-stranded coverage of the 300 kb.
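The primer-walk estimate of ~2.5 follows from simple arithmetic if both strands are sequenced: each 500-base read contributes 400 new bases, and the region is covered twice. The sketch below makes that reading explicit; the function name and the two-strand assumption are ours, not stated in the text.

```python
def directed_redundancy(read_len, overlap, strands=2):
    """Redundancy of a directed strategy with fixed reads and fixed overlap.

    Each read advances the walk by (read_len - overlap) new bases, so one
    strand costs read_len / (read_len - overlap) bases read per base of
    consensus; sequencing both strands doubles that.
    """
    if overlap >= read_len:
        raise ValueError("overlap must be smaller than the read length")
    return strands * read_len / (read_len - overlap)


if __name__ == "__main__":
    primer_walk = directed_redundancy(read_len=500, overlap=100)
    print(f"primer-walk redundancy: {primer_walk}")
    # Ratio of the observed 3.8 to the shot-gun range of 6-8 quoted above.
    print(f"3.8 / 8 = {3.8 / 8:.2f}, 3.8 / 6 = {3.8 / 6:.2f}")
```

The same formula shows why the value is a lower bound in practice: any failed walk step adds reads without adding consensus bases.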


Figure 4. P1 clones covering the human APP locus and exon-intron organization of the APP gene based on the sequence data obtained in this study. The five overlapping P1 clones (S-459, T-1559, S-491, T-1715 and T-364) were used as starting materials for sequencing. The positions of the exons (1-18) are shown by vertical bars. The numbers between the exons indicate the sizes of the introns in bp. The sequence data revealed that the five P1 clones covered 301 692 bp. The first exon starts at position 9001 and the last exon (exon 18) ends at position 295 722, so that the APP gene is 286 722 bp in length. The accession number of the entire sequence of the APP locus is D87675.

DISCUSSION

In the present study, we developed a novel method for making nested deletions and demonstrated that it is applicable to large-scale sequencing. The method consists of simple steps including digestion with SfiI, ligation and sonication. It has not yet been automated, but it is simple enough for one technician to obtain the nested deletion libraries for >100 kb of final sequence data in a few days. The use of asymmetric 3'-overhang ends of SfiI cleavage sites in the vector provides high efficiency and selectivity in the ligation steps. Indeed, the nested deletion libraries prepared by this method contained almost no vector-derived fragments (see Table 1). The method is applicable to all clonable DNA, although some modifications may be required for DNA fragments having SfiI sites with the same 3'-overhang ends as those of the pSFI vectors. On average, a SfiI site appears every 200 kb in the human genome and a SfiI site with the same 3'-overhang end every 13 000 kb or less often; thus the frequency of such SfiI sites is low enough for practical use.

To apply this method for sequencing of the large insert clones such as P1 and BAC clones, subcloning and mapping of the restriction fragments are required. In the case of P1 clones, this step could be simply performed by one-pass end-sequencing of the restriction fragments followed by the data assembly on the computer, and provided several advantages for practical use. It made the final data assembly easy. It was also useful to find the same fragments present in the overlapped regions between the P1 clones to avoid unnecessary duplication of sequencing. Also, the use of restriction enzyme allowed us to use P1 DNA purified simply by polyethylene-glycol precipitation in the subcloning step. No ultra-centrifugation step was required.

The most labor-intensive and time-consuming step of the nested deletion-based strategy is the isolation and ordering of nested deletions. For example, we isolated and electrophoresed ~8000 clones for selecting 1600 nested deletions. However, the employment of the direct amplification of insert DNA by long PCR (16 ) enabled us to rapidly prepare templates of various lengths. The templates prepared in this manner gave sequence data of high quality sufficient for the data assembly. For ordering the clones, we developed a new instrument for automated size-measurement and ordering of the clones, which allowed us to handle >2000 nested deletion clones in a day per person. A software program was also developed for automated selection of nested deletions with appropriate deletion interval.

Table 2. Summary of sequencing of APP locus using the nested deletion method

Description                                       Number
P1 clones sequenced                               5 clones
Independent restriction fragments                 368 fragments
Restriction fragments sequenced                   72 fragments
Restriction fragments used for ND construction    69 fragments
Restriction fragments used for primer-walk        3 fragments
ND clones isolated                                8120 clones
Total sequencing reactions                        2370 reactions
End-sequencing of restriction fragments           736 reactions
Sequencing of small fragments by primer-walk      8 reactions
Sequencing of ND clones                           1582 reactions
Primer-walk for gap closure                       12 reactions
Primer-walk for double-stranded coverage          32 reactions
Average edited read base                          486 bases
Total read base                                   1 151 820 bases
Consensus sequence                                301 692 bases
Redundancy                                        3.8

ND, nested deletions.
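The figures in Table 2 are internally consistent, which can be verified with a few lines of arithmetic. The sketch below only recomputes the totals from the table entries; the dictionary keys are hypothetical labels chosen for readability.

```python
# Reaction counts by category, taken from Table 2.
REACTIONS = {
    "end_sequencing_of_restriction_fragments": 736,
    "small_fragments_by_primer_walk": 8,
    "nd_clones": 1582,
    "primer_walk_gap_closure": 12,
    "primer_walk_double_stranded_coverage": 32,
}
AVERAGE_EDITED_READ = 486      # bases per reaction
CONSENSUS_LENGTH = 301_692     # bases


def summarize(reactions, read_len, consensus_len):
    """Return (total reactions, total bases read, redundancy)."""
    total_reactions = sum(reactions.values())
    total_bases = total_reactions * read_len
    redundancy = total_bases / consensus_len
    return total_reactions, total_bases, redundancy


if __name__ == "__main__":
    n, bases, redundancy = summarize(
        REACTIONS, AVERAGE_EDITED_READ, CONSENSUS_LENGTH
    )
    print(n, bases, round(redundancy, 2))
```

The five categories sum to the 2370 total reactions, 2370 reads of 486 bases give the 1 151 820 total read bases, and dividing by the consensus length reproduces the redundancy of 3.8.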

In contrast to the relatively complicated front-end steps, the steps for sequencing and data assembly are easy and simple. To obtain the final 300 kb of contiguous data for the APP locus, only ~2400 sequencing reactions in total were carried out, including end-sequencing of the subclones and gap-closure sequencing. The data were quickly assembled in a simple and reliable manner. The results were easily checked by comparing the clone alignment in the assembled data with that obtained from size measurement. It should be emphasized that in our method the direction of each sequence read is known prior to data assembly, so that the data for each strand are assembled first and the two strands are then compared to obtain the final consensus data. This two-step assembly procedure eliminates the unnecessary process of matching each sample against complementary sequence data, which accelerates the data assembly step and makes it easier to resolve ambiguities.

The accuracy of the data obtained in this study is estimated to be >99.99%, as discussed below. We compared raw data from 1000 samples and found that the frequency of ambiguity in our data was 0.25% in bases 1-400 and ~10% in bases 401-500. The data assembly for one strand was carried out using up to 500 bases of raw data and the ambiguities were resolved by eye inspection. Since we used data at an interval of ~300 bases, 400 of the 500 bases of raw data overlap with neighboring reads, and the ambiguity in the overlapped region is estimated at 0.005% (0.25% × 2% = 0.005%). The ambiguity of the remaining 100 bases of non-overlapped region is 0.25%, as mentioned above. Therefore, the assembled data for one strand should have an accuracy of at least 99.75%. However, for practical purposes, the ambiguities were finally resolved based on the data from double-stranded coverage over the entire region. Therefore, the error rate should be much less than 0.01% (0.25% × 0.25% = 0.000625%). In addition, the final consensus data were obtained with a redundancy of 3.8. Thus, the accuracy should be even higher than the above value. In fact, comparison of the present data with the 3.6 kb APP cDNA sequence showed no inconsistency between them.
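The accuracy estimate above multiplies independent ambiguity rates. The sketch below reproduces the quoted products; the origin of the 2% factor is not spelled out in the text, and the weighted average computed here (about 2.2% over a 500-base read) is only one plausible reading of it. Variable and function names are hypothetical.

```python
AMBIGUITY_1_400 = 0.0025    # 0.25% in bases 1-400
AMBIGUITY_401_500 = 0.10    # ~10% in bases 401-500


def mean_read_ambiguity(good=AMBIGUITY_1_400, tail=AMBIGUITY_401_500):
    """Average ambiguity over a 500-base read (assumed source of the ~2%)."""
    return (400 * good + 100 * tail) / 500


def joint_ambiguity(rate_a, rate_b):
    """Chance that two independent reads are both ambiguous at a base."""
    return rate_a * rate_b


if __name__ == "__main__":
    print(f"mean ambiguity per read:    {mean_read_ambiguity():.3%}")
    overlap = joint_ambiguity(AMBIGUITY_1_400, 0.02)
    print(f"overlapped region (one strand): {overlap:.4%}")
    both = joint_ambiguity(AMBIGUITY_1_400, AMBIGUITY_1_400)
    print(f"both strands:               {both:.6%}")
```

Both products assume that ambiguities in different reads are independent; sequence-dependent failures such as the poly-purine tracts mentioned in Results would violate that assumption locally.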

In conclusion, the nested deletion method developed in this study provided a novel strategy for large-scale sequencing characterized by low redundancy, little gap closure effort and rapid data assembly. The procedure can be employed not only for P1 but also for BAC (17 ) and PAC (18 ) systems.

ACKNOWLEDGEMENTS

This work was supported in part by Grant-in-Aid for a Creative Basic Research (Human Genome Program) from the Ministry of Education, Science, Sports and Culture, and a Grant of a Special Coordination Funds for Promotion of Science and Technology from the Science and Technology Agency (STA) of Japan.

REFERENCES

1 Gibbs, R. A. (1995) Nature Genet., 11, 121-125.

2 Olson, M. V. (1995) Science, 270, 394-396.

3 Hunkapiller, T., Kaiser, R. J., Koop, B. F. and Hood, L. (1991) Science, 254, 59-67.

4 Hodgkin, J., Plasterk, R. H. and Waterston, R. H. (1995) Science, 270, 410-414.

5 Fleischmann, R. D. et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496.

6 Studier, F. W. (1989) Proc. Natl. Acad. Sci. USA, 86, 6917-6921.

7 Kotler, L. E., Zevin-Sonkin, D., Sobolev, I. A., Beskin, A. D. and Ulanovsky, L. E. (1993) Proc. Natl. Acad. Sci. USA, 90, 4241-4245.

8 Strathmann, M., Hamilton, B. A., Mayeda, C. A., Simon, M. I., Meyerowitz, E. M. and Palazzolo, M. J. (1991) Proc. Natl. Acad. Sci. USA, 88, 1247-1250.

9 Martin, C. H., Mayeda, C. A., Davis, C. A., Ericsson, C. L., Knafels, J. D., Mathog, D. R., Celniker, S. E., Lewis, E. B. and Palazzolo, M. J. (1995) Proc. Natl. Acad. Sci. USA, 92, 8398-8402.

10 Kusuda, J., Kameoka, Y., Takahashi, I., Fujiwara, H. and Hashimoto, K. (1989) Nucleic Acids Res., 17, 8890.

11 Maniatis, T., Fritsch, E. F. and Sambrook, J. (1989) Molecular cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

12 Yoshikai, S., Sasaki, H., Doh-ura, K., Furuya, H. and Sakaki, Y. (1990) Gene, 87, 257-263.

13 Tanahashi, H., Ito, T., Hattori, M., Ohira, M., Ohki, M., Tashiro, K. and Sakaki, Y. (1994) DNA Res., 1, 85-89.

14 Pierce, J. C., Sauer, B. and Sternberg, N. (1992) Proc. Natl. Acad. Sci. USA, 89, 2056-2060.

15 Hodgkin, J., Plasterk, R. H. A. and Waterston, R. H. (1995) Science, 270, 410-414.

16 Barnes, W. M. (1994) Proc. Natl. Acad. Sci. USA, 91, 2216-2220.

17 Shizuya, H., Birren, B., Kim, U. J., Mancino, V., Slepak, T., Tachiiri, Y. and Simon, M. (1992) Proc. Natl. Acad. Sci. USA, 89, 8794-8797.

18 Ioannou, P. A., Amemiya, C. T., Garnes, J., Kroisel, P. M., Shizuya, H., Chen, C., Batzer, M. A. and De Jong, P. J. (1994) Nature Genet., 6, 84-89.



*To whom correspondence should be addressed. Tel: +81 3 5449 5623; Fax: +81 3 5449 5445; Email: hattori@hgc.ims.u-tokyo.ac.jp


