ABSTRACT
To facilitate the scanning of large genomic regions for the presence of exonic
gene segments we have constructed a cosmid-based exon trap vector. The vector serves a dual purpose since it is also
suitable for contig construction and physical mapping. The exon trap cassette
of vector sCOGH1 consists of the human growth hormone gene driven by the mouse
metallothionein-1 promoter. Inserts are cloned in the multicloning site located in intron
2 of the hGH gene. The efficiency of the system is demonstrated with cosmids
containing multiple exons of the Duchenne Muscular Dystrophy gene. All exons
present in the inserts were successfully retrieved and no cryptic products were
detected. Up to seven exons were isolated simultaneously in a single spliced
product. The system has been extended by a transcription-translation-test protocol to determine the presence of large open reading
frames in the trapped products, using a combination of tailed PCR primers
directing protein synthesis in three different reading frames, followed by
in vitro transcription-translation. Having larger stretches of coding sequence in a single exon
trap product rather than small single exons greatly facilitates further
analysis of potential genes and offers new possibilities for direct mutation
analysis of exon trap material.
Exon trapping has become a widely used method which is generally acknowledged as
a versatile tissue-independent approach to detect genes in cloned DNA. In contrast to RNA-based methods, such as cDNA selection and direct screening of cDNA
libraries, exon trapping is independent of tissue-specific gene expression. It uses cloned DNA directly to select sequences
surrounded by functional splice sites (1-3). The original exon trapping protocols have been improved with respect to speed and
efficiency, and the background of cryptically spliced products and
products arising from vector-vector splicing has been reduced (4). However, some of the limitations of the original systems have remained
unaddressed.
A major limitation of current systems is the need for subcloning of the region
of interest in a vector with a capacity for inserts typically measuring 1-2 kb. This has several consequences: (i) due to the small insert size
after subcloning, multiple exons will only rarely be present in one insert,
resulting in exon trap clones containing only a single exon. Consequently, many
of the exon trap probes derived are small (~80-150 bp) and frequently give poor signals or a low signal-to-noise ratio in subsequent experiments, e.g. the
screening of cDNA-libraries or probing of Northern blots. Furthermore, since the
individually trapped exons require the use of cDNA libraries in the next step
to further define the gene, the initial advantage of working with an expression
independent system is to a large extent lost in the subsequent step. (ii) Due
to subcloning into plasmid-based exon trap vectors the gene(s) present are scattered into many
separate, disconnected pieces. Any exons thus obtained have to be aligned to
reconstruct their original order. Reconstruction of the gene from individually
trapped exons requires a significant amount of time and effort and implies a
major loss of information originally contained within the input material prior
to subcloning. (iii) Subcloning disrupts the genomic context around the exons.
Cloning of regions which are never transcribed or of intronic sequences without their naturally flanking
exons often results in activation of cryptic splice sites, leading to
recognition of false exons and a background of false positives. On the other hand, genuine exons may be missed due to poor
recognition by the host system or due to unfavourable factors resulting from
the cloning (e.g. spacing of restriction sites). (iv) Current exon trapping
systems can only be used in combination with specific cell lines (e.g. COS
cells), since they require a system of replication in the host cell, commonly
based on the SV40 origin of replication (2). It is conceivable that some exons of genes with a highly tissue-specific expression pattern will not be included in the mature transcripts
generated in a completely different cell type (5).
Although the recently described 3' exon trapping (6,7) has some advantages, in that it allows larger exons to be trapped, specifically
identifies the end of a gene, and selects exons based on two independent
criteria (i.e. splicing and polyadenylation), it does not address the
other limitations of small-insert exon trapping.
We have designed a large-insert exon trapping vector capable of scanning 25-40 kb genomic regions for exons. The vector has a dual use: as a
cosmid vector for contig construction and physical mapping, and as an exon trap
vector for isolation of coding sequences. In the vector, inserts are cloned
into intron 2 of the human growth hormone gene (hGH) and transcription is
driven by a mouse metallothionein-1 promoter (mMT-1). This is a strong, ubiquitously expressed promoter which allows
many different cell types to be used, thus obviating the restriction to COS
cells applying to the SV40-based systems used so far. During exon trapping the genomic context is
maintained over the entire 25-40 kb region, reducing the false positive rate while yielding processed
transcripts with multiple exons spliced together in the correct order. The
efficiency of the system is demonstrated using cosmids containing up to seven
exons of the Duchenne muscular dystrophy gene (DMD). We believe that the system
should greatly increase the speed and reliability of gene isolation by exon
trapping by offering a solution for most major limitations of current exon
trapping systems.
sCOGH1, schematically drawn in Figure 1, was constructed as follows: cosmid vector sCos1 (8) was digested with EcoRI and the 7.9 kb vector fragment was separated from the EcoRI linker by agarose gel electrophoresis and elution. Similarly, plasmid pXGH5 (9) was digested with EcoRI and the 4 kb fragment containing the mouse metallothionein-1 promoter (mMT-1) and the human growth hormone gene (hGH) was isolated by gel purification. Both fragments were combined by ligation,
resulting in the isolation of sCOGH0a and sCOGH0b, differing in the orientation of the mMT-1/hGH insert in sCos1. Subsequently a linker composed of two complementary oligonucleotides (5'-AGCGGCCGCGAATTCGGATCCGGCGGCCGC-3' and 5'-CTGCGGCCGCCGGATCCGAATTCGCGGCCG-3') was synthesized
containing NotI, BamHI and EcoRI sites as well as AccI sticky ends, and introduced into intron 2 of the hGH gene by digestion of
sCOGH0b with AccI and ligation. The resulting vector was designated sCOGH1.
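The linker design can be checked computationally. The following is a minimal sketch, not part of the original protocol: the two oligonucleotide sequences are taken from the text, the recognition sequences are the standard ones for NotI, EcoRI and BamHI, and the helper names (`revcomp`, `find_sites`, `five_prime_overhangs`) are hypothetical.

```python
# Sketch: locate restriction sites in the sCOGH1 linker and inspect its ends.
# Oligo sequences come from the text; function names are illustrative only.

SITES = {"NotI": "GCGGCCGC", "EcoRI": "GAATTC", "BamHI": "GGATCC"}

TOP = "AGCGGCCGCGAATTCGGATCCGGCGGCCGC"
BOTTOM = "CTGCGGCCGCCGGATCCGAATTCGCGGCCG"


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq):
    """Map each enzyme name to the 0-based start positions of its site."""
    hits = {}
    for name, site in SITES.items():
        positions = []
        start = seq.find(site)
        while start != -1:
            positions.append(start)
            start = seq.find(site, start + 1)
        hits[name] = positions
    return hits


def five_prime_overhangs(top, bottom, overhang=2):
    """Return the two 5' overhangs if the oligos pair everywhere except
    the first `overhang` nucleotides of each strand; otherwise None."""
    if top[overhang:] != revcomp(bottom)[: len(bottom) - overhang]:
        return None
    return top[:overhang], bottom[:overhang]


if __name__ == "__main__":
    print(find_sites(TOP))
    print(five_prime_overhangs(TOP, BOTTOM))
```

Under these assumptions the annealed linker is a 28 bp duplex carrying a NotI site at each end of an EcoRI-BamHI core, with a 2 nt 5' overhang on each strand, which is consistent with the sticky ends described above.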
All sCOGH-derivatives were propagated in E. coli strain HB10B (kindly provided by Pieter de Jong). For cosmid cloning, vector
DNA was linearised with XbaI, dephosphorylated and subsequently digested with BamHI. Agarose plugs containing genomic yeast DNA and YAC DNA of yDMD(0-25)C, containing the human DMD gene from 100 kb upstream of the brain exon 1 to 100 kb downstream
of exon 79 (10), were partially digested with MboI, size fractionated and ligated into the BamHI site of sCOGH1. The ligated material was packaged using Gigapack II Plus
Packaging Extract (Stratagene) and used to infect E. coli 1046. Cosmids containing specific regions of the DMD gene were isolated and analysed using standard protocols (11) by hybridisation with specific DMD cDNA sequences (10). The exon content was established by PCR with exon primers and by
hybridisation of the HindIII-digested cosmids with the DMD cDNA. The inserts of screened cosmids were
reversed by NotI excision of the insert, religation and transformation into E. coli.
The orientation of the insert was determined by restriction digestion.
Initially COS-1 cells were used for transfection experiments. We found, however, that
exon trapping results improved markedly with hamster V79 cells, which gave higher yields
of full-length PCR fragments. Later experiments were therefore
performed with this cell line. We attribute the improvement to a lower degree of
homology between the hGH primers and the endogenous hamster growth hormone gene compared to the
corresponding sequences in COS-1 cells. The cells were cultured in DMEM with 10% inactivated fetal calf
serum (Gibco-BRL).
Cosmid DNA was introduced by electroporation: actively growing cells were
collected by centrifugation, washed in cold PBS (without bivalent cations) and
resuspended in cold PBS at a density of 2 × 10^7 cells/ml. Cell suspension (0.5 ml) was added to 20 µl of PBS containing 10 µg cesium chloride-purified cosmid DNA and placed in a pre-chilled electroporation cuvette (0.4 cm chamber, BioRad).
After 5 min on ice, the cells were electroporated in a BioRad Gene Pulser [300
V (750 V/cm); 960 µF], and placed on ice again. After 5 min the cells were transferred gently
to a 100 mm tissue culture dish containing 10 ml of pre-warmed, equilibrated DMEM + 10% FCS. Transfection efficiency was monitored
by assaying the hGH concentration in 100 µl of the culture medium of cells transfected in parallel with pXGH5 using
the Allégro hGH Transient Gene Assay kit (Nichols Institute, San Juan
Capistrano, USA) (9).
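The electroporation settings quoted above follow from simple arithmetic. The sketch below is illustrative only (the function names are hypothetical); it recomputes the field strength from the voltage and cuvette gap, and the number of cells and amount of DNA per cuvette from the stated density and volumes.

```python
# Sketch: recompute the electroporation parameters quoted in the text.
# All input values are from the protocol; function names are hypothetical.


def field_strength_v_per_cm(voltage_v, gap_cm):
    """Nominal field strength across the cuvette gap."""
    return voltage_v / gap_cm


def cells_per_cuvette(density_per_ml, volume_ml):
    """Number of cells loaded into one cuvette."""
    return density_per_ml * volume_ml


def dna_ug_per_million_cells(dna_ug, n_cells):
    """DNA dose normalised to one million cells."""
    return dna_ug / (n_cells / 1e6)


if __name__ == "__main__":
    field = field_strength_v_per_cm(300, 0.4)   # 300 V across a 0.4 cm gap
    cells = cells_per_cuvette(2e7, 0.5)         # 0.5 ml at 2 x 10^7 cells/ml
    dose = dna_ug_per_million_cells(10, cells)  # 10 ug cosmid DNA per cuvette
    print(f"{field:.0f} V/cm, {cells:.0e} cells, {dose:.1f} ug DNA per 10^6 cells")
```

The 750 V/cm given in the text is therefore the nominal 300 V divided by the 0.4 cm gap.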
48-72 h after transfection, the cells were harvested and total RNA was
isolated using RNazolB (CINNA/BIOTECX). First-strand cDNA synthesis was performed by adding 50 pmol of primer hGHf to 2 µg total RNA in a volume of 16 µl. The mixture was incubated at 65°C for 10 min and chilled on ice. 14 µl of a mix containing 3 µl 0.1 M DTT, 3 µl 10 mM dNTPs, 0.5 µl RNasin (40 U/µl; Promega), 6 µl 5× RT buffer (250 mM Tris-HCl pH 8.3, 375 mM KCl,
15 mM MgCl2; Gibco-BRL) and 150 U SuperScript Reverse Transcriptase (Gibco-BRL) was added to a final volume of 30 µl, and the reaction was incubated at 42°C for 1 h. Subsequently, the solution was heated to 95°C for 5 min and chilled on ice. RNase H (2.25 U;
Promega) was added and the solution was incubated at 37°C for 20 min. An aliquot of the solution (10 µl) was used in a PCR reaction containing 12.5 pmol of primer hGHe, 50
mM KCl, 1.5 mM MgCl2, 10 mM Tris-HCl pH 8.0, 0.2 mM dNTPs, 0.2 mg/ml BSA and 0.25 U SuperTaq (HT
Biotechnology Ltd) in a reaction volume of 50 µl. Cycling consisted of an initial denaturation step of 5 min at 94°C, 30 cycles of amplification (1 min at 94°C, 1 min at 60°C and 2 min at 72°C) and a final extension of 10 min at 72°C. No additional hGHf primer was added in the PCR
reaction. Nested PCR, using either internal hGH primers or combinations of a
hGH primer and a DMD primer, was performed on 1 µl of the primary PCR material with 12.5 pmol of each primer and PCR
conditions identical to the first PCR. The internal hGH primers used were hGHa
and hGHb. When RNA-PCR products were used for in vitro transcription-translation, primer hGHa was replaced by hGHORF1, hGHORF2 or hGHORF3.
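The final concentrations in the reverse transcription reaction, and the amount of hGHf primer carried into the first PCR, can be derived from the volumes given above. This is a minimal sketch using only the stated stock concentrations and volumes; the function names are hypothetical, and the link between the carry-over and the omission of fresh hGHf primer is our reading of the protocol rather than something the text states.

```python
# Sketch: dilution arithmetic for the RT reaction and primer carry-over.
# Stock concentrations and volumes are from the protocol; names are illustrative.

RT_FINAL_UL = 30.0


def final_conc(stock_conc, stock_vol_ul, final_vol_ul=RT_FINAL_UL):
    """Concentration after diluting stock_vol_ul of stock into final_vol_ul."""
    return stock_conc * stock_vol_ul / final_vol_ul


def carried_over(amount, aliquot_ul, source_vol_ul=RT_FINAL_UL):
    """Amount of a component transferred with an aliquot of the RT reaction."""
    return amount * aliquot_ul / source_vol_ul


if __name__ == "__main__":
    rt_mix = {
        "DTT (mM)": final_conc(100.0, 3.0),      # 3 ul of 0.1 M DTT
        "dNTPs (mM)": final_conc(10.0, 3.0),     # 3 ul of 10 mM dNTPs
        "Tris-HCl (mM)": final_conc(250.0, 6.0), # 6 ul of 5x RT buffer
        "KCl (mM)": final_conc(375.0, 6.0),
        "MgCl2 (mM)": final_conc(15.0, 6.0),
    }
    for name, value in rt_mix.items():
        print(f"{name}: {value:g}")
    # 50 pmol hGHf in 30 ul; 10 ul is transferred to the 50 ul PCR.
    print(f"hGHf carried into PCR: {carried_over(50.0, 10.0):.1f} pmol")
```

With these inputs the 5× buffer is diluted to 1× (50 mM Tris-HCl, 75 mM KCl, 3 mM MgCl2), and roughly 17 pmol of hGHf enters the PCR with the 10 µl aliquot, which is of the same order as the 12.5 pmol of hGHe added.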
Direct sequencing of PCR products was performed using the Sequenase™
PCR Product Sequencing kit (USB).
hGHa: 5'-CGGGATCCTAATACGACTCACTATAGGCGTCTGCACCAGCTGGCCTTTGAC-3'
hGHb: 5'-CGGGATCCCGTCTAGAGGGTTCTGCAGGAATGAATACTT-3'
hGHe: 5'-ACGCTATGCTCCGCGCCCATCGT-3'
hGHf: 5'-ACAGAGGGAGGTCTGGGGGTTCT-3'
D69F1: 5'-GCCATAAAAATGCACTATCCA-3'
D72F1: 5'-CCTCAGCTTTCACACGATGA-3'
D72R1: 5'-TCATCGTGTGAAAGCTGAGG-3'
D73R1: 5'-ATCCATTGCTGTTTTCCATTTC-3'
D74R1: 5'-GCAGGACTACGAGGCTGG-3'
polyT-REP: 5'-GGATCCGTCGACATCGATGAATTC(T)25-3'
hGHORF1: 5'-CGGGATCCTAATACGACTCACTATAGGACGACCACC
hGHORF2: 5'-CGGGATCCTAATACGACTCACTATAGGACAGACCACC
hGHORF3: 5'-CGGGATCCTAATACGACTCACTATAGGACAGACCACC
hGHUTR1: 5'-CAGGAGAGGCACTGGGGA-3'
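A few properties of the primer list can be verified directly from the sequences. The sketch below is illustrative: the primer sequences are copied from the list above, the T7 promoter core and BamHI site are the standard sequences, and the helper names are hypothetical. Only primers given in full are included.

```python
# Sketch: sanity checks on selected primers from the list above.
# Helper names are hypothetical; only complete primer sequences are used.

PRIMERS = {
    "hGHa": "CGGGATCCTAATACGACTCACTATAGGCGTCTGCACCAGCTGGCCTTTGAC",
    "hGHb": "CGGGATCCCGTCTAGAGGGTTCTGCAGGAATGAATACTT",
    "D72F1": "CCTCAGCTTTCACACGATGA",
    "D72R1": "TCATCGTGTGAAAGCTGAGG",
}

T7_CORE = "TAATACGACTCACTATAG"  # core of the T7 promoter
BAMHI = "GGATCC"


def revcomp(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def has_motif(primer_name, motif):
    """True if the named primer contains the motif on its own strand."""
    return motif in PRIMERS[primer_name]


if __name__ == "__main__":
    for name in ("hGHa", "hGHb"):
        print(name, "T7:", has_motif(name, T7_CORE), "BamHI:", has_motif(name, BAMHI))
    print("D72F1/D72R1 reverse complements:",
          revcomp(PRIMERS["D72F1"]) == PRIMERS["D72R1"])
```

Checks of this kind show, for example, that D72F1 and D72R1 anneal to the same 20 nt position on opposite strands, so one of them can be paired with a vector primer on each side of the same DMD exon.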
hGH primers were designed from the sequence M13438 and DMD primers from the
sequence M18533 (EMBL sequence database). cDNA probe 63-1/3 is a subclone of the DMD cDNA and was used to screen for cosmids
containing specific regions of the DMD gene. Probe 63-1/3 contains exons 65-74. Probe P20 contains exon 45 and part of intron 44 of the DMD
gene (12).
Modified primers, containing a T7 promoter and a eukaryotic translation
initiation sequence, were used to generate PCR products suitable for
in vitro transcription-translation. T7-PCR product (200-400 ng) was added to the TnT/T7 coupled reticulocyte lysate
system (Promega). The synthesized protein products were separated on a 15% SDS-polyacrylamide minigel system. For fluorography, the
gels were washed in DMSO/PPO. Dried gels were exposed for 16-40 h for autoradiography.
Vector sCOGH1 contains all the essential elements of a cosmid vector, i.e.
an origin of replication, antibiotic resistance markers (ampicillin and neomycin)
and two cos sites (Fig. 1). In addition it contains an exon trap cassette consisting of the mMT-1 promoter driving expression of the hGH gene, with a multicloning
site (MCS) located between exons 2 and 3 (see Materials and Methods for details
of vector construction). The ubiquitous mMT-1 promoter allows the use of many cell types. The vector is constructed such that the inserts can easily be excised and religated
to obtain the opposite transcriptional orientation.
Cosmids are introduced into the cell type of choice by electroporation. We have
tested and compared various cell lines and found V79 Chinese hamster lung cells
to be a very efficient general host cell type. Upon expression, the hGH-initiated transcript will incorporate putative exons from the insert,
cloned between exons 2 and 3 of the hGH gene, thus giving a chimeric product.
After processing of the primary transcript, the putative RNA containing the
exons to be trapped is amplified by RT-PCR using flanking vector-derived primers (Fig. 2A). In the specific event of 5' and 3' ends of genes being present in the insert, these will be skipped
by the processing system or lead to alternatively initiated or terminated
transcripts. They can be detected in the same mixture by 5' or 3' RACE, using opposite vector-derived primers separately (13). Gene inserts cloned in an antisense orientation will not be trapped,
resulting in amplification of hGH sequences only (Fig. 2B).
To demonstrate the ability of the present method to isolate exonic gene segments
from mammalian DNA, we subcloned YAC yDMD(0-25)C, known to contain the human DMD gene (10). Two cosmids were isolated and used for exon trapping: cDMD2 and cDMD3. cDMD2
contains exons 72-76 and cDMD3 exons 68-74 (Fig. 3). RNA was isolated from V79 cells transfected with the cosmids and vector-derived transcripts were amplified by RT-PCR using either two vector-derived primers (hGHa and hGHb; Fig. 3, lane A) or a combination of a vector-derived and DMD exon primers (Fig. 3, lanes B and C). In all cases, RNA-PCR analysis yielded products containing the expected exonic DMD segments.
Exon trapping of cDMD2r, containing the exonic DMD segments in the antisense
orientation, gave no insert-derived products. The only product amplified was the 132 bp empty hGH exon
2/exon 3 product (Fig. 4, lane 1). This shows that, using this system, an entire 30 kb insert can be scanned with a false-positive rate that is effectively zero. Exon trapping
of cDMD3r resulted in a PCR product of ~0.25 kb (Fig. 4, lane 4), either corresponding to a cryptic product or to an unknown exon
derived from the antisense strand. Hybridisation of this product to a HindIII digest of cDMD3 showed that it mapped to the cosmid and was spliced. The
339 bp product visible in lanes 2 (cDMD2r) and 3 (cDMD3) represents unspliced
hamster growth hormone and results from trace contamination of the RNA preparation
with V79 genomic DNA.
The 0.73 kb RT-PCR product of cDMD3 (Fig. 3B, lane B) was reamplified, replacing the hGH exon 2 forward primer with three
different primers, hGHORF1-3, containing a T7 promoter, a translation initiation sequence and either
no, one or two additional nucleotides inserted between the ATG translation
initiation codon and the hGH sequence. The resulting RT-PCR products, each introducing a different reading frame, were used
in an in vitro transcription-translation assay to scan for the presence of an open reading frame
(ORF). As a control, a 0.6 kb PCR product of the hGH gene was synthesized in
the three reading frames, using the same forward primers in combination with
primer hGHUTR1, located in the 3'-UTR of the hGH gene. The control hGH product synthesized using
primer hGHORF1 (Fig. 5) was predicted to contain an ORF of 172 amino acids and yielded the expected
peptide of ~20 kDa, while no product was obtained in the two other reading frames.
Similarly, in vitro transcription-translation of the cDMD3-derived hGHORF1 RT-PCR product yielded a peptide slightly over 30 kDa, as
expected for the 230 amino acid ORF (Fig. 5). This system is based on our earlier published `protein truncation test' (PTT)
system for the detection of open reading frames by in vitro transcription-translation (14).
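The reading-frame logic behind the three hGHORF primers can be illustrated with a small sketch. It assumes a made-up 18 nt downstream sequence and made-up padding nucleotides ("A" and "AA"); neither is taken from the actual primers or from the hGH sequence, and the function names are hypothetical. Inserting zero, one or two nucleotides after the ATG shifts which frame of the downstream sequence is translated, so only the correct frame yields a long product.

```python
# Sketch: effect of 0, 1 or 2 extra nucleotides after the ATG on translation.
# BODY and the padding nucleotides are invented for illustration only.

from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons enumerated in TCAG order.
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}

BODY = "CTGACTGATGCAGGCAAA"  # hypothetical downstream sequence


def translate_from_start(seq):
    """Translate from position 0 until a stop codon or the end of the sequence."""
    peptide = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        peptide.append(amino_acid)
    return "".join(peptide)


def frame_scan(body, pads=("", "A", "AA")):
    """Translate ATG + pad + body for each pad, one pad per reading frame."""
    return {pad: translate_from_start("ATG" + pad + body) for pad in pads}


if __name__ == "__main__":
    for pad, peptide in frame_scan(BODY).items():
        print(f"pad={pad!r:5} peptide={peptide}")
```

In this toy example only the unpadded construct reads through the whole sequence; the two shifted frames terminate after a few codons, mirroring the observation that a product appears in one reading frame only.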
The cosmid-based exon trapping method described in this paper addresses several
limitations of currently available exon trapping methods. Using large genomic
inserts of 30 kb and larger, we isolated all exons present as a complete set,
eliminating the need for subcloning and reordering of individually isolated
exons and verification of their continuity from isolated cDNAs. If the cosmid
inserts were in the antisense orientation, either nothing or a small product
(in the case of cDMD3r) was trapped. The relevance of the latter product is still
unclear; it either contains several cryptic exons or is part of a newly
identified transcription unit. We did not trap any false exons; the false-positive rate obtained was in fact zero. Splicing was
accurate at all DMD exon-exon junctions and between hGH exon 2 and the DMD exons. In both cosmids analysed
the last DMD exon was not spliced to the splice acceptor of hGH exon 3, but to
a site directly upstream of or in the multiple cloning site. This suggests the
existence of cis-acting `higher order' effects in splicing, further underscoring the
advantage of concerted trapping of a series of unknown exons, selected during
evolution to cooperate in their parent gene. When separate exons are inserted
in an `alien' context this fine-tuning will be lost, which probably explains the differences
in trapping efficiency of different exons using current systems. Alternatively,
though the two explanations are not mutually exclusive, the selection of cryptic splice sites could be
related to the maintenance of an open reading frame, which has recently been
shown to be an important factor influencing splice site selection (15).
We have constructed several variants of sCOGH1. In sCOGH2 the Alu repeat in the 3'-UTR of hGH has been removed, facilitating the screening of human-positive
cosmids with radiolabelled human DNA after subcloning of, for example,
YACs from a mixture of YAC and total yeast DNA. In sCOGH6 a 4.7 kb fragment
containing the SV2neo selectable marker has been removed, facilitating cloning
of larger inserts. sCOGH3 differs from sCOGH1 by a deletion of the mMT-1/hGH exons 1 to 2 region (i.e. the promoter and 5' end of the gene). Due to this deletion no RNA will be produced unless an insert contains
a 5'-first exonic gene segment and a promoter which is active in the
chosen cell line. These 5'-exonic sequences can be isolated efficiently from the RNA using a 5' RACE protocol (13).
RNA production can be boosted by super-inducing the mMT-1 promoter with heavy metal ions such as Zn2+ and Cd2+ (16). Neomycin selection can be used to select transfected clones specifically, but
the system as described works so efficiently that we have never applied this
selection, and in fact removed the neo gene in the sCOGH6 vector to generate
4.7 kb more space. The vectors used do not require a specific system for
replication in the host cell and can be used in combination with any in vivo or in vitro
system able to produce correctly processed RNA. In particular, due to the use
of a strong ubiquitously expressed promoter, the necessity to use COS-1 cells for the initiation of transcription from the SV40 promoter is
eliminated. The sCOGH-system allows one to use other cell types (e.g. hamster V79), opening up
several possibilities including targeting of tissue specific genes, e.g. in
combination with sCOGH3, and functional complementation in specific cell types.
The results described in this paper deal with single cosmids containing part of
a large gene. In gene rich regions more than one gene might be present in the
cosmid insert and it is unclear what would happen in such an event. Most likely, transcripts initiated at the strong
mMT-1 promoter will overrun cloned promoters, a situation similar to known
genes with multiple promoters (17). We expect that the presence of cloned 3' exons will usually cause transcription termination. Still, examples are
known where genes have multiple 3' exons, often expressed in a tissue specific manner, indicating that
transcription can proceed and trap downstream sequences. Since with this system
two RT-PCR reactions are standard, one with hGH exon 2 and 3 primers and one 3' RACE reaction, in most cases where multiple genes are cloned one
should at least trap sequences from the most upstream gene. The identification
of all genes from gene rich regions will depend on the use of a highly
redundant cosmid contig covering the region. To scan large regions with the
sCOGH-system, one has two possibilities: perform one experiment with a mixture
of cosmids or test each cosmid in a separate experiment. The feasibility of using
complex mixtures remains to be tested. However, the situation will not differ
significantly from that using small-insert vectors, where the high complexity of the input material introduces
several technical problems. First, a large proportion of the clones will be
empty and produce a PCR-favoured small product. Secondly, a wide range of products will be trapped
with large size differences making it difficult to recognise the individual
products. Consequently, PCR conditions should be chosen carefully to allow
amplification of a wide size-range of RT-PCR products, especially for the cosmid-based system e.g. by using long-range PCR protocols. Since each cosmid contains a 25-40 kb insert, covering extensive regions with a
manageable number of clones should be possible. Therefore, we would opt to use
multiple cosmids simultaneously but in a miniaturised exon trap experiment
where the cosmids are not mixed.
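The number of cosmids involved in scanning a region can be estimated with a back-of-the-envelope calculation. The sketch below is illustrative only: the 1 Mb region size and the five-fold redundancy are assumptions chosen for the example, not values from this study, and the function name is hypothetical. Only the 25-40 kb insert size range comes from the text.

```python
# Sketch: rough estimate of how many cosmids cover a region.
# Region size and redundancy are assumed values for illustration.

import math


def cosmids_needed(region_kb, insert_kb, redundancy=1.0):
    """Clones required to cover region_kb at the given fold redundancy."""
    return math.ceil(redundancy * region_kb / insert_kb)


if __name__ == "__main__":
    region_kb = 1000  # hypothetical 1 Mb region
    for insert_kb in (25, 40):
        minimal = cosmids_needed(region_kb, insert_kb)
        redundant = cosmids_needed(region_kb, insert_kb, redundancy=5.0)
        print(f"{insert_kb} kb inserts: {minimal} clones minimum, "
              f"{redundant} at 5-fold redundancy")
```

Under these assumptions a megabase-sized region corresponds to a few dozen cosmids as a minimal tiling path and up to a few hundred with high redundancy, numbers that fit a miniaturised, multi-well format in which each cosmid is transfected separately.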
As demonstrated using RT-PCR and in vitro transcription-translation of products synthesized in all three possible reading frames derived from the hGH control and cDMD3, the exon trapping system
can be coupled with a direct transcription-translation test (TTT) to detect the presence of large ORFs in the isolated
sequences. This TTT approach provides an efficient tool to discriminate bona fide
coding sequences from false positives. At the same time, this assay facilitates
the identification of mutations by comparison of translated products derived
from different sources of input genomic DNA, e.g. normal versus patient
samples. Recently, we have shown that such a test can be performed even when
only limited parts of a newly identified coding sequence have been elucidated (18). Since the proper connection of adjacent exons provides for correct
translation, any disturbance in patient samples will become immediately
apparent and highlight the area to be sequenced. In this way we could identify
the CBP gene as the gene involved in Rubinstein-Taybi syndrome by the detection of translation-terminating mutations in some
patient-derived products.
We are grateful to Paola van der Bent-Klootwijk for technical assistance. This work was supported by grants from
the Netherlands Organisation for Scientific Research (NWO), Council for Medical and Health Research, project nos 900-716-818 and 900-716-830.
REFERENCES