ABSTRACT
During the recombination process that assembles immunoglobulin and T-cell receptor gene segments, the coding ends to be joined are extensively processed. Previous reports on the existence of homology-directed mechanisms in V(D)J recombination are contradictory. In this study we analyse coding end processing and the influence of homology stretches on coding joint formation, using artificial substrates in which short sequence changes creating direct repeats have been introduced. These changes were introduced 3 bp away from the termini in order to avoid any differences due to the initiation steps of V(D)J recombination. Our results show that the sequence of the coding ends influences joint formation, but no evidence was found for a mechanistic bias due to the presence of direct repeats.
Functional immunoglobulin (Ig) and T-cell receptor (TCR) genes are assembled from separately encoded gene segments during lymphocyte differentiation (for a review see ref. 1). This assembly process, called V(D)J recombination, is site specific. The recognised signal sequences are composed of a palindromic heptamer, a 12 or 23 bp spacer and a nonamer. The presence of two signal sequences with different spacer lengths is sufficient to direct recombination of artificial substrates introduced into recombination-proficient cells. The recombination process produces two types of joints: the signal joint, which is formed by the precise juxtaposition of the heptamers, and the coding joint, formed by the gene segments. The coding ends are extensively processed. Deletions ranging in size from 1 to 15 or 20 nucleotides can be observed, as well as two types of nucleotide additions. P-nucleotides form short palindromes with the unprocessed coding sequence (2,3) and are thought to result from the asymmetric opening of a hairpin intermediate (4,5). N-regions are random nucleotide insertions that can occur after deletion or P-nucleotide insertion, as well as on full-length coding ends, and have been shown to be added by terminal deoxynucleotidyl transferase (TdT) (6-8). In the absence of TdT expression, the Ig and TCR repertoires are restricted and the junctions of Ig and TCR gene segments are often formed at sites of homology between the coding ends (3,9-12). The observation of these canonical joints led to the hypothesis that their high incidence might be due to a mechanistic bias towards formation of joints at short direct repeats (11,12). However, canonical joints are not observed for all segment combinations (8,12,13). Reports on the influence of short stretches of homology on the V(D)J recombination reaction are contradictory. Pandey et al. showed that the bias observed in endogenous genes in Muscovy duck is due to selection (14). In other experiments where unrearranged transgenes could not undergo selection, the formation of canonical junctions still occurred (15). Experiments with artificial substrates containing homopolymers as coding regions show that the extent of junctional deletion is not altered by the presence of homology (16), while another study with substrates bearing randomised coding end sequences concluded that junctions were preferentially formed at overlapping nucleotides (17).
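The palindromic nature of P-nucleotides described above can be illustrated with a minimal sketch. It assumes the hairpin model cited in the text: the hairpin joins the 3' end of the top strand to the 5' end of the bottom strand, and a nick placed a few nucleotides away from the tip on the bottom strand leaves those nucleotides attached to the top strand. The function names, the example sequence and the nick position are hypothetical and are not taken from the substrates used in this study.

```python
# Illustrative sketch only: the sequence and nick offset are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def open_hairpin(top_strand: str, offset: int) -> str:
    """Return the top strand of a coding end after off-centre hairpin opening.

    top_strand: coding end read 5'->3' towards the hairpin tip.
    offset: number of bottom-strand nucleotides between the tip and the nick.
            These nucleotides stay attached to the top strand and would appear
            as P-nucleotides if retained in the joint. An offset of 0 opens the
            hairpin exactly at the tip and adds nothing.
    """
    if not 0 <= offset <= len(top_strand):
        raise ValueError("offset must lie between 0 and the length of the end")
    # Read 5'->3' from the tip, the bottom strand is the reverse complement
    # of the top strand; its first `offset` nucleotides are carried over.
    return top_strand + reverse_complement(top_strand)[:offset]


if __name__ == "__main__":
    end = "GGTCA"                      # hypothetical coding end
    opened = open_hairpin(end, 2)      # nick 2 nt away from the tip
    print(opened)                      # the last 4 nt form a short palindrome
```

In this sketch the added nucleotides are by construction the reverse complement of the last nucleotides of the unprocessed coding end, which is why they form a short palindrome with it.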
Here we report on the influence of coding end sequences on coding joint formation during V(D)J recombination. Extrachromosomal V(D)J recombination substrates provide excellent tools to address this question. Unlike the endogenous Ig and TCR genes, these artificial substrates, used in transient transfection assays, are not subject to antigen selection but only to the constraints imposed by the recombination process itself. Small changes can be made in the coding end sequences, and their effect on coding joint formation can be analysed after site-specific rearrangement.
Our results show that the sequence of the coding ends affects the distribution of coding joints. However, no evidence was found for a mechanistic bias triggered by the presence of direct repeats.
All the vectors contain the polyoma early region necessary for autonomous replication in mouse fibroblasts, and the replication sequences and the β-lactamase gene necessary for growth and selection in Escherichia coli. To test how coding end sequences would influence coding end resolution, we constructed four extrachromosomal V(D)J recombination substrates (Fig. 1). The coding end associated with the recombination signal sequence that has a 12 bp spacer (12-coding end) is identical in the four plasmids. Six base pairs were inserted into the HincII site of the 23-coding end of pBlueRec (18) to obtain the three other plasmids. The double-stranded oligonucleotides inserted are ATCCTA for pBlueH4 and pBlueI4 (in opposite orientations) and AGTGGA for pBlueH6.
NIH 3T3 mouse embryo fibroblasts were grown in Dulbecco's modified Eagle's medium supplemented with 10% newborn calf serum. This cell line was chosen because it does not express endogenous terminal transferase activity. In V(D)J recombination assays, cells were transfected by electroporation (960 µF, 300 V) following the procedure described by Chu et al. (19). Cells (2 × 10⁶) were transfected with 2.5 µg of one of the recombination substrates, 6 µg of M2CD.7 (pRag1) and 4.8 µg of R2RCD.2 (pRag2) (20). After the electric pulse, cells were plated in three separate dishes. Cells were harvested after 40-48 h incubation at 37°C and washed with phosphate-buffered saline; plasmid DNA was prepared according to a modified alkaline minilysis procedure (21). DNA pellets were resuspended in 20 µl of sterile water. At least 10 independent transfections were performed with each V(D)J recombination substrate.
To score the V(D)J recombination frequency, 5 µl of the DNA solution obtained after transfection were digested with DpnI, which cleaves its recognition site only when it is dam methylated on both strands. This eliminates non-replicated plasmids and thus enriches for plasmids that have entered the nucleus. The digested DNA was transformed into XL1-Blue bacteria by electroporation. Bacteria were plated on LB agar plates containing 80 µg/ml X-Gal, 150 µM IPTG, 100 µg/ml ampicillin and 10 µg/ml tetracycline. Rearrangement frequency was calculated as (number of blue clones × 3)/total number of clones (18).
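The frequency calculation can be written out as a short sketch. The factor of 3 is taken directly from the formula given above (its rationale is described in ref. 18 and is not restated here); the function name and the colony counts in the usage example are hypothetical.

```python
def rearrangement_frequency(blue_clones: int, total_clones: int) -> float:
    """Rearrangement frequency = (number of blue clones * 3) / total clones."""
    if total_clones <= 0:
        raise ValueError("total_clones must be positive")
    if not 0 <= blue_clones <= total_clones:
        raise ValueError("blue_clones must lie between 0 and total_clones")
    return 3 * blue_clones / total_clones


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from this study).
    freq = rearrangement_frequency(blue_clones=12, total_clones=4000)
    print(f"rearrangement frequency: {freq:.2%}")
```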
For the analysis of the joints obtained after V(D)J rearrangement, 5 µl of the DNA solution obtained after transfection were digested with either EcoRV or ClaI, which cleave between the recombination signal sequences (RSS), to eliminate non-recombined plasmids, and were then transformed into XL1-Blue bacteria by electroporation. Bacteria were immediately plated on LB agar plates containing 100 µg/ml ampicillin. Recombinant clones were picked directly or after screening with an internal oligonucleotide that is deleted if rearrangement has occurred. Small-scale DNA preparations were made by the boiling method (22). DNA was resuspended in 50 µl of 10 mM Tris-HCl, pH 8; 1 mM EDTA; 10 µg/ml RNase. An aliquot of 5 µl was digested with PvuII and analysed on an agarose gel to identify rearranged plasmids (not shown). A volume of 10 µl was denatured by alkaline treatment, precipitated and sequenced with the reverse primer according to Sanger et al. (23). For each transfection, 10 joints were sequenced. Identical sequences were counted as separate events only when they were obtained from independent transfections.
The observed distributions of coding joints were compared using a two-tailed χ² test. Significance was assessed at P ≤ 0.05. When two distributions were found to be significantly different, individual frequencies were examined post hoc to determine which classes of recombination products could account for the discrepancies.
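As an illustration of this kind of comparison, the following sketch applies a χ² goodness-of-fit test with one degree of freedom to the pBlueRec counts reported in the Results: the joint formed at the TC repeat (seven occurrences, three possible coding end combinations) against the two neighbouring joints 4-0 and 0-4 (six and seven occurrences, one combination each). Pooling the two neighbouring joints into a single class and using a goodness-of-fit design are assumptions made here for simplicity; the exact construction of the test behind the reported P-value is not specified in the text, so this sketch is not expected to reproduce that value exactly.

```python
import math


def chi_square_gof(observed, expected_ratio):
    """Chi-square goodness-of-fit statistic for counts against a ratio.

    observed: list of observed counts per class.
    expected_ratio: relative expected weights per class (same length).
    Returns the statistic and the list of expected counts.
    """
    total = sum(observed)
    ratio_sum = sum(expected_ratio)
    expected = [total * r / ratio_sum for r in expected_ratio]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


def p_value_one_df(stat: float) -> float:
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Class 1: joint at the TC repeat (combinations 1-3, 2-2 and 3-1).
# Class 2: joints 4-0 and 0-4 pooled (one combination each); pooling is
# an assumption of this sketch.
observed = [7, 6 + 7]
expected_ratio = [3, 2]

stat, expected = chi_square_gof(observed, expected_ratio)
p = p_value_one_df(stat)

if __name__ == "__main__":
    print(f"expected counts: {expected}")
    print(f"chi-square = {stat:.3f}, P = {p:.3f}")
```

Under these assumptions the joint at the direct repeat falls below its expected count, in line with the under-representation described in the Results.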
Recombination substrates were designed in which the 12-coding ends were conserved and different sequences were introduced at the 23-coding ends (Fig. 1). The changes generate direct repeats of various lengths and at various distances from the recombination signals. These direct repeats are located within the coding sequences or in the loops that lead to P-nucleotide formation. Since the sequences immediately adjacent to the heptamer have been reported to influence the initiation steps of recombination (24), the sequence changes were introduced 3 bp away from the heptamer sequence, in order to study only the resolution steps of the recombination reaction.
The four V(D)J recombination substrates were transfected into 3T3 fibroblasts alone or with Rag1 and Rag2 expression vectors. Plasmid DNA was recovered after 48 h and tested in E. coli for recombination. In all cases recombination occurs only when both Rag1 and Rag2 are expressed (Table 1). The recombination frequencies are similar from one substrate to the other, indicating that the presence of direct repeats in the coding sequences does not affect the efficiency of rearrangement. Similar results were obtained by others (17).
The representation in Figure 3 allows a visualisation of the distribution of the observed joints in comparison to the theoretical coding end combinations. We refer to individual joints by giving the number of nucleotides deleted at the 12-coding end followed by the number of nucleotides deleted at the 23-coding end. Each predicted joint is represented at the intersection of the corresponding deletion sizes or P inserts at each end. For instance, junction 4-0 has 4 nucleotides deleted at the 12-coding end and none at the 23-coding end, and is found six times for pBlueRec. It is important to note that joints formed at n overlapping nucleotides can be formed by n + 1 combinations of coding ends. These joints are thus represented by the fusion of the corresponding cells. For instance, junctions 2-5, 1-6 and 0-7 of pBlueRec are formed at the direct repeat CC and cannot be distinguished from each other. Joints having deletions >12 nucleotides at either end, or P-nucleotide insertions >4, as well as sequences with non-templated insertions, are not included here and are shown in Figure 4.
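The statement that a joint formed at n overlapping nucleotides arises from n + 1 coding end combinations can be checked with a short enumeration. The two coding end sequences below are hypothetical (they are not the sequences of the substrates in Fig. 1) and were chosen so that a 2 bp direct repeat TC gives three indistinguishable combinations; the function name is likewise an assumption of this sketch.

```python
from collections import defaultdict


def joints_by_sequence(end12: str, end23: str, total_deletion: int) -> dict:
    """Group deletion combinations (d12, d23) by the joint sequence they give.

    end12 is read 5'->3' towards the junction and end23 5'->3' away from it,
    both on the same strand. A joint is end12 without its last d12 nucleotides
    followed by end23 without its first d23 nucleotides.
    """
    groups = defaultdict(list)
    for d12 in range(total_deletion + 1):
        d23 = total_deletion - d12
        if d12 > len(end12) or d23 > len(end23):
            continue  # cannot delete more nucleotides than the end contains
        joint = end12[: len(end12) - d12] + end23[d23:]
        groups[joint].append((d12, d23))
    return dict(groups)


# Hypothetical coding ends sharing the direct repeat TC.
END12 = "AAGGTCA"
END23 = "ATCGGAA"

groups = joints_by_sequence(END12, END23, total_deletion=4)

if __name__ == "__main__":
    for joint, combos in groups.items():
        print(joint, combos)
```

With these hypothetical ends, the combinations 1-3, 2-2 and 3-1 give the same sequence, whereas 4-0 and 0-4 each give a distinct one, which mirrors the fused cells of the representation described above.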
Joints formed at direct repeats are well represented (Fig. 3). Their frequency, however, is not higher, and in some cases is even lower, than that of other joints. Consider, for instance, the joints of size -4 formed at the direct repeat TC. As explained above, this junction can be formed in three ways (1-3, 2-2, 3-1), and in the absence of any mechanistic bias it is thus expected to be found three times as frequently as each of the neighbouring joints 4-0 and 0-4, which can be formed by only one coding end combination. However, in pBlueRec those last joints are found six and seven times, while the junction formed at the site of the 2 bp homology TC is found seven times, indicating that the joint with the direct repeat is underrepresented (P = 0.019). Similarly, in the other three plasmids the same blocks of homology are present, but are never used above the frequency expected for a random distribution. The peaks at junction size -4 found for pBlueRec, pBlueH6 and pBlueI4 (Fig. 2) are not due to a high frequency of joint formation at the overlapping nucleotides, but rather to a high representation of the other joints of this size: 4-0 and 0-4 (Fig. 3). The lower frequency of joints of size -4 in pBlueH4 confirms this statement.
Indeed the same overlapping nucleotides are present in this plasmid and the abs
This representation of the sequences also shows that some sequences are clearly
underrepresented compared to a random distribution, for instance joints 3-0 and
2-0 are not found in any plasmid.
P-nucleotide insertions of 1-5 nucleotides in length are observed at an overall frequency of 17% but are unequally distributed between the 12- and 23-coding ends (4 and 13%, respectively). Fourteen joints (4%) had one or two non-templated nucleotide insertions (Fig. 4). In seven joints, these insertions were coupled to the presence of P-nucleotide insertions. In two joints, a point mutation is observed near the junction.
We believe that non-templated nucleotide insertions are not due to terminal transferase and that a separate mechanism is responsible for these additions. Indeed, 3T3 fibroblasts do not exhibit TdT activity, and non-templated insertions of 1 nucleotide were also found at a comparable frequency (3%) after V(D)J recombination of endogenous genes in homozygous terminal transferase knockout mice (8). A search within the recombination substrates for potential template sequences for these joints with extra nucleotides did not give convincing results. Several ligation- or polymerisation-based mechanisms could account for these insertions: for example, all DNA polymerases tested have been shown to catalyse the addition of one non-templated nucleotide (25) and hence may be responsible for these additions.
In order to understand the irregular distribution of the junctions, we analysed the deletion and P-nucleotide insertion patterns of the 12- and 23-coding ends separately (Fig. 5). The profile obtained for the 12-coding end, which is identical in the four plasmids, is the same for all the vectors despite the introduction of direct repeats by modification of the 23-coding end. Conversely, the 23-coding ends exhibit different profiles. Nevertheless, this difference was not significant by χ² analysis, perhaps because few of the joints analysed have deletions extending into the region where the sequences of the vectors diverge.
The theoretical distribution of the joints, as predicted by multiplying the frequencies observed for each deletion size at the two ends, does not correspond to the observed coding joint distributions depicted in Figure 3 for any of the plasmids. Again, some joints are under-represented while others are over-represented, independently of the presence of direct repeats.
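The theoretical distribution referred to above corresponds to the assumption that the two coding ends are processed independently, so that the expected frequency of a joint is the product of the frequencies of its two deletion sizes. A minimal sketch of this calculation follows; the marginal frequencies and the number of joints are hypothetical placeholders, not the values of Figure 5, and the function name is an assumption.

```python
def expected_joint_counts(freq12: dict, freq23: dict, n_joints: int) -> dict:
    """Expected count of each joint (d12, d23) under independent processing.

    freq12 and freq23 map a deletion size to its observed frequency at the
    12- and 23-coding end respectively (each set of frequencies sums to 1).
    """
    return {
        (d12, d23): n_joints * f12 * f23
        for d12, f12 in freq12.items()
        for d23, f23 in freq23.items()
    }


# Hypothetical marginal frequencies, for illustration only.
freq12 = {0: 0.5, 1: 0.3, 2: 0.2}
freq23 = {0: 0.6, 1: 0.4}

expected = expected_joint_counts(freq12, freq23, n_joints=100)

if __name__ == "__main__":
    for joint, count in sorted(expected.items()):
        print(f"joint {joint[0]}-{joint[1]}: expected {count:.1f}")
```

Observed counts that depart markedly from such products are what the text describes as under- or over-represented joints.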
In this study we analysed coding joint formation of four V(D)J recombination substrates in which the coding end associated with the RSS12 is conserved and the 23-coding end is varied over six nucleotides, which leads to the introduction of short stretches of homology (Fig. 1).
The distribution of the junctions among the theoretical coding end combinations is not random, as can be seen in the representation in Figure 3. The coding end sequence seems to have some influence on the deletion and P-nucleotide insertion patterns, as illustrated by the conservation of the deletion profiles of the 12-coding end in the four substrates. This may result from preferential opening of the hairpin intermediates at some sites, as well as from pauses of an exonuclease activity at favoured sites. However, the coding joint distribution is not a simple combination of the 12- and 23-coding end deletion profiles. When compared with a distribution predicted by the deletion patterns, some joints are under-represented while others are over-represented. This indicates that the two coding ends are not processed independently. Other factors, such as ligation efficiency, must have an important role in the formation of the coding joint. Artificial recombination substrates with different combinations of homopolymers at the coding ends are not recombined with the same efficiency (26). It should be noted, however, that in these experiments the effects of the initiation and ligation steps cannot be distinguished.
No correlation can be made between the over-representation of some joints and the presence of direct repeats once it is taken into account that joints formed at direct repeats are expected at higher frequencies, as they can be formed by several combinations of deletions on each coding end. The results obtained by Zhang et al. (27) with mutated transgenic substrates also suggest that the presence of a direct repeat is not per se sufficient to bias recombination. Indeed, the presence of a direct repeat AT or ATA in their constructs does lead to preferential joining, while an AG or AGCT repeat does not. The experiments performed with homopolymeric substrates by Boubnov et al. (16) also indicate that DNA homology does not stabilize coding end structures for processing and joining. We cannot formally exclude the hypothesis that nucleotide pairing influences the formation of the coding joint, but this process would then be strongly dependent on the sequence and location of the direct repeats. We rather favour the hypothesis that a so-called canonical joint results from the coincidence of a direct repeat, which is naturally expected at a higher frequency, and one or several coding end combinations that are preferentially formed due to their sequence. In conclusion, our results show that, besides their effect on the initiation step of V(D)J rearrangement, the coding end sequences influence the distribution of the coding joints. The presence of direct repeats, however, does not bias the recombination reaction.
We are very grateful to Catherine Papanicolaou for helpful discussions and reading of the manuscript. We thank Patricia Barbot for technical assistance. F.N. was supported by a fellowship from le Ministère de la Recherche et de la Technologie. Q.T.N. was supported by a fellowship from Fonds d'Etudes et de
Recherche du Corps Médical des Hôpitaux de Paris.
+
Present address: Laboratoire de Génétique et Physiologie du Développement, CNRS UMR 9943, Institut de Biologie de Développement de Marseille, Case 907, 13288
Marseille cedex 09, France


REFERENCES

