ABSTRACT
In this study we investigated the role of several parameters governing the efficiency of gene targeting mediated by homologous recombination in the protozoan parasite Leishmania. We evaluated the relative targeting frequencies of different replacement vectors designed to target several sequences within the parasite genome. We found that reducing the length of homologous sequences below 1 kb on one arm of the vector decreases the targeting frequency linearly. No homologous recombination was detected, however, when the flanking homologous regions were <180 bp. A very high degree of homology between donor and target sequences was found to be necessary for efficient gene targeting in Leishmania, as targeted recombination was strongly affected by base pair mismatches. Targeting frequency increased proportionally with copy number of the target only when the target was part of a linear amplicon, but remained unchanged when it was present on circles. Different chromosomal locations were found to be targeted with significantly variable levels of efficiency. Finally, different strains of the same species showed differences in gene targeting frequency. Overall, gene targeting mediated by homologous recombination in Leishmania shares similarities with both the yeast and the mammalian recombination systems.
The ability to alter genes in different organisms constitutes a powerful genetic tool. Such manipulations in mammalian and yeast cells have been made possible by the generation of efficient methods of gene targeting using homologous recombination. The frequency of homologous recombination between an introduced vector and chromosomal DNA sequences is influenced by many factors, including, among others, the amount and nature of homologous sequences, the genetic locus, the copy number of the target and design of the vector.
Generally, targeted recombination in mammalian cells is less frequent than non-homologous integration (1 ,2 ). In the Kinetoplastidae Trypanosoma and Leishmania, as in yeast cells (3 ), nearly all integration events proceed by homologous recombination (4 -8 ). Recently, genetic manipulations in protozoan parasites have become possible by introducing exogenous DNA by electroporation, yielding stably transfected parasite lines. In Leishmania the transfected DNA can either remain episomal or integrate into the genome by homologous recombination (9 -13 ). However, nothing is yet known about the parameters that may influence the efficiency of gene targeting in protozoan parasites. Given the importance of gene targeting technology in studying parasite structure and pathogenesis and in developing tools for new and effective therapeutic approaches against protozoan parasites, we initiated studies to evaluate the role of different parameters on gene targeting efficiency in Leishmania.
In this report we describe the effect of the length and degree of homology between the target sequences and the targeting vector, the copy number of the target, the locus to be targeted and finally the type of strain on the frequency of homologous recombination in Leishmania. We have analyzed the frequencies of genomic integration by gene targeting with a series of vectors designed to target several single or multiple copy parasite sequences located at distinct positions within the genome. Generally, two types of vector, insertion and replacement, have been used for gene targeting studies in yeast and mammalian cells (for a review see 14 ). Although the issue is still controversial (15 ,16 ), sequence replacement and insertion vectors behaved equivalently with respect to targeting efficiency. For these studies we have exclusively used replacement vectors that contain a linear array of the sequences to be targeted on both sides of the selectable marker. In Leishmania frequencies of DNA integration by homologous recombination could be measured quantitatively because of the ability of these parasites to form colonies on solid media.
The data presented here constitute a first attempt to study the effect of several important parameters on gene targeting frequency in a protozoan parasite. Our results indicate that the rate of gene targeting is modulated by the same parameters that influence homologous recombination frequencies in yeast or in mammalian cells. Overall, the factors that affect the efficiency and fidelity of gene targeting in Leishmania act, for the most part, in a fashion similar to that in the yeast system, although for some parameters the recombination frequency resembles the situation observed in mammalian cells.
All targeting constructs used for these studies were of the replacement type and were made with isogenic DNA. The neomycin phosphotransferase (neo) and the hygromycin phosphotransferase (hyg) expression cassettes used for construction of the targeting vectors were derived from pSPYneo and pSPYhyg respectively (17 ). The ptr1-neo, ptr1-hyg, pgpA-neo and pgpA-hyg targeting constructs have been described previously (12 ,13 ; see also Fig. 1 A and B). For generation of ptr1-neo vectors from L.major and L.donovani the neo expression cassette was inserted as a SmaI-EcoRV restriction fragment into SalI (treated with Klenow fragment to generate blunt ends) and MscI sites of ptr1 respectively. The L.donovani and L.major TR-neo replacement vectors were made as reported previously (11 ). The L.tarentolae TR-neo vector was constructed by inserting the neo expression cassette as a SmaI-EcoRV fragment into the BalI site of the TR gene (see Fig. 1 C). The targeting construct to generate a L.tarentolae single pgpE knockout strain was made by introducing the neo cassette as a BamHI-BglII fragment into the BglII site of a 7 kb BamHI-BamHI genomic fragment containing the pgpE gene (18 ; Fig. 1 D). The sizes of the 5' and 3' homologous sequences flanking the selectable markers for each targeting construct are shown in Figure 1 .
The L.tarentolae TarII and TarVIa strains have been described previously (19 ). The unselected TarVIa wild-type strain contains free H circles (19 ). The methotrexate (MTX)-resistant mutants TarII MTX1000.4 and TarII MTX1000.1 contain a linear and a circular amplicon respectively, both derived from the H region as described (20 ,21 ). TarII 1000.4 pgpA/neo is transfected with the MTX1000.4 linear amplicon in which the pgpA gene was disrupted with a neo cassette (21 ). TarII Hirt-ptr1/neo is transfected with an H circle isolated from MTX1000 where the ptr1 gene was disrupted with the neo expression cassette (22 ). TarII 40.8 contains a Leishmania expression vector carrying the ptr1 gene (10 kb) (23 ). Leishmania cells were grown at 29°C in SDM-79 medium (24 ) supplemented with 10% fetal bovine serum (FBS) (Multicell; Wisent Inc.) and 5 µg/ml hemin.
For all transfections ~2 µg of linearized targeting vector DNA purified from LMP agarose (Gibco BRL) was used. The linearized DNA fragments were purified twice to avoid contamination with vector sequences or partially digested DNA. The TR-neo targeting constructs were amplified by PCR using primers as indicated (11 ). The DNA concentration for each targeting vector was adjusted so that the number of DNA molecules per cell during electroporation was kept constant. For each transfection 5 × 10^6 cells (0.5 ml) were used. Leishmania cells were electroporated as described elsewhere (23 ). The efficiency of DNA transfection into L.tarentolae was calculated for each separate experiment as the ratio of the number of colonies grown in the presence of a selectable marker versus those grown without selection. The procedure used to calculate gene targeting frequency in our experiments is as follows. After electroporation Leishmania cells were grown in 5 ml liquid medium for 48 h at 29°C. Dilutions of 10^3 and 10^4 cells were then plated onto SDM agar without selection to calculate the total number of parasites surviving after electroporation for each experiment. The remaining cells were pelleted by centrifugation, resuspended in 200 µl fresh medium and spread onto freshly prepared SDM drug-containing plates. In parallel, 300 µl of the same 48 h culture were transferred systematically to liquid SDM medium with the appropriate drug selection to look at the result of genomic integrations on a total population of cells. Transfectants were selected with 40 µg/ml G-418 (Geneticin; Gibco BRL) or 80 µg/ml hygromycin B (HygB) (CalBiochem). Colonies appeared on average 8-12 days following incubation at 29°C. The targeting frequencies reported here are expressed as an absolute number calculated from the number of G418-resistant (G418R) or HygB-resistant (HygBR) colonies divided by the total number of colonies grown without selection. Each experiment (E) in Tables 1 -5 represents a compilation of data from three to four independent transfections done in triplicate.
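The frequency calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure as implemented by the authors: the function names are hypothetical, the number of surviving parasites is assumed to be estimated by scaling the colony count on an unselected dilution plate up to the number of cells spread on selective plates, and all numbers in the usage lines are invented placeholders.

```python
def surviving_cells(colonies_unselected, cells_plated_unselected, cells_on_selection):
    """Estimate how many of the cells spread on selective plates were viable.

    Assumption (not stated in the text): viability is the fraction of cells
    on the unselected dilution plate that gave rise to a colony.
    """
    if cells_plated_unselected <= 0:
        raise ValueError("cells_plated_unselected must be positive")
    viable_fraction = colonies_unselected / cells_plated_unselected
    return viable_fraction * cells_on_selection


def targeting_frequency(resistant_colonies, survivors):
    """Resistant colonies divided by the total number of surviving cells."""
    if survivors <= 0:
        raise ValueError("survivors must be positive")
    return resistant_colonies / survivors


# Hypothetical numbers, for illustration only.
survivors = surviving_cells(
    colonies_unselected=400,        # colonies counted on the 10^3 dilution plate
    cells_plated_unselected=1_000,  # cells plated without selection
    cells_on_selection=5_000_000,   # cells spread on drug-containing plates
)
frequency = targeting_frequency(resistant_colonies=600, survivors=survivors)
print(f"targeting frequency: {frequency:.1e}")
```

Expressing the frequency relative to surviving rather than input cells keeps experiments comparable when electroporation kills a variable fraction of the parasites.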
Total genomic DNA from L.tarentolae was prepared as described (25 ), digested with EcoRI, HindIII, XhoI and BamHI, resolved on 0.7% agarose gels and transferred to nylon membranes (Hybond-N; Amersham). Southern blots, radiolabeling of probes, hybridizations and washing conditions followed standard procedures (26 ). The probes used here were PCR fragments generated from TR, hyg, neo and ptr1 coding sequences. The probe containing the nucleotide binding site (nbs) of the Leishmania P-glycoprotein genes and recognizing five pgp sequences was described previously (18 ). The copy number of the ptr1 and pgpA targets was determined by a Beckman DU-8B spectrophotometer using a gel-scan computer module as described before (27 ). Three experimental tests were used to analyze the recombination events in our transfectants. First, the genomic DNA from a minimum of 12 randomly isolated G418R or HygBR colonies (when the number of clones was higher than 12) was prepared, digested and hybridized to the appropriate probes to identify clonal recombination events. The remaining colonies were pooled and genomic DNA prepared and also tested by hybridization. Finally, the genomic DNA from a total population of Leishmania transfectants grown in the presence of a selectable marker in liquid medium was prepared and analyzed by hybridization. An excellent correlation was observed between the results obtained from the three approaches used.
Table 1
Previous investigations in mammalian cells have suggested an exponential relationship between the length of homology with the target and targeting frequency with insertion and replacement vectors (2 ,15 ,28 ,29 ). In yeast, however, a linear relationship between substrate length and targeting efficiency has been reported (30 ). To determine whether the fidelity of homologous recombination in the protozoan Leishmania could be influenced by the length of homology flanking the selectable marker we have examined gene targeting frequencies under conditions in which the length of homologous sequences was varied. We have used two replacement vectors for these studies. They contained a neo expression cassette inserted into the ptr1 and pgpA genes respectively, both part of the H locus (31 ) (Fig. 1 ). The vectors were cut within the target homology sequence at several restriction sites leaving homologous sequences of 0.16-10 kb (see Fig. 1 ) and electroporated into L.tarentolae cells. Southern blot analysis of the G418R colonies confirmed integration of the neo cassette into the ptr1 and pgpA loci respectively (Fig. 1 A and B and not shown).
The relative targeting frequencies with respect to different lengths of homology present in the replacement vectors are presented in Table 1 . The highest targeting frequency was obtained when the region of homology with the target DNA in both arms of the vector was >= 1 kb for both the ptr1 and pgpA constructs. The efficiency of gene targeting remained relatively stable even when homologous sequences in one or both arms of the vector were increased in size by 2- to 10-fold. This contrasts with the situation observed in mammalian cells (32 ). Indeed, we have noticed that recombination frequency decreases slightly when large regions of homology (>10 kb) were used (see Table 1 and unpublished data). The frequency of recombination was decreased significantly, however, when the homologous sequences were <1 kb. Thus, using the ptr1 NarI-SmaI targeting construct with homologous sequences of 0.35 and 0.48 kb on each side of the vector (Fig. 1 A), the frequency of targeting was diminished by 5.5-fold compared with a vector in which the homologous sequences were twice as big (see Table 1 ). In the case of the pgpA ScaI-ScaI replacement vector, where only 220 bp of homologous sequences were present at the 3'-end, targeting was decreased by a factor of 18 compared with the MseI-MseI construct, carrying a 7-fold larger region of homology (Table 1 ). Vectors with homologous sequences <200 bp at one or at both ends did not yield any resistant colonies (Table 1 ), suggesting that the minimum size required for efficient gene targeting in Leishmania, at least under our conditions, is ~150-200 bp.
Base pair mismatches affect the efficiency of genetic recombination in bacteria (33 ) and in mammalian cells (16 ,34 ). Recently Blundell et al. (35 ) reported that targeting of exogenous DNA into Trypanosoma brucei required a high degree of homology between donor and target DNA. To determine the degree of homology necessary for efficient gene targeting in Leishmania we used a series of isogenic and non-isogenic constructs carrying intragenic sequences from different species (Table 2 ). The length of homology present on both isogenic and non-isogenic targeting constructs used was the same for most cases (Table 2 ). We first concentrated our studies on targeting constructs for which the nucleotide sequences of the genes to be targeted were known. The ptr1 gene homologs from L.tarentolae (23 ), L.major (36 ) and L.donovani (unpublished) share from 86 to 89% identity at the level of nucleotide sequence. Our attempts to target the L.major or the L.donovani ptr1 homologs when using the ptr1 construct from L.tarentolae were unsuccessful, whereas isogenic transfection consistently yielded transfectants (Table 2 ). Similarly, the L.major or L.donovani ptr1 homologs transfected into L.tarentolae did not yield any colonies (Table 2 ).
In mammalian cells targeting does not depend on the number of target copies (37 ), whereas in yeast there is a linear correlation between copy number and frequency of targeting (38 ). To test to what extent copy number of the target sequence could influence targeting frequencies in Leishmania, we have conducted experiments in isogenic strains in which the target was present either as a single copy or as multiple copies. The ptr1 gene was used as target sequence for these studies. ptr1 is present in two copies in the L.tarentolae TarII wild-type strain and in variable multiple copies, as part of linear and circular amplicons, in the TarII MTX1000.4 and TarII MTX1000.1 strains respectively (20 ,21 ). Transfection of the XhoI-XhoI ptr1-hyg replacement vector into these strains and appropriate selection with HygB allowed successful integration of the hyg gene into the ptr1 locus for all three strains, as indicated in the Southern blot hybridization in Figure 2 A and B. A linear correlation between copy number of ptr1 and frequency of targeting was found for the TarII MTX1000.4 strain, where the extra ptr1 copies were part of a linear amplicon (Table 3 ). Surprisingly, however, no marked increase in targeting frequency was observed in TarII MTX1000.1, where ptr1 was present on a circle.
To exclude the possibility that the differences we have observed in recombination frequencies between linear and circular targets in MTX-resistant strains were due to mutation(s) occurring during drug selection in one of these strains we re-evaluated the effect of copy number of the target in L.tarentolae wild-type strains into which the same linear and circular amplicons were introduced by transfection (21 ,22 ). The results we obtained using these transfectants were in line with our previous observations (Table 3 ). Targeting frequency was indeed proportional to copy number of the ptr1 target only in strain TarII 1000.4-pgpA/neo, where ptr1 was amplified as part of a linear element, and almost no differences were detected between the TarII wild-type and the TarII Hirt-ptr1/neo strain containing a circular amplicon (Table 3 ). The efficiency of homologous recombination was estimated at 11- to 18.5-fold higher in linear targets versus circles for the same copy number of the target (Table 3 ). These differences in efficiency of hyg gene integration were also observed at the level of Southern blot hybridization, where variable intensities of the targeted 3.5 kb XhoI restriction fragment were detected (see Fig. 2 ).
Moreover, we have investigated whether the size of circular amplicons may be responsible for the low efficiency of targeting in TarII MTX1000.1 and TarII Hirt-ptr1/neo strains. We have therefore targeted the ptr1 gene to a smaller episomal ptr1 expression vector present in the TarII 40.8 strain (23 ). As shown in Table 3 and also in Figure 2 A and B, integration of the hyg cassette into the ptr1 copies of this vector was very low despite the high copy number of the target sequence. Indeed, if we take into account the difference in copy number between this circular amplicon versus the linear targets, the frequency of recombination is 49-fold higher when the target is part of a linear amplicon. Overall, these data imply that in Leishmania the recombination machinery seems to recognize homologous sequences preferentially when they are present in linear minichromosomes rather than in extrachromosomal circles.
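The copy number correction behind such a comparison can be illustrated with a short Python sketch. It assumes that the correction consists of dividing each targeting frequency by the copy number of the target before taking the ratio; that normalization, the function names and every number in the usage lines are hypothetical placeholders and are not values from Table 3.

```python
def per_copy_frequency(frequency, copy_number):
    """Targeting frequency divided by the number of target copies."""
    if copy_number <= 0:
        raise ValueError("copy_number must be positive")
    return frequency / copy_number


def linear_vs_circular_fold(freq_linear, copies_linear, freq_circular, copies_circular):
    """Fold difference in per-copy targeting frequency, linear over circular."""
    circular = per_copy_frequency(freq_circular, copies_circular)
    if circular == 0:
        raise ValueError("circular per-copy frequency is zero; fold is undefined")
    return per_copy_frequency(freq_linear, copies_linear) / circular


# Placeholder inputs only; these are NOT the measured values.
fold = linear_vs_circular_fold(
    freq_linear=30e-4, copies_linear=10,
    freq_circular=3e-4, copies_circular=10,
)
print(f"per-copy targeting is {fold:.0f}-fold higher on the linear target")
```

Normalizing per copy separates the effect of target topology from the effect of simply having more target copies in the cell.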
To compare targeting frequencies at different genomic locations we have constructed four replacement vectors carrying sequences from distinct chromosomal locations (Fig. 1 ). The vector sequences corresponding to the ptr1 and pgpA genes are part of the same genomic locus, the H region (31 ), present on an 820 kb chromosome. The TR vector sequences are located on a 520 kb chromosome (11 ) and the pgpE sequences on a 1750 kb chromosome (18 ). All these genes are single copy in the genome. The isogenic replacement vectors were successfully integrated following transfection into their respective chromosomal locations, as shown by the Southern blot hybridization in Figure 1 . Table 4 summarizes the results we have obtained from four independent experiments made in triplicate for each target locus. The 820 kb H locus-containing chromosome carrying the ptr1 and pgpA genes was targeted 500-fold more efficiently than the 1750 kb pgpE-containing chromosome and 144-fold more efficiently than the TR locus-containing 520 kb chromosome (Table 4 ). These differences cannot be attributed to variations in copy number of the target, in length of homologous sequences or in the degree of homology with the target loci. Indeed, all vectors used for these studies carried isogenic DNAs that were single copy in the genome, with lengths of homologous sequences that in principle should not significantly influence targeting efficiency (see Tables 1 and 4 ). Our data demonstrate that very large variations in gene targeting frequencies can be obtained depending on the genomic region of the parasite to be targeted.
To test the effect of the strain on recombination frequencies we have used two L.tarentolae strains, TarII and TarVIa. We have chosen L.tarentolae for these studies because it grows much more rapidly on plates than pathogenic Leishmania species (unpublished data). TarVIa is similar to strain TarII but contains free H circles (19 ). For these studies we targeted two different genomic loci, ptr1 and TR (Fig. 1 A and C and data not shown), for which other parameters have already been investigated (see Tables 1 -5 ). Integration events by homologous recombination into the same genomic locus between the two strains showed differences as high as 7- to 9-fold (Table 5 ). The 7-fold increase in targeting efficiency in strain TarVIa was not due to the higher copy number of the ptr1 target in that strain (10 versus 2), as we have shown that targeting frequency was not altered when the target was present on circles (see Table 3 ). Different Leishmania species have demonstrated significant differences in transformation efficiency by electroporation that could influence targeting efficiency. The results we have obtained here, however, were generated from two strains of the same species which have similar transformation efficiencies (data not shown).
In this study we have evaluated the effect of a number of experimental parameters on gene targeting frequency in the protozoan parasite Leishmania. Gene targeting is mediated exclusively by homologous recombination in Kinetoplastidae (4 -8 ) and, as in other systems (28 ,39 ), it occurs primarily if the construct has free homologous ends, rather than being circular. For our studies we have used replacement vectors, the most commonly used targeting vectors. Replacement vectors with a double-strand break outside the region of homology may recombine with the host chromosomal sequences by double reciprocal recombination or by gene conversion. Given the numerous observations in yeast (40 ,41 ) and in mammalian cells (28 ,42 ) that gene targeting is consistent with double-strand break repair models of recombination, it seems unlikely that the pathways for gene targeting are fundamentally different in protozoan parasites. The details of these events, however, are not known.
The most critical parameters for efficient homologous recombination in many systems tested so far are the degree of homology between the vector and the genomic target and the length of homologous sequences. DNA sequence mismatches represent a considerable barrier to homologous recombination in a wide variety of systems, ranging from bacteria (33 ,43 ,44 ) to yeast (45 -48 ) and to mammals (16 ,34 ,49 ). Even in mammalian cells, where homologous recombination is far from the most frequent route of genomic rearrangement, vectors prepared from isogenic DNA were found to target 5- to 20-fold more efficiently than vectors composed of non-isogenic DNA (16 ,34 ). Recently it has been reported that very high homology was needed for efficient targeting to occur in the protozoan T.brucei (35 ). Our studies with the protozoan Leishmania have demonstrated that the use of isogenic DNA is a very critical parameter, if not essential, to ensure a high frequency of homologous recombination. Indeed, a coding sequence from one Leishmania species sharing at least 86% identity with its gene homolog in another species did not lead to any detectable homologous integration when used as a vector to target a non-isogenic DNA (Table 2 ). If we consider that the average targeting frequencies for a single copy target in our experiments were from 2 to 6 × 10^-4 and that the lowest frequency at which an integration event was observed was 0.01 × 10^-4, we can expect at least a 200- to 600-fold decrease in targeting frequency with 14% base pair mismatch, which could explain the results we obtained when using non-isogenic constructs (Table 2 ). Our data indicate that sequence divergence between donor and recipient target DNA drastically reduces the efficiency of gene targeting in Leishmania (Table 2 ). At this stage, however, we could not determine the need for a minimal stretch of DNA with perfect homology to the target locus. Based on our observations and those of Blundell et al. (35 ), we hypothesize that parasites of the order Kinetoplastidae possess a mismatch repair system capable of suppressing recombination between divergent DNA sequences. Bacteria, yeast and mammalian cells possess a mismatch repair system highly conserved in evolution (43 ) to preserve the integrity of their genetic material and to ensure both chromosome replication and genetic recombination.
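The 200- to 600-fold figure quoted above follows from dividing the typical single copy targeting frequencies by the lowest frequency at which an integration event was observed. The short Python check below reproduces that arithmetic; the constant and function names are illustrative only, while the numerical values are those quoted in the text.

```python
# Values quoted in the text.
SINGLE_COPY_FREQUENCY_RANGE = (2e-4, 6e-4)  # average single copy targeting frequencies
LOWEST_OBSERVED_FREQUENCY = 0.01e-4         # lowest frequency with a detected integration


def minimum_fold_decrease(frequency_range, floor):
    """Fold decrease needed to drop each end of the range to the detection floor."""
    if floor <= 0:
        raise ValueError("floor must be positive")
    low, high = frequency_range
    return low / floor, high / floor


fold_low, fold_high = minimum_fold_decrease(
    SINGLE_COPY_FREQUENCY_RANGE, LOWEST_OBSERVED_FREQUENCY
)
print(f"{fold_low:.0f}- to {fold_high:.0f}-fold")
```

Because no colonies were recovered with the non-isogenic constructs, these values are best read as lower bounds on the actual decrease.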
The length of homology between the vector and target sequences is also critical for efficient gene targeting in both lower and higher eukaryotes. The exact length of homologous DNA that gives a maximum recombination rate in embryonic stem cells is controversial, but may be as high as 14 kb (16 ,32 ). The need for so long a stretch of DNA to saturate the recombination machinery in higher eukaryotes might serve to disfavor exchange between short DNA sequences, such as repetitive elements, which are present in high copy number in these organisms and whose recombination could lead to genome instability. An increase in the length of homology by 2- to 7-fold augments the targeting frequency in embryonic stem cells by 10- to 200-fold (2 ,32 ). The exponential dependence of targeting frequency on length of homology between vector and target sequences observed in mammalian cells is much stronger than that found in bacteria (33 ,50 ) or in yeast (30 ), where, for the most part, a linear dependency has been reported. We have shown that in Leishmania the length of homology influences the frequency of recombination only when it is <1 kb. Indeed, the frequency of recombination decreases significantly when the homology is <1 kb, even if this is the case on only one side of the vector (Table 1 ). In Leishmania the decrease in homologous recombination frequency seems to be relatively linear with respect to the length of homology of the target sequence, as a reduction of 2.5- to 7-fold in the length of the homologous sequences resulted in a 5.5- to 18-fold decrease in targeting efficiency (Table 1 ). Increasing the regions of homology beyond 1 kb on both arms of the vector had little if any effect on targeting frequency (Table 1 ). Moreover, the use of larger sequence homology, for example of 10 kb, at one or at both sides of the replacement vector could decrease the efficiency of targeting (Table 1 and unpublished data). The recombination system in Leishmania seems to plateau at 1-2 kb of homologous sequences, in contrast to the situation in mammalian cells, where 10-fold higher homology is required (16 ,32 ).
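The contrast between a roughly linear and an exponential dependence can be made concrete by dividing each fold change in targeting frequency by the corresponding fold change in homology length. The Python sketch below uses only the fold values quoted above; pairing the lower ends of the ranges with each other, and likewise the upper ends, is an assumption made for illustration, and the names are hypothetical.

```python
def fold_ratio(length_fold, frequency_fold):
    """Fold change in targeting frequency per fold change in homology length."""
    if length_fold <= 0:
        raise ValueError("length_fold must be positive")
    return frequency_fold / length_fold


# (fold change in homology length, fold change in targeting frequency)
leishmania_pairs = [(2.5, 5.5), (7.0, 18.0)]  # this study (Table 1)
es_cell_pairs = [(2.0, 10.0), (7.0, 200.0)]   # embryonic stem cells (2 ,32 )

leishmania_ratios = [round(fold_ratio(l, f), 1) for l, f in leishmania_pairs]
es_cell_ratios = [round(fold_ratio(l, f), 1) for l, f in es_cell_pairs]

print("Leishmania:", leishmania_ratios)
print("ES cells:  ", es_cell_ratios)
```

A ratio that stays nearly constant across the range is what a proportional (linear) relationship predicts, whereas a ratio that climbs steeply with the length change points to a stronger than linear dependence.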
The minimal size for efficient homologous recombination in Leishmania was established at 150-200 bp based on our inability to detect colonies when a region of homology <180 bp was used (Table 1 ). Similar results have been reported in other systems. In mammalian (51 ,52 ) and in yeast systems (30 ) the minimal efficient processing fragment (MEPF) appears to be in the region of 250 bp, although recombination can still occur with as little as 30 bp of homology in yeast, but at a much lower frequency (53 ). Recently Gaud et al. (54 ) demonstrated that homologous integration in T.brucei was accomplished using flanking sequences of 40 bp. Overall, our data suggest that the MEPF for Leishmania is relatively similar to that found in higher eukaryotes.
In mammalian cells gene targeting frequency does not depend on copy number of targets (37 ,55 ), whereas in yeast cells targeting efficiency depends linearly on the number of target copies, whether the copies are dispersed in the genome or tandemly linked (38 ,56 ,57 ). In Leishmania we have evidence that targeting frequency increases proportionally to copy number of the target only when the target is part of a linear amplicon (Table 3 ). However, the frequency of homologous recombination does not increase when extra copies of the targets are present on circular elements (see Table 3 ). The lack of recombinogenicity of targets present on circles is novel. It is possible that linear amplicons, because of their structural similarity to chromosomes, could be recognized more efficiently by the replication and recombination machinery of the parasite than circles. Very little is known about how replication occurs in Leishmania, but it is clear that circular vectors do not require any specific parasite sequence for replication (17 ). Additional experiments are needed, however, to explain the decreased efficiency of targeting in circles. We have shown that these differences were not due to a mutation in the recombination system possibly arising during drug selection (see Table 3 and Fig. 2 ) and that genes present on linear and circular elements were transcribed similarly (not shown).
It has been reported that different DNA sequences or their immediate chromosomal environment could influence the relative efficiencies of different gene targeting pathways, and this is also what we have observed using distinct genomic loci from Leishmania (Table 4 ). Transcription has been shown to stimulate homologous recombination in yeast and in mammalian cells (58 -62 ). We have therefore tested whether differences at the RNA expression level for the four genes used to target the different homologous loci could explain the high variability in the frequency of targeting. Northern blot hybridization demonstrated that expression of the ptr1 and pgpA genes, which showed the highest targeting frequency, was lower than that of the TR and pgpE genes (data not shown). Factors other than transcriptional activation are therefore probably responsible for the remarkable differences in targeting frequencies observed between these loci (see Table 4 ). The presence of hot spots of recombination within the target locus, the nucleotide sequence of the target and differences in chromatin structure adjacent to the target sequence may constitute some of these other factors (63 ,64 ). Although unlikely, we cannot exclude the possibility that some of the locus-to-locus variations we have observed were due to differences in neo expression at these loci.
Gene targeting technologies mediated by homologous recombination can be useful to study gene function (11 -13 ) or to generate attenuated lines for vaccination purposes (65 ,66 ). In addition, gene targeting can be used to explore several aspects of parasite chromosome structure and function. The manipulation of parasite genomes requires efficient gene transfection and gene integration events and the studies presented here should permit the use of optimal transfection vectors for these purposes.
We thank Marc Ouellette for constructive criticisms regarding the manuscript. This work was supported by a Medical Research Council (MRC) grant MT-12182 to BP. BP is an MRC scholar and member of an MRC group on Infectious Diseases.

g The absence of colonies in solid G418-containing medium was also accompanied by a failure to obtain any growth in liquid medium.
Table 2

Table 3
Table 4
Table 5
REFERENCES

