

Nucleic Acids Research Advance Access originally published online on September 23, 2008
Nucleic Acids Research 2008 36(18):5970-5982; doi:10.1093/nar/gkn594

Nucleic Acids Research, 2008, Vol. 36, No. 18 5970-5982
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Genomics

Megasatellites: a peculiar class of giant minisatellites in genes involved in cell adhesion and pathogenicity in Candida glabrata

Agnès Thierry1, Christiane Bouchier2, Bernard Dujon1 and Guy-Franck Richard1,*

1Institut Pasteur, Unité de Génétique Moléculaire des Levures; CNRS, URA2171; Université Pierre et Marie Curie, UFR 927; 25 rue du Dr Roux and 2Institut Pasteur, Plate-Forme 1-Génomique; 28 rue du Dr Roux, F-75015 Paris, France

*To whom correspondence should be addressed. Tel: +33 1 40 61 34 54; Fax: +33 1 40 61 34 56; Email: gfrichar{at}pasteur.fr

Received July 1, 2008. Revised August 26, 2008. Accepted September 3, 2008.


    ABSTRACT
Minisatellites are DNA tandem repeats that are found in all sequenced genomes. In the yeast Saccharomyces cerevisiae, they are frequently encountered in genes encoding cell wall proteins. Minisatellites present in the completely sequenced genome of the pathogenic yeast Candida glabrata were similarly analyzed, and two new types of minisatellites were discovered: minisatellites that are composed of two different intermingled repeats (called compound minisatellites), and minisatellites containing unusually long repeated motifs (126–429 bp). These long repeat minisatellites may reach unusual length for such elements (up to 10 kb). Due to these peculiar properties, they have been named ‘megasatellites’. They are found essentially in genes involved in cell–cell adhesion, and could therefore be involved in the ability of this opportunistic pathogen to colonize the human host. In addition to megasatellites, found in large paralogous gene families, there are 93 minisatellites with simple shorter motifs, comparable to those found in S. cerevisiae. Most of the time, these minisatellites are not conserved between C. glabrata and S. cerevisiae, although their host genes are well conserved, raising the question of an active mechanism creating minisatellites de novo in hemiascomycetes.


    INTRODUCTION
As more and more eukaryotic genomes are sequenced, a wealth of new information on gene duplications, evolution of paralogous sets of genes, differential loss of genes and neo-functionalization of paralogues becomes available. With the largest number of species sequenced within a single phylum, hemiascomycetous yeasts stand out as a reference for comparative genomics (1). Minisatellites are a subclass of DNA tandem repeats that exhibit size polymorphism among different individuals or isolates (2). In previous work, it was found that minisatellites are spread throughout the Saccharomyces cerevisiae genome, and are preferentially encountered in genes encoding proteins involved in cell wall formation (3,4). Such proteins, including those belonging to the FLO family of flocculins, exhibit a variable number of repeats among different yeast strains. The role of such repeats is illustrated by the fact that strains having a larger number of repeats in the FLO1 gene exhibit better adhesion than those with a smaller number of repeats (5). Similarly, S. cerevisiae strains isolated from biofilms formed at the surface of sherry wines contain an increased number of repeats in one of the FLO11 minisatellites, supporting the importance of such sequences in cell adhesion (6).

We previously reported that S. cerevisiae minisatellites are frequently not conserved in the corresponding orthologous gene in other hemiascomycetes, suggesting that minisatellites are created, evolve and disappear at a faster pace than the genes containing them (3), a property shared by microsatellites, another class of DNA tandem repeats with shorter motifs (7). In order to investigate more thoroughly the mechanisms of creation and loss of minisatellites in a pathogenic hemiascomycete, we searched all such repeats in the genome of Candida glabrata, a human opportunistic pathogen, responsible for mucosal candidiasis, blood stream infections and vaginitis. Candida glabrata is the second cause of nosocomial infections due to yeasts, after C. albicans. Its genome was completely sequenced, and revealed its closer relationship to S. cerevisiae than to C. albicans (8), making comparisons easier. Despite similar genome sizes, we found three times as many minisatellites in C. glabrata as in S. cerevisiae. We also discovered two new species of minisatellites absent from the S. cerevisiae genome, including some unusually long minisatellites, composed of several kilobases of a tandemly repeated sequence. We propose to name them ‘megasatellites’, in order to distinguish them from more regular minisatellites. Megasatellites are present in genes whose sequences suggest that they are involved in cellular adhesion. Some of the peculiar DNA motifs encoded by megasatellites are also found in Kluyveromyces delphensis, but not in more distantly related yeast species (nor in any other living organism), suggesting that they are specific to this branch of the hemiascomycetes, and may be involved in creating new gene functions in these yeast species.


    MATERIALS AND METHODS
Analysis of the C. glabrata genome
The complete sequence of C. glabrata strain CBS138 was analyzed using the MREPS program (9), and the following parameters: minimal size of repeat unit (-minp) equal to 10, minimal repeat length (-minsize) equal to 30. Since the resolution parameter (allowing some degree of ‘fuzziness’ within the repeat) was set at the minimal value, variant repeats could not be detected. Therefore, repeats were individually examined and minisatellites manually extended 5' and 3' of the initial repeat detected by MREPS, as described previously (3).
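The search criteria described above (repeat unit of at least 10 bp, total array of at least 30 bp, no mismatch tolerated) can be illustrated with a minimal Python sketch. This is not the MREPS algorithm, only a naive exact-match scan written for illustration; the function name find_tandem_repeats, the max_period cap and the toy sequence are assumptions that do not come from the original work.

```python
def find_tandem_repeats(seq, min_period=10, max_period=50, min_size=30):
    """Naive scan for exact tandem repeats (illustration only, not MREPS).

    Returns (start, end, period) tuples, end exclusive, for every stretch
    that holds at least two exact copies of a unit of length `period`
    and spans at least `min_size` bases.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for period in range(min_period, max_period + 1):
        i = 0
        while i + period < n:
            # Extend while the sequence equals itself shifted by `period`.
            j = i
            while j + period < n and seq[j] == seq[j + period]:
                j += 1
            span = (j - i) + period  # matched stretch plus the leading unit
            if j - i >= period and span >= min_size:
                hits.append((i, i + span, period))
            i = j + 1  # restart just after the first mismatch
    return hits


# Hypothetical toy sequence: four copies of a 10-bp unit between two flanks.
toy = "TTTTTTTT" + "AGCTTAGGCA" * 4 + "CCCCCCCC"
print(find_tandem_repeats(toy, min_period=10, max_period=10))
```

With a wider period range, the same array is also reported at multiples of its unit length (20 bp here), and a single point mutation splits an array into separate hits. Both effects are consistent with the need, noted above, to inspect and merge the raw hits manually.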

In addition, some minisatellites, corresponding in fact to imperfect microsatellites (10), were detected by the program but not taken into account thereafter. Using this approach, MREPS detected 706 repeats fulfilling the required criteria. After careful examination, some of the repeats found by the program were partially overlapping or were part of the same minisatellite, resulting in a final number of 238 minisatellites used for the present analysis, including 145 detected in coding regions. Since several genes contain more than one repeat array, each of the 145 minisatellites was given a unique identifier, from MS#1 to MS#237. Compound minisatellites are also given a single identifier, followed by a letter for each motif of the minisatellite (e.g. MS#109a represents the 20 x 12 bp motif and MS#109b represents the 3 x 168 bp motif of compound MS#109).

Minisatellite size polymorphism was determined by standard PCR and Southern blot methods, using the CBS138 type strain, the laboratory BG2 strain (11), and two strains isolated from infected patients, F11017Blo1 and F15035Blo1, a kind gift of C. Hennequin (Muller,H. et al., manuscript in preparation).

Search for orthologues
The functional annotation of the C. glabrata genome, developed during the course of the Génolevures 2 project, was used [http://cbi.labri.fr/Genolevures; (8)]. Whenever several homologs were found in the S. cerevisiae genome, synteny data were used to discriminate among the possible genes. When synteny data were insufficient to discriminate between two or three possible homologs of a C. glabrata gene, all of them were indicated (seven instances, Table 2). Many C. glabrata genes exhibit sequence similarities to several S. cerevisiae genes belonging to the FLO/STA superfamily of flocculins. These similarities were always limited in size and not sufficient to identify the right ortholog. Synteny data did not allow discrimination among the possible homologues either. Sequence similarities to large DNA motifs in K. delphensis were searched with tblastx, using the motif itself (SHITT, SFFIT or TTITL) as a query, in a K. delphensis DNA database of 17 000 sequences (genome coverage 0.8x), provided by the Pasteur Genopole DNA sequencing facilities.

Amino acid composition and motif analysis
To determine the global composition of the 93 minisatellites with short motifs, all motifs were concatenated and calculation was performed using the DNA Strider 1.4f6 software (12). Long motifs were aligned using the ClustalW software on the BioWeb interface at the Pasteur Institute (http://bioweb.pasteur.fr/seqanal/interfaces/clustalw-simple.html). GC skews were calculated as (G–C)/(G+C) and AT skews as (A–T)/(A+T), using DNA Strider. Both GC content and GC skew of minisatellite-containing genes were calculated on the gene DNA sequence without the minisatellite. Search for known domains in cell wall proteins was performed using the InterProScan software (http://www.ebi.ac.uk/InterProScan). The long motifs (SHITT, SFFIT, TITTL and the three unknown motifs) were used as queries for a Blast search into the NCBI non-redundant GenBank CDS translations, PDB, Swissprot and PIR databases.
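The skew definitions above translate directly into code. The following Python sketch is only an illustration of the two ratios, not a reproduction of DNA Strider; the function names and the example sequences are hypothetical.

```python
def skew(seq, first, second):
    """Return (first - second) / (first + second) for two base counts."""
    seq = seq.upper()
    n_first, n_second = seq.count(first), seq.count(second)
    if n_first + n_second == 0:
        return 0.0  # skew is undefined without either base; report 0
    return (n_first - n_second) / (n_first + n_second)


def gc_skew(seq):
    """GC skew, (G - C) / (G + C), on the strand given."""
    return skew(seq, "G", "C")


def at_skew(seq):
    """AT skew, (A - T) / (A + T), on the strand given."""
    return skew(seq, "A", "T")


# Hypothetical cytosine-rich and adenine-rich stretches.
print(gc_skew("CCCCCCCCCCCCCCCCCGGG"))
print(at_skew("AAAAAAATTT"))
```

A negative GC skew means more cytosines than guanines on the strand analyzed, so the strand chosen (here, by the paper's convention, the coding strand) determines the sign.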


    RESULTS
We performed a genome-wide search for minisatellites in the C. glabrata genome, using the MREPS software (9), set to the same parameters used previously for the S. cerevisiae genome (3) (see Materials and methods section). Given that the two yeast genomes have similar sizes and nucleotide composition [12.1 Mb for S. cerevisiae, 12.3 Mb for C. glabrata; (8)], a similar number of minisatellites was expected. Instead, a total of 145 minisatellites in 112 protein-coding genes and 93 minisatellites in intergenic regions were found in C. glabrata, compared to 55 in genes and 11 in intergenic regions in S. cerevisiae. Minisatellites in C. glabrata show no obvious bias for specific chromosomal locations (Figure 1).


Figure 1. Distribution of minisatellites in the C. glabrata genome. Each chromosome is represented by a horizontal line, from the left to the right telomere. Vertical short lines represent the 109 minisatellite-containing genes and pseudogenes. Each gene starts with CAGL0 followed by the chromosome letter (A–M) then by the gene five-digit number and a final ‘g’ (38). Only the five-digit number is given here (e.g. 01284 on chromosome A stands for CAGL0A01284g). Note that some minisatellites may cumulate several properties, i.e. being a compound minisatellite with a long motif, in which case it is both underlined and colored. Size of the two rDNA arrays is not precisely known.

 
Minisatellites are, on average, more GC-rich than the genes containing them, but no obvious GC skew was noted, in contrast to S. cerevisiae where minisatellites show more cytosines than guanines on the coding strand. In intergenic regions, minisatellites are shorter and contain fewer repeat units than in coding regions, as in S. cerevisiae. All these data are summarized in Table 1.


Table 1. Comparative distributions of minisatellites in the S. cerevisiae and C. glabrata genomes

 
Unusual minisatellites scatter the C. glabrata genome
In addition to the presence of 93 ‘simple’ minisatellites similar in size and composition to those discovered in S. cerevisiae (Tables 2, 3, 4, 5, minisatellites numbered from #1 to #93), the C. glabrata genome contains two peculiar types of minisatellites. First, 15 minisatellites are made of two different motifs (or even three, in one case) intermingled with each other (Tables 3, 5, minisatellites numbered from #101 to #115). In each case, the two motifs have different sizes and are repeated a different number of times, with no regular period, as if two decks of playing cards were shuffled with each other (two examples are shown in Figure 2). In 10 cases out of these 15 ‘compound minisatellites’, the two motifs share a common sequence at their 5'-ends (L on Figure 2) but the 3'-ends (R on Figure 2) are different (5'- and 3'-ends are defined according to the coding DNA strand of the gene that contains the minisatellite) (Tables 3, 5).
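The organization of a compound minisatellite, two motifs sharing a common 5' part (L) and differing in their 3' part (R), can be made concrete with a short Python sketch that decomposes an array into its succession of motifs. The sequences below are invented for illustration and are not taken from any C. glabrata gene; the function name parse_compound and the greedy exact-match strategy are likewise assumptions, and real arrays with point mutations would need a tolerant alignment instead.

```python
def parse_compound(array, motifs):
    """Greedy left-to-right decomposition of `array` into named motifs.

    `motifs` maps a label to its exact sequence. At each position the
    longest motif that matches is consumed. Returns the list of labels,
    or None when some position matches no motif.
    """
    by_length = sorted(motifs.items(), key=lambda kv: len(kv[1]), reverse=True)
    pos, labels = 0, []
    while pos < len(array):
        for label, motif in by_length:
            if array.startswith(motif, pos):
                labels.append(label)
                pos += len(motif)
                break
        else:
            return None  # no motif matches at this position
    return labels


# Hypothetical motifs: same L region, different R regions (9 bp and 15 bp).
L_REGION = "ACCGAT"
MOTIFS = {"short": L_REGION + "GGT", "long": L_REGION + "TTCAAGGCA"}
ORDER = ["short", "short", "long", "short", "long", "long"]
array = "".join(MOTIFS[name] for name in ORDER)
print(parse_compound(array, MOTIFS))
```

The irregular succession of labels returned for such an array is what the playing-card comparison above describes.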


Figure 2. Two examples of compound minisatellites. Minisatellites are shown by color boxes, short motifs in yellow, long motifs in blue. Gray boxes represent partial 5' and 3' parts of gene coding sequences, along with gene names for which the first five characters have been omitted (see legend to Figure 1). Short motifs have been numbered from 1 to 10, motif 1 is used as the reference, and point mutations are shaded. Long motifs have been lettered from A to E, motif A is used as the reference, and point mutations are shaded. The 5' part of each motif (L region) is common to both short and long motifs, whereas the 3' part (R region) is different between short and long motifs. Duplicated blocks are roman numbered under each minisatellite. Note that for MS#111, the large duplicated block in the middle of the minisatellite contains several shorter internal duplications.

 

Table 2. Simple minisatellites in C. glabrata genes

 

Table 3. Compound minisatellites in C. glabrata genes

 
The second type of peculiar minisatellites consists of those composed of unusually long motifs (from 126 to 429 bp) repeated from 3 to 32 times. Thirty-seven such minisatellites with long motifs were detected (numbered from #201 to #237), in addition to seven being part of a compound minisatellite, for a total of 44 minisatellites with long motifs. Five of them reach a total length >2 kb, the longest being 9.6 kb long (MS#214 in gene CAGL0I10147g), a length that no S. cerevisiae minisatellite reaches. Given the unusual size of the repeated motif, these tandem repeats were named ‘megasatellites’ (Tables 3, 4). Both compound minisatellites and megasatellites are exclusively found in coding regions and their motif length is always a multiple of three, raising the intriguing question of their formation in the genes containing them. Note that 12 out of 144 minisatellites are found in pseudogenes (Table 5), a much higher proportion than expected from random distribution, the C. glabrata genome containing ~1% of pseudogenes, compared to active genes (I. Lafontaine and B. D., personal communication).
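A motif length that is a multiple of three means that adding or removing whole repeat units never shifts the reading frame of the host gene. The small Python sketch below only illustrates this arithmetic; the function name motif_frame_info is hypothetical, and the motif sizes used are those quoted in the text (the two motifs of compound MS#109 and the two extreme megasatellite motif sizes).

```python
def motif_frame_info(motif_bp, copies):
    """Summarize whether a repeat unit keeps the reading frame.

    Returns whether the motif length is a multiple of three, the number
    of amino acids encoded per unit (None if out of frame), and the
    total array length in bp.
    """
    in_frame = motif_bp % 3 == 0
    return {
        "in_frame": in_frame,
        "aa_per_motif": motif_bp // 3 if in_frame else None,
        "array_bp": motif_bp * copies,
    }


# MS#109a (20 x 12 bp) and MS#109b (3 x 168 bp), as given in the Methods.
print(motif_frame_info(12, 20))
print(motif_frame_info(168, 3))

# The extreme motif sizes reported for megasatellites also fit the rule.
print(all(size % 3 == 0 for size in (126, 429)))
```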

In order to estimate the degree of polymorphism found in such large arrays, we analyzed megasatellite sizes, by Southern blot hybridization of DNA extracted from three C. glabrata strains, isolated from infected patients (Muller,H. et al., manuscript in preparation), and compared them to the same megasatellites in the sequenced strain (CBS138), used as a reference. For two megasatellites out of the three tested, we found polymorphism in at least one of the strains tested (data not shown). One strain shows a large increase in the MS#213 megasatellite size, corresponding to 7–8 additional 135-bp repeat units within gene CAGL0I07293g. In addition, gene CAGL0J05170g shows size increase in this strain, whereas in another strain it exhibits a size decrease. This gene contains three different megasatellites (MS#202, 203 and 109), and we did not determine which one is polymorphic (more than one may exhibit polymorphism). The last minisatellite tested (MS#224) did not show any clear polymorphism among the three strains tested as compared to the reference strain (data not shown).
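Converting a band size difference observed on a Southern blot into a number of repeat units is a simple division by the motif size, bracketed to whole units because gel resolution rarely gives an exact value. The Python sketch below assumes a hypothetical size difference of about 1000 bp for a 135-bp motif; the actual band sizes were not reported, and the function name is illustrative.

```python
import math


def repeat_unit_change(size_diff_bp, motif_bp):
    """Bracket the number of repeat units gained (or lost, if negative).

    Returns (lower, upper) whole-unit bounds for size_diff_bp / motif_bp.
    """
    units = size_diff_bp / motif_bp
    return math.floor(units), math.ceil(units)


# Hypothetical ~1000 bp increase for a megasatellite with a 135-bp motif.
print(repeat_unit_change(1000, 135))
```

Under this assumed size difference the estimate falls between seven and eight units, the same kind of bracket as the one quoted for MS#213.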

In addition, we compared the size of minisatellites found in the EPA gene family, between the CBS138 reference strain and the BG2 strain. The EPA gene family is composed of at least 15 members in the BG2 strain. These genes encode surface glycoproteins involved in cell–cell adhesion and pathogenicity. Eight of them (EPA1 to EPA8) were sequenced in the BG2 strain (13,14). Among them, EPA7 and EPA8 do not contain minisatellites, the six other members containing simple minisatellites, as well as compound minisatellites and megasatellites. EPA4 and EPA5 were not found in the CBS138 strain, and EPA6 does not contain any minisatellite in this strain. We, therefore, focused on the three remaining members (EPA1 to EPA3), which contain minisatellites both in the CBS138 and in the BG2 strains. As shown in Figure 3, five out of six minisatellites found within these three genes exhibit polymorphism between the two strains sequenced. An additional megasatellite was detected in the EPA3 gene, in the BG2 strain, that was absent from the CBS138 strain. This suggests that this megasatellite was inserted or deleted since the separation of the two strains. Note that outside of the regions containing tandem repeats, the EPA3 genes in both strains show 99.8% identity at the nucleotide level. In a more specific analysis, Frieman and colleagues (15) showed that the number of repeat units of the 120 bp minisatellite found within the EPA1 gene varied from three to six (four repeat units are found in the CBS138 and in the BG2 strains), among a panel of 25 clinical isolates of C. glabrata. Additional experiments using PCR and Southern blot analyses to determine the size of minisatellites in EPA genes confirmed that they exhibit size polymorphism among four different strains of C. glabrata (Muller,H. et al., manuscript in preparation). We concluded that, like microsatellites and minisatellites in S. cerevisiae, several minisatellites exhibit size polymorphism in C. glabrata.


Figure 3. Comparison of minisatellites in EPA genes in two different C. glabrata strains. (A) Schematic representation of the EPA1, EPA2 and EPA3 genes, located on the right subtelomeric region of chromosome VI. Note that gene order and organization are identical in both the BG2 and the CBS138 strains. (B) Minisatellites in the three EPA genes show size polymorphism. DNA self-matrix of EPA1, EPA2 and EPA3 are shown for each of the two strains studied (BG2 and CBS138). Gene names are indicated in the right upper corner of each matrix. Number and size of each motif are shown next to each minisatellite. Note the additional compound minisatellite in EPA3 in the BG2 strain. The smaller repeats (2 x 24 bp and 2 x 15 bp), not detected in the CBS138 strain due to the parameters chosen for the program (see Materials and methods section), are slightly expanded in the BG2 strain.

 
Proteins encoded by megasatellite-containing genes
There are 44 megasatellites, carried by 33 different genes. Sixteen out of these 44 megasatellites share a common motif, that was called the SFFIT motif, conserved in all cases except two in which it is slightly degenerated (MS#203 and MS#110, Tables 3, 4, 5 and Figure 4). This 100 amino-acid SFFIT motif is conserved in 37 proteins in Kluyveromyces delphensis, a hemiascomycetous yeast closely related to C. glabrata. In these proteins, it is tandemly repeated, as in C. glabrata. This protein motif is threonine rich (20%), but also contains numerous serine (9%) and proline (8%) residues.


Figure 4. Alignments of megasatellite motifs. The first motif of each megasatellite was aligned using ClustalW. The signature motif in each family (SFFIT, SHITT, TTITL) is shown in a light gray box to the left. The megasatellite number is indicated left to the sequence (MS#), followed by the number of repeat motifs within the megasatellite (in parentheses). The central part of the SHITT motif in which insertions occur (see text) is indicated by a dark gray box.

 

Table 4. Megasatellites in C. glabrata genes

 

Table 5. Minisatellites in C. glabrata pseudogenes

 
We identified a second long motif, contained in 16 megasatellites, that was called the SHITT motif. It is 45 amino acids long, rich in threonine (28%), serine (16%) and valine (14%) residues. In one megasatellite (MS#212), it is slightly degenerated, and in another one (MS#109b) it contains a small insertion of nine amino acids (Figure 4). Two variant forms of the SHITT motif are also found, one corresponds to an insertion of 60 amino acids in the middle of the motif and the other one to an insertion of 30 amino acids at the same position. The first one was called SHITT-V (three occurrences: MS#114b, #227 and #229), and is rich in threonine (20%) and valine (16%) residues, and the second one was called SHITT-G (three occurrences: MS#105b, #108b and #112c; Figure 4), and is rich in glycines (48%) and serines (23%). Seven occurrences of the SHITT motif were detected in K. delphensis, tandemly repeated in proteins, but no other match was found in protein databases, indicating that this motif, like the SFFIT motif, is specific to species closely related to C. glabrata.

Another long motif was found in two megasatellites (MS#201 and #204). It was called the TITTL motif, and is rich in threonine (26%), serine (16%) and aspartic acid, glycine and proline residues (9% each). It is found in two genes (EPA2 and CAGL0K00170g) in the completely sequenced CBS138 strain of C. glabrata, and in addition, in two other genes (EPA4 and EPA5), in the BG2 strain of C. glabrata (13,14). All these proteins are involved in cell–cell adhesion. Finally, three megasatellites contain a unique motif in C. glabrata (MS#210, #213, #220), one of them being homologous to a motif found in five proteins in K. delphensis, tandemly repeated from two to eight times (MS#213).

Proteins containing the SFFIT or any of the three SHITT motifs often show weak matches with the flocculin family of proteins in S. cerevisiae [FLO/STA superfamily; (16)], involved in flocculation and cell adhesion, but this similarity is restricted to the serine-rich repeated region. We have subsequently used the InterProScan software (see Materials and methods section) to look for possible conserved domains within genes that contain megasatellites. Several of the corresponding proteins contain putative transmembrane spans (TM in Tables 3, 4, 5), the signature of membrane-anchored proteins (17). Another frequent domain encountered is the PA14 domain found in bacterial toxins, glucosidases and adhesins (18), and shown to be involved in carbohydrate binding, both in C. glabrata (19) and in S. cerevisiae (20), making proteins containing this domain good candidates to interact with membrane glycoproteins. In S. cerevisiae, the PA14 domain is found in four proteins encoded by minisatellite-containing genes, belonging to the family of flocculins, directly involved in cell–cell adhesion: FLO1, FLO5, FLO9 and FLO10. The PRich domain is found in highly glycosylated proline-rich cell wall proteins in plants, and is probably involved in interactions with cell wall carbohydrates (21). It is present in two proteins encoded by genes that contain compound minisatellites (CAGL0E06666g/EPA2 and CAGL0I10098g; Table 3). EPA2 encodes an adhesin responsible for cell–cell adhesion in C. glabrata (13,14). Altogether, 10 proteins encoded by megasatellite-containing genes, out of 33, show the signature of membrane proteins, many of them probably involved in interactions with glycoproteins.

InterProScan was also used to find structural domains in proteins that are not encoded by megasatellite-containing genes. Out of 79 such genes (Table 2), 12 contain at least three transmembrane spans and are therefore good candidates to be membrane proteins. Altogether, ~30–40% of minisatellite-containing genes are suspected to encode either cell wall components or proteins involved in cell wall formation, a much higher figure than expected if minisatellites were randomly distributed among C. glabrata genes.

SHITT motifs are well conserved among the different minisatellites in the N- and C-terminal parts of the motif, but insertions are found in the central region (Figure 4). The SHITT-G and SHITT-V motifs correspond to insertions of 90 and 180 nt (30 and 60 amino acids), respectively. At the DNA level, SHITT motifs are split into two parts, the sequence corresponding to the N-terminal region is very rich in cytosines (GC skew: –0.7) and adenosines (AT skew: +0.4), whereas the sequence encoding the C-terminal part does not show such biases. The central region, in which minisatellite insertions occur (MS#105b, 108b, 112c, 114b, 227 and 229), is also rich in cytosines and adenosines. This observation suggests that negative GC skews (and to a lesser extent positive AT skews) are a determinant favoring the insertion of new DNA sequences, a conclusion that was also reached for S. cerevisiae minisatellites (3). In comparison, the SFFIT motif does not exhibit any particular sequence bias, whereas the TTITL motif is almost as skewed as the SHITT motif (GC skew: –0.5, AT skew: +0.3).

The remaining 93 minisatellites contain shorter motifs (up to 120 nt), that do not belong to any of the families described above. The global amino acid composition of proteins encoded by these 93 minisatellites is given in Table 7. The most common amino acid found in such repeats is serine, followed by glycine, proline and asparagine. This is quite different from motif composition of S. cerevisiae minisatellites, in which serine and threonine residues are the most frequent amino acids encountered, as in the C. glabrata SFFIT and SHITT megasatellites. In S. cerevisiae, serine- and threonine-rich repeats are thought to be the sites of O-glycosylations of cell wall proteins by the Pmt4 protein (22,23). It is therefore possible that in C. glabrata, proteins containing long-motif minisatellites are targets of similar posttranslational modifications and play a role at the cell wall surface, whereas short-motif minisatellites are involved in a variety of other cellular processes.
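Amino acid compositions such as those quoted above are plain residue frequencies over the translated motifs. The Python sketch below shows the calculation on an invented 10-residue peptide; it is not the DNA Strider procedure used in the study, and the function name and example sequence are hypothetical.

```python
from collections import Counter


def aa_composition(protein):
    """Return {residue: percentage}, most frequent residues first."""
    protein = protein.upper()
    total = len(protein)
    if total == 0:
        return {}
    counts = Counter(protein)
    return {aa: 100 * n / total for aa, n in counts.most_common()}


# Hypothetical threonine- and serine-rich peptide (one-letter code).
for residue, percent in aa_composition("TTSTTSTPVT").items():
    print(residue, percent)
```

For several motifs, concatenating the translated sequences before calling the function gives a global composition, in line with the concatenation step described in the Methods.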


    DISCUSSION
In the present work, we analyzed the distribution and composition of all minisatellites detected in the genome of the pathogenic yeast C. glabrata. Although similar in size to that of S. cerevisiae, the genome of C. glabrata exhibits a much larger number of minisatellites. The human genome was estimated to contain approximately 6000 minisatellites (~2 minisatellites/Mb of sequences), whereas 6 and 7 minisatellites/Mb were found in Arabidopsis thaliana and in Caenorhabditis elegans, respectively (2,24). Similar figures were found in S. cerevisiae [9 minisatellites/Mb; (3)], but a larger number of minisatellites was found in the present study of the C. glabrata genome (19 minisatellites/Mb). Of particular interest are two new types of minisatellites absent in S. cerevisiae: compound minisatellites, containing two different intermingled motifs, and megasatellites with long motifs (126–429 bp), that can be tandemly repeated up to 32 times. The latter are often encountered in genes whose products show signatures of cell wall proteins (Tables 3, 4, 5).
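The density quoted for C. glabrata can be recovered from the counts given in the Results (145 minisatellites in genes, 93 in intergenic regions, 12.3 Mb genome). The Python sketch below only restates that arithmetic; the function name is illustrative.

```python
def density_per_mb(n_repeats, genome_mb):
    """Number of repeats per megabase of genome sequence."""
    return n_repeats / genome_mb


# C. glabrata counts from the Results: 145 in genes + 93 intergenic, 12.3 Mb.
cg_density = density_per_mb(145 + 93, 12.3)
print(round(cg_density, 1))
```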

In contrast to microsatellites, which have been the subject of numerous studies in all sequenced organisms, there are very few reports in the literature on minisatellite distribution in eukaryotic genomes. The genome of Tetraodon nigroviridis, extensively examined in search of such elements (25), revealed that minisatellites cover only 0.41% of the total sequence, compared to 0.7% in C. glabrata. In T. nigroviridis, minisatellites are mainly located in two regions: a subtelocentric minisatellite (10 bp highly polymorphic motif) hybridizing on the short arm of 10 out of 11 subtelocentric chromosomes and a minisatellite with a 118 bp repeated motif, found at all centromeres. Except for these two minisatellites, found in very large arrays in the tetraodon genome, no minisatellite with a repeat motif size >200 bp was detected, nor any kind of tandem repeat resembling compound minisatellites or megasatellites.

Possible origin of C. glabrata minisatellites
One intriguing question is the origin of the numerous C. glabrata minisatellites. Are they de novo created, or are they propagated when the genes that contain them are duplicated? We classified minisatellites into families, based on their motif length and sequence. In total, 109 different motifs are found in 117 simple minisatellites (Table 6, top), showing that, most of the time, each motif is unique. Therefore, minisatellites do not propagate by duplicating a minisatellite-containing gene, but are probably de novo created in existing genes.
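Counting how many distinct motifs occur among a set of minisatellites amounts to grouping them by motif sequence. The Python sketch below illustrates one way to do this; treating rotations of the same unit as identical is an assumption made here (a tandem array has no intrinsic phase), not a criterion stated by the authors, and all names and sequences are hypothetical.

```python
from collections import defaultdict


def canonical_rotation(motif):
    """Lexicographically smallest rotation, used as a phase-free key."""
    motif = motif.upper()
    if not motif:
        return ""
    return min(motif[i:] + motif[:i] for i in range(len(motif)))


def group_by_motif(minisatellites):
    """Map each canonical motif to the minisatellites that carry it."""
    families = defaultdict(list)
    for name, motif in minisatellites.items():
        families[canonical_rotation(motif)].append(name)
    return dict(families)


# Hypothetical motifs: "B" is "A" read in a different phase, "C" is unrelated.
example = {"A": "ACGTTGCAAG", "B": "TTGCAAGACG", "C": "GGATCCTTAA"}
families = group_by_motif(example)
print(len(families), "distinct motifs among", len(example), "minisatellites")
```

When the number of groups is close to the number of minisatellites, as reported above, most motifs are unique.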


Table 6. Size distribution of C. glabrata minisatellites and megasatellites

 

Table 7. Amino acids encoded by minisatellites and megasatellites

 
Megasatellites can be classified into defined families, even though their motif size exhibits some variation (Table 6, bottom). We compared ten genes containing SFFIT megasatellites with each other, and found that only two of them (CAGL0L00157g and CAGL0E00231g) are similar in their 3'-end (35% identity at the nucleotide level), and are therefore most probably paralogues. The remaining genes do not show any significant similarity (besides the SFFIT motif itself), suggesting that these megasatellites are also, most of the time, de novo created in genes.

It was previously proposed that minisatellites result from replication slippage between two short DNA sequences located downstream and upstream of a central element (26). Almost all S. cerevisiae minisatellites exhibit such short repeated DNA sequences, consistent with this model (3). However, in C. glabrata, only half of the simple minisatellites show such short repeats, upstream and downstream of the minisatellite. When present, their mean size is 5 ± 0.8 nt, very similar to what was observed in S. cerevisiae (5 ± 0.4 nt). The absence of such repeats in so many minisatellites in C. glabrata suggests that an additional mechanism may exist to create minisatellites in C. glabrata, or that these short repeats were subsequently erased by mutational decay in this yeast species.

Evolution of C. glabrata minisatellites
In the present study, only 15 of the 65 (23%) S. cerevisiae homologs of the C. glabrata genes containing simple minisatellites also contain a minisatellite in S. cerevisiae (Table 2). It was previously reported (3) that out of 24 minisatellite-containing S. cerevisiae genes, only six (25%) also contain a minisatellite in C. glabrata, a proportion similar to that found in the present study. Hence, minisatellites evolve faster than the genes containing them. It is interesting to note that among the 53 S. cerevisiae homologs that do not contain a minisatellite (Table 2), only six encode products that probably play a role in cell wall metabolism (YLR194c, OSW2, SAG1, DSE1, BNI1 and KRE1). The others exhibit various functions, in all the known cellular compartments. This suggests that minisatellites in C. glabrata are found in a much wider variety of genes than in S. cerevisiae, in which they are mostly found in cell wall genes (3). This could be due to a higher flexibility of the C. glabrata genes to accommodate such tandem repeats, and underlines the fact that minisatellites may have a function in other genes besides cell wall genes.

The insertion of internal motifs into the SHITT motif itself (Figure 4) can be explained by two models that are not mutually exclusive (Figure 5). In the first, a pre-existing gene already containing a minisatellite is modified by the insertion of a second motif into one of the existing motifs; the new motif is subsequently either lost or propagated by intra-allelic gene conversion or replication slippage (Figure 5A) [for an in-depth review on gene conversion, see ref. (27)]. The alternative hypothesis supposes that a gene contains a nonrepeated motif. A new amino acid motif is inserted into it, and is subsequently amplified to give rise to a minisatellite (Figure 5B). Note that both hypotheses postulate that a short DNA sequence has the propensity to ‘jump’ into another DNA sequence, a property reminiscent of transposable elements (2,28). The same model may be used to explain the presence of the compound minisatellites, which show an irregular alternation of two different motifs (Figure 2) and would result from intermediate steps before complete homogenization of the minisatellite (Figure 5A, bottom).


 
Figure 5. Insertion of a new motif within a minisatellite: two possible models. The motif may target a pre-existing minisatellite, and subsequently spread by intragenic gene conversion (A). Alternatively, the same motif may target a gene that does not contain a minisatellite, and afterwards be expanded into a minisatellite (B). Note that the two models are not mutually exclusive, but only model A may lead to compound minisatellites.

 
In S. cerevisiae, the CUP1 locus is amplified under selection pressure, by intra-allelic gene conversion and unequal crossing-over between tandem repeats of the CUP1 gene (29). Similarly, human minisatellites CEB1 and MS32 show high levels of inter- and intra-allelic gene conversions during meiosis and mitosis in S. cerevisiae, leading to complex reshuffling of repeat order and composition (30–32). Such mechanisms are also operating on human minisatellites, both during meiosis (33) and mitosis (34), and are probably also active in C. glabrata. Given that its genome contains significantly more unusual minisatellites than the S. cerevisiae genome, one can hypothesize that replication and/or recombination machineries have slightly different properties in each yeast species. In silico comparisons of the gene content of several hemiascomycetous yeasts showed that both replication and recombination machineries are very well conserved between C. glabrata and S. cerevisiae, exhibiting very few differences (35). However, the few differences found (like the presence of two copies of the TOP1 gene and an extra truncated copy of the SGS1 helicase in C. glabrata), might point to some specific properties of replication and/or recombination of the C. glabrata genome, that may explain the numerous peculiar minisatellites found there.

In a very recent analysis, Muller and colleagues (Muller,H. et al., manuscript in preparation) showed that two deletions in two C. glabrata strains isolated from infected patients (F11017Blo1 and F15035Blo1) were located in close proximity to three megasatellites (MS#228/229 and MS#214, the largest megasatellite in the genome), suggesting that megasatellites may behave as fragile sites. Fragile sites are natural sites of chromosomal breakage in humans (36) and in yeast (37). It is therefore possible that, owing to the large repeated nature of megasatellites, spontaneous breakage occurs during DNA replication at or near the megasatellite, giving rise to deletions around it (2).


    FUNDING
 
Agence Nationale de la Recherche (ANR-05-BLAN-0331). Funding for open access charge: Agence Nationale de la Recherche.

Conflict of interest statement. None declared.


    ACKNOWLEDGEMENTS
 
We thank our colleagues of the Unité de Génétique Moléculaire des Levures for many fruitful discussions and C. Fairhead for careful reading of the article. We also thank the Génolevures consortium, particularly Tiphaine Martin for expert assistance with the Génolevures database. B.D. is a member of the Institut Universitaire de France.


    REFERENCES
 

  1. Dujon B. Yeasts illustrate the molecular mechanisms of eukaryotic genome evolution. Trends Genet. (2006) 22:375–387.

  2. Richard G-F, Kerrest A, Dujon B. Comparative genomics and molecular dynamics of DNA repeats in eukaryotes. Microbiol. Mol. Biol. Rev. (2008) In press.

  3. Richard G-F, Dujon B. Molecular evolution of minisatellites in hemiascomycetous yeasts. Mol. Biol. Evol. (2006) 23:189–202.

  4. Bowen S, Roberts C, Wheals AE. Patterns of polymorphism and divergence in stress-related yeast proteins. Yeast (2005) 22:659–668.

  5. Verstrepen KJ, Jansen A, Lewitter F, Fink GR. Intragenic tandem repeats generate functional variability. Nat. Genet. (2005) 37:986–990.

  6. Fidalgo M, Barrales RR, Ibeas JI, Jimenez J. Adaptive evolution by mutations in the FLO11 gene. Proc. Natl Acad. Sci. USA (2006) 103:11228–11233.

  7. Malpertuy A, Dujon B, Richard G-F. Analysis of microsatellites in 13 hemiascomycetous yeast species: mechanisms involved in genome dynamics. J. Mol. Evol. (2003) 56:730–741.

  8. Dujon B, Sherman D, Fischer G, Durrens P, Casaregola S, Lafontaine I, De Montigny J, Marck C, Neuveglise C, Talla E, et al. Genome evolution in yeasts. Nature (2004) 430:35–44.

  9. Kolpakov R, Bana G, Kucherov G. mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res. (2003) 31:3672–3678.

  10. Richard G-F, Dujon B. Distribution and variability of trinucleotide repeats in the genome of the yeast Saccharomyces cerevisiae. Gene (1996) 174:165–174.

  11. Cormack BP, Falkow S. Efficient homologous and illegitimate recombination in the opportunistic yeast pathogen Candida glabrata. Genetics (1999) 151:979–987.

  12. Marck C. ‘DNA Strider’: a ‘C’ program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers. Nucleic Acids Res. (1988) 16:1829–1836.

  13. Castano I, Pan S-J, Zupancic M, Hennequin C, Dujon B, Cormack BP. Telomere length control and transcriptional regulation of subtelomeric adhesins in Candida glabrata. Mol. Microbiol. (2005) 55:1246–1258.

  14. De Las Penas A, Pan S-J, Castano I, Alder J, Cregg R, Cormack BP. Virulence-related surface glycoproteins in the yeast pathogen Candida glabrata are encoded in subtelomeric clusters and subject to RAP1- and SIR-dependent transcriptional silencing. Genes Dev. (2003) 17:2245–2258.

  15. Frieman MB, McCaffery JM, Cormack BP. Modular domain structure in the Candida glabrata adhesin Epa1p, a β1,6 glucan-cross-linked cell wall protein. Mol. Microbiol. (2002) 46:479–492.

  16. Fabre E, Muller H, Therizols P, Lafontaine I, Dujon B, Fairhead C. Comparative genomics in hemiascomycete yeasts: evolution of sex, silencing, and subtelomeres. Mol. Biol. Evol. (2005) 22:856–873.

  17. De Hertogh B, Hancy F, Goffeau A, Baret PV. Emergence of species-specific transporters during evolution of the hemiascomycete phylum. Genetics (2006) 172:771–781.

  18. Rigden D, Mello LV, Galperin MY. The PA14 domain, a conserved all-β domain in bacterial toxins, enzymes, adhesins and signaling molecules. Trends Biochem. Sci. (2004) 29:335–339.

  19. Zupancic M, Frieman MB, Smith D, Alvarez RA, Cummings RD, Cormack BP. Glycan microarray analysis of Candida glabrata adhesin ligand specificity. Mol. Microbiol. (2008) 68:547–559.

  20. Kobayashi O, Hayashi N, Kuroki R, Sone H. Region of Flo1 proteins responsible for sugar recognition. J. Bacteriol. (1998) 180:6503–6510.

  21. Stiefel V, Pérez-Grau L, Albericio F, Giralt E, Ruiz-Avila L, Ludevid MD, Puigdomènech P. Molecular cloning of cDNAs encoding a putative cell wall protein from Zea mays and immunological identification of related polypeptides. Plant Mol. Biol. (1988) 11:483–493.

  22. Ecker M, Mrsa V, Hagen I, Deutzmann R, Strahl S, Tanner W. O-mannosylation precedes and potentially controls the N-glycosylation of a yeast cell wall glycoprotein. EMBO Rep. (2003) 4:628–632.

  23. Latgé J-P, Calderone R, Esser K, Fischer R. The Mycota XIII (2005) Springer, Berlin.

  24. Vergnaud G, Denoeud F. Minisatellites: mutability and genome architecture. Genome Res. (2000) 10:899–907.

  25. Roest Crollius H, Jaillon O, Dasilva C, Ozouf-Costaz C, Fizames C, Fischer C, Bouneau L, Billault A, Quetier F, Saurin W, et al. Characterization and repeat analysis of the compact genome of the freshwater pufferfish Tetraodon nigroviridis. Genome Res. (2000) 10:939–949.

  26. Haber JE, Louis EJ. Minisatellite origins in yeast and humans. Genomics (1998) 48:132–135.

  27. Pâques F, Haber JE. Multiple pathways of recombination induced by double-strand breaks in Saccharomyces cerevisiae. Microbiol. Mol. Biol. Rev. (1999) 63:349–404.

  28. Wicker T, Sabot F, Hua-Van A, Bennetzen JL, Capy P, Chalhoub B, Flavell A, Leroy P, Morgante M, Panaud O, et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. (2007) 8:973–982.

  29. Welch JW, Maloney DH, Fogel S. Unequal crossing-over and gene conversion at the amplified CUP1 locus of yeast. Mol. Gen. Genet. (1990) 222:304–310.

  30. Appelgren H, Cederberg H, Rannug U. Mutations at the human minisatellite MS32 integrated in yeast occur with high frequency in meiosis and involve complex recombination events. Mol. Gen. Genet. (1997) 256:7–17.

  31. Debrauwère H, Buard J, Tessier J, Aubert D, Vergnaud G, Nicolas A. Meiotic instability of human minisatellite CEB1 in yeast requires double-strand breaks. Nat. Genet. (1999) 23:367–371.

  32. Lopes J, Ribeyre C, Nicolas A. Complex minisatellite rearrangements generated in the total or partial absence of Rad27/hFEN1 activity occur in a single generation and are Rad51 and Rad52 dependent. Mol. Cell Biol. (2006) 26:6675–6689.

  33. Jeffreys AJ, Tamaki K, McLeod A, Monckton DG, Neil DL, Armour JAL. Complex gene conversion events in germline mutation at human minisatellites. Nat. Genet. (1994) 6:136–145.

  34. Jeffreys AJ, Neumann R. Somatic mutation processes at a human minisatellite. Hum. Mol. Genet. (1997) 6:129–136.

  35. Richard G-F, Kerrest A, Lafontaine I, Dujon B. Comparative genomics of hemiascomycete yeasts: genes involved in DNA replication, repair, and recombination. Mol. Biol. Evol. (2005) 22:1011–1023.

  36. Debacker K, Kooy RF. Fragile sites and human disease. Hum. Mol. Genet. (2007) 16(Spec No. 2):R150–R158.

  37. Zhang H, Freudenreich CH. An AT-rich sequence in human common fragile site FRA16D causes fork stalling and chromosome breakage in S. cerevisiae. Mol. Cell (2007) 27:367–379.

  38. Durrens P, Sherman DJ. A systematic nomenclature of chromosomal elements for hemiascomycete yeasts. Yeast (2005) 22:337–342.

