DDBJ/EMBL/GenBank accession no. D85594
ABSTRACT
The telomeres of the silkworm, Bombyx mori, consist of pentanucleotide repeats (TTAGG)n. We previously characterized the non-LTR element TRAS1, which terminates with oligo (A) in a head to tail orientation at the exact position (between A and C) of the (CCTAA)n repeats. Here we characterized another family of telomere-specific non-LTR retrotransposon named SART1. The SART1 family was inserted at another site of the (TTAGG)n in a reverse orientation from that of TRAS1. The complete unit of SART1, 6.7 kb in length with a poly (A) stretch, contains two open reading frames encoding putative gag and pol products, overlapping by 54 bp in the -1 reading frame. Most of the 600 SART1 copies in the silkworm haploid genome are completely conserved in structure without 5' truncation. All SART1 sequences analyzed were inserted at the same position (between T and A) within the (TTAGG)n repeats. Fluorescence in situ hybridization showed that many of the SART1 copies were localized in the chromosomal ends. A phylogenetic tree showed that the SART1, TRAS1 and two other site-specific elements, R1 and RT, which insert into 28S ribosomal RNA genes in insects, belong to the same group. Based on the orientation for the chromosomal insertion and structural similarities, these elements could be further classified into two subgroups, R1/TRAS1 and RT/SART1, suggesting that the target specificity of the two telomere-associated elements was changed independently.
Retrotransposons are mobile DNA elements that transpose within the genome via RNA intermediates, using a reverse transcriptase that they themselves encode. These elements have been classified into two groups according to whether they contain long terminal repeats at both ends (LTR type or retrovirus-like elements) or not (non-LTR type or LINE-like elements) (1-3). Although many retrotransposable elements are located nearly at random in the genome, several non-LTR retrotransposons are located at specific sites on the chromosome. To date, site-specific elements have been found in several organisms, including yeast, insects, nematodes, amphibians and protists. Tx1 of Xenopus laevis inserts within another transposable element (4). CRE1, SLACS and CZAR are found in the spliced leader exons of trypanosomes (5-7). R1 and R2, which are located at specific sites of 28S rDNA in most insects, have been investigated in detail (8-10). R4 (Ascaris lumbricoides) inserts at a site midway between those of R1 and R2 (11). In sibling mosquito species, RT1 (Anopheles gambiae) and RT2 (A.arabiensis) are inserted at the same position ~630 bp downstream of the R1 insertion site (12,13). The R2 element was shown to owe its site specificity to the sequence-specific endonuclease that it putatively encodes. The translation product of the R2Bm open reading frame (ORF) from Bombyx mori cuts the 28S gene at the correct insertion site and uses that site as primer for its reverse transcription (14,15). Moreover, some group II introns, which have a close evolutionary relationship to non-LTR retrotransposons, encode reverse transcriptases that make a double strand break on the target sequences and promote site-specific insertion (16,17). However, a general mechanism underlying the expression and integration of the site-specific non-LTR retrotransposons remains unclear.
The extreme ends of the silkworm chromosomes are composed of the simple telomeric repeat (TTAGG)n, which is >6-8 kb in length (18). When studying the telomeric structure of the silkworm, we found >2000 copies of retrotransposable elements, which may be classified into several different classes, inserted into the telomeric short repeats (TTAGG)n (Okazaki et al., in preparation). We showed that a family of non-LTR retrotransposons, TRAS1 (telomeric repeat associated sequence 1), interrupts the (TTAGG)n repeats at a highly specific site (18,19). On the other hand, retrotransposons of the principal telomere-associated family are inserted at another site of (TTAGG)n in a reverse orientation to the TRAS1 insertion. Thus, this new family of retrotransposons will be transcribed in the reverse direction to TRAS1. We called this family SART1, a name derived from the inversion of 'TRAS'. To try to understand the factors and structure involved in target specificity, we characterized SART1, because it seems to be a novel telomeric repeat-specific retrotransposon with a different target sequence from TRAS1. We studied the structural features and genomic organization of SART1 and discuss the evolutionary origin of the target-specific non-LTR retrotransposons in insects.
The B.mori strain, Kinshu * Showa, purchased from Kyodo-shiryo Co., Tokyo, Japan, was reared on an artificial diet (Yakult Co., Tokyo) in the laboratory.
We screened phage clones from an EMBL3 genomic DNA library constructed from Sau3AI partial digests of Bombyx fat body DNA (19 ). Clones containing (TTAGG)n repeats were isolated with a 32P-labeled (TTAGG)5 oligonucleotide probe. After sequencing adjacent regions around the (TTAGG)n from >20 positive clones, we found that several clones terminating with a poly(A) tail had the same nucleotide sequence that neighbors the (TTAGG)n in the reverse orientation of the TRAS1 insertion. We named this putative retrotransposon family, SART1. From one phage clone containing SART1, we isolated a 1.8 kb HindIII-SacI DNA fragment from the 3'-terminal region near the poly(A) tail. We cloned this fragment into pBluescript SK+ (Stratagene) and named it p51S.
To identify a clone containing a complete unit of the SART1 element, we screened the EMBL3 phage library again with the 32P-labeled 1.8 kb insert of p51S as a probe. Among 20 000 plaques, we isolated 20 clones with intense signals and found eight clones containing two separated (TTAGG)n repeats, which may correspond to the terminal junctions at both ends of SART1. We named one of the eight clones, which includes a full-length SART1 element as determined by genomic Southern hybridization, BS103. We digested BS103 with BamHI and HindIII and obtained from the insert region 3.5 kb BamHI/BamHI, 6.7 kb HindIII/HindIII and 6.7 kb BamHI/BamHI fragments, which were subcloned into pBluescript SK+. Because we found that BS103 included three SART1 elements arrayed in tandem, these subclones were sequenced to determine the complete structure of SART1. Sequencing was performed by dideoxynucleotide termination (20) using a Thermo Sequenase core sequencing kit (Amersham) and an automatic DNA sequencer SQ5500 (Hitachi). We used DNASIS-Mac version 3.0 (Hitachi Software) for sequence analysis.
Genomic DNA was prepared from the silk glands of fifth instar larvae as described previously (19). DNAs digested with restriction enzymes were electrophoresed on 0.9% agarose gels and blotted onto nitrocellulose membranes (BA85; Schleicher and Schuell) in 20× SSC (3 M NaCl, 0.3 M sodium citrate) by capillary transfer (21). Hybridization was performed in 0.9 M NaCl, 90 mM Tris-HCl (pH 7.9), 6 mM EDTA, 0.5% SDS, 2.0% skimmed milk. The (TTAGG)5 oligonucleotide was labeled with [γ-32P]ATP by T4 polynucleotide kinase (Toyobo). DNA fragments were labeled with [α-32P]dCTP by random priming using the BcaBEST DNA labeling kit (Takara). The blotted membranes were incubated overnight at 50°C with the (TTAGG)5 probe or at 65°C with the random primer labeled probe. Thereafter, the membranes were washed in 4×, 2×, 1×, 0.5× and 0.1× SSC for 20 min each at the same temperature as the hybridization.
Fluorescence in situ hybridization was performed as described using prometaphase chromosomes prepared from testes of fifth instar larvae (18 ,19 ). To obtain a labeled probe, we amplified a 1.4 kb 3'-terminal region of SART1 by PCR (22 ) using biotinylated dUTP (Bio-16-dUTP; Boehringer). The primer set used for PCR, 4629 (5'-CGGCACCTTGAAAATGTCGG-3') and 6359 (5'-ACAACTGGACTATCGTGTCG-3'), is shown in Figure 2 A.
We found that about half of the genomic copies of the SART1 element formed tandem arrays with SART1-(TTAGG)n as a repeat unit in the silkworm genome. Thus, to identify the specific target site of the SART1 insertion, we amplified the regions adjacent to the (TTAGG)n in the SART1-(TTAGG)n-SART1 arrays in the genome by PCR (22). The primers for this PCR, 6610 (5'-CGGAGTCCGACATAACCCGGTCCGA-3') and 62 (5'-TGGAAGTCCAGCAAAACTCCCCCAC-3'), are shown in Figure 2B. The PCR reaction mixture (50 µl) contained 1× PCR buffer (500 mM KCl, 15 mM MgCl2, 100 mM Tris-HCl, pH 8.3), 5 mM dNTP, 10 pmol each primer, 1.5 µg genomic DNA and 1 U Taq polymerase (Takara). PCR proceeded for 30 cycles under the following conditions: 94°C, 30 s; 64°C, 30 s and 73°C, 1 min. A denaturing step at 94°C for 3 min preceded the first cycle and the final cycle was followed by a further extension at 73°C for 10 min. The amplified DNA fragments were cloned into a plasmid vector with the TA Cloning Kit (Invitrogen).
We aligned the amino acid sequences of the reverse transcriptase domains of 18 retrotransposable elements using the CLUSTAL V program and constructed a tree by the neighbor joining method (23,24). The region of the amino acid sequence in ORF2 of the SART1 element starts at 455 and ends at 783, as shown in Figure 2A. The monophyly of groups was assessed with 1000 bootstrap resamplings. The sources and NCBI sequence identification numbers of these sequences are as follows: I (Drosophila melanogaster), 85020; L1Hs (human), 106903; L1Md (mouse), 130402; Tx1 (Xenopus laevis), 141475; R2Dm (D.melanogaster), 130551; R1Dm (D.melanogaster), 140023; R1Bm (B.mori), 84806; TRAS1 (B.mori), 940390; RT1 (Anopheles gambiae), 159617; RT2 (A.arabiensis), 159620; TART (D.melanogaster), 435415; jockey (D.melanogaster), 134083; F (D.melanogaster), 103353; Doc (D.melanogaster), 103221; T1Ag (A.gambiae), 103015. The sequences of ingi (Trypanosoma brucei) and R2Bm (B.mori) are from references 25 and 26 respectively.
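The bootstrap step mentioned above resamples alignment columns with replacement and rebuilds the tree from each replicate. The following is only a minimal sketch of the resampling itself, not the CLUSTAL V implementation; the function name `bootstrap_columns` and the toy alignment are assumptions introduced for illustration.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping a sequence name to an aligned sequence string.
    rng: a random.Random instance, passed in so that runs are reproducible.
    Columns are drawn with replacement; every sequence receives the same draw.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("aligned sequences must have equal length")
    picks = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in picks) for name in names}


# Toy alignment (hypothetical residues, not the reverse transcriptase domains).
toy = {"elemA": "MKVLAD", "elemB": "MKILSD", "elemC": "MRVLAE"}
replicate = bootstrap_columns(toy, random.Random(0))
print(replicate)
```

A tree would then be built from each of the 1000 replicates, and the bootstrap value of a group is the fraction of replicate trees in which that group appears.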
The nucleotide sequence data reported in this paper will appear in the GenBank, EMBL and DDBJ nucleotide sequence databases with the accession number D85594.
We first identified several clones containing SART1 from a lambda genomic library by screening with the (TTAGG)5 probe (see Materials and Methods). The SART1 elements in these clones terminated with a poly(A) tail which is directly joined to the (TTAGG)n repeat on the same strand. Since the poly(A) tail of TRAS1 was joined to the (CCTAA)n repeat, transcription of SART1 should be in the opposite direction to that of TRAS1 (Fig. 1). Using the 3'-terminal region of a SART1 element (the insert sequence of p51S) as a probe, we isolated 20 phage clones and identified eight that included a complete SART1 unit. The sequence of the SART1 element is highly conserved among many genomic copies, as described below. Because BS103 showed the same restriction profile as that of genomic Southern hybridization in the corresponding region, we further analyzed this clone. We found that BS103 consisted of three direct repeats of the SART1 sequence (Fig. 1). Structural analysis revealed that a 6.7 kb sequence sandwiched between two short (TTAGG)n regions corresponds to a complete unit of SART1. Thus, the BS103 insert consisted of a 3'-terminal region (unit 1), one complete unit (unit 2) and a 5'-half (unit 3) of the SART1 element, as described below.
Unit 2 of the SART1 element, terminating with poly(A), was 6704 bp in length and G+C rich (64%). The absence of long direct or inverted repeats at both ends indicates that the SART1 is a non-LTR retrotransposon.
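The G+C content quoted above is the fraction of G and C bases in the sequence. As a minimal sketch (the function name `gc_fraction` is an assumption, and the input is a toy repeat rather than the SART1 sequence), it could be computed as follows.

```python
def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return gc / len(seq)


# Toy input: four telomeric repeat units, not the 6704 bp element itself.
toy = "TTAGG" * 4
print(f"G+C content: {gc_fraction(toy):.0%}")
```

Applied to the sequence deposited under the accession number given in this paper, the same calculation would be expected to reproduce the reported value.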
An 867 bp 5' untranslated region (5'-UTR) precedes the first ORF. There was a T rich region (68%) of 122 bp in the middle of the 5'-UTR (data not shown). SART1 contained two long ORFs which occupy 80% of its total length (Fig. 2 A). The first ORF (ORF1) was 2148 bp long (from 868 to 3015) and putatively encoded 712 amino acids, which is quite similar to the Gag-like protein of non-LTR retrotransposons. The first ATG codon found at the 13th base position is flanked by the translation start consensus sequences among eukaryotes (27 ). ORF1 is basic (isoelectric point 10.3), contains many proline residues (8%) and charged amino acids (29%), which are characteristic of Gag proteins (28 ). In addition, three putative zinc finger motifs (CCHC) were found near the carboxyl end of the ORF1 (Fig. 2 A).
The second ORF (ORF2) is 3266 bp long (from 2952 to 6217) and putatively encodes 1067 amino acids. This amino acid sequence shows significant similarity to the Pol-like protein. The ORF2 overlaps with ORF1 by 54 nt in the -1 reading frame. Although in many retroviruses and retrotransposons, the two ORFs are translated as a fusion protein by ribosomal frameshifting (29 ,30 ), translation of the ORF2 in SART1 seems to start at the first ATG 66 bp from the beginning of the ORF. The N-terminal region of the putative translation product using this methionine codon, which is flanked by a consensus sequence for initiation (27 ), was homologous to the corresponding area of the TRAS1 and R1 ORF2 (data not shown). However, the upstream region of that N-terminal had no obvious homology to the corresponding area of the two retrotransposons.
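The frame relationship between the two ORFs follows from their start coordinates alone. The sketch below is an illustration under stated assumptions: the helper name `relative_frame` is hypothetical, and the start positions 868 and 2952 are the ones quoted in the text.

```python
def relative_frame(start_a: int, start_b: int) -> int:
    """Frame of ORF b relative to ORF a, as 0, +1 or -1.

    Both coordinates are 1-based positions of the first base of each ORF
    on the same strand.
    """
    offset = (start_b - start_a) % 3
    # An offset of 2 is the same register as a shift of one base backwards.
    return -1 if offset == 2 else offset


orf1_start = 868   # first base of ORF1, from the text
orf2_start = 2952  # first base of ORF2, from the text
print(relative_frame(orf1_start, orf2_start))
```

With these coordinates the offset is 2084 mod 3 = 2, that is, the -1 frame, in agreement with the description above.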
In the ORF2, we identified a reverse transcriptase domain in the middle region and a cysteine-histidine motif near the C-terminus, as depicted in Figure 2A (1). The 886 bp 3'-UTR ends in a 27 bp poly(A) tail, although we could not find an evident polyadenylation signal (AATAAA).
In three tandem SART1 units of the BS103 clone, there were 18 TTAGG repeats between units 1 and 2, but only one between units 2 and 3 (Fig. 1 ). The nucleotide sequence of the coding region in unit 3 is different at 23 positions from that of the corresponding area in unit 2 (data not shown). Of the 23 substitutions, 21 are synonymous and, thus, do not influence the amino acid sequence. This conservation of the protein encoding sequence provides indirect evidence that the element can still retrotranspose. To understand the distribution of SART1 elements within the B.mori genome, we performed genomic Southern hybridization using two DNA fragments from unit 2 shown in Figure 2 B as probes. First, the genomic DNA was double-digested with HindIII and one of five other enzymes, blotted and hybridized with probe 1 (SacI-HindIII fragment) (Fig. 2 D, lanes 1-5). When the genomic DNA was digested with HindIII and SacI, a 0.4 kb band, which has the same length as probe 1, was the most prominent (lane 1). This result indicates that these HindIII and SacI sites are conserved in most SART1 copies within the genome. The results shown in lanes 2-5 and with the other 10 enzymes (data not shown) are also consistent with the restriction enzyme map for unit 2. By comparison with the signal intensity for a diluted series of DNA fragments of BS103 as a positive hybridization control (data not shown), we estimated the copy number of SART1 per haploid genome as 600. Thus, SART1 is highly conserved within the B.mori genome and unit 2 in the BS103 is a representative of conserved SART1 copies.
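Whether a substitution is synonymous can be decided by translating the affected codon in both copies. The following is a simple per-codon sketch, not the procedure used in this study; `count_codon_changes` and the toy sequences are assumptions, and a codon with more than one changed base is counted once.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_codon_changes(cds_a: str, cds_b: str):
    """Count differing codons between two in-frame coding sequences.

    Returns (synonymous, nonsynonymous). Both inputs must have the same
    length, a multiple of three, and contain only A, C, G and T.
    """
    cds_a, cds_b = cds_a.upper(), cds_b.upper()
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be in frame and of equal length")
    syn = nonsyn = 0
    for i in range(0, len(cds_a), 3):
        codon_a, codon_b = cds_a[i:i + 3], cds_b[i:i + 3]
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn


# Toy example: CTT -> CTC keeps leucine, AAA -> AGA changes lysine to arginine.
print(count_codon_changes("ATGCTTAAA", "ATGCTCAGA"))
```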
Next, to examine whether SART1 elements form 'tandem arrays' in the genome, as in BS103, we digested the genomic DNA with HindIII or BamHI, each of which cuts once within unit 2, and hybridized it with probe 2. When SART1 copies are tandemly located within the genome, a major band will appear that corresponds to the complete length of the element, 6.7 kb (Fig. 2C). Figure 2D (lanes 6 and 7) shows a prominent 6.7 kb band, supporting this notion. As estimated from the intensity of each band, at least half of the SART1 copies exist as tandem arrays in the genome. Many bands other than the 6.7 kb band were separated discretely. This indicates that most of these bands could result from insertion of a SART1 close to BamHI and HindIII sites in flanking DNA, such as TRAS1 and other telomeric repeat-associated sequences.
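The reasoning behind the unit-length band can be made explicit: an enzyme that cuts once per unit in a head-to-tail array releases internal fragments exactly one unit long, whatever the position of the site. The sketch below uses round toy numbers; `tandem_digest_fragments` and the cut offset of 1000 are assumptions for illustration only.

```python
def tandem_digest_fragments(unit_len: int, cut_offset: int, n_units: int):
    """Fragments from cutting a head-to-tail array once per unit.

    unit_len: length of one repeat unit (element plus telomeric repeats).
    cut_offset: position of the single restriction site within each unit.
    Returns (left_end, internal_fragments, right_end); the two end
    fragments are measured to the ends of the array only.
    """
    if n_units < 1 or not 0 <= cut_offset < unit_len:
        raise ValueError("need at least one unit and a cut inside the unit")
    cuts = [i * unit_len + cut_offset for i in range(n_units)]
    internal = [b - a for a, b in zip(cuts, cuts[1:])]
    return cuts[0], internal, n_units * unit_len - cuts[-1]


# Three units of 6700 bp with a hypothetical site 1000 bp into each unit.
print(tandem_digest_fragments(6700, 1000, 3))
```

In genomic DNA the end fragments extend to the nearest site in the flanking sequence, so their sizes vary between loci, which fits the additional discrete bands described above.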
We further investigated the localization of SART1 elements on the chromosomes by FISH. Most FITC signals were given by the ends of 15-20 chromosomes (Fig. 3 ). However, it is not yet determined whether these terminal signals are associated with specific chromosomes.
To investigate whether SART1 elements are actually associated with the telomeric repeats and at which position they insert, we identified 12 independent SART1 terminal regions by PCR (22). Because ~300 SART1 elements form tandem arrays as described above, we amplified the junction regions between two neighboring SART1 genomic copies, using a primer set spanning the two copies (see Fig. 2C). The PCR products cloned into a plasmid vector (GP1-GP12) were sequenced and the junction regions in SART1-(TTAGG)n-SART1 are shown in Figure 4. J12 and J23 are the junction sequences between units 1 and 2 and between units 2 and 3 of the phage clone BS103 respectively. DNA sequences of the SART1 portion in these clones were essentially identical, at least in the region shown in the figure. All clones contain from 1 to 20 TTAGG repeats. The length of the poly(A) tract varied from 13 to 61 nt. The junctions between the 3' termini of SART1 elements and the telomeric repeats were identical among all the clones, in which poly(A) tails are connected with GG in a (TTAGG)n unit. However, it is not clear whether the last nucleotide A of the poly(A) belongs to the SART1 element or to the telomeric repeat.
On the other hand, the 5'-ends of SART1 in these clones assume three different forms. The 5' termini of SART1 in GP9, 10, 11 and 12 had additional GA nucleotides compared with J23. In the corresponding region in J12, however, an additional CCCGG was duplicated. Such 5' duplication is also found in R1 elements in B.mori (28 ). Except for the 5' duplication in J12 the telomeric repeats in the other clones terminated in the TT of a (TTAGG)n repeat unit and the following CCC nucleotides which are presumed to be part of the SART1 element.
We also amplified a part of BS103 that had been subcloned into a plasmid as a control. In four sequenced clones we did not find additional GA nucleotides, CCCGG duplication or any base substitutions, although the length of poly(A) tails and the numbers of the telomeric repeats were inaccurate (data not shown). This sequence homogeneity indicates that the sequence variety seen in Figure 4 is not due to PCR errors. These results indicate that SART1 is inserted between TT and AGG of the telomeric repeat unit in a highly specific manner.
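The insertion position within the repeat unit can be read from where the telomeric repeat stops at the 5' junction. The following is a minimal sketch under assumptions: `interrupted_phase` is a hypothetical helper, and the flank is a toy sequence built to end in TT, as described for most clones, rather than a sequence taken from Figure 4.

```python
def interrupted_phase(flank: str, unit: str = "TTAGG") -> int:
    """Return k such that the repeat tract ends with the first k bases of unit.

    flank: the telomeric repeat sequence immediately 5' of the element,
    read on the (TTAGG)n strand. A result of 0 means the tract ends with a
    complete unit. Raises ValueError if flank is not a pure repeat tract.
    """
    flank, unit = flank.upper(), unit.upper()
    n = len(unit)
    reps = len(flank) // n + 2
    for k in range(n):
        # Perfect repeat of the same length as flank that ends with unit[:k].
        template = (unit * reps + unit[:k])[-len(flank):]
        if flank == template:
            return k
    raise ValueError("flank is not a pure repeat of the unit")


unit = "TTAGG"
k = interrupted_phase("TTAGGTTAGGTT", unit)
# unit[k - 1] and unit[k] are the bases on either side of the insertion.
print(f"insertion between {unit[k - 1]} and {unit[k]}")
```

For the toy flank the tract ends after TT, so the element lies between T and A of the unit, that is, between TT and AGG.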
To understand the evolutionary origin of SART1, we constructed a phylogenetic tree among non-LTR retrotransposons by the neighbor joining method (24 ) based upon the amino acid sequences of the reverse transcriptase domain (Fig. 2 A). As shown in Figure 5 , SART1 falls into one group with R1, RT elements and TRAS1, all of which are site-specific retrotransposons (12 ,19 ,31 ). This relationship indicates that they derived from a common evolutionary origin and have changed the specificity for their insertion sites. These four types of retrotransposon further branched into two subgroups, R1/TRAS1 and RT/SART1. Although the bootstrap value shown in Figure 5 is not high enough to establish the subgroups, another phylogenetic tree based on sequences for the entire Gag region supported this classification (data not shown).
We compared the putative zinc finger motifs and their flanking regions between the two subgroups (data not shown). In general, the amino acid sequences in these regions were better conserved within members of the same subgroup than between the two subgroups, supporting the above classification. The structural difference between the two subgroups, such as the spacing of Cys and His residues, is evident in the C-terminal region of the pol-ORF. The conserved cysteine-histidine motifs in the pol-ORF are HX17CXCX8-9HX4C in the TRAS1 group and HX13HX17CX2CX13C in the SART1 group. In retroviruses, an integrase domain containing an HX3HX22-32CX2C motif follows a reverse transcriptase domain (32) and in the corresponding location, a CXCX8-9HX4C motif is generally conserved among divergent species of non-LTR retrotransposons (31). Recent studies of the site-specific integration of the group II intron also demonstrated the importance of this domain (33). The CXCX8-9HX4C motif was maintained in the TRAS1/R1 group but not in the SART1/RT group, in which the spacing between the first and second C (CXC) was changed to an interval of two residues (CX2C). In addition, the H (histidine) residue in the CXCX8-9HX4C was changed to D (aspartic acid) in RT2.
Does each member of the same subgroup with a similar structure have a similar target? To answer this question, we compared the targets and their flanking sequences among four elements (R1, TRAS1, RT1/2 and SART1; Fig. 6 ). The target sequence of TRAS1 resembled that of R1Dm, because five to six CTA sequences appeared in the area examined. However, we could not find prominent similarity between the subgroup members, SART1 and RT1. During the integration of the elements, the regions underlined in the target sites are duplicated on both ends (13 ,31 ). However, it was difficult to identify such duplication in TRAS1 and SART1 because their targets, the telomeric repeats themselves, are tandemly duplicated.
In this study, we characterized a new class of telomeric repeat-associated retrotransposons called SART1 from the silkworm, B.mori. Although another site-specific element, TRAS1, is also located in the telomeric repeats (19), the two retrotransposons have different target sequences for insertion. TRAS1 is inserted exactly between A and C of the (CCTAA)n, but SART1 is inserted in a reverse direction between T and A of the (TTAGG)n. Since the G rich strand in the telomeric repeat is synthesized toward the chromosomal end (18), the TRAS1 transcriptional unit is oriented toward the centromere, whereas SART1 is oriented toward the telomeric end (Figs 1 and 6A).
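The two ways of writing the telomeric repeat, (TTAGG)n and (CCTAA)n, are the two strands of the same duplex. As a small sketch (the helper name `reverse_complement` is an assumption), this relationship can be checked directly.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


print(reverse_complement("TTAGG"))      # the C rich strand unit
print(reverse_complement("TTAGG" * 2))  # two units read on the other strand
```

Read on the G rich strand, the TRAS1 site between A and C of CCTAA|CCTAA corresponds to the boundary between two units, TTAGG|TTAGG, whereas SART1 lies within a unit at TT|AGG, so the two elements occupy different positions of the same repeat.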
The number of insertion sites for the site-specific retrotransposable elements is limited in the genome. Thus, repetitive sequences seem to be suitable as the `target' for this kind of element, since there are many of their copies in the genome. The best studied targets of this type of insertion are the tandemly arranged rRNA genes in insects, where several non-LTR retrotransposons are located at specific sites (10 ,11 ,13 ). Other site-specific elements so far known are also inserted into the sequences with high copy number (4 -7 ). Telomeric repeats should be compensated by a telomerase in germ cells even if they are disrupted by an insertion of transposable elements (34 ). Telomeric repeats on the chromosomal ends, of over 6-8 kb in the silkworm (18 ), are therefore stably maintained and seem optimal for harboring sequence-specific elements. The fact that there are far higher copy numbers of SART1 (~600) and TRAS1 (~300) in the haploid genome than of R1 and RT1 (<100), supports the above notion (1 ,19 ,28 ).
Like the TRAS1 elements, most genomic copies of SART1 were of full-length (6.7 kb) and the restriction sites highly conserved (Fig. 2 ). Most base changes observed in the coding regions of the two SART1 copies, units 2 and 3 in BS103, were synonymous substitutions. These findings suggest that most SART1 elements have functional ORFs and can still transpose. In fact, a duplication of the CCCGG sequence at the extreme 5'-end of the unit 2 (J12 in Fig. 4 ) implies that SART1 elements have inserted into the telomeric repeats by retrotransposition but not by recombination. Retrotransposition itself is believed to cause sequence variation among retrotransposon copies because of the high rate of error incorporation by the encoded reverse transcriptase (35 ). Therefore, the sequence uniformity and synonymous substitution of SART1 might be partly obtained under selective pressure by unequal crossover and gene conversion, as in the R1 and R2 insertion of 28S rDNA (36 ). Eickbush and his group concluded that the recombinational forces that work in the concerted evolution of the rRNA genes themselves can rapidly amplify and eliminate copies of R1 and R2 independent of their ability to retrotranspose. Although organisms rely on a telomerase, telomere-telomere recombination is also thought to proceed by gene conversion and results in a net increase in telomeric DNA (37 ). Thus, this kind of recombination in the telomere region may contribute to the structural uniformity of telomeric repeat associated retrotransposons. In this study, we found that most of the 23 base substitutions between units 2 and 3 of BS103 were located in two very restricted regions within the compared sequenced area over 5 kb (filled boxes in Fig. 1 ). Multiple rounds of unequal crossing over may explain the restricted localization of these base substitutions.
The RT domain of the R2 element of B.mori was shown to generate sequence-specific breaks at its target site (15). Recent studies of the human L1 element may explain this target priming mechanism. An endonuclease domain (EN) in the N-terminus of the L1 RT cleaves target DNA with sequences similar to those of L1 insertion sites in the human genome (38). Since the EN sequence was also conserved in SART1 and TRAS1 (data not shown), their EN domains may be responsible for the target-specific insertion into (TTAGG)n repeats. The LTR retrotransposon Ty5 is also suggestive of the site preference of SART1 insertion. Ty5 was found near the telomeric repeats of yeast chromosomes (39). This indicates that SART1 and TRAS1 might prefer to insert into the telomeric repeats by recognizing proteins in the telomeric heterochromatin, as suggested for Ty5 insertion.
A phylogenetic tree based on the amino acid sequences of the reverse transcriptase domain and sequence comparisons in the zinc finger regions revealed that SART1 is more closely related to the RT elements of Anopheles than to any other retrotransposable elements (Fig. 5 ). The TRAS1 and R1 elements are closely related to the SART1/RT group but may be further categorized into another subgroup. The classification of these two subgroups based on structural similarities is not in accord with that judged by the chromosomal locus they interrupt. R1 and RT elements are inserted in the 28S rRNA gene, but TRAS1 and SART1 are inserted in the telomeric repeats (13 ,19 ,28 ). Besides having different target sequences, the relative orientation of the inserted elements to the target sequences is not consistent between two rDNA elements and between two telomeric repeat-associated elements (Fig. 6 ). The RT elements would be transcribed in the opposite direction to R1, R2 and R4 (Fig. 6 A). Considering the structural similarities, R1/TRAS1 and RT/SART1 groups may have different preference for chromosomal polarity in integration. The direction of the insertion may be related to the regulation of transcription and the integration of non-LTR retrotransposons. The same direction of the R1 element relative to the rRNA gene enables this element to be transcribed in combination with 28S rRNA (28 ). In contrast, we believe the telomeric repeats cannot be transcribed. Thus, flanked by telomeric repeats, TRAS1 and SART1 should possess their own promoters for transcription. In 5'-UTR regions, we did not find any obvious homology between TRAS1 and SART1 except for CCCG at the extreme ends (data not shown). Another interesting possibility is that the telomeric short repeats work as promoters for the retrotransposons.
The Drosophila telomeres are free from short repetitive sequences. The retroposons, HeT-A and TART, transpose to reconstitute the ends that balance the terminal loss during DNA replication (40 ,41 ). As shown in the phylogenetic tree (Fig. 5 ), TRAS1 and SART1 are not closely related to HeT-A and TART. At present, we have no evidence for the role of these telomeric repeat-specific elements on telomere function. However, we consider them important in understanding how telomeres evolved in insects. These telomere-associated elements of the silkworm may be able to regulate the organization and length of the telomere, which influences telomeric heterochromatin or to work as a buffer and an alternative backup pathway as in the Y' element in yeast (42 ). They may also increase the number of telomeric repeats during retrotransposition by short terminal duplication of the target sequence at both ends of the element but not at the chromosomal ends. TRAS1 and SART1 might have spread in the silkworm strains through vertical transmission as implicated for R1 and R2 (43 ). Studies of telomeric repeat-specific retroposons in other species and on overall telomere structure would help in understanding the evolution of telomeres in insects.
We thank Miss Y. Nakazato for help with FISH. This work was supported by grants for helping the study of FISH. This work was supported by a grant to H.F. (Evolutionary Life Systems Research Group) from the Ministry of Education, Science and Culture of Japan.
*To whom correspondence should be addressed. Tel: +81 3 3812 2111; Fax: +81 3 3816 1965; Email: haruh@uts2.s.u-tokyo.ac.jp
