Nucleic Acids Research
Tandem repeats finder: a program to analyze DNA sequences
Introduction
Methods
Probabilistic model of tandem repeats
Program outline
Program usage and output
Results
Human frataxin gene (Friedreich's ataxia), intron 1
Human β T cell receptor locus sequence
Yeast chromosomes
The (27, 21, 48, 15, 135) cluster
The (13, 10, 36) cluster
Conclusion
Statistical issues
Mutational history
Acknowledgements
References
ABSTRACT
INTRODUCTION
DNA molecules are subject to a variety of mutational events. One of the less well understood is tandem duplication in which a stretch of DNA, which we call the pattern, is converted into two or more copies, each following the preceding one in a contiguous fashion. For example we could have
. . . TCGGA . . .
. . . TCGGCGGCGGA . . .
in which the single occurrence of triplet CGG has been transformed into three identical, adjacent copies. The result of a tandem duplication event is termed a tandem repeat. Over time, individual copies within a tandem repeat may undergo additional, uncoordinated mutations so that typically, only approximate tandem copies are present.
Tandem repeats are presumed to occur frequently in genomic sequences, comprising perhaps 10% or more of the human genome. However, accurate characterization of the properties of tandem repeats has been limited by the inability to detect them easily. In recent years, the discovery of the trinucleotide repeat diseases has piqued interest in tandem repeats. These diseases, including fragile-X mental retardation (1), Huntington's disease (2), myotonic dystrophy (3), spinal and bulbar muscular atrophy (4) and Friedreich's ataxia (5), are the result of a dramatic increase in the number of copies of a trinucleotide pattern. In afflicted individuals, the copy number has been amplified from the normal range of tens of copies to hundreds or thousands, resulting in the disease. It has been suggested that the repeats themselves produce unusual physical structures in the DNA, causing polymerase slippage and the resulting amplification (6,7).
A more salubrious potential role for tandem repeats is gene regulation, in which the repeats may interact with transcription factors, alter the structure of the chromatin or act as protein binding sites (8-12). Tandem repeats have an apparent function in development of immune system cells. Breakpoints for immunoglobulin heavy chain switch recombination occur within tandem repeats preceding the heavy chain constant region genes (13). Because the number of copies in any specific tandem repeat is often polymorphic in the population, tandem repeats have proven useful in linkage analysis and DNA fingerprinting (14,15). Recent studies of allele diversity at tandem repeat loci have provided support for the `Out of Africa' hypothesis of modern human evolution (16,17).
To date, much of the research on tandem repeats has focused on those with short patterns (2-5 nt), presumably because such repeats are relatively easy to spot by eye in printed sequences. Repeats with long patterns (sometimes called variable number of tandem repeats or VNTRs) are notoriously harder to detect [even when the copies are identical, for example see Benson (18) for the 101 bp repeats undetected in Hellman et al. (19), a paper on the role of tandem repeats as hot spots for recombination]. Given the importance of known and potential biological roles for tandem repeats and their usefulness in other biological studies, it seemed essential to us to develop an efficient and sensitive algorithm for detecting these repeats so that they may receive further study.
A number of algorithms already exist which either directly or indirectly detect tandem repeats. All suffer from significant limitations. One group of algorithms is based on computing alignment matrices (20-22). Their primary limitation is excessive running time. The best algorithm in this group (22) has time complexity O(n^2 polylog(n)) for a sequence of length n and would not be useful for sequences much longer than several thousand bases. (In this paper we report on our analysis of sequences up to 700 kb in length.)
Another group of algorithms finds tandem repeats indirectly using methods from the field of data compression. An algorithm by Milosavljevic and Jurka (23) detects `simple sequences', i.e. mixtures of fragments that occur elsewhere. Simple sequences may or may not contain tandem repeats and this algorithm makes no attempt to deduce a repeated pattern. An algorithm by Rivals et al. (24) bases the compression on the presence of small preselected patterns (all those of size 1-3) and is not readily generalized to longer patterns for which there is an algorithmic need. To their credit, both of these methods provide a measure of statistical significance based on the amount of compression.
Another collection of algorithms aims more directly at finding tandem repeats. Of these, one exact algorithm (25) is limited by its definition of approximate patterns, requiring that two copies differ either by k or fewer substitutions (Hamming distance) or by k or fewer substitutions and indels (unit cost edit distance). Besides treating substitutions and indels as equals, the requirement for a fixed number of differences rather than a percentage difference is unsatisfactory. Any fixed number of differences suitable for small patterns (say five differences for patterns of size 20) would be unreasonably restrictive for larger patterns (five differences for patterns of size 100). Conversely, any fixed number for large patterns would allow too much variability in small patterns. A heuristic algorithm by Karlin et al. (26) is similarly hampered by the use of matching blocks separated by error blocks of fixed size. The remaining two algorithms in this group require input from the user which limits their usefulness. An earlier heuristic algorithm by Benson (27) finds tandem repeats only if they have a pattern size which is specified in advance. An exact algorithm by Myers and Sagot (28) (limited to patterns with size of at most 40 bases) requires that the approximate pattern size and a range for the number of copies be specified.
The algorithm (29) presented in this paper is designed to overcome many of the aforementioned limitations: (i) it uses the method of k-tuple matching to avoid the need for full scale alignment matrix computations; (ii) it requires no a priori knowledge of the pattern, pattern size or number of copies; (iii) there are no restrictions on the size of the repeats that can be detected; (iv) it uses percentage differences between adjacent copies and treats substitutions and indels separately; (v) it determines a consensus pattern for the smallest repetitive unit in the tandem repeat. The program has already been used as a preprocessor in a new alignment algorithm where tandem duplication augments the standard mutation set of insertion, deletion and substitution (18).
A number of ideas incorporated into this new algorithm have been utilized in earlier homology detection programs (30,31), yet the goals and methods differ. Instead of looking for highest scoring homologous regions, the algorithm looks for tandem repeats which are often hidden in larger homologous regions or which may fall well below the level of significance required for other programs to report a match. The detection criteria are based on a stochastic model of tandem repeats specified by percent identity and frequency of insertions and deletions, rather than some minimal alignment score. Finally, the program aligns repeat copies against a consensus sequence, revealing patterns of common mutations. These patterns yield insight into the history of duplications that produced the tandem repeat, thus providing a potentially valuable tool for phylogenetic research.
The remainder of this paper is organized as follows. In Methods we present a probabilistic model of tandem repeats, an algorithm overview and the set of criteria that guide the recognition process. In Results we present our analysis of the frataxin (Friedreich's ataxia) gene sequence, the human β T cell receptor locus and two yeast chromosomes. Finally, in the Conclusion we describe directions for future research.
METHODS
Probabilistic model of tandem repeats
We model alignment of two tandem copies of a pattern of length n by a sequence of n independent Bernoulli trials (coin tosses). The probability of success, P(heads), which we also call pM or matching probability, represents the average percent identity between the copies. Each head in the Bernoulli sequence is interpreted as a match between aligned nucleotides. Each tail is a mismatch, insertion or deletion. A second probability, pI or indel probability, specifies the average percentage of insertions and deletions between the copies. Figure
Figure 1. Two adjacent copies from a tandem repeat in the human β T cell receptor locus sequence (37). H indicates a match, T indicates a mismatch, insertion or deletion. While Figure
Program outline
Our program has detection and analysis components. The detection component uses a set of statistically based criteria to find candidate tandem repeats. The analysis component attempts to produce an alignment for each candidate and if successful gathers a number of statistics about the alignment (percent identity, percent indels) and the nucleotide sequence (composition, entropy measure).
Detection component. We assume that adjacent copies of any pattern will contain some matching characters in corresponding positions. Just how many matches and how the distance between those matches should vary depend on the fixed values of pM and pI. In the next section, we develop the statistical criteria to answer these questions. Here, we describe how the matches are detected.
The algorithm looks for matching nucleotides separated by a common distance d, which is not specified in advance. For reasons of efficiency it looks for runs of k matches, which we call k-tuple matches. A k-tuple is a window of k consecutive characters from the nucleotide sequence. Matching k-tuples are two windows with identical contents which, if aligned in the Bernoulli model, would produce a run of k heads. Because we limit ourselves to k-tuple matches, we will not detect all matching characters. For example, if k = 6 and two windows contain TCATGT and TCTTGT, we will not know that there are 5 matching characters because the window contents are not identical. Put in terms of the Bernoulli model, the aligned windows would be represented by the sequence HHTHHH, which is not a run of 6 heads.
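The window example above can be checked with a few lines of Python. This is only an illustrative sketch, not code from the program: the function names `bernoulli_string` and `is_ktuple_match` are hypothetical, and the windows are assumed to be aligned without indels.

```python
def bernoulli_string(window_a: str, window_b: str) -> str:
    """Return 'H' for each matching position and 'T' for each mismatch.

    Assumes the two windows are already aligned and have equal length
    (no insertions or deletions), as in the example in the text.
    """
    if len(window_a) != len(window_b):
        raise ValueError("windows must have equal length")
    return "".join("H" if a == b else "T" for a, b in zip(window_a, window_b))


def is_ktuple_match(window_a: str, window_b: str, k: int) -> bool:
    """A k-tuple match requires two windows of length k with identical contents."""
    return len(window_a) == k and window_a == window_b


tosses = bernoulli_string("TCATGT", "TCTTGT")
print(tosses)                                    # HHTHHH
print(tosses.count("H"))                         # 5 matching characters
print(is_ktuple_match("TCATGT", "TCTTGT", k=6))  # False: not a run of 6 heads
```

The five matching characters go unnoticed at k = 6 because only an unbroken run of k heads counts as a k-tuple match.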
The basic operation of the detection component is illustrated in Figure
Figure 2. Tandem repeats are detected by scanning the sequence with a small window, determining the distance between exact matches and testing the statistical criteria.
When a position i is added to Hp (the history list of positions at which the k-tuple p has occurred), we scan Hp for all earlier occurrences of p. Let one earlier occurrence be at j. Since i and j are the indices of matching k-tuples, the distance d = i - j is a possible pattern size for a tandem repeat. For the criteria tests, we need information about other k-tuple matches at the same distance d where the leading tuple occurs in the sequence between j and i. A distance list Dd stores this information. It can be thought of as a sliding window of length d which keeps track of the positions of matches and their total. List Dd is updated every time a match at distance d is detected. Position i of the match is stored on the list and the total is increased. The right end of the window is set to i and matches that occurred before j = i - d are dropped from the list and subtracted from the total. Lists for other nearby distances are also updated at this time (Random Walk Distribution in the next section), but only to reset their right ends to i and remove matches that have been passed by the advancing windows. Information in the updated distance lists is used for the sum of heads and apparent size criteria tests as described in the next section. If both tests are passed, the program moves on to the analysis component.
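The bookkeeping described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the program's implementation: the name `scan_ktuple_matches` is hypothetical, a single tuple size is used, the history lists are left unbounded, the criteria tests are omitted, and only the list for the matched distance is updated at each step (the nearby-distance updates are left out).

```python
from collections import defaultdict, deque


def scan_ktuple_matches(seq: str, k: int) -> dict:
    """Scan seq once; return {d: matches at distance d still inside the
    sliding window of the distance list when the scan ends}."""
    history = defaultdict(list)    # Hp: start positions of each k-tuple p
    distance = defaultdict(deque)  # Dd: positions i of matches at distance d
    last = len(seq) - k            # start index of the final k-tuple
    for i in range(last + 1):
        p = seq[i:i + k]
        for j in history[p]:       # every earlier occurrence of p
            d = i - j              # candidate pattern size
            window = distance[d]
            window.append(i)       # store position i of the match
            while window[0] < i - d:
                window.popleft()   # drop matches that occurred before j = i - d
        history[p].append(i)
    # Advance every window to the end of the scan before reporting totals.
    totals = {}
    for d, window in distance.items():
        while window and window[0] < last - d:
            window.popleft()
        totals[d] = len(window)
    return totals


totals = scan_ktuple_matches("ACGT" * 4, k=3)
print(totals[4], totals[8], totals[12])  # 5 6 2
```

For four exact copies of ACGT, matches accumulate at the true pattern size 4 and at its multiples 8 and 12. This is one reason the same repeat can show up at more than one candidate distance.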
We set Δdmax = [equation missing in source]. For example, if pI = 0.1 and d = 100, then Δdmax = 7.
Figure 3. Insertions and deletions change the distance between exact matches. The inserted character X causes one pair of matching k-tuples to be separated by distance d + 1 while another pair is separated only by distance d.
Apparent size distribution. This distribution is used to distinguish between tandem repeats and non-tandem direct repeats (Fig.
Figure 4. We must distinguish between (a) a tandem repeat (leading tuples in k-tuple matches spread over the interval between i and j) and (b) a non-tandem, direct repeat (leading tuples concentrated on the right). Matching k-tuples are indicated by the shaded boxes. w is the distance between the first and last leading tuple.
Waiting time distribution. This distribution is used to pick tuple sizes. Tuple size has a significant inverse effect on the running time of the program because increasing tuple size causes an exponential decrease in the expected number of tuple matches. If the nucleotides occur with equal frequency, then increasing the tuple size by Δk increases the average distance between randomly matching tuples by a factor of 4^Δk. If k = 5, the average distance between random matches is ~1 kb, but if k = 7, the average distance is ~16 kb. Thus, by using a larger tuple size, we keep the history lists short. On the other hand, increasing the tuple size decreases the chance of noticing approximate copies because they may not contain a long, unbroken run of matches. Let the random variable Tk,pM = the number of iid Bernoulli trials with success probability pM until the first occurrence of a run of k successes. Tk,pM follows the geometric distribution of order k. If we let p = pM and q = 1 - p then the exact probability P(Tk,pM = x) for x ≥ 0 is given by the recursive formula (34)
For example, if pM = 0.75 and k = 5 then we need at least 31 trials (coin tosses) to have a 95% chance of seeing a run of 5 heads. For patterns smaller than 31 characters, we need to use a smaller k-tuple. The waiting time distribution allows us to balance the running time and sensitivity of our algorithm by picking a set of tuple sizes, each applying to a different range of pattern sizes. The program processes the sequence once, simultaneously checking these different tuple sizes. We require that the smallest pattern for tuple size k have a sum of heads criterion of at least k + 1. Table 1 shows the range of tuple sizes and the corresponding pattern sizes currently used by the program.
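The waiting time distribution can be evaluated numerically with a short recursion. The sketch below uses the standard recursion for the geometric distribution of order k, namely P(T = x) = 0 for x < k, P(T = k) = p^k, and P(T = x) = q p^k [1 - P(T ≤ x - k - 1)] for x > k. It is assumed, not confirmed by the text here, that this is the same recursion as the formula cited from (34). The function name `waiting_time_cdf` is hypothetical.

```python
from typing import List


def waiting_time_cdf(k: int, p: float, x_max: int) -> List[float]:
    """Return cdf where cdf[x] = P(T <= x), T being the number of iid
    Bernoulli(p) trials until the first run of k successes (k >= 1)."""
    q = 1.0 - p
    pk = p ** k
    cdf = [0.0] * (x_max + 1)
    for x in range(k, x_max + 1):
        if x == k:
            pmf = pk                       # the first k trials are all successes
        else:
            # a failure at trial x - k, then k successes, and no run of k
            # successes completed within the first x - k - 1 trials
            pmf = q * pk * (1.0 - cdf[x - k - 1])
        cdf[x] = cdf[x - 1] + pmf
    return cdf


cdf = waiting_time_cdf(k=5, p=0.75, x_max=40)
print(round(cdf[31], 3))   # about 0.95 for pM = 0.75 and k = 5
print(4 ** 5, 4 ** 7)      # 1024 16384: mean spacing of random 5- and 7-tuple matches
```

With these values the cumulative probability reaches roughly 0.95 at 31 trials, in line with the example in the text, and the powers of 4 reproduce the ~1 kb and ~16 kb figures for equal nucleotide frequencies.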

Table 1.
Program usage and output
Input to the program consists of a sequence file and the following parameters: (i) alignment weights for match, mismatch and indels; (ii) pM and pI; (iii) a minimum size for patterns to report; (iv) a minimum alignment score to report. We have developed a web based interface for the program. Using an HTML form at c3.biomath.mssm.edu/trf.html, the user provides an input DNA sequence file. Defaults can be used for the remaining parameters. After program execution, two files are returned. The first is a summary table describing the location and statistical properties of the tandem repeats found. The second contains the alignment of each repeat with its consensus sequence. The files are linked so that selecting an entry from the table opens a second browser window which contains the proper alignment. The summary table includes the following information: (i) indices of the repeat in the sequence; (ii) period size; (iii) number of copies aligned with the consensus pattern; (iv) size of the consensus pattern (may differ from the period size); (v) percent of matches between adjacent copies overall; (vi) percent of indels between adjacent copies overall; (vii) alignment score; (viii) percent composition for each of the four nucleotides; (ix) entropy measure based on percent composition.
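Items (viii) and (ix) of the summary table can be illustrated with a small Python sketch. The text does not state the exact entropy formula, so the sketch assumes the Shannon entropy in bits of the four nucleotide frequencies, which ranges from 0 to 2. The function name `composition_and_entropy` is hypothetical.

```python
import math
from collections import Counter


def composition_and_entropy(seq: str):
    """Return (composition, entropy): the fraction of A, C, G and T in seq,
    and the Shannon entropy in bits of those fractions (an assumed
    definition of the 'entropy measure')."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    composition = {base: counts[base] / total for base in "ACGT"}
    entropy = -sum(f * math.log2(f) for f in composition.values() if f > 0)
    return composition, entropy


comp, h = composition_and_entropy("GAAGAAGAAGAA")
print(comp["A"], comp["G"])   # about 0.667 and 0.333
print(round(h, 3))            # about 0.918 bits
```

Under this assumed definition, a low value flags a compositionally simple repeat such as the GAA triplet, while a value near 2 indicates that all four nucleotides are used about equally.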
RESULTS
To demonstrate the capabilities of our program, we used it to analyze four sequences: the human frataxin gene sequence (Friedreich's ataxia) (5), the human β T cell receptor locus sequence (37) and two yeast chromosomes (I and VIII). [The frataxin gene sequence and the human β T cell receptor sequences were obtained from GenBank. The yeast chromosome sequences were obtained via ftp from ftp.ebi.ac.uk directory pub/databases/yeast in files chri_230209.ascii and chrviii_562638.ascii. Indexing in this paper is relative to the sequences in these files. Data file accession numbers for these sequences are: frataxin gene promoter and intron 1, U43748; human T cell receptor, L36092; yeast chromosome 1, U12980, L20125, L05146, L22015, L28920; yeast chromosome 8, U11583, U11582, U11581, U10555, U10400, U10399, U00062, U00061, U10556, U00060, U00059, U10398, U10397, U00027, U00028, U00030, U00029.] In our analysis, we searched for all pattern sizes between 1 and 500 bases (the implementation's current upper size limit, to be extended in subsequent versions). We used one of two sets of alignment parameters (match, mismatch, gap), either (+2,-7,-7) or (+2,-5,-7). Only those repeats scoring at least 50 with these parameters are reported. Occasionally, the same repeat is reported at different pattern sizes. We have omitted these redundancies.
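To give a feel for the score cut-off, the following Python sketch scores a gapped pairwise alignment column by column with the (match, mismatch, gap) weights quoted above. It is a minimal illustration that assumes a simple per-column charge for each indel character; the program's own alignment routine is not reproduced here and the name `alignment_score` is hypothetical.

```python
def alignment_score(row_a: str, row_b: str,
                    match: int = 2, mismatch: int = -7, gap: int = -7) -> int:
    """Score two aligned rows of equal length, using '-' for an indel.

    Every column contributes independently: +match for identical
    characters, mismatch for differing characters, gap for an indel.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            score += gap
        elif a == b:
            score += match
        else:
            score += mismatch
    return score


print(alignment_score("A" * 25, "A" * 25))          # 50: 25 matches reach the cut-off
print(alignment_score("ACGTAC", "ACTTAC", 2, -5))   # 5: five matches, one mismatch
print(alignment_score("ACG-T", "ACGAT"))            # 1: four matches, one indel
```

Under this scheme a perfect alignment needs 25 matching columns to reach a score of 50, and each mismatch or indel at weight -7 must be offset by four additional matches.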
We performed two searches on each sequence, using different conservation parameter values, (pM = 0.75, pI = 0.20) and (pM = 0.80, pI = 0.10). While the first search is slower than the second, the detected repeats are nearly identical. Table 2 shows running times of the program and the tables in the following subsections list the tandem repeats found.
Table 2.
Human frataxin gene (Friedreich's ataxia), intron 1
Friedreich's ataxia is one of the triplet repeat diseases (5). It is caused by copy number expansion of the triplet GAA in the first intron of the frataxin gene. Table 4 lists the repeats found in the sequence. Besides the triplet repeat, our program found two others which were apparently unknown, a 44 bp pattern and a 14 bp pattern. Figure
Figure 5. The program's alignment of the 44 bp repeat from the frataxin gene intron 1 (Friedreich's ataxia). This repeat was apparently unknown. The actual sequence is on the top; the consensus sequence is on the bottom. Each pair of lines represents one period. Position of the beginning of the repeat is relative to the detected pattern when the criteria were met and is therefore arbitrary. Symbol * indicates a mismatch. Summary refers to matches, mismatches and indels between adjacent copies in the sequence, not between the sequence and the consensus pattern.
Table 3.
Human β T cell receptor locus sequence
This sequence (37) contains a family of immune recognition coding elements, the T cell receptor variable, diversity, joining and constant gene segments. It was selected for its size and because many tandem repeats within the sequence had already been identified. Table 5 lists the new repeats we found. Of the 83 repeats that we found, 38 were previously annotated and most of those were for patterns of size 5 or smaller. We missed 6 annotated repeats: 4 dinucleotide repeats and 1 tetranucleotide repeat (alignment scores were below our cut-off) and 1 repeat with period size 10 567 bases (beyond the current implementation's pattern upper size limit). Of the 45 unannotated repeats, 13 have short patterns (2-6 bp) and may be polymorphic and thus useful for linkage analysis. Six unannotated repeats have large pattern sizes (116, 65, 52, 49, 34 and 30 bp). The 116 base pattern is also reported at size 39 with a lower scoring alignment. The annotated 60 base pattern repeat (indices 12596-13266) is indicative of the program's ability to find repeats with substantial amounts of mutation between adjacent copies (74% matching characters and 7% indels overall).
Table 5.
Yeast chromosomes
Tables 6 and 7 list the tandem repeats found for the yeast sequences. Of special interest are the clusters of tandem repeats which show up repeatedly at the ends of the chromosomes, suggesting recent swapping of the ends. Chromosome 8, in particular, has two different clusters on its right end.
Table 6.
Table 7.
The (27, 21, 48, 15, 135) cluster
The FLO1 gene and its paralogous pseudogenes in chromosomes 1 and 8 contain a cluster of 5 tandem repeats with pattern sizes 27, 21, 48, 15 and 135. We designate these clusters C1 and C2 (adjacent on the left end of chromosome 1), C3 (right end, opposite strand) and C4 (right end of chromosome 8). The 27, 48 and 135 base patterns are not reported in every cluster in Tables 6 and 7. Subsequent analysis of the surrounding sequences, however, revealed that every pattern is present but not necessarily as two or more copies (Table 3). For each pattern size, the number of copies varies among the four clusters. More specifically, no cluster is identical in its copy number to any other cluster. This implies that duplication or excision events (deletion of copies) have occurred since the separate clusters were incorporated into the chromosomes. The sequences around these clusters also reveal close homology. For example, C3 and C4 are nearly identical over 18 000 bases and C2 and C3 display homology over 15 000 bases.
The (13, 10, 36) cluster
A cluster of 3 tandem repeats with pattern sizes 13, 10 and 36 bases appears on both ends of chromosome 8 (Table 7). The 36 bp pattern also appears on the left end (low index numbers) of chromosome 6 (not shown). For the 36 bp pattern, each occurrence has a different copy number. The 10 and 13 bp patterns are identical in their occurrences. Surrounding sequences comprising 4200 bases are nearly identical for these three clusters.
CONCLUSION
In this paper, we have presented a new algorithm for finding tandem repeats in DNA sequences without the need to specify either the pattern or pattern size. The algorithm is based on the detection of k-tuple matches. It uses a probabilistic model of tandem repeats and a collection of statistical criteria based on that model. We have demonstrated the speed and utility of the algorithm by analyzing four sequences ranging in size up to 700 kb. Several avenues for future research are raised by this work, including methods to estimate statistical significance for tandem repeats and algorithms to determine plausible mutational histories.
Statistical issues
We have yet to develop a good statistical significance measure for tandem repeats. For now, we use a cut-off alignment score based on simulations with random sequences. Difficulties include the local variation in nucleotide content in real sequences, which is decidedly non-random, and the problem of accounting for copy number as well as total repeat length. Estimates of significance developed in Benson and Waterman (27) are too high in this application because they apply to tandem repeats of one pattern size only, rather than the range of sizes considered here.
Mutational history
Analyzing the mutational history of tandem repeats requires utilizing the pattern of mutations among adjacent copies to describe the interwoven progression of substitutions, indels and duplication/excision events leading from a single copy of the pattern to the present day sequence. Such histories can suggest how the boundaries and size of the duplication unit vary and may reveal details about the duplication mechanism.
ACKNOWLEDGEMENTS
The author would like to thank Xiaoping Su for his help in analyzing the sum of heads criterion, Astrid Jervis for her help in simulating many of the statistical measures and examining the program output and Lan Dong for her help with some of the programming and examining the output. Thanks also to Mike Waterman, Richard Arratia and Rolf Backofen for helpful discussions. This work was partially supported by NSF grant CCR-9623532.
REFERENCES
Last modification: 23 Dec 1998
Copyright © Oxford University Press, 1998.