Nucleic Acids Research Pages 573-580  


Tandem repeats finder: a program to analyze DNA sequences
Introduction
Methods
   Probabilistic model of tandem repeats
   Program outline
   Program usage and output
Results
   Human frataxin gene (Friedreich's ataxia), intron 1
   Human β T cell receptor locus sequence
   Yeast chromosomes
   The (27, 21, 48, 15, 135) cluster
   The (13, 10, 36) cluster
Conclusion
   Statistical issues
   Mutational history
Acknowledgements
References



Gary Benson*

Department of Biomathematical Sciences, Mount Sinai School of Medicine, New York, NY 10029-6574, USA

Received September 10, 1998; Revised and Accepted November 12, 1998

ABSTRACT

A tandem repeat in DNA is two or more contiguous, approximate copies of a pattern of nucleotides. Tandem repeats have been shown to cause human disease, may play a variety of regulatory and evolutionary roles and are important laboratory and analytic tools. Extensive knowledge about pattern size, copy number, mutational history, etc. for tandem repeats has been limited by the inability to easily detect them in genomic sequence data. In this paper, we present a new algorithm for finding tandem repeats which works without the need to specify either the pattern or pattern size. We model tandem repeats by percent identity and frequency of indels between adjacent pattern copies and use statistically based recognition criteria. We demonstrate the algorithm's speed and its ability to detect tandem repeats that have undergone extensive mutational change by analyzing four sequences: the human frataxin gene, the human β T cell receptor locus sequence and two yeast chromosomes. These sequences range in size from 3 kb up to 700 kb. A World Wide Web server interface at c3.biomath.mssm.edu/trf.html has been established for automated use of the program.

INTRODUCTION

DNA molecules are subject to a variety of mutational events. One of the less well understood is tandem duplication, in which a stretch of DNA, which we call the pattern, is converted into two or more copies, each following the preceding one in a contiguous fashion. For example, we could have

. . . TCGGA . . .  →  . . . TCGGCGGCGGA . . .

in which the single occurrence of the triplet CGG has been transformed into three identical, adjacent copies. The result of a tandem duplication event is termed a tandem repeat. Over time, individual copies within a tandem repeat may undergo additional, uncoordinated mutations so that, typically, only approximate tandem copies are present.

Tandem repeats are presumed to occur frequently in genomic sequences, comprising perhaps 10% or more of the human genome. However, accurate characterization of the properties of tandem repeats has been limited by the inability to easily detect them. In recent years, the discovery of the trinucleotide repeat diseases has piqued interest in tandem repeats. These diseases, including fragile-X mental retardation (1), Huntington's disease (2), myotonic dystrophy (3), spinal and bulbar muscular atrophy (4) and Friedreich's ataxia (5), are the result of a dramatic increase in the number of copies of a trinucleotide pattern. In afflicted individuals, the copy number has been amplified from the normal range of tens of copies to hundreds or thousands, resulting in the disease. It has been suggested that the repeats themselves produce unusual physical structures in the DNA, causing polymerase slippage and the resulting amplification (6,7).

A more salubrious potential role for tandem repeats is gene regulation, in which the repeats may interact with transcription factors, alter the structure of the chromatin or act as protein binding sites (8-12). Tandem repeats have an apparent function in the development of immune system cells. Breakpoints for immunoglobulin heavy chain switch recombination occur within tandem repeats preceding the heavy chain constant region genes (13). Because the number of copies in any specific tandem repeat is often polymorphic in the population, tandem repeats have proven useful in linkage analysis and DNA fingerprinting (14,15). Recent studies of allele diversity at tandem repeat loci have provided support for the 'Out of Africa' hypothesis of modern human evolution (16,17).

To date, much of the research on tandem repeats has focused on those with short patterns (2-5 nt), presumably because such repeats are relatively easy to spot by eye in printed sequences. Repeats with long patterns (sometimes called variable number of tandem repeats or VNTRs) are notoriously harder to detect [even when the copies are identical, for example see Benson (18) for the 101 bp repeats undetected in Hellman et al. (19), a paper on the role of tandem repeats as hot spots for recombination]. Given the importance of known and potential biological roles for tandem repeats and their usefulness in other biological studies, it seemed essential to us to develop an efficient and sensitive algorithm for detecting these repeats so that they may receive further study.

A number of algorithms already exist which either directly or indirectly detect tandem repeats. All suffer from significant limitations. One group of algorithms is based on computing alignment matrices (20-22). Their primary limitation is excessive running time. The best algorithm in this group (22) has time complexity O(n^2 polylog(n)) for a sequence of length n and would not be useful for sequences much longer than several thousand bases. (In this paper we report on our analysis of sequences up to 700 kb in length.)

Another group of algorithms finds tandem repeats indirectly using methods from the field of data compression. An algorithm by Milosavljevic and Jurka (23) detects 'simple sequences', i.e. mixtures of fragments that occur elsewhere. Simple sequences may or may not contain tandem repeats and this algorithm makes no attempt to deduce a repeated pattern. An algorithm by Rivals et al. (24) bases the compression on the presence of small preselected patterns (all those of size 1-3) and is not readily generalized to longer patterns for which there is an algorithmic need. To their credit, both of these methods provide a measure of statistical significance based on the amount of compression.

Another collection of algorithms aims more directly at finding tandem repeats. Of these, one exact algorithm (25) is limited by its definition of approximate patterns, requiring that two copies differ either by k or fewer substitutions (Hamming distance) or by k or fewer substitutions and indels (unit cost edit distance). Besides treating substitutions and indels as equals, the requirement for a fixed number of differences rather than a percentage difference is unsatisfactory. Any fixed number of differences suitable for small patterns (say five differences for patterns of size 20) would be unreasonably restrictive for larger patterns (five differences for patterns of size 100). Conversely, any fixed number for large patterns would allow too much variability in small patterns. A heuristic algorithm by Karlin et al. (26) is similarly hampered by the use of matching blocks separated by error blocks of fixed size. The remaining two algorithms in this group require input from the user, which limits their usefulness. An earlier heuristic algorithm by Benson (27) finds tandem repeats only if they have a pattern size which is specified in advance. An exact algorithm by Myers and Sagot (28) (limited to patterns with size of at most 40 bases) requires that the approximate pattern size and a range for the number of copies be specified.

The algorithm (29) presented in this paper is designed to overcome many of the aforementioned limitations: (i) it uses the method of k-tuple matching to avoid the need for full scale alignment matrix computations; (ii) it requires no a priori knowledge of the pattern, pattern size or number of copies; (iii) there are no restrictions on the size of the repeats that can be detected; (iv) it uses percentage differences between adjacent copies and treats substitutions and indels separately; (v) it determines a consensus pattern for the smallest repetitive unit in the tandem repeat. The program has already been used as a preprocessor in a new alignment algorithm where tandem duplication augments the standard mutation set of insertion, deletion and substitution (18).

A number of ideas incorporated into this new algorithm have been utilized in earlier homology detection programs (30,31), yet the goals and methods differ. Instead of looking for highest scoring homologous regions, the algorithm looks for tandem repeats which are often hidden in larger homologous regions or which may fall well below the level of significance required for other programs to report a match. The detection criteria are based on a stochastic model of tandem repeats specified by percent identity and frequency of insertions and deletions, rather than some minimal alignment score. Finally, the program aligns repeat copies against a consensus sequence, revealing patterns of common mutations. These patterns yield insight into the history of duplications that produced the tandem repeat, thus providing a potentially valuable tool for phylogenetic research.

The remainder of this paper is organized as follows. In Methods we present a probabilistic model of tandem repeats, an algorithm overview and the set of criteria that guide the recognition process. In Results we present our analysis of the frataxin (Friedreich's ataxia) gene sequence, the human β T cell receptor locus and two yeast chromosomes. Finally, in the Conclusion we describe directions for future research.

METHODS

Probabilistic model of tandem repeats

We model alignment of two tandem copies of a pattern of length n by a sequence of n independent Bernoulli trials (coin tosses). The probability of success, P(heads), which we also call pM or matching probability, represents the average percent identity between the copies. Each head in the Bernoulli sequence is interpreted as a match between aligned nucleotides. Each tail is a mismatch, insertion or deletion. A second probability, pI or indel probability, specifies the average percentage of insertions and deletions between the copies. Figure 1 illustrates the underlying idea for the model.


Figure 1. Two adjacent copies from a tandem repeat in the human β T cell receptor locus sequence (37). H indicates a match, T indicates a mismatch, insertion or deletion.

While Figure 1 interprets a particular alignment as a Bernoulli sequence, we are more generally interested in the distribution of Bernoulli sequences, and the properties of the alignments they represent, for a specific pair (pM, pI), for example (pM = 0.80, pI = 0.10). Note that these conservation parameters serve as a type of extremal bound, i.e. as a quantitative description of the most divergent copies we hope to detect.
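To make the model concrete, the following Python sketch translates a gapped alignment of two adjacent copies into an H/T string like the one in Figure 1 and estimates (pM, pI) from it. It is illustrative only: the function names are hypothetical, gaps are assumed to be written as '-', and both probabilities are measured as fractions of alignment columns, which is one simple convention rather than necessarily the one used by the program.

```python
def alignment_to_bernoulli(top: str, bottom: str) -> str:
    """Translate a gapped pairwise alignment into an H/T string.

    'H' marks identical aligned nucleotides; 'T' marks a mismatch,
    insertion or deletion. Gaps are assumed to be written as '-'.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    return "".join("H" if a == b and a != "-" else "T"
                   for a, b in zip(top, bottom))


def estimate_pm_pi(top: str, bottom: str) -> tuple[float, float]:
    """Estimate (pM, pI) as fractions of alignment columns (one simple convention)."""
    ht = alignment_to_bernoulli(top, bottom)
    indels = sum(1 for a, b in zip(top, bottom) if "-" in (a, b))
    return ht.count("H") / len(ht), indels / len(ht)


# Two hypothetical adjacent copies: one substitution (column 4), one indel (column 9).
top, bottom = "ACGTTAGC-A", "ACGATAGCTA"
print(alignment_to_bernoulli(top, bottom))  # HHHTHHHHTH
print(estimate_pm_pi(top, bottom))          # (0.8, 0.1)
```

With one substitution and one indel in ten columns, the hypothetical pair sits exactly at the (pM = 0.80, pI = 0.10) bound used as an example above.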

Program outline

Our program has detection and analysis components. The detection component uses a set of statistically based criteria to find candidate tandem repeats. The analysis component attempts to produce an alignment for each candidate and if successful gathers a number of statistics about the alignment (percent identity, percent indels) and the nucleotide sequence (composition, entropy measure).

Detection component. We assume that adjacent copies of any pattern will contain some matching characters in corresponding positions. Just how many matches and how the distance between those matches should vary depend on the fixed values of pM and pI. In the next section, we develop the statistical criteria to answer these questions. Here, we describe how the matches are detected.

The algorithm looks for matching nucleotides separated by a common distance d, which is not specified in advance. For reasons of efficiency it looks for runs of k matches, which we call k-tuple matches. A k-tuple is a window of k consecutive characters from the nucleotide sequence. Matching k-tuples are two windows with identical contents and if aligned in the Bernoulli model would produce a run of k heads. Because we limit ourselves to k-tuple matches, we will not detect all matching characters. For example, if k = 6 and two windows contain TCATGT and TCTTGT we will not know that there are 5 matching characters because the window contents are not identical. Put in terms of the Bernoulli model, the aligned windows would be represented by the sequence HHTHHH, which is not a run of 6 heads.
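A small sketch of the TCATGT/TCTTGT example, assuming no indels inside the window so that characters can be compared position by position (the helper names are hypothetical). A window pair contributes a k-tuple match only when its H/T string contains a run of k heads.

```python
def window_bernoulli(w1: str, w2: str) -> str:
    """Position-by-position H/T comparison of two equal-length windows (no indels)."""
    return "".join("H" if a == b else "T" for a, b in zip(w1, w2))


def ktuple_starts(ht: str, k: int) -> list[int]:
    """Start offsets of every run of k consecutive heads, i.e. every k-tuple match."""
    return [s for s in range(len(ht) - k + 1) if ht[s:s + k] == "H" * k]


ht = window_bernoulli("TCATGT", "TCTTGT")
print(ht, ht.count("H"))     # HHTHHH 5
print(ktuple_starts(ht, 6))  # [] : no 6-tuple match despite 5 matching characters
print(ktuple_starts(ht, 3))  # [3]
```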

The basic operation of the detection component is illustrated in Figure 2. Let S be a nucleotide sequence. We select a small integer k for the tuple or window size (k = 5, for example) and keep a list of all possible strings of length k (there are 4^k for the DNA alphabet A, C, G, T), which we call the probes. By sliding the window across the sequence, we determine the probe at each position i in S. For each probe p, we maintain a history list H_p of the positions at which p occurs.


Figure 2. Tandem repeats are detected by scanning the sequence with a small window, determining the distance between exact matches and testing the statistical criteria.

When a position i is added to H_p, we scan H_p for all earlier occurrences of p. Let one earlier occurrence be at j. Since i and j are the indices of matching k-tuples, the distance d = i - j is a possible pattern size for a tandem repeat. For the criteria tests, we need information about other k-tuple matches at the same distance d where the leading tuple occurs in the sequence between j and i. A distance list D_d stores this information. It can be thought of as a sliding window of length d which keeps track of the positions of matches and their total.
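The history-list scan can be sketched as a generator that reports every candidate distance d = i - j. This is an illustration under stated assumptions, not the program's implementation: positions are 0-based window starts, a dictionary holds only the probes that actually occur rather than all 4^k, and max_d is a hypothetical cap on the pattern size (the analysis in Results searched pattern sizes up to 500).

```python
from collections import defaultdict
from typing import Iterator


def candidate_distances(seq: str, k: int, max_d: int) -> Iterator[tuple[int, int, int]]:
    """Yield (i, j, d) for each pair of matching k-tuples at positions j < i, d = i - j.

    history[p] plays the role of the history list H_p.
    """
    history: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        probe = seq[i:i + k]
        # Scan earlier occurrences from the most recent back; stop once d exceeds max_d.
        for j in reversed(history[probe]):
            d = i - j
            if d > max_d:
                break
            yield i, j, d
        history[probe].append(i)


print(list(candidate_distances("ACGTACGTACGT", k=4, max_d=8)))
# [(4, 0, 4), (5, 1, 4), (6, 2, 4), (7, 3, 4), (8, 4, 4), (8, 0, 8)]
```

In this three-copy repeat of ACGT, every probe recurs at distance 4, and the first probe also recurs at distance 8, which is why a repeat can surface at more than one candidate pattern size.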

List D_d is updated every time a match at distance d is detected. Position i of the match is stored on the list and the total is increased. The right end of the window is set to i, and matches that occurred before j = i - d are dropped from the list and subtracted from the total. Lists for other nearby distances are also updated at this time (see Random walk distribution in the next section), but only to reset their right ends to i and remove matches that have been passed by the advancing windows. Information in the updated distance lists is used for the sum of heads and apparent size criteria tests, as described in the next section. If both tests are passed, the program moves on to the analysis component.
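A minimal sketch of one distance list D_d, assuming a deque of leading-tuple positions. The class and method names are hypothetical. The text does not say whether the running total counts tuples or matched characters, so the sketch keeps the tuple count and can also report the number of matched characters covered, which is the unit the sum of heads criterion is stated in.

```python
from collections import deque


class DistanceList:
    """Sketch of a distance list D_d: a sliding window of length d over match positions."""

    def __init__(self, d: int) -> None:
        self.d = d
        self.positions: deque[int] = deque()  # leading-tuple positions, increasing
        self.total = 0                        # number of k-tuple matches in the window

    def advance(self, i: int) -> None:
        """Set the right end to i and drop matches that occurred before i - d."""
        while self.positions and self.positions[0] < i - self.d:
            self.positions.popleft()
            self.total -= 1

    def add_match(self, i: int) -> None:
        """Record a k-tuple match at distance d whose leading tuple is at position i."""
        self.advance(i)
        self.positions.append(i)
        self.total += 1

    def span(self) -> int:
        """Distance between the first and last stored tuple (apparent size test)."""
        return self.positions[-1] - self.positions[0] if self.positions else 0

    def heads_covered(self, k: int) -> int:
        """Matched characters covered by the stored k-tuples (sum of heads test)."""
        covered, end = 0, -1
        for p in self.positions:
            covered += p + k - max(p, end)  # count only the part not already covered
            end = p + k
        return covered


dl = DistanceList(d=10)
for pos in (3, 4, 5, 12):
    dl.add_match(pos)
print(dl.total, dl.span(), dl.heads_covered(k=5))  # 4 9 12
dl.add_match(15)                    # window now covers 5..15, so 3 and 4 are dropped
print(list(dl.positions), dl.total)  # [5, 12, 15] 3
```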

Statistical criteria. The statistical criteria are based on runs of heads in Bernoulli sequences, corresponding to matches detected with the k-tuples and stored in the distance lists. The criteria are based on four distributions which depend upon: (i) the pattern length, d; (ii) the matching probability, pM; (iii) the indel probability, pI; (iv) the tuple size, k. For each distribution, we either calculate it with a formula or estimate it using simulation. Then, we select a cut-off value that serves as our criterion. Below we describe the distributions and criteria in more detail.

Sum of heads distribution. This distribution indicates how many matches are required. Let the random variable R_{d,k,pM} be the total number of heads in head runs of length k or longer in an iid Bernoulli sequence of length d with success probability pM. The distribution of R_{d,k,pM} is well approximated by the normal distribution, and we have previously shown that its exact mean and variance can be calculated in constant time (32). For the sum of heads criterion, we use the normal distribution to determine the largest number, x, such that 95% of the time R_{d,k,pM} ≥ x. For example, if pM = 0.75, k = 5 and d = 100, then the criterion is 26. Put another way, if a pattern has length 100 and aligned copies are expected to match in 75 positions, then by counting only matches that fill a window of length 5, we expect to count at least 26 matches 95% of the time.
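The criterion can be checked numerically. The sketch below substitutes Monte Carlo simulation for the normal approximation with exact moments used in the text (the closed-form mean and variance of (32) are not reproduced here), so for pM = 0.75, k = 5 and d = 100 it should land near the quoted value of 26 but need not hit the same integer. Function names, the trial count and the seed are arbitrary choices.

```python
import random


def heads_in_long_runs(tosses: list[bool], k: int) -> int:
    """R: total heads lying in runs of k or more consecutive heads."""
    total = run = 0
    for head in tosses:
        if head:
            run += 1
        else:
            if run >= k:
                total += run
            run = 0
    return total + (run if run >= k else 0)  # close a run that reaches the end


def sum_of_heads_criterion(d: int, k: int, pm: float,
                           trials: int = 10_000, seed: int = 1) -> int:
    """Empirical largest x with P(R >= x) >= 0.95 (Monte Carlo, not the normal approximation)."""
    rng = random.Random(seed)
    samples = sorted(
        heads_in_long_runs([rng.random() < pm for _ in range(d)], k)
        for _ in range(trials)
    )
    # At least 95% of the sorted samples lie at or above index trials // 20.
    return samples[trials // 20]


print(sum_of_heads_criterion(d=100, k=5, pm=0.75))
```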

Random walk distribution. This distribution describes how distances between matches may vary due to indels. Because indels change the distance between matching k-tuples (Fig. 3), there will be situations where the pattern has size d, yet the distance between matching k-tuples is d ± 1, d ± 2, etc. In order to test the sum of heads criterion, we count the matches in D_{d ± Δd}, for Δd = 0, 1, ..., Δd_max for some Δd_max. In our model, indels are single nucleotide events occurring with probability pI. Insertions and deletions are considered equally likely and we treat the distance change as a problem of random walks. Let the random variable W_{d,pI} be the maximum displacement from the origin of a one-dimensional random walk with expected number of steps equal to pI·d. It can be shown (33) that 95% of the time, W_{d,pI} ranges between ±2.24√(pI·d). We set Δd_max = 2.24√(pI·d). For example, if pI = 0.1 and d = 100, then Δd_max = 7.
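The following sketch computes Δd_max from the bound 2.24√(pI·d) and compares it against a direct simulation of the indel walk. The constant 2.24 is approximately the 95% point of the maximum absolute displacement of Brownian motion on a unit interval; truncating to an integer is our assumption, chosen because it reproduces the example value of 7. Because 2.24√n is a large-sample approximation, a simulation with only about ten expected steps may come out slightly lower. Function names, trial count and seed are hypothetical.

```python
import math
import random


def delta_d_max(d: int, p_i: float) -> int:
    """Band half-width, assuming the bound 2.24 * sqrt(pI * d), truncated to an integer."""
    return int(2.24 * math.sqrt(p_i * d))


def simulated_max_displacement(d: int, p_i: float,
                               trials: int = 10_000, seed: int = 1) -> int:
    """Empirical 95th percentile of the maximum |displacement| of the indel random walk."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(trials):
        pos = peak = 0
        for _ in range(d):
            if rng.random() < p_i:                      # an indel at this position
                pos += 1 if rng.random() < 0.5 else -1  # insertion or deletion, equally likely
                peak = max(peak, abs(pos))
        maxima.append(peak)
    maxima.sort()
    return maxima[(trials * 95) // 100]


print(delta_d_max(d=100, p_i=0.1))  # 7
print(simulated_max_displacement(d=100, p_i=0.1))
```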


Figure 3. Insertions and deletions change the distance between exact matches. The inserted character X causes one pair of matching k-tuples to be separated by distance d + 1 while another pair is separated only by distance d.

Apparent size distribution. This distribution is used to distinguish between tandem repeats and non-tandem direct repeats (Fig. 4). For tandem repeats, the leading tuples in matching k-tuples will be distributed throughout the interval from j to i, whereas for non-tandem repeats, they should be concentrated on the right side of the interval near i. Let the random variable S_{d,k,pM} be the distance between the first and last run of k heads in an iid Bernoulli sequence of length d with success probability pM. S_{d,k,pM} is the apparent size of the repeat when using k-tuples to find the matches and will usually be shorter than the pattern size d. We estimate the distribution of S_{d,k,pM} by simulation because we make it conditional on first meeting the sum of heads criterion. For given d, k and pM, random Bernoulli sequences are generated using pM. For every sequence that meets or exceeds the sum of heads criterion, the distance between the first and last run of heads of length k or larger is recorded. From the distribution, we determine the maximum number y such that 95% of the time S_{d,k,pM} > y. We use y as our apparent size criterion. For example, if pM = 0.75, k = 5 and d = 100, then the criterion is 56. In order to test the apparent size criterion, we compute the distance between the first and last tuple on list D_d. If the distance between the tuples is smaller than the criterion, we assume the repeat is not tandem or that we have not yet seen enough of it to be convinced.
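The simulation just described can be sketched as follows. The sketch assumes that the distance between the first and last run of k heads is measured between the start offsets of the first and last k-head windows; the text does not pin this down, so results may differ from 56 by a few positions. Only sequences that meet the sum of heads criterion x are kept, and the helper names, trial count and seed are hypothetical.

```python
import random


def run_starts(tosses: list[bool], k: int) -> list[int]:
    """Start offsets of every window of k consecutive heads (each is one k-tuple match)."""
    starts, run = [], 0
    for idx, head in enumerate(tosses):
        run = run + 1 if head else 0
        if run >= k:
            starts.append(idx - k + 1)
    return starts


def covered(starts: list[int], k: int) -> int:
    """Heads covered by the union of the k-windows (equals R, the sum of heads)."""
    total, end = 0, -1
    for s in starts:
        total += s + k - max(s, end)
        end = s + k
    return total


def apparent_size_criterion(d: int, k: int, pm: float, x: int,
                            trials: int = 10_000, seed: int = 1) -> int:
    """Empirical largest y with P(S > y) >= 0.95, conditional on R >= x."""
    rng = random.Random(seed)
    spans = []
    for _ in range(trials):
        starts = run_starts([rng.random() < pm for _ in range(d)], k)
        if starts and covered(starts, k) >= x:    # keep only samples passing sum of heads
            spans.append(starts[-1] - starts[0])  # first-to-last leading tuple
    spans.sort()
    return spans[len(spans) // 20] - 1  # S > y  is the same as  S >= y + 1


print(run_starts([True] * 6 + [False] + [True] * 5, 5))  # [0, 1, 7]
print(apparent_size_criterion(d=100, k=5, pm=0.75, x=26))
```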


Figure 4. We must distinguish between (a) a tandem repeat (leading tuples in k-tuple matches spread over the interval between i and j) and (b) a non-tandem, direct repeat (leading tuples concentrated on the right). Matching k-tuples are indicated by the shaded boxes. w is the distance between the first and last leading tuple.

Waiting time distribution. This distribution is used to pick tuple sizes. Tuple size has a significant inverse effect on the running time of the program because increasing tuple size causes an exponential decrease in the expected number of tuple matches. If the nucleotides occur with equal frequency, then increasing the tuple size by Δk increases the average distance between randomly matching tuples by a factor of 4^Δk. If k = 5, the average distance between random matches is ~1 kb (4^5 = 1024), but if k = 7, the average distance is ~16 kb (4^7 = 16 384). Thus, by using a larger tuple size, we keep the history lists short. On the other hand, increasing the tuple size decreases the chance of noticing approximate copies because they may not contain a long, unbroken run of matches. Let the random variable T_{k,pM} be the number of iid Bernoulli trials with success probability pM until the first occurrence of a run of k successes. T_{k,pM} follows the geometric distribution of order k. If we let p = pM and q = 1 - p, then the exact probability P(T_{k,pM} = x) for x ≥ 0 is given by the recursive formula (34)

$$
P(T_{k,p_M} = x) =
\begin{cases}
0, & 0 \le x < k,\\
p^{k}, & x = k,\\
q\,p^{k}\Bigl[1 - \sum_{i=0}^{x-k-1} P(T_{k,p_M} = i)\Bigr], & x > k.
\end{cases}
$$

For example, if pM = 0.75 and k = 5 then we need at least 31 trials (coin tosses) to have a 95% chance of seeing a run of 5 heads. For patterns smaller than 31 characters, we need to use a smaller k-tuple. The waiting time distribution allows us to balance the running time and sensitivity of our algorithm by picking a set of tuple sizes, each applying to a different range of pattern sizes. The program processes the sequence once, simultaneously checking these different tuple sizes. We require that the smallest pattern for tuple size k have a sum of heads criterion of at least k + 1. Table 1 shows the range of tuple sizes and the corresponding pattern sizes currently used by the program.
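The waiting time recursion translates directly into code: T = x > k requires a tail at trial x - k, heads on the following k trials, and no run of k heads completed by trial x - k - 1, so P(T = x) = q p^k [1 - P(T ≤ x - k - 1)]. In this sketch (function names hypothetical) the pmf and cdf are built together so that each term can use the running cumulative sum. For pM = 0.75 and k = 5 it gives P(T ≤ 31) ≈ 0.94997, which matches the 95% figure quoted above once rounded; a strict P(T ≤ x) ≥ 0.95 cut-off lands one trial later, at 32.

```python
def waiting_time_distribution(k: int, p: float, x_max: int) -> tuple[list[float], list[float]]:
    """pmf and cdf of T, the trial on which the first run of k heads is completed."""
    q = 1.0 - p
    pmf = [0.0] * (x_max + 1)
    cdf = [0.0] * (x_max + 1)
    for x in range(x_max + 1):
        if x == k:
            pmf[x] = p ** k
        elif x > k:
            # Trial x-k is a tail, the next k are heads, and no run finished by trial x-k-1.
            pmf[x] = q * p ** k * (1.0 - cdf[x - k - 1])
        cdf[x] = (cdf[x - 1] if x > 0 else 0.0) + pmf[x]
    return pmf, cdf


def min_trials(k: int, p: float, level: float = 0.95, x_max: int = 10_000) -> int:
    """Smallest x with P(T <= x) >= level."""
    _, cdf = waiting_time_distribution(k, p, x_max)
    return next(x for x, c in enumerate(cdf) if c >= level)


_, cdf = waiting_time_distribution(k=5, p=0.75, x_max=40)
print(round(cdf[31], 5), round(cdf[32], 5))  # 0.94997 0.95502
print(min_trials(5, 0.75))                   # 32 under a strict >= 0.95 cut-off
```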

Analysis component. If the information in the distance list passes the criteria tests, a candidate pattern consisting of positions j + 1 . . . i is selected from the nucleotide sequence and aligned with the surrounding sequence using wraparound dynamic programming (WDP) (35,36). If at least two copies of the pattern are aligned with the sequence, the tandem repeat is reported. Several implementation details of the analysis component are described below.


Table 1. Tuple sizes and the range of pattern sizes each is used to detect

Multiple reporting of repeat at different pattern sizes. When a single tandem repeat contains many copies, several pattern sizes are possible. For example, if the basic pattern size is 26, then the repeat may be reported at sizes 26, 52, 78, etc. We limit this redundancy in the output to, at most, three pattern sizes. Note that we do not automatically limit the output to the smallest period size because a much better alignment may come from a larger size (for example Table 5, indices 410172-410459).

Narrow band alignment. Alignments are the program's most time-intensive calculations. To decrease running time, we limit WDP calculations to a narrow diagonal band in the alignment matrix for patterns larger than 20 characters. In accordance with the random walk results, the band radius is Δd_max. The band is periodically recentered around a run of matches in the current best alignment.

Consensus pattern and period size. An initial candidate pattern P is drawn from the sequence, but this is usually not the best pattern to align with the tandem repeat. To improve the alignment, we determine a consensus pattern by majority rule from the alignment of the copies with P. The consensus is used to realign the sequence and this final alignment is reported in the output. Period size is defined as the most common matching distance between corresponding characters in the alignment and may not be identical to consensus size.
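Majority-rule consensus can be sketched as a column-wise vote, assuming the copies have already been placed in a common set of alignment columns with '-' for gaps (in the program they are aligned against P by wraparound dynamic programming, which is not reproduced here). The tie-breaking rule and the handling of gap-majority columns are assumptions of this sketch, not documented behavior of the program, and the function name is hypothetical.

```python
from collections import Counter


def majority_consensus(copies: list[str]) -> str:
    """Column-wise majority vote over equal-length gapped copies ('-' marks a gap).

    Ties go to the symbol seen first in the column (Counter.most_common keeps
    first-encountered order for equal counts). Columns whose majority is a gap are dropped.
    """
    if not copies or len({len(c) for c in copies}) != 1:
        raise ValueError("copies must be a non-empty list of equal-length strings")
    consensus = []
    for column in zip(*copies):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


print(majority_consensus(["ACGTA", "ACGTT", "AC-TA"]))  # ACGTA
print(majority_consensus(["ACG-T", "ACGAT", "ACG-T"]))  # ACGT
```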

Program usage and output

Input to the program consists of a sequence file and the following parameters: (i) alignment weights for match, mismatch and indels; (ii) pM and pI; (iii) a minimum size for patterns to report; (iv) a minimum alignment score to report. We have developed a web-based interface for the program. Using an HTML form at c3.biomath.mssm.edu/trf.html, the user provides an input DNA sequence file. Defaults can be used for the remaining parameters. After program execution, two files are returned. The first is a summary table describing the location and statistical properties of the tandem repeats found. The second contains the alignment of each repeat with its consensus sequence. The files are linked so that selecting an entry from the table opens a second browser window which contains the proper alignment. The summary table includes the following information: (i) indices of the repeat in the sequence; (ii) period size; (iii) number of copies aligned with the consensus pattern; (iv) size of the consensus pattern (may differ from the period size); (v) percent of matches between adjacent copies overall; (vi) percent of indels between adjacent copies overall; (vii) alignment score; (viii) percent composition for each of the four nucleotides; (ix) entropy measure based on percent composition.
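The last two summary columns can be illustrated as follows. The text does not define the entropy measure beyond its being based on percent composition; this sketch assumes the Shannon entropy in bits, which ranges from 0 (a single nucleotide) to 2 (equal A, C, G and T). Characters other than A, C, G and T are ignored, which is also an assumption, and the function names are hypothetical.

```python
import math


def percent_composition(seq: str) -> dict[str, float]:
    """Percent of each nucleotide among the A/C/G/T characters of seq."""
    seq = seq.upper()
    n = sum(seq.count(b) for b in "ACGT")
    return {b: (100.0 * seq.count(b) / n if n else 0.0) for b in "ACGT"}


def composition_entropy(seq: str) -> float:
    """Shannon entropy (bits) of the composition: 0 for one base, 2 for equal A/C/G/T."""
    entropy = 0.0
    for percent in percent_composition(seq).values():
        f = percent / 100.0
        if f > 0:
            entropy -= f * math.log2(f)
    return entropy


print(percent_composition("AACG"))  # {'A': 50.0, 'C': 25.0, 'G': 25.0, 'T': 0.0}
print(composition_entropy("AACG"))  # 1.5
```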

RESULTS

To demonstrate the capabilities of our program, we used it to analyze four sequences: the human frataxin gene sequence (Friedreich's ataxia) (5), the human β T cell receptor locus sequence (37) and two yeast chromosomes (I and VIII). [The frataxin gene sequence and the human β T cell receptor sequences were obtained from GenBank. The yeast chromosome sequences were obtained via ftp from ftp.ebi.ac.uk, directory pub/databases/yeast, in files chri_230209.ascii and chrviii_562638.ascii. Indexing in this paper is relative to the sequences in these files. Data file accession numbers for these sequences are: frataxin gene promoter and intron 1, U43748; human T cell receptor, L36092; yeast chromosome 1, U12980, L20125, L05146, L22015, L28920; yeast chromosome 8, U11583, U11582, U11581, U10555, U10400, U10399, U00062, U00061, U10556, U00060, U00059, U10398, U10397, U00027, U00028, U00030, U00029.] In our analysis, we searched for all pattern sizes between 1 and 500 bases (the implementation's current upper size limit, to be extended in subsequent versions). We used one of two sets of alignment parameters (match, mismatch, gap), either (+2, -7, -7) or (+2, -5, -7). Only those repeats scoring at least 50 with these parameters are reported. Occasionally, the same repeat is reported at different pattern sizes. We have omitted these redundancies.

We performed two searches on each sequence, using different conservation parameter values, (pM = 0.75, pI = 0.20) and (pM = 0.80, pI = 0.10). While the first search is slower than the second, the detected repeats are nearly identical. Table 2 shows running times of the program and Tables 4-7 list the tandem repeats found.


Table 2. Running times of program on selected sequences using a Silicon Graphics O2 RS10000
Time grows linearly with sequence length. With conservation parameter values (pM = 0.75, pI = 0.20) running time is ~10 times slower than with values (pM = 0.80, pI = 0.10) although the detected repeats are nearly identical. Alignment weights also affect running time. The most liberal weights tested increase the times shown here by ~50%.

Human frataxin gene (Friedreich's ataxia), intron 1

Friedreich's ataxia is one of the triplet repeat diseases (5). It is caused by copy number expansion of the triplet GAA in the first intron of the frataxin gene. Table 4 lists the repeats found in the sequence. Besides the triplet repeat, our program found two others which were apparently unknown, a 44 bp pattern and a 14 bp pattern. Figure 5 shows the program's alignment of the 44 bp repeat.


Figure 5. The program's alignment of the 44 bp repeat from the frataxin gene intron 1 (Friedreich's ataxia). This repeat was apparently unknown. The actual sequence is on the top; the consensus sequence is on the bottom. Each pair of lines represents one period. Position of the beginning of the repeat is relative to the detected pattern when the criteria were met and is therefore arbitrary. Symbol * indicates a mismatch. Summary refers to matches, mismatches and indels between adjacent copies in the sequence, not between the sequence and the consensus pattern.


Table 3. Varying copy numbers in the four similar tandem repeat clusters found in yeast chromosomes 1 and 8
See text and Tables 6 and 7 for cluster locations.


Table 4. Tandem repeats detected in the human frataxin gene intron 1 sequence

Human β T cell receptor locus sequence

This sequence (37) contains a family of immune recognition coding elements, the T cell receptor variable, diversity, joining and constant gene segments. It was selected for its size and because many tandem repeats within the sequence had already been identified. Table 5 lists the new repeats we found. Of the 83 repeats that we found, 38 were previously annotated and most of those were for patterns of size 5 or smaller. We missed 6 annotated repeats: 4 dinucleotide repeats and 1 tetranucleotide repeat (alignment scores were below our cut-off) and 1 repeat with period size 10 567 bases (beyond the current implementation's pattern upper size limit). Of the 45 unannotated repeats, 13 have short patterns (2-6 bp) and may be polymorphic and thus useful for linkage analysis. Six unannotated repeats have large pattern sizes (116, 65, 52, 49, 34 and 30 bp). The 116 base pattern is also reported at size 39 with a lower scoring alignment. The annotated 60 base pattern repeat (indices 12596-13266) is indicative of the program's ability to find repeats with substantial amounts of mutation between adjacent copies (74% matching characters and 7% indels overall).


Table 5. Tandem repeats detected in human β T cell receptor locus sequence
Not shown are all mononucleotide repeats and those repeats already annotated in the GenBank entries (accession nos L36092, U66059, U66060 and U66061) except for the 60 bp repeat marked with symbol *. Symbol † indicates a pattern which is included even though a longer pattern has a better scoring alignment.

Yeast chromosomes

Tables 6 and 7 list the tandem repeats found for the yeast sequences. Of special interest are the clusters of tandem repeats which appear repeatedly at the ends of the chromosomes, suggesting recent swapping of the ends. Chromosome 8, in particular, has two different clusters on its right end.


Table 6. Tandem repeats detected in yeast chromosome 1
Period sizes in bold indicate similar clusters found at the ends of chromosomes 1 and 8. From the top, these are clusters C1, C2 and C3.


Table 7. Tandem repeats detected in yeast chromosome 8 (only the latter half of the sequence is shown)
Period sizes in bold indicate one of four similar clusters found at the ends of chromosomes 1 and 8. Cluster C4 is shown. Period sizes in italics indicate one of three similar clusters found at both ends of chromosome 8 and one end of chromosome 6.

The (27, 21, 48, 15, 135) cluster

The FLO1 gene and its paralogous pseudogenes in chromosomes 1 and 8 contain a cluster of 5 tandem repeats with pattern sizes 27, 21, 48, 15 and 135. We designate these clusters C1 and C2 (adjacent on the left end of chromosome 1), C3 (right end, opposite strand) and C4 (right end of chromosome 8). The 27, 48 and 135 base patterns are not reported in every cluster in Tables 6 and 7. Subsequent analysis of the surrounding sequences, however, revealed that every pattern is present but not necessarily as two or more copies (Table 3). For each pattern size, the number of copies varies among the four clusters. More specifically, no cluster is identical in its copy number to any other cluster. This implies that duplication or excision events (deletion of copies) have occurred since the separate clusters were incorporated into the chromosomes. The sequences around these clusters also reveal close homology. For example, C3 and C4 are nearly identical over 18 000 bases and C2 and C3 display homology over 15 000 bases.

The (13, 10, 36) cluster

A cluster of three tandem repeats with pattern sizes 13, 10 and 36 bases appears on both ends of chromosome 8 (Table 7). The 36 bp pattern also appears on the left end (low index numbers) of chromosome 6 (not shown). Each occurrence of the 36 bp pattern has a different copy number, whereas the 10 and 13 bp patterns are identical in copy number in all their occurrences. Surrounding sequences comprising 4200 bases are nearly identical for these three clusters.
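Copy numbers such as those compared here can be approximated from a repeat's coordinates. A minimal sketch, assuming 1-based inclusive start and end positions and taking copy number as the length of the repeat region divided by its period size; the helper name `approx_copy_number` and the coordinates in the example are hypothetical.

```python
def approx_copy_number(start, end, period):
    """Approximate copy number of a tandem repeat occupying start..end (1-based, inclusive).

    Assumes copy number ~ (repeat length) / (period size); an alignment-based
    count can differ slightly when the copies contain indels.
    """
    if period <= 0 or end < start:
        raise ValueError("need period > 0 and end >= start")
    return round((end - start + 1) / period, 1)


# Two hypothetical occurrences of a 36 bp pattern with different spans.
print(approx_copy_number(1, 180, 36))    # 5.0
print(approx_copy_number(101, 154, 36))  # 1.5
```

Under this approximation, two occurrences of the same 36 bp pattern with different copy numbers necessarily span different lengths, even when their flanking sequences are nearly identical.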

CONCLUSION

In this paper, we have presented a new algorithm for finding tandem repeats in DNA sequences without the need to specify either the pattern or pattern size. The algorithm is based on the detection of k-tuple matches. It uses a probabilistic model of tandem repeats and a collection of statistical criteria based on that model. We have demonstrated the speed and utility of the algorithm by analyzing four sequences ranging in size up to 700 kb. Several avenues for future research are raised by this work, including methods to estimate statistical significance for tandem repeats and algorithms to determine plausible mutational histories.
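The core idea of k-tuple matching can be illustrated briefly: when a k-tuple recurs at distance d, d is a candidate period, and in a tandem repeat many matches accumulate at the same d. The sketch below is a simplified illustration under stated assumptions, not the program's implementation: the tuple size `k=4`, the window `max_period=500` and the function name are hypothetical choices, and the statistical criteria the program applies to these matches are not reproduced.

```python
from collections import Counter, defaultdict, deque


def ktuple_distance_counts(seq, k=4, max_period=500):
    """Tally k-tuple matches by the distance d = i - j between two occurrences.

    For each position i, every earlier start j of the same k-tuple with
    i - j <= max_period contributes one match at distance d. Distances that
    collect many matches are candidate period sizes.
    """
    seq = seq.upper()
    history = defaultdict(deque)  # k-tuple -> earlier start positions, oldest first
    counts = Counter()
    for i in range(len(seq) - k + 1):
        tup = seq[i:i + k]
        if set(tup) - set("ACGT"):
            continue  # skip tuples containing N or other ambiguity codes
        positions = history[tup]
        while positions and i - positions[0] > max_period:
            positions.popleft()  # too far back to lie within max_period
        for j in positions:
            counts[i - j] += 1
        positions.append(i)
    return counts


counts = ktuple_distance_counts("ACGTTGCA" * 6)
print(counts.most_common(2))  # [(8, 37), (16, 29)]
```

In the example, the true period 8 collects the most matches, but multiples of the period (16, 24, ...) also score, which is one reason raw match counts alone are not enough to decide on a pattern size.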

Statistical issues

We have yet to develop a good statistical significance measure for tandem repeats. For now, we use a cut-off alignment score based on simulations with random sequences. Difficulties include the local variation in nucleotide content in real sequences, which is decidedly non-random, and the problem of accounting for copy number as well as total repeat length. Estimates of significance developed in Benson and Waterman (27) are too high in this application because they apply to tandem repeats of one pattern size only, rather than the range of sizes considered here.
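A cut-off derived from simulations can be sketched as follows: score many random sequences, record the best score each one achieves, and take an upper quantile of those maxima as the threshold. This is a deliberately simplified stand-in, not the procedure used for the program: it scores only ungapped comparisons of each base with the base `lag` positions later, uses assumed weights (+2 match, -7 mismatch), and draws bases independently with a single `gc` fraction. All function and parameter names are hypothetical.

```python
import random


def max_lag_score(seq, lag, match=2, mismatch=-7):
    """Best ungapped local score comparing seq[i] with seq[i + lag].

    A running sum is reset to zero whenever it goes negative (Kadane's method),
    so the result is the highest-scoring stretch of periodicity at this lag.
    """
    best = run = 0
    for i in range(len(seq) - lag):
        run += match if seq[i] == seq[i + lag] else mismatch
        run = max(run, 0)
        best = max(best, run)
    return best


def empirical_cutoff(length, lags, trials=200, quantile=0.99, gc=0.5, seed=0):
    """Empirical quantile of the best lag score over random i.i.d. sequences.

    lags must be non-empty; gc is the assumed G+C fraction of the random sequences.
    """
    rng = random.Random(seed)
    at = (1.0 - gc) / 2
    weights = [at, gc / 2, gc / 2, at]  # probabilities of A, C, G, T
    maxima = []
    for _ in range(trials):
        seq = rng.choices("ACGT", weights=weights, k=length)
        maxima.append(max(max_lag_score(seq, d) for d in lags))
    maxima.sort()
    return maxima[min(int(quantile * trials), trials - 1)]
```

For example, `empirical_cutoff(1000, range(1, 51), gc=0.38)` estimates a threshold for 1 kb windows of AT-rich sequence. Because real sequences vary locally in composition and this model ignores indels and copy number, such a threshold inherits exactly the difficulties described above.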

Mutational history

Analyzing the mutational history of tandem repeats requires utilizing the pattern of mutations among adjacent copies to describe the interwoven progression of substitutions, indels and duplication/excision events leading from a single copy of the pattern to the present day sequence. Such histories can suggest how the boundaries and size of the duplication unit vary and may reveal details about the duplication mechanism.
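Adjacent-copy comparisons are the raw material for such histories. The sketch below is a simplification, not a method from this paper: it assumes the copies have already been extracted and aligned to equal length with no indels, and the helper names are hypothetical. It reports percent identity between neighboring copies and, for each copy, the substitutions it carries relative to the majority consensus; copies that share a variant are candidates for descent from a common duplicated ancestor.

```python
from collections import Counter


def _check_equal_length(copies):
    if len({len(c) for c in copies}) != 1:
        raise ValueError("copies must be pre-aligned to equal length")


def consensus(copies):
    """Majority base in each column of equal-length, pre-aligned copies."""
    _check_equal_length(copies)
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*copies))


def adjacent_identity(copies):
    """Percent identity between each copy and the next one."""
    _check_equal_length(copies)
    return [
        100.0 * sum(x == y for x, y in zip(a, b)) / len(a)
        for a, b in zip(copies, copies[1:])
    ]


def variant_sites(copies):
    """For each copy, the set of (column, base) pairs where it departs from the consensus."""
    cons = consensus(copies)
    return [
        {(i, base) for i, (base, c) in enumerate(zip(copy, cons)) if base != c}
        for copy in copies
    ]


copies = ["ACGT", "ACGT", "ACCT", "ACCT", "ACGT"]
print(adjacent_identity(copies))  # [100.0, 75.0, 100.0, 75.0]
print(variant_sites(copies))      # [set(), set(), {(2, 'C')}, {(2, 'C')}, set()]
```

Here copies 3 and 4 share the substitution G to C at column 2 (0-based), which is consistent with, though it does not prove, a single mutation followed by duplication of that copy. Handling indels would require a gapped multiple alignment of the copies.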

ACKNOWLEDGEMENTS

The author would like to thank Xiaoping Su for his help in analyzing the sum of heads criterion, Astrid Jervis for her help in simulating many of the statistical measures and examining the program output and Lan Dong for her help with some of the programming and examining the output. Thanks also to Mike Waterman, Richard Arratia and Rolf Backofen for helpful discussions. This work was partially supported by NSF grant CCR-9623532.

REFERENCES

1. Verkerk,A., Pieretti,M., Sutcliffe,J., Fu,Y., Kuhl,D., Pizzuti,A., Reiner,O., Richards,S., Victoria,M., Zhang,F., Eussen,B., van Ommen,G., Blonden,A., Riggins,G., Chastain,J., Kunst,C., Galjaard,H., Caskey,C., Nelson,D., Oostra,B. and Warren,S. (1991) Cell, 65, 905-914.

2. Huntington's Disease Collaborative Research Group (1993) Cell, 72, 971-983.

3. Fu,Y.-H., Pizzuti,A., Fenwick,J., King,R.G.,Jr, Rajnarayan,S., Dunne,P.W., Dubel,J., Nasser,G.A., Ashizawa,T., DeJong,P., Wieringa,B., Korneluk,R., Perryman,M.B., Epstein,H.F. and Caskey,C.T. (1992) Science, 255, 1256-1258.

4. La Spada,A., Wilson,E., Lubahn,D., Harding,A. and Fischbeck,K. (1991) Nature, 352, 77-79.

5. Campuzano,V., Montermini,L., Molto,M.D., Pianese,L. and Cossee,M. (1996) Science, 271, 1423-1427.

6. Wells,R. (1996) J. Biol. Chem., 271, 2875-2878.

7. Weitzmann,M., Woodford,K. and Usdin,K. (1997) J. Biol. Chem., 272, 9517-9523.

8. Hamada,H., Seidman,M., Howard,B. and Gorman,C. (1984) Mol. Cell. Biol., 4, 2622-2630.

9. Pardue,M., Lowenhaupt,K., Rich,A. and Nordheim,A. (1987) EMBO J., 6, 1781-1789.

10. Yee,H., Wong,A., van den Sande,J. and Rattner,J. (1991) Nucleic Acids Res., 19, 949-953.

11. Richards,R., Holman,K., Yu,S. and Southerland,G. (1993) Hum. Mol. Genet., 2, 1429-1435.

12. Lu,Q., Wallrath,L., Granok,H. and Elgin,S. (1993) Mol. Cell. Biol., 13, 2802-2814.

13. Du,J., Zhu,Y., Shanmugam,A. and Kenter,A. (1997) Nucleic Acids Res., 25, 3066-3073.

14. Edwards,A., Hammond,H., Jin,L., Caskey,C. and Chakraborty,R. (1992) Genomics, 12, 241-253.

15. Weber,J. and May,P. (1989) Am. J. Hum. Genet., 44, 388-396.

16. Tishkoff,S.A., Dietzsch,E., Speed,W., Pakstis,A.J. and Kidd,J.R. (1996) Science, 271, 1380-1387.

17. Armour,J., Anttinen,T., May,C., Vega,E., Sajantila,A., Kidd,J., Kidd,K., Bertranpetit,J., Pääbo,S. and Jeffreys,A. (1996) Nature Genet., 13, 154-160.

18. Benson,G. (1997) J. Comput. Biol., 4, 351-367.

19. Hellman,L., Steen,M., Sundvall,M. and Pettersson,U. (1988) Gene, 68, 93-100.

20. Kannan,S.K. and Myers,E.W. (1996) SIAM J. Comput., 25, 648-662.

21. Benson,G. (1995) Theor. Comput. Sci., 145, 357-369.

22. Schmidt,J.P. (1998) SIAM J. Comput., 27, 972-992.

23. Milosavljevic,A. and Jurka,J. (1993) CABIOS, 9, 407-411.

24. Rivals,E., Delgrange,O., Delahaye,J.-P., Dauchet,M., Delorme,M.-O., Hénaut,A. and Ollivier,E. (1997) CABIOS, 13, 131-136.

25. Landau,G. and Schmidt,J. (1993) In Apostolico,A., Crochemore,M., Galil,Z. and Manber,U. (eds), Proceedings of the 4th Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, Vol. 648. Springer-Verlag, Berlin, pp. 120-133.

26. Karlin,S., Morris,M., Ghandour,G. and Leung,M.-Y. (1988) Proc. Natl Acad. Sci. USA, 85, 841-845.

27. Benson,G. and Waterman,M. (1994) Nucleic Acids Res., 22, 4828-4836.

28. Sagot,M. and Myers,E. (1998) In Istrail,S., Pevzner,P. and Waterman,M. (eds), Proceedings of the Second Annual International Conference on Computational Molecular Biology. ACM Press, NY, pp. 234-242.

29. Benson,G. (1998) In Istrail,S., Pevzner,P. and Waterman,M. (eds), Proceedings of the Second Annual International Conference on Computational Molecular Biology. ACM Press, NY, pp. 20-29.

30. Pearson,W. and Lipman,D. (1988) Proc. Natl Acad. Sci. USA, 85, 2444-2448.

31. Altschul,S., Gish,W., Miller,W., Myers,E. and Lipman,D. (1990) J. Mol. Biol., 215, 403-410.

32. Benson,G. and Su,X. (1998) J. Comput. Biol., 5, 87-100.

33. Feller,W. (1968) An Introduction to Probability Theory and its Applications, 3rd Edn, Vol. I. John Wiley & Sons, New York, NY.

34. Aki,S., Kuboki,H. and Hirano,K. (1984) Ann. Inst. Statist. Math., 36, 431-440.

35. Miller,W. and Myers,E. (1989) Bull. Math. Biol., 51, 5-37.

36. Fischetti,V., Landau,G., Schmidt,J. and Sellers,P. (1992) In Apostolico,A., Crochemore,M., Galil,Z. and Manber,U. (eds), Proceedings of the 3rd Annual Symposium on Combinatorial Pattern Matching, Lecture Notes in Computer Science, Vol. 644. Springer-Verlag, Berlin, pp. 111-120.

37. Rowan,L., Koop,B. and Hood,L. (1996) Science, 272, 1755-1768.


*Tel: +1 212 241 5777; Fax: +1 212 860 4630; Email: benson@ecology.biomath.mssm.edu


Copyright © Oxford University Press, 1998.




