Nucleic Acids Research Advance Access originally published online on December 14, 2006
Nucleic Acids Research 2007 35(Database issue):D80-D87; doi:10.1093/nar/gkl1013
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
TRDB: The Tandem Repeats Database
1 Lab for Biocomputing and Informatics, Boston University Boston, MA 02215, USA 2 Department of Computer Science, Boston University Boston, MA 02215, USA 3 Department of Biology, Boston University Boston, MA 02215, USA 4 Department of Neuroscience, Mount Sinai School of Medicine New York, NY 10029, USA
*To whom correspondence should be addressed. Tel: +1 617 358 2965; Fax:+1 617 353 4814; Email: gbenson{at}bu.edu
Received August 15, 2006. Revised October 30, 2006. Accepted October 31, 2006.
| ABSTRACT |
|---|
Tandem repeats in DNA have been under intensive study for many years, first, as a consequence of their usefulness as genomic markers and DNA fingerprints and more recently as their role in human disease and regulatory processes has become apparent. The Tandem Repeats Database (TRDB) is a public repository of information on tandem repeats in genomic DNA. It contains a variety of tools for repeat analysis, including the Tandem Repeats Finder program, query and filtering capabilities, repeat clustering, polymorphism prediction, PCR primer selection, data visualization and data download in a variety of formats. In addition, TRDB serves as a centralized research workbench. It provides user storage space and permits collaborators to privately share their data and analysis. TRDB is available at https://tandem.bu.edu/cgi-bin/trdb/trdb.exe.
| INTRODUCTION |
|---|
Our understanding of the role of tandem repeats in DNA has grown significantly over the past 40 years. The discovery of satellite DNA in 1961 (1) prompted research into the properties of repetitive DNA and this eventually led to an understanding of the wide range of sizes and genomic locations of tandem repeats. One class, the microsatellites, was recognized early on as useful genomic markers and today they form the basis of DNA fingerprints in forensics. Even in the face of strong competition from the more numerous single nucleotide polymorphisms (SNPs), polymorphic tandem repeats including microsatellites, and also the longer patterned minisatellites or VNTRs (variable number of tandem repeats) remain as important tools in genetic testing and linkage analysis because, unlike SNPs, they frequently exhibit more than two high frequency copy-number-variant alleles and thus can have high heterozygosity rates.
Starting 15 or so years ago, it became widely recognized that tandem repeats are causally associated with human disease. Perhaps the best-known disease-associated repeats are the trinucleotide tandem repeats, which cause severe neurological syndromes. These include syndromes associated with polyglutamine (CAG)n expansion, such as spinobulbar muscular atrophy (2), Huntington's disease (3) and spinocerebellar ataxia types 1, 2, 3, 6 and 7, and syndromes associated with expansion in non-coding regions, such as fragile X mental retardation (4), Friedreich's ataxia (5), myotonic dystrophy (6) and spinocerebellar ataxia types 8 and 12 (7,8).
Other, more common, affective disorders and addictive behaviors have been associated with longer unit tandem repeats. For example, variations in a 40 bp VNTR at the 3' end of the dopamine transporter gene (DAT1) (9) have been linked to attention deficit hyperactivity disorder (ADHD) (10), medication response to that disorder in children (11), and response to amphetamine in adults (12). A 30 bp VNTR in intron 8 of the same gene has been linked to cocaine dependence (13). In the serotonin transporter gene (5-HTT), variations in a 16–17 bp VNTR in intron 2 have been associated with bipolar disorder. The transcription factors YB-1 and CTCF have been shown to interact with the VNTR and modulate differences in gene expression in different copy number variants (14). Numerous studies have linked common polymorphisms in a 20–23 bp VNTR in the promoter of the same gene with various affective disorders, including autism (15), and with response to medication for depression (16). A common, non-neurological disease associated with tandem repeat polymorphism is type 1 diabetes, which is linked to allelic variation in a 14–15 bp VNTR at the IDDM2 locus situated ~600 bp 5' to the insulin gene (17,18).
Some or all of the effects of intronic and non-coding polymorphic tandem repeats are presumably mediated by changes in cis-regulation of gene expression. A non-human example occurs in maize where a large tandem repeat is required for paramutational suppression of the b1 transcription factor gene which affects plant pigmentation. The paramutagenic region, 100 kb upstream of the gene, contains seven tandem copies of an 853 bp motif, while alleles with fewer copies have decreased or no paramutational effect (19). The mechanism, which involves differential cytosine methylation within the repeat region, requires an RNA-dependent RNA polymerase (20) and thus may be related to RNA interference through repeat mediated formation of double stranded RNA.
Due to a variety of mechanisms that affect their stability, including slippage replication and unequal crossing over, tandem repeats can exhibit high mutation rates, and this property may yield plasticity in a species. In dogs, variations in copy number for trinucleotide tandem repeats found in the coding regions of developmental genes have been quantitatively associated with morphological variations in the foot and skull among different domestic breeds (21). The implication is that plasticity in these repeats has enabled the selective breeding of dogs to achieve widely divergent morphologies.
The foregoing examples of known functional roles for tandem repeats are suggestive of future discoveries. They also highlight the need for readily available computational resources to study repeats. The growing interest in tandem repeats in the late 1990s led one of us (Benson) to develop the Tandem Repeats Finder program in 1999 (22), one of several now used to rapidly identify approximate tandem repeats in genomic DNA. Despite that program's usefulness and heavy usage (100 citations in 2005), what has been lacking is a more comprehensive computational resource. The Tandem Repeats Database (TRDB), described here, has been designed to fill that void. It consists of two parts: the first is a web accessible, public repository of information on the presence and characteristics of tandem repeats in a variety of genomes; the second is a research workbench which (hopefully) will serve as a model for future biological database development.
Currently, the public database contains 22 genomes, including six land vertebrates (human, chimpanzee, mouse, rat, dog, chicken), three fish (Fugu, Tetraodon, zebrafish), seven insects (five Drosophila species, honeybee, mosquito), two roundworms (Caenorhabditis species), two plants (Arabidopsis, rice), Saccharomyces cerevisiae and Escherichia coli (see also Table 1). In addition, archival copies of some of these genomes are maintained. Other species are being added as they become available and as interest warrants. A variety of tools, built into TRDB, simplify the study of repeats. These include query and filtering capabilities for finding particular repeats of interest, repeat clustering algorithms based on sequence similarity, polymorphism prediction based on common patterns of mutation, an interface for PCR primer selection using the Primer3 software (23), and data download in a variety of formats. Along with the tools, TRDB provides data visualization features including dynamically generated histograms and scatterplots of repeat characteristics, a browser for visualizing repeats in the context of other sequence features, and alignment views which accentuate both patterns of mutations and sequence similarity among repeats.
The major design feature of the workbench is the user workspace, a centralized storage space for user data and the results of analysis. The workspace permits users to collect public information, upload and analyze their own sequences, add sequence annotations, and store the results of analysis in projects and reports so that work may extend over multiple sessions. All the tools provided for the public data are available for use with private data as well. Most features of TRDB are available for anonymous use, and data stored anonymously in the workspace is generally available for a limited time (currently 7 days). Users have the option of registering with TRDB, which gives access to several tools that require substantial computational resources, such as repeat clustering and polymorphism prediction, and eliminates the time limit for data stored in the workspace. In addition, for registered users, TRDB facilitates sharing and exchange of information through a collaboration protocol. Collaborators may be added simply by supplying their user names in the system (email addresses) and can then share data and independently work on and view joint projects. Collaboration as implemented in TRDB eliminates the need for back-and-forth data transfer between colleagues and permits simultaneous multi-party viewing and analysis.
| DATA STORED IN TRDB |
|---|
Repeats
The primary data stored in TRDB are tandem repeats as detected by the Tandem Repeats Finder (TRF) program (22). TRDB currently uses TRF version 4.0. Repeats stored in the Public Database are detected with default TRF parameter values. For repeat detection in user supplied sequences, other parameter settings are available (see Supplementary Data). Tandem repeats are organized into groups called sets. For most public genomes, TRDB maintains one set per chromosome and one set for mitochondrial repeats. All the repeats for a genome are additionally combined into a single set. Some genomes are incomplete and for these TRDB stores only what is currently available. Table 1 gives the total number of repeats stored for each of the public genomes in TRDB.
Sets of repeats are presented to the user in a table format. For each repeat, various descriptive characteristics are displayed. These are conceptually grouped into the four categories described below. Figure 1 shows two partial tables from human chromosome 1. The first contains characteristics primarily determined by TRF analysis and the second contains characteristics primarily determined by additional processing within TRDB. Users may select any combination of characteristics to view in a table. A complete description of all characteristics is given in the Supplementary Data.
- Sequence characteristics are based on the tandem array and the consensus pattern. The array is the entire sequence of the repeat. The consensus is estimated by TRDB to be the best pattern to align to the tandem array. The consensus pattern is not displayed in the repeats table, but may be obtained through data download.
- Annotation characteristics are obtained from annotation data which can be uploaded to TRDB. The characteristics table contains an indicator (yes or no) for each feature class (e.g. genes) indicating whether the repeat overlaps a member of the class. For those that do overlap, a hyperlink points to a description of the feature and a link to the external source database. For those repeats that do not overlap a member of the feature class, hyperlinks point to descriptions of the nearest features upstream and downstream and these descriptions include the distance in nucleotides from the repeat to the feature. This distance may be used in filtering, permitting queries that can, e.g. find all repeats within 10 000 nt of any gene.
- Tool generated characteristics are obtained from analysis by TRDB tools.
- Identifier characteristics help identify the source of the repeat and are useful when repeats from different sources are mixed in a single set.
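The overlap indicator and nearest-feature distances described under annotation characteristics can be illustrated with a short sketch. The `Interval` type, the `annotate` function and the convention that "upstream" means lower coordinates are assumptions made for illustration; they are not TRDB's internal representation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def annotate(repeat, features):
    """Return an overlap flag plus the gap (in nt) to the nearest
    non-overlapping feature on each side of the repeat."""
    overlaps = any(f.start <= repeat.end and repeat.start <= f.end
                   for f in features)
    # Number of nucleotides strictly between the repeat and each feature
    # that lies entirely before (upstream) or after (downstream) it.
    before = [repeat.start - f.end - 1 for f in features if f.end < repeat.start]
    after = [f.start - repeat.end - 1 for f in features if f.start > repeat.end]
    return {
        "overlaps": overlaps,
        "upstream_nt": min(before) if before else None,
        "downstream_nt": min(after) if after else None,
    }
```

A proximity query such as "within 10 000 nt of any gene" then reduces to testing the overlap flag or comparing either distance with 10000.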
User data
Three components make up the persistent data stored by a user: sequences, projects and reports. For TRF/TRDB analysis, a sequence must first be uploaded to the user workspace, either (i) as a FASTA file, (ii) by entering a GenBank accession number (for direct upload from GenBank), or (iii) by cutting and pasting. Multiple sequences in a single FASTA file are permitted, as are sequences with masked characters (Ns, upper case, lower case) or ambiguous characters (R, Y, etc.). Once stored, the following operations can be performed on a sequence:
- TRF processing. Repeats detected in the sequence are stored as a new set in a user project.
- Annotations. Locations of other features within a sequence may be uploaded as a file in General Feature Format (GFF), or by cutting and pasting. Annotated features can be used to filter a set of repeats by proximity to the features (see Filtering, Sorting and Merging) and their locations can be visualized in the browser tool (see Data visualization).
- Sequence download. The sequence or any single contiguous part of the sequence (specified by the starting and ending positions) may be retrieved as a FASTA format file. Repeats detected within the sequence can be masked (as Ns, upper case or lower case).
- PCR primer selection. Flanking sequence bordering any set of repeats may be retrieved for upload into primer selection software. Additionally an interface to the Primer3 software (23) is built directly into TRDB.
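As an illustration of the masking option in sequence download, the following minimal sketch masks repeat intervals in a sequence. The function name, the 1-based inclusive coordinates and the choice to set the unmasked part to the opposite case are assumptions, not a description of TRDB's implementation.

```python
def mask_repeats(seq, repeats, mode="N"):
    """Mask 1-based inclusive (start, end) repeat intervals in seq.

    mode "N":     repeat positions become N, the rest is unchanged.
    mode "lower": repeats in lower case, the rest in upper case.
    mode "upper": repeats in upper case, the rest in lower case.
    """
    if mode not in ("N", "lower", "upper"):
        raise ValueError("unknown mode: " + mode)
    if mode == "lower":
        chars = list(seq.upper())
    elif mode == "upper":
        chars = list(seq.lower())
    else:
        chars = list(seq)
    for start, end in repeats:
        # Clip each interval to the sequence bounds before masking.
        for i in range(max(start, 1) - 1, min(end, len(chars))):
            if mode == "N":
                chars[i] = "N"
            elif mode == "lower":
                chars[i] = chars[i].lower()
            else:
                chars[i] = chars[i].upper()
    return "".join(chars)
```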
Every set of tandem repeats, whether detected in a user supplied sequence or selected and saved from the public data, is stored in a user project which forms the core for ownership and data sharing. TRDB produces a variety of visual and tabular data and any of these may be stored as static images in a report and supplemented with descriptive text. As with projects, reports are owned and can be shared with collaborators.
| FILTERING, SORTING AND MERGING |
|---|
A repeat set derived from a chromosome or other large sequence will typically contain thousands of repeats. By default, they are presented in order of occurrence along the sequence but may be sorted on any single characteristic in either ascending or descending order. To further tailor a set to the specifics of the research problem, TRDB provides filtering capabilities based on repeat characteristics. Using drop down menus and a text box, the user creates a collection of filter conditions and applies them to the set. Those repeats that meet all the conditions pass through the filter and can be saved as a new set. A distinctive property of TRDB is its ability to filter by proximity to annotated sequence features. This is accomplished by selecting a class of annotation features and requiring that the repeats either overlap one of the features or occur nearby, where nearby is expressed as a user-selected nucleotide distance upstream, downstream or in either direction (e.g. gene upstream within 10 000 nt). Repeats can also be selected manually for inclusion or exclusion in combination with other filters by checking or unchecking repeat label boxes. Figure 1 (lower panel) shows the expressions for a filter that finds short period repeats that could cause frameshift mutations: they are located in exons, have high percent matching which is typical of microsatellites that undergo replication slippage, and their unit sizes are not multiples of 3. The four repeats unchecked in the middle of Figure 1 contain at least 14 exact copies in a row (as determined by visual inspection of their alignments).
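The frameshift-candidate filter described above can be sketched as a conjunction of conditions. The field names and the numeric cutoffs (`max_period`, `min_matches`) are hypothetical; the actual expressions appear only in Figure 1 and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Repeat:
    repeat_id: int
    period: int             # unit (pattern) size in bp
    percent_matches: float  # percent matching between adjacent copies
    in_exon: bool           # overlaps an annotated exon


def frameshift_candidates(repeats, max_period=6, min_matches=95.0):
    """Keep repeats that satisfy every filter condition."""
    conditions = [
        lambda r: r.in_exon,                         # located in an exon
        lambda r: r.period <= max_period,            # short period (hypothetical cutoff)
        lambda r: r.percent_matches >= min_matches,  # high percent matching (hypothetical cutoff)
        lambda r: r.period % 3 != 0,                 # unit size not a multiple of 3
    ]
    return [r for r in repeats if all(cond(r) for cond in conditions)]
```

Keeping the conditions in a list mirrors the way a user accumulates filter conditions and applies them together; only repeats meeting all of them pass.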
A new set of repeats can be produced by merging existing sets. For example, to create a set for the entire human genome, we merge the sets for the individual chromosomes using a union operation (i.e. $A \cup B$). Other allowed binary operations are intersection ($A \cap B$), complement of intersection [not($A \cap B$)] and set difference ($A - B$). Set merging is possible in two modes. By default it is based on the repeat id, an internal TRDB identifier. In this mode, equality of repeats means equality of the identifiers, i.e. the repeats are actually the same, from the same run of TRF. The alternative is to merge based on tandem array position. In this case, two repeats are considered the same if their tandem arrays are identical or they overlap by a user-specified percentage. This is useful in cases where the repeats come from different runs of TRF or the repeats are redundant. Associated with each merged set is an interactive tree diagram called the history which records and can display the merging conditions.
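The two merging modes can be sketched as follows. This is an illustration under stated assumptions: the `Repeat` fields are stand-ins, the complement of the intersection is taken within the union of the two sets, and the overlap percentage is measured against the longer of the two tandem arrays. The text does not specify these details.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repeat:
    repeat_id: int  # stand-in for TRDB's internal identifier
    start: int      # tandem array start, 1-based inclusive
    end: int        # tandem array end, 1-based inclusive


def merge_by_id(a, b, op):
    """Combine two repeat sets, treating equal ids as the same repeat."""
    ids_a = {r.repeat_id for r in a}
    ids_b = {r.repeat_id for r in b}
    operations = {
        "union": ids_a | ids_b,
        "intersection": ids_a & ids_b,
        # Assumption: complement of the intersection within A union B.
        "not_intersection": ids_a ^ ids_b,
        "difference": ids_a - ids_b,
    }
    by_id = {r.repeat_id: r for r in list(a) + list(b)}
    return [by_id[i] for i in sorted(operations[op])]


def same_by_position(r, s, min_overlap_pct=100.0):
    """Position-based equality: arrays overlap by at least min_overlap_pct."""
    overlap = min(r.end, s.end) - max(r.start, s.start) + 1
    if overlap <= 0:
        return False
    # Assumption: percentage is relative to the longer tandem array.
    longer = max(r.end - r.start, s.end - s.start) + 1
    return 100.0 * overlap / longer >= min_overlap_pct
```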
| DATA VISUALIZATION |
|---|
TRDB produces a variety of data visualizations, in PNG format, which may be stored as static images in a report. Figure 2 shows TRDB's visualization of the alignment of a repeat to its consensus pattern. This view is accessed by clicking the repeat indices in a repeats table or a repeat image in the browser. Figure 3 shows the multiple alignment of a set of related repeats. Up to 20 repeats may be displayed in this way. Multiple alignments are appropriate for repeats related by sequence similarity and can be accessed from the view repeats page.
For a repeat set, TRDB can produce a distribution histogram for any single numeric characteristic. The histogram can be presented as a graph or a table. In the case of a table, three values are returned per accumulation interval (bucket), the low and high ends of the interval range and the count for the interval. For any pair of characteristics, TRDB can produce a scatterplot of the ordered data points. Histograms and scatterplots can be accessed from the sets page. Figure 4 shows two histograms produced by TRDB.
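The tabular form of a histogram, with three values per bucket, can be sketched as below. The fixed bucket width and the half-open interval convention are assumptions; the way TRDB chooses its accumulation intervals is not described in the text.

```python
def histogram_table(values, bucket_width):
    """Return rows (low, high, count) for consecutive intervals [low, high)."""
    if bucket_width <= 0:
        raise ValueError("bucket_width must be positive")
    if not values:
        return []
    # Align the first bucket to a multiple of the bucket width.
    start = (min(values) // bucket_width) * bucket_width
    counts = {}
    for v in values:
        idx = int((v - start) // bucket_width)
        counts[idx] = counts.get(idx, 0) + 1
    n_buckets = max(counts) + 1
    return [
        (start + i * bucket_width, start + (i + 1) * bucket_width, counts.get(i, 0))
        for i in range(n_buckets)
    ]
```

For example, applied to a list of period sizes, each row gives the low end, the high end and the number of repeats whose period falls in that interval.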
The TRDB browser visualizes the occurrence of repeats along a source sequence in combination with the positions of other annotated features contained in the sequence. It was inspired by the UCSC Human Genome Browser but has more limited capability. Repeats and annotation features are displayed in separate horizontal strips. Within a strip, features are stacked if they would otherwise overlap. Each feature and repeat image contains a hyperlink. Resting the cursor on the image brings up a small text box with the feature name/id number. Clicking brings up a new window containing the feature description and an additional hyperlink for annotations to an external source database if available. Figure 5 shows a typical browser image. The browser can be accessed from the sets page or from the entry for a single repeat in a repeats table.
| TRDB TOOLS |
|---|
Data download
TRDB provides datafile output for repeat sets in several formats. Each repeat is described by a collection of characteristics which can be modified by the user. Additionally, sequence information can be provided, including the tandem array (subsequence), the consensus (pattern), the repeat profile (24) (a summary of the alignment of the tandem array to its consensus in terms of the A, C, G, T and indel content of each alignment column) and flanking sequence on either side of the repeat (in several prespecified lengths from 50 to 1000 bp). Repeats are sorted, ascending or descending, based on any single numeric characteristic and may be grouped by source sequence for a multi-sequence set. The output format is one of four possibilities: (i) ASCII, either tab or comma delimited, for use in spreadsheet programs; (ii) XML; (iii) FASTA for sequence information only (subsequence, pattern, flanking sequence); and (iv) GFF or UCSC custom track (see Supplementary Data for additional details).
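To make the GFF output option concrete, here is a minimal sketch that writes one repeat as a tab-delimited, nine-column GFF line. The attribute keys (`period`, `copies`), the `TRF` source label and the use of `.` for score and frame are assumptions for illustration; the exact fields TRDB emits are described in the Supplementary Data and may differ.

```python
def repeat_to_gff(seqname, start, end, period, copies, source="TRF"):
    """Format one tandem repeat as a nine-column, tab-delimited GFF line."""
    # Hypothetical attribute keys; real TRDB output may use other names.
    attributes = "period={};copies={}".format(period, copies)
    fields = [
        seqname,          # source sequence name
        source,           # program that produced the feature
        "tandem_repeat",  # feature type
        str(start),       # 1-based start
        str(end),         # 1-based inclusive end
        ".",              # score (not set here)
        "+",              # strand
        ".",              # frame/phase (not applicable)
        attributes,
    ]
    return "\t".join(fields)
```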
Clustering
This tool clusters repeats by sequence similarity, thereby identifying repeats that are evolutionarily related within a single genome, or across genomes, or which may have common functional or structural properties. The output is a partition of the original repeat set into a group of clusters, each containing at least two related repeats. Those repeats unrelated to any others are omitted from the partition. Clusters can be viewed from the partitions page, by selecting a partition and then view clusters. Clusters are numbered arbitrarily and a table reports for each cluster the number of repeats it contains and the range of their consensus sizes. Each cluster is treated as a set and can be filtered, renamed and saved.
The clustering algorithm works with repeat profiles. Each element of the profile is the nucleotide and indel composition of one column in the alignment. Every pair of profiles is compared using a cyclic alignment algorithm (25) to produce a distance type alignment score for the pair. Several weighting functions for composition-to-composition scoring are available (26) and are still being tested. Alignment distance is converted to a percent similarity through the formula
[Equation image not recoverable from source]
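The idea of a cyclic alignment, in which the starting position of a repeat pattern is arbitrary, can be illustrated with a simplified sketch. It works on plain consensus strings with unit edit costs and tries every rotation of one pattern by brute force. It is not the profile-based comparison or the cyclic algorithm of (25) that TRDB uses, and the function names are hypothetical.

```python
def edit_distance(a, b):
    """Unit-cost edit distance between two strings (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # match or substitute
            ))
        prev = curr
    return prev[-1]


def cyclic_edit_distance(a, b):
    """Smallest edit distance between b and any rotation of a."""
    if not a:
        return len(b)
    return min(edit_distance(a[k:] + a[:k], b) for k in range(len(a)))
```

Two patterns that are rotations of each other, such as ACGT and GTAC, have cyclic distance 0 even though their ordinary edit distance is positive, which is why a cyclic comparison suits tandem repeat units.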
Polymorphism prediction
As discussed in the Introduction, polymorphic repeats are useful as genomic markers and can cause differential gene expression. The prediction method used in TRDB is based on the method validated in (28). A minisatellite repeat is predicted to be polymorphic based on two factors, each compared against a threshold: GC content, %G + %C (threshold 0.48), and HistoryR (threshold 0.54). The HistoryR value (a real number between 0 and 1) measures the levels of redundant mutations in the repeat (mutations that appear in the same position in several copies of the repeat) and redundant mutation motifs (the same or similar sets of mutations that appear in several copies of the repeat, see Figure 2). A larger number means more redundancy. The HistoryR value is computed by a parsimony-based duplication history reconstruction algorithm (29).
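A two-factor rule of this kind can be sketched as follows. The thresholds 0.48 and 0.54 come from the text, but the direction of each comparison (meeting or exceeding the threshold) is an assumption, as are the function names. HistoryR is taken as an input because the duplication history reconstruction (29) is not reproduced here.

```python
def gc_fraction(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    bases = [c for c in seq.upper() if c in "ACGT"]
    if not bases:
        return 0.0
    return sum(c in "GC" for c in bases) / len(bases)


def predict_polymorphic(gc, history_r, gc_threshold=0.48, history_threshold=0.54):
    """Apply the two-factor rule.

    ASSUMPTION: a repeat is predicted polymorphic when BOTH factors meet
    or exceed their thresholds; the comparison direction is not confirmed.
    """
    return gc >= gc_threshold and history_r >= history_threshold
```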
In the validation study (28), various sequence characteristics were tested as predictors of polymorphism and heterozygosity in 127 repeats from human chromosomes 21 and 22. The highest predictive values were obtained with the pair of factors stated above. Validation was done on minisatellites selected using thresholds on four characteristics: (i) unit length, 17 bp; (ii) copy number, 10; (iii) total length, 350 bp; and (iv) percent matches, 70%. No data on the effectiveness of the prediction method for other repeats are currently available.
The Polymorphism Prediction tool is run on a set of repeats. Only the set owner can run this tool, as it modifies some fields in the source repeats. Once complete, the results are stored in the 'HistoryR' and 'Predicted Polymorphism' characteristics. These must be added to the repeat table (with the 'change columns' button) in order to use them for filtering or sorting.
| FUTURE ENHANCEMENTS |
|---|
In the coming months, we will add enhancements to TRDB. These are expected to include the following:
- Pre-computed clusters of all repeats in the public database. Clustering will be performed within and across genomes. It is expected that a consensus or representative repeat will be selected for each cluster so that newly deposited repeats may be compared quickly to existing clusters.
- Inclusion of other repeat detection programs. These will allow search for tandem repeats by alternate methods. One program, mreps (30), is already available for detecting longer repeats than are possible with TRF. Another, STAR (31), will allow search for repeats with a particular motif. A function, 'import a set', has been implemented in the tools section to allow external file upload of a repeat set detected by any means. It will be given more flexibility in terms of the allowed data file formats.
- Extended polymorphism prediction and annotation. Several other methods for computational polymorphism prediction have been published, both for microsatellites and minisatellites (32–34). We will add these methods to the polymorphism prediction tool already available in TRDB. In addition, we will cooperate with laboratory groups conducting polymorphism typing to include annotation data on known polymorphic tandem repeats.
| CONCLUSION |
|---|
TRDB is intended as a central resource for comprehensive information on tandem repeats in sequenced genomes and as a workspace providing essential computational tools for tandem repeat analysis. Our goal is to make TRDB an informative and innovative database. We thank those who have helped in the past through their suggestions which have improved the functionality of the database and we welcome new suggestions, even wildly ambitious ones, that will simplify or extend data analysis.
| SUPPLEMENTARY DATA |
|---|
Supplementary Data are available at NAR online.
| ACKNOWLEDGEMENTS |
|---|
This research was partially supported by National Science Foundation grants DBI-0090789, CCR-0073081, DBI-0413462 and IIS-0612153. Funding to pay the Open Access publication charges for this article was provided by Boston University.
Conflict of interest statement. None declared.
| REFERENCES |
|---|
1. Kit, S. (1961) Equilibrium sedimentation in density gradients of DNA preparations from animal tissues. J. Mol. Biol., 3, 711–716.
2. La Spada, A.R., Wilson, E.M., Lubahn, D.B., Harding, A.E., Fischbeck, K.H. (1991) Androgen receptor gene mutations in X-linked spinal and bulbar muscular atrophy. Nature, 352, 77–79.
3. Huntington's disease collaborative research group. (1993) A novel gene containing a trinucleotide repeat that is expanded and unstable on Huntington's disease chromosomes. Cell, 72, 971–983.
4. Verkerk, A.J., Pieretti, M., Sutcliffe, J.S., Fu, Y.H., Kuhl, D.P., Pizzuti, A., Reiner, O., Richards, S., Victoria, M.F., Zhang, F.P., et al. (1991) Identification of a gene (FMR-1) containing a CGG repeat coincident with a breakpoint cluster region exhibiting length variation in fragile X syndrome. Cell, 65, 905–914.
5. Campuzano, V., Montermini, L., Molto, M.D., Pianese, L., Cossee, M., Cavalcanti, F., Monros, E., Rodius, F., Duclos, F., Monticelli, A., et al. (1996) Friedreich's ataxia: Autosomal recessive disease caused by an intronic GAA triplet repeat expansion. Science, 271, 1423–1427.
6. Fu, Y.-H., Pizzuti, A., Fenwick, R.G., Jr, King, J., Rajnarayan, S., Dunne, P.W., Dubel, J., Nasser, G.A., Ashizawa, T., DeJong, P., et al. (1992) An unstable triplet repeat in a gene related to myotonic muscular dystrophy. Science, 255, 1256–1258.
7. Koob, M.D., Moseley, M.L., Schut, L.J., Benzow, K.A., Bird, T.D., Day, J.W., Ranum, L.P. (1999) An untranslated CTG expansion causes a novel form of spinocerebellar ataxia (SCA8). Nature Genet., 21, 379–384.
8. Holmes, S.E., O'Hearn, E.E., McInnis, M.G., Gorelick-Feldman, D.A., Kleiderlein, J.J., Callahan, C., Kwak, N.G., Ingersoll-Ashworth, R.G., Sherr, M., Sumner, A.J., et al. (1999) Expansion of a novel CAG trinucleotide repeat in the 5' region of PPP2R2B is associated with SCA12. Nature Genet., 23, 391–392.
9. Vandenbergh, D., Persico, A.M., Uhl, G.R. (1992) A human dopamine transporter cDNA predicts reduced glycosylation, displays a novel repetitive element and provides racially-dimorphic TaqI RFLPs. Mol. Brain Res., 15, 161–166.
10. Cook, E.H., Jr, Stein, M.A., Krasowski, M.D., Cox, N.J., Olkon, D.M., Kieffer, J.E., Leventhal, B.L. (1995) Association of attention-deficit disorder and the dopamine transporter gene. Am. J. Hum. Genet., 56, 993–998.
11. Gilbert, D.L., Wang, Z., Sallee, F.R., Ridel, K.R., Merhar, S., Zhang, J., Lipps, T.D., White, C., Badreldin, N., Wassermann, E.M. (2006) Dopamine transporter genotype influences the physiological response to medication in ADHD. Brain, 129, 2038–2046.
12. Lott, D., Kim, S.J., Cook, E.H., Jr, de Wit, H. (2005) Dopamine transporter gene associated with diminished subjective response to amphetamine. Neuropsychopharmacology, 30, 602–609.
13. Guindalini, C., Howard, M., Haddley, K., Laranjeira, R., Collier, D., Ammar, N., Craig, I., O'Gara, C., Bubb, V.J., Greenwood, T., et al. (2006) A dopamine transporter gene functional variant associated with cocaine abuse in a Brazilian sample. Proc. Natl Acad. Sci. USA, 103, 4552–4557.
14. Klenova, E., Scott, A.C., Roberts, J., Shamsuddin, S., Lovejoy, E.A., Bergmann, S., Bubb, V.J., Royer, H.-D., Quinn, J.P. (2004) YB-1 and CTCF differentially regulate the 5-HTT polymorphic intron 2 enhancer which predisposes to a variety of neurological disorders. J. Neurosci., 24, 5966–5973.
15. Cook, E.H., Jr, Courchesne, R., Lord, C., Cox, N.J., Yan, S., Lincoln, A., Haas, R., Courchesne, E., Leventhal, B.L. (1997) Evidence of linkage between the serotonin transporter and autistic disorder. Mol. Psychiatry, 2, 247–250.
16. Murphy, G., Jr, Hollander, S.B., Rodrigues, H.E., Kremer, C., Schatzberg, A.F. (2004) Effects of the serotonin transporter gene promoter polymorphism on mirtazapine and paroxetine efficacy and adverse events in geriatric major depression. Arch. Gen. Psychiatry, 61, 1163–1169.
17. Owerbach, D. and Gabbay, K.H. (1993) Localization of a type 1 diabetes susceptibility locus to the variable tandem repeat region flanking the insulin gene. Diabetes, 42, 1708–1714.
18. Bennett, S.T., Lucassen, A.M., Gough, S.C., Powell, E.E., Undlien, D.E., Pritchard, L.E., Merriman, M.E., Kawaguchi, Y., Dronsfield, M.J., Pociot, F., et al. (1995) Susceptibility to human type 1 diabetes at IDDM2 is determined by tandem repeat variation at the insulin gene minisatellite locus. Nature Genet., 9, 284–292.
19. Stam, M., Belele, C., Dorweiler, J.E., Chandler, V.L. (2002) Differential chromatin structure within a tandem array 100 kb upstream of the maize b1 locus is associated with paramutation. Genes Dev., 16, 1906–1918.
20. Alleman, M., Sidorenko, L., McGinnis, K., Seshadri, V., Dorweiler, J.E., White, J., Sikkink, K., Chandler, V.L. (2006) An RNA-dependent RNA polymerase is required for paramutation in maize. Nature, 442, 295–298.
21. Fondon, J.W., III and Garner, H.R. (2004) Molecular origins of rapid and continuous morphological evolution. Proc. Natl Acad. Sci. USA, 101, 18058–18063.
22. Benson, G. (1999) Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res., 27, 573–580.
23. Rozen, S. and Skaletsky, H. (2000) Primer3 on the WWW for general users and for biologist programmers. In Krawetz, S. and Misener, S. (eds), Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, pp. 365–386.
24. Gribskov, M., McLachlan, A.D., Eisenberg, D. (1987) Profile analysis: Detection of distantly related proteins. Proc. Natl Acad. Sci. USA, 84, 4355–4358.
25. Maes, M. (1990) On a cyclic string-to-string correction problem. Information Processing Letters, 35, 73–78.
26. Rao, S., Rodriguez, A., Benson, G. (2005) Evaluating distance functions for clustering tandem repeats. Genome Inform., 16, 3–12.
27. Kaufman, L. and Rousseeuw, P.J. (1990) Finding Groups in Data: An Introduction to Cluster Analysis. John Wiley and Sons, New York.
28. Denoeud, F., Vergnaud, G., Benson, G. (2003) Predicting human minisatellite polymorphism. Genome Res., 13, 856–867.
29. Benson, G. and Dong, L. (1999) Reconstructing the duplication history of a tandem repeat. Seventh International Conference on Intelligent Systems for Molecular Biology (ISMB99), pp. 44–53.
30. Kolpakov, R., Bana, G., Kucherov, G. (2003) mreps: efficient and flexible detection of tandem repeats in DNA. Nucleic Acids Res., 31, 3672–3678.
31. Delgrange, O. and Rivals, E. (2004) STAR: an algorithm to search for tandem approximate repeats. Bioinformatics, 20, 2812–2820.
32. Naslund, K., Saetre, P., von Salome, J., Bergstrom, T.F., Jareborg, N., Jazin, E. (2005) Genome-wide prediction of human VNTRs. Genomics, 85, 24–35.
33. Wren, J., Forgacs, E., Fondon, J., Pertsemlidis, A., Cheng, S., Gallardo, T., Williams, R., Shohet, R., Minna, J., Garner, H. (2000) Repeat polymorphisms within gene regions: phenotypic and evolutionary implications. Am. J. Hum. Genet., 67, 345–356.
34. Fondon, J.W., III, Mele, G.M., Brezinschek, R.I., Cummings, D., Pande, A., Wren, J., O'Brien, K.M., Kupper, K.C., Wei, M.H., Lerman, M., et al. (1998) Computerized polymorphic marker identification: experimental validation and a predicted human polymorphism catalog. Proc. Natl Acad. Sci. USA, 95, 7514–7519.