Nucleic Acids Research Advance Access originally published online on September 19, 2008
Nucleic Acids Research 2009 37(Database issue):D37-D40; doi:10.1093/nar/gkn597

Nucleic Acids Research, 2009, Vol. 37, Database issue D37-D40
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article appears in the following Nucleic Acids Research issue: Database issue

Articles

DiProDB: a database for dinucleotide properties

Maik Friedel¹, Swetlana Nikolajewa², Jürgen Sühnel¹ and Thomas Wilhelm³,*

¹Biocomputing Group, Leibniz Institute for Age Research - Fritz Lipmann Institute, Beutenbergstrasse 11, 07745 Jena, ²Department of Bioinformatics, Friedrich-Schiller-University Jena, Ernst-Abbe-Platz 2, 07743 Jena, Germany and ³Theoretical Systems Biology, Institute of Food Research, Norwich Research Park, Colney, Norwich NR4 7UA, UK

*To whom correspondence should be addressed. Tel: +44 1603 255313; Fax: +44 1603 255128; Email: thomas.wilhelm{at}bbsrc.ac.uk

Received August 1, 2008. Revised September 3, 2008. Accepted September 3, 2008.


    ABSTRACT
 
DiProDB (http://diprodb.fli-leibniz.de) is a database of conformational and thermodynamic dinucleotide properties. It includes datasets for both DNA and RNA, as well as for single and double strands. The data have been shown to be important for understanding different aspects of nucleic acid structure and function, and they can also be used for encoding nucleic acid sequences. The database is intended to facilitate further applications of dinucleotide properties. Because a number of property datasets are highly correlated, the database comes with a correlation analysis facility. Authors who have determined new sets of dinucleotide property values are invited to submit them to DiProDB.


    INTRODUCTION
 
Nucleic acid properties are governed by the corresponding nucleotide sequence. More specifically, many properties, such as nucleic acid stability, seem to depend primarily on the identity of nearest-neighbour nucleotides (1). The corresponding nearest-neighbour model is also the basis for RNA secondary structure prediction by free-energy minimization (2). Besides thermodynamic properties, conformational properties may also play a role. It has been shown, for example, that promoter locations can be predicted using dinucleotide stiffness parameters derived from molecular dynamics simulations (3). Also, curved DNA is known to play a role in prokaryotic gene expression (4). In addition, physical DNA profiles have been used for improved promoter prediction (5,6). There are numerous other examples, but a comprehensive overview is beyond the scope of this brief database description. We are currently developing a Genome Browser that encodes complete eukaryotic or prokaryotic genomes by thermodynamic and conformational dinucleotide properties. In this context, we have collected more than 100 sets of dinucleotide properties from the literature. Two related data collections already exist: the PROPERTY DB (srs6.bionet.nsc.ru/srs6bin/cgi-bin/wgetz?-page+LibInfo+-id+1pFZP1TuQpU+-lib+PROPERTY) with about 30 property sets (7) and plot.it (hydra.icgeb.trieste.it/dna/plot_it.html) with about 50 sets (Vlahovicek,K. and Pongor,S., unpublished data). However, neither of them includes many of the existing datasets, and in both it is difficult to trace the original data sources. Moreover, neither is included in the NAR Database Collection. We have therefore set up DiProDB, which aims to be a one-stop resource for these properties.
With DiProDB we want to provide reliable, easily accessible and comprehensive information on dinucleotide properties that may stimulate the application of these data to a diversity of biological problems.
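The nearest-neighbour idea can be illustrated with a short Python sketch: a sequence-level quantity is approximated as the sum of per-dinucleotide values over all overlapping steps, optionally plus a constant initiation term. The function name `nearest_neighbour_sum` and the toy table are hypothetical, the table values are placeholders rather than published parameters, and a full duplex model adds further terms (for example symmetry corrections) that this sketch leaves out.

```python
from itertools import product

# All 16 DNA dinucleotides: "AA", "AC", ..., "TT".
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def nearest_neighbour_sum(seq, table, initiation=0.0):
    """Sum per-dinucleotide values over every overlapping step of seq.

    A sequence of length n has n - 1 nearest-neighbour steps,
    e.g. "ACGT" -> "AC", "CG", "GT".
    """
    seq = seq.upper()
    if len(seq) < 2:
        raise ValueError("need at least two nucleotides")
    total = initiation
    for i in range(len(seq) - 1):
        step = seq[i:i + 2]
        if step not in table:
            raise KeyError(f"no value for dinucleotide {step!r}")
        total += table[step]
    return total


# Toy table with made-up values (NOT published parameters):
# every step contributes -1.0, except CG, which contributes -2.0.
toy = {d: -1.0 for d in DINUCLEOTIDES}
toy["CG"] = -2.0

print(nearest_neighbour_sum("ACGT", toy))  # -4.0
```

Real nearest-neighbour parameter sets are simply different tables passed to the same additive loop.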


    DATABASE CONTENT
 
DiProDB currently includes 115 dinucleotide datasets. They were collected from the literature and are classified according to nucleic acid type (DNA or RNA), strand information (double or single), how the data were obtained (experimental or theoretical/calculated) and the general type of dinucleotide property: thermodynamical (e.g. free energy), conformational (e.g. twist) or letter-based (e.g. GC content). We include the letter-based data to demonstrate their relations to thermodynamical and conformational properties; moreover, most current motif discovery approaches are letter-based. An example from our own work is the identification of significant purine–pyrimidine patterns in restriction enzyme binding sites (8). The number of datasets in each category is shown in Table 1. For each dataset, the 16 dinucleotide values, the unit of measurement, the reference, the classification features and comments are provided. If a dataset refers to RNA, this is stated in the property name; if the name does not mention a nucleic acid type, the dataset refers to DNA.


Table 1. Number of dinucleotide property datasets for each category
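The per-dataset fields listed above map naturally onto a small record type. The following Python sketch is a hypothetical schema (the class and field names are ours, not DiProDB's): it checks that exactly the 16 dinucleotide values are present and encodes a sequence as a profile with one value per overlapping dinucleotide step, the kind of encoding the Genome Browser mentioned in the Introduction performs. The illustrative GC-content values (number of G/C letters per dinucleotide) are computed on the fly and need not match the scale of DiProDB's own GC content dataset.

```python
from dataclasses import dataclass
from itertools import product

# The 16 dinucleotides, written over the DNA alphabet.
DINUCLEOTIDES = tuple("".join(p) for p in product("ACGT", repeat=2))


@dataclass
class DinucleotideProperty:
    """Hypothetical record for one dinucleotide property dataset."""
    prop_id: str
    name: str
    values: dict          # dinucleotide -> numeric value (exactly 16 entries)
    unit: str
    reference: str
    nucleic_acid: str     # "DNA" or "RNA"
    strand: str           # "double" or "single"
    method: str           # "experimental" or "theoretical"
    prop_type: str        # "thermodynamical", "conformational" or "letter-based"
    comment: str = ""

    def __post_init__(self):
        # Reject datasets that do not cover exactly the 16 dinucleotides.
        missing = set(DINUCLEOTIDES) - set(self.values)
        extra = set(self.values) - set(DINUCLEOTIDES)
        if missing or extra:
            raise ValueError(f"missing {sorted(missing)}, unexpected {sorted(extra)}")

    def profile(self, seq):
        """Return one property value per overlapping dinucleotide step of seq."""
        seq = seq.upper()
        return [self.values[seq[i:i + 2]] for i in range(len(seq) - 1)]


gc = DinucleotideProperty(
    prop_id="example-1",  # hypothetical ID, not a DiProDB entry
    name="GC content (illustrative)",
    values={d: float(d.count("G") + d.count("C")) for d in DINUCLEOTIDES},
    unit="G/C letters per dinucleotide",
    reference="computed for this example",
    nucleic_acid="DNA",
    strand="double",
    method="theoretical",
    prop_type="letter-based",
)
print(gc.profile("ACGT"))  # [1.0, 2.0, 1.0]
```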

 

    USER INTERFACE
 
DiProDB displays all data in a single table (Figure 1). The number and type of columns shown can be customized by the user. Clicking the ID button in the first column opens a new page containing all relevant information about the corresponding property. The database entries can be sorted according to three different criteria, and there is a search option for all columns or for specific ones. The complete table or parts of it can be saved as a text file or in a format that can be imported directly into the Genome Browser mentioned in the Introduction. The DiProDB website also provides a Submit button through which users can submit new property datasets.


Figure 1. Screenshot of the DiProDB table displaying search results for the term ‘twist’ (conformational dinucleotide property) in the property name.
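Searching and sorting an exported copy of the table can be reproduced offline. The sketch below assumes a tab-separated text export with a header row; the column names (`ID`, `PropertyName`, `NucleicAcid`, `Strand`, `Type`) and the example rows are hypothetical stand-ins, since the actual export layout is not described here. The search mimics the behaviour described above: a case-insensitive match in all columns or only in selected ones, as in the 'twist' search of Figure 1.

```python
import csv
import io

# Hypothetical tab-separated export; real column names and rows will differ.
EXPORT = (
    "ID\tPropertyName\tNucleicAcid\tStrand\tType\n"
    "e1\tTwist\tDNA\tdouble\tconformational\n"
    "e2\tFree energy\tDNA\tdouble\tthermodynamical\n"
    "e3\tRNA twist\tRNA\tdouble\tconformational\n"
    "e4\tGC content\tDNA\tdouble\tletter-based\n"
)


def load_table(text):
    """Parse a tab-separated export into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def search(rows, term, columns=None):
    """Case-insensitive substring search in all columns, or only in `columns`."""
    term = term.lower()
    hits = []
    for row in rows:
        fields = columns if columns is not None else list(row)
        if any(term in row[c].lower() for c in fields):
            hits.append(row)
    return hits


def sort_rows(rows, column):
    """Sort rows alphabetically (case-insensitively) by one column."""
    return sorted(rows, key=lambda r: r[column].lower())


rows = load_table(EXPORT)
print([r["ID"] for r in search(rows, "twist", columns=["PropertyName"])])  # ['e1', 'e3']
```

Restricting `columns` matters: searching all columns for "DNA" would also match the nucleic acid column, not just property names.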

 

    DATA ANALYSES
 
The DiProDB website provides a Correlate option with which users can calculate Pearson's or Spearman's rank correlation coefficients for all or selected properties. This allows easy identification of dependencies between different dinucleotide properties. As an example, Figure 2 shows Spearman's correlation data for five different datasets quantifying the twist in B-DNA. All datasets are clearly correlated with each other, although the extent of correlation differs considerably. Correlation coefficients >0.58 are considered statistically significant (P < 0.01, t-test).


Figure 2. Pearson's correlation coefficients for five sets of twist angles. ID (Ref.): 1 (9), 61 (10), 88 (11), 92 (12) and 98 (13). Correlation coefficients >0.8 are coloured in green.
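The computations behind a correlation facility of this kind can be sketched in a few lines of Python. This is our own illustration, not DiProDB's code: Pearson's r, Spearman's rho (Pearson's r on ranks, with tied values given their average rank) and the usual t statistic for testing r with n − 2 degrees of freedom. With n = 16 dinucleotide values and r = 0.58, t ≈ 2.66, which exceeds the tabulated one-sided critical value for P = 0.01 at 14 degrees of freedom (about 2.62); for Spearman's rho this t-test is only an approximation.

```python
import math


def pearson(x, y):
    """Pearson's product-moment correlation coefficient r."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def t_statistic(r, n):
    """t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]  # monotonic but not linear in x
print(round(pearson(x, y), 3), spearman(x, y))  # 0.981 1.0
print(round(t_statistic(0.58, 16), 2))           # 2.66
```

The example shows why both coefficients are offered: Spearman's rho rewards any monotonic relation, whereas Pearson's r measures only linear agreement.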

 
Based on these correlations, we performed several hierarchical clustering analyses to gain deeper insight into the overall correlation of the datasets. Figure 3 shows a single-linkage hierarchical clustering of all 23 B-DNA double-strand thermodynamical properties together with three letter-based dinucleotide quantities: GC content, purine (GA) content and keto (GT) content. This clustering is based on the distance measure 1 − |r|, where r is Pearson's correlation coefficient, because it is the absolute value of the correlation that indicates whether two properties contain similar information; a strong negative correlation is as informative as a strong positive one. Other correlation measures, such as Spearman's rho or Kendall's tau, give very similar results. All free-energy datasets contain more or less the same information, which is essentially equivalent to the GC content. This is very likely because GC base pairs form three hydrogen bonds, whereas AT base pairs form only two. The complete single-linkage hierarchical clustering of all 115 properties is given in the Supplementary Material (Table 2), which also contains a corresponding Ward clustering (14). The Ward clustering separates a free energy/entropy/enthalpy/stacking energy/melting temperature cluster from a second cluster containing all the conformational datasets. The complete single-linkage clustering reveals that the least correlated dinucleotide properties are direction, inclination, twist–rise (conformational), stacking energy, tilt, shift, propeller twist and rise.


Figure 3. Hierarchical clustering of all 23 B-DNA double-strand physicochemical properties and three letter-based dinucleotide quantities: GC content, purine (GA) content and keto (GT) content. The property sets are designated by their IDs and names.
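Single-linkage clustering with the distance 1 − |r| can be sketched in plain Python as follows. This is a naive agglomerative implementation written for illustration, not the authors' code (it is adequate for about a hundred properties, although library routines such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="single"` are faster). The four property labels and their correlation matrix are invented. Because of the absolute value, the strongly anti-correlated pair C and D (r = −0.9) merges early, just like a strongly positively correlated pair.

```python
def single_linkage(labels, corr):
    """Naive agglomerative single-linkage clustering on the distance 1 - |r|.

    labels: property names; corr: symmetric matrix (list of lists) of
    correlation coefficients. Returns the merge history as
    (left_members, right_members, distance) tuples, in merge order.
    """
    n = len(labels)
    dist = [[1.0 - abs(corr[i][j]) for j in range(n)] for i in range(n)]
    members = {i: [i] for i in range(n)}  # cluster id -> indices of its properties
    merges = []
    while len(members) > 1:
        best = None
        ids = sorted(members)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                # Single linkage: cluster distance = distance of the closest pair.
                d = min(dist[i][j] for i in members[a] for j in members[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((tuple(labels[i] for i in members[a]),
                       tuple(labels[i] for i in members[b]), d))
        members[a] = members[a] + members[b]
        del members[b]
    return merges


labels = ["A", "B", "C", "D"]  # invented properties
corr = [  # invented correlation matrix
    [1.00, 0.95, 0.30, -0.20],
    [0.95, 1.00, 0.25, 0.10],
    [0.30, 0.25, 1.00, -0.90],
    [-0.20, 0.10, -0.90, 1.00],
]
for left, right, d in single_linkage(labels, corr):
    print(left, right, round(d, 2))
```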

 

Table 2. Content of supplementary material

 
To gain further insight into the data, we performed two principal component analyses (PCA) (15). The complete data of 115 properties for 16 dinucleotides correspond to 115 points in 16-dimensional space (or 16 points in 115-dimensional space). PCA helps to reveal the internal structure of such high-dimensional data by projecting the 'cloud' of points onto a few new coordinates, the principal components (PCs), which are chosen along the directions of maximum variance (http://en.wikipedia.org/wiki/Principal_components_analysis). The cloud of all 115 properties in the first two PCs is shown in Figure 4. Only the least correlated property, 'direction', lies outside the region shown, at (PC1, PC2) = (0.1, 1.6). The complete figure including direction, as well as a PC1–PC3 projection, is given in the Supplementary Material; only the first three PCs carry relevant information (PC1 78.5%, PC2 16.9% and PC3 3.3% of the total variance). The other two outliers are melting temperature and persistence length. This indicates that these three properties in particular carry information quite different from that of the others. Note that the latter two properties are not among the outliers of the single-linkage clustering described above, because each has at least one relatively strong correlation with another dataset (melting temperature with stacking energy, and persistence length with tilt–shift). Figure 4 also reveals three clusters containing all other properties: a stacking energy/entropy cluster, a twist cluster and the central main cluster.


Figure 4. All dinucleotide properties plotted in the first two PCs. A few of them are designated by property name and ID.
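The first PCA can be sketched as follows: each property is a point whose 16 coordinates are its dinucleotide values, the covariance matrix of those coordinates is decomposed, and each component's share of the total variance (its eigenvalue divided by the trace) corresponds to percentages such as the 78.5%, 16.9% and 3.3% quoted above. This is a dependency-free illustration, not the authors' code: the eigenvectors are found by power iteration with deflation (a real analysis would call a library routine such as NumPy's `numpy.linalg.eigh` or an SVD), and the helper `standardize_rows` reflects an assumption, since the text does not say whether the property values were rescaled before the analysis. The random input is a placeholder for the real 115 × 16 table.

```python
import math
import random


def standardize_rows(rows):
    """Optional: z-score each property across its values (an assumption;
    the text does not say whether or how values were scaled)."""
    out = []
    for row in rows:
        mean = sum(row) / len(row)
        sd = math.sqrt(sum((v - mean) ** 2 for v in row) / (len(row) - 1))
        out.append([(v - mean) / sd for v in row])
    return out


def covariance(points):
    """Centre each coordinate and return (covariance matrix, centred points)."""
    m, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / m for j in range(d)]
    centred = [[p[j] - means[j] for j in range(d)] for p in points]
    cov = [[sum(c[i] * c[j] for c in centred) / (m - 1) for j in range(d)]
           for i in range(d)]
    return cov, centred


def top_components(cov, k, iters=500, seed=0):
    """Leading k eigenpairs of a symmetric covariance matrix via power
    iteration with deflation (approximate if eigenvalues are very close)."""
    rng = random.Random(seed)
    d = len(cov)
    a = [row[:] for row in cov]
    pairs = []
    for _ in range(k):
        v = [rng.random() - 0.5 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient v^T A v gives the eigenvalue for a unit-length v.
        lam = sum(v[i] * a[i][j] * v[j] for i in range(d) for j in range(d))
        pairs.append((lam, v))
        # Deflate: subtract lam * v v^T so the next pass finds the next component.
        a = [[a[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return pairs


def pca(points, k):
    """Explained-variance ratios and per-point scores on the first k PCs."""
    cov, centred = covariance(points)
    total = sum(cov[i][i] for i in range(len(cov)))
    pairs = top_components(cov, k)
    ratios = [lam / total for lam, _ in pairs]
    scores = [[sum(c[j] * v[j] for j in range(len(c))) for _, v in pairs]
              for c in centred]
    return ratios, scores


rng = random.Random(1)
# 115 synthetic "properties" x 16 "dinucleotides": random placeholders, not DiProDB data.
data = standardize_rows([[rng.gauss(0, 1) for _ in range(16)] for _ in range(115)])
ratios, scores = pca(data, k=3)
print([round(r, 3) for r in ratios])
```

The scores returned for the first two components are the coordinates a plot like Figure 4 would show.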

 
Finally, we performed a second PCA, computing the principal components of the 16 dinucleotides in the 115-dimensional property space. The first 15 PCs carry information (23%, 21%, 14%, 12%, 6%, etc. of the total variance), roughly indicating that about this number of weakly correlated properties is needed to represent all the information in the complete set of 115 properties. The Supplementary Material also contains a corresponding PC1–PC2 plot, together with full details of both PCAs.
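That no more than 15 components can carry information in the second PCA follows from geometry: after centring, 16 points lie in a subspace of at most 15 dimensions, so at most 15 PCs can have non-zero variance, however many properties (here 115) are used. The short Python check below illustrates this on random placeholder data (not DiProDB values). The centred Gram matrix G = Xc Xcᵀ has the same non-zero eigenvalues as the unnormalized covariance Xcᵀ Xc, and it maps the all-ones vector to zero, so its rank is at most 15.

```python
import random


def centred_gram(points):
    """Centre each coordinate over all points and return G = Xc Xc^T."""
    m, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / m for j in range(d)]
    xc = [[p[j] - means[j] for j in range(d)] for p in points]
    return [[sum(a * b for a, b in zip(xc[i], xc[k])) for k in range(m)]
            for i in range(m)]


rng = random.Random(0)
# 16 synthetic points ("dinucleotides") in 115 dimensions ("properties");
# random placeholders, not DiProDB data.
points = [[rng.gauss(0, 1) for _ in range(115)] for _ in range(16)]
g = centred_gram(points)

# G times the all-ones vector is just the row sums of G. Centring makes the
# columns of Xc sum to zero, so Xc^T 1 = 0 and hence G 1 = 0 (up to rounding):
# the ones vector is a null vector, so rank(G) = rank(Xc) <= 16 - 1 = 15.
row_sums = [sum(row) for row in g]
print(max(abs(s) for s in row_sums))
```

The printed value is only floating-point rounding noise, far below the size of the matrix entries.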


    OUTLOOK
 
DiProDB currently contains 115 sets of dinucleotide properties, and we plan to increase this number in the future. We also invite other authors to submit their measured or calculated dinucleotide properties to DiProDB.


    SUPPLEMENTARY DATA
 
Supplementary data are available at NAR Online.


    FUNDING
 
Funding for open access charge: Biotechnology and Biological Sciences Research Council (BBSRC) IFR Core Strategic Grant.

Conflict of interest statement. None declared.


    ACKNOWLEDGEMENTS
 
We are grateful to Friedrich Haubensak for setting up the database and to Rolf Hühne for helpful comments on the database layout.


    REFERENCES
 

  1. SantaLucia J Jr. A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbour thermodynamics. Proc. Natl Acad. Sci. USA (1998) 95:1460–1465.

  2. Mathews DH, Turner DH. Prediction of RNA secondary structure by free energy minimization. Curr. Opin. Struct. Biol. (2006) 16:270–278.

  3. Goñi JR, Pérez A, Torrents D, Orozco M. Determining promoter location based on DNA structure first-principles calculations. Genome Biol. (2007) 8:R263.

  4. Pérez-Martín J, Rojo F, de Lorenzo V. Promoters responsive to DNA bending: a common theme in prokaryotic gene expression. Microbiol. Rev. (1994) 58:268–290.

  5. Abeel T, Saeys Y, Rouzé P, Van de Peer Y. ProSOM: core promoter prediction based on unsupervised clustering of DNA physical profiles. Bioinformatics (2008) 24:i24–i31.

  6. Florquin K, Saeys Y, Degroeve S, Rouzé P, Van de Peer Y. Large-scale structural analysis of the core promoter in mammalian and plant genomes. Nucleic Acids Res. (2005) 33:4255–4264.

  7. Ponomarenko JV, Ponomarenko MP, Frolov AS, Vorobyev DG, Overton GC, Kolchanov NA. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics (1999) 15:654–668.

  8. Nikolajewa S, Beyer A, Friedel M, Hollunder J, Wilhelm T. Common patterns in type II restriction enzyme binding sites. Nucleic Acids Res. (2005) 33:2726–2733.

  9. Karas H, Knüppel R, Schulz W, Sklenar H, Wingender E. Combining structural analysis of DNA with search routines for the detection of transcription regulatory elements. Comput. Appl. Biosci. (1996) 12:441–446.

  10. Pérez A, Noy A, Lankas F, Luque FJ, Orozco M. The relative flexibility of B-DNA and A-RNA duplexes: database analysis. Nucleic Acids Res. (2004) 32:6144–6151.

  11. Gorin AA, Zhurkin VB, Olson WK. B-DNA twisting correlates with base-pair morphology. J. Mol. Biol. (1995) 247:34–48.

  12. Suzuki M, Yagi N, Finch JT. Role of base-backbone and base-base interactions in alternating DNA conformations. FEBS Lett. (1996) 379:148–152.

  13. Shpigelman ES, Trifonov EN, Bolshoy A. CURVATURE: software for the analysis of curved DNA. Comput. Appl. Biosci. (1993) 9:435–440.

  14. Ward JH. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. (1963) 58:236–244.

  15. Pearson K. On lines and planes of closest fit to systems of points in space. Philos. Mag. (1901) 2:559–572.

