Skip Navigation



Nucleic Acids Research Advance Access published online on October 2, 2008

Nucleic Acids Research, doi:10.1093/nar/gkn599
This Article
Right arrow Abstract Freely available
Right arrow Print PDF (3740K) Freely available
Right arrow Screen PDF (513K) Freely available
Right arrowOA All Versions of this Article:
37/suppl_1/D338    most recent
gkn599v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Günther, S.
Right arrow Articles by Preissner, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Günther, S.
Right arrow Articles by Preissner, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Database Issue

JAIL: a structure-based interface library for macromolecules

Stefan Günther1, Joachim von Eichborn1, Patrick May2 and Robert Preissner1,*

1Institute of Molecular Biology and Bioinformatics, Charité-University Medicine Berlin, Arnimallee 22, 14195 Berlin and 2Max-Planck-Institute for Molecular Plant Physiology, 14476 Potsdam-Golm, Germany

*To whom correspondence should be addressed. Tel: +49-30-8445-1649; Fax: +49-30-8445-1551; Email: robert.preissner{at}charite.de

Received August 20, 2008. Accepted September 4, 2008.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 DATABASE
 EXAMPLE OF USE
 FUNDING
 REFERENCES
 
The increasing number of solved macromolecules provides a solid number of 3D interfaces, if all types of molecular contacts are being considered. JAIL annotates three different kinds of macromolecular interfaces, those between interacting protein domains, interfaces of different protein chains and interfaces between proteins and nucleic acids. This results in a total number of about 184 000 database entries. All the interfaces can easily be identified by a detailed search form or by a hierarchical tree that describes the protein domain architectures classified by the SCOP database. Visual inspection of the interfaces is possible via an interactive protein viewer. Furthermore, large scale analyses are supported by an implemented sequential and by a structural clustering. Similar interfaces as well as non-redundant interfaces can be easily picked out. Additionally, the sequential conservation of binding sites was also included in the database and is retrievable via Jmol. A comprehensive download section allows the composition of representative data sets with user defined parameters. The huge data set in combination with various search options allow a comprehensive view on all interfaces between macromolecules included in the Protein Data Bank (PDB). The download of the data sets supports numerous further investigations in macromolecular recognition. JAIL is publicly available at http://bioinformatics.charite.de/jail.


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 DATABASE
 EXAMPLE OF USE
 FUNDING
 REFERENCES
 
Proteins interact quickly and specifically with each other or with nucleic acids. All interactions form a biochemical network that reflects the high complexity of cellular metabolism. Nevertheless, the vast majority of all interactions are not yet identified and are subject to current research (1). An important step towards the mechanistic descriptions of such interactions is the 3D structural information of macromolecules. However, complexed proteins are difficult to co-crystallize and the number of publicly available X-ray structures in the Protein Data Bank (PDB) (2) is very limited (3). Thus, a structure-based analysis of particular interacting macromolecules is often only possible by using docking models. For systematic analyses the problem of the low number of protein–protein complexes might be avoided by taking into consideration interacting domains or chains. Such types of interfaces often exhibit a similar behaviour like those of interacting proteins (4). For instance, knowledge-based potential functions that represent the co-occurrence of certain residues might be similar for contacts between domains of a single chain as well as contacts between proteins. Another approach is the utilization of the interfaces between domains or chains to detect structural similarities to binding sites of interacting proteins. First applications apply this method for protein–protein complex modelling (5). Consequentially, some structure-based databases exist that focus on the interacting parts of proteins. SCOPPI (6), SNAPPI (7) and PIBASE (8) classify interfaces between domains, the domain information was retrieved from SCOP (9), CATH (10) or Pfam (11). Since they depend on domain definitions extracted from secondary databases especially structures solved during the last few years are normally not yet classified (12). HotSprint (13) focuses on conserved residues in chain contact sites and is regularly updated but domain information is ignored. Dockground (3) comprises the so far most comprehensive data set of interacting proteins and chains respectively. The excellent database also provides user defined data sets of the associated unbound protein-binding sites. Nevertheless, it focuses on protein–protein interaction, so information about intra-chain contacts is not retrievable. None of the mentioned databases contains interfaces between proteins and nucleic acids. Although each application is useful to enlighten questions the database is specialized for, a comprehensive web resource that combines all the different kinds of interfaces between macromolecules is not available. Furthermore, the database should be characterized by regular updates and the opportunity to download appropriate data sets. To overcome this lack we developed JAIL, a structure-based interface library for macromolecules.


    DATABASE
 TOP
 ABSTRACT
 INTRODUCTION
 DATABASE
 EXAMPLE OF USE
 FUNDING
 REFERENCES
 
Currently, the database contains more than 184 000 interfaces that are composed of four different fractions: 81 000 interfaces between domains classified by SCOP, 76 000 interfaces between different protein chains, 8000 interfaces between proteins and nucleic acids and 19 000 interfaces which were calculated based on the assumed biological units. Since they were not directly solved cristallographically, they are annotated separately. The interfaces result from the evaluation of 52 000 different asymmetric unit files as well as the associated biological unit files provided by the PDB. An interface is defined as those atoms of a chain or domain that are located within a range of 10 Å around the C{alpha}-atoms of the interacting counterpart. In the case of nucleic acids the backbone (P/C4')-atoms were considered. Each binding site has to consist of at least five C{alpha}-atoms as the case may be backbone atoms of the nucleic acids. Assumed biological units were calculated based on the first two models build up by reflection of the unit cells. Information on the evolutionary conservation was extracted from the HSSP database (14). The PDB-IDs of nucleic acids containing structures were retrieved from the Nucleic Acid Database (15). All chains of the database were sequentially clustered using the regular updated lists provided by the PDB calculated by the Cd-hit program (16). Thus, it is possible, to select interfaces of proteins, which are similar in sequence to each other as well as to download non-redundant data sets based on protein sequences. Structural clustering was implemented by the selection of representative interfaces of each family–family or superfamily–superfamily contact between domains classified in SCOP. A protein can be identified by a detailed search form that allows searches by PDB-ID, protein name, EC-number, UniProt accession number or SCOP-ID. An implemented full text search allows the screening to the full header information of the structure-file as well as the SCOP domain descriptions. Visualization of the interfaces was implemented by pre-generated thumbnails of each interface and on the other hand by the interactive protein visualizer Jmol (http://www.jmol.org). The download section provides various possibilities to build up a user defined data set. Structurally, clustered interfaces based on the SCOP domain definitions as well as sequentially clustered interfaces based on protein chain clustering are separately retrievable but can also be combined. Parameters like the SCOP-hierarchy (family/superfamily) or the sequence identity level (50%, 70%, 90% and 95%) are selectable. The database is automatically updated six times a year.


    EXAMPLE OF USE
 TOP
 ABSTRACT
 INTRODUCTION
 DATABASE
 EXAMPLE OF USE
 FUNDING
 REFERENCES
 
The provided browsable interfaces support the answering of various further investigations. One of them is the comparative study of molecular recognition of nucleic acids and proteins. For instance, experimental evidence exists, that proteins mimic nucleic acids to usurp the role of interacting macromolecules (17,18). Shape comparisons of different kinds of interfaces (protein–protein/protein–nucleic acids) may help to identify cases of molecular mimicry.

A matter of particular interest in this context is the identification of protein domains that interact with other proteins as well as with nucleic acids. Figure 1 shows an example of a search for such a case by using the implemented ‘Show related proteins’-option. Figure 2 shows the resulting interfaces of a fab-fragment in complex with an enzyme. The same domain architecture is also capable to bind single stranded hairpin DNA.


Figure 1
View larger version (41K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 1. Example for a search by homology. (a) Entry of PDB-ID 2FR4. The highlighted link (Show related proteins) yields a list of homologous proteins. (b) List of homologs of 2FR4 and the associated interfaces. The last entry is PDB-ID 2J88 and is shown in Figure 2b.

 

Figure 2
View larger version (56K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 2. Two similar Fab-fragments (sequence identity >95%) which interact with different kinds of macromolecules. The complexes were identified with the homology search option (see Figure 1) of JAIL. (a) Fab-fragment in complex with a stem-loop DNA (PDB-ID: 2FR4). (b) A monoclonal IgG Fab-fragment in complex with hyaloronidase (PDB-ID: 2J88).

 

    FUNDING
 TOP
 ABSTRACT
 INTRODUCTION
 DATABASE
 EXAMPLE OF USE
 FUNDING
 REFERENCES
 
Deutsche Forschungsgemeinschaft (DFG SFB-449); the International Research Training Group on Genomics and Systems Biology of Molecular Networks (GRK1360); German Federal Ministry of Education and Research (GoFORSYS Grant Nr. 0313924 to PM). This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 3.0 License. Funding for open access charge: Deutsche Forschungsgemeinschaft (SFB-449).

Conflict of interest statement. None declared.


    ACKNOWLEDGEMENTS
 
The authors want to thank Björn Grüning for maintaining the webserver.


    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 DATABASE
 EXAMPLE OF USE
 FUNDING
 REFERENCES
 

  1. Amaral LA. A truer measure of our ignorance. Proc. Natl Acad. Sci. USA (2008) 105:6795–6796.[Free Full Text]

  2. Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, Westbrook J. The Protein Data Bank and the challenge of structural genomics. Nat Struct Biol. (2000) 7(Suppl):957–959.[CrossRef][Medline]

  3. Gao Y, Douguet D, Tovchigrechko A, Vakser IA. DOCKGROUND system of databases for protein recognition studies: unbound structures for docking. Proteins (2007) 69:845–851.[CrossRef][Web of Science][Medline]

  4. Tuncbag N, Gursoy A, Guney E, Nussinov R, Keskin O. Architectures and functional coverage of protein-protein interfaces. J. Mol. Biol. (2008) 381:785–802.[CrossRef][Web of Science][Medline]

  5. Gunther S, May P, Hoppe A, Frommel C, Preissner R. Docking without docking: ISEARCH–prediction of interactions using known interfaces. Proteins (2007) 69:839–844.[CrossRef][Web of Science][Medline]

  6. Winter C, Henschel A, Kim WK, Schroeder M. SCOPPI: a structural classification of protein-protein interfaces. Nucleic Acids Res. (2006) 34:D310–D314.[Abstract/Free Full Text]

  7. Jefferson ER, Walsh TP, Roberts TJ, Barton GJ. SNAPPI-DB: a database and API of Structures, iNterfaces and Alignments for Protein-Protein Interactions. Nucleic Acids Res. (2007) 35:D580–D589.[Abstract/Free Full Text]

  8. Davis FP, Sali A. PIBASE: a comprehensive database of structurally defined protein interfaces. Bioinformatics (2005) 21:1901–1907.[Abstract/Free Full Text]

  9. Andreeva A, Howorth D, Chandonia JM, Brenner SE, Hubbard TJ, Chothia C, Murzin AG. Data growth and its impact on the SCOP database: new developments. Nucleic Acids Res. (2008) 36:D419–D425.[Abstract/Free Full Text]

  10. Greene LH, Lewis TE, Addou S, Cuff A, Dallman T, Dibley M, Redfern O, Pearl F, Nambudiry R, Reid A, et al. The CATH domain structure database: new protocols and classification levels give a more comprehensive resource for exploring evolution. Nucleic Acids Res. (2007) 35:D291–D297.[Abstract/Free Full Text]

  11. Finn RD, Tate J, Mistry J, Coggill PC, Sammut SJ, Hotz HR, Ceric G, Forslund K, Eddy SR, Sonnhammer EL, et al. The Pfam protein families database. Nucleic Acids Res. (2008) 36:D281–D288.[Abstract/Free Full Text]

  12. Rother K, Michalsky E, Leser U. How well are protein structures annotated in secondary databases? Proteins (2005) 60:571–576.[Medline]

  13. Guney E, Tuncbag N, Keskin O, Gursoy A. HotSprint: database of computational hot spots in protein interfaces. Nucleic Acids Res. (2008) 36:D662–D666.[Abstract/Free Full Text]

  14. Dodge C, Schneider R, Sander C. The HSSP database of protein structure-sequence alignments and family profiles. Nucleic Acids Res. (1998) 26:313–315.[Abstract/Free Full Text]

  15. Berman HM, Westbrook J, Feng Z, Iype L, Schneider B, Zardecki C. The Nucleic Acid Database. Acta. Crystallogr. D Biol. Crystallogr. (2002) 58:889–898.[CrossRef][Medline]

  16. Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics (2006) 22:1658–1659.[Abstract/Free Full Text]

  17. Liang H, Landweber LF. Molecular mimicry: quantitative methods to study structural similarity between protein and RNA. Rna (2005) 11:1167–1172.[Abstract/Free Full Text]

  18. Putnam CD, Tainer JA. Protein mimicry of DNA and pathway regulation. DNA Repair (Amst) (2005) 4:1410–1420.[CrossRef][Medline]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?



This Article
Right arrow Abstract Freely available
Right arrow Print PDF (3740K) Freely available
Right arrow Screen PDF (513K) Freely available
Right arrowOA All Versions of this Article:
37/suppl_1/D338    most recent
gkn599v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Günther, S.
Right arrow Articles by Preissner, R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Günther, S.
Right arrow Articles by Preissner, R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?