Nucleic Acids Research Advance Access originally published online on October 11, 2007
Nucleic Acids Research 2008 36(Database issue):D218-D221; doi:10.1093/nar/gkm794

Nucleic Acids Research, 2008, Vol. 36, Database issue D218-D221
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

This article appears in the following Nucleic Acids Research issue: Database issue


MegaMotifBase: a database of structural motifs in protein families and superfamilies

Ganesan Pugalenthi¹, P. N. Suganthan¹, R. Sowdhamini²,* and Saikat Chakrabarti³

¹School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Singapore, ²National Centre for Biological Sciences, UAS-GKVK Campus, Bellary Road, Bangalore 560 065, India and ³National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

* To whom correspondence should be addressed. Tel: +91-80-23666250; Fax: +91-80-3636662; Email: mini{at}ncbs.res.in. Correspondence may also be addressed to Saikat Chakrabarti. Tel: 301-594-6474; Email: chakraba{at}mail.nlm.nih.gov

Received August 14, 2007. Revised September 16, 2007. Accepted September 17, 2007.


    ABSTRACT
Structural motifs are important for the integrity of a protein fold and can be employed to design and rationalize protein engineering and folding experiments. Such conserved segments represent the conserved core of a family or superfamily and can be crucial for the recognition of potential new members in sequence and structure databases. We present a database, MegaMotifBase, that compiles a set of important structural segments or motifs for protein structures. Motifs are recognized on the basis of both sequence conservation and preservation of important structural features such as amino acid preference, solvent accessibility, secondary structural content, hydrogen-bonding pattern and residue packing. This database provides 3D orientation patterns of the identified motifs in terms of inter-motif distances and torsion angles. Important applications of structural motifs are also provided in several crucial areas such as similar sequence and structure search, multiple sequence alignment and homology modeling. MegaMotifBase can be a useful resource for gaining knowledge about the structure–function relationships of proteins. The database can be accessed from the URL http://caps.ncbs.res.in/MegaMotifbase/index.html.


    INTRODUCTION
Previous studies have pointed out that short segments of sequence and/or structural elements are required for retention of the fold and function of a protein (1,2). Sequence-based representations, however, are only an approximation to the underlying structural and functional information. Therefore, motifs identified at the 3D structure level provide significant and reliable information. We had earlier identified such structurally invariant segments through careful manual intervention for superfamilies where proteins are distantly related but retain a similar fold and biological function (3,4).

Here, we present a database, MegaMotifBase, which provides a set of important structural segments or motifs for protein structures related at the family or superfamily level on the basis of conservation of both sequence and structural features. Motifs among structurally aligned proteins are recognized by the conservation of amino acid preference and solvent accessibility and are examined for the conservation of important structural features like secondary structural content, hydrogen-bonding pattern and residue packing (3–5). These motifs may form the common structural core by maintaining a particular spatial pattern when compared across different proteins belonging to the same family or superfamily. Such motifs can also be employed to design and rationalize protein engineering and folding experiments. MegaMotifBase provides a comprehensive compilation of structural motifs identified through a completely automated method for a large number of families (1032) and superfamilies (1194) of proteins, in contrast to earlier efforts (3,4,6), which were limited by poor coverage and required extensive manual supervision. This database can be accessed from the URL http://caps.ncbs.res.in/MegaMotifbase/index.html.


    KEY FEATURES OF THE DATABASE

  • Identification and collection of important conserved structural segments that are crucial for the integrity of the fold and can be projected as the minimum structural requirements for a new member to be considered as part of a pre-existing family or superfamily. It is also possible to use simple sequence conservation to recognize motifs.
  • Interactive 3D views of the motifs on the superposed structures are displayed for better understanding and visualization.
  • Spatial orientations of the motifs, in terms of inter-motif distances and torsion angles, are provided. This enables users to analyze the structural variations that occur even at the conserved core of the fold owing to poor sequence identity and evolutionary pressures.
  • Options are provided to scan a given query protein structure for multiple structural motifs along with their spatial orientation, and to scan a query structure against the entire structural motif database. This could be very useful in protein classification and in the assignment of family or superfamily relationships to newly solved protein structures with unknown function.
  • Options are also provided to search for similar sequences by a multimotif-based database scanning procedure called SCANMOT (7). This scanning option provides an opportunity to identify distantly related sequences for each family or superfamily. The specificity of the search engine is increased by utilizing the inter-motif spacing and pairwise global alignment of the query and hits.
  • The current version of the database provides options to align similar sequences with the query protein structure(s). It allows the user to control the alignment by providing sequence–structure motif regions as input to the alignment program, to achieve a more structurally relevant and functionally useful alignment of protein sequences. The alignment algorithm keeps the locally conserved regions of the sequences fixed and aligns the rest by normal progressive alignment. The chances of global misalignment are thereby reduced and the possibility of obtaining a better overall alignment is increased (8).
  • The database also allows users to build 3D models of similar protein sequences of unknown structure.
  • The entire database of motifs and the alignments can be downloaded as a flat file for further use and analyses (Figure 1).
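
The spatial-orientation idea in the list above (inter-motif distances and torsion angles) can be illustrated with a minimal sketch. Representing each motif by the centroid of its Cα coordinates is an assumption made here for illustration, as are all function names; the text does not state how MegaMotifBase derives these values.

```python
import math
from itertools import combinations

def centroid(points):
    """Mean position of a list of (x, y, z) coordinates."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def torsion(p1, p2, p3, p4):
    """Signed torsion (dihedral) angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    b2_len = math.sqrt(_dot(b2, b2))
    x = _dot(n1, n2)
    y = _dot(_cross(n1, n2), b2) / b2_len
    return math.degrees(math.atan2(y, x))

def inter_motif_distances(motifs):
    """Distances between motif centroids.

    motifs: dict mapping a motif name to a list of (x, y, z) coordinates
    (assumed here to be the C-alpha atoms of the motif residues).
    """
    centers = {name: centroid(coords) for name, coords in motifs.items()}
    return {(a, b): math.dist(centers[a], centers[b])
            for a, b in combinations(sorted(centers), 2)}

# Hypothetical coordinates, not taken from any database entry.
example = {
    "M1": [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    "M2": [(1.0, 3.0, 0.0), (1.0, 5.0, 0.0)],
}
distances = inter_motif_distances(example)
```

A torsion angle between motifs needs four reference points; one option, again an assumption, is to pass the centroids of four consecutive motifs to `torsion`.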


Figure 1. A snapshot image capturing the key features of the MegaMotifBase database for an example superfamily.

 

    CONTENTS OF THE DATABASE
MegaMotifBase compiles structural motifs at different levels of protein classification.

Structural motifs at the family level
We have collected 1032 structural alignments of protein domains that are related at the family level from the HOMSTRAD (9) database. Motifs among structurally aligned proteins were recognized by the conservation of amino acid preference and solvent accessibility and examined for the conservation of other important structural features like secondary structural content, hydrogen-bonding pattern and residue packing. Sequentially conserved regions were identified from the multiple alignments by examining the nature of amino acid exchanges using a standard 20 × 20 substitution matrix (10). Solvent accessibility was measured using the PSA program from the JOY4.0 suite (11). The SSTRUC and HBOND programs, which are also part of the JOY4.0 suite, were used to identify secondary structural positions and hydrogen bonds, respectively. Residue packing was measured in terms of the Ooi number (12), which gives the number of residues surrounding the Cα atom of each residue in a protein. Higher Ooi numbers indicate that the residue is in a well-packed environment.
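
As an illustration of the packing measure described above, the following sketch counts, for each Cα atom, the other Cα atoms lying within a fixed radius. The 8 Å default and the choice to exclude the residue itself are assumptions for this example; the cutoff used by MegaMotifBase is not stated here.

```python
import math

def ooi_numbers(ca_coords, radius=8.0):
    """Return, for each C-alpha atom, how many other C-alpha atoms lie
    within `radius` angstroms.

    ca_coords: list of (x, y, z) tuples, one per residue.
    radius: assumed cutoff; adjust to the convention you follow.
    """
    counts = []
    for i, a in enumerate(ca_coords):
        neighbours = sum(
            1
            for j, b in enumerate(ca_coords)
            if j != i and math.dist(a, b) <= radius
        )
        counts.append(neighbours)
    return counts

# Toy chain: three residues spaced 3.8 angstroms apart plus one distant residue.
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (20.0, 0.0, 0.0)]
packing = ooi_numbers(toy_chain)
```

The double loop is quadratic in chain length, which is adequate for single domains; a spatial grid or k-d tree would be the usual choice for larger structures.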

A structural feature is considered conserved at an alignment position if it is present in all, or all but one, of the members within the alignment. We found this condition to be more practical for families with poor structural representation. The structural motifs identified are mapped onto the alignment using different color codes and often represent the conserved core of the family. Motifs are ranked according to the extent of conservation of the structural features. The 3D orientation pattern of the structural motifs is conveyed through graphic displays and spatial orientation matrices.
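
The "all or all but one" rule can be written down directly, as in the sketch below. The ranking score used here (the fraction of member-position cells in which the feature is present) is an assumption for illustration, since the text says only that ranking considers the extent of conservation. The guard requiring at least one member to show the feature is also an added assumption.

```python
def is_conserved(flags):
    """True if the feature is present in all, or all but one, members.

    flags: one boolean per aligned member at a single alignment position.
    Assumption: at least one member must actually show the feature.
    """
    flags = list(flags)
    missing = sum(1 for f in flags if not f)
    return missing <= 1 and missing < len(flags)

def conservation_extent(columns):
    """Fraction of (member, position) cells where the feature is present."""
    total = sum(len(col) for col in columns)
    present = sum(1 for col in columns for f in col if f)
    return present / total if total else 0.0

def rank_motifs(motifs):
    """Order motif names from most to least conserved.

    motifs: dict mapping a motif name to a list of alignment columns,
    each column being a list of per-member booleans.
    """
    return sorted(motifs, key=lambda name: conservation_extent(motifs[name]),
                  reverse=True)

# Hypothetical feature flags for two motifs in a three-member alignment.
example_motifs = {
    "M1": [[True, True, True], [True, True, False]],
    "M2": [[True, True, True], [True, True, True]],
}
ranking = rank_motifs(example_motifs)
```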

Structural motifs at the superfamily level
Structural motifs for multimember superfamilies
The superfamily is a hierarchical classification that contains proteins of different families having similar structure and function. These proteins might have very low sequence identities but retain the same fold through well-conserved secondary structural parts. Therefore, identification of structural motifs for superfamilies is even more valuable since the evolutionary divergence makes it impossible to derive conserved sequence or structural segments simply by residue conservation. We identified structural motifs for all the multimember superfamilies (628) available in the latest PASS2 and SCOP (version 1.63) databases (13,14) following the same protocol described above to identify motifs for proteins related at the family level.

Structural motifs for single member superfamilies
A majority of the entries in the protein structural databank are single-member superfamilies, for which it is hard to derive structural motifs owing to the paucity of structural homologues. Important conserved segments for these 566 superfamilies (PASS2 database, (13)) have been identified and compiled into MegaMotifBase. Conserved regions, recognized by permitted amino acid exchanges, were mapped onto the structure, and the content of various structural features (solvent accessibility, secondary structure content, hydrogen bonding and residue packing) was examined. Only the conserved segments with high structural feature content were projected as sequence–structural templates for the particular superfamily member. Interactive 3D displays of the templates in the 3D structure [in Chime® and RASMOL (15)] are provided for better understanding and visualization. A static image of the 3D structure is provided using MOLSCRIPT (16).
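
The selection step described above can be sketched as follows: find contiguous runs of conserved residues, then keep only those runs in which the structural features are sufficiently frequent. The minimum segment length and the content threshold are illustrative assumptions, not values taken from the database.

```python
def conserved_segments(conserved, min_length=3):
    """Return (start, end) pairs, end exclusive, for runs of conserved
    residues at least `min_length` long."""
    segments, start = [], None
    for i, flag in enumerate(conserved):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_length:
                segments.append((start, i))
            start = None
    if start is not None and len(conserved) - start >= min_length:
        segments.append((start, len(conserved)))
    return segments

def select_templates(conserved, feature_flags, min_length=3, min_content=0.6):
    """Keep conserved segments with high structural feature content.

    feature_flags: dict mapping a feature name (e.g. "buried") to one
    boolean per residue. min_content is an assumed threshold on the
    fraction of (feature, residue) cells that are True in a segment.
    """
    templates = []
    for start, end in conserved_segments(conserved, min_length):
        hits = sum(1 for flags in feature_flags.values()
                   for i in range(start, end) if flags[i])
        cells = len(feature_flags) * (end - start)
        if cells and hits / cells >= min_content:
            templates.append((start, end))
    return templates
```

For example, with two conserved runs of which only the first is mostly buried and hydrogen-bonded, only the first run would be returned as a template.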

We also provide the application of sequence–structural templates in three different areas: multimotif-based sequence search, multiple sequence alignment and homology modeling. In each case, the inclusion of the sequence-structural templates can give rise to sensitive and accurate results. This emphasizes the need for inclusion of singletons to provide added value to the recognition of additional members, comparative modeling and in designing experiments.


    APPLICATIONS
The availability of structural motifs is useful since these conserved patterns form the common core by maintaining a particular spatial orientation pattern. These motifs can also assist in the identification of potential new members of an existing family and/or superfamily. Scanning for multiple structural motifs, along with their spatial orientation, in a given query protein structure could be very useful in protein structural classification. In the MegaMotifBase database, we also provide the application of motifs in three other crucial areas: motif-based similar sequence search, multiple sequence alignment and homology modeling. In each case, the inclusion of the sequence–structural motifs can give rise to sensitive and accurate results.
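
To make the idea of a multi-motif sequence search with spacing constraints concrete, here is a simplified sketch. It is not the SCANMOT algorithm: motifs are represented as regular expressions, and the expected gaps and the tolerance are hypothetical parameters chosen for illustration.

```python
import re

def scan_motifs(sequence, motifs, tolerance=5):
    """Find ordered motif occurrences that respect inter-motif spacing.

    motifs: list of (pattern, expected_gap) pairs in N- to C-terminal order.
    expected_gap is the number of residues between the end of the previous
    motif and the start of this one (None for the first motif).
    Returns the list of start positions, or None if no arrangement fits.
    """
    def search(index, prev_end):
        if index == len(motifs):
            return []
        pattern, expected_gap = motifs[index]
        # A lookahead wrapper lets finditer report overlapping matches too.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            start, end = match.start(1), match.end(1)
            if prev_end is not None:
                gap = start - prev_end
                if gap < 0 or abs(gap - expected_gap) > tolerance:
                    continue
            rest = search(index + 1, end)
            if rest is not None:
                return [start] + rest
        return None

    return search(0, None)

# Hypothetical query: two motifs separated by eight residues.
query = "AAGSGKTTTTTTTTDEADAAA"
hit = scan_motifs(query, [("G.GK", None), ("DEAD", 8)], tolerance=2)
```

Requiring every motif to occur in order and at roughly the expected spacing is what raises specificity relative to matching any single motif alone.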


    ACKNOWLEDGEMENTS
 
G.P. and P.N.S. acknowledge the financial support offered by A*Star (Agency for Science, Technology and Research). R.S. acknowledges Wellcome Trust, UK and National Centre for Biological Sciences (TIFR) for financial and infrastructural support. S.C. acknowledges Intramural Research Program of the National Library of Medicine at NIH/DHHS. Funding to pay the Open Access publication charges for this article was provided by Wellcome Trust, U.K.

Conflict of interest statement. None declared.


    REFERENCES

  1. Farber GK, Petsko GA. The evolution of alpha/beta barrel enzymes. Trends Biochem. Sci. (1990) 15:228–234.

  2. Kannan N, Selvaraj S, Gromiha MM, Vishveshwara S. Clusters in alpha/beta barrel proteins: implications for protein structure, function, and folding: a graph theoretical approach. Proteins (2001) 43:103–112.

  3. Chakrabarti S, Venkatramanan K, Sowdhamini R. SMoS: a database of structural motifs of protein superfamilies. Protein Eng. (2003) 16:791–793.

  4. Chakrabarti S, Sowdhamini R. Regions of minimal structural variation among members of protein domain superfamilies: application to remote homology detection and modeling using distant relationships. FEBS Lett. (2004) 569:31–36.

  5. Pugalenthi G, Suganthan PN, Sowdhamini R, Chakrabarti S. SMotif: a server for structural motifs in proteins. Bioinformatics (2007) 23:637–638.

  6. Chakrabarti S, Manohari G, Pugalenthi G, Sowdhamini R. SSToSS—sequence-structural templates of single-member superfamilies. In Silico Biol. (2006) 6:311–319.

  7. Chakrabarti S, Anand AP, Bhardwaj N, Pugalenthi G, Sowdhamini R. SCANMOT: searching for similar sequences using a simultaneous scan of multiple sequence motifs. Nucleic Acids Res. (2005) 33:W274–W276.

  8. Chakrabarti S, Bhardwaj N, Anand PA, Sowdhamini R. Improvement of alignment accuracy utilizing sequentially conserved motifs. BMC Bioinformatics (2004) 5:167–178.

  9. Mizuguchi K, Deane CM, Blundell TL, Overington JP. HOMSTRAD: a database of protein structure alignments for homologous families. Protein Sci. (1998) 7:2469–2471.

  10. Johnson MS, Overington JP. A structural basis for sequence comparisons. An evaluation of scoring methodologies. J. Mol. Biol. (1993) 233:716–738.

  11. Mizuguchi K, Deane CM, Blundell TL, Johnson MS, Overington JP. JOY: protein sequence-structure representation and analysis. Bioinformatics (1998) 14:617–623.

  12. Nishikawa K, Ooi TJ. Radial locations of amino acid residues in a globular protein: correlation with the sequence. J. Biochem. (1986) 100:1043–1047.

  13. Bhaduri A, Pugalenthi G, Sowdhamini R. PASS2: an automated database of protein alignments organised as structural superfamilies. BMC Bioinformatics (2004) 5:35–41.

  14. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. (1995) 247:536–540.

  15. Sayle A, Milner-White EJ. RASMOL: biomolecular graphics for all. Trends Biochem. Sci. (1995) 20:374–376.

  16. Kraulis PJ. MOLSCRIPT: a program to produce both detailed and schematic plots of protein structures. J. Appl. Crystallogr. (1991) 24:946–950.

