Nucleic Acids Research Advance Access published online on May 30, 2007

Nucleic Acids Research, doi:10.1093/nar/gkm232
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Web Server Paper

Structure SNP (StSNP): a web server for mapping and modeling nsSNPs on protein structures with linkage to metabolic pathways

Alper Uzun, Chesley M. Leslin, Alexej Abyzov and Valentin Ilyin*

Department of Biology, Northeastern University, 360 Huntington Ave., Boston, MA 02115, USA

*To whom correspondence should be addressed. Tel: +617 373 7048; Fax: +617 373 3724; Email: ilyin{at}neu.edu

Received January 29, 2007. Revised March 21, 2007. Accepted March 29, 2007.


    ABSTRACT
 
SNPs located within the open reading frame of a gene that alter the amino acid sequence of the encoded protein [nonsynonymous SNPs (nsSNPs)] may directly or indirectly affect the function of the protein, either alone or through its interactions within a multi-protein complex, thereby increasing or decreasing the activity of a metabolic pathway. Understanding the functional consequences of such changes and drawing conclusions about the molecular basis of diseases involves integrating information from multiple heterogeneous sources, including sequence and structure data and pathway relations between proteins. Data from NCBI's SNP database (dbSNP), gene and protein databases from Entrez, protein structures from the PDB and pathway information from KEGG have all been cross-referenced in the StSNP web server to provide combined, integrated reports on nsSNPs. StSNP provides ‘on the fly’ comparative modeling of nsSNPs with links to metabolic pathway information, along with real-time visual comparative analysis of the modeled structures using the Friend software application. The use of metabolic pathways in StSNP allows a researcher to examine possible disease-related pathways associated with particular nsSNPs and to link these diseases with the currently available molecular structure data. The server is publicly available at http://glinka.bio.neu.edu/StSNP/.


    INTRODUCTION
 
SNPs represent one of the most common forms of genetic variation in a population (1,2). As of December 2006, the public SNP database (dbSNP) (3) contained 11.9 million SNP candidates, of which 5.6 million had been validated. Nonsynonymous SNPs (nsSNPs), that is, SNPs located within the open reading frame of a gene that alter the amino acid sequence of the encoded protein, may directly or indirectly affect the function of the protein, either alone or through its interactions within a multi-protein complex, thereby increasing or decreasing the activity of a metabolic pathway (1,4). nsSNPs have been linked to a wide variety of diseases through effects such as altering protein function, altering DNA and transcription factor binding sites, reducing protein solubility and destabilizing protein structures (4). Therefore, understanding the functional consequences of nonsynonymous changes and predicting the potential causes and molecular basis of diseases requires integrating information from multiple heterogeneous sources, including sequence and structure data and pathway relations between proteins.

SNP information is currently collected in several databases, including dbSNP, the Human Genome Variation Database (HGVbase) (5), the Japanese Single Nucleotide Polymorphism (JSNP) database (6) and the HapMap Project (1). A number of resources that explore the effects of nsSNPs on the tertiary structure and function of proteins have also been released for public use, including SNPs3D (7), PolyPhen (8), TopoSNP (9), ModSNP (10), LS-SNP (11), SNPeffect (12), MutDB (13), SNP@Domain (14) and Snap (15). A brief description of the available resources for SNP analysis is given in Tables 1 and 2. Note that these are reference tables rather than comparison tables: the field is in its infancy, all resources are still evolving, and each has its own strengths.


 
Table 1. Query and modeling options of the resources

 

 
Table 2. Differences and similarities among the resources in their search options and background information

 
We present StSNP, a web server for analyzing and comparing human nsSNPs in protein structures, protein complexes and protein–protein interfaces, wherever nsSNP data and structures of the relevant protein complexes are available in the PDB, together with an analysis of the metabolic data for a given pathway. nsSNPs usually do not inactivate a protein completely, since such a mutation would most likely be lethal; instead, they change protein activity to some degree, either directly (when they occur close to the active site) or indirectly through interactions with other proteins in the pathway. Structural and pathway information therefore have to be considered together. StSNP integrates information from different sources and provides ‘on the fly’ comparative modeling of the wild-type and mutated proteins (when an appropriate structural template is available), along with real-time analysis and visualization of structures and sequences (16), to help researchers visually inspect the possible effects of nsSNPs on protein structure. StSNP can be searched by keyword, NCBI protein accession number, PDB ID (17) or NCBI nsSNP ID, so that users can retrieve targeted information quickly.
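The four query types listed above can usually be told apart by the shape of the identifier. The sketch below is an assumption about how such dispatch could work, not StSNP's code: the regular expressions (RefSeq `NP_`/`XP_` protein accessions, dbSNP `rs` numbers, four-character PDB IDs beginning with a digit) and the function name `classify_query` are hypothetical, and anything unrecognized falls back to a keyword search.

```python
import re

# Hypothetical identifier patterns; StSNP's actual parsing rules are not documented.
QUERY_PATTERNS = [
    ("nssnp_id", re.compile(r"^rs\d+$", re.IGNORECASE)),    # dbSNP reference SNP ID
    ("protein_id", re.compile(r"^(NP|XP)_\d+(\.\d+)?$")),   # RefSeq protein accession
    ("pdb_id", re.compile(r"^\d[A-Za-z0-9]{3}$")),          # four-character PDB ID
]


def classify_query(query: str) -> str:
    """Return which search type a free-text query most likely refers to."""
    text = query.strip()
    for kind, pattern in QUERY_PATTERNS:
        if pattern.match(text):
            return kind
    return "keyword"  # anything unrecognized is treated as a keyword search


if __name__ == "__main__":
    for q in ["NP_000843", "1aqv", "rs12345", "glutathione"]:
        print(q, "->", classify_query(q))
```

A pattern-based dispatcher keeps a single search box; the trade-off is that a keyword which happens to look like an identifier (for example a four-character token starting with a digit) would be routed to the wrong search, so an interface might also offer explicit search-type options.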


    DESIGN AND IMPLEMENTATION SOURCES
 
In general, the internal database structure has been inherited from the Structural Exon database (SEDB) (18). StSNP is implemented as a MySQL database running on a Linux server, with Perl scripts used for all data retrieval and output (Figure 1). StSNP draws on three major data sources: (1) protein sequences from NCBI, (2) reference and nsSNP locations from NCBI's dbSNP and (3) structures and sequences from the PDB. For every protein sequence, a list of structural modeling templates is precomputed with BLAST (19) and stored in the database for quick retrieval. The protein sequence is then aligned with each PDB sequence using the Smith–Waterman algorithm (20,21) with similarity-specific scoring matrices ranging from BLOSUM30 to BLOSUM90 (22). Pathway information is taken from KEGG (23,24), human gene and protein information is gathered from NCBI's Entrez Gene (25), and comparative modeling is performed with MODELLER (26). The modeling part of StSNP is interactive: the user chooses a template from the list, selects the mutations to be modeled, calculates the model and then visualizes the superimposition of the models and the template in the Friend applet. In addition, structurally similar proteins and models can be analyzed simultaneously in the Friend applet with the TOPOFIT structure alignment method (27,28) to correlate nsSNP locations across structures. StSNP currently contains 33 692 nsSNPs, 14 858 protein sequences, 12 741 genes and 25 617 protein structures.
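To make the alignment step concrete, here is a minimal Smith–Waterman local alignment with a linear gap penalty. It is a sketch under stated assumptions: `simple_score` is a placeholder match/mismatch score standing in for a real BLOSUM table, the gap penalty of -2 is arbitrary, and `choose_blosum`, which picks the BLOSUM matrix whose clustering level is nearest to an estimated percent identity, is one hypothetical reading of "similarity-specific scoring matrices"; the paper does not state the actual selection rule.

```python
from typing import Callable, Tuple

# BLOSUM series spanning the range named in the text (BLOSUM30 to BLOSUM90).
BLOSUM_LEVELS = [30, 35, 40, 45, 50, 55, 60, 62, 65, 70, 75, 80, 85, 90]


def choose_blosum(percent_identity: float) -> str:
    """Hypothetical rule: pick the matrix whose level is closest to the identity."""
    level = min(BLOSUM_LEVELS, key=lambda b: abs(b - percent_identity))
    return f"BLOSUM{level}"


def simple_score(a: str, b: str) -> int:
    """Placeholder substitution score; a real run would look up a BLOSUM table."""
    return 2 if a == b else -1


def smith_waterman(query: str, target: str,
                   score: Callable[[str, str], int] = simple_score,
                   gap: int = -2) -> Tuple[int, str, str, int, int]:
    """Local alignment with a linear gap penalty.

    Returns (best score, aligned query, aligned target, 0-based start of the
    aligned region in the query, 0-based start in the target).
    """
    n, m = len(query), len(target)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(query[i - 1], target[j - 1]),
                          H[i - 1][j] + gap,   # gap in the target
                          H[i][j - 1] + gap)   # gap in the query
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Trace back from the best-scoring cell until a zero cell is reached.
    aq, at = [], []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(query[i - 1], target[j - 1]):
            aq.append(query[i - 1]); at.append(target[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            aq.append(query[i - 1]); at.append("-"); i -= 1
        else:
            aq.append("-"); at.append(target[j - 1]); j -= 1
    return best, "".join(reversed(aq)), "".join(reversed(at)), i, j
```

For example, `smith_waterman("MKHEAGLL", "HEAG")` aligns the shared `HEAG` block and reports that the aligned region starts at offset 2 in the query; such start offsets are what a later step needs to translate residue numbers between the protein sequence and the template.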


Figure 1. StSNP is an interactive web server, which utilizes several heterogeneous data sources.
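The cross-referencing shown in Figure 1 can be pictured as a small relational schema. The sketch below uses Python's built-in `sqlite3` in place of MySQL so that it runs without a database server; every table name, column name and row in it is hypothetical and illustrative, not StSNP's actual schema or data. It covers two of the lookups described in the text: the nsSNPs recorded for a protein, and the other proteins that share a pathway with it.

```python
import sqlite3

# Hypothetical schema for illustration; StSNP's real MySQL tables are not described.
SCHEMA = """
CREATE TABLE protein (protein_id TEXT PRIMARY KEY, gene_symbol TEXT);
CREATE TABLE nssnp (rs_id TEXT, protein_id TEXT REFERENCES protein,
                    position INTEGER, ref_aa TEXT, alt_aa TEXT);
-- Candidate modeling templates, precomputed (e.g. by BLAST) for fast lookup.
CREATE TABLE template (protein_id TEXT REFERENCES protein, pdb_id TEXT,
                       chain TEXT, identity REAL);
CREATE TABLE pathway_member (pathway TEXT, protein_id TEXT REFERENCES protein);
"""


def build_demo_db() -> sqlite3.Connection:
    """In-memory database filled with toy rows (not real StSNP content)."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO protein VALUES (?, ?)",
                   [("PROT_A", "GENE_A"), ("PROT_B", "GENE_B"), ("PROT_C", "GENE_C")])
    db.executemany("INSERT INTO nssnp VALUES (?, ?, ?, ?, ?)",
                   [("rs_2", "PROT_A", 114, "A", "V"), ("rs_1", "PROT_A", 105, "I", "V")])
    db.executemany("INSERT INTO pathway_member VALUES (?, ?)",
                   [("path_X", "PROT_A"), ("path_X", "PROT_B"), ("path_Y", "PROT_C")])
    return db


def nssnps_for_protein(db: sqlite3.Connection, protein_id: str):
    """nsSNPs of one protein as (rs_id, 'I105V'-style label), ordered by position."""
    return db.execute(
        "SELECT rs_id, ref_aa || position || alt_aa FROM nssnp "
        "WHERE protein_id = ? ORDER BY position", (protein_id,)).fetchall()


def pathway_partners(db: sqlite3.Connection, protein_id: str):
    """Other proteins sharing at least one pathway with the query protein."""
    return db.execute(
        "SELECT DISTINCT b.protein_id FROM pathway_member AS a "
        "JOIN pathway_member AS b ON a.pathway = b.pathway "
        "WHERE a.protein_id = ? AND b.protein_id != ? "
        "ORDER BY b.protein_id", (protein_id, protein_id)).fetchall()
```

Keeping precomputed template hits in their own table mirrors the text's point that BLAST results are stored for quick retrieval: a report page then needs only indexed lookups, not a fresh sequence search.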

 

    WEB SERVER FEATURES
 
StSNP offers several search options, including search by Protein ID, PDB ID or keyword, each of which returns integrated nsSNP-related information. For example, a Protein ID search displays the known nsSNP(s) of the protein, while a PDB ID search lists proteins with nsSNP(s) that are similar to the given structure. Both searches provide a link to pathway information when such data are available. The resulting report pages let the user select a modeling template. Only templates that satisfy two criteria are shown: the nsSNP(s) must lie within the alignment of the protein sequence with the template, and the sequence identity of that alignment must be at least 30%. In the modeling step, the user chooses which nsSNPs to map and, once modeling is complete, can immediately visualize the models in the Friend applet. StSNP also provides browsing and search capabilities, for example searching for available structures by protein length and percent similarity, or by a specific reference and nonsynonymous residue on a particular chromosome. The features of StSNP have been designed with the end user in mind, using graphics, plots and easily readable tables.
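The two template criteria above can be checked directly on a pairwise alignment. In this sketch the function names are hypothetical, percent identity is assumed to be computed over gap-free aligned columns (the paper does not give its exact definition), and an nsSNP counts as "within the alignment" only if its residue is aligned to a template residue rather than to a gap, which is slightly stricter than the wording in the text.

```python
from typing import Dict, Iterable, Optional


def map_positions(aln_query: str, aln_template: str,
                  query_start: int, template_start: int) -> Dict[int, Optional[int]]:
    """Map 1-based query residue numbers to 1-based template residue numbers.

    query_start and template_start are the residue numbers of the first
    non-gap character of each aligned string. Query residues opposite a gap
    map to None.
    """
    mapping: Dict[int, Optional[int]] = {}
    qpos, tpos = query_start, template_start
    for q, t in zip(aln_query, aln_template):
        if q != "-":
            mapping[qpos] = tpos if t != "-" else None
            qpos += 1
        if t != "-":
            tpos += 1
    return mapping


def percent_identity(aln_query: str, aln_template: str) -> float:
    """Identity over columns where neither sequence has a gap (assumed definition)."""
    pairs = [(q, t) for q, t in zip(aln_query, aln_template) if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)


def template_is_usable(aln_query: str, aln_template: str,
                       query_start: int, template_start: int,
                       snp_positions: Iterable[int],
                       min_identity: float = 30.0) -> bool:
    """Identity must reach the threshold and every nsSNP must map onto the template."""
    if percent_identity(aln_query, aln_template) < min_identity:
        return False
    mapping = map_positions(aln_query, aln_template, query_start, template_start)
    return all(mapping.get(pos) is not None for pos in snp_positions)
```

Besides filtering templates, `map_positions` yields the template residue number onto which each nsSNP would be placed, which is the information a modeling step needs once the user has chosen the substitutions.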


    EXAMPLES OF USE
 
Mapping nsSNPs onto protein structures
Results shown in Figure 2 were generated with the query glutathione S-transferase (GST; Protein ID NP_000843). GSTs are a family of multifunctional enzymes involved in the cellular detoxification of xenobiotics and of reactive endogenous compounds of oxidative metabolism (29). The output page reports the available reference and nonsynonymous residues for the protein with their rs numbers, the amino acid properties of the variations, and a picture of the alignment of the protein sequence with the template, including the nsSNP locations. In this example, all nsSNPs are located inside the alignment and are thus available for mapping onto PDB ID 1aqv chain B. The next step is to choose the nsSNPs for modeling. All known nsSNPs of GST (I105V, T110S, A114V, D147Y and L176M) were modeled in this example and are presented in Figure 3A, where a black circle marks position 105, at which isoleucine is replaced by valine. The role of the functional GSTP1 I105V polymorphism in the pathogenesis of methamphetamine-induced psychosis has been studied, and individuals carrying the G allele (valine) are expected to have decreased GST detoxification (29). The mapping of this nsSNP onto the protein structure (Figure 3A) shows that I105V is in direct contact with glutathione and could therefore have a strong effect on GST activity or on its binding affinity for glutathione. The results section also provides a link to the glutathione metabolism pathway, where the other members of the pathway can be viewed (Figure 3B).
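Whether a mapped residue touches a ligand, as described for I105V and glutathione, can be estimated from atomic coordinates. The sketch below assumes the model is available as PDB-format text; it reads the fixed-column `ATOM`/`HETATM` records and reports the minimum interatomic distance. The 4.0 Å cutoff and the helper names are assumptions for illustration, not values taken from StSNP or Friend.

```python
import math
from typing import Iterable, List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def atoms_of(pdb_lines: Iterable[str], chain: str,
             res_seq: Optional[int] = None,
             res_name: Optional[str] = None) -> List[Coord]:
    """Collect coordinates from fixed-column PDB ATOM/HETATM records."""
    coords = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        if line[21] != chain:                                    # chain ID, column 22
            continue
        if res_seq is not None and int(line[22:26]) != res_seq:  # residue number, columns 23-26
            continue
        if res_name is not None and line[17:20].strip() != res_name:  # residue name, columns 18-20
            continue
        # x, y, z occupy columns 31-38, 39-46 and 47-54.
        coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def min_distance(atoms_a: Sequence[Coord], atoms_b: Sequence[Coord]) -> float:
    """Smallest Euclidean distance between any atom of A and any atom of B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def in_contact(residue_atoms: Sequence[Coord], ligand_atoms: Sequence[Coord],
               cutoff: float = 4.0) -> bool:
    """True if the two atom sets come within `cutoff` ångströms (assumed default)."""
    if not residue_atoms or not ligand_atoms:
        return False
    return min_distance(residue_atoms, ligand_atoms) <= cutoff
```

For instance, `in_contact(atoms_of(lines, "B", res_seq=105), atoms_of(lines, "B", res_name="GSH"))` would test residue 105 of chain B against a ligand stored under the residue name `GSH`; whether the ligand carries that name and chain label depends on the particular PDB entry.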


Figure 2. Data generation in StSNP. (A) Main query page, (B) Formatted data for nsSNPs along with graphical alignment representation, (C) nsSNP(s) selection for modeling, (D) Output page, and (E) Visualization in the Friend applet.

 

Figure 3. (A) Glutathione S-transferase is shown with nsSNP locations displayed in ball and stick representation, with I105V marked with a black circle. The reference residues are shown in blue, nonsynonymous residues in red and the substrate glutathione in space fill representation (yellow). The query for the example was Protein ID NP_000843 and the template was PDB ID 1aqv chain B. (B) The results section also provides a link to glutathione metabolism, where the other members of the pathway can be viewed.

 
A second example, aldehyde dehydrogenase-2 (ALDH2; Protein ID NP_000681), is illustrated in Figure 4. ALDH2 oxidizes acetaldehyde at the physiological concentrations found when a person consumes alcohol. Worldwide, the Lys504 allele has its highest prevalence (30–50%) in Asian populations (30). In this example, glutamate is replaced by lysine at position 504 (Glu504Lys), a substitution that has been shown to essentially eliminate ALDH2 activity (31). These examples show how a quick search in StSNP, combined with structural mapping of the nsSNP locations, can provide structural support for the medical studies cited here and may facilitate the design of future experiments.
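The text writes the ALDH2 variant as Glu504Lys, while the Figure 4 caption uses E504K. A small hypothetical helper can normalize both notations to one-letter code and apply the substitutions to a wild-type sequence, for example to build the mutant sequence handed to a modeling program. Checking the reference residue before substituting guards against residue-numbering mismatches (such as precursor versus mature-protein numbering); this check is a design choice of the sketch, not documented StSNP behavior.

```python
import re
from typing import Iterable, Tuple

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
ONE_LETTER = re.compile(r"^([A-Z])(\d+)([A-Z])$")                    # e.g. E504K
THREE_LETTER = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")  # e.g. Glu504Lys


def parse_substitution(text: str) -> Tuple[str, int, str]:
    """Return (reference, position, variant) in one-letter code."""
    text = text.strip()
    m = ONE_LETTER.match(text)
    if m:
        return m.group(1), int(m.group(2)), m.group(3)
    m = THREE_LETTER.match(text)
    if m and m.group(1) in THREE_TO_ONE and m.group(3) in THREE_TO_ONE:
        return THREE_TO_ONE[m.group(1)], int(m.group(2)), THREE_TO_ONE[m.group(3)]
    raise ValueError(f"unrecognized substitution: {text!r}")


def apply_substitutions(sequence: str, substitutions: Iterable[str]) -> str:
    """Return the mutant sequence after applying 1-based substitutions in order."""
    residues = list(sequence)
    for text in substitutions:
        ref, pos, alt = parse_substitution(text)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{text}: position outside sequence of length {len(residues)}")
        if residues[pos - 1] != ref:
            raise ValueError(f"{text}: expected {ref} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = alt
    return "".join(residues)
```

Both `parse_substitution("Glu504Lys")` and `parse_substitution("E504K")` yield the same tuple, so variants reported in either style can be compared or merged directly.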


Figure 4. Aldehyde dehydrogenase-2 is shown with nsSNP locations displayed in ball and stick representation, with E504K marked with a black circle. The reference residues are shown in blue, nonsynonymous residues in red and the substrate NAD is displayed in space fill representation (green). The query for the example was Protein ID NP_000681 and template PDB ID 1ag8 chain A.

 

    CONCLUSIONS
 
StSNP provides practical, user-friendly access to the wealth of information related to nsSNPs by seamlessly connecting various databases into one pipeline. Key functional and structural information, together with the known pathways in which the proteins are involved, has been linked to give users several advantages over other current resources: (a) the sequence, structure and pathway information have all been cross-referenced, enabling a user to quickly query and visualize inter-related nsSNP data; (b) a graphical display shows the location of the nsSNP(s) in the primary sequence and whether they can be modeled; (c) the modeling options let the user choose which nsSNPs to map and visualize which of them could potentially have deleterious effects on protein function; (d) the modeled protein structures are automatically loaded in Friend, where they can easily be viewed, compared and analyzed; and (e) StSNP will be updated regularly, following updates of its major sources: dbSNP, PDB, KEGG and others.

Thus, the first steps have been taken in the development of a resource for mapping nsSNPs onto protein structures, providing structural insight into the effects of nsSNPs on protein properties such as stability, functionality and protein–protein interactions, and on other structure-related issues. As a web server in a rapidly evolving area of research, StSNP is designed to evolve together with related resources. Future directions include a more detailed analysis of SNPs, prediction of the functional and biological implications of SNPs, and the use of image map technology from the KEGG API for more interactive data retrieval. StSNP lays the groundwork for further studies of the metabolic pathways and diseases associated with a particular SNP.


    ACKNOWLEDGEMENT
 
The Open Access publication charges for this paper were waived by Oxford University Press.

Conflict of interest statement. None declared.


    Footnotes
 
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.


    REFERENCES
 

  1. The International HapMap Consortium. The International HapMap Project. Nature (2003) 426:789–796.

  2. Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, Sherry S, Mullikin JC, Mortimore BJ, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature (2001) 409:928–933.

  3. Sherry ST, Ward M, Sirotkin K. dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res. (1999) 9:677–679.

  4. Chasman D, Adams RM. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation. J. Mol. Biol. (2001) 307:683–706.

  5. Fredman D, Siegfried M, Yuan YP, Bork P, Lehvaslaiho H, Brookes AJ. HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources. Nucleic Acids Res. (2002) 30:387–391.

  6. Hirakawa M, Tanaka T, Hashimoto Y, Kuroda M, Takagi T, Nakamura Y. JSNP: a database of common gene variations in the Japanese population. Nucleic Acids Res. (2002) 30:158–162.

  7. Wang Z, Moult J. SNPs, protein structure, and disease. Hum. Mutat. (2001) 17:263–270.

  8. Sunyaev S, Ramensky V, Koch I, Lathe W III, Kondrashov AS, Bork P. Prediction of deleterious human alleles. Hum. Mol. Genet. (2001) 10:591–597.

  9. Stitziel NO, Binkowski TA, Tseng YY, Kasif S, Liang J. topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association. Nucleic Acids Res. (2004) 32:D520–D522.

  10. Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum. Mutat. (2004) 23:464–470.

  11. Karchin R, Diekhans M, Kelly L, Thomas DJ, Pieper U, Eswar N, Haussler D, Sali A. LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics (2005) 21:2814–2820.

  12. Reumers J, Schymkowitz J, Ferkinghoff-Borg J, Stricher F, Serrano L, Rousseau F. SNPeffect: a database mapping molecular phenotypic effects of human non-synonymous coding SNPs. Nucleic Acids Res. (2005) 33:D527–D532.

  13. Dantzer J, Moad C, Heiland R, Mooney S. MutDB services: interactive structural analysis of mutation data. Nucleic Acids Res. (2005) 33:W311–W314.

  14. Han A, Kang HJ, Cho Y, Lee S, Kim YJ, Gong S. SNP@Domain: a web resource of single nucleotide polymorphisms (SNPs) within protein domain structures and sequences. Nucleic Acids Res. (2006) 34:W642–W644.

  15. Li S, Ma L, Li H, Vang S, Hu Y, Bolund L, Wang J. Snap: an integrated SNP annotation platform. Nucleic Acids Res. (2007) 35:D707–D710.

  16. Abyzov A, Errami M, Leslin CM, Ilyin VA. Friend, an integrated analytical front-end application for bioinformatics. Bioinformatics (2005) 21:3677–3678.

  17. Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, et al. The Protein Data Bank. Acta Crystallogr. D. Biol. Crystallogr. (2002) 58:899–907.

  18. Leslin CM, Abyzov A, Ilyin VA. Structural exon database, SEDB, mapping exon boundaries on multiple protein structures. Bioinformatics (2004) 20:1801–1803.

  19. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol. (1990) 215:403–410.

  20. Smith TF, Waterman MS. Comparison of biosequences. Adv. Appl. Math. (1981) 2:482–489.

  21. Smith TF, Waterman MS. Identification of common molecular subsequences. J. Mol. Biol. (1981) 147:195–197.

  22. Henikoff S, Henikoff JG. Performance evaluation of amino acid substitution matrices. Proteins (1993) 17:49–61.

  23. Kanehisa M. A database for post-genome analysis. Trends Genet. (1997) 13:375–376.

  24. Kanehisa M. The KEGG database. Novartis Found. Symp. (2002) 247:91–101.

  25. Pruitt KD, Maglott DR. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res. (2001) 29:137–140.

  26. Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. (2000) 29:291–325.

  27. Ilyin VA, Abyzov A, Leslin CM. Structural alignment of proteins by a novel TOPOFIT method, as a superimposition of common volumes at a topomax point. Protein Sci. (2004) 13:1865–1874.

  28. Leslin CM, Abyzov A, Ilyin VA. TOPOFIT-DB, a database of protein structural alignments based on the TOPOFIT method. Nucleic Acids Res. (2007) 35:D317–D321.

  29. Hashimoto T, Hashimoto K, Matsuzawa D, Shimizu E, Sekine Y, Inada T, Ozaki N, Iwata N, Harano M, Komiyama T, et al. A functional glutathione S-transferase P1 gene polymorphism is associated with methamphetamine-induced psychosis in Japanese population. Am. J. Med. Genet. B Neuropsychiatr. Genet. (2005) 135:5–9.

  30. Goedde HW, Agarwal DP, Harada S, Meier-Tackmann D, Ruofu D, Bienzle U, Kroeger A, Hussein L. Population genetic studies on aldehyde dehydrogenase isozyme deficiency and alcohol sensitivity. Am. J. Hum. Genet. (1983) 35:769–772.

  31. Li Y, Zhang D, Jin W, Shao C, Yan P, Xu C, Sheng H, Liu Y, Yu J, et al. Mitochondrial aldehyde dehydrogenase-2 (ALDH2) Glu504Lys polymorphism contributes to the variation in efficacy of sublingual nitroglycerin. J. Clin. Invest. (2006) 116:506–511.

