Nucleic Acids Research Advance Access originally published online on May 30, 2007
Nucleic Acids Research 2007 35(Web Server issue):W384-W392; doi:10.1093/nar/gkm232
Nucleic Acids Research, 2007, Vol. 35, No. suppl_2 W384-W392
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Structure SNP (StSNP): a web server for mapping and modeling nsSNPs on protein structures with linkage to metabolic pathways
Alper Uzun,
Chesley M. Leslin,
Alexej Abyzov and
Valentin Ilyin*
Department of Biology, Northeastern University, 360 Huntington Ave., Boston, MA 02115, USA
*To whom correspondence should be addressed. Tel: +617 373 7048; Fax: +617 373 3724; Email: ilyin{at}neu.edu
Received January 29, 2007. Revised March 21, 2007. Accepted March 29, 2007.
ABSTRACT
SNPs located within the open reading frame of a gene that result
in an alteration in the amino acid sequence of the encoded protein
[nonsynonymous SNPs (nsSNPs)] might directly or indirectly affect
the functionality of the protein, either alone or through its
interactions in a multi-protein complex, thereby increasing or
decreasing the activity of a metabolic pathway. Understanding the
functional consequences of such changes and drawing conclusions
about the molecular basis of diseases involves integrating
information from multiple heterogeneous sources, including sequence
and structure data and pathway relations between proteins. Data
from NCBI's SNP database (dbSNP), gene and protein databases from
Entrez, protein structures from the PDB and pathway information
from KEGG have all been cross-referenced in the StSNP web server
in an effort to provide integrated reports about nsSNPs. StSNP
provides on-the-fly comparative modeling of nsSNPs with links to
metabolic pathway information, along with real-time visual
comparative analysis of the modeled structures using the Friend
software application. The use of metabolic pathways in StSNP allows
a researcher to examine possible disease-related pathways associated
with particular nsSNPs and to link these diseases with the currently
available molecular structure data. The server is publicly
available at http://glinka.bio.neu.edu/StSNP/.
INTRODUCTION
SNPs represent one of the most common forms of genetic variation
in a population (1,2). As of December 2006, the public SNP
database (dbSNP) (3) contained 11.9 million SNP candidates, of
which 5.6 million had been validated. Nonsynonymous SNPs (nsSNPs),
i.e. SNPs located within the open reading frame of a gene that
result in an alteration in the amino acid sequence of the encoded
protein, might directly or indirectly affect protein functionality,
either alone or through the protein's interactions in a
multi-protein complex, thereby increasing or decreasing the activity
of a metabolic pathway (1,4). nsSNPs have been linked to a wide
variety of diseases through effects such as altering protein
function, altering DNA and transcription factor binding sites,
reducing protein solubility and destabilizing protein structures
(4). Therefore, understanding the functional consequences of
nonsynonymous changes and predicting the potential causes and
molecular basis of diseases requires integrating information from
multiple heterogeneous sources, including sequence and structure
data and pathway relations between proteins.
SNP information is currently collected in several databases, including dbSNP, the Human Genome Variation Database (HGVbase) (5), the Japanese Single Nucleotide Polymorphism (JSNP) database (6) and the HapMap Project (1). A number of resources that have begun to explore the effects of nsSNPs on the tertiary structure and functionality of proteins have also been released for public use, including SNPs3D (7), PolyPhen (8), TopoSNP (9), ModSNP (10), LS-SNP (11), SNPeffect (12), MutDB (13,14) and Snap (15). Tables 1 and 2 give a brief description of the resources available for SNP analysis. Note that these are reference tables rather than comparison tables: the field is in its infancy, all resources are still evolving, and each database has its own strengths.
Table 2. Differences and similarities among the resources in their search options and background information
We present StSNP, a web-based server that provides the ability
to analyze and compare human nsSNP(s) in protein structures,
protein complexes and protein-protein interfaces, where nsSNP
data and structure data on protein complexes are available in
the PDB, along with analysis of the metabolic data within a given
pathway. An nsSNP usually does not inactivate protein function
completely, since such a mutation would most likely be lethal;
instead, nsSNPs change protein activity to some degree, either
directly (when they occur close to the active site) or indirectly
through interactions with other proteins in the pathway. Structural
and pathway information therefore have to be considered together.
For this reason, we have developed StSNP, which combines information
from different sources and provides on-the-fly comparative modeling
of the wild-type and mutated proteins (when an appropriate structural
template is available), along with real-time analysis and
visualization of structures and sequences (16), to assist
researchers in visually inspecting the possible effects of nsSNPs
on protein structure. StSNP supports several search modes, by
keyword, NCBI protein accession number, PDB ID (17) and NCBI nsSNP
ID, so that users can quickly retrieve targeted information.
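To make the search modes concrete, the sketch below shows how lookups by protein accession, rs number and keyword could be served from a small relational store. It is an illustration under stated assumptions, not StSNP's code: SQLite stands in for the MySQL database StSNP actually uses, the table and column names (`protein`, `nssnp`, `ref_aa`, `alt_aa`, ...) and the `search`/`demo_db` helpers are hypothetical, and the `rs_demo_*` identifiers are placeholders rather than real dbSNP rs numbers. Only the accessions and substitutions are taken from the examples in this paper.

```python
import sqlite3

# Hypothetical, heavily simplified schema for illustration only; StSNP runs on
# MySQL and its actual tables are not described in this paper.
SCHEMA = """
CREATE TABLE protein (
    protein_id  TEXT PRIMARY KEY,  -- NCBI protein accession, e.g. NP_000843
    gene_symbol TEXT,
    description TEXT
);
CREATE TABLE nssnp (
    rs_id      TEXT,               -- dbSNP rs number
    protein_id TEXT REFERENCES protein(protein_id),
    position   INTEGER,            -- 1-based residue number in the protein
    ref_aa     TEXT,               -- reference residue (one-letter code)
    alt_aa     TEXT                -- nonsynonymous residue (one-letter code)
);
"""

# Shared SELECT; the mutation label (e.g. 'I105V') is built with SQL concatenation.
_BASE = (
    "SELECT s.rs_id, s.protein_id, s.ref_aa || s.position || s.alt_aa "
    "FROM nssnp AS s JOIN protein AS p ON p.protein_id = s.protein_id "
)


def search(conn, protein_id=None, rs_id=None, keyword=None):
    """Return (rs_id, protein_id, mutation) tuples for exactly one search mode."""
    if protein_id is not None:
        sql, arg = _BASE + "WHERE s.protein_id = ?", protein_id
    elif rs_id is not None:
        sql, arg = _BASE + "WHERE s.rs_id = ?", rs_id
    elif keyword is not None:
        # LIKE is case-insensitive for ASCII text in SQLite.
        sql, arg = _BASE + "WHERE p.description LIKE ?", f"%{keyword}%"
    else:
        raise ValueError("give a protein_id, rs_id or keyword")
    return conn.execute(sql + " ORDER BY s.protein_id, s.position", (arg,)).fetchall()


def demo_db():
    """In-memory database holding the example substitutions discussed in this paper.

    The rs IDs are placeholders, not real dbSNP identifiers.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO protein VALUES (?, ?, ?)",
        [("NP_000843", "GSTP1", "glutathione S-transferase"),
         ("NP_000681", "ALDH2", "aldehyde dehydrogenase 2")],
    )
    snps = [("I", 105, "V"), ("T", 110, "S"), ("A", 114, "V"), ("D", 147, "Y"), ("L", 176, "M")]
    rows = [(f"rs_demo_{k}", "NP_000843", pos, ref, alt)
            for k, (ref, pos, alt) in enumerate(snps, start=1)]
    rows.append(("rs_demo_6", "NP_000681", 504, "E", "K"))
    conn.executemany("INSERT INTO nssnp VALUES (?, ?, ?, ?, ?)", rows)
    return conn
```

For example, `search(demo_db(), protein_id="NP_000843")` lists the five GST substitutions in residue order. Parameter binding (`?`) is used throughout so that user-supplied search terms are never spliced into the SQL text.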
DESIGN AND IMPLEMENTATION
Sources
In general, the internal database structure has been inherited
from the Structural Exon Database (SEDB) (18). StSNP was implemented
using a MySQL database running on a Linux server, with Perl scripts
used for all data retrieval and output (Figure 1). StSNP utilizes
three major data sources: (i) protein sequences from NCBI, (ii) the
reference and nsSNP locations from NCBI's dbSNP and (iii) structures
and sequences from the PDB. For every protein sequence, a list of
structural modeling templates is pre-calculated with BLAST (19) and
stored in the database for quick retrieval. The protein sequence is
aligned with the PDB sequence using the Smith-Waterman algorithm
(20,21), with similarity-specific scoring matrices ranging from
BLOSUM30 to BLOSUM90 (22). Pathway information is taken from KEGG
(23,24), human gene and protein information is gathered from NCBI's
Entrez Gene (25), and the comparative modeling phase is done by
MODELLER (26). The modeling part of StSNP is interactive: the user
chooses a template from the list, selects particular mutations to
be modeled, calculates the model and then visualizes the
superimposition of the models and the template in the Friend applet.
Additionally, structurally similar proteins or models can be analyzed
simultaneously in the Friend applet with the TOPOFIT structure
alignment method (27,28), to correlate nsSNP locations structurally.
StSNP currently contains 33 692 nsSNPs, 14 858 protein sequences,
12 741 genes and 25 617 protein structures.
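The alignment step described above can be illustrated with a minimal Smith-Waterman local alignment. This is a sketch under simplifying assumptions rather than StSNP's implementation: identities score +2, mismatches -1 and each gap position -2 (a linear gap penalty), standing in for the BLOSUM30 to BLOSUM90 matrices and for whatever gap scheme StSNP actually uses, which is not specified here. The function name and return layout are our own.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of sequences a and b (Smith-Waterman, linear gap penalty).

    Returns (score, aligned_a, aligned_b, start_a, start_b), where '-' marks a gap
    and start_a/start_b are 0-based indices of the first aligned residue.
    """
    def score(x, y):
        # Simple identity scoring; a real setup would look up a BLOSUM matrix here.
        return match if x == y else mismatch

    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(0,
                          H[i - 1][j - 1] + score(a[i - 1], b[j - 1]),
                          H[i - 1][j] + gap,   # residue of a opposite a gap
                          H[i][j - 1] + gap)   # residue of b opposite a gap
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j

    # Trace back from the best-scoring cell until a zero cell ends the local alignment.
    out_a, out_b = [], []
    i, j = bi, bj
    while H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b)), i, j
```

For example, `smith_waterman("ACDEFGH", "ACDFGH")` returns `(10, "ACDEFGH", "ACD-FGH", 0, 0)`: six identities (+12) and one gap opposite E (-2). Because the score is floored at zero, unrelated flanking residues are simply left out of the local alignment, which is why the method suits matching a query protein to a partial PDB chain.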
WEB SERVER FEATURES
StSNP has several search options, including search by Protein ID,
PDB ID or keyword, each of which integrates nsSNP-related
information. For example, the Protein ID search displays the known
nsSNP(s) for the protein, while the PDB ID search provides a list
of similar Protein IDs with nsSNP(s). Both searches provide a link
to pathway information when such data are available. The resulting
report pages offer the user options for model template selection.
Only templates satisfying two criteria are shown: the nsSNP(s) must
lie within the alignment of the protein sequence with the template,
and the sequence identity of the alignment must be at least 30%.
The modeling step lets the user choose which nsSNPs to map and,
after completion, instantly visualize the models in the Friend
applet. StSNP also has several browsing and search capabilities,
for example, searching for available structures by protein length
and percent similarity, or by a specifically chosen reference and
nonsynonymous residue within a particular chromosome. The features
of StSNP have been designed with the end user in mind, using
graphics, plots and easily readable tables.
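The two template criteria can be expressed as a small filter over a gapped pairwise alignment. The sketch below is illustrative and not StSNP's implementation; the function names are hypothetical, percent identity is taken as identical columns over gap-free columns (one of several common definitions), and an nsSNP is treated as "within the alignment" only if its residue is aligned to a template residue rather than to a gap, which is our reading of the criterion.

```python
def map_positions(aln_q, aln_t, q_start, t_start):
    """Map 1-based query residue numbers to 1-based template residue numbers.

    aln_q and aln_t are equal-length gapped strings ('-' marks a gap);
    q_start and t_start are the 1-based numbers of the first aligned residue
    in the full query and template sequences. Query residues facing a gap map to None.
    """
    mapping = {}
    qi, ti = q_start, t_start
    for qc, tc in zip(aln_q, aln_t):
        if qc != "-":
            mapping[qi] = ti if tc != "-" else None
            qi += 1
        if tc != "-":
            ti += 1
    return mapping


def percent_identity(aln_q, aln_t):
    """Identical columns as a percentage of columns where neither sequence has a gap."""
    pairs = [(q, t) for q, t in zip(aln_q, aln_t) if q != "-" and t != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(q == t for q, t in pairs) / len(pairs)


def template_is_usable(aln_q, aln_t, q_start, t_start, snp_positions, min_identity=30.0):
    """Both criteria: every nsSNP maps onto a template residue, and identity >= min_identity."""
    mapping = map_positions(aln_q, aln_t, q_start, t_start)
    covered = all(mapping.get(pos) is not None for pos in snp_positions)
    return covered and percent_identity(aln_q, aln_t) >= min_identity
```

For instance, with the alignment `ACDEFGH` / `ACD-FGH` starting at query residue 10 and template residue 1, query residue 16 maps to template residue 6, residue 13 faces a gap and residue 20 lies outside the alignment, so only templates covering the requested nsSNP positions pass. In practice the alignment and positions would come from the Smith-Waterman step and from dbSNP; here they are supplied by hand.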
EXAMPLES OF USE
Mapping nsSNPs onto protein structures
The results shown in Figure 2 were generated with the query
glutathione S-transferase (GST, Protein ID NP_000843); GSTs are a
family of multifunctional enzymes involved in the cellular
detoxification of xenobiotics and of reactive endogenous compounds
of oxidative metabolism (29). The output page reports the available
reference and nonsynonymous residues for the protein with their rs
numbers, the amino acid properties of the variations, and a picture
of the alignment of the protein sequence with the template showing
the nsSNP locations. In this example, all nsSNPs are located inside
the alignment and are thus available for mapping onto PDB ID 1aqv
chain B. The next step is to choose the nsSNPs for modeling. All
known nsSNPs of GST (I105V, T110S, A114V, D147Y and L176M) have
been modeled in this example and are presented in Figure 3A, where
a black circle marks the change from isoleucine to valine at
position 105. The role of the functional GSTP1 I105V polymorphism
in the pathogenesis of methamphetamine abuse has been studied, and
individuals carrying the G allele (valine) are expected to have
decreased GST detoxification (29). Mapping this nsSNP onto the
protein structure (Figure 3A) shows that residue 105 is in direct
contact with glutathione, so the substitution could potentially
have a strong effect on GST activity or on its binding affinity for
glutathione. The results section also provides the user with a link
to glutathione metabolism, in order to view other members of the
pathway (Figure 3B).
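Modeling a set of nsSNPs such as I105V, T110S, A114V, D147Y and L176M requires a mutant target sequence that carries the selected substitutions. The helper below is a hypothetical sketch of that preparatory step (the paper does not describe how StSNP builds its MODELLER inputs). It parses the one-letter notation used in this section and, as a design choice of ours, refuses to apply a substitution whose reference residue does not match the sequence, which would catch a numbering offset instead of silently modeling the wrong residue. The toy sequence in the example is invented.

```python
import re

# One-letter reference residue, 1-based position, one-letter substituted residue.
_AA = "ACDEFGHIKLMNPQRSTVWY"
_MUTATION = re.compile(rf"^([{_AA}])(\d+)([{_AA}])$")


def apply_mutations(sequence, mutations):
    """Return a copy of `sequence` with substitutions such as 'I105V' applied."""
    residues = list(sequence)
    for mut in mutations:
        m = _MUTATION.match(mut)
        if m is None:
            raise ValueError(f"unrecognized mutation: {mut!r}")
        ref, pos, alt = m.group(1), int(m.group(2)), m.group(3)
        if not 1 <= pos <= len(residues):
            raise ValueError(f"{mut}: position outside a sequence of length {len(residues)}")
        if residues[pos - 1] != ref:
            raise ValueError(f"{mut}: expected {ref} at {pos}, found {residues[pos - 1]}")
        residues[pos - 1] = alt
    return "".join(residues)
```

For example, `apply_mutations("MAIKT", ["I3V"])` returns `"MAVKT"`, while `apply_mutations("MAIKT", ["L3V"])` raises `ValueError` because position 3 holds isoleucine, not leucine.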

Figure 2. Data generation in StSNP. (A) Main query page, (B) Formatted data for nsSNPs along with graphical alignment representation, (C) nsSNP(s) selection for modeling, (D) Output page, and (E) Visualization in the Friend applet.
Figure 3. (A) Glutathione S Transferase is shown with nsSNP locations displayed in ball and stick representation, with I105V marked with a black circle. The reference residues are shown in blue, nonsynonymous residues in red and the substrate glutathione is displayed in space fill representation (yellow). The query for the example was Protein ID NP_000843 and template PDB ID 1aqv chain B. (B) The Results section also provides a user with a link to glutathione metabolism in order to view other members found in the pathway.
Another example, aldehyde dehydrogenase-2 (ALDH2; Protein ID
NP_000681), is illustrated in Figure 4. ALDH2 is involved in the
oxidation of acetaldehyde at the physiological concentrations found
when a person consumes alcohol. Worldwide, the Lys504 allele has
its highest prevalence (30-50%) in Asian populations (30). In this
example, glutamate is replaced by lysine at position 504
(Glu504Lys), a substitution that has been shown to essentially
eliminate ALDH2 activity (31). These examples show how a quick
search in StSNP, combined with structural mapping of the nsSNP
locations, provides structural support for the medical studies
mentioned here and may help in designing future experiments.

Figure 4. Aldehyde dehydrogenase-2 is shown with nsSNP locations displayed in ball and stick representation, with E504K marked with a black circle. The reference residues are shown in blue, nonsynonymous residues in red and the substrate NAD is displayed in space fill representation (green). The query for the example was Protein ID NP_000681 and template PDB ID 1ag8 chain A.
CONCLUSIONS
StSNP provides practical, user-friendly access to the wealth
of information related to nsSNPs by seamlessly connecting various
databases into one pipeline. Key functional and structural
information, together with the known pathways in which the proteins
are involved, has been linked to give users several advantages over
other current resources: (a) the sequence, structure and pathway
information have all been cross-referenced, which enables a user to
quickly query and visualize inter-related nsSNP data; (b) a
graphical display shows the user the location of the nsSNP(s) in
the primary sequence and whether they can be modeled; (c) the
modeling options let the user choose which nsSNPs to map and
visualize, to examine which of them could potentially have
deleterious effects on a protein's function; (d) the modeled protein
structures are automatically loaded in Friend, where they can be
easily viewed, compared and analyzed; (e) finally, StSNP will be
updated regularly following the updates of its major sources,
dbSNP, PDB, KEGG and others.
Thus, the first steps have been taken in the development of a resource for mapping nsSNPs onto protein structures, providing structural insight into the effects of nsSNPs on proteins, such as on stability, functionality and protein-protein interactions, and into other structurally related issues. As a web server in a rapidly evolving area of research, StSNP is designed to evolve with other related resources; future directions include a more detailed analysis of SNPs, prediction of the functional and biological implications of SNPs, and the use of image map technology from the KEGG API for more interactive data retrieval. StSNP creates the basis for further studies involving metabolic pathways and the disease(s) associated with a particular SNP.
ACKNOWLEDGEMENT
The Open Access publication charges for this manuscript were
waived by Oxford University Press.
Conflict of interest statement. None declared.
Footnotes
The authors wish it to be known that, in their opinion, the
first two authors should be regarded as joint First Authors.
REFERENCES
- The International HapMap Consortium. The International HapMap Project. Nature (2003) 426:789-796.
- Sachidanandam R, Weissman D, Schmidt SC, Kakol JM, Stein LD, Marth G, Sherry S, Mullikin JC, Mortimore BJ, et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature (2001) 409:928-933.
- Sherry ST, Ward M, Sirotkin K. dbSNP-database for single nucleotide polymorphisms and other classes of minor genetic variation. Genome Res (1999) 9:677-679.
- Chasman D, Adams RM. Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation. J. Mol. Biol (2001) 307:683-706.
- Fredman D, Siegfried M, Yuan YP, Bork P, Lehvaslaiho H, Brookes AJ. HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources. Nucleic Acids Res (2002) 30:387-391.
- Hirakawa M, Tanaka T, Hashimoto Y, Kuroda M, Takagi T, Nakamura Y. JSNP: a database of common gene variations in the Japanese population. Nucleic Acids Res (2002) 30:158-162.
- Wang Z, Moult J. SNPs, protein structure, and disease. Hum. Mutat (2001) 17:263-270.
- Sunyaev S, Ramensky V, Koch I, Lathe W III, Kondrashov AS, Bork P. Prediction of deleterious human alleles. Hum. Mol. Genet (2001) 10:591-597.
- Stitziel NO, Binkowski TA, Tseng YY, Kasif S, Liang J. topoSNP: a topographic database of non-synonymous single nucleotide polymorphisms with and without known disease association. Nucleic Acids Res (2004) 32:D520-D522.
- Yip YL, Scheib H, Diemand AV, Gattiker A, Famiglietti LM, Gasteiger E, Bairoch A. The Swiss-Prot variant page and the ModSNP database: a resource for sequence and structure information on human protein variants. Hum. Mutat (2004) 23:464-470.
- Karchin R, Diekhans M, Kelly L, Thomas DJ, Pieper U, Eswar N, Haussler D, Sali A. LS-SNP: large-scale annotation of coding non-synonymous SNPs based on multiple information sources. Bioinformatics (2005) 21:2814-2820.
- Reumers J, Schymkowitz J, Ferkinghoff-Borg J, Stricher F, Serrano L, Rousseau F. SNPeffect: a database mapping molecular phenotypic effects of human non-synonymous coding SNPs. Nucleic Acids Res (2005) 33:D527-D532.
- Dantzer J, Moad C, Heiland R, Mooney S. MutDB services: interactive structural analysis of mutation data. Nucleic Acids Res (2005) 33:W311-W314.
- Han A, Kang HJ, Cho Y, Lee S, Kim YJ, Gong S. SNP@Domain: a web resource of single nucleotide polymorphisms (SNPs) within protein domain structures and sequences. Nucleic Acids Res (2006) 34:W642-W644.
- Li S, Ma L, Li H, Vang S, Hu Y, Bolund L, Wang J. Snap: an integrated SNP annotation platform. Nucleic Acids Res (2007) 35:D707-D710.
- Abyzov A, Errami M, Leslin CM, Ilyin VA. Friend, an integrated analytical front-end application for bioinformatics. Bioinformatics (2005) 21:3677-3678.
- Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, et al. The Protein Data Bank. Acta Crystallogr. D. Biol. Crystallogr (2002) 58:899-907.
- Leslin CM, Abyzov A, Ilyin VA. Structural exon database, SEDB, mapping exon boundaries on multiple protein structures. Bioinformatics (2004) 20:1801-1803.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J. Mol. Biol (1990) 215:403-410.
- Smith TF, Waterman MS. Comparison of biosequences. Adv. Appl. Math (1981) 2:482-489.
- Smith TF, Waterman MS. Identification of common molecular subsequences. J. Mol. Biol (1981) 147:195-197.
- Henikoff S, Henikoff JG. Performance evaluation of amino acid substitution matrices. Proteins (1993) 17:49-61.
- Kanehisa M. A database for post-genome analysis. Trends Genet (1997) 13:375-376.
- Kanehisa M. The KEGG database. Novartis Found. Symp (2002) 247:91-101.
- Pruitt KD, Maglott DR. RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res (2001) 29:137-140.
- Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct (2000) 29:291-325.
- Ilyin VA, Abyzov A, Leslin CM. Structural alignment of proteins by a novel TOPOFIT method, as a superimposition of common volumes at a topomax point. Protein Sci (2004) 13:1865-1874.
- Leslin CM, Abyzov A, Ilyin VA. TOPOFIT-DB, a database of protein structural alignments based on the TOPOFIT method. Nucleic Acids Res (2007) 35:D317-D321.
- Hashimoto T, Hashimoto K, Matsuzawa D, Shimizu E, Sekine Y, Inada T, Ozaki N, Iwata N, Harano M, Komiyama T, et al. A functional glutathione S-transferase P1 gene polymorphism is associated with methamphetamine-induced psychosis in Japanese population. Am. J. Med. Genet. B Neuropsychiatr. Genet (2005) 135:5-9.
- Goedde HW, Agarwal DP, Harada S, Meier-Tackmann D, Ruofu D, Bienzle U, Kroeger A, Hussein L. Population genetic studies on aldehyde dehydrogenase isozyme deficiency and alcohol sensitivity. Am. J. Hum. Genet (1983) 35:769-772.
- Li Y, Zhang D, Jin W, Shao C, Yan P, Xu C, Sheng H, Liu Y, Yu J, et al. Mitochondrial aldehyde dehydrogenase-2 (ALDH2) Glu504Lys polymorphism contributes to the variation in efficacy of sublingual nitroglycerin. J. Clin. Invest (2006) 116:506-511.
