
Nucleic Acids Research, 2002, Vol. 30, No. 17 3894-3900
© 2002 Oxford University Press

Human non-synonymous SNPs: server and survey

Vasily Ramensky1,2,3, Peer Bork1,2 and Shamil Sunyaev*,1,3

1 European Molecular Biology Laboratory, Meyerhofstrasse 1, 69117 Heidelberg, Germany, 2 Max-Delbrueck Center for Molecular Medicine, Robert-Roessle-Strasse 10, 13122 Berlin, Germany and 3 Engelhardt Institute of Molecular Biology, Vavilova 32, 119991 Moscow, Russia

*To whom correspondence should be addressed at present address: Genetics Division, Department of Medicine, Brigham & Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA. Tel: +1 617 7325856; Fax: +1 617 7325123; Email: ssunyaev@rics.bwh.harvard.edu

Received March 19, 2002; Revised and Accepted July 8, 2002


    ABSTRACT
Human single nucleotide polymorphisms (SNPs) represent the most frequent type of human population DNA variation. One of the main goals of SNP research is to understand the genetics of the human phenotype variation and especially the genetic basis of human complex diseases. Non-synonymous coding SNPs (nsSNPs) comprise a group of SNPs that, together with SNPs in regulatory regions, are believed to have the highest impact on phenotype. Here we present a World Wide Web server to predict the effect of an nsSNP on protein structure and function. The prediction method enabled analysis of the publicly available SNP database HGVbase, which gave rise to a dataset of nsSNPs with predicted functionality. The dataset was further used to compare the effect of various structural and functional characteristics of amino acid substitutions responsible for phenotypic display of nsSNPs. We also studied the dependence of selective pressure on the structural and functional properties of proteins. We found that in our dataset the selection pressure against deleterious SNPs depends on the molecular function of the protein, although it is insensitive to several other protein features considered. The strongest selective pressure was detected for proteins involved in transcription regulation.


    INTRODUCTION
A considerable effort is underway to relate human phenotypes to variation at the DNA level. Most human genetic variation is represented by single nucleotide polymorphisms (SNPs) and many of them are believed to cause phenotypic differences between individuals. However, identifying SNPs responsible for specific phenotypes appears to be a problem that is very difficult to solve.

The concept of association studies has been proposed as an experimental technique to identify SNPs underlying complex phenotypes, mostly human multifactorial disorders (1). The question of study design is, however, disputable. Linkage disequilibrium-based whole genome scanning (2,3) has the advantage of being a completely hypothesis-free approach, though it may be too demanding because of the extraordinary number of markers to be screened. Candidate gene studies (2,4) try to reduce the number of SNPs to those from genes most likely to constitute the genetic basis of the disease. However, even in the latter case, especially if large sets of candidate genes are considered, multiple testing of hundreds or even thousands of SNPs makes detection of the association difficult.

A possible way to overcome the problem of testing overwhelming numbers of SNPs, especially in the case of candidate gene studies, would be to prioritise SNPs according to their functional significance (4,5). As a priori biological knowledge can be used to reduce the number of SNPs by focusing on specific genomic regions or gene sets, bioinformatics expertise may help to discriminate between neutral SNPs, which constitute the majority of genetic variation, and SNPs of likely functional importance. Below, we specifically focus on non-synonymous SNPs (nsSNPs), i.e. SNPs located in coding regions and resulting in amino acid variation in the protein products of genes. It has been shown in several recent studies (6–11) that the impact of amino acid allelic variants on protein structure and function can be predicted by analysis of multiple sequence alignments and protein 3D structures. As we demonstrated in an earlier work, these predictions correlate with the effect of natural selection seen as an excess of rare alleles (7,12). Therefore, predictions at the molecular level reveal SNPs affecting actual phenotypes.

Here we present: (i) a Web server for annotation of functional nsSNPs (www.bork.embl-heidelberg.de/PolyPhen); (ii) a dataset of nsSNPs extracted from a public SNP database, HGVbase (13) (www.bork.embl-heidelberg.de/PolyPhen/data); (iii) an analysis of these data with regard to predicted effect on protein structure and function.

Prioritisation of SNPs in the candidate gene approach is not the only suggested use of the PolyPhen (polymorphism phenotyping) server and the collection of nsSNPs. The server could also be useful to reveal the structural basis of disease mutations and explain the molecular cause of a disease. This might help in some cases to identify the causative allelic variant (14) after a disease has been linked to a particular locus.

On the other hand, since numerous disease associations published recently could not be confirmed by subsequent independent studies (2,4), the independent evidence of functionality of a nsSNP could be an additional argument to discriminate true associations from false positives.

Analysis of the database of nsSNPs enabled us to test whether certain characteristics of proteins are associated with accumulation of nsSNPs (especially slightly deleterious nsSNPs).


    MATERIALS AND METHODS
PolyPhen is a World Wide Web server devoted to automated functional annotation of coding nsSNPs. PolyPhen input is either the amino acid sequence of a protein or its SWALL database (15) ID or accession number, together with the sequence position and the two amino acid variants characterising the polymorphism. Given the input, PolyPhen starts a fully automated pipeline of several programs described step by step in this section. The pipeline is schematically presented in Figure 1. The server was used to annotate all SNPs deposited in the HGVbase database and the resulting dataset of annotated SNPs is available at http://www.bork.embl-heidelberg.de/PolyPhen/data.



Figure 1. PolyPhen query processing flowchart. PolyPhen combines information on sequence features, multiple alignment with homologous proteins and structural parameters and contacts to make a prediction of nsSNP effect on protein function. hs_swall is the abbreviation for the Homo sapiens subset of the SWALL database (also known as SPTR, i.e. SwissProt + TrEMBL). Var1,2, two amino acid variants; ACC/ID, SWALL accession number or ID.

 
Identifying nsSNPs in known genes
The necessary first step in the analysis of nsSNPs is to identify whether a given SNP is indeed non-synonymous. For this purpose we map SNPs onto known proteins on the basis of SNP DNA flanking sequences. Flanking genomic sequences of SNPs from HGVbase (13), 25 bp each, were translated in all six possible frames and searched for in the human subset of the SWALL database (15). Protein sequences and genomic fragments were pre-processed with the SEG (16), XNU (17), RepeatMasker (18) and DUST programs, which filter out areas of low compositional complexity, regions containing internal repeats of short periodicity and known human genomic repeat sequences. ALU subfamily proteins were also excluded from the set. We required that at least one translated flanking sequence should have an exact match with a database protein sequence. If this match was detected, we further required that the second flanking sequence either had an exact match with the protein sequence or matched the protein sequence in all positions up to the end of the protein or a conventional exon/intron border. The resulting mapping of a SNP onto a protein sequence is always unique.

The above procedure is available as a stand-alone World Wide Web-based program, snp2prot. A link to this program is provided on the main PolyPhen page. We also provide a link to the SNP annotation tool HNP (Y.Yuan, unpublished results).
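The core of the mapping step, six-frame translation of a flanking sequence followed by an exact substring search in a protein, can be illustrated with a minimal Python sketch. This is not the snp2prot code: the function names are hypothetical, the standard genetic code is assumed, and the low-complexity and repeat filtering as well as the exon/intron border handling described above are omitted.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unrecognised codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> list[str]:
    """Translate both strands at offsets 0, 1 and 2."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


def flank_matches(flank: str, protein: str) -> bool:
    """True if any translated frame occurs exactly within the protein."""
    return any(pep and pep in protein for pep in six_frame_translations(flank))
```

A frame that contains a stop codon (`*`) can never match a protein sequence, so such frames are discarded implicitly by the substring test.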

After processing HGVbase v.12 (983 589 SNP entries), we obtained a set of 20 462 coding SNPs. Of these, 11 152 were non-synonymous, whereas 9310 were synonymous SNPs and do not produce any change in the amino acid sequence. The nsSNPs formed our dataset, which can be downloaded as one text file or searched against with a straightforward World Wide Web-based engine. The search results contain links to the other databases that provide additional information, e.g. chromosomal location of a nsSNP.

PolyPhen analysis of nsSNPs
Sequence-based characterisation of the substitution site. The substitution may occur at a specific site, e.g. active or binding, or in a non-globular, e.g. transmembrane, region. A query identifies the protein by its SWALL accession number or ID or by the sequence itself. In the latter case, PolyPhen tries to find the given sequence in the human subset of the SWALL database and use the FT (feature table) section of the corresponding entry. If the sequence cannot be found in the human subset of SWALL, this step is skipped. PolyPhen checks if the amino acid replacement occurs at a site that is annotated in the SWALL database feature table as DISULFID, THIOLEST or THIOETH bond, BINDING, ACT_SITE, LIPID, METAL, SITE or MOD_RES site or as a site located in a TRANSMEM, SIGNAL or PROPEP region.
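The feature table check can be pictured as a lookup of the substitution position against annotated sites and regions. The sketch below is illustrative only: the `Feature` record and `features_at` function are hypothetical, and it assumes that bond features (DISULFID, THIOLEST, THIOETH) list the two bonded residues as their endpoints, whereas other features cover an inclusive range of positions.

```python
from dataclasses import dataclass

# Feature keys named in the text; the grouping into bonds and ranges is an assumption.
BOND_KEYS = {"DISULFID", "THIOLEST", "THIOETH"}
RANGE_KEYS = {
    "BINDING", "ACT_SITE", "LIPID", "METAL", "SITE", "MOD_RES",
    "TRANSMEM", "SIGNAL", "PROPEP",
}


@dataclass(frozen=True)
class Feature:
    key: str
    start: int  # 1-based position
    end: int    # 1-based position


def features_at(position: int, features: list[Feature]) -> list[str]:
    """Return the keys of all checked features that involve the given position."""
    hits = []
    for f in features:
        if f.key in BOND_KEYS:
            # A bond involves only the two residues it links.
            if position in (f.start, f.end):
                hits.append(f.key)
        elif f.key in RANGE_KEYS:
            if f.start <= position <= f.end:
                hits.append(f.key)
    return hits
```

An empty result corresponds to a substitution site with no relevant annotation in the feature table.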

PolyPhen also uses the TMHMM (19) algorithm to predict transmembrane regions, the Coils2 (20) program to predict coiled coil regions and the SignalP (21) program to predict signal peptide regions of the protein sequences.

For a substitution in a transmembrane region, PolyPhen uses the PHAT (22) transmembrane-specific matrix score to evaluate the possible functional effect of the nsSNP.

At this step PolyPhen records all positions that are annotated in the query protein as BINDING, ACT_SITE, LIPID or METAL. At a later stage, if the search for a homologous protein with known 3D structure is successful, it checks whether the substitution site is in spatial contact with these critical residues.

Profile analysis of homologous sequences. The amino acid replacement may be incompatible with the spectrum of substitutions observed at that position in a family of homologous proteins. PolyPhen identifies homologues of the input sequences via a BLAST (23) search of the NRDB database. The set of aligned sequences with sequence identity to the input sequence in the range 30–94% (inclusive) is used by the new version of the PSIC (position-specific independent counts) software (24) to calculate the so-called profile matrix (http://strand.imb.ac.ru/PSIC/). Elements of the matrix (profile scores) are logarithmic ratios of the likelihood of a given amino acid occurring at a particular site to the likelihood of this amino acid occurring at any site (background frequency). PolyPhen computes the absolute value of the difference between profile scores of both allelic variants in the polymorphic position. PolyPhen also shows the number of aligned sequences at the query position; this may be used to assess the reliability of profile score calculations.
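The quantity compared here can be illustrated with a simplified Python sketch. It is not the PSIC algorithm: PSIC derives position-specific independent counts with its own sequence weighting, whereas this sketch uses raw column frequencies with a pseudocount and a uniform background, both of which are assumptions made only for illustration. The function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Uniform background frequencies, an assumption for this sketch.
UNIFORM_BACKGROUND = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}


def aligned_count(column: str) -> int:
    """Number of aligned sequences with a residue (not a gap) at this position."""
    return sum(1 for c in column if c in AMINO_ACIDS)


def profile_score(column: str, residue: str,
                  background: dict[str, float] = UNIFORM_BACKGROUND,
                  pseudocount: float = 1.0) -> float:
    """Log ratio of the site-specific frequency to the background frequency."""
    counts = Counter(c for c in column if c in AMINO_ACIDS)
    total = sum(counts.values())
    freq = (counts[residue] + pseudocount * background[residue]) / (total + pseudocount)
    return math.log(freq / background[residue])


def score_difference(column: str, var1: str, var2: str) -> float:
    """Absolute difference between the profile scores of the two variants."""
    return abs(profile_score(column, var1) - profile_score(column, var2))
```

For an alignment column such as `"LLLLLLLIVL"`, replacing leucine by proline gives a large difference, while a column with no aligned residues gives a score of zero for every amino acid, which is why the number of aligned sequences matters for reliability.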

Mapping of the substitution site to known protein 3-dimensional structures. Mapping of an amino acid replacement to a known 3D structure reveals whether the replacement is likely to destroy the hydrophobic core of a protein, electrostatic interactions, interactions with ligands or other important features of a protein. If the spatial structure of a query protein is unknown, one can use a homologous protein of known structure.

PolyPhen carries out a BLAST query of a sequence against a protein structure database [PDB (25) or PQS (26), see below] and retains all hits that meet the given criteria. For instance, the default sequence identity threshold is set to 50%, since this value guarantees the conservation of basic structural characteristics. Minimal hit length and maximal length of gaps are by default set to 100 and 20, respectively. The position of the substitution is then mapped onto the corresponding positions in all retained hits. By default, a hit with 3D structure is rejected if its amino acid at the position under study differs from the amino acid in the input sequence. Hits are sorted according to the sequence identity or E-value of the sequence alignment with the input protein.
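The default selection criteria can be summarised in a short Python sketch. The `StructureHit` record and `select_hits` function are hypothetical names, and treating the thresholds as inclusive is an assumption; only the default values (50% identity, minimal length 100, maximal gap length 20) come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StructureHit:
    structure_id: str
    identity: float        # percent sequence identity to the query
    length: int            # length of the aligned region
    max_gap: int           # longest gap in the alignment
    e_value: float
    residue_at_site: str   # amino acid in the hit at the mapped position


def select_hits(hits, query_residue, min_identity=50.0, min_length=100,
                max_gap=20, sort_by="identity"):
    """Keep hits that meet the default criteria and sort them."""
    kept = [
        h for h in hits
        if h.identity >= min_identity
        and h.length >= min_length
        and h.max_gap <= max_gap
        and h.residue_at_site == query_residue  # reject hits whose residue differs
    ]
    if sort_by == "identity":
        return sorted(kept, key=lambda h: h.identity, reverse=True)
    return sorted(kept, key=lambda h: h.e_value)
```

Sorting matters because, by default, only the first hit is used for the structural parameters, while contacts are evaluated for all retained hits.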

Structural parameters used to evaluate the effect of amino acid substitution. Structural analysis performed by PolyPhen is based on the use of several structural parameters, as suggested previously (7–9). Importantly, although all parameters are reported in the output, only some of them are used in the final decision rules.

PolyPhen uses the DSSP (27) database to obtain the following structural parameters for the mapped amino acid residues: secondary structure (according to the DSSP nomenclature); solvent accessible surface area (absolute value in Å²); φ and ψ dihedral angles.

The following values are also calculated by PolyPhen: normalised accessible surface area [the absolute value divided by the maximal area defined as the 99% quantile of surface area distribution for this particular amino acid type in PDB (25)]; change in accessible surface propensity (knowledge-based hydrophobic ‘potentials’) resulting from the substitution; change in residue side chain volume (in Å³); region of the φ/ψ map (Ramachandran map) derived from the dihedral angles (9); normalised B factor (temperature factor) for the residue [following Chasman and Adams (9)]; loss of a hydrogen bond [following Wang and Moult (8)] according to the HBplus program (28).

By default, the parameters above are calculated for the first hit only.
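As an illustration of one of these derived values, the normalised accessible surface area divides the absolute area of a residue by the 99% quantile of the areas observed for that amino acid type. The sketch below assumes the per-type distribution is available as a plain list of areas in Å²; the function names are hypothetical, and the interpolation rule used for the quantile (the default of Python's `statistics.quantiles`) is an assumption.

```python
from statistics import quantiles


def max_area(observed_areas: list[float]) -> float:
    """99% quantile of the accessible surface areas seen for one residue type."""
    # quantiles(..., n=100) returns 99 cut points; the last is the 99% quantile.
    return quantiles(observed_areas, n=100)[-1]


def normalised_area(area: float, observed_areas: list[float]) -> float:
    """Absolute accessible area divided by the maximal area for the residue type.

    Values slightly above 1.0 are possible for residues that are more exposed
    than the 99% quantile; no capping is applied in this sketch.
    """
    return area / max_area(observed_areas)
```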

Contacts with ‘critical sites’, ligands and other polypeptide chains. The presence of specific spatial contacts of a residue may reveal its role in protein function. PolyPhen checks three types of contacts for a variable amino acid residue. First, contacts with ligands (defined as all heteroatoms excluding water and ‘non-biological’ crystallographic ligands). Second, interactions between subunits of the protein molecule. Technically these are defined as contacts of a polymorphic residue with residues from other polypeptide chains present in the PDB (PQS) file. For this particular type of interaction, it is more advantageous to use the PQS (Protein Quaternary Structure) database (26) rather than PDB, since PQS entries are supposed to provide a more adequate picture of protein quaternary structure architecture.

The third type of contact analysed by PolyPhen is represented by contacts with ‘critical’ residues, where the latter are derived from the sequence annotation. The suggested default threshold for all contacts to be displayed in the output is 6 Å. However, a value of 3 Å is used in the decision rule. To evaluate a contact between two residues, or between a residue and a ligand molecule, PolyPhen takes the minimal distance over all pairs of atoms from the two groups. By default, contacts are calculated for all hits with structure. This is essential for cases where several structures correspond to one protein but carry different information about complexes with other macromolecules and ligands (see for example figure 2 in ref. 7).
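The contact evaluation can be sketched in a few lines of Python. Atoms are represented here as plain (x, y, z) tuples in Å, the function names are hypothetical, and treating both cut-offs as inclusive is an assumption; only the 6 Å display threshold and the 3 Å decision threshold come from the text.

```python
import math
from itertools import product

DISPLAY_CUTOFF = 6.0   # Å, default threshold for listing a contact in the output
DECISION_CUTOFF = 3.0  # Å, threshold used in the decision rule


def min_distance(atoms_a, atoms_b) -> float:
    """Smallest distance over all atom pairs drawn from the two groups."""
    return min(math.dist(a, b) for a, b in product(atoms_a, atoms_b))


def classify_contact(atoms_a, atoms_b) -> dict:
    """Report the minimal distance and which thresholds it satisfies."""
    d = min_distance(atoms_a, atoms_b)
    return {
        "distance": d,
        "displayed": d <= DISPLAY_CUTOFF,
        "used_in_rule": d <= DECISION_CUTOFF,
    }
```

Running the same function over every retained structure, rather than the first hit only, reflects the default of calculating contacts for all hits.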

Prediction rules. PolyPhen uses empirically derived rules (Table 1) to predict that an nsSNP is damaging, i.e. is supposed to affect protein function, or benign, i.e. most likely lacking any phenotypic effect. The rules are based on an analysis of the ability of various structural parameters and profile scores to discriminate between disease mutations and substitutions between human proteins and closely related mammalian orthologues (7). We introduced two categories of prediction: nsSNPs possibly damaging protein function/structure and nsSNPs probably damaging protein function/structure. The scheme presented in Table 1 successfully predicts ~82% (~57% for the more stringent set of rules) of disease-causing mutations annotated in the SwissProt database (14) and produces ~8% (~3% for the more stringent set of rules) false positives given the control set of between-species substitutions. We note that many parameters, though computed by the server, were excluded from the decision rule. Because they correlate with other parameters, they did not help to increase the sensitivity of predictions without a significant loss of specificity. Multiple alignment-based profile scores provided the major contribution to the prediction. Therefore, even in the case of proteins with no homologue with known 3D structure, predictions remain reasonably reliable.


Table 1. Rules used by PolyPhen to predict effect of nsSNPs on protein function and structure
 

    RESULTS
Retrieval of nsSNPs
HGVbase v.12 (13), a comprehensive public database with extensive curation, was chosen as a source of SNP data. Version 12 of the database contained 984 093 entries from various sources, 983 589 of these being SNPs, while the rest represent other types of genetic variants. Importantly, SNPs in the database are classified according to reliability: SNPs confirmed by independent and solid experimental verification are marked as ‘Proven’, whereas other SNP candidates are marked as ‘Suspected’. Only 14 986 SNPs, however, appeared in the ‘Proven’ category. We mapped all available SNPs onto known proteins and found 9310 of them to be synonymous and 11 152 non-synonymous, causing amino acid changes in protein sequences. Of the identified nsSNPs, 1276 were ‘Proven’.

Only 1026 nsSNPs were mapped to proteins with at least 50% sequence identity to a protein with known 3D structure. The analysis for the rest of the nsSNPs was performed on the basis of multiple alignment information only.

The database of these nsSNPs and their analysis using PolyPhen is available at http://www.bork.embl-heidelberg.de/PolyPhen/data. PolyPhen analysis was only possible for 9165 (82%) of these nsSNPs, as the remainder have been mapped to proteins with no applicable site annotation and no reasonably close homologous sequences available in the SWALL database for multiple alignment or structural analysis.

The results of the PolyPhen analysis are presented in Figure 2.



Figure 2. Results of the PolyPhen analysis of the HGVbase database v.12. hs_swall denotes the Homo sapiens subset of the SWALL database. snp2prot is an in-house command line tool to map HGVbase SNPs onto sequences of known human proteins. 11 152 nsSNPs were identified. 1591 of them have been predicted as possibly damaging for protein structure and function and an additional 1257 as probably damaging. The number of structure-based predictions is much lower compared with the number of sequence-based predictions because structural information was available in only 1026 cases.

 
Structural characterisation of nsSNPs
As has been noted by Wang and Moult (8), most disease mutations and supposedly deleterious nsSNPs affect protein stability rather than functionality. Various structural parameters have been proposed (6–9,11) to detect the effects of amino acid substitutions. We selected a group of structural parameters and evaluated their impact through a comparison of disease mutations, nsSNPs and substitutions between human proteins and closely related mammalian orthologues [datasets from Sunyaev et al. (7)]. We also selected three characteristics responsible for functionality: annotation of the site as BINDING, ACT_SITE, LIPID or METAL (SwissProt feature table terms); proximity to an annotated site; proximity to a co-crystallised ligand. The data presented in Table 2 confirm that functionality parameters have a smaller impact on the molecular origin of disease mutations and deleterious nsSNPs than protein stability characteristics. Among the structural characteristics presented in Table 2, hydrophobic core stability parameters are the best predictors.


Table 2. Structural characteristics of disease mutations, nsSNPs and amino acid substitutions between species
 
Interestingly, for all parameters analysed we observed the same pattern in Table 2. The fraction of SNPs that affect a structural parameter is always much lower than that of disease-causing mutations. At the same time, it is always higher than the corresponding number of substitutions between species. This observation suggests that all effects associated with these structural parameters are responsible for the accumulation of deleterious alleles in the human genome. Disease-causing mutations are subject to very strong selective pressure and are eliminated from the population very quickly. In contrast, slightly deleterious SNPs detected in panels of healthy individuals are supposedly under lower selective pressure and therefore have a much longer persistence time in the population. As suggested by Table 2, we did not observe any structural feature responsible solely for strong or solely for weak selection, as all parameters display the same pattern.

Although many structural parameters can serve as reasonably reliable predictors of the effect of a substitution, a strong correlation within structural parameters and especially between structural parameters and long-term selective pressure signals seen from multiple sequence alignment made exclusion of many parameters from the combined prediction rule necessary. For the set of nsSNPs predicted to be damaging, based on the combined set of rules that incorporate both multiple alignment and structural information (available for these cases), structural parameters worked as predictors in 40% of cases. However, the prediction cannot be made solely at the sequence level in 22% of cases (28% if the ‘probably damaging’ category only is considered).

Protein structural and functional characteristics and selective constraints
As has been shown by systematic studies on cSNP (coding SNPs) discovery (29–31), the distribution of nsSNP density over human genes is highly non-uniform. Apart from differences in the coalescent history of loci, this notable difference in the rate of nsSNPs is likely to be caused by variations in selective pressure against deleterious variants. We expected that the difference in selective pressure might be caused by structural properties because the number of sites important for stability or functionality might depend on the protein structure type. Also, extracellular proteins can be expected to have higher stability compared with intracellular ones and this may affect selective constraints. On the other hand, selective pressure may depend on the impact of the gene on the overall fitness of the organism (32). In order to test whether the above properties of proteins have an effect on the density of nsSNPs (considered for genes with the same number of synonymous SNPs to correct for various sources of bias), we subdivided genes from our database into groups according to the SCOP (33) and GO (34) classifications. Contrary to our expectations, we did not detect a significant correlation of selective pressure against deleterious nsSNPs with secondary structure class, localisation or biological process. This might be because we grouped genes into very large classes and the effect might be detected if a finer classification were considered. Alternatively, we have to conclude that there is no strong impact of these characteristics on the selective constraints.

In contrast, molecular function of the protein showed a statistically significant association with the strength of selective pressure (the P value of the χ² test was 0.009). The functional class showing the highest selective pressure against deleterious nsSNPs is the class of transcription factors. This class displays the greatest departure from the average level of selective constraints. Enzymes are the class of proteins with the lowest selective pressure. The fraction of nsSNPs predicted as damaging by PolyPhen is also highest for enzymes and lowest for transcription factors. This is expected and shows that low selective constraints allow for accumulation of slightly deleterious SNPs. We hypothesise that this observation can be explained in terms of the molecular basis of dominance (35). Mutations in enzymes are likely to be recessive because the flux in a metabolic pathway undergoes very minor change in response to a decrease in enzyme activity (35). In contrast, changes in the activity of transcription factors can have a high impact on the transcription level of the regulated genes. Transcription factors listed in the OMIM database (http://www.ncbi.nlm.nih.gov/Omim/) are reported to be dominant genes much more frequently than enzymes.

However, we should note that the current SNP databases are probably biased towards ‘popular’ genes, which could have affected our results. More accurate selective pressure studies will be possible in the future with larger datasets arising from large-scale systematic studies.


    DISCUSSION
Server
Ideally, the end point of disease gene identification should be functional analysis of the disease-associated allele and an understanding of the molecular mechanism of causation of the disease phenotype. This functional characterisation can be facilitated by the computational analysis provided by our tool.

Unlike fully penetrant mutations causing Mendelian diseases, SNPs involved in complex human phenotypes are not a necessary and sufficient condition defining the phenotype but their effect depends on many other genetic and environmental components. In other words, SNPs may comprise risk factors of having a specific phenotype in the statistical sense. Therefore, the effect of a particular SNP on phenotype might be seen only as a frequency difference between individuals that display the phenotype and unaffected controls.

Given the very high rate of false associations recently reported, any independent evidence of the impact of the suspected allelic variant should be valued. Sequence and structure analysis of the suspected amino acid variant can increase the confidence of the finding by revealing the structural background of the disease. The PolyPhen server can be used to evaluate whether the reported/identified association can indeed have a functional meaning and therefore is less likely to represent a false positive due to statistical reasons or reasons of inappropriate study design and population choice.

Moreover, even if an association of a genomic locus with a particular phenotype is unambiguously demonstrated, it is not always clear that the identified DNA variant has a causative relationship with the disease and that the statistical association is not a result of linkage disequilibrium with the true functional variant (14). In this case the PolyPhen server can be used to distinguish causal from non-causal relationships between an nsSNP and the phenotype of interest.

The database of nsSNPs annotated by PolyPhen provides a source of functionally annotated nsSNPs. The collection might be a useful resource for selection of nsSNPs for candidate gene-based association studies. The question of how to choose the set of SNPs to be screened is critical to the success of a study. The major hurdle in any model of association studies is posed by the large number of these SNPs (4,36). One side of the problem is the limitations of currently available genotyping technologies, which make studies on large SNP sets in large panels of individuals impractical. The other side, however, is of a purely statistical nature and is therefore independent of the technological progress. Multiple test correction in the case of many thousands of SNPs to be analysed makes the detection of otherwise significant allele frequency differences problematical. Possible allelic and non-allelic heterogeneity, epistatic interactions between alleles, low penetrance of the phenotype and complexity of environmental factors involved make the SNP-based detection of disease genes even more difficult (2). Without any careful pre-selection of SNPs to be screened, unrealistically large panels of individuals might be required to detect association at a reasonable level of statistical significance. Therefore, computational prediction of functional importance can be considered as one of the reasons to prioritise SNPs while looking for an association.

Survey
PolyPhen analysis of the nsSNP database confirmed earlier observations (6–9,12,37) that a significant number of human nsSNPs is represented by slightly deleterious alleles. The fraction of nsSNPs predicted to be damaging in the much larger dataset of 9165 nsSNPs is similar to the earlier result. Most predictions were computed based solely on the multiple alignment information, since structural data are available for only a very small fraction of cases.

It is important to note that the number of functional nsSNPs predicted for the whole database is likely to be an overestimate due to pollution of the database by erroneous SNP reports, on the one hand, and possible bias of the database towards disease-related allelic variants on the other. To test the impact of these biases on the overall conclusion of the presence of multiple slightly deleterious SNPs in individual human genomes, we compared fractions of nsSNPs predicted to be damaging (both possibly and probably) for HGVbase entries annotated as ‘Proven’ and ‘Suspected’. Additionally, we compared the prediction rate for ‘Proven’ nsSNPs originating from systematic studies (29–31) with the overall prediction rate. The overall prediction rate for the category ‘Suspected’ nsSNPs was 31.4%, for the category ‘Proven’ nsSNPs it was 28.9% and for ‘Proven’ nsSNPs from systematic studies on healthy individuals (29–31) it was 27.6%. This shows that inaccuracy and bias of the database data lead to overprediction of the fraction of deleterious nsSNPs. However, even in the cleanest possible dataset the fraction of nsSNPs predicted to be damaging remains much higher than that observed for the species divergence data. Similarly, trends observed in Table 2 are the same for any subset of nsSNP data.

Our analysis showed that various effects on protein stability are responsible for accumulation of slightly deleterious nsSNPs in human genes. The selection against these variants is likely to depend on the molecular function of proteins rather than on the type of structure or cellular localisation. This can possibly be explained by the relationship between molecular function and mutation dominance. Transcription factors appear to be the group with the highest selective constraints.

With the growth of public SNP data and the improvement in the quality of SNP databases, functional analysis of SNPs can possibly play a role in our understanding of the inheritance of complex human phenotypes.


    ACKNOWLEDGEMENTS
 
The authors are thankful to Evgenia Kriventseva for her help in the work with the GO database. S.S. acknowledges Alexey Kondrashov for useful discussions.


    REFERENCES

  1. Risch,N. and Merikangas,K. (1996) The future of genetic studies of complex human diseases. Science, 273, 1516–1517.

  2. Risch,N.J. (2000) Searching for genetic determinants in the new millennium. Nature, 405, 847–856.

  3. Lai,E., Riley,J., Purvis,I. and Roses,A. (1998) A 4-Mb high-density single nucleotide polymorphism-based map around human APOE. Genomics, 54, 31–38.

  4. Emahazion,T., Feuk,L., Jobs,M., Sawyer,S.L., Fredman,D., St Clair,D., Prince,J.A. and Brookes,A.J. (2001) SNP association studies in Alzheimer’s disease highlight problem for complex disease analysis. Trends Genet., 17, 407–413.

  5. Schork,N.J., Fallin,D. and Lanchbury,J.S. (2000) Single nucleotide polymorphisms and the future of genetic epidemiology. Clin. Genet., 58, 250–264.

  6. Sunyaev,S., Ramensky,V. and Bork,P. (2000) Towards a structural basis of human non-synonymous single nucleotide polymorphisms. Trends Genet., 16, 198–200.

  7. Sunyaev,S., Ramensky,V., Koch,I., Lathe,W.,III, Kondrashov,A.S. and Bork,P. (2001) Prediction of deleterious human alleles. Hum. Mol. Genet., 10, 591–597.

  8. Wang,Z. and Moult,J. (2001) SNPs, protein structure and disease. Hum. Mutat., 17, 263–270.

  9. Chasman,D. and Adams,R.M. (2001) Predicting the functional consequences of non-synonymous single nucleotide polymorphisms: structure-based assessment of amino acid variation. J. Mol. Biol., 307, 683–706.

  10. Ng,P.C. and Henikoff,S. (2001) Predicting deleterious amino acid substitutions. Genome Res., 11, 863–874.

  11. Ferrer-Costa,C., Orozco,M. and de la Cruz,X. (2002) Characterization of disease-associated single amino acid polymorphisms in terms of sequence and structure properties. J. Mol. Biol., 315, 771–786.

  12. Sunyaev,S.R., Lathe,W.C.,III, Ramensky,V.E. and Bork,P. (2000) SNP frequencies in human genes: an excess of rare alleles and differing modes of selection. Trends Genet., 16, 335–337.

  13. Fredman,D., Siegfried,M., Yuan,Y.P., Bork,P., Lehvaslaiho,H. and Brookes,A.J. (2002) HGVbase: a human sequence variation database emphasizing data quality and a broad spectrum of data sources. Nucleic Acids Res., 30, 387–391.

  14. Johnson,G.C. and Todd,J.A. (2000) Strategies in complex disease mapping. Curr. Opin. Genet. Dev., 10, 330–334.

  15. Apweiler,R. (2000) Protein sequence databases. Adv. Protein Chem., 54, 31–71.

  16. Wootton,J.C. and Federhen,S. (1993) Statistics of local complexity in amino-acid-sequences and sequence databases. Comput. Chem., 17, 149–163.

  17. Claverie,J.M. and States,D.J. (1993) Information enhancement methods for large-scale sequence analysis. Comput. Chem., 17, 191–201.

  18. Jurka,J. (2000) Repbase update: a database and an electronic journal of repetitive elements. Trends Genet., 16, 418–420.

  19. Krogh,A., Larsson,B., von Heijne,G. and Sonnhammer,E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol., 305, 567–580.

  20. Lupas,A., Van Dyke,M. and Stock,J. (1991) Predicting coiled coils from protein sequences. Science, 252, 1162–1164.

  21. Nielsen,H., Engelbrecht,J., Brunak,S. and von Heijne,G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng., 10, 1–6.

  22. Ng,P.C., Henikoff,J.G. and Henikoff,S. (2000) PHAT: a transmembrane-specific substitution matrix. Predicted hydrophobic and transmembrane. Bioinformatics, 16, 760–766.

  23. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403–410.

  24. Sunyaev,S.R., Eisenhaber,F., Rodchenkov,I.V., Eisenhaber,B., Tumanyan,V.G. and Kuznetsov,E.N. (1999) PSIC: profile extraction from sequence alignments with position-specific counts of independent observations. Protein Eng., 12, 387–394.

  25. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

  26. Henrick,K. and Thornton,J.M. (1998) PQS: a protein quaternary structure file server. Trends Biochem. Sci., 23, 358–361.

  27. Kabsch,W. and Sander,C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers, 22, 2577–2637.

  28. McDonald,I.K. and Thornton,J.M. (1994) Satisfying hydrogen bonding potential in proteins. J. Mol. Biol., 238, 777–793.

  29. Cargill,M., Altshuler,D., Ireland,J., Sklar,P., Ardlie,K., Patil,N., Shaw,N., Lane,C.R., Lim,E.P., Kalyanaraman,N. et al. (1999) Characterization of single-nucleotide polymorphisms in coding regions of human genes. Nature Genet., 22, 231–238.

  30. Halushka,M.K., Fan,J.B., Bentley,K., Hsie,L., Shen,N., Weder,A., Cooper,R., Lipshutz,R. and Chakravarti,A. (1999) Patterns of single-nucleotide polymorphisms in candidate genes for blood-pressure homeostasis. Nature Genet., 22, 239–247.

  31. Cambien,F., Poirier,O., Nicaud,V., Hermann,S.M., Mallet,C., Ricard,S., Beague,I., Hallet,V., Blanc,H., Loucaci,V. et al. (1999) Sequence diversity in 36 candidate genes for cardiovascular disorders. Am. J. Hum. Genet., 65, 183–191.

  32. Hirsh,A.E. and Fraser,H.B. (2001) Protein dispensability and rate of evolution. Nature, 411, 1046–1049.

  33. Lo Conte,L., Brenner,S.E., Hubbard,T.J., Chothia,C. and Murzin,A.G. (2002) SCOP database in 2002: refinements accommodate structural genomics. Nucleic Acids Res., 30, 264–267.

  34. The Gene Ontology Consortium (2001) Creating the gene ontology resource: design and implementation. Genome Res., 11, 1425–1433.

  35. Kacser,H. and Burns,J.A. (1981) The molecular basis of dominance. Genetics, 97, 639–666.

  36. Weiss,K.M. and Terwilliger,J.D. (2000) How many diseases does it take to map a gene with SNPs? Nature Genet., 26, 151–157.

  37. Fay,J.C., Wyckoff,G.J. and Wu,C.I. (2001) Positive and negative selection on the human genome. Genetics, 158, 1227–1234.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
J. Mol. Diagn.Home page
J. McDonald, F. Gedge, A. Burdette, J. Carlisle, C. J. Bukjiok, M. Fox, and P. Bayrak-Toydemir
Multiple Sequence Variants in Hereditary Hemorrhagic Telangiectasia Cases: Illustration of Complexity in Molecular Diagnostic Interpretation
J. Mol. Diagn., November 1, 2009; 11(6): 569 - 575.
[Abstract] [Full Text] [PDF]


Home page
CarcinogenesisHome page
M. J. Garcia, V. Fernandez, A. Osorio, A. Barroso, F. Fernandez, M. Urioste, and J. Benitez
Mutational analysis of FANCL, FANCM and the recently identified FANCI suggests that among the 13 known Fanconi Anemia genes, only FANCD1/BRCA2 plays a major role in high-risk breast cancer predisposition
Carcinogenesis, November 1, 2009; 30(11): 1898 - 1902.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
B. Li, V. G. Krishnan, M. E. Mort, F. Xin, K. K. Kamati, D. N. Cooper, S. D. Mooney, and P. Radivojac
Automated inference of molecular mechanisms of disease from amino acid substitutions
Bioinformatics, November 1, 2009; 25(21): 2744 - 2750.
[Abstract] [Full Text] [PDF]


Home page
J. Med. Genet.Home page
E A Otto, K Tory, M Attanasio, W Zhou, M Chaki, Y Paruchuri, E L Wise, M T F Wolf, B Utsch, C Becker, et al.
Hypomorphic mutations in meckelin (MKS3/TMEM67) cause nephronophthisis with liver fibrosis (NPHP11)
J. Med. Genet., October 1, 2009; 46(10): 663 - 670.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
A. Moreno-Estrada, K. Tang, M. Sikora, T. Marques-Bonet, F. Casals, A. Navarro, F. Calafell, J. Bertranpetit, M. Stoneking, and E. Bosch
Interrogating 11 Fast-Evolving Genes for Signatures of Recent Positive Selection in Worldwide Human Populations
Mol. Biol. Evol., October 1, 2009; 26(10): 2285 - 2297.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
K. Nakata, B. K. Lipska, T. M. Hyde, T. Ye, E. N. Newburn, Y. Morita, R. Vakkalanka, M. Barenboim, Y. Sei, D. R. Weinberger, et al.
DISC1 splice variants are upregulated in schizophrenia and associated with risk polymorphisms
PNAS, September 15, 2009; 106(37): 15873 - 15878.
[Abstract] [Full Text] [PDF]


Home page
Am. J. Respir. Cell Mol. Bio.Home page
C. P. Hersh, N. N. Hansel, K. C. Barnes, D. A. Lomas, S. G. Pillai, H. O. Coxson, R. A. Mathias, N. M. Rafaels, R. A. Wise, J. E. Connett, et al.
Transforming Growth Factor-{beta} Receptor-3 Is Associated with Pulmonary Emphysema
Am. J. Respir. Cell Mol. Biol., September 1, 2009; 41(3): 324 - 331.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
S. Kumar, M. P. Suleski, G. J. Markov, S. Lawrence, A. Marco, and A. J. Filipski
Positional conservation and amino acids shape the correct diagnosis and population frequencies of benign and damaging personal amino acid mutations
Genome Res., September 1, 2009; 19(9): 1562 - 1569.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
K. J. McKernan, H. E. Peckham, G. L. Costa, S. F. McLaughlin, Y. Fu, E. F. Tsung, C. R. Clouser, C. Duncan, J. K. Ichikawa, C. C. Lee, et al.
Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding
Genome Res., September 1, 2009; 19(9): 1527 - 1541.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
L. G. Biesecker, J. C. Mullikin, F. M. Facio, C. Turner, P. F. Cherukuri, R. W. Blakesley, G. G. Bouffard, P. S. Chines, P. Cruz, N. F. Hansen, et al.
The ClinSeq Project: Piloting large-scale genome sequencing for research in genomic medicine
Genome Res., September 1, 2009; 19(9): 1665 - 1674.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
S. Chun and J. C. Fay
Identification of deleterious mutations within three human genomes
Genome Res., September 1, 2009; 19(9): 1553 - 1561.
[Abstract] [Full Text] [PDF]


Home page
Arterioscler. Thromb. Vasc. Bio.Home page
S. J.R. Meex, D. Weissglas-Volkov, C. J.H. van der Kallen, D. J. Thuerauf, M. M.J. van Greevenbroek, C. G. Schalkwijk, C. D.A. Stehouwer, E. J.M. Feskens, L. Heldens, T. A. Ayoubi, et al.
The ATF6-Met[67]Val Substitution Is Associated With Increased Plasma Cholesterol Levels
Arterioscler Thromb Vasc Biol, September 1, 2009; 29(9): 1322 - 1327.
[Abstract] [Full Text] [PDF]


Home page
J. Med. Genet.Home page
V Tran-Fadulu, H Pannu, D H Kim, G W Vick III, C M Lonsford, A L Lafont, C Boccalandro, S Smart, K L Peterson, J Z. Hain, et al.
Analysis of multigenerational families with thoracic aortic aneurysms and dissections due to TGFBR1 or TGFBR2 mutations
J. Med. Genet., September 1, 2009; 46(9): 607 - 613.
[Abstract] [Full Text] [PDF]


Home page
haematolHome page
K. M.K. de Vooght, W. W. van Solinge, A. C. van Wesel, S. Kersting, and R. van Wijk
First mutation in the red blood cell-specific promoter of hexokinase combined with a novel missense mutation causes hexokinase deficiency and mild chronic hemolysis
Haematologica, September 1, 2009; 94(9): 1203 - 1210.
[Abstract] [Full Text] [PDF]


Home page
Cancer Res.Home page
H. Carter, S. Chen, L. Isik, S. Tyekucheva, V. E. Velculescu, K. W. Kinzler, B. Vogelstein, and R. Karchin
Cancer-Specific High-Throughput Annotation of Somatic Mutations: Computational Prediction of Driver Missense Mutations
Cancer Res., August 15, 2009; 69(16): 6660 - 6667.
[Abstract] [Full Text] [PDF]


Home page
Mol Hum ReprodHome page
A. Khattri, R.K. Pandey, N.J. Gupta, B. Chakravarty, M. Deenadayal, L. Singh, and K. Thangaraj
Estrogen receptor {beta} gene mutations in Indian infertile men
Mol. Hum. Reprod., August 1, 2009; 15(8): 513 - 520.
[Abstract] [Full Text] [PDF]


Home page
BMJ Case ReportsHome page
E J Marco, F E Abidi, J Bristow, W B Dean, P Cotter, R J Jeremy, C E Schwartz, and E H Sherr
ARHGEF9 disruption in a female patient is associated with X linked mental retardation and sensory hyperarousal
BMJ Case Reports, July 2, 2009; 2009(jul02_1): bcr0620091999 - bcr0620091999.
[Abstract] [Full Text]


Home page
Proc. Natl. Acad. Sci. USAHome page
L. A. Hindorff, P. Sethupathy, H. A. Junkins, E. M. Ramos, J. P. Mehta, F. S. Collins, and T. A. Manolio
Potential etiologic and functional implications of genome-wide association loci for human diseases and traits
PNAS, June 9, 2009; 106(23): 9362 - 9367.
[Abstract] [Full Text] [PDF]


Home page
Cancer Epidemiol. Biomarkers Prev.Home page
I. Cheng, D. O. Stram, N. P. Burtt, L. Gianniny, R. R. Garcia, L. Pooler, B. E. Henderson, L. Le Marchand, and C. A. Haiman
IGF2R Missense Single-Nucleotide Polymorphisms and Breast Cancer Risk: The Multiethnic Cohort Study
Cancer Epidemiol. Biomarkers Prev., June 1, 2009; 18(6): 1922 - 1924.
[Abstract] [Full Text] [PDF]


Home page
J DAIRY SCIHome page
F. Caravaca, J. Carrizosa, B. Urrutia, F. Baena, J. Jordana, M. Amills, B. Badaoui, A. Sanchez, A. Angiolillo, and J. M. Serradilla
Short communication: Effect of {alpha}S1-casein (CSN1S1) and {kappa}-casein (CSN3) genotypes on milk composition in Murciano-Granadina goats
J Dairy Sci, June 1, 2009; 92(6): 2960 - 2964.
[Abstract] [Full Text] [PDF]


Home page
J. Med. Genet.Home page
A Ghalamkarpour, W Holnthoner, P Saharinen, L M Boon, J B Mulliken, K Alitalo, and M Vikkula
Recessive primary congenital lymphoedema caused by a VEGFR3 mutation
J. Med. Genet., June 1, 2009; 46(6): 399 - 404.
[Abstract] [Full Text] [PDF]


Home page
J. Med. Genet.Home page
C Espinos, M Pineda, D Martinez-Rubio, V Lupo, A Ormazabal, M A Vilaseca, L J M Spaapen, F Palau, and R Artuch
Mutations in the urocanase gene UROC1 are associated with urocanic aciduria
J. Med. Genet., June 1, 2009; 46(6): 407 - 411.
[Abstract] [Full Text] [PDF]


Home page
Drug Metab. Dispos.Home page
L.-L. Wang, Y. Li, and S.-F. Zhou
A Bioinformatics Approach for the Phenotype Prediction of Nonsynonymous Single Nucleotide Polymorphisms in Human Cytochromes P450
Drug Metab. Dispos., May 1, 2009; 37(5): 977 - 991.
[Abstract] [Full Text] [PDF]


Home page
J Child NeurolHome page
M. Mangelsdorf, E. Chevrier, A. Mustonen, and D. J. Picketts
Borjeson-Forssman-Lehmann Syndrome Due to a Novel Plant Homeodomain Zinc Finger Mutation in the PHF6 Gene
J Child Neurol, May 1, 2009; 24(5): 610 - 614.
[Abstract] [PDF]


Home page
GutHome page
P T Campbell, K Curtin, C M Ulrich, W S Samowitz, J Bigler, C M Velicer, B Caan, J D Potter, and M L Slattery
Mismatch repair polymorphisms and risk of colon cancer, tumour microsatellite instability and interactions with lifestyle factors
Gut, May 1, 2009; 58(5): 661 - 667.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
P. H. Lee and H. Shatkay
An integrative scoring system for ranking SNPs by their potential deleterious effects
Bioinformatics, April 15, 2009; 25(8): 1048 - 1055.
[Abstract] [Full Text] [PDF]


Home page
Hum Mol GenetHome page
M. A. Calton, B. A. Ersoy, S. Zhang, J. P. Kane, M. J. Malloy, C. R. Pullinger, Y. Bromberg, L. A. Pennacchio, R. Dent, R. McPherson, et al.
Association of functionally significant Melanocortin-4 but not Melanocortin-3 receptor mutations with severe adult obesity in a large North American case-control study
Hum. Mol. Genet., March 15, 2009; 18(6): 1140 - 1147.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
C. Chelala, A. Khan, and N. R Lemoine
SNPnexus: a web database for functional annotation of newly discovered and public domain single nucleotide polymorphisms
Bioinformatics, March 1, 2009; 25(5): 655 - 661.
[Abstract] [Full Text] [PDF]


Home page
Physiol. GenomicsHome page
D. A. Blizard, A. Lionikas, D. J. Vandenbergh, T. Vasilopoulos, G. S. Gerhard, J. W. Griffith, L. C. Klein, J. T. Stout, H. A. Mack, J. M. Lakoski, et al.
Blood pressure and heart rate QTL in mice of the B6/D2 lineage: sex differences and environmental influences
Physiol Genomics, February 2, 2009; 36(3): 158 - 166.
[Abstract] [Full Text] [PDF]


Home page
NEJMHome page
K. Boztug, G. Appaswamy, A. Ashikov, A. A. Schaffer, U. Salzer, J. Diestelhorst, M. Germeshausen, G. Brandes, J. Lee-Gossler, F. Noyan, et al.
A Syndrome with Congenital Neutropenia and Mutations in G6PC3
N. Engl. J. Med., January 1, 2009; 360(1): 32 - 43.
[Abstract] [Full Text] [PDF]


Home page
J. Clin. Endocrinol. Metab.Home page
A. M. Joshua, S. Ezzat, S. L. Asa, A. Evans, R. Broom, M. Freeman, and J. J. Knox
Rationale and Evidence for Sunitinib in the Treatment of Malignant Paraganglioma/Pheochromocytoma
J. Clin. Endocrinol. Metab., January 1, 2009; 94(1): 5 - 9.
[Abstract] [Full Text] [PDF]


Home page
Brief BioinformHome page
R. Karchin
Next generation tools for the annotation of human SNPs
Brief Bioinform, January 1, 2009; 10(1): 35 - 52.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. R. Pico, I. V. Smirnov, J. S. Chang, R.-F. Yeh, J. L. Wiemels, J. K. Wiencke, T. Tihan, B. R. Conklin, and M. Wrensch
SNPLogic: an interactive single nucleotide polymorphism selection, annotation, and prioritization system
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D803 - D809.
[Abstract] [Full Text] [PDF]


Home page
Cancer Res.Home page
A. Enjuanes, Y. Benavente, F. Bosch, I. Martin-Guerrero, D. Colomer, S. Perez-Alvarez, O. Reina, M. T. Ardanaz, P. Jares, A. Garcia-Orad, et al.
Genetic Variants in Apoptosis and Immunoregulation-Related Genes Are Associated with Risk of Chronic Lymphocytic Leukemia
Cancer Res., December 15, 2008; 68(24): 10178 - 10186.
[Abstract] [Full Text] [PDF]


Home page
BloodHome page
D. C. Johnson, S. Corthals, C. Ramos, A. Hoering, K. Cocks, N. J. Dickens, J. Haessler, H. Goldschmidt, J. A. Child, S. E. Bell, et al.
Genetic associations with thalidomide mediated venous thrombotic events in myeloma identified using targeted genotyping
Blood, December 15, 2008; 112(13): 4924 - 4934.
[Abstract] [Full Text] [PDF]


Home page
J. Med. Genet.Home page
F E Abidi, L Holloway, C A Moore, D D Weaver, R J Simensen, R E Stevenson, R C Rogers, and C E Schwartz
Mutations in JARID1C are associated with X-linked mental retardation, short stature and hyperreflexia
J. Med. Genet., December 1, 2008; 45(12): 787 - 793.
[Abstract] [Full Text] [PDF]


Home page
Am. J. Respir. Crit. Care Med.Home page
M. M. Wurfel, A. C. Gordon, T. D. Holden, F. Radella, J. Strout, O. Kajikawa, J. T. Ruzinski, G. Rona, R. A. Black, S. Stratton, et al.
Toll-like Receptor 1 Polymorphisms Affect Innate Immune Responses and Outcomes in Sepsis
Am. J. Respir. Crit. Care Med., October 1, 2008; 178(7): 710 - 720.
[Abstract] [Full Text] [PDF]


Home page
IOVSHome page
C. Zeitz, A. K. Gross, D. Leifert, B. Kloeckener-Gruissem, S. D. McAlear, J. Lemke, J. Neidhardt, and W. Berger
Identification and Functional Characterization of a Novel Rhodopsin Mutation Associated with Autosomal Dominant CSNB
Invest. Ophthalmol. Vis. Sci., September 1, 2008; 49(9): 4105 - 4114.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
P. Radivojac, P. H. Baenziger, M. G. Kann, M. E. Mort, M. W. Hahn, and S. D. Mooney
Gain and loss of phosphorylation sites in human cancer
Bioinformatics, August 15, 2008; 24(16): i241 - i247.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
Y. Bromberg and B. Rost
Comprehensive in silico mutagenesis highlights functionally important residues in proteins
Bioinformatics, August 15, 2008; 24(16): i207 - i212.
[Abstract] [Full Text] [PDF]


Home page
J. Med. Genet.Home page
Z-B Jin, M Mandai, T Yokota, K Higuchi, K Ohmori, F Ohtsuki, S Takakura, T Itabashi, Y Wada, M Akimoto, et al.
Identifying pathogenic genetic background of simplex or multiplex retinitis pigmentosa patients: a large scale mutation screening study
J. Med. Genet., July 1, 2008; 45(7): 465 - 472.
[Abstract] [Full Text] [PDF]


Home page
Hum Mol GenetHome page
C. Palles, N. Johnson, B. Coupland, C. Taylor, J. Carvajal, J. Holly, I. S. Fentiman, I. dos Santos Silva, A. Ashworth, J. Peto, et al.
Identification of genetic variants that influence circulating IGF1 levels: a targeted search strategy
Hum. Mol. Genet., May 15, 2008; 17(10): 1457 - 1464.
[Abstract] [Full Text] [PDF]


Home page
J Mol EndocrinolHome page
M. Plourde, C. Manhes, G. Leblanc, F. Durocher, M. Dumont, O. Sinilnikova, I. BRCAs, and J. Simard
Mutation analysis and characterization of HSD17B2 sequence variants in breast cancer cases from French Canadian families with high risk of breast and ovarian cancer
J. Mol. Endocrinol., April 1, 2008; 40(4): 161 - 172.
[Abstract] [Full Text] [PDF]


Home page
Hum Mol GenetHome page
G. Gasparre, E. Hervouet, E. de Laplanche, J. Demont, L. F. Pennisi, M. Colombel, F. Mege-Lechevallier, J.-Y. Scoazec, E. Bonora, R. Smeets, et al.
Clonal expansion of mutated mitochondrial DNA is associated with tumor formation and complex I deficiency in the benign renal oncocytoma
Hum. Mol. Genet., April 1, 2008; 17(7): 986 - 995.
[Abstract] [Full Text] [PDF]


Home page
Hum Mol GenetHome page
C. A. Haiman, C. Hsu, P. I.W. de Bakker, M. Frasco, X. Sheng, D. Van Den Berg, J. T. Casagrande, L. N. Kolonel, L. Le Marchand, S. E. Hankinson, et al.
Comprehensive association testing of common genetic variation in DNA repair pathway genes in relationship with breast cancer risk in multiple populations
Hum. Mol. Genet., March 15, 2008; 17(6): 825 - 834.
[Abstract] [Full Text] [PDF]


Home page
NeurologyHome page
C. L. Morgia, A. Achilli, L. Iommarini, P. Barboni, M. Pala, A. Olivieri, C. Zanna, S. Vidoni, C. Tonon, R. Lodi, et al.
Rare mtDNA variants in Leber hereditary optic neuropathy families with recurrence of myoclonus
Neurology, March 4, 2008; 70(10): 762 - 770.
[Abstract] [Full Text] [PDF]


Home page
J. Lipid Res.Home page
D. C. Crawford, A. S. Nord, M. D. Badzioch, J. Ranchalis, L. A. McKinstry, M. Ahearn, C. Bertucci, C. Shephard, M. Wong, M. J. Rieder, et al.
A common VLDLR polymorphism interacts with APOE genotype in the prediction of carotid artery disease risk
J. Lipid Res., March 1, 2008; 49(3): 588 - 596.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
J. L. Kelley and W. J. Swanson
Dietary Change and Adaptive Evolution of enamelin in Humans and Among Primates
Genetics, March 1, 2008; 178(3): 1595 - 1603.
[Abstract] [Full Text] [PDF]


Home page
J. Med. Genet.Home page
E J Marco, F E Abidi, J Bristow, W B Dean, P Cotter, R J Jeremy, C E Schwartz, and E H Sherr
ARHGEF9 disruption in a female patient is associated with X linked mental retardation and sensory hyperarousal
J. Med. Genet., February 1, 2008; 45(2): 100 - 105.
[Abstract] [Full Text] [PDF]


Home page
BloodHome page
G. S. Sellick, R. Wade, S. Richards, D. G. Oscier, D. Catovsky, and R. S. Houlston
Scan of 977 nonsynonymous SNPs in CLL4 trial patients for the identification of genetic variants influencing prognosis
Blood, February 1, 2008; 111(3): 1625 - 1633.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. Singh, A. Olowoyeye, P. H. Baenziger, J. Dantzer, M. G. Kann, P. Radivojac, R. Heiland, and S. D. Mooney
MutDB: update on development of tools for the biochemical analysis of genetic variation
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D815 - D819.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
H. Kono, T. Yuasa, S. Nishiue, and K. Yura
coliSNP database server mapping nsSNPs on protein structures
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D409 - D413.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
P. H. Lee and H. Shatkay
F-SNP: computationally predicted functional SNPs for disease association studies
Nucleic Acids Res., January 11, 2008; 36(suppl_1): D820 - D824.
[Abstract] [Full Text] [PDF]


Home page
Protein Eng Des SelHome page
B. Shen, J. Bai, and M. Vihinen
Physicochemical feature-based classification of amino acid mutations
Protein Eng. Des. Sel., January 1, 2008; 21(1): 37 - 44.
[Abstract] [Full Text] [PDF]


Home page
Hum ReprodHome page
T. D. Gallardo, G. B. John, K. Bradshaw, C. Welt, R. Reijo-Pera, P. H. Vogt, P. Touraine, S. Bione, D. Toniolo, L. M. Nelson, et al.
Sequence variation at the human FOXO3 locus: a study of premature ovarian failure and primary amenorrhea
Hum. Reprod., January 1, 2008; 23(1): 216 - 221.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
M. Masso and I. I. Vaisman
Accurate prediction of enzyme mutant activity based on a multibody statistical potential
Bioinformatics, December 1, 2007; 23(23): 3155 - 3161.
[Abstract] [Full Text] [PDF]


Home page
HypertensionHome page
D. Conen, R. J. Glynn, J. E. Buring, P. M Ridker, and R. Y.L. Zee
Natriuretic Peptide Precursor A Gene Polymorphisms and Risk of Blood Pressure Progression and Incident Hypertension
Hypertension, December 1, 2007; 50(6): 1114 - 1119.
[Abstract] [Full Text] [PDF]


Home page
Cancer Epidemiol. Biomarkers Prev.Home page
P. Ang, I. H.K. Lim, T.-C. Lee, J.-T. Luo, D. C.T. Ong, P. H. Tan, and A. S.G. Lee
BRCA1 and BRCA2 Mutations in an Asian Clinic-based Population Detected Using a Comprehensive Strategy
Cancer Epidemiol. Biomarkers Prev., November 1, 2007; 16(11): 2276 - 2284.
[Abstract] [Full Text] [PDF]


Home page
NEJMHome page
S. M. Holland, F. R. DeLeo, H. Z. Elloumi, A. P. Hsu, G. Uzel, N. Brodsky, A. F. Freeman, A. Demidowich, J. Davis, M. L. Turner, et al.
STAT3 Mutations in the Hyper-IgE Syndrome
N. Engl. J. Med., October 18, 2007; 357(16): 1608 - 1619.
[Abstract] [Full Text] [PDF]


Home page
Arch Gen PsychiatryHome page
E. L. Dempster, I. Burcescu, K. Wigg, E. Kiss, I. Baji, J. Gadoros, Z. Tamas, J. L. Kennedy, A. Vetro, M. Kovacs, et al.
Evidence of an Association Between the Vasopressin V1b Receptor Gene (AVPR1B) and Childhood-Onset Mood Disorders
Arch Gen Psychiatry, October 1, 2007; 64(10): 1189 - 1195.
[Abstract] [Full Text] [PDF]


Home page
Cancer Res.Home page
I. P. Gorlov, P. Meyer, T. Liloglou, J. Myles, M. B. Boettger, A. Cassidy, L. Girard, J. D. Minna, R. Fischer, S. Duffy, et al.
Seizure 6-Like (SEZ6L) Gene and Risk for Lung Cancer
Cancer Res., September 1, 2007; 67(17): 8406 - 8411.
[Abstract] [Full Text] [PDF]


Home page
Nephrol Dial TransplantHome page
V. Charlton-Menys, L. Pisciotta, P. N. Durrington, R. Neary, C. D. Short, L. Calabresi, S. Calandra, and S. Bertolini
Molecular characterization of two patients with severe LCAT deficiency
Nephrol. Dial. Transplant., August 1, 2007; 22(8): 2379 - 2382.
[Full Text] [PDF]


Home page
CarcinogenesisHome page
S. Castellvi-Bel, A. Castells, R. de Cid, J. Munoz, F. Balaguer, V. Gonzalo, C. Ruiz-Ponte, M. Andreu, X. Llor, R. Jover, et al.
Association of the ARLTS1 Cys148Arg variant with sporadic and familial colorectal cancer
Carcinogenesis, August 1, 2007; 28(8): 1687 - 1691.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. Brudno, A. Poliakov, S. Minovitsky, I. Ratnere, and I. Dubchak
Multiple whole genome alignments and novel biomedical applications at the VISTA portal
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W669 - W674.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
J. S. Kaminker, Y. Zhang, C. Watanabe, and Z. Zhang
CanPredict: a computational tool for predicting cancer-associated missense mutations
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W595 - W598.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
F. Panitz, H. Stengaard, H. Hornshoj, J. Gorodkin, J. Hedegaard, S. Cirera, B. Thomsen, L. B. Madsen, A. Hoj, R. K. Vingborg, et al.
SNP mining porcine ESTs with MAVIANT, a novel tool for SNP evaluation and annotation
Bioinformatics, July 1, 2007; 23(13): i387 - i391.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
Y. Bromberg and B. Rost
SNAP: predict effect of non-synonymous polymorphisms on function
Nucleic Acids Res., June 28, 2007; 35(11): 3823 - 3835.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
Z.-Q. Ye, S.-Q. Zhao, G. Gao, X.-Q. Liu, R. E. Langlois, H. Lu, and L. Wei
Finding new structural and sequence attributes to predict possible disease association of single amino acid polymorphism (SAP)
Bioinformatics, June 15, 2007; 23(12): 1444 - 1450.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
G. Gasparre, A. M. Porcelli, E. Bonora, L. F. Pennisi, M. Toller, L. Iommarini, A. Ghelli, M. Moretti, C. M. Betts, G. N. Martinelli, et al.
Disruptive mitochondrial DNA mutations in complex I subunits are markers of oncocytic phenotype in thyroid tumors
PNAS, May 22, 2007; 104(21): 9001 - 9006.
[Abstract] [Full Text] [PDF]


Home page
Hum Mol GenetHome page
N. Johnson, O. Fletcher, C. Palles, M. Rudd, E. Webb, G. Sellick, I. dos Santos Silva, V. McCormack, L. Gibson, A. Fraser, et al.
Counting potentially functional variants in BRCA1, BRCA2 and ATM predicts breast cancer susceptibility
Hum. Mol. Genet., May 1, 2007; 16(9): 1051 - 1057.
[Abstract] [Full Text] [PDF]


Home page
J. Am. Soc. Nephrol.Home page
K. Tory, T. Lacoste, L. Burglen, V. Moriniere, N. Boddaert, M.-A. Macher, B. Llanas, H. Nivet, A. Bensman, P. Niaudet, et al.
High NPHP1 and NPHP6 Mutation Rate in Patients with Joubert Syndrome and Nephronophthisis: Potential Epistatic Effect of NPHP6 and AHI1 Mutations in Patients with NPHP1 Mutations
J. Am. Soc. Nephrol., May 1, 2007; 18(5): 1566 - 1575.
[Abstract] [Full Text] [PDF]


Home page
Physiol. GenomicsHome page
S. Savas, I. W. Taylor, J. L. Wrana, and H. Ozcelik
Functional nonsynonymous single nucleotide polymorphisms from the TGF-{beta} protein interaction network
Physiol Genomics, April 24, 2007; 29(2): 109 - 117.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
M. A. Care, C. J. Needham, A. J. Bulpitt, and D. R. Westhead
Deleterious SNP prediction: be mindful of your training data!
Bioinformatics, March 15, 2007; 23(6): 664 - 672.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
S. Seixas, G. Suriano, F. Carvalho, R. Seruca, J. Rocha, and A. Di Rienzo
Sequence Diversity at the Proximal 14q32.1 SERPIN Subcluster: Evidence for Natural Selection Favoring the Pseudogenization of SERPINA2
Mol. Biol. Evol., February 1, 2007; 24(2): 587 - 598.
[Abstract] [Full Text] [PDF]


Home page
Cancer Res.Home page
J. S. Kaminker, Y. Zhang, A. Waugh, P. M. Haverty, B. Peters, D. Sebisanovic, J. Stinson, W. F. Forrest, J. F. Bazan, S. Seshagiri, et al.
Distinguishing Cancer-Associated Missense Mutations from Common Polymorphisms
Cancer Res., January 15, 2007; 67(2): 465 - 473.


A. G. Jegga, S. Gowrisankar, J. Chen, and B. J. Aronow
PolyDoms: a whole genome database for the identification of non-synonymous coding SNPs with the potential to impact disease
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D700 - D706.


P. D. James, C. Notley, C. Hegadorn, J. Leggo, A. Tuttle, S. Tinlin, C. Brown, C. Andrews, A. Labelle, Y. Chirinian, et al.
The mutational spectrum of type 1 von Willebrand disease: results from a Canadian cohort study
Blood, January 1, 2007; 109(1): 145 - 154.


A. Koushik, P. Kraft, C. S. Fuchs, S. E. Hankinson, W. C. Willett, E. L. Giovannucci, and D. J. Hunter
Nonsynonymous Polymorphisms in Genes in the One-Carbon Metabolism Pathway and Associations with Colorectal Cancer
Cancer Epidemiol. Biomarkers Prev., December 1, 2006; 15(12): 2408 - 2417.


E. Capriotti, R. Calabrese, and R. Casadio
Predicting the insurgence of human genetic diseases associated to single point protein mutations with support vector machines and evolutionary information
Bioinformatics, November 15, 2006; 22(22): 2729 - 2734.


E. L. Webb, M. F. Rudd, G. S. Sellick, R. El Galta, L. Bethke, W. Wood, O. Fletcher, S. Penegar, L. Withey, M. Qureshi, et al.
Search for low penetrance alleles for colorectal cancer through a scan of 1467 non-synonymous SNPs in 2575 cases and 2707 controls with validation by kin-cohort analysis of 14 704 first-degree relatives
Hum. Mol. Genet., November 1, 2006; 15(21): 3263 - 3271.


P. Bhatti, D. M. Church, J. L. Rutter, J. P. Struewing, and A. J. Sigurdson
Candidate Single Nucleotide Polymorphism Selection using Publicly Available Tools: A Guide for Epidemiologists
Am. J. Epidemiol., October 15, 2006; 164(8): 794 - 804.


S. Onengut-Gumuscu, J. H. Buckner, and P. Concannon
A Haplotype-Based Analysis of the PTPN22 Locus in Type 1 Diabetes.
Diabetes, October 1, 2006; 55(10): 2883 - 2889.


A. Tomas, J. Casellas, O. Ramirez, G. Munoz, J. L. Noguera, and A. Sanchez
High amino acid variation in the intracellular domain of the pig prolactin receptor (PRLR) and its relation to ovulation rate and piglet survival traits
J Anim Sci, August 1, 2006; 84(8): 1991 - 1998.


M. F. Rudd, G. S. Sellick, E. L. Webb, D. Catovsky, and R. S. Houlston
Variants in the ATM-BRCA2-CHEK2 axis predispose to chronic lymphocytic leukemia
Blood, July 15, 2006; 108(2): 638 - 644.


L. S. Sullivan, S. J. Bowne, D. G. Birch, D. Hughbanks-Wheaton, J. R. Heckenlively, R. A. Lewis, C. A. Garcia, R. S. Ruiz, S. H. Blanton, H. Northrup, et al.
Prevalence of disease-causing mutations in families with autosomal dominant retinitis pigmentosa: a screen of known genes in 200 families.
Invest. Ophthalmol. Vis. Sci., July 1, 2006; 47(7): 3052 - 3064.