Nucleic Acids Research Advance Access originally published online on June 21, 2007
Nucleic Acids Research 2007 35(Web Server issue):W495-W498; doi:10.1093/nar/gkm406
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Nucleic Acids Research, 2007, Vol. 35, No. suppl_2 W495-W498
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Articles |
Sequence harmony: detecting functional specificity from alignments
1Centre for Integrative Bioinformatics VU (IBIVU) and 2Institute of Molecular Cell Biology, Vrije Universiteit Amsterdam, De Boelelaan 1081A, 1081 HV Amsterdam, The Netherlands
*To whom correspondence should be addressed. Tel: +31 20 598 7649; Fax: +31 20 598 7653; Email: heringa{at}few.vu.nl
Received January 31, 2007. Revised April 5, 2007. Accepted May 3, 2007.
| ABSTRACT |
|---|
|
|
|---|
Multiple sequence alignments are often used for the identification of key specificity-determining residues within protein families. We present a web server implementation of the Sequence Harmony (SH) method previously introduced. SH accurately detects subfamily specific positions from a multiple alignment by scoring compositional differences between subfamilies, without imposing conservation. The SH web server allows a quick selection of subtype specific sites from a multiple alignment given a subfamily grouping. In addition, it allows the predicted sites to be directly mapped onto a protein structure and displayed. We demonstrate the use of the SH server using the family of plant mitochondrial alternative oxidases (AOX). In addition, we illustrate the usefulness of combining sequence and structural information by showing that the predicted sites are clustered into a few distinct regions in an AOX homology model. The SH web server can be accessed at www.ibi.vu.nl/programs/seqharmwww.
| INTRODUCTION |
|---|
|
|
|---|
During the past years there has been a wide interest in studies of specificity-determining residues within protein subfamilies (1). Consequently, an increasing number of methods and web applications has become available that offer functional analyses of subtype specificity from multiple alignments (25).
Previously we evaluated advantages and limitations of several of the state-of-the-art methods and introduced a new method called Sequence Harmony (SH) (6). SH accurately detects positions within an alignment that are responsible for functional differences between two protein subfamilies. To further facilitate the use of SH for a broad audience, we have implemented a comprehensive web server. The SH web server offers a fast, one-step analysis with a number of options, yielding results that can be interpreted easily.
In this article we will guide the user through all the steps of the SH web-application by means of a biologically relevant example. We will look for subtype specific sites for the two subfamilies of the alternative oxidases (AOX) protein family of plant alternative oxidases. The subtype specific sites found are the best candidates to explain the functional differences. Other relevant applications of this method include pathway specificity, ligand specificity, host-specific viral infection, viral disease progression differences and viral drug resistance.
| METHODS |
|---|
|
|
|---|
Implementation
The SH method, as described previously (6), is currently implemented as an awk program. The main steps are as follows. Sequences are read from the alignment and separated into two user-specified groups. For each group separately and combined entropies (E) are calculated. SH values are calculated as
(pA + pB) log (pA + pB), i.e. using the sum of the normalized frequencies of groups A and B. SH values range from zero for completely non-overlapping residue compositions, to one for identical compositions. Next, sites are selected that have a SH score below a cutoff. Stretches of neighbouring selected sites are identified and the size of each of these stretches is assigned to the sites as the `Rank'. Finally, selected sites are sorted on i) increasing SH, ii) decreasing Rank and iii) increasing entropy. This sorted list of selected sites is the primary result of the SH algorithm. Currently alignments of up to about 5 million residues (including gaps) can be processed with runtimes on the order of tens of seconds, excluding the generation of the high-quality output PyMol image. The separate chains of the optional protein structure (PDB file) are aligned with the input alignment using the `profile' option of Muscle (7). Subsequently, the chains with the highest average SH score with respect to the input alignment are selected for graphical display as an image in an interactive Jmol applet.
Use of the Server
The SH web server contains basic and advanced options. Sanity of the input options is checked, e.g. whether an input alignment is uploaded or pasted, and not both, whether it is proper FASTA format. Problems are reported with an error message and offending input fields are highlighted. An example screenshot of the main page and input form is shown in Figure 1. The main input is a multiple sequence alignment of the protein family and a subdivision into two groups. Sequence data in FASTA format is case-insensitive and may be split over multiple lines; gaps should be represented with a dash (`'); and lines beginning with a semi-colon (`;') are ignored as comment. From the alignment the program automatically generates a list of sequence IDs. The user defines the two input groups by selecting the identifier of the first sequence of the second group. By default, a cutoff of 0.2 for the SH score is used.
|
Advanced features allow more control over the analysis and output, but the default settings usually suffice for a basic analysis. The SH cutoff can be adjusted between zero (allowing no compositional overlap) and one (allowing full overlap). A higher cutoff will lead to the selection of a larger number of sites. A reference sequence can be selected from the alignment and a starting position can be set to provide a reference numbering in the output tables. Additionally, a PDB identifier can be specified by its four-letter code, to visualize the selected sites in the protein structure. Alternatively, a PDB file can be uploaded. The PDB file is automatically aligned with the multiple alignment, or manually when the reference position is set.
The output is a sorted table and optional graphical representations of the selected sites as an image (generated by PyMol, www.pymol.org) and as an interactive Jmol applet (8). The Jmol applet includes buttons to set the SH cutoff for displaying selected sites, and to show or hide non-aligned chains in the PDB file. An example screenshot of the main output page is shown in Figure 2. An additional, non-sorted, table is available that lists all sites in the alignment. The tables are coloured from red (predicted subtype specific sites) to blue (sites not predicted).
|
| RESULTS |
|---|
|
|
|---|
The AOX family of alternative plant oxidases
The cyanide-insensitive plant alternative oxidase (AOX) is a mitochondrial non-haeme quinol oxidase (9). An established function of AOX is thermogenesis in the spadices of Aroid lilies, but little is known about AOX function in other tissues (10). AOX is encoded in two discrete gene subfamilies (11). AOX1 is found in monocot and dicot plants and is induced by stress stimuli (11,12). AOX2 is usually constitutive and has at present not been found in monocot plants (11).
We retrieved 31 AOX sequences from the NCBI non-redundant database (www.ncbi.nih.gov); 24 of the AOX1 and 7 of the AOX2 subfamily. Sequences were aligned using PSI-Praline (13,14) using secondary structure prediction information from PSIPRED (15). The alignment is available as Supplementary Data. Lacking an experimental AOX structure, we constructed a homology model using of the catalytic iron-binding N-terminal domain. Swiss-Model (16) was used to align the Sauromatum guttatum AOX1 sequence with the template structure 1AFR (17), following the description of Andersson and Nordlund (18). The PDB file is provided as Supplementary Data.
Sample input
The multiple alignment obtained was uploaded in the appropriate box of the SH web server and the first AOX2 sequence (AOX2_Arab_th) was selected as the first sequence of the second group (Figure 1). The default SH cutoff of 0.2 was used, meaning that partial overlap in residue composition is allowed. We chose AOX1 from Sauromatum guttatum as reference sequence so that the subtype specific sites obtained will be numbered accordingly (18). The PDB file of the model structure was uploaded to the SH web server.
Sample output
The results page of the SH server (Figure 2) shows at the top a summary of the input parameters, and optionally the graphical representations of the protein structure. Below that is the table containing all sites with a SH score (SH) below the cutoff value (default 0.2). These are sorted first on increasing SH score, then on decreasing rank (Rnk, number of neighbouring sites below the cutoff) and finally on increasing total Entropy (AB) of the alignment position, as described previously (6). Position Ali and Ref show the sequence position in the alignment and reference sequence, respectively. The `Consensus' columns give all residue types present in groups A and B, respectively, in order of decreasing frequency and in lowercase when the frequency is less than half of the highest. In addition, a link is provided to the raw table, that holds all information of the complete alignment without selection or ranking. The raw table is available as Supplementary Data.
For the AOX family we found nine sites with a SH value of zero (bright red in Figure 2). An additional eight sites have SH values between zero and 0.2 (from light red to white). In the `Rnk' column, a value of 2 signifies a stretch of two consecutive sites with SH values below the cutoff (alignment positions 244245 and 121122). Interestingly, out of nine sites with zero harmony, four have non-zero entropy in at least one group, i.e. they are not totally conserved within both groups (alignment positions 136, 245, 269, 354). This is also reflected in the `Consensus' columns, which show multiple residue types occurring.
The graphical representations of the protein structure show the predicted sites in our homology model of the C-terminal domain (positions 182352), see Figure 3 and also Figure 2. This domain contains two sites of zero harmony and six of low harmony. Four low-harmony sites on the upper-left (Phe224, Val227, Ala228 and Asn284) are close to the Glu and His iron binding residues (18) (Figure 3), and could conceivably play a role in ligand binding or activation mechanisms. Three other low-harmony sites, Arg199, Tyr249 and Gly252 are located around a putative membrane-binding region (18), and could conceivably play a role in modulating the protein-membrane interactions or substrate access. The functional prediction of SH from the multiple alignment seems to be consistent with the structural prediction using homology modelling, although a different structural model could lead to different conclusions.
|
| CONCLUSION |
|---|
|
|
|---|
Our SH web server identifies putative subtype specific sites based on an alignment and selection of two groups as input only. In addition, sites can be easily linked to structural information using a structure from the PDB directly, or, as shown in our example, using a homology model of the protein. Using this structural information, selected subtype specific sites can be grouped into spatial clusters of sites that are likely to share functional relationships.
The combination of the SH algorithm and protein structural information has yielded a useful tool to interpret multiple sequence alignments, and to guide subsequent selection of interesting sites for experimental investigation.
| SUPPLEMENTARY DATA |
|---|
|
|
|---|
The AOX input alignment, raw output table, `coloured' PDB and pymol script are available as Supplementary Data. In addition, the output is available at www.ibi.vu.nl/programs/seqharmwww/showcase_aox.
| ACKNOWLEDGEMENTS |
|---|
K.A.F. and W.P. acknowledge financial support from the Netherlands Bioinformatics Centre, BioRange Bioinformatics research programme (SP 2.3.1 and SP 3.2.2) and the EC Network of Excellence `ENFIN' (LSHG-CT-2005-518254). The Open Access publication charges for this article were waived by Oxford University Press.
Conflict of interest statement. None declared.
| REFERENCES |
|---|
|
|
|---|
- Whisstock JC, Lesk AM. Prediction of protein function from protein sequence and structure. Quart. Rev. Biophys (2003) 36:307340.[CrossRef][Web of Science][Medline]
- Mirny LA, Gelfand MS. Using orthologous and paralogous proteins to identify specificity-determining residues in bacterial transcription factors. J. Mol. Biol (2002) 321:720.[CrossRef][Web of Science][Medline]
- Kalinina OV, Mironov AA, Gelfand MS, Rakhmaninova AB. Automated selection of positions determining functional specificity of proteins by comparative analysis of orthologous groups in protein families. Prot. Sci (2004) 13:443456.[CrossRef][Web of Science][Medline]
- Donald JE, Shakhnovich EI. Predicting specificity-determining residues in two large eukaryotic transcription factor families. Nucleic Acids Res (2005) 33:44554465.
[Abstract/Free Full Text] - Ye K, Lameijer E, Beukers MI, Jzerman A. A two-entropies analysis to identify functional positions in the transmembrane region of class A G protein-coupled receptors. Proteins (2006) 63:10181030.[CrossRef][Web of Science][Medline]
- Pirovano W, Feenstra KA, Heringa J. Sequence comparison by sequence harmony identifies subtype specific functional sites. Nucleic Acids Res (2006) 34:65406548.
[Abstract/Free Full Text] - Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res (2004) 32:17921797.
[Abstract/Free Full Text] - Herráez A. Biomolecules in the computer: Jmol to the rescue. Biochem. Educ (2006) 34:255261.[Web of Science]
- Affourtit C, Albury M, Crichton P, Moore A. Exploring the molecular nature of alternative oxidase regulation and catalysis. FEBS Lett (2002) 510:121126.[CrossRef][Web of Science][Medline]
- Wagner A, Moore A. Structure and function of the plant alternative oxidase: its putative role in the oxygen defence mechanism. Biosci. Reports (1997) 17:319333.[CrossRef][Web of Science][Medline]
- Considine M, Holzapffel R, Day D, Whelan J, Millar A. Molecular distinction between alternative oxidase from monocots and dicots. Plant Physiol (2002) 129:949953.
[Free Full Text] - Vanlerberghe G, McIntosh L. Alternative oxidase: from gene to function. Annu. Rev. Plant Physiol. Plant Mol. Biol (1997) 48:703734.[CrossRef][Web of Science][Medline]
- Simossis VA, Heringa J. The PRALINE online server: optimising progressive multiple alignment on the web. Comput. Biol. Chem (2003) 27:511519.[CrossRef][Web of Science][Medline]
- Simossis VA, Heringa J. PRALINE: a multiple sequence alignment toolbox that integrates homology-extended and secondary structure information. Nucleic Acids Res (2005) 33:W289W294.
[Abstract/Free Full Text] - Jones D. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol (1999) 292:195202.[CrossRef][Web of Science][Medline]
- Guex N, Peitsch M. Swiss-model and the swiss-pdbviewer: an environment for comparative protein modeling. Electrophoresis (1997) 18:27142723.[CrossRef][Web of Science][Medline]
- Lindqvist Y, Huang W, Schneider G, Shanklin J. Crystal structure of
9 stearoyl-acyl carrier protein desaturase from castor seed and its relationship to other di-iron proteins. EMBO J (1996) 15:40814092.[Web of Science][Medline] - Andersson M, Nordlund P. A revised model of the active site of alternative oxidase. FEBS Lett (1999) 449:1722.[CrossRef][Web of Science][Medline]
This article has been cited by other articles:
![]() |
B. W. Brandt and J. Heringa webPRC: the Profile Comparer for alignment-based searching of public domain databases Nucleic Acids Res., July 1, 2009; 37(suppl_2): W48 - W52. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Sankararaman, B. Kolaczkowski, and K. Sjolander INTREPID: a web server for prediction of functionally important residues by evolutionary analysis Nucleic Acids Res., July 1, 2009; 37(suppl_2): W390 - W395. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. E. Donald and E. I. Shakhnovich SDR: a database of predicted specificity-determining residues in proteins Nucleic Acids Res., January 1, 2009; 37(suppl_1): D191 - D194. [Abstract] [Full Text] [PDF] |
||||
![]() |
K. Ye, K. Anton Feenstra, J. Heringa, A. P. IJzerman, and E. Marchiori Multi-RELIEF: a method to recognize specificity determining residues from multiple sequence alignments using a Machine-Learning approach for feature weighting Bioinformatics, January 1, 2008; 24(1): 18 - 25. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||




