Nucleic Acids Research Advance Access originally published online on May 5, 2007
Nucleic Acids Research 2007 35(Web Server issue):W429-W432; doi:10.1093/nar/gkm256
Nucleic Acids Research, 2007, Vol. 35, No. suppl_2 W429-W432
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Advantages of combined transmembrane topology and signal peptide prediction: the Phobius web server
1Center for Genomics and Bioinformatics, Karolinska Institutet, S-17177 Stockholm, Sweden and 2Department of Molecular Biology, University of Copenhagen, Ole Maaloes Vej 5, 2200 Copenhagen, Denmark
*To whom correspondence should be addressed. Tel: +1 206 616 5021; Fax: +1 206 685 7301; Email: lukall{at}u.washington.edu
Received January 26, 2007. Revised March 22, 2007. Accepted April 8, 2007.
| ABSTRACT |
|---|
When using conventional transmembrane topology and signal peptide predictors, such as TMHMM and SignalP, there is a substantial overlap between these two types of predictions. Applying these methods to five complete proteomes, we found that 30–65% of all predicted signal peptides and 25–35% of all predicted transmembrane topologies overlap. This impairs predictions for 5–10% of the proteome, which makes it an important issue in protein annotation.
To address this problem, we previously designed a hidden Markov model, Phobius, that combines transmembrane topology and signal peptide predictions. The method makes an optimal choice between transmembrane segments and signal peptides, and also allows constrained and homology-enriched predictions.
We here present a web interface (http://phobius.cgb.ki.se and http://phobius.binf.ku.dk) to access Phobius.
| INTRODUCTION |
|---|
Traditional transmembrane topology predictors often predict signal peptides as transmembrane segments and, vice versa, signal peptide predictors often predict N-terminal transmembrane segments as signal peptides. This fact is often overlooked when testing prediction methods, and is the main cause of widely differing test results. A frequent piece of advice for circumventing these cross-predictions is to remove predicted signal peptides before predicting transmembrane proteins (1), or to remove proteins with transmembrane segments when predicting signal peptides (2). However, as the number of errors due to cross-predictions is roughly the same for the two kinds of predictors (3), the gain from such approaches will only be as high as the loss.
To resolve the ambiguities we have, in a previous study, designed a hidden Markov model, Phobius, containing submodels for both signal peptides and transmembrane segments (see Figure 1). We obtain better discrimination by forcing the predictor to choose between the two types of features. A benchmark (3) showed that false classifications of signal peptides were reduced from TMHMM's (4) 26 to 4% and false classifications of transmembrane helices were reduced from SignalP 2.0's (5) 19 to 8%. An added advantage is that the method even increased the high accuracy of TMHMM in predicting pure transmembrane topologies, from 44.5 to 53.9% correctly predicted topologies. Since this benchmark, a new version, SignalP 3.0 (6), has been published. Its false positive rate on transmembrane proteins is, however, as high as before. On the same set of transmembrane proteins without signal peptides used in the previous benchmark, SignalP 3.0 produces false predictions on 21% (52 of 247) of the test sequences.
Here, we present an analysis of the overlap between signal peptide predictions and transmembrane segment predictions made by conventional predictors on five proteomes. We also describe the Phobius web interface.
| WHOLE PROTEOME OVERLAP ANALYSIS |
|---|
To investigate how large a problem the overlap between the predictions of conventional signal peptide predictors and transmembrane topology predictors is at the whole-proteome level, we annotated five different proteomes using a combination of SignalP 3.0 (6) and TMHMM 2.0 (4). The results are given in Table 1. We found that 5–10% of all proteins have predicted transmembrane segments that overlap predicted signal peptides. Since only one of the methods can be correct, this casts doubt on 30–65% of all predicted signal peptides and 25–35% of all predicted transmembrane topologies. Both kinds of prediction are roughly equally frequent in a proteome, and their false positive rates are more or less the same, hence we cannot tell which method is correct based on SignalP and TMHMM predictions in these cases. Phobius thus solves a problem that separate signal peptide predictors and transmembrane topology predictors cannot handle.
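The overlap counts described above can be illustrated with a minimal sketch. It assumes the SignalP and TMHMM outputs have already been parsed into plain residue ranges; the function names, the data layout and the toy proteins are hypothetical and are not taken from the analysis behind Table 1.

```python
from typing import Dict, List, Optional, Tuple

Segment = Tuple[int, int]  # 1-based, inclusive residue range (assumed convention)


def overlaps(a: Segment, b: Segment) -> bool:
    """Return True if two inclusive residue ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlap_stats(
    signal_peptides: Dict[str, Optional[Segment]],
    tm_segments: Dict[str, List[Segment]],
) -> Dict[str, float]:
    """Summarise how often a predicted signal peptide overlaps a predicted TM segment."""
    proteins = set(signal_peptides) | set(tm_segments)
    with_sp = {p for p in proteins if signal_peptides.get(p) is not None}
    with_tm = {p for p in proteins if tm_segments.get(p)}
    conflicting = {
        p
        for p in with_sp & with_tm
        if any(overlaps(signal_peptides[p], tm) for tm in tm_segments[p])
    }
    n = len(conflicting)
    return {
        "fraction_of_proteome": n / len(proteins) if proteins else 0.0,
        "fraction_of_signal_peptides": n / len(with_sp) if with_sp else 0.0,
        "fraction_of_tm_topologies": n / len(with_tm) if with_tm else 0.0,
    }


# Toy input: four hypothetical proteins, one with conflicting predictions.
toy_sp = {"A": (1, 22), "B": (1, 25), "C": None, "D": None}
toy_tm = {"A": [(5, 27), (40, 62)], "B": [], "C": [(10, 32)], "D": []}
stats = overlap_stats(toy_sp, toy_tm)
```

The three fractions correspond to the three percentages quoted in the text: the share of the proteome affected, the share of predicted signal peptides in doubt, and the share of predicted transmembrane topologies in doubt.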
| DESCRIPTION OF WEB INTERFACE |
|---|
The Phobius web server provides an easy and accurate means to predict signal peptides and transmembrane topology from an amino acid sequence. The sequences should be submitted in FASTA format, preferably uploaded as a file. The predictions are given either as short output (single-line text) or as long output (styled as a UniProt feature table) (see Figure 2).
All predictions made by the Phobius server can optionally be accompanied by a posterior label (location) probability plot. The posterior label probability is the probability for a location (cytoplasm, non-cytoplasm, membrane or signal peptide) of a residue given the whole sequence (see Figure 2). Note that the posterior probability plot is not a prediction in itself. The pattern of the plot might even deviate from the prediction, which would be a sign of uncertainty in the prediction.
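As an illustration of what a posterior label probability is, the following sketch runs the standard forward-backward recursions on a deliberately tiny toy HMM and sums the state posteriors per label. The four states, the two-letter alphabet (`H` for hydrophobic, `P` for polar) and all probabilities are invented for this example; they are not the Phobius model or its parameters.

```python
from typing import Dict, List

# Toy model (assumption): two membrane states share the label "M".
STATES = ["i", "M1", "M2", "o"]
LABEL_OF = {"i": "i", "M1": "M", "M2": "M", "o": "o"}
START = {"i": 0.5, "M1": 0.0, "M2": 0.0, "o": 0.5}
TRANS = {
    "i": {"i": 0.8, "M1": 0.2, "M2": 0.0, "o": 0.0},
    "M1": {"i": 0.0, "M1": 0.0, "M2": 1.0, "o": 0.0},
    "M2": {"i": 0.15, "M1": 0.0, "M2": 0.7, "o": 0.15},
    "o": {"i": 0.0, "M1": 0.2, "M2": 0.0, "o": 0.8},
}
EMIT = {
    "i": {"H": 0.2, "P": 0.8},
    "M1": {"H": 0.9, "P": 0.1},
    "M2": {"H": 0.9, "P": 0.1},
    "o": {"H": 0.3, "P": 0.7},
}


def label_posteriors(obs: str) -> List[Dict[str, float]]:
    """Return, for each position, P(label at that position | whole sequence)."""
    n = len(obs)
    # Forward pass, rescaled at every position to avoid numerical underflow.
    fwd, scale = [], []
    for t in range(n):
        row = {}
        for s in STATES:
            if t == 0:
                prior = START[s]
            else:
                prior = sum(fwd[t - 1][r] * TRANS[r][s] for r in STATES)
            row[s] = prior * EMIT[s][obs[t]]
        c = sum(row.values())
        fwd.append({s: v / c for s, v in row.items()})
        scale.append(c)
    # Backward pass, using the same scaling factors.
    bwd = [{s: 1.0 for s in STATES} for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in STATES:
            bwd[t][s] = sum(
                TRANS[s][r] * EMIT[r][obs[t + 1]] * bwd[t + 1][r] for r in STATES
            ) / scale[t + 1]
    # State posteriors are summed over all states that carry the same label.
    result = []
    for t in range(n):
        row = {lab: 0.0 for lab in sorted(set(LABEL_OF.values()))}
        for s in STATES:
            row[LABEL_OF[s]] += fwd[t][s] * bwd[t][s]
        total = sum(row.values())
        result.append({lab: v / total for lab, v in row.items()})
    return result


posteriors = label_posteriors("PPPHHHHHHPPP")
```

Plotting each label's value against sequence position gives a curve of the kind shown in the posterior plot. Because every position is summarised independently, the most probable label per position need not form a valid path through the model, which is why the plot can deviate from the decoded prediction.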
In normal prediction mode as well as in the constrained prediction mode described below, sequences are decoded with the 1-best algorithm (7).
| CONSTRAINED PREDICTION |
|---|
The accuracy of the predictions can be greatly improved if we can include information about the location of a part of the sequence in a constrained prediction (8). Typically we could have experimental data at hand from reporter fusions (9), antibody experiments, or have knowledge of the location due to functional requirements of a site (10). The Phobius web server provides a service to let the user specify such constraints for a prediction. The user may specify that a residue resides in a cytoplasmic loop, non-cytoplasmic loop or a transmembrane segment. One can also specify that the N-terminal part of the sequence is a signal peptide.
Here we maximize P(Labels, Sequence | Model) · P(Labels | Constraints). This is implemented by a modification of the forward–backward (11) calculations: we multiply the forward probability for a state by P(Label | Constraint) at the constrained sequence positions.
As the membrane, signal peptide and cytoplasmic loop states are each uniquely identified by a single label in the Phobius model (Figure 1), we set P(Label | Constraint) to 1 for the label corresponding to the constraint and 0 for all other labels in the constrained position. Non-cytoplasmic loops, on the other hand, can have two different labels. Here we assign a probability of 0.5 to each of the two constrained labels, and 0 to all other labels.
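The weighting scheme can be sketched on a toy HMM as follows. This is an illustration under stated assumptions, not the HomologHMM implementation: the states, the label names `o` and `O` for the two non-cytoplasmic labels, and all probabilities are invented; signal peptide states are left out for brevity; the weight is applied as a per-position factor in both the forward and the backward pass so that posteriors stay consistent; and posteriors are reported instead of a 1-best decoding.

```python
from typing import Dict, List, Optional

# Toy model (assumption): each state name doubles as its label.
STATES = ["i", "M", "o", "O"]
START = {"i": 0.5, "M": 0.0, "o": 0.25, "O": 0.25}
TRANS = {
    "i": {"i": 0.8, "M": 0.2, "o": 0.0, "O": 0.0},
    "M": {"i": 0.1, "M": 0.7, "o": 0.1, "O": 0.1},
    "o": {"i": 0.0, "M": 0.2, "o": 0.8, "O": 0.0},
    "O": {"i": 0.0, "M": 0.1, "o": 0.0, "O": 0.9},
}
EMIT = {
    "i": {"H": 0.2, "P": 0.8},
    "M": {"H": 0.9, "P": 0.1},
    "o": {"H": 0.3, "P": 0.7},
    "O": {"H": 0.3, "P": 0.7},
}
# Labels compatible with each user constraint (names are hypothetical).
CONSTRAINT_LABELS = {
    "cytoplasmic": ["i"],
    "membrane": ["M"],
    "non-cytoplasmic": ["o", "O"],
}


def label_weight(label: str, constraint: Optional[str]) -> float:
    """P(Label | Constraint): 1 or 0 for one allowed label, 0.5 each for two."""
    if constraint is None:
        return 1.0  # unconstrained position: leave the probability untouched
    allowed = CONSTRAINT_LABELS[constraint]
    return 1.0 / len(allowed) if label in allowed else 0.0


def constrained_posteriors(
    obs: str, constraints: Dict[int, str]
) -> List[Dict[str, float]]:
    """Posterior state probabilities; constraints are keyed by 0-based position."""
    n = len(obs)

    def factor(t: int, s: str) -> float:
        return EMIT[s][obs[t]] * label_weight(s, constraints.get(t))

    fwd, scale = [], []
    for t in range(n):
        row = {}
        for s in STATES:
            if t == 0:
                prior = START[s]
            else:
                prior = sum(fwd[t - 1][r] * TRANS[r][s] for r in STATES)
            row[s] = prior * factor(t, s)  # forward probability times the weight
        c = sum(row.values())
        if c == 0.0:
            raise ValueError("constraints are incompatible with the model")
        fwd.append({s: v / c for s, v in row.items()})
        scale.append(c)
    bwd = [{s: 1.0 for s in STATES} for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in STATES:
            bwd[t][s] = sum(
                TRANS[s][r] * factor(t + 1, r) * bwd[t + 1][r] for r in STATES
            ) / scale[t + 1]
    post = []
    for t in range(n):
        row = {s: fwd[t][s] * bwd[t][s] for s in STATES}
        total = sum(row.values())
        post.append({s: v / total for s, v in row.items()})
    return post


free = constrained_posteriors("PPPPPPPP", {})
forced = constrained_posteriors("PPPPPPPP", {4: "membrane"})
```

In this toy run, constraining position 4 to the membrane forces all posterior mass at that position onto `M`, and the neighbouring positions shift accordingly because every path must now pass through a membrane state there.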
| PREDICTION WITH HOMOLOGS |
|---|
Since homologous sequences are likely to share both transmembrane topology and absence or presence of signal peptides, we can gain extra support for a prediction by examining the query sequence's homologs. This is the supporting idea for PolyPhobius, whose algorithm is described in a separate paper (12).
Here the server BLASTs the query sequence against UniProt. Hits with an E-value lower than 1E-5 covering more than 75% of the sequence length are used as support for the prediction. The full-length sequences are then realigned using a multiple sequence alignment program, and weighted with the Henikoff and Henikoff weighting scheme (13).
When we measured the performance of the approach, we found a significant increase in transmembrane topology prediction accuracy (from 67.8 to 74.7% correct topologies) as well as an improvement in signal peptide prediction accuracy (increase in Matthews correlation from 0.901 to 0.921) as compared to Phobius without homology enrichment (12).
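The hit selection and sequence weighting steps can be sketched as follows. This is a minimal illustration, not the server's code: the `Hit` record is a hypothetical stand-in for parsed BLAST output, coverage is assumed to be measured on the query sequence, and gap characters are assumed to be skipped when computing the position-based weights.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical, already-parsed BLAST hit."""
    subject_id: str
    evalue: float
    query_start: int  # 1-based, inclusive
    query_end: int    # 1-based, inclusive


def select_homologs(
    hits: List[Hit],
    query_length: int,
    max_evalue: float = 1e-5,
    min_coverage: float = 0.75,
) -> List[Hit]:
    """Keep hits below the E-value cutoff that cover enough of the query."""
    selected = []
    for hit in hits:
        coverage = (hit.query_end - hit.query_start + 1) / query_length
        if hit.evalue < max_evalue and coverage > min_coverage:
            selected.append(hit)
    return selected


def henikoff_weights(alignment: List[str]) -> List[float]:
    """Position-based sequence weights, normalised to sum to 1.

    In each column, a residue shared by n sequences out of k distinct
    residue types contributes 1 / (k * n) to each of those sequences.
    """
    weights = [0.0] * len(alignment)
    for col in range(len(alignment[0])):
        column = [seq[col] for seq in alignment]
        counts = Counter(c for c in column if c != "-")  # gaps skipped (assumption)
        k = len(counts)
        for i, c in enumerate(column):
            if c != "-":
                weights[i] += 1.0 / (k * counts[c])
    total = sum(weights)
    if total == 0.0:
        raise ValueError("alignment contains no residues")
    return [w / total for w in weights]


toy_hits = [
    Hit("close_homolog", 1e-20, 1, 100),
    Hit("weak_hit", 1e-3, 1, 100),
    Hit("partial_hit", 1e-10, 1, 50),
]
kept = select_homologs(toy_hits, query_length=100)
toy_weights = henikoff_weights(["AAA", "AAA", "CCC"])
```

In the toy alignment the two identical sequences share their weight, so the divergent third sequence ends up with as much weight as the other two combined. This is the intended effect of the scheme: redundant homologs should not dominate the support for a prediction.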
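For readers unfamiliar with the Matthews correlation used above for signal peptide discrimination, a short sketch of the standard formula is given here. The counts in the example calls are invented for illustration and are unrelated to the benchmark in (12).

```python
import math


def matthews_correlation(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (no better than chance)
    to +1 (perfect prediction).
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


perfect = matthews_correlation(tp=50, tn=50, fp=0, fn=0)
chance = matthews_correlation(tp=25, tn=25, fp=25, fn=25)
```

Unlike plain accuracy, the coefficient stays informative when proteins with and without signal peptides are present in very different numbers.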
Users can also submit their own alignment in FASTA format. In this case, the transmembrane topology and presence of a signal peptide are predicted for the first sequence, taking the other sequences in the alignment into account.
| IMPLEMENTATION |
|---|
The Phobius web server is implemented as a Perl CGI script. Plots are produced by gnuplot. Normal predictions are made with the ANHMM package (our unpublished data), while constrained predictions and predictions with homologs are made with the HomologHMM package (12). Multiple sequence alignments are produced with Kalign (14).
| AVAILABILITY |
|---|
The Phobius web server is available at http://phobius.cgb.ki.se/ and http://phobius.binf.ku.dk/. Stand-alone versions of the software for academic users for Linux and SunOS are available on request.
| ACKNOWLEDGEMENTS |
|---|
Funding to pay the Open Access publication charges for this article was provided by Pharmacia corp.
Conflict of interest statement. None declared.
| Footnotes |
|---|
Present addresses: Lukas Käll, Department of Genome Sciences, University of Washington, Seattle WA, USA
Erik L.L. Sonnhammer, Stockholm Bioinformatics Center, Stockholm University, Stockholm, Sweden
| REFERENCES |
|---|
1. Lao DM, Arai M, Ikeda M, Shimizu T. The presence of signal peptide significantly affects transmembrane topology prediction. Bioinformatics (2002) 18:1562–1566.
2. Klee EW, Ellis LBM. Evaluating eukaryotic secreted protein prediction. BMC Bioinformatics (2005) 6:256.
3. Käll L, Krogh A, Sonnhammer ELL. A combined transmembrane topology and signal peptide prediction method. J. Mol. Biol (2004) 338:1027–1036.
4. Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol (2001) 305:567–580.
5. Nielsen H, Engelbrecht J, Brunak S, von Heijne G. A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Int. J. Neural Syst (1997) 8:581–599.
6. Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J. Mol. Biol (2004) 340:783–795.
7. Schwartz R, Chow Y. The N-best algorithm: an efficient and exact procedure for finding the N most likely sentence hypotheses. In: Proceedings of ICASSP 1990 (1990) 81–84.
8. Melen K, Krogh A, von Heijne G. Reliability measures for membrane protein topology prediction algorithms. J. Mol. Biol (2003) 327:735–744.
9. Daley DO, Rapp M, Granseth E, Melen K, Drew D, von Heijne G. Global topology analysis of the Escherichia coli inner membrane proteome. Science (2005) 308:1321–1323.
10. Henricson A, Käll L, Sonnhammer ELL. A novel transmembrane topology of presenilin based on reconciling experimental and computational evidence. FEBS J (2005) 272:2727–2733.
11. Rabiner L. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE (1989) 77:257–286.
12. Käll L, Krogh A, Sonnhammer ELL. An HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics (2005) 21(Suppl. 1):251–257.
13. Henikoff S, Henikoff JG. Position-based sequence weights. J. Mol. Biol (1994) 243:574–578.
14. Lassmann T, Sonnhammer ELL. Kalign: an accurate and fast multiple sequence alignment algorithm. BMC Bioinformatics (2005) 6:298.