ConPred II: a consensus prediction method for obtaining transmembrane topology models with high reliability
1 Department of Electronic and Information System Engineering, Faculty of Science and Technology, Hirosaki University, Hirosaki 036-8561, Japan, 2 Department of Developmental Biology and Neuroscience, Graduate School of Life Sciences, Tohoku University, Sendai 980-8577, Japan, 3 Science of Bioresources Program, The United Graduate School of Agricultural Sciences, Iwate University, Morioka 020-8550, Japan and 4 Department of Molecular Immunology, Institute of Development, Aging and Cancer, Tohoku University, Sendai 980-8575, Japan
* To whom correspondence should be addressed. Tel: +81 172 39 3638; Fax: +81 172 39 3638; Email: slsimi{at}si.hirosaki-u.ac.jp
Received February 7, 2004; Revised and Accepted March 15, 2004
ABSTRACT
ConPred II (http://bioinfo.si.hirosaki-u.ac.jp/~ConPred2/) is a server for the prediction of transmembrane (TM) topology [i.e. the number of TM segments (TMSs), TMS positions and N-tail location] based on a consensus approach that combines the results of several previously proposed methods. The ConPred II system is constructed from ConPred_elite and ConPred_all (previously named ConPred), both proposed earlier by our group. The prediction accuracy of ConPred_elite is almost 100%, which is achieved by sacrificing prediction coverage (20–30%). ConPred_all predicts TM topologies for all input sequences, with accuracies improved by up to 11% over individual proposed methods. In the ConPred II system, TM topology prediction follows a two-step process: (i) input sequences are first run through the ConPred_elite program; (ii) sequences for which ConPred_elite does not give a TM topology are passed to the ConPred_all program. Users can access the ConPred II system simply by submitting sequences to the server. The server returns the predicted TM topology models and graphical representations of them (hydropathy plots, helical wheel diagrams of predicted TMSs and snake-like diagrams).
INTRODUCTION
The functions of transmembrane (TM) proteins can be inferred, at least roughly, from the TM topology, i.e. the number of TM segments (TMSs), TMS positions and N-tail location (1,2), and high-quality TM topology data are required for the comprehensive functional identification of TM proteins. For this reason, various TM topology prediction methods have been developed to date, but they are not accurate enough: they reach at most 50–60% accuracy in predicting the whole TM topology (3–5).
We have proposed two TM topology prediction methods with improved accuracies based on the consensus approach: ConPred_elite (6) and ConPred_all (previously named ConPred) (4,7). ConPred_elite achieves a prediction reliability of >95% by sacrificing prediction coverage (estimated at 20–30%). ConPred_all improves the prediction accuracy of TM topology by up to 11% over individual proposed methods.
In this paper, we present a consensus prediction server, ConPred II, constructed from ConPred_elite and ConPred_all, for obtaining more reliable TM topology models. Such models should be useful for, e.g., the comprehensive classification and identification of TM protein functions and the three-dimensional (3D) structural modeling of TM proteins.
ALGORITHMS AND PREDICTION ACCURACIES
The ConPred II system predicts the TM topology of an input TM protein sequence using a two-step procedure. First, the ConPred_elite program is applied to the input sequence. In cases when ConPred_elite does not give a TM topology prediction, the ConPred_all program predicts the TM topology. The details of the ConPred_elite and ConPred_all programs are described below.
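The two-step procedure can be sketched in a few lines. This is only an illustration of the control flow described above, not the server's PHP implementation. The function name `predict_two_step`, the `Topology` type and the two callables standing in for ConPred_elite and ConPred_all are all hypothetical.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical representation: (list of (start, end) TMS spans, N-tail side).
Topology = Tuple[List[Tuple[int, int]], str]


def predict_two_step(
    sequence: str,
    elite: Callable[[str], Optional[Topology]],
    fallback: Callable[[str], Topology],
) -> Tuple[str, Topology]:
    """Return (name of the program that answered, predicted topology)."""
    # Step 1: the high-reliability predictor may decline by returning None.
    result = elite(sequence)
    if result is not None:
        return "ConPred_elite", result
    # Step 2: the full-coverage predictor always returns a topology.
    return "ConPred_all", fallback(sequence)
```

Modeling "no prediction" as `None` keeps the fallback decision in one place, so the caller always learns which of the two programs produced the model.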
Dataset and TM topology prediction methods used
As a training dataset for tuning ConPred II (both ConPred_elite and ConPred_all), we used the TMPDB_alpha_non-redundant dataset (Release 6.3), which is composed of 138 prokaryotic and 93 eukaryotic sequences with experimentally characterized TM topology information (7). The prediction methods used for consensus are KKD (8), TMpred (9), TopPred II (10), DAS (11), TMAP (12), MEMSAT 1.8 (13), SOSUI (14), TMHMM 2.0 (15) and HMMTOP 2.0 (16).
ConPred_elite
ConPred_elite makes use of the five TM topology prediction methods that can also predict N-tail location: TMpred, TMAP, MEMSAT 1.8, TMHMM 2.0 and HMMTOP 2.0. ConPred_elite targets only the sequences to which all five methods assign the same number of TMS(s) (≥1). For each set of corresponding predicted TMSs, the largest distance between their center positions is then calculated. If the distance is within 15 residues for prokaryotic or 11 residues for eukaryotic sequences, the average of the five center positions is calculated. A consensus TMS prediction is made for the target sequence only when all of its TMSs satisfy this condition. Both ends of each final TMS are then determined by extending the segment by as many as 10 residues toward both the N- and C-termini from the averaged center position. The ConPred_elite prediction is classified into two modes: (i) agree_one mode, when all five predictions agree on one TM topology model; (ii) split_two mode, when the prediction splits into two models, i.e. all five predictions agree on the number of TMSs and TMS positions but disagree on N-tail location.
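The ConPred_elite rule can be illustrated with a minimal sketch. It assumes that each method's output has already been reduced to a list of TMS center positions plus an N-tail label, that "within" means an inclusive comparison, and that every segment is extended by a fixed 10 residues on each side. The name `elite_consensus` and the data layout are hypothetical, and clipping to the sequence ends is omitted.

```python
from typing import List, Optional, Sequence, Tuple

MAX_SPREAD = {"prokaryotic": 15, "eukaryotic": 11}  # thresholds quoted in the text
HALF_WIDTH = 10  # assumed fixed extension on each side of the averaged center


def elite_consensus(
    centers_by_method: Sequence[Sequence[float]],
    n_tails: Sequence[str],
    kingdom: str,
) -> Optional[Tuple[List[Tuple[int, int]], str]]:
    """Return (TMS spans, mode), or None when no consensus is reached."""
    counts = {len(centers) for centers in centers_by_method}
    if len(counts) != 1 or counts == {0}:
        return None  # methods disagree on the TMS number, or predict no TMS
    segments = []
    for column in zip(*centers_by_method):  # i-th TMS from every method
        if max(column) - min(column) > MAX_SPREAD[kingdom]:
            return None  # one TMS is too scattered: reject the whole sequence
        center = round(sum(column) / len(column))
        segments.append((center - HALF_WIDTH, center + HALF_WIDTH))
    # Same N-tail everywhere gives one model; otherwise two models remain.
    mode = "agree_one" if len(set(n_tails)) == 1 else "split_two"
    return segments, mode
```

Returning `None` as soon as any single TMS fails the distance test mirrors the all-or-nothing behaviour that trades coverage for reliability.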
Table 1 shows ConPred_elite's performance for the 138 prokaryotic and 93 eukaryotic sequences in the TMPDB_alpha_non-redundant dataset. The prediction accuracy was evaluated on a per-sequence basis. For TMS positions, a prediction was regarded as correct when the center positions of all the predicted TMSs coincided within 11 residues with the corresponding TMSs in the actual data. ConPred_elite can predict TM topology almost perfectly, with reliabilities of 0.98 and 0.95 for prokaryotic and eukaryotic sequences, respectively. These high prediction reliabilities are attained by sacrificing prediction coverage (which could be called yield). The yields for prokaryotic and eukaryotic sequences are 30.4 and 21.5%, respectively, as shown in Table 1.
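The evaluation criterion, and the relation between reliability and yield, can be sketched as follows. The function names are hypothetical, and reliability is assumed here to mean the fraction of returned predictions that are correct. The counts in the usage note are not taken from Table 1; they are illustrative values chosen only because they reproduce the quoted percentages.

```python
from typing import Optional, Sequence, Tuple

TOLERANCE = 11  # residues, as stated in the text


def tms_positions_correct(predicted: Sequence[float], actual: Sequence[float]) -> bool:
    """True when the TMS numbers match and every center lies within TOLERANCE."""
    if len(predicted) != len(actual):
        return False
    return all(abs(p - a) <= TOLERANCE for p, a in zip(predicted, actual))


def reliability_and_yield(outcomes: Sequence[Optional[bool]]) -> Tuple[float, float]:
    """outcomes: None = no prediction returned, True/False = right/wrong."""
    predicted = [o for o in outcomes if o is not None]
    coverage = len(predicted) / len(outcomes)
    # Assumed definition: correct predictions divided by returned predictions.
    reliability = sum(predicted) / len(predicted) if predicted else float("nan")
    return reliability, coverage
```

For example, 42 predictions out of 138 prokaryotic sequences with 41 of them correct would give `reliability_and_yield([True] * 41 + [False] + [None] * 96)`, i.e. a reliability of about 0.98 and a yield of about 30.4%.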
ConPred_all
ConPred_all comprises four combinations of five prediction methods: (i) KKD, TopPred II, MEMSAT 1.8, SOSUI and TMHMM 2.0 for TMS prediction for prokaryotic sequences; (ii) TopPred II, TMAP, MEMSAT 1.8, TMHMM 2.0 and HMMTOP 2.0 for N-tail location prediction for prokaryotic sequences; (iii) KKD, DAS, MEMSAT 1.8, SOSUI and HMMTOP 2.0 for TMS prediction for eukaryotic sequences; (iv) TMpred, TopPred II, MEMSAT 1.8, TMHMM 2.0 and HMMTOP 2.0 for N-tail location for eukaryotic sequences.
TMS prediction is carried out by iterating the following four steps (from N-terminus to C-terminus): (i) the prediction results of the five individual methods are scanned toward the C-terminus to find the center position of the TMS; (ii) a window of 10 residues for prokaryotic or 11 for eukaryotic sequences is extended toward the C-terminus from the center position; (iii) when at least three TMSs are in the window, the average of the center positions is calculated and the predicted TMS is obtained as a region of 21 residues around the averaged center position; (iv) the TMSs used in the voting are masked, and the scanning of the prediction results is restarted from the residue next to the average-center position. In the consensus N-tail location prediction, a simple majority voting system is adopted. The final TM topology prediction is obtained by integrating the two results, i.e. the predicted TMSs and N-tail location.
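The scanning-and-voting procedure can be approximated by the sketch below. It makes several assumptions that the text does not settle: each method's output is reduced to a list of TMS center positions, the window test is inclusive, a center that gathers fewer than three votes is simply skipped, and two centers from the same method falling into one window are not treated specially. The names `consensus_tms` and `majority_n_tail` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

WINDOW = {"prokaryotic": 10, "eukaryotic": 11}  # window sizes quoted in the text
MIN_VOTES = 3    # at least three TMSs must fall inside the window
HALF_WIDTH = 10  # 21-residue segment around the averaged center


def consensus_tms(centers_by_method: Sequence[Sequence[float]], kingdom: str) -> List[Tuple[int, int]]:
    # Pool every predicted center, ordered from N- to C-terminus.
    pool = sorted(c for centers in centers_by_method for c in centers)
    segments = []
    i = 0
    while i < len(pool):
        # Collect the centers inside the window opened at the first unused one.
        j = i
        while j < len(pool) and pool[j] - pool[i] <= WINDOW[kingdom]:
            j += 1
        votes = pool[i:j]
        if len(votes) >= MIN_VOTES:
            center = round(sum(votes) / len(votes))
            segments.append((center - HALF_WIDTH, center + HALF_WIDTH))
            i = j  # mask the centers that took part in the vote
        else:
            i += 1  # assumption: drop the unsupported center and rescan
    return segments


def majority_n_tail(votes: Sequence[str]) -> str:
    """Simple majority; five voters and two labels cannot tie."""
    return Counter(votes).most_common(1)[0][0]
```

Because all voting centers are masked and any remaining center lies beyond the window, advancing the index past the window has the same effect as restarting the scan from the residue after the averaged center.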
Table 2 shows the prediction accuracies of ConPred_all, together with those of the nine selected TM topology prediction methods, for the 138 prokaryotic and 93 eukaryotic sequences in the TMPDB_alpha_non-redundant dataset. The evaluation criterion with respect to TMS position is the same as for ConPred_elite. ConPred_all is clearly more accurate, by 5–10%, than even the best-performing individual methods.
INPUT, OUTPUT AND BEHAVIOR
ConPred II runs as a CGI server, written in PHP, and is accessible at http://bioinfo.si.hirosaki-u.ac.jp/~ConPred2/ (Figure 1). Users must first enter their email address in the appropriate text box to show that they belong to an academic or governmental organization, and then select a super-kingdom (prokaryotic or eukaryotic) using the radio buttons. Sequences must be given in single-letter amino acid notation; input is case-insensitive. Query sequences can be either pasted directly into the input window or uploaded from a local disk. The accepted input formats are Raw (for a single sequence only) and FASTA. The length of each sequence is limited to 30–1999 residues, since several of the prediction methods used in ConPred II restrict sequence length.
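Users who prepare batches locally may find it convenient to check their input before submission. The sketch below is not the server's code: it assumes that the length limits are 30 and 1999 residues, that only the 20 standard amino acid letters are accepted, and that a headerless input is a single Raw sequence. The function names are hypothetical.

```python
from typing import List, Tuple

MIN_LEN, MAX_LEN = 30, 1999  # assumed reading of the length limits in the text
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")  # assumption: standard residues only


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA (or one headerless Raw sequence) into (name, SEQUENCE) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                header = "raw"  # Raw format: a single unnamed sequence
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def check_sequence(seq: str) -> List[str]:
    """Return a list of problems; an empty list means the sequence looks acceptable."""
    problems = []
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        problems.append(f"length {len(seq)} outside {MIN_LEN}-{MAX_LEN}")
    unexpected = sorted(set(seq) - AMINO_ACIDS)
    if unexpected:
        problems.append("unexpected characters: " + "".join(unexpected))
    return problems
```

Upper-casing during parsing reflects the case-insensitive input, so `check_sequence` only ever sees one letter case.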
Signal peptide regions should be removed before the sequences are submitted for TM topology prediction, since the hydrophobic core of a signal peptide is predicted as the first TMS in most cases (17). We provide DetecSig (18) for signal peptide prediction as an input option in the ConPred II system. For TM proteins that have a signal peptide, the N-tail location is automatically set to non-cytoplasmic (Nout) without using the N-tail location prediction (19).
When a query sequence is submitted, the individual TM topology prediction programs used in ConPred_elite and ConPred_all start running. First, the ConPred_elite program tries to combine the prediction results of the five methods to obtain a TM topology model. In cases when ConPred_elite gives no prediction result, the ConPred_all program is called for the TM topology prediction.
The default output is an HTML-formatted file that can be displayed in any browser. The prediction result from ConPred II (i.e. the number of TMSs, TMS positions and N-tail location) appears on the result page, as shown in Figure 1. Note that the predictions made by the individual methods are neither displayed on the result page nor available for download. As output options, the following three graphical representations (implemented as Java applets) are provided (Figure 1): (i) a hydropathy plot, (ii) helical wheel diagrams of the predicted TMSs and (iii) a snake-like diagram of the predicted TM topology model.
Notes
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated.
REFERENCES
1. Sugiyama,Y., Polulyakh,N. and Shimizu,T. (2003) Identification of transmembrane protein functions by binary topology patterns. Protein Eng., 16, 479–488.
2. Inoue,Y., Ikeda,M. and Shimizu,T. (2004) Proteome-wide classification and identification of mammalian-type GPCRs by binary topology pattern. Comput. Biol. Chem., 28, 39–49.
3. Möller,S., Croning,M.D. and Apweiler,R. (2001) Evaluation of methods for prediction of membrane spanning regions. Bioinformatics, 17, 646–653.
4. Ikeda,M., Arai,M., Lao,D.M. and Shimizu,T. (2002) Transmembrane topology prediction methods: a re-assessment and improvement by a consensus method using a dataset of experimentally-characterized transmembrane topologies. In Silico Biol., 2, 19–33.
5. Chen,C.P., Kernytsky,A. and Rost,B. (2002) Transmembrane helix predictions revisited. Protein Sci., 11, 2774–2791.
6. Xia,J.-X., Ikeda,M. and Shimizu,T. (2004) ConPred_elite: a highly reliable approach to transmembrane topology prediction. Comput. Biol. Chem., 28, 51–60.
7. Ikeda,M., Arai,M., Okuno,T. and Shimizu,T. (2003) TMPDB: a database of experimentally-characterized transmembrane topologies. Nucleic Acids Res., 31, 406–409.
8. Klein,P., Kanehisa,M. and DeLisi,C. (1985) The detection and classification of membrane-spanning proteins. Biochim. Biophys. Acta, 815, 468–476.
9. Hofmann,K. and Stoffel,W. (1993) TMbase - a database of membrane spanning proteins segments. Biol. Chem. Hoppe-Seyler, 347, 166.
10. Claros,M.G. and von Heijne,G. (1994) TopPred II: an improved software for membrane protein structure predictions. Comput. Appl. Biosci., 10, 685–686.
11. Cserzö,M., Wallin,E., Simon,I., von Heijne,G. and Elofsson,A. (1997) Prediction of transmembrane α-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng., 10, 673–676.
12. Persson,B. and Argos,P. (1997) Prediction of membrane protein topology utilizing multiple sequence alignments. J. Protein Chem., 16, 453–457.
13. Jones,D.T., Taylor,W.R. and Thornton,J.M. (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry, 33, 3038–3049.
14. Hirokawa,T., Boon-Chieng,S. and Mitaku,S. (1998) SOSUI: classification and secondary structure prediction system for membrane proteins. Bioinformatics, 14, 378–379.
15. Krogh,A., Larsson,B., von Heijne,G. and Sonnhammer,E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol., 305, 567–580.
16. Tusnády,G.E. and Simon,I. (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics, 17, 849–850.
17. Lao,D.M., Arai,M., Ikeda,M. and Shimizu,T. (2002) The presence of signal peptide significantly affects transmembrane topology prediction. Bioinformatics, 18, 1562–1566.
18. Lao,D.M. and Shimizu,T. (2001) A method for discriminating a signal peptide and a putative 1st transmembrane segment. In Valafar,F. (ed.), Proceedings of the 2001 International Conference on Mathematics and Engineering Techniques in Medicine and Biological Sciences, METMBS '01. CSREA Press, USA, pp. 119–125.
19. Arai,M., Ikeda,M. and Shimizu,T. (2003) Comprehensive analysis of transmembrane topologies in prokaryotic genomes. Gene, 304, 77–86.