Nucleic Acids Research, 2003, Vol. 31, No. 1 406-409
© 2003 Oxford University Press
TMPDB: a database of experimentally-characterized transmembrane topologies
1 Department of Electronic Information System Engineering, Faculty of Science and Technology, Hirosaki University, Hirosaki 036-8561, Japan
2 Science of Bioresources Program, The United Graduate School of Agricultural Sciences, Iwate University, Morioka 020-8550, Japan
*To whom correspondence should be addressed. Tel: +81 172393638; Fax: +81 172393638; Email: slsimi{at}si.hirosaki-u.ac.jp
Received August 1, 2002; Revised September 9, 2002; Accepted September 20, 2002
ABSTRACT
TMPDB is a database of experimentally-characterized transmembrane (TM) topologies. TMPDB release 6.2 contains a total of 302 TM protein sequences, of which 276 are α-helical sequences, 17 are β-stranded, and 9 are α-helical sequences with short pore-forming helices buried in the membrane. The TM topologies in TMPDB were determined experimentally by means of X-ray crystallography, NMR, gene fusion technique, substituted cysteine accessibility method, N-linked glycosylation experiment and other biochemical methods. TMPDB should be useful as a test and/or training dataset for improving existing TM topology prediction methods or developing novel methods with higher performance, and as a guide for both bioinformaticians and biologists to better understand TM proteins. TMPDB and its subsets are freely available at the following web site: http://bioinfo.si.hirosaki-u.ac.jp/~TMPDB/.
INTRODUCTION
Transmembrane (TM) proteins serve extremely important functions in life as pumps, channels, receptors, energy transducers, etc., and have recently been reported to account for about 20–30% of the genes in a whole genome (1–4). Nevertheless, the number of high-resolution three-dimensional (3D) structures is far below one hundred at present, in contrast to more than 18 000 3D structures for soluble proteins registered in PDB (5). This is because TM protein molecules are difficult to crystallize owing to their amphiphilic character: hydrophobic TM segments (TMSs) and hydrophilic loops. The functions of TM proteins, however, can be inferred rather easily from their TM topology (i.e., the number of TMSs, the TMS positions and the orientation of the TMSs relative to the membrane lipid bilayer) without knowing their 3D structures, because of their rather simple structural characteristics (6).
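The three components of a TM topology named above (number of TMSs, TMS positions, orientation) can be held in one small record. The following is a minimal sketch in Python, not the TMPDB entry format: the class name, the field names and the "in"/"out" labels are hypothetical, and the alternation of sides assumes every TMS spans the membrane completely, which would not hold for short buried pore-forming helices.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Topology:
    """Hypothetical record for a TM topology (illustration only)."""
    # 1-based inclusive (start, end) residue positions of each TMS
    segments: Tuple[Tuple[int, int], ...]
    # side of the N-terminal tail: "in" or "out" (labels are an assumption)
    n_term: str

    @property
    def n_tms(self) -> int:
        """Number of transmembrane segments."""
        return len(self.segments)

    def loop_sides(self) -> List[str]:
        """Side of each non-membrane region, N-terminal tail first.

        Assumes each TMS crosses the membrane once, so sides alternate.
        """
        other = {"in": "out", "out": "in"}
        sides = [self.n_term]
        for _ in self.segments:
            sides.append(other[sides[-1]])
        return sides


# Made-up example: three TMSs with the N-terminus inside
example = Topology(segments=((10, 30), (45, 65), (80, 100)), n_term="in")
print(example.n_tms, example.loop_sides())
```

With the number of segments and the N-tail side fixed, the side of every loop follows, which is why these few values suffice to describe a topology under the stated assumption.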
In this context, a number of TM topology prediction methods have been developed to determine the structure and function of TM proteins from their amino acid sequences (2,7–22). However, the proposed prediction methods have not attained the accuracies desired for this purpose. Recent evaluations of prediction performance using experimentally-characterized TM topology datasets have revealed that even the best methods predict the TM topology with accuracies of only around 60% (23–25). This could be attributed mainly to the lack of well-characterized topology data for training or tuning TM topology prediction methods. Thus, more high-quality TM topology data are required to evaluate the existing prediction methods more precisely.
For this reason, we have constructed a transmembrane protein database, TMPDB (19,24,26), which is a collection of TM proteins with topologies based on definite experimental evidence such as X-ray crystallography, NMR, gene fusion technique, substituted cysteine accessibility method, Asn (N)-linked glycosylation experiment and other biochemical methods. TMPDB should serve the requirements of both bioinformaticians and biologists, as a test and/or training dataset for improving the existing TM topology prediction methods and developing novel prediction methods with higher performance, as well as for gaining a better understanding of TM proteins.
CONSTRUCTION OF TMPDB
We have collected 1074 articles reporting TM topology: by a MEDLINE (27) search with the keywords "transmembrane" and "topology" (895 articles), by searching directly without using MEDLINE (46 articles), and by referring to the reference position (RP) line of the entries with the following annotations: X-RAY CRYSTALLOGRAPHY, STRUCTURE BY NEUTRON DIFFRACTION, STRUCTURE BY ELECTRON CRYO-MICROSCOPY, STRUCTURE BY NMR or TOPOLOGY in SWISS-PROT and TrEMBL (28) (133 articles). By checking the content of each collected article, we extracted 302 experimentally-characterized TM topology models. To obtain the complete sequence annotation that the articles often lack, we crosschecked the sequences in question against public databases such as DDBJ (29), SWISS-PROT, PIR (30) and PDB (5), using the protein name or the partial sequence as a clue. By combining the information contained in the articles with information from the cross-referenced public databases, we constructed TMPDB in the SWISS-PROT format.
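Because the entries follow the SWISS-PROT format, TMS positions can in principle be read from a flat-file entry with a few lines of code. The sketch below assumes that segments are recorded as SWISS-PROT-style feature lines of the form `FT   TRANSMEM  start  end`; that assumption, the function name and the sample entry are illustrative and have not been checked against actual TMPDB files.

```python
def parse_transmem(entry_text):
    """Collect (start, end) pairs from 'FT TRANSMEM' feature lines.

    Assumes SWISS-PROT-style feature lines; other line types are ignored.
    """
    segments = []
    for line in entry_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[0] == "FT" and fields[1] == "TRANSMEM":
            segments.append((int(fields[2]), int(fields[3])))
    return segments


# Made-up entry fragment, for illustration only
sample = """ID   EXAMPLE_ENTRY
FT   TRANSMEM     10     30
FT   TRANSMEM     45     65
//"""

print(parse_transmem(sample))
```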
There are 21 cases in total in TMPDB in which two or more articles report topology models for a single sequence, which are almost the same as each other with only a small TMS-position difference (at most 5 amino acids). For these cases, we selected the topology model based on the highest-quality experiment among the reported ones.
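The selection rule just described can be sketched as follows. The article does not state how experiments were ranked by quality, so the `QUALITY` table below is purely an assumption, as are the function names; only the tolerance of 5 residues comes from the text.

```python
# Hypothetical quality ranking (an assumption, not given in the article)
QUALITY = {"X-ray": 3, "NMR": 3, "gene fusion": 2, "other biochemical": 1}


def models_agree(a, b, tol=5):
    """True if both models have the same number of TMSs and every
    segment boundary differs by at most `tol` residues."""
    if len(a) != len(b):
        return False
    return all(
        abs(s1 - s2) <= tol and abs(e1 - e2) <= tol
        for (s1, e1), (s2, e2) in zip(a, b)
    )


def pick_model(reports):
    """reports: list of (method, segments) pairs for one sequence.

    Returns the report backed by the highest-ranked method;
    on a tie the first one listed is kept.
    """
    return max(reports, key=lambda r: QUALITY[r[0]])


# Made-up reports for a single sequence
reports = [
    ("gene fusion", [(10, 30), (45, 65)]),
    ("X-ray", [(12, 31), (44, 66)]),
]
print(models_agree(reports[0][1], reports[1][1]), pick_model(reports)[0])
```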
TMPDB CURRENT HOLDINGS
The latest release of TMPDB contains 302 TM protein sequences: 276 α-helical sequences (TMPDB_alpha dataset), 17 β-stranded sequences (TMPDB_beta dataset) and 9 α-helical sequences with short pore-forming α-helices buried in the membrane (e.g., aquaporin 1) (TMPDB_alpha-buried dataset). The TMPDB_alpha dataset comprises 165 prokaryotic and 111 eukaryotic sequences, while the TMPDB_beta dataset includes only prokaryotic sequences with topologies determined by X-ray diffraction. The TMPDB_alpha-buried dataset includes 6 prokaryotic and 3 eukaryotic sequences, whose topologies were determined by X-ray diffraction (7 entries), N-linked glycosylation (1 entry) and protease-protection assays (1 entry). The distributions of the number of TMSs in the TMPDB_alpha, TMPDB_alpha-buried and TMPDB_beta datasets are summarized in Table 1. We note that TMPDB covers a wide range of TMS numbers.
Furthermore, we subjected the TMPDB_alpha, TMPDB_beta and TMPDB_alpha-buried datasets to a sequence similarity check (<30%) using CLUSTALW version 1.81 (31), and finally obtained three non-redundant datasets: TMPDB_alpha_non-redundant with 231 entries (138 prokaryotic and 93 eukaryotic), TMPDB_beta_non-redundant with 15 entries, and TMPDB_alpha-buried_non-redundant with 7 entries (4 prokaryotic and 3 eukaryotic). Among the TMPDB_alpha_non-redundant entries, 112 topology models were determined by gene fusion experiment, 47 by X-ray diffraction, 5 by NMR, 2 by substituted cysteine accessibility method, 11 by Asn (N)-linked glycosylation and 54 by other biochemical experiments.
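One common way to obtain a non-redundant set from pairwise similarity scores is a greedy filter, sketched below. The article does not describe how representatives were chosen, so this is an illustration under assumptions rather than the authors' procedure: the scores are taken as precomputed percentages, and the function name, entry names and values are made up.

```python
def non_redundant(names, similarity, threshold=30.0):
    """Greedy filter: keep an entry only if its similarity to every
    entry kept so far is below `threshold` percent.

    `similarity` maps frozenset({a, b}) to a percentage. The result
    depends on the order of `names`.
    """
    kept = []
    for name in names:
        if all(similarity[frozenset((name, k))] < threshold for k in kept):
            kept.append(name)
    return kept


# Made-up pairwise similarity scores (percent)
similarity = {
    frozenset(("A", "B")): 85.0,
    frozenset(("A", "C")): 12.0,
    frozenset(("B", "C")): 18.0,
}
print(non_redundant(["A", "B", "C"], similarity))
```

Here B is dropped because it is 85% similar to A, which was already kept, while C is retained. Processing the entries in a different order could keep B instead of A, so the choice of ordering is itself a design decision.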
The results of comparing TMPDB with other published TM topology datasets, i.e., MEMSAT 1.5 (9), HTP (11), PHDhtm (12), DAS (13), SOSUI (16), HMMTOP 1.1 (17), TMHMM 1.0 (18), PRED-TMR (20), Moeller's (32) and MPtopo (33) are shown in Table 2. We can see, for example, that 122 sequences are common in both TMPDB_alpha_non-redundant and Moeller's non-redundant datasets, and 109 sequences are unique in the former while only 26 in the latter.
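A comparison of this kind reduces to set operations on sequence identifiers. The sketch below uses made-up identifiers and a hypothetical function name; how sequences were actually matched between datasets is not specified in the text.

```python
def compare(first, second):
    """Count identifiers shared by two datasets and unique to each."""
    a, b = set(first), set(second)
    return {
        "common": len(a & b),
        "only_first": len(a - b),
        "only_second": len(b - a),
    }


# Made-up identifiers, for illustration only
dataset_a = {"ID0001", "ID0002", "ID0003"}
dataset_b = {"ID0002", "ID0003", "ID0004", "ID0005"}
print(compare(dataset_a, dataset_b))
```

As a consistency check on the figures quoted above, the 122 common and 109 unique sequences add up to the 231 entries of TMPDB_alpha_non-redundant.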
By applying the TMPDB_alpha_non-redundant dataset of TMPDB, we have evaluated 10 proposed TM topology prediction methods: KKD (7), TMpred (8), TopPred II (10), DAS (13), TMAP (14), MEMSAT 2 (15), SOSUI (16), PRED-TMR2+OrienTM (20,21), TMHMM 2.0 (2) and HMMTOP 2.0 (22) (see ref. 24 for details). The results show that even the best-performing methods predict the number of TMSs, the number of TMSs plus their positions, and the N-tail location with accuracies of only 69.6%, 66.7% and 79.7% for prokaryotic sequences, and 68.8%, 64.5% and 72.0% for eukaryotic ones, respectively. Furthermore, by combining several of the 10 methods and employing a simple majority-voting approach, we have improved the prediction accuracies to 79.7%, 76.8% and 89.1% for prokaryotic sequences, and 73.1%, 69.9% and 80.6% for eukaryotic ones, respectively (ConPred, 24). The detailed results of the prediction performance evaluation are posted on our web site: http://bioinfo.si.hirosaki-u.ac.jp/~ConPred/table_accuracy.html.
ConPred is available at http://bioinfo.si.hirosaki-u.ac.jp/~ConPred, which links to the methods involved in the consensus prediction; users run the individual methods, then manually copy the individual results and paste them into the input field of the ConPred web page. More detailed information on how to use ConPred can be obtained from http://bioinfo.si.hirosaki-u.ac.jp/~ConPred/help.html.
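The idea of majority voting can be illustrated for one predicted quantity, such as the number of TMSs. The sketch below is not the ConPred implementation, whose details are given in ref. 24. Requiring support from more than half of the methods, and returning `None` otherwise, are assumptions made here for illustration.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the value predicted by more than half of the methods,
    or None when no value reaches a strict majority."""
    if not predictions:
        return None
    value, count = Counter(predictions).most_common(1)[0]
    return value if count > len(predictions) / 2 else None


# Made-up TMS counts predicted by five methods for one sequence
print(majority_vote([7, 7, 6, 7, 8]))
```

A consensus of this kind can only help when the individual methods make partly independent errors, and a real implementation would also need a rule for sequences on which no majority exists.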
SEARCHING TMPDB
On the TMPDB web page (http://bioinfo.si.hirosaki-u.ac.jp/~TMPDB/), the Database Search accepts a gene name, a PubMed (27) identifier (PMID), accession numbers of DDBJ (29), SWISS-PROT (28) and PIR (30), a PDB (5) identifier, the number of TMSs, or any combination of these, over the whole of TMPDB or a selected subset among the 6 TMPDB subsets. The search result is displayed as a list of the retrieved entries, each with a link to the complete database entry.
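A search over any combination of fields amounts to a logical AND over the criteria supplied. The sketch below runs on an in-memory list of records; the field names and example records are hypothetical and do not reflect the actual TMPDB schema or web interface.

```python
def search(entries, **criteria):
    """Return the entries whose fields equal every given criterion."""
    return [
        e for e in entries
        if all(e.get(field) == value for field, value in criteria.items())
    ]


# Made-up records with hypothetical field names
entries = [
    {"gene": "geneA", "n_tms": 12, "subset": "TMPDB_alpha"},
    {"gene": "geneB", "n_tms": 7, "subset": "TMPDB_alpha"},
    {"gene": "geneC", "n_tms": 12, "subset": "TMPDB_beta"},
]
hits = search(entries, n_tms=12, subset="TMPDB_alpha")
print([e["gene"] for e in hits])
```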
FUTURE DIRECTIONS
TMPDB will be updated at least once a year to increase the number of entries as well as to add other detailed experimental information from the referenced articles, such as gene-fusion points, the accessibility for fused proteins, etc. Also, the next release of TMPDB on the web will support graphical display of TM topology and employ an SQL-based engine for more efficient and rapid database searches.
CITING AND ACCESSING TMPDB
TMPDB should be cited with the present publication as a reference. TMPDB and its subsets are available for anonymous ftp download as a plain text file from http://bioinfo.si.hirosaki-u.ac.jp/~TMPDB/. We appreciate feedback from users concerning new experimentally-characterized TM topology models for submissions, additions and corrections.
ACKNOWLEDGEMENTS
We thank Dr Kenta Nakai for taking the time to begin this study, and Takumi Watanabe for his technical support. This research was supported in part by a Grant-in-Aid for Scientific Research on Priority Areas (C) Genome Information Science from the Ministry of Education, Culture, Sports, Science and Technology of Japan (grant 14015203).
REFERENCES
1. Stevens,T.J. and Arkin,I.T. (2000) Do more complex organisms have a greater proportion of membrane proteins in their genomes? Proteins, 39, 417–420.
2. Krogh,A., Larsson,B., von Heijne,G. and Sonnhammer,E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol., 305, 567–580.
3. Liu,J. and Rost,B. (2001) Comparing function and structure between entire proteomes. Protein Sci., 10, 1970–1979.
4. Arai,M., Noto,K., Lao,D.M., Ikeda,M. and Shimizu,T. (2001) Comprehensive analysis of transmembrane protein sequences in 39 microbial genomes. In Matsuda,H., Wong,L., Miyano,S. and Takagi,T. (eds), Genome Informatics 2001. Universal Academy Press, Tokyo, pp. 338–339.
5. Westbrook,J., Feng,Z., Jain,S., Bhat,T.N., Thanki,N., Ravichandran,V., Gilliland,G.L., Bluhm,W., Weissig,H., Greer,D.S., Bourne,P.E. and Berman,H.M. (2002) The Protein Data Bank: unifying the archive. Nucleic Acids Res., 30, 245–248.
6. Sugiyama,Y., Arai,M. and Shimizu,T. (2001) Comprehensive functional identification of prokaryotic transmembrane proteins by binary topology pattern. In Matsuda,H., Wong,L., Miyano,S. and Takagi,T. (eds), Genome Informatics 2001. Universal Academy Press, Tokyo, pp. 334–335.
7. Klein,P., Kanehisa,M. and De Lisi,C. (1985) The detection and classification of membrane-spanning proteins. Biochim. Biophys. Acta, 815, 468–476.
8. Hofmann,K. and Stoffel,W. (1993) TMbase-a database of membrane spanning proteins segments. Biol. Chem. Hoppe Seyler, 347, 166.
9. Jones,D.T., Taylor,W.R. and Thornton,J.M. (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry, 33, 3038–3049.
10. Claros,M.G. and von Heijne,G. (1994) TopPred II: An improved software for membrane protein structure predictions. Comput. Appl. Biosci., 10, 685–686.
11. Fariselli,P. and Casadio,R. (1996) HTP: a neural network-based method for predicting the topology of helical transmembrane domains in proteins. Comput. Appl. Biosci., 12, 41–48.
12. Rost,B., Casadio,R. and Fariselli,P. (1996) Refining neural network predictions for helical transmembrane proteins by dynamic programming. In States,D.T., Agarwal,P., Gaasterland,T., Hunter,L. and Smith,R.F. (eds), Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, California, pp. 192–200.
13. Cserzo,M., Wallin,E., Simon,I., von Heijne,G. and Elofsson,A. (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: The dense alignment surface method. Protein Eng., 10, 673–676.
14. Persson,B. and Argos,P. (1997) Prediction of membrane protein topology utilizing multiple sequence alignments. J. Protein Chem., 16, 453–457.
15. McGuffin,L.J., Bryson,K. and Jones,D.T. (2000) The PSIPRED protein structure prediction server. Bioinformatics, 16, 404–405.
16. Hirokawa,T., Boon-Chieng,S. and Mitaku,S. (1998) SOSUI: Classification and secondary structure prediction system for membrane proteins. Bioinformatics, 14, 378–379.
17. Tusnady,G.E. and Simon,I. (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J. Mol. Biol., 283, 489–506.
18. Sonnhammer,E.L., von Heijne,G. and Krogh,A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. In Glasgow,J., Littlejohn,T., Major,F., Lathrop,R., Sankoff,D. and Sensen,C. (eds), Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, California, pp. 175–182.
19. Kihara,D., Shimizu,T. and Kanehisa,M. (1998) Prediction of membrane proteins based on classification of transmembrane segments. Protein Eng., 11, 961–970.
20. Pasquier,C., Promponas,V.J., Palaios,G.A., Hamodrakas,J.S. and Hamodrakas,S.J. (1999) A novel method for predicting transmembrane segments in proteins based on a statistical analysis of the SwissProt database: the PRED-TMR algorithm. Protein Eng., 12, 381–385.
21. Liakopoulos,T.D., Pasquier,C. and Hamodrakas,S.J. (2001) A novel tool for the prediction of transmembrane protein topology based on a statistical analysis of the SwissProt database: the OrienTM algorithm. Protein Eng., 14, 387–390.
22. Tusnady,G.E. and Simon,I. (2001) The HMMTOP transmembrane topology prediction server. Bioinformatics, 17, 849–850.
23. Moeller,S., Croning,M.D. and Apweiler,R. (2001) Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics, 17, 646–653.
24. Ikeda,M., Arai,M., Lao,D.M. and Shimizu,T. (2002) Transmembrane topology prediction methods: a re-assessment and improvement by a consensus method using dataset of experimentally-characterized transmembrane topologies. In Silico Biol., 2, 19–33.
25. Chen,C.P. and Rost,B. (2002) State-of-the-art in membrane protein prediction. Appl. Bioinformatics, 1, 21–35.
26. Shimizu,T. and Nakai,K. (1994) Construction of a membrane protein database and an evaluation of several prediction methods of transmembrane segments. In Miyano,S., Akutsu,T., Imai,H., Gotoh,O. and Takagi,T. (eds), Proceedings of Genome Informatics Workshop 1994. Universal Academy Press, Tokyo, pp. 148–149.
27. Wheeler,D.L., Church,D.M., Lash,A.E., Leipe,D.D., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Tatusova,T.A., Wagner,L. and Rapp,B.A. (2002) Database resources of the National Center for Biotechnology Information: 2002 update. Nucleic Acids Res., 30, 13–16.
28. Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.
29. Tateno,Y., Imanishi,T., Miyazaki,S., Fukami-Kobayashi,K., Saitou,N., Sugawara,H. and Gojobori,T. (2002) DNA Data Bank of Japan (DDBJ) for genome scale research in life science. Nucleic Acids Res., 30, 27–30.
30. Wu,C.H., Huang,H., Arminski,L., Castro-Alvear,J., Chen,Y., Hu,Z.Z., Ledley,R.S., Lewis,K.C., Mewes,H.W., Orcutt,B.C., Suzek,B.E., Tsugita,A., Vinayaka,C.R., Yeh,L.S., Zhang,J. and Barker,W.C. (2002) The Protein Information Resource: an integrated public resource of functional annotation of proteins. Nucleic Acids Res., 30, 35–37.
31. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: Improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.
32. Moeller,S., Kriventseva,E.V. and Apweiler,R. (2000) A collection of well characterised integral membrane proteins. Bioinformatics, 16, 1159–1160.
33. Jayasinghe,S., Hristova,K. and White,S.H. (2001) MPtopo: A database of membrane protein topology. Protein Sci., 10, 455–458.