Nucleic Acids Research Pages 355-357  


The Protein Mutant Database

Takeshi Kawabata*, Motonori Ota and Ken Nishikawa

Center for Information Biology, National Institute of Genetics, 1111 Yata, Mishima, Shizuoka 411-8540, Japan

Received August 31, 1998; Revised September 22, 1998; Accepted October 14, 1998

ABSTRACT

The Protein Mutant Database (PMD) currently contains over 81 000 mutants, including artificial as well as natural mutants of various proteins, extracted from about 10 000 articles. We recently developed a powerful viewing and retrieving system (http://pmd.ddbj.nig.ac.jp), which is integrated with the sequence and tertiary structure databases. The system has the following features: (i) mutated sequences are generated automatically from the information described in the entry and the integrated sequence data of the wild-type protein, and are then displayed. This is convenient because it shows the positions of the altered amino acids (in a different color) within the entire sequence of the wild-type protein; (ii) for proteins whose 3D structures have been experimentally determined, the 3D structure is displayed with the mutation sites in a different color; (iii) a sequence homology search against PMD can be carried out with any query sequence; (iv) a summary of mutations of homologous sequences can be displayed, showing all the mutations recorded throughout PMD at a given site of a protein.

INTRODUCTION

The Protein Mutant Database (PMD) is a compilation of protein mutant data that provides information on the functional and/or structural effects of amino acid mutations at specific positions of a protein (1). PMD is unique among mutant databases in two respects: (i) almost all proteins are included, except for natural mutants of the globin and immunoglobulin families; (ii) natural as well as artificial mutants are covered, including random and site-directed mutants. PMD data have been extracted from literature published from the 1970s up to the middle of 1995. More than 10 000 articles are now recorded, comprising about 81 000 protein mutants. When the project started in 1989, database construction, i.e. reading articles, extracting the necessary information and keying in the data, was carried out at the Protein Engineering Research Institute in Osaka. Since April 1997, all of these tasks have been carried out at the National Institute of Genetics in Mishima. Table 1 summarizes the PIR superfamilies that appear most frequently in PMD, illustrating the variety of proteins contained in the database.

DESCRIPTION

The data compiled in PMD are organized by published article, not by protein. That is, each entry in the database corresponds to one article, which may describe several or many protein mutants. Each database entry is identified by a serial number and is defined as either natural or artificial, depending on the type of mutation. For each entry the following items are recorded: `JOURNAL'; `TITLE'; `CROSS-REFERENCE'; `PROTEIN'; `N-TERMINAL'; `CHANGE'; `FUNCTION'; `STRUCTURE'; `STABILITY'; etc. `CROSS-REFERENCE' gives the code names of the protein in other databases, such as the Protein Information Resource (PIR) (2). `N-TERMINAL' shows the first five N-terminal amino acids of the sequence, which helps to make the numbering of the sequence unambiguous. As the full sequence is usually not shown in a paper, the mutated and original amino acid residues, with the position numbers used in the paper, are checked against PIR to confirm the first five residues at the N terminus. `CHANGE' indicates the position and kind of mutation, such as amino acid substitution, insertion or deletion. Each mutation is denoted with a specific notation. Any functional or structural features (`FUNCTION', `STRUCTURE', `STABILITY', etc.) observed in the mutant are described immediately after `CHANGE'. Relative differences in activity and/or stability, in comparison with the wild-type protein, are indicated with the symbols [- -], [-], [=], [+] or [+ +]. Complete loss of activity is denoted as [0].
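As an illustration of how the effect symbols could be read by a program, the following minimal Python sketch extracts them from a line of annotation text. The label wording and the function name `find_effects` are assumptions made for this example; the text above defines only [0] explicitly and describes the other symbols as relative differences from the wild type.

```python
import re

# Symbols quoted in the text. The wording of each label is an assumption,
# except "[0]", which the text defines as complete loss of activity.
EFFECT_LABELS = {
    "[0]": "complete loss",
    "[- -]": "large decrease",
    "[-]": "decrease",
    "[=]": "no change",
    "[+]": "increase",
    "[+ +]": "large increase",
}

# Matches any of the six bracketed symbols.
EFFECT_PATTERN = re.compile(r"\[(?:- -|\+ \+|-|\+|=|0)\]")


def find_effects(annotation: str) -> list[tuple[str, str]]:
    """Return (symbol, label) pairs in the order they occur in the text."""
    return [(s, EFFECT_LABELS[s]) for s in EFFECT_PATTERN.findall(annotation)]


print(find_effects("activity [- -], stability [=]"))
```

A lookup table keeps the symbols and their interpretation in one place, so the labels can be corrected without touching the parsing code.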

Table 1. The top 20 proteins most frequently appearing in PMD

Rank  Protein superfamily                                   Entries (a)  Mutants
1     ras transforming protein                              162          767
2     insulin                                               155          628
3     antithrombin III                                      143          856
4     cytochrome P450                                       131          691
5     NGF receptor repeat homology                          116          760
6     globin                                                115          583
7     vertebrate rhodopsin                                  111          753
8     insulin receptor                                      109          393
9     coagulation factor X                                  107          691
10    cytochrome c                                          104          691
11    leucine-rich α-2-glycoprotein repeat homology         100          601
12    apolipoprotein A-I                                    99           428
13    phage T4 lysozyme                                     80           520
14    poliovirus genome polyprotein                         73           485
15    cellular tumor antigen p53                            73           672
16    acetylcholine receptor                                72           477
17    pol polyprotein                                       70           949
18    lysozyme c                                            70           358
19    bacteriorhodopsin                                     70           430
20    cystic fibrosis transmembrane conductance regulator   69           349

(a) The number of entries is equivalent to the number of articles.


Figure 1. Schematic diagram of the viewing and retrieving system of PMD. Three databases (PMD, PIR and PDB) are integrated into the system.


Figure 2. Sample entries of PMD displayed in the viewing system. (a) A plain PMD entry. (b) The entry displayed after clicking `SHOW WITH SEQUENCE'. Mutation sites are displayed in red. (c) The 3D structure window opened by clicking `SHOW 3D STRUCTURE', with mutation sites displayed in yellow.

Since 1997, we have extended the `CHANGE' description in two ways: (i) the type of operation is added to the header `CHANGE', for example, `CHANGE-POINT' stands for a point mutation and `CHANGE-DELETE' for a sequence deletion; (ii) `CHANGE-CHIMERA' indicates a chimeric protein. In this case, the entire sequence of the chimera is shown explicitly, as it is difficult to describe the sequence in operational words. Details of the description are given on our web page (http://pmd.ddbj.nig.ac.jp).

VIEWING AND RETRIEVING THE SYSTEM

We recently developed a powerful viewing and retrieving system for PMD, which is integrated with the sequence database PIR (2) and the tertiary structure database PDB (3), and has a World Wide Web interface (http://pmd.ddbj.nig.ac.jp). The relationship between the three databases is shown schematically in Figure 1. The system has the following features.

Show mutated sequences

A PMD entry records altered sequences only in operational words, such as `Cys 117 Ser' or `Ser-Ala 84-94 AWEKDL' (Fig. 2a). These brief descriptions are sufficient as minimum information, but they can make it difficult to know the complete sequence of the mutant protein. We have therefore developed a program that generates a mutant sequence from the change operations in PMD and the corresponding wild-type sequence retrieved from the PIR database. Changed regions are shown in a different color. A sample of generated sequences is shown in Figure 2b.
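The idea of generating a mutant sequence from a change operation can be sketched as follows. This is not the program used by PMD; it is a minimal Python illustration that assumes `Cys 117 Ser' means a single substitution at position 117 and that `Ser-Ala 84-94 AWEKDL' means that residues 84 to 94 (starting with Ser and ending with Ala) are replaced by the given one-letter sequence. The function names `apply_change` and `highlight` are hypothetical, and brackets stand in for the color used on the web page.

```python
import re

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Assumed notation, modelled on the two examples quoted in the text:
#   "Cys 117 Ser"           residue 117 (Cys) is replaced by Ser
#   "Ser-Ala 84-94 AWEKDL"  residues 84..94 are replaced by AWEKDL
POINT = re.compile(r"^([A-Z][a-z]{2}) (\d+) ([A-Z][a-z]{2})$")
REGION = re.compile(r"^([A-Z][a-z]{2})-([A-Z][a-z]{2}) (\d+)-(\d+) ([A-Z]+)$")


def _check(wild_type, position, residue):
    """Verify that the 1-based position holds the expected wild-type residue."""
    if not 1 <= position <= len(wild_type):
        raise ValueError(f"position {position} is outside the sequence")
    expected = THREE_TO_ONE[residue]
    found = wild_type[position - 1]
    if found != expected:
        raise ValueError(f"expected {expected} at {position}, found {found}")


def apply_change(wild_type, change):
    """Return (mutant, start, end); mutant[start:end] is the changed region."""
    m = POINT.match(change)
    if m:
        old, pos, new = m.group(1), int(m.group(2)), m.group(3)
        _check(wild_type, pos, old)
        mutant = wild_type[:pos - 1] + THREE_TO_ONE[new] + wild_type[pos:]
        return mutant, pos - 1, pos
    m = REGION.match(change)
    if m:
        first, last = m.group(1), m.group(2)
        start, end = int(m.group(3)), int(m.group(4))
        if start > end:
            raise ValueError("region start must not exceed region end")
        _check(wild_type, start, first)
        _check(wild_type, end, last)
        new = m.group(5)
        mutant = wild_type[:start - 1] + new + wild_type[end:]
        return mutant, start - 1, start - 1 + len(new)
    raise ValueError(f"unsupported change notation: {change!r}")


def highlight(mutant, start, end):
    """Mark the changed region with brackets instead of a color."""
    return mutant[:start] + "[" + mutant[start:end] + "]" + mutant[end:]


mutant, start, end = apply_change("MTCYK", "Cys 3 Ser")
print(highlight(mutant, start, end))
```

Checking the wild-type residue at each stated position before editing plays the same role as the `N-TERMINAL' item: it catches numbering that does not match the retrieved sequence.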

Show 3D structure

If the tertiary structure of a wild-type protein has been experimentally determined, the 3D structure is displayed with the mutation sites in a different color. The sequence is linked to any one of the 3D structures in PDB with more than 50% sequence identity. The structure is shown as a Cα wire-frame model, and the mutated sites are displayed in yellow. Various color schemes that highlight secondary structure or solvent-accessible surface (6) are available. A sample of the 3D structure view is shown in Figure 2c.

Sequence homology search

A sequence homology search against the PMD database can be carried out with any query sequence pasted into the input area of our web page. This function makes it easy to find entries that have related sequences. The search is performed against wild-type sequences, not against mutated sequences. The program for the sequence homology search was written by one of the authors (T.K.). The algorithm is based on the standard alignment technique (4) and k-tuple (ktup) filtering (5). The user can choose a sequence identity threshold between 30 and 100%. The result of the search is displayed as a multiple alignment of wild-type sequences, with the mutated sites shown in a different color. A sample result is shown in Figure 3a.
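The two-stage idea of filtering candidates by shared k-tuples and then aligning the survivors can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' program: the prefilter simply counts shared k-tuples, which is simpler than the diagonal-based scoring of reference 5, and the alignment is a plain Needleman-Wunsch global alignment with a linear gap penalty, swapped in for the method of reference 4. All function names, scores and default parameters are hypothetical.

```python
def shared_ktuples(query, target, k=2):
    """Count positions in target whose k-tuple also occurs in query."""
    words = {query[i:i + k] for i in range(len(query) - k + 1)}
    return sum(1 for i in range(len(target) - k + 1) if target[i:i + k] in words)


def percent_identity(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align a and b; return percent identity over alignment columns."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back from the corner, counting columns and identical pairs.
    i, j, identical, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            identical += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
        columns += 1
    return 100.0 * identical / columns if columns else 0.0


def search(query, wild_types, min_identity=30.0, k=2, min_shared=1):
    """Return (entry id, identity) pairs at or above the identity threshold."""
    hits = []
    for entry_id, sequence in wild_types.items():
        if shared_ktuples(query, sequence, k) < min_shared:
            continue  # cheap prefilter: skip sequences sharing too few k-tuples
        identity = percent_identity(query, sequence)
        if identity >= min_identity:
            hits.append((entry_id, identity))
    return sorted(hits, key=lambda hit: -hit[1])


database = {"E1": "ACDE", "E2": "WWWW"}  # toy wild-type sequences
print(search("ACFE", database, min_identity=50.0))
```

The prefilter exists only to avoid running the quadratic-time alignment on sequences that share no short words with the query; the identity threshold is then applied to the aligned pairs, as in the 30 to 100% setting described above.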


Figure 3. Sample results of the sequence search against PMD. (a) The first page of the search results. Each entry is represented by its own wild-type sequence, with the mutated sites displayed in red. (b) A summary of mutations at a specified site, which appears after clicking the bottom line of the first page.

Summary of mutants at a certain site

Clicking a site in the sequence homology search result generates a summary of the mutants at that site from all related PMD entries. An example is shown in Figure 3b. This function is convenient because it helps the user to find active sites or structurally important sites.
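A per-site summary of this kind can be sketched as a grouping of mutation records by alignment column. The Python example below is an assumption-laden illustration, not the PMD implementation: the `Mutation` record, the gapped-sequence representation of the alignment and the function names are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Mutation(NamedTuple):
    entry_id: str   # PMD entry the mutation comes from
    position: int   # 1-based residue number in that entry's wild-type sequence
    wild: str       # wild-type residue (one-letter code)
    mutant: str     # substituted residue (one-letter code)
    effect: str     # effect symbol, e.g. "[-]"


def alignment_columns(aligned):
    """Map 1-based residue numbers of a gapped sequence to 0-based columns."""
    mapping = {}
    residue = 0
    for column, char in enumerate(aligned):
        if char != "-":
            residue += 1
            mapping[residue] = column
    return mapping


def summarize_site(column, alignment, mutations):
    """Group all mutations that fall on one alignment column by substitution."""
    summary = defaultdict(list)
    for mut in mutations:
        columns = alignment_columns(alignment[mut.entry_id])
        if columns.get(mut.position) == column:
            summary[f"{mut.wild}->{mut.mutant}"].append((mut.entry_id, mut.effect))
    return dict(summary)


# Toy alignment of two wild-type sequences; "-" marks a gap.
alignment = {"A1": "MK-CS", "B2": "MKTCS"}
mutations = [
    Mutation("A1", 3, "C", "S", "[-]"),
    Mutation("B2", 4, "C", "A", "[0]"),
    Mutation("B2", 2, "K", "R", "[=]"),
]
print(summarize_site(3, alignment, mutations))
```

Converting each entry's own residue numbering to a shared alignment column is what lets mutations from different homologous proteins be pooled at one site.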

FUTURE DIRECTION

As the amount of literature concerning mutant proteins increases every year, the task of constructing the database is becoming more difficult. We are currently about three years behind, dealing with articles published in 1995. One way to overcome this problem is to limit the proteins that are reviewed. We are now planning to deal primarily with proteins of known structure, in order to cut down the number of articles to be handled. This would reduce the amount of data to one third.

Another problem is the complexity of mutation data. In the early stages of the site-directed mutagenesis technique, simple amino acid substitutions were the main form of protein mutation. Now, however, more complicated and/or larger-scale mutations are frequently introduced into natural proteins, and even de novo designed proteins are synthesized. This trend makes it technically difficult to express the alterations in a mutant protein relative to the wild-type protein. In addition, the concept of a mutant protein is itself becoming blurred, as in the case of de novo proteins. De novo designed proteins, for which a wild type cannot be uniquely defined, are excluded, since a mutant in PMD must be described relative to a natural protein. On the same principle, mutations introduced into chimeric proteins are excluded, although simple chimeras made of two natural proteins are included.

ACKNOWLEDGEMENTS

We are indebted to the PMD staff, Kimiko Mimura, Naoko Nakayama, Minako Kuromaru, Kayoko Yamamoto and Rika Kadowaki for constructing the database. The work was supported by a grant-in-aid from the Ministry of Education, Science, Sports and Culture, Japan.

REFERENCES

1. Nishikawa,K., Ishino,S., Takenaka,H., Norioka,N., Hirai,T., Yao,T. and Seto,Y. (1994) Protein Engng, 7, 553.

2. Barker,W.C., Garavelli,J.S., Haft,D.H., Hunt,L.T., Marzec,C.R., Orcutt,B.C., Srinivasarao,G.Y., Yeh,L.-S.L., Ledley,R.S., Mewes,H.-W., Pfeiffer,F. and Tsugita,A. (1998) Nucleic Acids Res., 26, 27-32.

3. Abola,E.E., Sussman,J.L., Prilusky,J. and Manning,N.O. (1997) In Carter,C.W.,Jr and Sweet,R.M. (eds), Methods in Enzymology. Academic Press, San Diego, 277, 556-571.

4. Gotoh,O. (1982) J. Mol. Biol., 162, 705-708.

5. Pearson,W.R. and Lipman,D. (1988) Proc. Natl Acad. Sci. USA, 85, 2444-2448.

6. Kabsch,W. and Sander,C. (1983) Biopolymers, 22, 2577-2637.


*To whom correspondence should be addressed. Tel: +81 559 81 6859; Fax: +81 559 81 6889; Email: takawaba@lab.nig.ac.jp


Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.



This article has been cited by other articles:


B. Contreras-Moreira
3D-footprint: a database for the structural analysis of protein-DNA complexes
Nucleic Acids Res., September 18, 2009; (2009) gkp781v1.

Y. Bromberg, J. Overton, C. Vaisse, R. L. Leibel, and B. Rost
In silico mutagenesis: a case study of the melanocortin 4 receptor
FASEB J, September 1, 2009; 23(9): 3059 - 3069.

A. Pavelka, E. Chovancova, and J. Damborsky
HotSpot Wizard: a web server for identification of hot spots in protein engineering
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W376 - W383.

R. Karchin
Next generation tools for the annotation of human SNPs
Brief Bioinform, January 1, 2009; 10(1): 35 - 52.

M. M. Gromiha, Y. Yabuki, M. X. Suresh, A. M. Thangakani, M. Suwa, and K. Fukui
TMFunction: database for functional residues in membrane proteins
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D201 - D204.

M. Lonquety, Z. Lacroix, N. Papandreou, and J. Chomilier
SPROUTS: a database for the evaluation of protein stability upon point mutation
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D374 - D379.

H. Xi, J. Park, G. Ding, Y.-H. Lee, and Y. Li
SysPIMP: the web-based systematical platform for identifying human disease-related mutated sequences from mass spectrometry
Nucleic Acids Res., January 1, 2009; 37(suppl_1): D913 - D920.

Y. Bromberg, G. Yachdav, and B. Rost
SNAP predicts effect of mutations on protein function
Bioinformatics, October 15, 2008; 24(20): 2397 - 2398.

Y. Bromberg and B. Rost
SNAP: predict effect of non-synonymous polymorphisms on function
Nucleic Acids Res., June 28, 2007; 35(11): 3823 - 3835.

G. Lopez, A. Valencia, and M. Tress
FireDB--a database of functionally important residues from proteins of known structure
Nucleic Acids Res., January 12, 2007; 35(suppl_1): D219 - D223.

M. R. Stam, E. G.J. Danchin, C. Rancurel, P. M. Coutinho, and B. Henrissat
Dividing the large glycoside hydrolase family 13 into subfamilies: towards improved functional annotations of α-amylase-related proteins
Protein Eng. Des. Sel., December 1, 2006; 19(12): 555 - 562.

R. R. Gabdoulline, S. Ulbrich, S. Richter, and R. C. Wade
ProSAT2--Protein Structure Annotation Server.
Nucleic Acids Res., July 1, 2006; 34(Web Server issue): W79 - W83.

L. Y. Yampolsky and A. Stoltzfus
The Exchangeability of Amino Acids in Proteins
Genetics, August 1, 2005; 170(4): 1459 - 1472.

D. Cotter, P. Guda, E. Fahy, and S. Subramaniam
MitoProteome: mitochondrial protein sequence database and annotation system
Nucleic Acids Res., January 1, 2004; 32(90001): D463 - 467.

A. Schafferhans, J. E. W. Meyer, and S. I. O'Donoghue
The PSSH database of alignments between protein sequences and tertiary structures
Nucleic Acids Res., January 1, 2003; 31(1): 494 - 498.

M. M. Gromiha, J. An, H. Kono, M. Oobatake, H. Uedaira, P. Prabakaran, and A. Sarai
ProTherm, version 2.0: thermodynamic database for proteins and mutants
Nucleic Acids Res., January 1, 2000; 28(1): 283 - 285.
