iMolTalk: an interactive, internet-based protein structure analysis server
University of Lausanne and Swiss Institute of Bioinformatics, Chemin des Boveresses 155, 1066 Epalinges s/Lausanne, Switzerland and 1 University of Geneva and Swiss Institute of Bioinformatics, Centre Médicale Universitaire, 1 rue Michel-Servet, 1211 Geneva 4, Switzerland
* To whom correspondence should be addressed. Tel: +41 0 79 213 0571; Fax: +41 0 21 692 5945; Email: alexander.diemand{at}isb-sib.ch
Received February 14, 2004; Revised and Accepted March 29, 2004
| ABSTRACT |
|---|
iMolTalk (http://i.moltalk.org) is a new and interactive web server for protein structure analysis. It addresses the need to identify and highlight biochemically important regions in protein structures. As input, the server requires only the four-character Protein Data Bank (PDB) identifier of an experimentally determined structure, or a structure file in PDB format stemming, e.g., from comparative modelling. iMolTalk offers a wide range of implemented tools (i) to extract general information from PDB files, such as generic header information or the sequence derived from three-dimensional co-ordinates; (ii) to map corresponding residues from sequence to structure; (iii) to search for contacts between the protein and residues (amino or nucleic acids) or heterogeneous groups, such as bound cofactors and substrates; and (iv) to identify protein–protein interfaces between chains in a structure. The server provides results as user-friendly two-dimensional graphical representations and in textual format, ideal for further processing. At any time during the analysis, the user can choose the next step from the set of implemented tools or submit his/her own script to the server to extend the functionality of iMolTalk.
| INTRODUCTION |
|---|
Today, numerous complete genomes are in our hands, ready to be deciphered. Sequencing genomes has become a standard protocol, producing an ever-larger number of nucleic acid and, eventually, protein sequences. In contrast to the many protein sequences available today, structure elucidation still happens at a slower pace. Yet, the amount of structural information deposited in the publicly accessible Protein Data Bank (PDB) (1) also increases at a high rate thanks to the recent structural genomics initiatives (2) (for details see http://targetdb.rutgers.edu) and significant improvements in structure determination methods in general. To further narrow the gap between known sequences and structures, three-dimensional model structures can be predicted by virtue of evolutionary homology (3,4). Therefore, molecular biologists interested in protein function and structure can benefit greatly from the growing amount of structural information. Typical questions concern the relationship between sequence and co-ordinates, the spatial organization of residues in active sites and their interactions with bound ligands, inhibitors, cofactors and metal ions, or the identification of residues located at the interface between chains in a structure.
However, all too often, protein structure analysis remains an expert task. One way to analyse 3D structures today is by using molecular modelling and visualization tools, e.g. MolMol (5), Rasmol (6), VMD (7) and SPDBV (8). To use them, however, data have to be stored locally and software installed, an increasingly cumbersome task on local area networks. In many cases, structural analyses also benefit greatly from additional hardware, e.g. shutter glasses for three-dimensional representation. In addition to these system prerequisites, many of these tools are not intuitive for non-expert users, who require time and effort to become adequately trained.
iMolTalk takes a different approach. It requires no local installation of software, hardware or data. All computations are carried out on the server and results are presented in the user's browser. The methods available can be applied to a structure, a chain or a residue. They are organized in so-called toolchains, which represent a logical sequence of steps to gather the necessary input for particular algorithms. From each toolchain, any other toolchain can be accessed with the current result as input for further analyses at the level of structure, chain and residue. Furthermore, the functionality of the server is not limited to the implemented methods but can be extended by users providing their own structural data or tailor-made scripts, which are then executed on the server.
In an example analysis, we used iMolTalk to relate the annotation of a protein at the residue level from sequence to structure. For the covalently bound cofactor, we determined its contacts to the protein environment, in both the apo and holo forms, all within minutes of connecting to the server.
| IMPLEMENTATION |
|---|
iMolTalk was implemented as a web server using CGI (common gateway interface) for communication with standard Internet browsers. Computations are server-centric, i.e. all data and programs are available on the server and do not require local installation by the user. Nevertheless, users are able to upload their own structure files in PDB format and crafted scripts to extend the functionality of the server. A weekly mirror of the PDB was installed on the server to guarantee the availability of the most recently released structures to iMolTalk.
The underlying software was implemented in Objective-C using the MolTalk library libmoltalk (http://www.moltalk.org, submitted) or was directly programmed in the MolTalk scripting language. MolTalk is a computational environment which maps PDB files to an object-oriented representation. iMolTalk provides computation on three types of structural objects: structure, chain and residue. A structure is the representation of a PDB file containing single or multiple chains, which themselves hold a list of residues. A residue is either an amino acid or a nucleic acid or can be a heterogeneous group of atoms. The scripting language included (related to the programming language Smalltalk) is inherently object-oriented and allows access to all objects (structure, chain and residue) and implemented algorithms (e.g. structure superposition and geometric hashing of residue co-ordinates).
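The structure, chain and residue hierarchy described above can be sketched in a few lines of Python. This is only an illustration of the object model, not the MolTalk library or its scripting language: the class names, the `parse_pdb` function and the decision to ignore alternate locations are assumptions made for this example. The only external fact it relies on is the fixed-column layout of ATOM and HETATM records in the PDB format.

```python
from dataclasses import dataclass, field


@dataclass
class Residue:
    name: str        # three-letter residue or hetero-group name, e.g. "LYS" or "PLP"
    number: int      # residue number as written in the PDB file
    is_hetero: bool  # True when the residue was built from HETATM records
    atoms: dict = field(default_factory=dict)  # atom name -> (x, y, z)


@dataclass
class Chain:
    code: str
    residues: list = field(default_factory=list)


@dataclass
class Structure:
    chains: dict = field(default_factory=dict)  # chain code -> Chain, in file order


def parse_pdb(text: str) -> Structure:
    """Build a Structure from the ATOM/HETATM records of a PDB-format string."""
    structure = Structure()
    seen = {}  # (chain, number, insertion code, name) -> Residue
    for line in text.splitlines():
        record = line[:6].strip()
        if record not in ("ATOM", "HETATM"):
            continue  # header, connectivity and other records are skipped here
        atom_name = line[12:16].strip()
        res_name = line[17:20].strip()
        chain_code = line[21]
        res_number = int(line[22:26])
        insertion = line[26]
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))

        chain = structure.chains.setdefault(chain_code, Chain(chain_code))
        key = (chain_code, res_number, insertion, res_name)
        residue = seen.get(key)
        if residue is None:
            residue = Residue(res_name, res_number, record == "HETATM")
            seen[key] = residue
            chain.residues.append(residue)
        # Alternate locations are not handled: a later atom with the
        # same name simply overwrites the earlier one.
        residue.atoms[atom_name] = xyz
    return structure
```

With such a representation, a question like "which hetero groups does chain A contain" becomes a simple filter over `structure.chains["A"].residues`.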
| TOOLCHAINS |
|---|
The services provided by the iMolTalk server are organized into predefined logical sequences of mandatory user input termed toolchains (Table 1, Figure 1). During the last step of a toolchain, a result is computed and reported back to the user. Objects of type structure, chain and residue in the report are turned into active links. These links lead to characteristic pop-up menus, which, for each object, provide direct access to the results of other toolchains (Table 2, Figure 2B). Within a toolchain, one can go back and forth to change input parameters and to re-compute results (Figure 1).
| EXAMPLE ANALYSIS |
|---|
Some possible ways to analyse protein structures with iMolTalk are illustrated using the mitochondrial aspartate aminotransferase (Swiss-Prot identifier AATM_CHICK) and its corresponding structures in the open (PDB code 7AAT) and closed (PDB code 1AMA) forms (9,10). Aspartate aminotransferase exists as two isozymes: one located in the cytosol and the other in the mitochondria. The enzyme catalyses the reversible transfer of an amino group with the help of PLP (pyridoxal-5'-phosphate or vitamin B6) as a cofactor. The homo-dimer of two subunits forms the active enzyme with two independent active sites.
First, the correspondence of the residues in the protein sequence to the residues present in the structure was established (Figure 2B). Often, residues in a structure cannot be identified by their number in the sequence, and vice versa, because the numbering schemes differ. A pairwise global alignment of the two sequences can reveal such a correspondence, assuming a reasonably high homology between the two sequences. For the mitochondrial aspartate aminotransferases, the alignment showed that the structure lacks the N-terminal target sequence. The shift in numbering could be detected easily. As an example, K272, which in Swiss-Prot was annotated to bind pyridoxal phosphate, corresponds to K258 in the open-form structure. In the alignment, the sequence of the structure is coloured according to the secondary structure assignment based on STRIDE (11). Each residue in the structure sequence represents an active link to a pop-up menu. As shown for K258, the pop-up menu allows direct access to the results of the toolchains Residue contacts and Scripting editor for this residue.
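The numbering correspondence described in this paragraph can be illustrated with a small global alignment. The sketch below is not the alignment code used by iMolTalk, whose algorithm and scoring scheme are not specified here. It assumes a textbook Needleman-Wunsch alignment with identity scoring and a linear gap penalty; the function names `global_align` and `map_numbering` and the score values are hypothetical choices for this example.

```python
def global_align(seq_a, seq_b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment.

    Returns the aligned position pairs (i, j) as 0-based indices into
    seq_a and seq_b; positions aligned to a gap are left out.
    """
    n, m = len(seq_a), len(seq_b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def pair_score(i, j):
        return match if seq_a[i - 1] == seq_b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align the two residues
                score[i - 1][j] + gap,                   # gap in seq_b
                score[i][j - 1] + gap,                   # gap in seq_a
            )

    # Trace back from the bottom-right corner, preferring aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs


def map_numbering(sequence, structure_residues):
    """Map 1-based sequence positions to residue numbers in the structure.

    structure_residues is a list of (one_letter_code, pdb_residue_number).
    Only identically aligned residues are reported.
    """
    structure_sequence = "".join(code for code, _ in structure_residues)
    mapping = {}
    for i, j in global_align(sequence, structure_sequence):
        if sequence[i] == structure_sequence[j]:
            mapping[i + 1] = structure_residues[j][1]
    return mapping
```

For a structure that lacks an N-terminal segment, the leading sequence positions are simply absent from the returned mapping, and the constant offset between keys and values exposes the shift in numbering. A production tool would more likely use a substitution matrix and affine gap penalties.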
Second, the report of the toolchain Residue contacts (Figure 2C; detail in Figure 3A) showed that the cofactor (PLP258) is covalently bound to the terminal ammonium group (atom NZ) of K258. Moreover, a specific H-bond to Y70 of the other subunit (chain B) in the dimer was highlighted. This contact of the active site in chain A to Y70 of chain B might be an important functional feature of the dimer. In the closed form, the cofactor is covalently bound to the substrate in exchange for K258. The report of the toolchain Residue contacts for PLP258 in the structure of the protein in the closed conformation revealed that the terminal ammonium group of K258 now formed a hydrogen bond to the phosphate group of the cofactor–substrate complex (Figure 3B).
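A contact report of this kind can be approximated by a plain distance search between the atoms of a query residue and all other atoms. The sketch below is an assumption-laden illustration, not the server's method: the criteria iMolTalk applies to classify covalent bonds and hydrogen bonds are not given here, the 3.5 Å cutoff is an arbitrary choice, and a brute-force loop stands in for the geometric hashing mentioned in the implementation section. The function name `find_contacts` and the residue labels are hypothetical.

```python
import math


def find_contacts(query_atoms, other_residues, cutoff=3.5):
    """List atom pairs closer than `cutoff` (in angstroms).

    query_atoms:    dict of atom name -> (x, y, z) for the query residue
    other_residues: dict of residue label -> dict of atom name -> (x, y, z)
    Returns tuples (query_atom, residue_label, other_atom, distance),
    sorted by increasing distance.
    """
    contacts = []
    for label, atoms in other_residues.items():
        for other_name, other_xyz in atoms.items():
            for query_name, query_xyz in query_atoms.items():
                distance = math.dist(query_xyz, other_xyz)
                if distance <= cutoff:
                    contacts.append((query_name, label, other_name, round(distance, 2)))
    contacts.sort(key=lambda contact: contact[3])
    return contacts


if __name__ == "__main__":
    # Invented coordinates, chosen only to exercise the function.
    cofactor = {"C4A": (0.0, 0.0, 0.0), "O1P": (5.0, 0.0, 0.0)}
    neighbours = {
        "A:LYS258": {"NZ": (1.29, 0.0, 0.0)},
        "B:TYR70": {"OH": (5.0, 2.7, 0.0)},
        "A:GLY100": {"CA": (20.0, 20.0, 20.0)},
    }
    for contact in find_contacts(cofactor, neighbours):
        print(contact)
```

Very short distances between heavy atoms of different residues hint at a covalent link, while donor-acceptor pairs at somewhat longer distances are candidates for hydrogen bonds; a real classification would also need atom types and geometry.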
The Cα-distance map highlights contacts between secondary structure elements in a structure. A parallel β-sheet shows up as a diagonal line parallel to the main diagonal of the graph; an anti-parallel β-sheet is perpendicular to the main diagonal. Owing to their local contacts, α-helices appear thicker along the main diagonal. For chain A of the PDB structure 7AAT, the typical pattern of helix–helix contacts and parallel β-sheets at the C-terminal end of the sequence can be displayed (Figure 2D). In the Cα-distance map provided by iMolTalk, rows and columns represent the residues in a protein and are coloured according to their secondary structure. Rows are active links to the residue-specific pop-up menu (Figure 2B, Table 2).
| CONCLUSION |
|---|
With iMolTalk we provide a web server for protein structure analysis, e.g. to map annotation from sequence to structure or to investigate atom contacts and structural interfaces in a highly interactive manner. Results are represented in a user-friendly format and can be readily used in further analyses. As input, the server requires only a PDB identifier or a file in PDB format. The functionality of the server can be extended by user-provided scripts.
| Notes |
|---|
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated.
| REFERENCES |
|---|
1. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.
2. Stevens,R.C., Yokoyama,S. and Wilson,I.A. (2001) Global efforts in structural genomics. Science, 294, 89–92.
3. Chothia,C. and Lesk,A.M. (1986) The relation between the divergence of sequence and structure in proteins. EMBO J., 5, 823–826.
4. Baker,D. and Sali,A. (2001) Protein structure prediction and structural genomics. Science, 294, 93–96.
5. Koradi,R., Billeter,M. and Wuthrich,K. (1996) MOLMOL: a program for display and analysis of macromolecular structures. J. Mol. Graph., 14, 51–55.
6. Sayle,R.A. and Milner-White,E.J. (1995) RASMOL: biomolecular graphics for all. Trends Biochem. Sci., 20, 374.
7. Humphrey,W., Dalke,A. and Schulten,K. (1996) VMD: visual molecular dynamics. J. Mol. Graph., 14, 27–28.
8. Guex,N. and Peitsch,M.C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis, 18, 2714–2723.
9. Boeckmann,B., Bairoch,A., Apweiler,R., Blatter,M.C., Estreicher,A., Gasteiger,E., Martin,M.J., Michoud,K., O'Donovan,C., Phan,I., Pilbout,S. and Schneider,M. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res., 31, 365–370.
10. McPhalen,C.A., Vincent,M.G., Picot,D., Jansonius,J.N., Lesk,A.M. and Chothia,C. (1992) Domain closure in mitochondrial aspartate aminotransferase. J. Mol. Biol., 227, 197–213.
11. Frishman,D. and Argos,P. (1995) Knowledge-based protein secondary structure assignment. Proteins, 23, 566–579.
12. Ramachandran,G.N., Ramakrishnan,C. and Sasisekharan,V. (1963) Stereochemistry of polypeptide chain configurations. J. Mol. Biol., 7, 95–99.
13. Morris,A.L., MacArthur,M.W., Hutchinson,E.G. and Thornton,J.M. (1992) Stereochemical quality of protein structure coordinates. Proteins, 12, 345–364.
14. Richardson,J.S. (1981) The anatomy and taxonomy of protein structure. Adv. Protein Chem., 34, 167–339.
15. Stickle,D.F., Presta,L.G., Dill,K.A. and Rose,G.D. (1992) Hydrogen bonding in globular proteins. J. Mol. Biol., 226, 1143–1159.



of K258 at a distance of 1.29 Å, forming a Schiff base. (B) In the closed form (PDB code 1AMA



