
Nucleic Acids Research, 2003, Vol. 31, No. 13 3631-3634
© 2003 Oxford University Press

Prediction of lipid posttranslational modifications and localization signals from protein sequences: big-Π, NMT and PTS1

Frank Eisenhaber*, Birgit Eisenhaber, Werner Kubina, Sebastian Maurer-Stroh, Georg Neuberger, Georg Schneider and Michael Wildpaner

Research Institute of Molecular Pathology, Dr. Bohr-Gasse 7, A-1030 Vienna, Republic of Austria

*To whom correspondence should be addressed. Tel: +43 179730557; Fax: +43 17987153; Email: frank.eisenhaber@imp.univie.ac.at

Received February 11, 2003; Revised and Accepted March 27, 2003


    ABSTRACT
 
Many posttranslational modifications (N-myristoylation or glycosylphosphatidylinositol (GPI) lipid anchoring) and localization signals (the peroxisomal targeting signal PTS1) are encoded in short, partly compositionally biased regions at the N- or C-terminus of the protein sequence. These sequence signals are not well defined in terms of amino acid type preferences but they have significant interpositional correlations. Although the number of verified protein examples is small, the quantification of several physical conditions necessary for productive protein binding with the enzyme complexes executing the respective transformations can lead to predictors that recognize the signals from the amino acid sequence of queries alone. Taxon-specific prediction functions are required due to the divergent evolution of the active complexes. The big-Π tool for the prediction of the C-terminal signal for GPI lipid anchor attachment is available for metazoan, protozoan and plant sequences. The myristoyl transferase (NMT) predictor recognizes glycine N-myristoylation sites (at the N-terminus and for fragments after processing) of higher eukaryotes (including their viruses) and fungi. The PTS1 signal predictor finds proteins with a C-terminus appropriate for peroxisomal import (for metazoa and fungi). Guidelines for application of the three WWW-based predictors (http://mendel.imp.univie.ac.at/) and for the interpretation of their output are described.


    INTRODUCTION
 
For researchers who want to analyze the occurrence of a potential PTS1 signal, of a putative GPI lipid anchor attachment or myristoylation sites in their target protein sequences, this text provides application and output interpretation guidelines for the WWW-servers big-Π, NMT and PTS1. The methodology behind those tools and their validation is described in great detail elsewhere (Table 1) except for the new big-Π plant predictor (B. Eisenhaber, M. Wildpaner, C.J. Schultz, G.H.H. Borner, P. Dupree and F. Eisenhaber, manuscript submitted). In the following, we summarize aspects that are important from the user's point of view.


 
Table 1. Big-Π, NMT, PTS1: web URL, taxonomic range and prediction accuracy
 
A number of sequence motifs at the termini of proteins encode signals for targeting to cellular compartments and for posttranslational modifications. The N-terminal signal peptide responsible for export into the ER is the best known; the mitochondrial and chloroplast targeting signals are also located N-terminally. In contrast, the peroxisomal targeting signal PTS1 is C-terminal. Many posttranslational modifications are likewise attached N-terminally (N-myristoylation) or C-terminally (GPI lipid anchors, farnesylation, geranylgeranylation), to name just a few (1).

Despite the functional importance of these sequence signals, the theoretical methods for their prediction from the sequence of query proteins have received less general attention than those for studying globular domains. With the concept of homology (the assumption that a family of similar sequences originates from a common ancestor through an evolutionary process involving gene duplications and mutations), function can be assigned to globular domains (having a typical length of 100–150 amino acids) by annotation transfer from experimentally studied sequence family members (2). Unfortunately, the signals for subcellular targeting and posttranslational modification are located in relatively short (<40 amino acids), non-globular regions with typical amino acid compositional bias and interpositional correlations. Therefore, the measures for quantifying remote sequence similarity cannot be directly applied for family classification of these signals.

Even in the absence of knowledge of the active complex responsible for translocation or modification of the substrate protein, the sequence requirements for productive binding with the active protein complex can be derived from the sequence variability among experimentally verified substrate proteins. If the learning set is large, procedures of unsupervised, automated learning successfully extract complex sequence patterns [for example, in the case of SIGNALP (3), the current standard for signal peptide prediction]. The same methodology is considerably less powerful, especially for rejecting false-positive predictions, if the learning set is an order of magnitude smaller and less reliable, as is the case for the mitochondrial or chloroplast targeting signals (4,5).

If the sequence motif in the substrate protein is considered from the viewpoint of productive binding with the active complex, simple physical conditions for the rejection of non-permissive query sequences can be formulated (6,7). Typically, a core of the sequence motif with several positions of amino acid type conservation is necessary for binding in the active site of the modifying enzyme or the recognition site of the translocator. Conformational flexibility in the motif region is required to adapt to the catalytic cleft. The sequence environment of the core has to provide accessibility of the sequence signal, mechanical linkage to the remainder of the substrate protein and appropriate interaction with the aqueous or membrane surroundings of the active complex. A combined score function with profile terms (for evaluating amino acid type preferences) and physical property terms (with only non-positive scores for rejecting unsuitable queries) can successfully discriminate queries even in cases of single-residue mutations that affect modification efficiency (1,8,9).
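The structure of such a combined score function can be illustrated with a minimal Python sketch. It is not the scoring function of any of the three servers: the names `combined_score`, `toy_profile` and `small_first_residue`, the toy profile values and the penalty of -5.0 are hypothetical and chosen only to show how a profile term and non-positive physical property terms add up.

```python
from typing import Callable, Dict, Sequence


def combined_score(
    segment: str,
    profile: Sequence[Dict[str, float]],
    physical_terms: Sequence[Callable[[str], float]],
) -> float:
    """Profile score plus non-positive physical property penalties."""
    if len(segment) != len(profile):
        raise ValueError("segment and profile must have the same length")
    # Profile term: amino acid type preference at each motif position.
    # Residues missing from a column score 0.0 (arbitrary choice for this sketch).
    profile_score = sum(col.get(aa, 0.0) for aa, col in zip(segment, profile))
    # Physical property terms may only reject: each one is clamped to <= 0.
    penalty = sum(min(0.0, term(segment)) for term in physical_terms)
    return profile_score + penalty


# Hypothetical three-position profile and one hypothetical volume constraint.
toy_profile = [{"G": 2.0}, {"A": 1.0, "S": 1.0}, {"K": 0.5}]


def small_first_residue(segment: str) -> float:
    # Penalize a bulky aromatic residue at the first motif position.
    return -5.0 if segment[0] in "FWY" else 0.0


print(combined_score("GAK", toy_profile, [small_first_residue]))  # 3.5
print(combined_score("WAK", toy_profile, [small_first_residue]))  # -3.5
```

Because the physical property terms can never raise the total, a query that violates a physical requirement is pushed towards rejection regardless of how well it matches the profile.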


    BIG-Π: PREDICTION OF THE C-TERMINAL GPI LIPID ANCHOR MOTIF IN METAZOAN, PROTOZOAN AND PLANT SEQUENCES
 
Posttranslational modification with a GPI lipid anchor consists of two reactions executed by the transamidase complex in the endoplasmic reticulum: proteolytic cleavage of a C-terminal propeptide and attachment of the GPI moiety to the newly exposed carboxyl terminus (ω-site) of the polypeptide. Typically, a GPI lipid anchored protein is finally moved to the extracellular side of the plasma membrane via vesicular transport. The classical sequence pattern consists of four regions defined by the preferred pattern of physical properties of amino acid side chains (6,10). (i) The region ω-11...ω-1 is a flexible, polar linker. This stretch has been hypothesized to occupy a channel in the transamidase complex. In the structural model of the transamidase (11), access to the active site cleft of the cysteine protease PIG-K/gpi8 is regulated by the endoplasmic lumenal domain of PIG-T, a β-propeller structure with a central hole. (ii) The region ω-1...ω+2 has volume constraints and is occupied preferentially by small residues. (iii) The spacer region ω+3...ω+9 is composed of moderately polar residues. (iv) The typical hydrophobic tail begins with ω+9 or ω+10 and extends up to the C-terminal end.
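To make the four regions concrete, the following Python sketch slices a sequence around a putative ω-site using the region boundaries exactly as listed above (including the shared position ω-1). The function name `gpi_motif_regions`, the 0-based `omega` index and the `tail_offset` parameter are assumptions made for illustration; the sketch does no scoring and is not part of big-Π.

```python
def gpi_motif_regions(seq: str, omega: int, tail_offset: int = 10) -> dict:
    """Slice seq into the four motif regions around a putative omega-site.

    omega is the 0-based index of the omega-site residue; tail_offset
    (9 or 10) is the offset at which the hydrophobic tail is taken to start.
    """
    if not 0 <= omega < len(seq):
        raise ValueError("omega must index a residue of seq")
    if tail_offset not in (9, 10):
        raise ValueError("tail_offset must be 9 or 10")

    def span(first: int, last: int) -> str:
        # Inclusive offsets relative to omega, clipped to the sequence ends.
        start = max(0, omega + first)
        end = min(len(seq), omega + last + 1)
        return seq[start:end]

    return {
        "linker": span(-11, -1),          # (i) flexible, polar linker
        "small_residues": span(-1, 2),    # (ii) volume-constrained region
        "spacer": span(3, 9),             # (iii) moderately polar spacer
        "hydrophobic_tail": seq[min(len(seq), omega + tail_offset):],  # (iv)
    }


# Toy sequence (not a real protein): 11 D, then S G A, then 7 T, then 10 L.
toy = "D" * 11 + "SGA" + "T" * 7 + "L" * 10
regions = gpi_motif_regions(toy, omega=11)
print(regions["small_residues"])  # DSGA
```

Separating the regions in this way is a convenient first step when inspecting which part of a C-terminus is responsible for a negative physical property term.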

The big-Π tool (Table 1) evaluates the concordance of a query with this sequence motif. In the output, the primary and, if available, the secondary ω-sites are reported together with their sequence position, the prediction quality [strong prediction or twilight zone (8)], the score and the probability of false-positive prediction. For sequences without a GPI lipid anchor motif, the best-scoring site is nevertheless listed. In either case, a detailed description of the score components is shown, which allows the user to evaluate the agreement with the amino acid type profile and with the physical property pattern and, especially, to analyze the reasons for negative predictions. Therefore, the big-Π predictor is well suited for designing mutations aimed at abolishing GPI lipid anchoring capacity. For example, modified query sequences in which the residue at the putative ω-site is substituted by a residue with a large side chain or a less flexible backbone can be tested prior to the experiment.

A positive prediction by big-Π does not necessarily mean capacity for GPI lipid anchoring in vivo. Big-Π assesses only the concordance of the C-terminus with the GPI lipid anchor modification motif. When evaluating the prediction outcome, the presence of an ER export signal should be checked independently. One can routinely check for signal leaders (3), but alternative export signals [see, for example (12)] should also be taken into account.

Further, the function parameterization relies on the small set of known GPI lipid anchor modified proteins. A strongly negative physical property term (‘profile independent score’) can nevertheless be considered a sure sign of the absence of the GPI anchor motif because only a handful of very stably derived parameters enter this term (8). In contrast, a small profile score can also be a result of the still limited learning set with biased amino acid type preferences and, consequently, an insufficiently general profile matrix.


    NMT: PREDICTION OF N-MYRISTOYLATION OF N-TERMINAL GLYCINES FOR HIGHER EUKARYOTE, VIRAL AND FUNGAL QUERY SEQUENCES
 
N-terminal N-myristoylation is the attachment of a myristoyl anchor to an N-terminal glycine by a myristoyltransferase (NMT) for modulation of interaction of the modified protein with intracellular membranes or with other proteins. At least the N-terminal 17 residues of the substrate protein experience amino acid type variability restrictions for N-myristoylation (7). Positions 1–6 with glycine in the leading position fit the binding pocket of the NMT, positions 7–10 interact non-specifically with the NMT's surface at the mouth of the catalytic cavity, and positions 11–17 form a hydrophilic linker. Thus, in addition to the segment physically interacting with the NMT, 10–11 more residues in a linker region experience weaker sequence variability restrictions and contribute to the recognition motif.
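The three segments of the 17-residue N-terminal motif can be extracted with a short Python sketch. The function name `nmt_motif_regions` and the option `strip_initiator_met` are hypothetical; the sketch assumes that a leading methionine in the query is the initiator residue removed before modification, so that the glycine becomes position 1. It only partitions the sequence and does not score it.

```python
from typing import Dict, Optional


def nmt_motif_regions(seq: str, strip_initiator_met: bool = True) -> Optional[Dict[str, str]]:
    """Return the three motif segments, or None if the N-terminus cannot match."""
    # Assumption: a leading M is the initiator methionine and is removed first.
    if strip_initiator_met and seq.startswith("M"):
        seq = seq[1:]
    # The motif needs glycine at position 1 and at least 17 residues.
    if len(seq) < 17 or seq[0] != "G":
        return None
    return {
        "binding_pocket": seq[0:6],    # positions 1-6
        "cavity_mouth": seq[6:10],     # positions 7-10
        "linker": seq[10:17],          # positions 11-17
    }


# Toy sequence (not a real protein).
print(nmt_motif_regions("MGAAAAASSSSKKKKKKKEE"))
# {'binding_pocket': 'GAAAAA', 'cavity_mouth': 'SSSS', 'linker': 'KKKKKKK'}
```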

The NMT predictor (Table 1) scores the agreement of a query N-terminus with the N-myristoylation pattern and returns the corresponding probability of false-positive prediction (Fig. 1). We distinguish reliably predicted targets (score≥0), twilight zone predictions (0>score≥-2), and proteins that are predicted not to be NMT targets. It should also be noted that, for example in the case of viral polyproteins, internal glycines become N-terminal after protein processing and are myristoylated. Optionally, possible myristoylation at internal glycines (in typical processing patterns) may be analyzed, too (9).
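The three prediction classes follow directly from the score thresholds given above. A minimal Python sketch (the function name `classify_nmt_score` and the class labels are illustrative, not server output strings) is:

```python
def classify_nmt_score(score: float) -> str:
    """Map an NMT predictor score to the three classes described in the text."""
    if score >= 0:
        return "reliable"          # score >= 0
    if score >= -2:
        return "twilight zone"     # 0 > score >= -2
    return "not predicted"         # score < -2


for s in (1.3, 0.0, -0.5, -2.0, -2.1):
    print(s, classify_nmt_score(s))
```

Note that the twilight zone includes the lower boundary of -2 itself, as stated in the text.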



 
Figure 1. Example output of the NMT predictor. The information generated upon sequence submission is similarly structured for all three servers. As an example, the server output is presented for the yeast 26S protease regulatory subunit 4 homologue (RPT2, SWISS-PROT accession P40327). For control purposes, the complete sequence is returned first with the examined motif highlighted. After the general classification of the prediction (reliable, twilight zone, not predicted) and the overall score and probability of false-positive prediction, the components of the score function are listed. In this case, no deviation from the physical property pattern was measured although the protein is not part of the learning set. N-myristoylation of the RPT2 protein has been predicted (9) and the experimental verification reported (15).

 
The N-myristoylation signal is commonly applied to target proteins to membranes. The NMT predictor can be used for testing protein constructs with an engineered N-terminal N-myristoylation motif prior to the experiment. With the complete output of score components, the agreement with the physical property pattern can be checked in detail (for example, the suitability of the linker region), and it becomes easy to examine the effects of changes in the construct.


    PTS1: PREDICTION OF THE PTS1 PEROXISOMAL IMPORT SIGNAL FOR HIGHER EUKARYOTES AND FUNGI
 
To date, two different signals that can trigger peroxisomal import have been characterized, termed PTS1 and PTS2. PTS1, the major targeting signal, consists of the three C-terminal amino acids (mainly, but not exclusively, the canonical tripeptide S/A/C-K/R/H-L), which bind to the inner cavity of the receptor molecule Pex5, together with several residues further upstream (13) that either interact with the surface of Pex5 or serve as a short, conformationally unrestricted linker to the remainder of the protein.
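A check for the canonical tripeptide alone can be written as a short regular expression in Python. This is only a crude pre-filter under the assumption that the query is a plain one-letter amino acid string; it ignores the upstream residues and the non-canonical tripeptides mentioned above and is therefore not equivalent to the PTS1 predictor. The names `CANONICAL_PTS1` and `has_canonical_pts1` are hypothetical.

```python
import re

# Canonical C-terminal tripeptide S/A/C-K/R/H-L, anchored at the sequence end.
CANONICAL_PTS1 = re.compile(r"[SAC][KRH]L$")


def has_canonical_pts1(seq: str) -> bool:
    """True if the last three residues match the canonical PTS1 tripeptide."""
    return CANONICAL_PTS1.search(seq.strip().upper()) is not None


print(has_canonical_pts1("MAAASKL"))  # True  (ends in SKL)
print(has_canonical_pts1("MAAASKI"))  # False (last residue is not L)
```

A negative result from this check does not exclude a functional signal, because non-canonical C-termini can also be recognized.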

The concordance with this motif is searched for using an algorithm implemented in the PTS1 signal server (Table 1). Reliably predicted targets should have a non-negative total score; queries with a score larger than -10 are considered as twilight zone hits. In all other cases, the protein is predicted not to have a PTS1 signal. We must emphasize that the server analyzes exclusively the concordance of the query's C-terminus with the generalized PTS1 motif as described above. The PTS1 signal competes with other signals if contained in the sequence. Proteins with dual localizations, including for example a peroxisomal and a mitochondrial fraction (14), are known; proteins with a strong signal peptide are most probably co-translationally exported to the ER.
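The PTS1 thresholds differ from those of the NMT predictor, and the twilight zone excludes its lower boundary because the text requires a score larger than -10. A minimal Python sketch (function name and labels are illustrative only):

```python
def classify_pts1_score(score: float) -> str:
    """Map a PTS1 predictor total score to the classes described in the text."""
    if score >= 0:
        return "reliable"          # non-negative total score
    if score > -10:
        return "twilight zone"     # negative, but larger than -10
    return "not predicted"         # -10 or below


for s in (2.5, 0.0, -9.9, -10.0):
    print(s, classify_pts1_score(s))
```

Even a reliable classification refers only to the C-terminal motif; competing signals elsewhere in the sequence have to be checked separately.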


    ACKNOWLEDGEMENTS
 
The authors are grateful for generous support from Boehringer Ingelheim. This project has been partly funded by the Austrian National Bank (Österreichische Nationalbank), by the Fonds zur Förderung der wissenschaftlichen Forschung Österreichs (FWF P15037) and by the Austrian Gen-AU bioinformatics integration network sponsored by BM-BWK.


    REFERENCES
 

  1. Eisenhaber,F., Eisenhaber,B. and Maurer-Stroh,S. (2003) Prediction of post-translational modifications from amino acid sequence: problems, pitfalls, methodological hints. In Andrade,M.M. (ed.), Bioinformatics and Genomes: Current Perspectives. Horizon Scientific Press, Wymondham, pp. 81–105.

  2. Bork,P., Dandekar,T., Diaz-Lazcoz,Y., Eisenhaber,F., Huynen,M. and Yuan,Y. (1998) Predicting function: from genes to genomes and back. J. Mol. Biol., 283, 707–725.

  3. Nielsen,H., Brunak,S. and von Heijne,G. (1999) Machine learning approaches for the prediction of signal peptides and other protein sorting signals. Protein Eng., 12, 3–9.

  4. Emanuelsson,O., Nielsen,H. and von Heijne,G. (1999) ChloroP, a neural network-based method for predicting chloroplast transit peptides and their cleavage sites. Protein Sci., 8, 978–984.

  5. Emanuelsson,O., von Heijne,G. and Schneider,G. (2001) Analysis and prediction of mitochondrial targeting peptides. Methods Cell Biol., 65, 175–187.

  6. Eisenhaber,B., Bork,P. and Eisenhaber,F. (1998) Sequence properties of GPI-anchored proteins near the omega-site: constraints for the polypeptide binding site of the putative transamidase. Protein Eng., 11, 1155–1161.

  7. Maurer-Stroh,S., Eisenhaber,B. and Eisenhaber,F. (2002) N-terminal N-myristoylation of proteins: refinement of the sequence motif and its taxon-specific differences. J. Mol. Biol., 317, 523–540.

  8. Eisenhaber,B., Bork,P. and Eisenhaber,F. (1999) Prediction of potential GPI-modification sites in proprotein sequences. J. Mol. Biol., 292, 741–758.

  9. Maurer-Stroh,S., Eisenhaber,B. and Eisenhaber,F. (2002) N-terminal N-myristoylation of proteins: prediction of substrate proteins from amino acid sequence. J. Mol. Biol., 317, 541–557.

  10. Eisenhaber,B., Bork,P. and Eisenhaber,F. (2001) Post-translational GPI lipid anchor modification of proteins in kingdoms of life: analysis of protein sequence data from complete genomes. Protein Eng., 14, 17–25.

  11. Eisenhaber,B., Maurer-Stroh,S., Novatchkova,M., Schneider,G. and Eisenhaber,F. (2003) Enzymes and auxiliary factors for GPI lipid anchor biosynthesis and post-translational transfer to proteins. Bioessays, 25, 367–385.

  12. Denny,P.W., Gokool,S., Russell,D.G., Field,M.C. and Smith,D.F. (2000) Acylation-dependent protein export in Leishmania. J. Biol. Chem., 275, 11017–11025.

  13. Lametschwandtner,G., Brocard,C., Fransen,M., Van Veldhoven,P., Berger,J. and Hartig,A. (1998) The difference in recognition of terminal tripeptides as peroxisomal targeting signal 1 between yeast and human is due to different affinities of their receptor Pex5p to the cognate signal and to residues adjacent to it. J. Biol. Chem., 273, 33635–33643.

  14. Holbrook,J.D., Birdsey,G.M., Yang,Z., Bruford,M.W. and Danpure,C.J. (2000) Molecular adaptation of alanine : glyoxylate aminotransferase targeting in primates. Mol. Biol. Evol., 17, 387–400.

  15. Kimura,Y., Saeki,Y., Yokosawa,H., Polevoda,B., Sherman,F. and Hirano,H. (2003) N-terminal modifications of the 19S regulatory particle subunits of the yeast proteasome. Arch. Biochem. Biophys., 409, 341–348.

  16. Eisenhaber,B., Bork,P., Yuan,Y., Loffler,G. and Eisenhaber,F. (2000) Automated annotation of GPI anchor sites: case study C. elegans. Trends Biochem. Sci., 25, 340–341.

  17. Cserzo,M., Eisenhaber,F., Eisenhaber,B. and Simon,I. (2002) On filtering false positive transmembrane protein predictions. Protein Eng., 15, 745–752.

  18. Altschul,S.F., Boguski,M.S., Gish,W. and Wootton,J.C. (1994) Issues in searching molecular sequence databases. Nature Genet., 6, 119–129.

  19. Neuberger,G., Maurer-Stroh,S., Eisenhaber,B., Hartig,A. and Eisenhaber,F. (2003) Motif refinement of the peroxisomal targeting signal 1 and evaluation of taxon-specific differences. J. Mol. Biol., 328, 567–579.

  20. Neuberger,G., Maurer-Stroh,S., Eisenhaber,B., Hartig,A. and Eisenhaber,F. (2003) Prediction of peroxisomal targeting signal 1 containing proteins from amino acid sequence. J. Mol. Biol., 328, 581–592.

