
Nucleic Acids Research, 2000, Vol. 28, No. 1 323-325
© 2000 Oxford University Press

MEROPS: the peptidase database

Neil D. Rawlings* and Alan J. Barrett

MRC Molecular Enzymology Laboratory, The Babraham Institute, Babraham, Cambridgeshire CB2 4AT, UK

Received September 29, 1999; Revised and Accepted October 8, 1999.


    ABSTRACT
Important additions have been made to the MEROPS database (http://www.bi.bbsrc.ac.uk/Merops/Merops.htm). These include sequence alignments and cladograms for many of the peptidase families, which have proved very helpful in the difficult task of distinguishing peptidase sequences that are simply species variants of already known enzymes from those that represent novel enzymes.


    INTRODUCTION
The MEROPS database (http://www.bi.bbsrc.ac.uk/Merops/Merops.htm) provides a catalogue and a structure-based classification of peptidases (i.e. all proteolytic enzymes) (1). This is a large group of proteins (~2% of all gene products) that is of particular importance in medicine and biotechnology. An index of the peptidases by name or synonym (http://www.bi.bbsrc.ac.uk/Merops/indexes/pepidx.htm) gives access to a set of files termed PepCards (e.g. caspase-1; http://www.bi.bbsrc.ac.uk/Merops/pepcards/c14p001.htm) each of which provides information on classification and nomenclature for a single peptidase. Also provided are an interface to the relevant entries in databases for human genetics, protein and nucleic acid sequence data and tertiary structure data, and if the tertiary structure of the enzyme has been determined, a Richardson diagram (http://www.bi.bbsrc.ac.uk/Merops/images/1ice.htm) (2) is shown. Another index provides access to the PepCards by organism name (http://www.bi.bbsrc.ac.uk/Merops/indexes/pepidx.htm) so that the user can retrieve all known peptidases from a particular species. The peptidases are classified into families on the basis of statistically significant similarities between the protein sequences in the part termed the ‘peptidase unit’ that is most directly responsible for activity. Families that are thought to have common evolutionary origins because they have similar tertiary folds are grouped into clans. The MEROPS database provides sets of pages called FamCards (e.g. C14; http://www.bi.bbsrc.ac.uk/Merops/famcards/c14.htm) and ClanCards (e.g. CD; http://www.bi.bbsrc.ac.uk/Merops/clancards/cd.htm) describing the individual families and clans. Each FamCard page provides links to other databases of sequence motifs and secondary and tertiary structures, and shows the distribution of the family across the major kingdoms of living creatures.

Among the recent developments in the MEROPS database have been links to the Drosophila genome database FlyBase (3) and to TrEMBLNew, the updates to the TrEMBL database (4). Biomedical, biotechnological and physiological information has been incorporated into the PepCards. We have also included amino acid sequence alignments of the peptidase units for many families and subfamilies, and cladograms that give a graphical representation of the degrees of similarity of the sequences. The way in which we have been able to use these to recognise homologous but different peptidases within a single organism, and forms of the same peptidase in different organisms, is the main topic of the present paper.


    SEQUENCE ALIGNMENTS
Many peptidases are complex proteins in which the part of the molecule most directly responsible for peptide bond hydrolysis [which we term the ‘peptidase unit’ (1)] is linked at one or both ends to domains with other functions. Generally, the peptidase unit is a contiguous section of the sequence and contains about 200 amino acids. The peptidase unit is identified by reference to crystallographic structures if they are available. It cannot be larger than the smallest active peptidase in the family, and may be smaller. When the smallest active peptidase is a mosaic protein, we restrict the peptidase unit still further by excluding domains that show similarity to domains in proteins that are not peptidases.

Once the peptidase unit has been determined for the ‘type example’ (1) of the family, then the limits of the peptidase unit for all members of the family can be determined with a FastA search (5). A library of all the full-length sequences of active peptidases in the family is compiled from the SWISS-PROT, TrEMBL and PIR protein sequence databases, and this is searched with the peptidase unit of the type example. The limits of the peptidase unit of each hit returned from the pairwise alignments in the FastA results file are calculated. If the alignment does not include the N- and C-termini of the type example, e.g. if the hit is a fragment of a sequence, the sequence is usually discarded. When, exceptionally, the hit has an insert longer than that permitted in the FastA alignment algorithm, a manual comparison is made and the limits of the peptidase unit are calculated by hand. The protein sequences of all the peptidase units identified in this way are extracted and concatenated into a file which is read directly into the ClustalX program (6) and aligned using the default parameters. The alignment is checked visually to ensure that known catalytic residues are correctly aligned, and any sequences that cannot be aligned are discarded. Finally, the ClustalX alignment is converted into an HTML file.
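The limit calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for MEROPS: the `Hit` record, its field names and the `tolerance` parameter are hypothetical, and the coordinates are taken to be the 1-based start and end positions reported for each pairwise alignment.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Hit:
    """One pairwise alignment from a search result (hypothetical record layout)."""
    name: str
    query_start: int    # first aligned residue of the type-example peptidase unit (1-based)
    query_end: int      # last aligned residue of the type-example peptidase unit
    subject_start: int  # first aligned residue of the library sequence
    subject_end: int    # last aligned residue of the library sequence


def peptidase_unit_limits(hit: Hit, query_length: int,
                          tolerance: int = 0) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the peptidase unit in the hit, or None to discard it.

    A hit is kept only if the alignment reaches both termini of the query,
    allowing `tolerance` unaligned residues at each end (an assumed parameter).
    """
    covers_n_terminus = hit.query_start - 1 <= tolerance
    covers_c_terminus = query_length - hit.query_end <= tolerance
    if not (covers_n_terminus and covers_c_terminus):
        return None
    return hit.subject_start, hit.subject_end


def extract_unit(sequence: str, limits: Tuple[int, int]) -> str:
    """Cut the peptidase unit out of a full-length sequence (1-based, inclusive limits)."""
    start, end = limits
    return sequence[start - 1:end]
```

A hit for which `peptidase_unit_limits` returns `None` corresponds to the fragments that are usually discarded; hits with very long inserts would still need the manual comparison described in the text.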

The value of an alignment is enhanced by annotation. The method we have used to annotate alignments automatically is to collect data that can be related directly to the sequence of the type example of the family or subfamily, and to use a combination of symbols and color highlighting on the alignment to show conservation of these structural features. An example of an annotated alignment is that for the peptidase family M8 (http://www.bi.bbsrc.ac.uk/Merops/aln/m08_fra.htm) of leishmanolysin. An amino acid residue that is identical with that in the type example (Leishmania major leishmanolysin) is shown in purple and in bold. In keeping with the MSF format of the GCG package (7), gaps are shown as dots, and the sequences are displayed in blocks of 60 amino acids. Each block is preceded by two lines of residue numbering and up to three lines for the sequence-specific annotation symbols, and followed by three lines of consensus sequence. The alignment is numbered according to the complete sequence of the type example, and letters indicate insertions relative to the type example. The color-coded annotation symbols are used consistently in all the alignments, and the full list is shown in Table 1. At the end of each alignment there is a key to the symbols. The first consensus line shows residues that occur in more than half of the sequences (or an ‘x’ if no single amino acid predominates). The second line shows the type of amino acid that occurs at each position by use of the SMART groupings and symbols (8). The third line shows an amino acid that is absent from that position in the alignment but is a member of the same SMART group depicted in the second line.
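The rule for the first consensus line, and the display in blocks of 60, can be illustrated with a short Python sketch. The function names are hypothetical, and one detail is an assumption not stated above: gaps (dots) are not counted as residues, but the "more than half" threshold is taken over all sequences in the alignment.

```python
from collections import Counter
from typing import List


def consensus_line(aligned: List[str], gap: str = ".") -> str:
    """First consensus line: the residue found in more than half of the sequences, else 'x'."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    out = []
    for column in zip(*aligned):
        # Count residues only; gap symbols never become the consensus.
        counts = Counter(res for res in column if res != gap)
        if counts:
            residue, n = counts.most_common(1)[0]
        else:
            residue, n = "x", 0
        out.append(residue if n > len(aligned) / 2 else "x")
    return "".join(out)


def blocks(line: str, width: int = 60) -> List[str]:
    """Split one alignment row into display blocks of `width` symbols."""
    return [line[i:i + width] for i in range(0, len(line), width)]
```

For the toy rows `"MKV.A"`, `"MKI.A"` and `"MRL.G"` (invented for illustration), `consensus_line` gives `"MKxxA"`: the third column has no majority residue and the fourth contains only gaps.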


Table 1. Features that are annotated in the alignments and the symbols used
 

    CLADOGRAMS
Cladograms are calculated from the amino acid sequence alignments. A difference matrix is calculated and converted to accepted point mutations (PAMs) according to the table of Dayhoff et al. (9). A single tree is calculated from this matrix with the Kitsch program of the Phylip package (10); this uses the algorithm of Fitch and Margoliash (11) with contemporary tips, with the assumption that mutation rate is constant throughout the family. The tree is displayed with the branches horizontal, the ‘root’ on the left and the tips on the right. (Trees generated by the Kitsch algorithm are unrooted, and the ‘root’ is the mid-point of the most ancient divergence.) We rearrange the branches so that the longest are towards the bottom of the image.
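The first step, building a pairwise distance matrix from the alignment, might look like the following Python sketch. Two things here are assumptions rather than the method described above: positions where either sequence has a gap are skipped, and the conversion to PAM units uses Kimura's empirical formula, d = −100 ln(1 − p − 0.2p²), as a stand-in for the lookup table of Dayhoff et al. (9), which is not reproduced here.

```python
import math
from typing import List


def fraction_different(a: str, b: str, gap: str = ".") -> float:
    """Fraction of compared positions at which two aligned sequences differ.

    Positions where either sequence has a gap are skipped (an assumption).
    """
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        raise ValueError("no residues in common to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def kimura_pam(p: float) -> float:
    """Approximate PAM distance from the fraction of differing sites.

    Kimura's empirical formula, used here in place of the Dayhoff table.
    """
    inner = 1.0 - p - 0.2 * p * p
    if inner <= 0.0:
        raise ValueError("sequences too divergent for this correction")
    return -100.0 * math.log(inner)


def distance_matrix(aligned: List[str]) -> List[List[float]]:
    """Symmetric matrix of approximate PAM distances between all pairs of rows."""
    n = len(aligned)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = kimura_pam(fraction_different(aligned[i], aligned[j]))
            matrix[i][j] = matrix[j][i] = d
    return matrix
```

A matrix of this kind is the input that a distance-based tree program takes; the correction matters most for distant pairs, where the observed fraction of differences underestimates the number of substitutions.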

The cladograms that are included in MEROPS are intended to show how similar sequences cluster together, and not to give an accurate depiction of evolution in the family, so no bootstrapping is done.

Both alignments and cladograms are presented within frames in the HTML pages of the database; each is shown in a frame that covers the top half of the screen, and the frame below contains the key. For a given family the order of sequences in the alignment is adjusted to match the order in the tree. Each tip (sequence) on the tree is numbered, and an asterisk marks the type example of the family. The key indicates the peptidase name, the scientific binomial of the species it is derived from, the SWISS-PROT or TrEMBL accession number and the MEROPS identifier. If the sequence has not been assigned to a MEROPS identifier, then it is described as an ‘other peptidase’ and an alternative name is given in parentheses. If the peptidase family is divided into subfamilies (see below), then the key is also divided into subfamilies.

The maximum number of sequences that are included in an alignment or a tree is 100 so as to make them easy to read. If the family contains more than 100 sequences, then the alignments and trees are drawn for separate subfamilies where possible.


    SUBFAMILIES
Criteria we have used to recognise subfamilies within a peptidase family were described previously (1), and the method of use of cladograms for this purpose is illustrated by the tree for peptidase family M10 (the interstitial collagenase family; http://www.bi.bbsrc.ac.uk/Merops/trees/m10_frt.htm). The family can be divided into three subfamilies, with tips 1–69 in subfamily M10A (the matrixin subfamily), tips 70–80 in subfamily M10B (the serralysin subfamily) and tips 81–82 in subfamily M10C (the fragilysin subfamily).


    DISTINCTION OF KNOWN FROM NOVEL PEPTIDASES
As a result of the rapid progress of the nucleotide sequencing projects we are aware of the deduced amino acid sequences of many members of the known families of peptidases for which there are no biochemical data. For each of these putative peptidases a decision has to be made whether to assign it to the MEROPS identifier of an already known peptidase, with an existing PepCard, and with the implication that its properties and functions are similar, or to treat it as a novel entity. The cladograms and sequence alignments provide valuable clues in making such decisions. Criteria we have used for a single peptidase were described previously (1).

Cladograms are used to give a preliminary indication of which sequences are species variants of a known peptidase. The tree for peptidase family A1 (the pepsin family; http://www.bi.bbsrc.ac.uk/Merops/trees/a01_frt.htm) will be used as an example of a family in which it has been difficult to identify the sequences that represent species variants of known enzymes. One problem is that species variants have sometimes been given different names simply because of their different origins; another is that some organisms contain multiple duplicated genes encoding very similar enzymes.

As an example of easy identification of orthologues, tips 1–11 have been assigned to the MEROPS identifier for cathepsin D (A01.009). The branching pattern matches that expected for orthologues, the most ancient divergences being in nematodes and trematodes, then in insects and vertebrates.

Aspartic endopeptidases from fungi have been assumed in the past to represent a variety of different enzymes, and the tree supports this view in many cases. Fungal vacuolar peptidases (tips 13–16) appear to be the fungal equivalents of cathepsin D, a lysosomal enzyme in animals, whereas aspergillopepsins I (tips 62–71), rhizopuspepsins (tips 72–77), mucorpepsins (tips 78–79) and candidapepsins (tips 82–90) are unlike known animal or plant peptidases. In several of these broad groups genes are known to have proliferated in some species (notably Candida albicans and Rhizopus niveus), and when biochemical evidence exists to show that these represent distinct enzymes we have maintained this distinction. This explains why penicillopepsin, for example, appears within the aspergillopepsin I branch. Presumably, accumulation of point mutations has led to a change in substrate specificity subsequent to the divergence of the Aspergillus and Penicillium genera. The distinction is that penicillopepsin can cleave κ-casein and thus clot milk (12) whereas aspergillopepsin I cannot (13).

There are many sequences that apparently represent novel enzymes but which have not yet been characterized biochemically. This is most notable for the nematode sequences in tips 50–56.

Examples of mutations that change peptidase properties can be seen from the cladogram and alignment for cysteine endopeptidases of family C1 (the papain family). The cladogram for family C1 (http://www.bi.bbsrc.ac.uk/Merops/trees/c01a_frt.htm) shows that four peptidase sequences from papaya (Carica papaya) form a branch distinct from those of other plants. These are papain, chymopapain, caricain and glycyl endopeptidase, but the cladogram gives no clue to their enzymology. The sequence alignment for plant members of family C1 (http://www.bi.bbsrc.ac.uk/Merops/aln/c01a_fra.htm) is more informative if inspected in the light of the known structure/function relationships in the family, in particular the residues that form the specificity sites. Thus, Gly156 and Gly198 are both in the S1 specificity pocket of papain and are highly conserved except in glycyl endopeptidase where they are substituted with Glu and Arg, respectively. Enzymological data show that papain, chymopapain and caricain are all similar in specificity to the animal lysosomal endopeptidase cathepsin L, although they differ in details of the cleavage of the insulin B chain. Glycyl endopeptidase, however, has a very restricted specificity, able only to cleave glycyl bonds because of the bulky replacements in the S1 site (14). In the kiwi fruit (Actinidia deliciosa) actinidain is similar in properties to papain but is less efficient in cleaving substrates in which the P2 residue is aromatic, and the explanation for this is that Ser338 has been substituted by Met, making the S2 subsite shorter (15), and again this is apparent from the alignment.
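Because the alignments are numbered according to the sequence of the type example, inspecting a specificity-site residue amounts to finding the alignment column that holds that residue and reading down it. A minimal Python sketch of this lookup is given below; the function names and the `first_number` parameter (the number of the first type-example residue present in the alignment) are hypothetical, and the sequences in the usage note are invented, not real papain-family data.

```python
from typing import Dict


def column_for_residue(type_row: str, residue_number: int,
                       first_number: int = 1, gap: str = ".") -> int:
    """Return the 0-based alignment column holding a given residue of the type example."""
    number = first_number - 1
    for column, symbol in enumerate(type_row):
        if symbol != gap:
            # Only residues of the type example advance the numbering; gaps do not.
            number += 1
            if number == residue_number:
                return column
    raise ValueError(f"residue {residue_number} is not in the aligned type example")


def residues_at(alignment: Dict[str, str], type_name: str,
                residue_number: int, first_number: int = 1) -> Dict[str, str]:
    """Report the symbol each aligned sequence has at a type-example residue position."""
    column = column_for_residue(alignment[type_name], residue_number, first_number)
    return {name: row[column] for name, row in alignment.items()}
```

With the toy alignment `{"type": "AC.GT", "variant": "ACDET"}` and `first_number=10`, residue 12 of the type example is the `G` in the fourth column, and `residues_at` reports `E` for the variant at that position, the kind of substitution that the text describes for the S1 pocket of glycyl endopeptidase.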


    FOOTNOTES
 
* To whom correspondence should be addressed. Tel: +44 1223 496649; Fax: +44 1223 496023; Email: neil.rawlings@bbsrc.ac.uk


    REFERENCES

    1 Rawlings,N.D. and Barrett,A.J. (1999) Nucleic Acids Res., 27, 325–331.

    2 Richardson,J.S. (1985) Methods Enzymol., 115, 359–380.

    3 The FlyBase Consortium (1999) Nucleic Acids Res., 27, 85–88.

    4 Bairoch,A. and Apweiler,R. (1998) Nucleic Acids Res., 26, 38–42. Updated article in this issue: Nucleic Acids Res. (2000), 28, 45–48.

    5 Pearson,W.R. and Lipman,D.J. (1988) Proc. Natl Acad. Sci. USA, 85, 2444–2448.

    6 Thompson,J.D., Gibson,T.J., Plewniak,F., Jeanmougin,F. and Higgins,D.G. (1997) Nucleic Acids Res., 25, 4876–4882.

    7 Genetics Computer Group (1994) Program Manual for the Wisconsin Package, Version 8, September 1994. University of Madison, Wisconsin.

    8 Ponting,C.P., Schultz,J., Milpetz,F. and Bork,P. (1999) Nucleic Acids Res., 27, 229–232. Updated article in this issue: Nucleic Acids Res. (2000), 28, 231–234.

    9 Dayhoff,M.O., Schwartz,R.M. and Orcutt,B.C. (1978) In Dayhoff,M.O. (ed.), Atlas of Protein Sequence and Structure 1978. National Biomedical Research Foundation, Washington D.C., pp. 353–358.

    10 Felsenstein,J. (1989) Cladistics, 5, 164–166.

    11 Fitch,W.M. and Margoliash,E. (1967) Science, 155, 281–284.

    12 Hofmann,T. (1998) In Barrett,A.J., Rawlings,N.D. and Woessner,J.F. (eds), Handbook of Proteolytic Enzymes. Academic Press, London, pp. 878–883.

    13 Ichishima,E. (1998) In Barrett,A.J., Rawlings,N.D. and Woessner,J.F. (eds), Handbook of Proteolytic Enzymes. Academic Press, London, pp. 872–878.

    14 O’Hara,B.P., Hemmings,A.M., Buttle,D.J. and Pearl,L.H. (1995) Biochemistry, 34, 13190–13195.

    15 Watts,A.B. and Brocklehurst,K. (1998) In Barrett,A.J., Rawlings,N.D. and Woessner,J.F. (eds), Handbook of Proteolytic Enzymes. Academic Press, London, pp. 573–576.



