Nucleic Acids Research Advance Access published online on May 7, 2008
Nucleic Acids Research, doi:10.1093/nar/gkn202
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
NetMHC-3.0: accurate web accessible predictions of human, mouse and monkey MHC class I affinities for peptides of length 8–11
Claus Lundegaard1,*,
Kasper Lamberth2,
Mikkel Harndahl2,
Søren Buus2,
Ole Lund1 and
Morten Nielsen1
1CBS, Department of Systems Biology, Technical University of Denmark DTU, Kemitorvet Build. 208, 2800 Lyngby and 2Department of International Health, Immunology and Microbiology, University of Copenhagen, Panum Institute 22.3.6, Blegdamsvej 18, 2200 Copenhagen N, Denmark
*To whom correspondence should be addressed. Tel: +45 21900767; Fax: +45 45931585; Email: lunde{at}cbc.dtu.dk
Received January 31, 2008. Revised March 27, 2008. Accepted April 4, 2008.
ABSTRACT
NetMHC-3.0 is trained on a large number of quantitative peptide
data using both affinity data from the Immune Epitope Database
and Analysis Resource (IEDB) and elution data from SYFPEITHI.
The method generates high-accuracy predictions of major histocompatibility
complex (MHC): peptide binding. The predictions are based on
artificial neural networks trained on data from 55 MHC alleles
(43 human and 12 non-human), and position-specific scoring matrices
(PSSMs) for an additional 67 HLA alleles. As only the MHC class
I prediction server is available, predictions are possible for
peptides of length 8–11 for all 122 alleles. Artificial
neural network predictions are given as actual IC50 values, whereas
PSSM predictions are given as log-odds likelihood scores.
The output is optionally available as download for easy post-processing.
The training method underlying the server has been benchmarked as
the best among available methods, and has been used to predict
possible MHC-binding peptides in a series of pathogenic viral
proteomes, including SARS, Influenza and HIV, resulting in an
average of 75–80% confirmed MHC binders. Here, the performance is
further validated and benchmarked using a large set of newly
published affinity data that is non-redundant with the training
set. The server is free to use and available at:
http://www.cbs.dtu.dk/services/NetMHC.
INTRODUCTION
Intracellular infections with pathogens such as viruses and
certain bacteria are defeated by cytotoxic T lymphocytes (CTL).
The CTL T-cell receptor (TCR) recognizes foreign peptides in
complex with major histocompatibility complex (MHC) class I
molecules on the surface of the infected cells. MHC class I
molecules preferably bind and present peptides nine amino acids long,
which mainly originate from proteins expressed in the cytosol
of the presenting cell. In most vertebrates, MHCs exist in a
number of different allelic variants, each of which binds a specific
and very limited set of peptides. For a number of years, prediction
methods have been developed to identify which peptides will bind
a given MHC (1), and such predictions can be highly valuable
in a broad range of applications, including rational vaccine
design and disease diagnostics. The artificial neural network
(ANN) training method behind NetMHC (2, 3) has been benchmarked
as the best among available methods (4). Preliminary versions
of the algorithm have been used to predict possible MHC-binding
peptides in a large set of pathogenic viral proteomes, resulting
in an average of >75% confirmed MHC binders (5). Most MHC
prediction algorithms (a list of other servers is included in
the Supplementary Material) are trained on peptides of the same
length as those they predict. Because data for peptide lengths
other than nine are much scarcer, the breadth of MHC-binding
predictions for these lengths is correspondingly limited.
This server, however, implements a method that makes it possible
to predict 8-, 10- and 11-mer peptide binding using predictors
trained on 9-mers, which significantly extends the MHC coverage
for these peptide lengths compared to other available
MHC:peptide-binding servers.
METHODS
The server is trained on the largest set of quantitative
peptide:MHC affinity measurements published to date, using
affinity data from the Immune Epitope Database and Analysis
Resource (IEDB) (6), eluted peptide data from the SYFPEITHI
database (7) and proprietary affinity data. The ANN-based
predictors were trained essentially as described in (3) on
data from 55 MHC alleles (43 human and 12 non-human), and the
predictors based on position-specific scoring matrices (PSSMs)
were trained as described in (2) for an additional 67 HLA alleles.
A large number of 9-mer MHC affinity data have become available
from the IEDB since the ANNs used in NetMHC-3.0 were trained,
and all peptides not used in the training (6452
9-mer peptide affinity data points, covering 32 HLA alleles)
were used to evaluate the server performance. These data
are available at the server. In this dataset, 3104 peptides were
measured to be binders (IC50 < 500 nM), and 76% of these were
correctly predicted as such. Conversely, 3030 peptides were
predicted to bind to a given HLA, and 78% of these had a measured
IC50 < 500 nM. The average Pearson correlation coefficient (PCC)
and area under the ROC curve (AUC), using a 500 nM classification
threshold, were 0.71 and 0.86, respectively. For the full
per-allele results, see the Supplementary Material
(Supplementary Table 1 and Supplementary Figure 1).
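As a minimal sketch of how figures of this kind can be computed, the Python functions below derive the fraction of measured binders that are correctly predicted, the fraction of predicted binders that are confirmed, the PCC and the AUC from paired measured and predicted IC50 values. The function names and the six toy data points are illustrative assumptions, not the server's evaluation code or data. Whether the PCC is taken on raw or transformed affinities is not stated in the text, so the scale is left to the caller.

```python
import math

THRESHOLD_NM = 500.0  # binder threshold used in the text


def binder_fractions(measured, predicted, threshold=THRESHOLD_NM):
    """Return (fraction of measured binders predicted as binders,
    fraction of predicted binders that are measured binders)."""
    pairs = list(zip(measured, predicted))
    both = sum(1 for m, p in pairs if m < threshold and p < threshold)
    n_measured = sum(1 for m, _ in pairs if m < threshold)
    n_predicted = sum(1 for _, p in pairs if p < threshold)
    return both / n_measured, both / n_predicted


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def auc(measured, predicted, threshold=THRESHOLD_NM):
    """Probability that a measured binder receives a lower predicted IC50
    than a measured non-binder (ties count as one half)."""
    binders = [p for m, p in zip(measured, predicted) if m < threshold]
    others = [p for m, p in zip(measured, predicted) if m >= threshold]
    wins = sum(
        1.0 if b < o else 0.5 if b == o else 0.0
        for b in binders
        for o in others
    )
    return wins / (len(binders) * len(others))


# Made-up IC50 values in nM, for illustration only.
toy_measured = [20, 100, 400, 800, 5000, 30000]
toy_predicted = [35, 700, 300, 450, 9000, 20000]
```

Calling `binder_fractions(toy_measured, toy_predicted)` and `auc(toy_measured, toy_predicted)` on the toy data exercises the same 500 nM threshold that the evaluation above uses.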
NetMHC-3.0 uses a new approximation algorithm that reliably
predicts the affinity of peptides of lengths 8, 10 and 11, for
which affinity data for training are rare (8). The method uses
predictors trained on peptides of length 9 to extrapolate
to other lengths. In short, the method approximates each peptide
by a number of 9-mers, obtained by inserting X (for 8-mers)
or deleting amino acid(s) (for 10- and 11-mers), and sets the
final prediction to an average of the 9-mer predictions. We
had previously trained ANN predictors directly on 10-mer affinity
data, and since that training more than 2000 10-mer peptide:MHC
affinities have become available from the IEDB (6).
AUC values were calculated for each
allele using either the ANNs trained on 10-mers or the approximation
method. For 12 of the 16 alleles, the approximation method performed
better than the 10-mer trained ANNs (P < 0.01); see
Supplementary Material Figure 2.
For the remaining four HLA alleles, however, the ANNs trained on
10-mer peptides performed better, and for these alleles the 10-mer
trained ANNs are used for predictions by the server. For 8-mers,
2002 affinity data points were extracted, covering 35 MHC alleles. The
overall PCC and AUC were 0.68 and 0.86, respectively. For 8-mer
per-allele performance, see
Supplementary Material Figure 4.
For 8-mers, predictors trained on actual 8-mers seem to be
better than the approximation method otherwise used, so for
the alleles with available 8-mer affinity data, 8-mer trained
ANNs are used for the predictions. In general, it is not possible
to estimate how reliable a single prediction is. However, the
stronger the predicted affinity, the higher the chance
that the actual affinity is stronger than the generally accepted
binding threshold of 500 nM.
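The approximation step described above (inserting X into 8-mers, deleting residues from 10- and 11-mers, then averaging the 9-mer predictions) can be sketched in Python as follows. This is an illustration under stated assumptions, not the published implementation (8): it treats every position as eligible for insertion or deletion and uses a plain arithmetic mean, neither of which is specified in the text. The names `nine_mer_variants`, `approximate_prediction` and `toy_predictor` are hypothetical.

```python
from itertools import combinations
from statistics import mean


def nine_mer_variants(peptide):
    """Return the 9-mers used to approximate a peptide of length 8-11."""
    n = len(peptide)
    if n == 9:
        return [peptide]
    if n == 8:
        # Insert the wildcard X at each of the 9 possible positions
        # (assumption: all positions are eligible).
        return [peptide[:i] + "X" + peptide[i:] for i in range(9)]
    if n in (10, 11):
        # Delete every combination of n - 9 residues
        # (assumption: all positions are eligible).
        return [
            "".join(aa for j, aa in enumerate(peptide) if j not in dropped)
            for dropped in combinations(range(n), n - 9)
        ]
    raise ValueError("peptide length must be between 8 and 11")


def approximate_prediction(peptide, predict_9mer):
    """Average a 9-mer predictor's output over all 9-mer variants
    (assumption: arithmetic mean of the raw predictor output)."""
    return mean(predict_9mer(v) for v in nine_mer_variants(peptide))


def toy_predictor(nine_mer):
    """Hypothetical stand-in for a trained 9-mer predictor."""
    return nine_mer.count("L") / 9
```

With this scheme an 8-mer yields nine 9-mers, a 10-mer ten and an 11-mer fifty-five, so a real 9-mer predictor would be called that many times per peptide.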
SERVER
NetMHC-3.0 predicts the binding affinity of either a list of
peptides with a defined length (8–11 residues) or all
possible sub-peptides contained within full-length proteins. The
input must be either in FASTA format or a list of peptides of equal
length, one peptide per line. The server accepts a maximum
of 5000 sequences per submission; each sequence must be no longer
than 20 000 amino acids and no shorter than the
selected prediction length (see below). Input data
can be pasted into a text field or uploaded from a local file
on the user's computer.
If the input is in peptide format, the corresponding tick-box must be selected. One or more MHCs must be selected, as well as the desired peptide length. Only one prediction length can be used at a time. The output can optionally be sorted by predicted affinity by selecting a tick-box. The predictions start when the Submit button is clicked. An example input in FASTA format is shown in Figure 1.
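To make the input constraints concrete, the following Python sketch parses FASTA text, checks the stated limits (at most 5000 sequences, each between the selected prediction length and 20 000 residues) and enumerates all sub-peptides of the chosen length with positions numbered from 0, as in the server output. It is a local illustration only, not the server's own parser, and the function names are hypothetical.

```python
MAX_SEQUENCES = 5000
MAX_RESIDUES = 20_000
ALLOWED_LENGTHS = (8, 9, 10, 11)


def parse_fasta(text):
    """Return a list of (name, sequence) tuples from FASTA-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:].strip(), ""])
        elif records:
            records[-1][1] += line.upper()
        else:
            raise ValueError("sequence data found before the first header")
    return [(name, seq) for name, seq in records]


def check_limits(records, length):
    """Raise ValueError if the input breaks the limits stated in the text."""
    if length not in ALLOWED_LENGTHS:
        raise ValueError("prediction length must be 8, 9, 10 or 11")
    if len(records) > MAX_SEQUENCES:
        raise ValueError("more than 5000 sequences submitted")
    for name, seq in records:
        if len(seq) < length or len(seq) > MAX_RESIDUES:
            raise ValueError(f"{name}: sequence length {len(seq)} out of range")


def sub_peptides(sequence, length):
    """Return (position, peptide) pairs, with positions starting at 0."""
    return [
        (pos, sequence[pos:pos + length])
        for pos in range(len(sequence) - length + 1)
    ]
```

A protein of N residues yields N - length + 1 sub-peptides, so a single 20 000-residue sequence produces close to 20 000 predictions per allele.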
The output is displayed as raw text with a header indicating
the server name, the type of prediction (PSSM, ANN or ANN-approximation),
the first selected allele and the date (Figure 2), followed by
the prediction output in a column format. The columns are named
in the first line of the prediction output. The first column
[pos] is the position of the first amino acid of the predicted
peptide within the possibly longer sequence, with numbering
starting at 0. Column [peptide] is the primary sequence of the
(sub-)peptide. Column [logscore] is the raw prediction output,
which for ANNs is 1 - log(aff)/log(50 000), where aff is the
affinity in nanomolar units. For PSSM predictions
the raw prediction score is a log-odds likelihood score. For ANN
predictions, an additional column, [affinity (nM)], gives
the predicted affinity in nanomolar units. Column
[Bind Level] indicates whether the peptide is predicted to bind
more strongly than a certain threshold. For ANN predictions,
strong binders (SB) have a predicted affinity stronger than
50 nM and weak binders (WB) one stronger than 500 nM. For PSSM
predictions, strong binders (SB) have a prediction score above
the 0.1% percentile score of 1 000 000 random natural peptides,
and weak binders (WB) a score above the 1% percentile
score of the same peptides. Peptides with predicted
affinities weaker than 500 nM, or with scores below the 1%
percentile score, carry no indication. Column [Protein Name]
gives the name of the predicted protein. If peptide input was
used, the name will always be Sequence. Column [Allele] gives
the name of the MHC allele chosen. The output contains all the
sub-peptides for each protein for a given allele, either in the
order they appear in the sequence or, if chosen, sorted by
predicted affinity within each protein. If more than one protein
sequence was entered, a dashed line separates the peptides from
each protein. If more than one allele was chosen, a header
similar to the first follows immediately after the first set of
predictions, with all output on the same web page.
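The relation between the ANN logscore and the affinity, and the ANN binding-level labels, can be written out as a short Python sketch. The formula and the 50 nM and 500 nM thresholds are those given above; the function names are illustrative, and returning an empty string for peptides without an indication is an assumption about representation, not the server's output format.

```python
import math

MAX_AFFINITY_NM = 50_000.0  # base of the logarithm in the ANN logscore


def logscore_from_affinity(affinity_nm):
    """ANN raw score: 1 - log(aff) / log(50 000), with aff in nM."""
    return 1.0 - math.log(affinity_nm) / math.log(MAX_AFFINITY_NM)


def affinity_from_logscore(logscore):
    """Inverse transform: aff = 50 000 ** (1 - logscore), in nM."""
    return MAX_AFFINITY_NM ** (1.0 - logscore)


def ann_bind_level(affinity_nm):
    """Return 'SB' below 50 nM, 'WB' below 500 nM, otherwise ''."""
    if affinity_nm < 50.0:
        return "SB"
    if affinity_nm < 500.0:
        return "WB"
    return ""
```

On this scale an affinity of 50 000 nM maps to a logscore of 0 and 1 nM to a logscore of 1, so the 500 nM and 50 nM thresholds correspond to logscores of roughly 0.43 and 0.64.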
In each header, there is a link to a file with the output in
tab-separated format; the filename ends in .xls, making
it easy to import into spreadsheet programs. This file always
contains the predicted peptides in the order in which they
appeared in the input file. The output data for each peptide are
displayed on a single line, with predictions for each of the
selected alleles in different columns (Figure 3).
FINAL REMARKS
This server is developed to aid research and limit the resources
needed for rational and effective CTL epitope discovery and
will be continuously updated as new data become available. All
comments and suggestions for usability improvements are most
welcome.
SUPPLEMENTARY DATA
Supplementary data are available at NAR Online.
ACKNOWLEDGEMENTS
This work was funded by the European Commission (LSHB-CT-2003-503231,
LSHB-CT-2004-012175) and the National Institutes of Health (HHSNN26600400006C,
HHSN266200400025C, HHSN266200400083C).
Conflict of interest statement. None declared.
REFERENCES
1. Lundegaard C, Lund O, Kesmir C, Brunak S, Nielsen M. Modeling the adaptive immune system: predictions and simulations. Bioinformatics (2007) 23:3265–3275.
2. Nielsen M, Lundegaard C, Worning P, Hvid CS, Lamberth K, Buus S, Brunak S, Lund O. Improved prediction of MHC class I and class II epitopes using a novel Gibbs sampling approach. Bioinformatics (2004) 20:1388–1397.
3. Nielsen M, Lundegaard C, Worning P, Lauemoller SL, Lamberth K, Buus S, Brunak S, Lund O. Reliable prediction of T-cell epitopes using neural networks with novel sequence representations. Protein Sci (2003) 12:1007–1017.
4. Peters B, Bui HH, Frankild S, Nielson M, Lundegaard C, Kostem E, Basch D, Lamberth K, Harndahl M, Fleri W, et al. A community resource benchmarking predictions of peptide binding to MHC-I molecules. PLoS Comput Biol (2006) 2:e65.
5. Sylvester-Hvid C, Nielsen M, Lamberth K, Roder G, Justesen S, Lundegaard C, Worning P, Thomadsen H, Lund O, Brunak S, et al. SARS CTL vaccine candidates; HLA supertype-, genome-wide scanning and biochemical validation. Tissue Antigens (2004) 63:395–400.
6. Sette A, Fleri W, Peters B, Sathiamurthy M, Bui HH, Wilson S. A roadmap for the immunomics of category A-C pathogens. Immunity (2005) 22:155–161.
7. Rammensee H, Bachmann J, Emmerich NP, Bachor OA, Stevanovic S. SYFPEITHI: database for MHC ligands and peptide motifs. Immunogenetics (1999) 50:213–219.
8. Lundegaard C, Lund O, Nielsen M. Accurate approximation method for prediction of class I MHC affinities for peptides of length 8, 10 and 11 using prediction tools trained on 9mers. Bioinformatics (2008) in press, doi:10.1093/bioinformatics/btn128.
