ABSTRACT
The Yeast Protein Database (YPD) is a database for the proteins of the budding
yeast, Saccharomyces cerevisiae. YPD is the first annotated database for the
complete proteome of any organism.
Now that the complete genome sequence of yeast is available, YPD contains
entries for each of the characterized proteins and for each of the
uncharacterized proteins predicted from the sequence. Contained in YPD are the
calculated properties of each protein such as molecular weight and isoelectric
point, experimentally determined properties such as subcellular localization
and post-translational modifications, and extensive annotations from the yeast
literature. YPD contains 25 000 lines of textual annotation that describe the
known functions, mutant phenotypes, interactions, and other properties for the
approximately 6000 proteins in the yeast proteome. The information in YPD is
updated daily, and it is available on the World Wide Web at
http://www.proteome.com/YPDhome.html.
The Yeast Protein Database (YPD) is the first database to describe the complete
proteome of an organism. YPD combines analysis of the genome sequence with an
extensive review of the literature to provide a comprehensive description of
the proteins of Saccharomyces cerevisiae. Each yeast protein, whether
characterized experimentally or known only as an ORF (open reading frame)
identified by the genome project, has an entry in YPD.
Some information in YPD, such as molecular weight and isoelectric point, is
calculated from the sequence. Most of the information, including subcellular
localization, post-translational modifications, and function, is derived from
the yeast literature. Since our previous description of YPD (1), the yeast
genome sequence has been completed (2). YPD now encompasses the entire yeast
proteome.
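The sequence-derived properties can be illustrated with a short sketch. The text does not describe how YPD computes them, so the Python below assumes a common approach: the average molecular weight of an unmodified polypeptide is the sum of standard average residue masses plus one water molecule, and the amino acid composition is a simple residue count. The mass table and function names are illustrative assumptions; the isoelectric point, which requires a set of pKa values, is omitted.

```python
from collections import Counter

# Standard average residue masses in daltons (free amino acid mass minus one
# water). Assumed reference values; YPD's own method is not described.
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.01528  # added once for the free N- and C-termini


def amino_acid_composition(sequence: str) -> dict[str, int]:
    """Count each residue in a one-letter protein sequence."""
    return dict(Counter(sequence.upper()))


def molecular_weight(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide chain."""
    seq = sequence.upper()
    unsupported = set(seq) - set(AVERAGE_RESIDUE_MASS)
    if unsupported:
        raise ValueError(f"unsupported residue codes: {sorted(unsupported)}")
    return sum(AVERAGE_RESIDUE_MASS[aa] for aa in seq) + WATER_MASS
```

For example, `molecular_weight("GA")` gives about 146.1 Da, the mass of the dipeptide glycylalanine. Post-translational modifications, which YPD records from the literature, change the observed mass and are not modelled here.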
The most rapidly growing part of YPD is its textual annotation, drawn from our
review of the yeast literature. Approximately 25 000 lines of text now describe
the functions, mutant phenotypes, physical interactions, domain structures,
similarities to other proteins, and known modes of regulation for each of the
characterized proteins. The annotations are drawn from approximately 3500 yeast
papers and abstracts, and more than 8700 yeast papers are cited in the
reference lists.
YPD complements the sequence databases GenBank, PIR-International and
SWISS-PROT. YPD also works closely with the two major genome databases for
Saccharomyces cerevisiae, the Martinsried Institute for Protein Sequences (MIPS)
and the Saccharomyces Genome Database (SGD). MIPS, the informatics coordination
site for the European Genome Project, maintains a Yeast Genome Access site with
annotated chromosome files, browsers, and analysis tools. SGD is maintained at
Stanford University as a major resource for yeast genomic and biological
information. YPD uses SGD as the authority for genetic name assignments and
MIPS as the source for systematic gene name assignments. Both SGD and MIPS have
been used as sources of information about many of the new proteins, and both
incorporate parts of the YPD database. Each of the three databases contains
links for cross-referencing to the other two.
The YPD Home Page on the World-Wide Web, shown in Figure 1, provides
introductory material on YPD, a News and Comments section, YPD
documentation, and entry points for access to the YPD Protein Reports. Each YPD
Protein Report is presented as a single WWW page, and there is a report for
each of the yeast proteins. Users can access the reports through the search
forms (see below) or through precompiled indexes. The `New and updated YPD
reports by week' section presents an index to the newly characterized proteins
and those with updated entries for each recent week. The `YPD reports by
category' section presents precompiled indexes to the proteins in each of the
categories used in YPD, including localization categories, protein modification
categories, and functional categories (see Fig. 2). The `YPD reports of special
interest' section presents further precompiled indexes, including indexes of
proteins that contain a variety of sequence motifs.
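The precompiled category indexes amount to an inverted index from each category to the proteins assigned to it. The sketch below assumes each protein is available as a (gene name, set of categories) pair; the record layout and function names are hypothetical and are not YPD's internal format.

```python
from collections import defaultdict


def build_category_index(proteins):
    """Invert (gene, categories) records into {category: sorted gene names}."""
    index = defaultdict(set)
    for gene, categories in proteins:
        for category in categories:
            index[category].add(gene)
    # Sort categories and genes so each precompiled index page is stable between builds.
    return {category: sorted(genes) for category, genes in sorted(index.items())}


def category_counts(index):
    """Number of proteins listed under each category."""
    return {category: len(genes) for category, genes in index.items()}
```

Given the records ("CLN3", {"cyclin"}) and ("CLB2", {"cyclin"}), the "cyclin" index lists CLB2 and CLN3, and `category_counts` yields per-category totals of the kind quoted for the current release.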
The YPD Protein Reports can be selected from one of the YPD Search Forms. Users
can select proteins by gene or synonym name, by keywords, or by any of the
protein property categories. Synonyms can include any name used for a yeast
gene, including temporary names used in the sequencing project and final
systematic gene names. Keywords can be any word used in the annotations or
references. The short search form is most convenient for searches based on gene
names, synonyms and keywords only. The long search form (partially shown in
Fig. 3) adds the ability to select from the numerous protein property
categories. Both
search forms allow use of the Boolean operators `AND' and `OR' for construction
of queries based on multiple criteria. The result of each search is a page
containing a synopsis of the search strategy, and a list of the protein `hits'
by gene name, synonyms, and the protein name/description field. Clicking any
protein in the `hit' list brings up the corresponding YPD Protein Report.
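A search that combines gene names, synonyms, and keywords with `AND' or `OR' can be sketched as follows. The record fields, the whole-word keyword match, and the function names are assumptions for illustration; the text does not describe how the YPD search forms are implemented.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ProteinRecord:
    """Hypothetical searchable record: names plus free text."""
    gene: str
    description: str
    synonyms: list[str] = field(default_factory=list)
    annotations: list[str] = field(default_factory=list)

    def matches(self, term: str) -> bool:
        """True if `term` is the gene name or a synonym, or a whole word in the text."""
        t = term.lower()
        if t == self.gene.lower() or t in (s.lower() for s in self.synonyms):
            return True
        text = " ".join([self.description, *self.annotations]).lower()
        return re.search(rf"\b{re.escape(t)}\b", text) is not None


def search(records, terms, operator="AND"):
    """Return gene names of records matching all (AND) or any (OR) of `terms`."""
    combine = {"AND": all, "OR": any}[operator.upper()]
    return [r.gene for r in records if combine(r.matches(t) for t in terms)]
```

With records for CDC28 (systematic name YBR160W, described as a cyclin-dependent protein kinase) and CLN3 (a G1 cyclin), searching for `kinase' AND `cyclin' returns only CDC28, while `kinase' OR `cyclin' returns both; searching for YBR160W finds CDC28 through its synonym list.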
An example YPD Protein Report is shown in Figure 4. Each report begins with a
protein name/description field which is designed to
be an informative one-line description of the protein. Next are the fields for the gene names
used in YPD and other databases, and a full list of synonyms. The gene names
used in YPD follow those determined by the
Saccharomyces
Genome Database (SGD) in nearly all cases. Also included in the upper section
of the report are the protein property fields, the database accession numbers,
and short stretches of the N- and C-terminal sequences. The next section of the Protein Report contains
the textual annotations. Most of the annotations reference an article or review
in the yeast literature, although some contain references to other databases or
personal communications. The annotations are followed by the reference list,
with titles. Every protein in the yeast proteome has a separate YPD Protein Report
in the format shown. These reports are recompiled daily from the latest
information available in YPD.
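The report layout described above maps naturally onto a record type whose fields are rendered top to bottom. The sketch below is a hypothetical plain-text renderer, not YPD's report generator; the field names are assumptions, and a real WWW report would emit HTML with hypertext links. Regenerating every report from the current records, as in the daily recompilation, then amounts to calling `render()` for each protein.

```python
from dataclasses import dataclass, field


@dataclass
class ProteinReport:
    """Hypothetical in-memory form of one protein report."""
    description: str            # one-line protein name/description
    gene: str                   # gene name used in YPD
    synonyms: list[str]
    properties: dict[str, str]  # property field -> value
    accessions: dict[str, str]  # database -> accession number
    n_terminus: str             # short N-terminal stretch
    c_terminus: str             # short C-terminal stretch
    annotations: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Lay out the fields in report order: header fields, annotations, references."""
        lines = [
            self.description,
            f"Gene name: {self.gene}",
            "Synonyms: " + ", ".join(self.synonyms),
        ]
        lines += [f"{name}: {value}" for name, value in self.properties.items()]
        lines += [f"{db} accession: {acc}" for db, acc in self.accessions.items()]
        lines += [f"N-terminus: {self.n_terminus}...", f"C-terminus: ...{self.c_terminus}"]
        lines += ["", *self.annotations, "", "References:"]
        lines += [f"{i}. {ref}" for i, ref in enumerate(self.references, start=1)]
        return "\n".join(lines)
```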
For each YPD Protein Report, hypertext links are available for immediate online
access to SGD, GenBank, MIPS, SWISS-PROT, and Entrez. The Entrez server (3)
from NCBI provides abstracts for most references used in YPD. Furthermore,
whenever a gene or protein name occurs in an annotation or reference of a YPD
Protein Report, there is a link to the corresponding YPD Protein Report.
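Linking gene and protein names inside annotation text can be done with a single regular expression built from the list of known names. The sketch below assumes the yeast naming convention in which the protein name is the capitalized gene name with a final "p" (CDC28 and Cdc28p); `url_template` is a hypothetical placeholder for the report address, and the function is illustrative rather than YPD's actual linker.

```python
import re


def link_gene_names(text: str, genes: set[str], url_template: str) -> str:
    """Wrap each known gene name (CDC28) or protein name (Cdc28p) in an HTML link.

    `url_template` is a hypothetical pattern containing a `{gene}` placeholder.
    """
    # Map every surface form to its gene; the protein form follows the yeast
    # convention of capitalized gene name plus "p" (CDC28 -> Cdc28p).
    forms = {}
    for gene in genes:
        forms[gene] = gene
        forms[gene.capitalize() + "p"] = gene
    if not forms:
        return text
    # \b on both sides prevents partial hits, e.g. CDC2 inside CDC28.
    alternatives = "|".join(re.escape(f) for f in sorted(forms, key=len, reverse=True))
    pattern = re.compile(rf"\b({alternatives})\b")

    def to_link(match: re.Match) -> str:
        form = match.group(1)
        return f'<a href="{url_template.format(gene=forms[form])}">{form}</a>'

    return pattern.sub(to_link, text)
```

Both the gene form and the protein form point to the same report, so an annotation mentioning either Cln3p or CLN3 leads to the CLN3 entry.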
The original format for YPD is a spreadsheet, and this remains a popular format
for those who want to load the data into a personal computer. The YPD
spreadsheet contains one record (row) for each yeast protein of known sequence,
and it contains one column for each of the protein properties. Each spreadsheet
record has the gene name, synonym list, one-line protein description, and accession numbers for the sequence databases. The
spreadsheet includes the complete amino acid composition of each protein, which
is not provided in the WWW format. The list of reference numbers for each
protein is included in the spreadsheet, and a separate file contains the full
reference list. The spreadsheet does not include the textual annotation lines.
The spreadsheet is useful for searching and sorting the yeast proteins by any of
the tabulated fields, including amino acid composition and N-terminal sequence. A complete list of the fields included in the
spreadsheet is provided in the documentation accessible from the YPD Home Page
or supplied with the spreadsheet. Release 6.0 of the spreadsheet contains the
complete proteome with 6021 entries. New versions of the spreadsheet are now
prepared weekly. Users can request the latest spreadsheet in a format
appropriate to their computer by sending Email to ypd@proteome.com.
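Because the tab-delimited version of the spreadsheet holds one row per protein, it can also be searched and sorted with a few lines of code. The sketch below assumes a header row and uses hypothetical column names (`Gene`, `MW`, `Nterm`) and placeholder genes (YFG1 to YFG3) with made-up values; the actual field names are given in the YPD documentation.

```python
import csv
import io


def load_table(handle):
    """Parse a tab-delimited export (header row first) into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


def sort_numeric(rows, column, descending=False):
    """Sort rows by a numeric column, such as a molecular-weight field."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=descending)


def with_prefix(rows, column, prefix):
    """Keep rows whose text in `column` starts with `prefix` (e.g. an N-terminal motif)."""
    return [row for row in rows if row[column].startswith(prefix)]


# Hypothetical three-row export with placeholder genes and made-up values.
SAMPLE = "Gene\tMW\tNterm\nYFG1\t52000\tMSEK\nYFG2\t18000\tMKTA\nYFG3\t31000\tMSTN\n"

if __name__ == "__main__":
    rows = load_table(io.StringIO(SAMPLE))
    print([r["Gene"] for r in sort_numeric(rows, "MW")])  # smallest first
    print([r["Gene"] for r in with_prefix(rows, "Nterm", "MS")])
```

For a real file, open it with `open(path, newline="")` and pass the handle to `load_table`.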
YPD is also available through an Email server (see directions for access below).
This server provides YPD Protein Reports in the same format as provided on the
World Wide Web. The reports are sent automatically after a request is entered,
and numerous reports can be obtained from one request. A search form, similar
to that provided on the World Wide Web, is used to initiate searches by Email.
Release 6.0 contains 6021 entries representing the complete proteome of
Saccharomyces cerevisiae. As of October 1996, YPD lists 2369 proteins that have
been characterized through genetics or biochemistry and 1231 proteins that have
homology to characterized proteins. The remaining 2421 proteins, or 40% of the
total, have unknown function (see Table 1). In the current release, YPD
tabulates 565 nuclear proteins, 52 cytoskeletal proteins, 291 mitochondrial
proteins, 161 transcription factors, 118 protein kinases, 20 cyclins, 53 GTPases,
and proteins in many other categories relating to the function, localization,
and modification of the proteins (Fig. 2). A complete summary of the yeast
proteins by these and other categories is found under `YPD Reports by category'
on the YPD home page.
A summary of the growth of YPD since its first release in 1994 is shown in
Table 1. The number of characterized proteins has risen at a steady rate over the past
2 years. The average rate of characterization of new proteins is 36 per month.
It will be interesting to determine whether this rate increases in the next
year as the new sequences from the genome project are selected for experimental
characterization or whether the rate begins to decline because the functions of
the remaining proteins are the hardest ones to discover. The tabulation of
characterized versus uncharacterized proteins for each YPD release is presented
under `YPD contents by release number'.
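The percentages quoted for the current release follow directly from the Table 1 counts. The sketch below transcribes the release rows and computes the share of entries with unknown function for each release; only the table values come from the text, and the function name is illustrative.

```python
# Release rows transcribed from Table 1:
# (release, total entries, known, homologous, unknown function)
RELEASES = [
    ("1.2", 3020, 1729, 387, 904),
    ("2.0", 3142, 1750, 450, 942),
    ("3.0", 3512, 1871, 524, 1117),
    ("4.0", 4046, 1951, 667, 1428),
    ("4.1", 4305, 2012, 729, 1564),
    ("5.0", 4559, 2187, 859, 1913),
    ("6.0", 6021, 2369, 1231, 2421),
]


def percent_unknown(total: int, unknown: int) -> float:
    """Share of entries whose function is unknown, in percent."""
    return 100.0 * unknown / total


if __name__ == "__main__":
    for release, total, _known, _homol, unknown in RELEASES:
        print(f"{release}: {percent_unknown(total, unknown):.1f}% unknown")
```

For Release 6.0 this gives 40.2% (2421 of 6021), matching the 40% quoted above; for Release 1.2 it gives 29.9%.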
YPD can be reached on the World-Wide Web through the YPD Home Page
(http://www.proteome.com/YPDhome.html). Parts of YPD have also been incorporated
into the Saccharomyces Genome Database (SGD) (http://genome-www.stanford.edu)
and the MIPS Protein Database/Yeast Genome Database
(http://www.mips.biochem.mpg.de). SGD is managed by J. Michael Cherry
(cherry@genome.stanford.edu), and MIPS is managed by Werner Mewes
(mewes@mips.embnet.org). The YPD Home Page and the contents of YPD are
maintained by the authors (wep@proteome.com and jg@proteome.com).
The Email server is accessed by sending Email to yeast@proteome.com. YPD Protein
Reports are requested by placing one or more gene names in the subject line.
The search form is requested by placing `HELP' in the subject line.
Documentation is available by placing `DOC' in the subject line. The YPD
Protein Reports and other documents are automatically returned, with each
report in a separate Email message.
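A request to the Email server can be composed programmatically. The Python sketch below builds, but does not send, such a message with the standard `email.message.EmailMessage` class. Separating several gene names with spaces is an assumption, since the text does not say how multiple names should be delimited, and the server address is the one given in the text, written in standard form. Sending would additionally require an SMTP connection, for example through `smtplib`.

```python
from email.message import EmailMessage

# Address of the YPD Email server as given in the text (historical).
YPD_EMAIL_SERVER = "yeast@proteome.com"


def build_request(sender: str, subject_words: list[str]) -> EmailMessage:
    """Compose (but do not send) a request to the YPD Email server.

    Gene names in the subject request Protein Reports; "HELP" requests the
    search form and "DOC" the documentation. Joining several gene names with
    spaces is an assumption; the text does not specify a separator.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = YPD_EMAIL_SERVER
    msg["Subject"] = " ".join(subject_words)  # requests go in the subject line
    return msg
```

For example, `build_request("user@example.org", ["CDC28", "CLN3"])` yields a message whose subject line is "CDC28 CLN3"; passing `["HELP"]` or `["DOC"]` requests the search form or the documentation instead.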
The YPD spreadsheet is available in Excel format for Macintosh or PC. It is also
available as a tab-delimited text file suitable for loading into any spreadsheet. A separate
references file and documentation file are provided. An updated spreadsheet is
automatically generated each week from the latest YPD information. The
spreadsheet can be obtained by sending Email to ypd@proteome.com. The Email
request should mention the type of computer (PC or Macintosh) and the file
format (Excel or tab-delimited text) desired.
We hope that users will supplement our curatorial efforts by sending us new data
submissions, additions, and corrections. We reply promptly to each message, and
make every effort to ensure completeness and correctness. The YPD curators can
be contacted by sending Email to ypd@proteome.com.
Authors wishing to cite YPD should use this article as a general reference for
the latest release available electronically.
We are grateful to Bruce Futcher and Gerald Latter of the Cold Spring Harbor
Laboratory for discussions that led to the YPD Project. We thank Michael
Cusick, Les Grivell, Bruno André, Jonathan Warner, Sepp Kohlwein, and Charles Cole for expert review and
assistance with portions of the database. We thank Weidong Jiang, Richard
Moerschell, and Rachelle Hecht for consulting and curatorial assistance. We
thank Michael Benoit and Irene Ong for computer assistance, Cheryl Lengieza for
management of the reference database, and Shelley Lengieza for general
assistance. We thank Werner Mewes, Karl Kleine, and their colleagues at MIPS
for their production of the annotated sequence files for each yeast chromosome,
and for frequent feedback on the YPD entries. We also thank Mike Cherry and the
staff at SGD for updates and corrections to YPD. Finally, we thank all those who
participated in the yeast genome project for making this resource possible.


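Table 1. Growth of YPD since its first release in 1994

Release  Date           Total  Known(a)  Homol(b)  Unknown(c)
1.2      Nov. 23, 1994   3020      1729       387         904
2.0      Dec. 8, 1994    3142      1750       450         942
3.0      Feb. 1, 1995    3512      1871       524        1117
4.0      Jun. 6, 1995    4046      1951       667        1428
4.1      Jul. 7, 1995    4305      2012       729        1564
5.0      Nov. 30, 1995   4559      2187       859        1913
6.0      Aug. 3, 1996    6021      2369      1231        2421

(a) Proteins characterized through genetics or biochemistry.
(b) Uncharacterized proteins with homology to characterized proteins.
(c) Proteins of unknown function.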
REFERENCES
