© 1997 Oxford University Press 57-62

Yeast Protein Database (YPD): a database for the complete proteome of Saccharomyces cerevisiae

William E. Payne and James I. Garrels*

Proteome, Inc., 200 Cummings Center, Suite 425C, Beverly, MA 01915, USA

Received October 16, 1996; Accepted October 21, 1996

ABSTRACT

The Yeast Protein Database (YPD) is a database for the proteins of the budding yeast, Saccharomyces cerevisiae. YPD is the first annotated database for the complete proteome of any organism. Now that the complete genome sequence of yeast is available, YPD contains entries for each of the characterized proteins and for each of the uncharacterized proteins predicted from the sequence. YPD holds calculated properties of each protein, such as molecular weight and isoelectric point; experimentally determined properties, such as subcellular localization and post-translational modifications; and extensive annotations from the yeast literature. YPD contains 25 000 lines of textual annotation that describe the known functions, mutant phenotypes, interactions, and other properties of the approximately 6000 proteins in the yeast proteome. The information in YPD is updated daily, and it is available on the World Wide Web at http://www.proteome.com/YPDhome.html.

INTRODUCTION

The Yeast Protein Database (YPD) is the first database to describe the complete proteome of an organism. YPD combines analysis of the genome sequence with an extensive review of the literature to provide a comprehensive description of the proteins of Saccharomyces cerevisiae. Each yeast protein, whether characterized experimentally or known only as an ORF (open reading frame) identified by the genome project, has an entry in YPD. Some information in YPD, such as molecular weight and isoelectric point, is calculated from the sequence. Most of the information, including subcellular localization, post-translational modifications, and function, is derived from the yeast literature. Since our previous description of YPD (1), the yeast genome sequence has been completed (2), and YPD now encompasses the entire yeast proteome.
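Of the properties calculated from the sequence, molecular weight is the simplest: it is the sum of the residue masses plus one water molecule for the free termini. The sketch below assumes standard average (not monoisotopic) residue masses; the paper does not state which mass table YPD uses, and the function name `molecular_weight` is illustrative.

```python
# Average residue masses in daltons (assumed values; YPD's own table is not given).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # one H2O accounts for the free N- and C-termini


def molecular_weight(seq: str) -> float:
    """Return the average molecular weight (Da) of a protein sequence."""
    seq = seq.upper().rstrip("*")  # tolerate a trailing stop symbol
    if not seq:
        raise ValueError("empty sequence")
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residues: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(molecular_weight("MSG"), 2))
```

Rejecting unknown residue codes, rather than skipping them, keeps an ambiguous sequence from silently producing a low weight.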

The most rapidly growing part of YPD is the body of textual annotations drawn from our review of the yeast literature. Approximately 25 000 lines of text now describe the functions, mutant phenotypes, physical interactions, domain structures, similarities to other proteins, and known modes of regulation for each of the characterized proteins. Annotations are drawn from ~3500 yeast papers and abstracts, and >8700 yeast papers are cited in the reference lists.

YPD complements the sequence databases GenBank, PIR-International and SWISS-PROT. YPD also works closely with the two major genome databases for Saccharomyces cerevisiae, the Martinsried Institute for Protein Sequences (MIPS) and the Saccharomyces Genome Database (SGD). MIPS, the informatics coordination site for the European Genome Project, maintains a Yeast Genome Access site with annotated chromosome files, browsers, and analysis tools. SGD is maintained at Stanford University as a major resource for yeast genomic and biological information. YPD uses SGD as the authority for genetic name assignments and MIPS as the source of systematic gene names. Both SGD and MIPS have been used as sources of information about many of the newly identified proteins, and both incorporate parts of the YPD database. Each of the three databases contains links for cross-referencing to the other two.

THE YPD HOME PAGE

The YPD Home Page on the World-Wide Web, shown in Figure 1, provides introductory material on YPD, a News and Comments section, YPD documentation, and entry points for access to the YPD Protein Reports. Each YPD Protein Report is presented as a single WWW page, and there is a report for each of the yeast proteins. Users can access the reports through the search forms (see below) or through precompiled indexes. The `New and updated YPD reports by week' section presents an index to the newly characterized proteins and those with updated entries for each recent week. The `YPD reports by category' section presents precompiled indexes to the proteins in each of the categories used in YPD, including localization categories, protein modification categories, and functional categories (see Fig. 2). The `YPD reports of special interest' section presents precompiled indexes of special interest, including proteins that contain a variety of sequence motifs.


Figure 1. The YPD Home Page. World-Wide Web users can access the page at http://www.proteome.com/YPDhome.html. The YPD Home Page can be used to access YPD in any of its formats, to display summary information about YPD, to obtain YPD documentation, or to reach other yeast databases.


Figure 2. Contents of YPD by Category. Each of the protein categories used in YPD is tabulated with the number of proteins in each category. Each of the category lines is a link to a list of all the proteins in the category, which is in turn linked to the individual protein reports.

Some statistics on the proteins coded by the yeast genome are also available through the YPD Home Page. The `YPD reports by category' section, mentioned above, contains a table (Fig. 2) listing the number of proteins in each of the YPD categories. The number of known transcription factors, protein kinases, etc. is always available from this table, which is updated daily. The table also gives the number of proteins currently known through experimental characterization, the number known through similarity to characterized proteins, and the number of uncharacterized proteins. A summary of characterized versus uncharacterized proteins for each YPD release is presented under `YPD contents by release number'.
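Both the category table and the precompiled category indexes amount to grouping proteins by category label. A minimal sketch, assuming each protein record is a gene name paired with a list of category labels (a hypothetical input shape; the three records below are toy examples, not YPD data):

```python
from collections import defaultdict


def build_category_index(proteins):
    """Map each category label to a sorted list of gene names.

    `proteins` is an iterable of (gene_name, categories) pairs.
    """
    index = defaultdict(set)
    for gene, categories in proteins:
        for category in categories:
            index[category].add(gene)  # a set ignores duplicate listings
    return {category: sorted(genes) for category, genes in index.items()}


def category_counts(index):
    """Return the number of proteins per category, as in a summary table."""
    return {category: len(genes) for category, genes in sorted(index.items())}


# Toy records for illustration only.
records = [
    ("CDC28", ["protein kinase"]),
    ("CLB2", ["cyclin"]),
    ("GCN4", ["transcription factor", "nuclear"]),
]
index = build_category_index(records)
print(category_counts(index))
```

In this sketch, rebuilding the two structures from the current records is all a daily refresh of both the per-category lists and the counts would require.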

Graphic summaries of the YPD database are presented under `Theoretical 2D Gel Plots' and `Codon Bias Histograms', both accessible from the YPD Home Page. The 2D gel plots show where each protein is predicted to fall on a coordinate system of molecular weight versus isoelectric point, with the size of each data point related to the protein's codon bias. These plots, presented for the proteins of various subcellular fractions, approximate the distribution of spot positions and spot sizes observed on a 2D gel. The codon bias histograms show the distribution of codon bias values for proteins of various categories. For yeast proteins, codon bias is well correlated with potential protein abundance.
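One axis of a theoretical 2D gel plot is the isoelectric point: the pH at which a protein's net charge is zero. A common way to estimate it is to sum Henderson-Hasselbalch charges over the ionizable groups and bisect on pH. The pKa values below are one textbook set chosen for illustration; the paper does not say which scale YPD uses, and different scales can shift the result noticeably.

```python
# Illustrative pKa values (an assumption, not YPD's published scale).
PKA = {"Nterm": 9.69, "Cterm": 2.34, "K": 10.5, "R": 12.4, "H": 6.0,
       "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def net_charge(seq, ph):
    """Net charge of a sequence at a given pH (Henderson-Hasselbalch)."""
    seq = seq.upper()
    # Basic groups carry +1 when protonated.
    positive = 1 / (1 + 10 ** (ph - PKA["Nterm"]))
    for aa in "KRH":
        positive += seq.count(aa) / (1 + 10 ** (ph - PKA[aa]))
    # Acidic groups carry -1 when deprotonated.
    negative = 1 / (1 + 10 ** (PKA["Cterm"] - ph))
    for aa in "DECY":
        negative += seq.count(aa) / (1 + 10 ** (PKA[aa] - ph))
    return positive - negative


def isoelectric_point(seq, tol=1e-4):
    """Bisect on pH; net charge falls monotonically as pH rises."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positive: the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


print(round(isoelectric_point("G"), 2))
```

Bisection suffices because the charge curve crosses zero exactly once between pH 0 and 14.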

THE YPD SEARCH FORM

The YPD Protein Reports can be selected from one of the YPD Search Forms. Users can select proteins by gene or synonym name, by keywords, or by any of the protein property categories. Synonyms can include any name used for a yeast gene, including temporary names used in the sequencing project and final systematic gene names. Keywords can be any word used in the annotations or references. The short search form is most convenient for searches based only on gene names, synonyms and keywords. The long search form (partially shown in Fig. 3) adds the ability to select from the numerous protein property categories. Both search forms allow use of the Boolean operators `AND' and `OR' for construction of queries based on multiple criteria. The result of each search is a page containing a synopsis of the search strategy and a list of the protein `hits' by gene name, synonyms, and the protein name/description field. Clicking any protein in the `hit' list brings up the corresponding YPD Protein Report.


Figure 3. The YPD Search Form. This form allows YPD Protein Reports to be selected by gene name, by keywords, or by protein categories. The category selections fall into 14 groups, not all of which are shown here.
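The search forms combine gene names, synonyms, keywords and categories with `AND' or `OR'. One way to support such queries is an inverted index from each term to the set of genes it retrieves; AND is then set intersection and OR is set union. This is a sketch under assumed record fields (`synonyms`, `text`, `categories`) with toy annotations, not the YPD implementation.

```python
import re
from collections import defaultdict


def build_index(records):
    """Map each lower-cased term to the set of genes it retrieves.

    `records` maps a gene name to a dict with optional "synonyms",
    "text" (annotation lines) and "categories" entries.
    """
    index = defaultdict(set)
    for gene, rec in records.items():
        terms = {gene, *rec.get("synonyms", []), *rec.get("categories", [])}
        # Keywords: every word of the annotation text.
        terms.update(re.findall(r"[A-Za-z0-9-]+", rec.get("text", "")))
        for term in terms:
            index[term.lower()].add(gene)
    return index


def search(index, terms, op="AND"):
    """Return sorted gene names matching all (AND) or any (OR) of `terms`."""
    if not terms:
        return []
    hits = [index.get(term.lower(), set()) for term in terms]
    if op == "AND":
        result = set.intersection(*hits)
    elif op == "OR":
        result = set.union(*hits)
    else:
        raise ValueError("op must be 'AND' or 'OR'")
    return sorted(result)


# Toy records for illustration only.
records = {
    "CDC28": {"synonyms": ["YBR160W"],
              "text": "Cyclin-dependent protein kinase required for cell cycle progression",
              "categories": ["protein kinase"]},
    "CLB2": {"text": "B-type cyclin that activates CDC28 at mitosis",
             "categories": ["cyclin"]},
}
index = build_index(records)
print(search(index, ["cdc28", "mitosis"], "AND"))
```

A gene name typed as a keyword also retrieves proteins whose annotations mention it, which is why a search for CDC28 returns CLB2 here.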

THE YPD PROTEIN REPORTS

An example YPD Protein Report is shown in Figure 4. Each report begins with a protein name/description field that is designed to be an informative one-line description of the protein. Next are the fields for the gene names used in YPD and other databases, and a full list of synonyms. The gene names used in YPD follow those determined by the Saccharomyces Genome Database (SGD) in nearly all cases. Also included in the upper section of the report are the protein property fields, the database accession numbers, and short stretches of the N- and C-terminal sequences. The next section of the Protein Report contains the textual annotations. Most of the annotations reference an article or review in the yeast literature, although some contain references to other databases or personal communications. After the annotations is the reference list with titles. Every protein in the yeast proteome has a separate YPD Protein Report in the format shown. These reports are recompiled daily from the latest information available in YPD.


Figure 4. A sample YPD Protein Report. The top field is the protein name/description field from the spreadsheet. The following fields present many of the protein properties from the YPD spreadsheet. After the property fields, annotations from the literature and the list of references are given.

For each YPD Protein Report, hypertext links are available for immediate online access to SGD, GenBank, MIPS, SWISS-PROT, and Entrez. The Entrez server (3) from NCBI provides abstracts for most references used in YPD. Furthermore, whenever a gene or protein name occurs in an annotation or reference of a YPD Protein Report, there is a link to the corresponding YPD Protein Report.
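Linking every gene or protein name in the annotation text can be done with one regular expression built from the list of known gene names. The sketch assumes yeast protein names are written as the gene name plus a trailing `p' (Cdc28p for CDC28); the function name and the `url_template` format are hypothetical.

```python
import html
import re


def link_gene_names(text, known_genes, url_template):
    """Return HTML in which every known gene or protein name is a link.

    `known_genes` holds upper-case gene names. Matching is
    case-insensitive and accepts an optional trailing "p", so Cdc28p
    links to the CDC28 report. `url_template` needs a "{gene}" field.
    """
    escaped = html.escape(text)
    if not known_genes:
        return escaped
    # Longest names first so overlapping names resolve to the longer one.
    names = sorted(known_genes, key=len, reverse=True)
    alternation = "|".join(re.escape(name) for name in names)
    pattern = re.compile(rf"\b({alternation})p?\b", re.IGNORECASE)

    def to_link(match):
        gene = match.group(1).upper()
        return f'<a href="{url_template.format(gene=gene)}">{match.group(0)}</a>'

    return pattern.sub(to_link, escaped)


print(link_gene_names("Cdc28p binds Clb2p.", {"CDC28", "CLB2"}, "/ypd/{gene}.html"))
```

The word boundaries keep a name such as CDC28 from matching inside a longer token such as CDC280.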

OTHER YPD FORMATS

The original format for YPD is a spreadsheet, and this remains a popular format for those who want to load the data into a personal computer. The YPD spreadsheet contains one record (row) for each yeast protein of known sequence, and it contains one column for each of the protein properties. Each spreadsheet record has the gene name, synonym list, one-line protein description, and accession numbers to sequence databases. The spreadsheet includes the complete amino acid composition of each protein, which is not provided in the WWW format. The list of reference numbers for each protein is included in the spreadsheet, and a separate file contains the full reference list. The spreadsheet does not include the textual annotation lines.

The spreadsheet is useful for searching and sorting the yeast proteins by any of the tabulated fields, including amino acid composition and N-terminal sequence. A complete list of the fields included in the spreadsheet is provided in the documentation accessible from the YPD Home Page or supplied with the spreadsheet. Release 6.0 of the spreadsheet contains the complete proteome with 6021 entries. New versions of the spreadsheet are now prepared weekly. Users can request the latest spreadsheet in a format appropriate to their computer by sending Email to ypd@proteome.com.
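With the tab-delimited version of the spreadsheet, the same searching and sorting can be scripted. The sketch assumes a header row and hypothetical column names (`Gene`, `MW`, `pI`, `Nterm`); the real field names are listed in the YPD documentation, and the rows below are made up.

```python
import csv
import io


def load_spreadsheet(fileobj):
    """Read a tab-delimited export with a header row into a list of dicts."""
    return list(csv.DictReader(fileobj, delimiter="\t"))


def filter_by_nterm(rows, prefix, column="Nterm"):
    """Keep rows whose N-terminal sequence starts with `prefix`."""
    prefix = prefix.upper()
    return [row for row in rows if row[column].upper().startswith(prefix)]


def sort_numeric(rows, column, descending=False):
    """Sort rows by a numeric column such as molecular weight."""
    return sorted(rows, key=lambda row: float(row[column]), reverse=descending)


# Made-up rows with hypothetical column names.
SAMPLE = (
    "Gene\tMW\tpI\tNterm\n"
    "GENEA\t34000\t5.2\tMSALK\n"
    "GENEB\t12000\t9.1\tMKTAY\n"
    "GENEC\t56000\t6.7\tMSQEK\n"
)
rows = load_spreadsheet(io.StringIO(SAMPLE))
hits = sort_numeric(filter_by_nterm(rows, "MS"), "MW", descending=True)
print([row["Gene"] for row in hits])
```

For a file on disk, open it with `newline=""` as the `csv` module recommends, then pass the file object to `load_spreadsheet`.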

YPD is also available through an Email server (see directions for access below). This server provides YPD Protein Reports in the same format as provided on the World Wide Web. The reports are sent automatically after a request is entered, and numerous reports can be obtained from one request. A search form, similar to that provided on the World Wide Web, is used to initiate searches by Email.

CONTENTS OF THE CURRENT RELEASE

Release 6.0 contains 6021 entries representing the complete proteome of Saccharomyces cerevisiae. As of October 1996, YPD lists 2369 proteins that have been characterized through genetics or biochemistry and 1231 proteins that have homology to characterized proteins. The remaining 2421 proteins, or 40% of the total, have unknown function (see Table 1). In the current release, YPD tabulates 565 nuclear proteins, 52 cytoskeletal proteins, 291 mitochondrial proteins, 161 transcription factors, 118 protein kinases, 20 cyclins, 53 GTPases and many other categories relating to the function, localization, and modification of the proteins (Fig. 2). A complete summary of the yeast proteins by these and other categories is found under `YPD reports by category' on the YPD Home Page.

Table 1. YPD contents versus release number

| Release | Date | Total | Known (a) | Homol (b) | Unknown (c) |
|---|---|---|---|---|---|
| 1.2 | Nov. 23, 1994 | 3020 | 1729 | 387 | 904 |
| 2.0 | Dec. 8, 1994 | 3142 | 1750 | 450 | 942 |
| 3.0 | Feb. 1, 1995 | 3512 | 1871 | 524 | 1117 |
| 4.0 | Jun. 6, 1995 | 4046 | 1951 | 667 | 1428 |
| 4.1 | Jul. 7, 1995 | 4305 | 2012 | 729 | 1564 |
| 5.0 | Nov. 30, 1995 | 4559 | 2187 | 859 | 1913 |
| 6.0 | Aug. 3, 1996 | 6021 | 2369 | 1231 | 2421 |

(a) Proteins characterized through genetic or biochemical experiments.
(b) Proteins that have not been characterized but have sequence similarity to characterized proteins.
(c) Proteins of unknown function.

A summary of the growth of YPD since its first release in 1994 is shown in Table 1 . The number of characterized proteins has risen at a steady rate over the past 2 years. The average rate of characterization of new proteins is 36 per month. It will be interesting to determine whether this rate increases in the next year as the new sequences from the genome project are selected for experimental characterization or whether the rate begins to decline because the functions of the remaining proteins are the hardest ones to discover. The tabulation of characterized versus uncharacterized proteins for each YPD release is presented under `YPD contents by release number'.
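To compare releases of different sizes, each class in Table 1 can be expressed as a percentage of that release's Total column; for release 6.0 this reproduces the 40% unknown figure given in the text. The rows below are transcribed from Table 1; the function name `breakdown` is illustrative.

```python
# Rows transcribed from Table 1: (release, total, known, homol, unknown).
RELEASES = [
    ("1.2", 3020, 1729, 387, 904),
    ("2.0", 3142, 1750, 450, 942),
    ("3.0", 3512, 1871, 524, 1117),
    ("4.0", 4046, 1951, 667, 1428),
    ("4.1", 4305, 2012, 729, 1564),
    ("5.0", 4559, 2187, 859, 1913),
    ("6.0", 6021, 2369, 1231, 2421),
]


def breakdown(release):
    """Percent of the release total in each class, rounded to 0.1%."""
    for rel, total, known, homol, unknown in RELEASES:
        if rel == release:
            return {
                "known": round(100 * known / total, 1),
                "homol": round(100 * homol / total, 1),
                "unknown": round(100 * unknown / total, 1),
            }
    raise KeyError(f"no such release: {release}")


print(breakdown("6.0"))
```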

HOW TO ACCESS YPD

YPD can be reached on the World-Wide Web through the YPD Home Page (http://www.proteome.com/YPDhome.html). Parts of YPD have also been incorporated into the Saccharomyces Genome Database (SGD) (http://genome-www.stanford.edu) and the MIPS Protein Database/Yeast Genome Database (http://www.mips.biochem.mpg.de). SGD is managed by J. Michael Cherry (cherry@genome.stanford.edu), and MIPS is managed by Werner Mewes (mewes@mips.embnet.org). The YPD Home Page and the contents of YPD are maintained by the authors (wep@proteome.com and jg@proteome.com).

The Email server is accessed by sending Email to yeast@proteome.com. YPD Protein Reports are requested by placing one or more gene names in the subject line. The search form is requested by placing `HELP' in the subject line. Documentation is available by placing `DOC' in the subject line. The YPD Protein Reports and other documents are automatically returned, with each report in a separate Email message.
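The subject-line conventions above map naturally onto a small dispatcher. This sketch assumes gene names are separated by spaces or commas and uses hypothetical request-type labels; only the `HELP' and `DOC' keywords come from the text.

```python
def parse_subject(subject):
    """Classify an Email request by its subject line.

    Returns (request_type, gene_names). The request-type labels are
    hypothetical; the HELP and DOC keywords follow the server's rules.
    """
    # Assumption: gene names are separated by spaces and/or commas.
    tokens = subject.replace(",", " ").split()
    if not tokens:
        return ("empty", [])
    if len(tokens) == 1 and tokens[0].upper() == "HELP":
        return ("search_form", [])
    if len(tokens) == 1 and tokens[0].upper() == "DOC":
        return ("documentation", [])
    # Upper-case and de-duplicate while keeping the requested order.
    genes = list(dict.fromkeys(t.upper() for t in tokens))
    return ("reports", genes)


kind, genes = parse_subject("cdc28, clb2")
for gene in genes:
    # One reply message per report, mirroring the server's behaviour.
    print(f"would send report for {gene}")
```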

The YPD spreadsheet is available in Excel format for Macintosh or PC. It is also available as a tab-delimited text file suitable for loading into any spreadsheet. A separate references file and documentation file are provided. An updated spreadsheet is automatically generated each week from the latest YPD information. The spreadsheet can be obtained by sending Email to ypd@proteome.com. The Email request should mention the type of computer (PC or Macintosh) and the file format (Excel or tab-delimited text) desired.

HOW TO SUBMIT PROTEIN DATA TO YPD

We hope that users will supplement our curatorial efforts by sending us new data submissions, additions, and corrections. We reply promptly to each message, and make every effort to ensure completeness and correctness. The YPD curators can be contacted by sending Email to ypd@proteome.com.

CITING YPD

Authors wishing to cite YPD should use this article as a general reference for the latest release available electronically.

ACKNOWLEDGEMENTS

We are grateful to Bruce Futcher and Gerald Latter of the Cold Spring Harbor Laboratory for discussions that led to the YPD Project. We thank Michael Cusick, Les Grivell, Bruno André, Jonathan Warner, Sepp Kohlwein, and Charles Cole for expert review and assistance with portions of the database. We thank Weidong Jiang, Richard Moerschell, and Rachelle Hecht for consulting and curatorial assistance. We thank Michael Benoit and Irene Ong for computer assistance, Cheryl Lengieza for management of the reference database, and Shelley Lengieza for general assistance. We thank Werner Mewes, Karl Kleine, and their colleagues at MIPS for their production of the annotated sequence files for each yeast chromosome, and for frequent feedback on the YPD entries. We also thank Mike Cherry and the staff at SGD for updates and corrections to YPD. Finally we thank all those who participated in the yeast genome project for making this resource possible.

REFERENCES

1 Garrels, J.I. (1996) Nucleic Acids Res., 24, 46-49.

2 Johnston, M. (1996) Curr. Biol., 6, 500-502.

3 Schuler, G.D., Epstein, J.A., Ohkawa, H. and Kans, J.A. Methods Enzymol., in press.



* To whom correspondence should be addressed. Tel: +1 508 922 1643; Fax: +1 508 922 3971; Email: jg@proteome.com