
Nucleic Acids Research Pages 74-78  


Using the Saccharomyces Genome Database (SGD) for analysis of protein similarities and structure



Stephen A. Chervitz, Erich T. Hester, Catherine A. Ball, Kara Dolinski, Selina S. Dwight, Midori A. Harris, Gail Juvik, Alice Malekian, Shannon Roberts, TaiYun Roe, Charles Scafe, Mark Schroeder, Gavin Sherlock, Shuai Weng, Yan Zhu, J. Michael Cherry* and David Botstein

Department of Genetics, Stanford University, Stanford, CA 94305-5120, USA

Received October 1, 1998; Revised and Accepted October 8, 1998

ABSTRACT

The Saccharomyces Genome Database (SGD) collects and organizes information about the molecular biology and genetics of the yeast Saccharomyces cerevisiae. The latest protein structure and comparison tools available at SGD are presented here. With the completion of the yeast sequence and the Caenorhabditis elegans sequence soon to follow, comparison of proteins from complete eukaryotic proteomes will be an extremely powerful way to learn more about a particular protein's structure, its function, and its relationships with other proteins. SGD can be accessed through the World Wide Web at http://genome-www.stanford.edu/Saccharomyces/

INTRODUCTION

The Saccharomyces Genome Database (SGD) exists to provide the scientific community with access to the Saccharomyces cerevisiae sequence and the wealth of associated information. The database holds a variety of biological information, including the complete, annotated DNA and protein sequence, along with several tools for sequence analysis. Many of these features have been described recently (1,2). Here we focus on features of SGD that provide users with tools for comparing yeast protein sequences and examining protein structure. Sequence comparisons play a critical role in the initial process of determining the function of specific proteins and also in interpreting new protein sequence data from large-scale genome sequencing projects. There are several sequence comparison tools at SGD. Here, we discuss the Genome-wide Protein Similarity View program, a powerful tool for examining protein similarities. Like the base of sequence information, the body of structural information is also growing. Sacch3D is a feature of SGD that organizes and presents structural information about yeast proteins and their putative homologs. Familiarity with tools such as those described here will enable molecular biologists and geneticists to gain insight into the function and possible evolution of their protein of interest.

EXAMINING PROTEIN SIMILARITIES AT SGD

The Genome-wide Protein Similarity View (GPSV), at URL http://genome-www.stanford.edu/cgi-bin/SGD/SWA/swaEntryForm.pl , displays, either graphically or in a table, all the ORFs in the S.cerevisiae genome that are similar to a given query ORF based on a Smith-Waterman protein sequence comparison (3). Smith-Waterman comparisons were conducted on a TimeLogic DeCypher II machine using the affine Smith-Waterman application (4). This system uses the pscorer program to calculate a P-value (5). More details of the Smith-Waterman alignment are available at the GPSV help page (URL in Table 1).
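For readers unfamiliar with the comparison underlying the GPSV, the following is a minimal sketch of Smith-Waterman local alignment scoring with affine gap penalties. It is not the DeCypher implementation: the function name, the simple match/mismatch scoring and the gap penalty values are all assumptions chosen for illustration, and a real protein comparison would use an amino acid substitution matrix. The P-value calculation is not reproduced here.

```python
def smith_waterman_affine(query, target, match=2, mismatch=-1,
                          gap_open=-3, gap_extend=-1):
    """Return the best local alignment score with affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_extend.
    All scoring values are illustrative assumptions.
    """
    neg_inf = float("-inf")
    cols = len(target) + 1
    # h_*: best score of an alignment ending at cell (i, j).
    # f_*: best score ending at (i, j) with a gap in the target.
    h_prev = [0] * cols
    f_prev = [neg_inf] * cols
    best = 0
    for i in range(1, len(query) + 1):
        h_cur = [0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf  # best score ending at (i, j) with a gap in the query
        for j in range(1, cols):
            # Either open a new gap or extend an existing one.
            e = max(h_cur[j - 1] + gap_open, e + gap_extend)
            f_cur[j] = max(h_prev[j] + gap_open, f_prev[j] + gap_extend)
            pair = match if query[i - 1] == target[j - 1] else mismatch
            # The floor of 0 is what makes the alignment local.
            h_cur[j] = max(0, h_prev[j - 1] + pair, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

Because every cell is floored at zero, unrelated sequences score 0 and a short conserved region embedded in a longer sequence is scored on its own merits, which is the property that makes this kind of comparison suitable for finding similar ORFs of differing lengths.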

Table 1. URLs mentioned in this review

The GPSV graphic view, a Java applet, represents all 16 yeast nuclear chromosomes as horizontal black bars, with centromeres and positional coordinates indicated (Fig. 1A). Superimposed on the black chromosome bars are small vertical colored bars (similarity bars) that represent ORFs predicted by the Smith-Waterman analysis to have significant protein sequence similarities. A small black rectangle surrounds the bar for the query ORF itself. Its ORF name and associated standard gene name are displayed in the upper right hand corner. The color of the bars indicates the relative similarity shared with the query ORF. The warm colors (red) indicate high similarity while the cool colors (blue) indicate lower similarity. The user can switch between different query ORFs, add ORFs to the query list, and change several parameters of the similarity display.


Figure 1. The Genome-Wide Protein Similarity View page. Features are discussed in detail in the text. (A) The Genome-Wide Protein Similarity View graphic view; (B) Genome-Wide Protein Similarity View parameters.

Immediately below the graphic display are seven fields that contain additional information about the query and target ORFs (Fig. 1B). The first field displays a constantly updated readout of the current location of the mouse cursor in terms of base pairs along the chromosomes and the names of genes or ORFs selected by the mouse. The remaining fields contain information when the cursor is positioned over a similarity bar. The classes of information are: (i) P-Value (the P-value for the similarity between the query ORF and the target ORF); (ii) % Aligned (the percent of the query sequence that is aligned with the target sequence); and (iii) Gaps (the number of gaps inserted in the query sequence to achieve the alignment).
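As an illustration of how the % Aligned and Gaps values could be derived from a gapped alignment, here is a minimal sketch. The function name and the input format (two equal-length strings with '-' marking gap positions) are assumptions, as is the choice to count each contiguous run of gap characters in the query as one gap; the text does not specify how the GPSV counts them.

```python
def alignment_stats(aligned_query, aligned_target, query_length, gap_char="-"):
    """Summarize a pairwise alignment given as two gapped strings.

    aligned_query / aligned_target: equal-length rows of the alignment.
    query_length: length of the full, ungapped query sequence.
    """
    if len(aligned_query) != len(aligned_target):
        raise ValueError("aligned sequences must have equal length")
    if query_length <= 0:
        raise ValueError("query_length must be positive")

    # Query residues that take part in the local alignment.
    query_residues = sum(1 for c in aligned_query if c != gap_char)

    # Assumption: one gap = one contiguous run of gap characters.
    gap_runs = 0
    in_gap = False
    for c in aligned_query:
        if c == gap_char:
            if not in_gap:
                gap_runs += 1
            in_gap = True
        else:
            in_gap = False

    return {
        "percent_aligned": 100.0 * query_residues / query_length,
        "gaps": gap_runs,
    }
```

For example, `alignment_stats("ACG--TA", "ACGTTTA", 10)` reports that half of a 10-residue query is aligned and that one gap was inserted in the query.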

Each similarity bar can be clicked to reach more information about the target ORF. Options include links to the SGD Locus and Gene/Sequence Resources pages for the target ORF, an alignment of the query and target amino acid sequences, the DNA Similarity View, which displays the alignment of the target and query ORF DNA sequences, and the Protein Similarity View, where the selected target ORF is used as the query ORF.

The protein similarity data can also be displayed as a table, which can be accessed from the graphic display page or from the ORF input form. The table lists target ORFs in order of decreasing similarity to the query ORF, as determined by P-value; the target ORFs can also be sorted by percent identity. For each target ORF, the table lists the same information and links as the graphic display.
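The two orderings described for the table view can be sketched as follows. The record type, field names and function name are hypothetical and do not reflect the SGD implementation; the point is only that a smaller P-value means greater similarity, so sorting by P-value is ascending while sorting by percent identity is descending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One target ORF row in a similarity table (hypothetical fields)."""
    orf: str
    p_value: float
    percent_identity: float


def sort_hits(hits, by="p_value"):
    """Return hits ordered from most to least similar.

    by="p_value": ascending P-value (smaller means more significant).
    by="percent_identity": descending percent identity.
    """
    if by == "p_value":
        return sorted(hits, key=lambda h: h.p_value)
    if by == "percent_identity":
        return sorted(hits, key=lambda h: h.percent_identity, reverse=True)
    raise ValueError(f"unknown sort key: {by}")
```

The two orderings need not agree: a short, highly identical match can rank first by percent identity while a longer, less identical match has the smaller P-value.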

PROTEIN STRUCTURE AT SGD

The Sacch3D feature at SGD (at URL http://genome-www.stanford.edu/Sacch3D/ ) provides structural information for S.cerevisiae proteins by integrating data from SGD and structural databases and presenting it via up-to-date, concise summaries and links to structural resources. Sacch3D supplies researchers both within and outside the yeast community with insight into the structure and putative function of yeast proteins. Structural information for Sacch3D is obtained primarily by BLASTP analysis (6,7) of the Brookhaven Protein Data Bank (PDB) (8,9) to identify all PDB structures with significant sequence (and therefore likely structural) similarity to yeast proteins. Results are updated monthly to keep pace with the growth of the PDB. To reduce the redundancy in the PDB and thus simplify the BLAST analysis, all PDB protein sequences are first clustered into groups of closely related sequences (see the on-line help at the Sacch3D website for details; Table 1) before the BLAST is run. As of September 1998, 18% of yeast proteins have either a known structure or putative homolog in a clustered version of the PDB (Table 2).

The Sacch3D search utility provides a structural information page for all ORFs in the yeast genome (example shown in Fig. 2). This page contains information provided by both internal and external resources. A summary table is presented showing PDB structures for the yeast protein (if a structure can be identified) and proteins with which it shares significant sequence similarity. For each structure, there are links to a variety of freely available 3D viewers (Fig. 3) and external structural databases. 3D viewers include RasMol (10), WebMol Java viewer (11), Chime (MDL Information Systems, www.mdli.com) and Cn3D (12). External structural databases include PDB (8,9), SCOP (13,14), CATH (15), PDBsum (16), ModBase (17), Macromolecular Movements Database (18) and MMDB (19).

The PDB similarities are listed from best to worst (based on BLASTP P-value) and are clustered to facilitate browsing. That is, one representative structure is listed in cases where there are multiple variants of the same structure (mutants or complex forms). Access to the neighboring structures is also provided.
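The listing of one representative structure per cluster, ordered from best to worst, can be sketched as below. This is an illustration under stated assumptions rather than the Sacch3D code: the function name and the (pdb_id, cluster_id, p_value) input format are hypothetical, and choosing the member with the smallest P-value as the representative is an assumption, since the text does not say how the representative is selected.

```python
from collections import defaultdict


def representative_hits(hits):
    """Collapse clustered PDB hits to one representative per cluster.

    hits: iterable of (pdb_id, cluster_id, p_value) tuples.
    Returns a list of (pdb_id, p_value, neighbors), best P-value first,
    where neighbors are the other members of the same cluster.
    """
    best = {}                      # cluster_id -> (pdb_id, p_value)
    members = defaultdict(list)    # cluster_id -> all pdb_ids seen
    for pdb_id, cluster_id, p_value in hits:
        members[cluster_id].append(pdb_id)
        # Assumption: the representative is the hit with the lowest P-value.
        if cluster_id not in best or p_value < best[cluster_id][1]:
            best[cluster_id] = (pdb_id, p_value)

    ranked = sorted(best.items(), key=lambda item: item[1][1])
    return [
        (pdb_id, p_value, [m for m in members[cluster_id] if m != pdb_id])
        for cluster_id, (pdb_id, p_value) in ranked
    ]
```

Keeping the neighbor list alongside each representative mirrors the behavior described above, in which the shortened listing still gives access to the other structures in the cluster.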


Figure 2. Structural information page at Sacch3D. Structural information page for yeast GTP-binding protein GPA2/YER020W.


Figure 3. (A) A screen shot of the Java Viewer (11) showing the structure of a human single-stranded DNA-binding domain of the RPA70 subunit bound to ssDNA (PDB structure 1JMC; yeast homolog RFA1/YAR009C). Yellow, region of similarity to yeast protein; white, ssDNA; gray, region without similarity to yeast protein; red, disulfide bond. (B) A screen shot of the RasMol viewer (10) showing the human TATA-binding protein complex with TATA element DNA (PDB structure 1TGH; yeast homolog SPT15/YER148W). Gold, β sheets; red, α helices; thin white ribbon, coil; wide gray ribbon, DNA.

For yeast proteins without a known structure but with significant sequence similarity to proteins with structures contained within PDB, links are available on the structural information page for homology-based models of the yeast protein structure. These models are accessed by links to the external resources ModBase (17) and Swiss-Model (20). Even for yeast proteins that lack significant similarities in the PDB, a variety of useful links are presented. These include links to secondary structure predictions, several pre-computed BLAST reports, and the Emotif (21) and Pfam (22) sequence search programs. Links to Swiss-Prot, Entrez and the NCBI COGs site for the yeast protein are also included.

Table 2. Sacch3D summary statistics for protein-encoding ORFs

Category                                      N      Percent
Total yeast proteins                          6215   100
Identified genetically                        3086   50
With known 3D structure(s) (a)                52     0.8
With PDB homolog(s)                           1110   18
With PDB homolog and identified genetically   848    13

Data in this table are current as of the 2 September 1998 release of the PDB.
(a) Yeast proteins with known structures are current as of the 27 May 1998 release of the PDB.

Other Sacch3D features include: (i) flexible search options using yeast gene or ORF name, PDB identifier, Swiss-Prot identifier/accession, or text; (ii) a special page devoted to S.cerevisiae structures in PDB, showing the number of different structures for each yeast protein with links to SGD and Sacch3D; (iii) a structural URLs page, a list maintained by Sacch3D of web sites relevant to the analysis of protein structure and/or function, including links to structural biology resources, 3D viewers, genome-analysis web sites, journals and research groups; (iv) a domains page providing access to yeast proteins based on their SCOP-classified domains, where users can search for domains using a yeast gene/ORF name, a SCOP class number, or a SCOP fold number, and where links to the WebMol Java viewer (11) illustrate the location of the domain within the context of the 3D structure; (v) an analysis page that performs an electronic version of a Southern blot using the yeast genomic sequence; and (vi) a What's New page that lists new features in Sacch3D as well as new yeast protein structures and new protein structures with homology to yeast proteins.

FUTURE DIRECTIONS

Protein sequence analysis and structure prediction will continue to be updated. However, in the future the major new analysis features of SGD will be associated with the many types of functional genomic results that are beginning to be released.

CITING SGD

When referring to Sacch3D and the Genome-wide Protein Similarity View at SGD, please cite this publication.

REFERENCES

1. Cherry,J.M., Adler,C., Ball,C., Chervitz,S.A., Dwight,S.S., Hester,E.T., Jia,Y., Juvik,G., Roe,T., Schroeder,M., Weng,S. and Botstein,D. (1998) Nucleic Acids Res., 26, 73-79.

2. Dolinski,K., Ball,C., Chervitz,S.A., Dwight,S.S., Harris,M., Roberts,S., Roe,T., Cherry,J.M. and Botstein,D. (1998) Yeast, 14, in press.

3. Smith,T.F. and Waterman,M.S. (1981) J. Mol. Biol., 147, 195-197.

4. TimeLogic: http://www.timelogic.com

5. Gish,W. and Altschul,S.F. (1996) Methods Enzymol., 266, 460-490.

6. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) J. Mol. Biol., 215, 403-410.

7. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402.

8. Abola,E.E., Sussman,J.L., Prilusky,J. and Manning,N.O. (1997) Methods Enzymol., 277, 556-571.

9. Bernstein,F.C., Koetzle,T.F., Williams,G.J.B., Meyer,E.F.,Jr, Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535-542.

10. Sayle,R.A. and Milner-White,E.J. (1995) Trends Biochem. Sci., 9, 374.

11. Walther,D. (1997) Trends Biochem. Sci., 22, 274-275.

12. Hogue,C.W. (1997) Trends Biochem. Sci., 22, 314-316.

13. Murzin,A.G., Brenner,S.E., Hubbard,T. and Chothia,C. (1995) J. Mol. Biol., 247, 536-540.

14. Hubbard,T.J.P., Murzin,A.G., Brenner,S.E. and Chothia,C. (1997) Nucleic Acids Res., 25, 236-239.

15. Orengo,C.A., Michie,A.D., Jones,S., Jones,D.T., Swindells,M.B. and Thornton,J.M. (1997) Structure, 5, 1093-1108.

16. Laskowski,R.A., Hutchinson,E.G., Michie,A.D., Wallace,A.C., Jones,M.L. and Thornton,J.M. (1997) Trends Biochem. Sci., 22, 488-490.

17. Sánchez,R. and Sali,A. (1997) Proteins, Suppl. 1, 50-58.

18. Gerstein,M. and Krebs,W. (1998) Nucleic Acids Res., 26, 4280-4290.

19. Ohkawa,H., Ostell,J. and Bryant,S. (1995) ISMB, 3, 259-267.

20. Peitsch,M.C. (1996) Biochem. Soc. Trans., 24, 274-279.

21. Nevill-Manning,C.G., Wu,T.D. and Brutlag,D.L. (1998) Proc. Natl Acad. Sci. USA, 95, 5865-5871.

22. Sonnhammer,E.L., Eddy,S.R., Birney,E., Bateman,A. and Durbin,R. (1998) Nucleic Acids Res., 26, 320-322.


*To whom correspondence should be addressed. Tel: +1 650 723 7541; Fax: +1 650 723 7016; Email: cherry@genome.stanford.edu


Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.




