Nucleic Acids Research Pages 204-208  


PEDB: the Prostate Expression Database
Introduction
PEDB Data
   Sequence processing and curation
   EST assembly and clustering
   Annotation
Queries, Visualization And Analysis Tools
   Sequence homology searching
   Virtual expression analysis
Summary And Future Development
Acknowledgements
References


PEDB: the Prostate Expression Database

Victoria Hawkins1, David Doll1, Roger Bumgarner1, Todd Smith1,+, Chris Abajian1,+, Leroy Hood1 and Peter S. Nelson1,2,*

1Department of Molecular Biotechnology and 2Department of Medicine, University of Washington, Seattle, WA 98195, USA

Received September 1, 1998; Revised October 20, 1998; Accepted November 9, 1998

ABSTRACT

The Prostate Expression Database (PEDB) is a curated relational database and suite of analysis tools designed for the study of prostate gene expression in normal and disease states. Expressed Sequence Tags (ESTs) and full-length cDNA sequences derived from more than 40 human prostate cDNA libraries are maintained and represent a wide spectrum of normal and pathological conditions. Detailed library information, including tissue source, library construction methods, sequence diversity and abundance, is available in a library archive. Prostate ESTs are assembled into distinct species groups using the multiple alignment program CAP2 and are annotated with information from the GenBank, dbEST and Unigene public sequence databases. Annotated sequences in PEDB are searched using the BLAST algorithm. The differential expression of each EST species can be viewed across all libraries using a Virtual Expression Analysis Tool (VEAT), a graphical user interface written in Java for intra- and inter-library species comparisons. PEDB may be accessed via the World Wide Web at http://www.mbt.washington.edu/PEDB/

INTRODUCTION

Diseases of the prostate, including prostate carcinoma and benign prostatic hypertrophy, affect millions of individuals worldwide and contribute to significant morbidity and mortality. Estimates from the American Cancer Society indicate that more than 180 000 American men will be diagnosed with prostate cancer, and 39 000 will die of this malignancy in 1998 (1). Growing evidence implicates molecular genetic alterations in the development and progression of prostate cancer (2,3). Altered expression levels for individual genes or cohorts of genes may thus serve as leads for functional studies, diagnostics, therapeutic targets and potentially as predictors of clinical behavior.

The human genome is estimated to comprise approximately 100 000 genes (4). In order to confer developmental and functional specificity, only a fraction of this total is active in a given cell type at a given time, and these expressed genes result in the specific phenotype exhibited by the cell. As tumor cells are phenotypically different from their normal counterparts, their set of expressed genes, defined here as the transcriptome, differs both qualitatively (different genes expressed) and quantitatively (different levels of expression). A thorough understanding of which genes are expressed, to what extent, and under what conditions would provide insight into the process of normal homeostasis and the process of carcinogenesis. The systematic and automated categorization of Expressed Sequence Tags (ESTs) by clustering and annotation provides a method to rapidly define a tissue or cellular transcriptome (5,6). A comparison of expression profiles between normal and pathological states can be used to identify differentially expressed genes that may provide insights into normal and abnormal physiology.

The Prostate Expression Database (PEDB) is an effort to establish a transcriptome of the human prostate that can be utilized by investigators studying normal and neoplastic prostate development. PEDB is a curated resource composed of ESTs produced from cDNA libraries representing a wide spectrum of normal, benign and malignant prostate disease states. Detailed library information, including tissue source, library construction methods, sequence diversity and abundance, is maintained in a relational database management system (RDBMS). Prostate ESTs are assembled into distinct species groups using the multiple alignment program CAP2 (7), and annotated with information from the GenBank (8), dbEST (9) and Unigene (10) public sequence databases. The primary user work sites involve: (i) database queries with nucleotide sequence information using the BLAST algorithm (11) to find homologous sequences in PEDB that could be useful in extending and further defining the user's sequence; and (ii) virtual expression analysis using a graphical user interface to perform intra- and inter-library sequence abundance comparisons. PEDB also provides links to other relevant WWW resources involving prostate disease, cancer biology and genomics.

PEDB DATA

Sequence processing and curation

Analysis scripts, written in the Perl 5 interpreted programming language, work in conjunction with an RDBMS (Oracle version 7.2) to automate a pipeline of sequence submission, masking, clustering and annotation tasks (Fig. 1). ESTs derived from the human prostate gland are obtained from GenBank via Entrez batch search (http://www.ncbi.nlm.nih.gov/Entrez/batch.html), the NCI Cancer Genome Anatomy Project (CGAP; 12), The Institute for Genomic Research (TIGR; http://www.tigr.org), and the University of Washington. ESTs and relevant information are maintained in the RDBMS. PEDB presently comprises more than 50 000 sequences from 42 distinct cDNA libraries (Fig. 2).


Figure 1. PEDB sequence processing. Fasta files of sequences are submitted and masked for vector, E.coli and repetitive DNA sequences. Masked sequences are clustered and annotated for viewing and analysis with the PEDB WWW tools.


Figure 2. The library archive shows a dynamically generated list of libraries with their descriptions and EST totals. Detailed library source and construction information is linked to each library.

Each EST is examined for sequence homology to cloning vectors, Escherichia coli and repetitive DNA sequence using a core program called AnalDemon (http://www.mbt.washington.edu/PEDB/software). AnalDemon first employs Cross_Match (http://bozeman.mbt.washington.edu/phrap.docs/general.html), a program based on the Smith-Waterman-Gotoh algorithm, to screen for vector contamination. ESTs are then compared with the complete E.coli genome, again using Cross_Match. Finally, ESTs are examined for interspersed repeats and regions of low sequence complexity using RepeatMasker (http://ftp.genome.washington.edu/RM/RepeatMasker.html). Analysis results are stored in the RDBMS and are used to produce quality reports describing sequence contamination and artifacts. Regions of EST sequences with homology to vector, E.coli, or interspersed repeats are masked, and thus are not available for subsequent sequence assemblies. Elimination of these regions speeds the automated clustering and annotation process, and reduces the occurrence of erroneous assemblies.
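The effect of the masking step can be illustrated with a short Python sketch. This is not AnalDemon itself: the function names, the use of `X` as the mask character and the 0-based, half-open interval convention are assumptions made for the example. In the pipeline, the intervals would come from the Cross_Match and RepeatMasker reports.

```python
def mask_regions(seq, regions, mask_char="X"):
    """Return seq with every (start, end) interval replaced by mask_char.

    Intervals are 0-based and half-open (an assumed convention) and may
    overlap or run past the end of the sequence.
    """
    bases = list(seq)
    for start, end in regions:
        start = max(start, 0)
        end = min(end, len(bases))
        for i in range(start, end):
            bases[i] = mask_char
    return "".join(bases)


def unmasked_length(seq, mask_char="X"):
    """Count the bases that survived masking."""
    return sum(1 for base in seq if base != mask_char)


# Hypothetical hits: a vector match at the 5' end and a repeat further along.
est = "ACGTACGTACGTACGT"
masked = mask_regions(est, [(0, 4), (10, 13)])
print(masked, unmasked_length(masked))
```

Masking in place, rather than trimming, keeps the original coordinates of each EST intact, so later reports can still refer to positions in the submitted sequence.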

EST assembly and clustering

CAP2 (7), a multiple alignment program based on a variant of the Smith-Waterman algorithm, is used for sequence assembly. Clustering is based on maximal scoring overlapping alignments and allows for general substitutions from sequencing errors, insertions and deletions. CAP2 produces a consensus sequence and allows varying sensitivity and overlap parameters. Unlike phrap (http://bozeman.mbt.washington.edu/phrap.docs/general.html), CAP2 does not depend heavily on quality information and is desirable in cases where original sequence data is difficult to obtain. The program also provides alignments of the assembled sequences for visual inspection. A pre-sort step allows for rapid clustering.
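For readers unfamiliar with the underlying algorithm, the following Python sketch computes a textbook Smith-Waterman local alignment score with a linear gap penalty. It is not CAP2's variant, and the scoring parameters (`match`, `mismatch`, `gap`) are illustrative assumptions rather than values taken from CAP2.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Textbook Smith-Waterman with a linear gap penalty; only the previous
    row of the dynamic-programming matrix is kept to save memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor of 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


# Two hypothetical reads sharing the core "ACGT".
print(smith_waterman_score("GGACGTCC", "TTACGTAA"))
```

Because the score never drops below zero, unrelated flanking sequence does not penalize a good overlap, which is the property that makes local alignment suitable for detecting overlapping ESTs.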

Following masking by AnalDemon, the cohort of ESTs with >100 bp of unmasked sequence is extracted and assembled. Assembling all ESTs from all libraries together ensures that more sequences will be correctly classified and assigned to the appropriate species ID even if they do not receive an annotation to a sequence in the public databases. Each group or cluster of ESTs exhibiting significant homology with one another is termed a species. Thus, a species is a sequence or group of sequences that is unique relative to the nucleotide sequence of other groups of sequences, and each is given a unique PEDB species identification number (SID). The SID provides a means to perform gene expression analyses across the entire assembly set, and can be used to provide a library-by-library species-specific differential expression profile.
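The relationship between clusters, SIDs and per-library abundance can be sketched in Python as follows. This is an illustration only: CAP2 performs the actual assembly, whereas the sketch simply groups ESTs that are linked by a precomputed list of pairwise overlaps using a union-find structure. All names (`assign_species_ids`, `abundance_by_library`, the EST and library labels) are hypothetical.

```python
from collections import Counter, defaultdict


def assign_species_ids(est_ids, overlaps):
    """Group ESTs linked by overlaps and give each group a numeric SID."""
    parent = {est: est for est in est_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in overlaps:
        parent[find(a)] = find(b)

    sid_of_root = {}
    sids = {}
    for est in est_ids:  # input order keeps SID numbering deterministic
        root = find(est)
        if root not in sid_of_root:
            sid_of_root[root] = len(sid_of_root) + 1
        sids[est] = sid_of_root[root]
    return sids


def abundance_by_library(sids, library_of):
    """Count ESTs per species within each library."""
    profile = defaultdict(Counter)
    for est, sid in sids.items():
        profile[library_of[est]][sid] += 1
    return profile


ests = ["e1", "e2", "e3", "e4"]
sids = assign_species_ids(ests, [("e1", "e2"), ("e3", "e1")])
libraries = {"e1": "normal", "e2": "normal", "e3": "tumor", "e4": "tumor"}
print(sids)
print(dict(abundance_by_library(sids, libraries)))
```

An EST with no overlaps still receives its own SID, which mirrors the point made above that unannotated singletons remain countable species.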

Annotation

Each distinct species from the assembly is annotated by searching the Unigene (ncbi.nlm.nih.gov in /pub/schuler/unigene), GenBank (ncbi.nlm.nih.gov/blast/db/nt.Z), and EST databases (ncbi.nlm.nih.gov/blast/db/est.Z) using BLASTN (http://blast.wustl.edu). Annotations are assigned automatically using the program SmartBlast (http://www.mbt.washington.edu/PEDB/software), which selects the database match with the lowest P value and the highest BLAST score, subject to a maximum P value of 1e-20 and a minimum BLAST score of 500. A small number of species required manual reconciliation when two distinct PEDB species were annotated either with the same public database ID, or with the same ID in one public database and a different ID in another. Figure 3 depicts the annotation results of PEDB ESTs.


Figure 3. Venn diagram of species annotation against the GenBank, Unigene and dbEST public databases. The 49 046 prostate ESTs assemble to yield 18 504 distinct clusters or species. Annotation against public nucleic acid sequence databases is shown in the diagram indicating the number of species with homology to sequences in one or more databases, or having no annotation.
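The annotation selection rule described in this section can be sketched in Python. This is not SmartBlast: the `Hit` record, the function name and the tie-breaking order (lowest P value first, then highest score) are assumptions, and the stated P value cut-off is read here as 1e-20.

```python
from typing import NamedTuple, Optional, Sequence


class Hit(NamedTuple):
    db_id: str      # identifier of the matched public database entry
    p_value: float  # BLAST P value of the match
    score: float    # BLAST score of the match


MAX_P_VALUE = 1e-20  # maximum acceptable P value (as read from the text)
MIN_SCORE = 500      # minimum acceptable BLAST score (from the text)


def best_annotation(hits: Sequence[Hit]) -> Optional[Hit]:
    """Return the hit used for annotation, or None if no hit qualifies."""
    eligible = [h for h in hits
                if h.p_value <= MAX_P_VALUE and h.score >= MIN_SCORE]
    if not eligible:
        return None
    # Assumed ordering: lowest P value wins, highest score breaks ties.
    return min(eligible, key=lambda h: (h.p_value, -h.score))


# Hypothetical hits for one species.
hits = [
    Hit("entry_A", 1e-30, 650),
    Hit("entry_B", 1e-30, 900),
    Hit("entry_C", 1e-5, 1200),  # fails the P value threshold
]
print(best_annotation(hits))
```

Returning `None` when nothing passes both thresholds corresponds to the species reported as having no annotation in Figure 3.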

QUERIES, VISUALIZATION AND ANALYSIS TOOLS

Sequence homology searching

A BLAST server has been established at the PEDB site to allow homology comparisons between an investigator's pasted or uploaded query sequence and the annotated ESTs archived in PEDB. BLAST results include links to GenBank that display the GenBank file for the public sequence to which the species is annotated. A BLAST interface for the Unigene database is also available for sequence homology queries.

Virtual expression analysis

Dynamic gene expression profiles based upon the EST assembly and annotation information are generated, viewed and manipulated using the Virtual Expression Analysis Tool (VEAT). VEAT was written in the Java 1.1 programming language with initial zooming and plotting capabilities provided through a graphing package called PTPLOT (http://ptolemy.eecs.berkeley.edu/java/ptplot/index.html). VEAT provides a virtual inter- and intra-library analysis of transcript abundance, diversity and differential expression. Libraries or groups of libraries from normal prostate, primary carcinoma, metastatic carcinoma, or with other specified attributes are selected individually or grouped for comparative analysis. The species abundances for each library are aligned to produce a map of expression that is easily expanded or contracted for more or less detail. Different viewing options isolate unique or common species between libraries or groups of libraries. Thresholds dictating the level of expression differences to display are selected by the viewer. Gene species can also be analyzed for differential expression across all libraries or selected groups (Fig. 4).


Figure 4. Viewing gene expression profiles with VEAT. The abundance profile of ESTs derived from a group of normal prostate cDNA libraries (blue diamond) is compared with that of ESTs from one metastatic prostate cDNA library (green triangle). The x axis segregates each species by PEDB identification number. The y axis displays the species abundance based on the number of times a specific EST appears in the library. One datapoint is selected (red circle) with the corresponding species annotation provided in the bottom panel consisting of the cluster ID, Frequency, Unigene Annotation and GenBank Annotation. Buttons on the bottom right link directly to public database entries. The expression for the individual species across all PEDB libraries is obtained using the 'Expression' button, resulting in a histogram display (inset).

Individual data points, representing a species of interest, may be selected to reveal annotation and library information in a text box below the plot. If the species of interest is reported as similar to a known sequence in GenBank, then the GenBank report for that sequence can be requested by selecting the GenBank button on an adjacent panel. Selecting the PEDB button will show the library and nucleotide sequence information for that species, including a breakdown of all ESTs assembled to produce the species consensus sequence. A detailed help section is available for orienting first-time users to the features of the VEAT interface.
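The kind of comparison VEAT performs can be approximated by the following Python sketch. It is not VEAT's code (VEAT is a Java application): the function names, the dictionary representation of a library group and the use of a raw count difference as the threshold are all assumptions. No normalization for library size or statistical test is applied.

```python
def differential_species(group_a, group_b, min_difference=2):
    """Species whose EST counts differ between two groups.

    group_a and group_b map SID -> EST count. Returns {sid: (count_a,
    count_b)} for species whose counts differ by at least min_difference.
    """
    result = {}
    for sid in sorted(set(group_a) | set(group_b)):
        a = group_a.get(sid, 0)
        b = group_b.get(sid, 0)
        if abs(a - b) >= min_difference:
            result[sid] = (a, b)
    return result


def unique_species(group_a, group_b):
    """SIDs observed in group_a but absent from group_b."""
    return sorted(sid for sid, n in group_a.items()
                  if n > 0 and group_b.get(sid, 0) == 0)


# Hypothetical abundance profiles keyed by SID.
normal = {1: 10, 2: 3, 3: 1}
metastatic = {1: 2, 2: 3, 4: 5}
print(differential_species(normal, metastatic))
print(unique_species(normal, metastatic))
```

Raising `min_difference` plays the role of the viewer-selected threshold described above: fewer species are displayed, and only those with larger abundance differences remain.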

SUMMARY AND FUTURE DEVELOPMENT

PEDB serves as a centralized archive of gene expression information derived from the human prostate organized in a fashion suitable for sequence-based queries, assessment of gene expression diversity and comparative expression analysis. In addition to continually adding new sequence data, future enhancements to PEDB focus on expanding the comparative analysis capabilities across other tissue types, providing an interactive cluster or contig assembly viewer, adding statistical validation to expression analyses and, ultimately, developing a corresponding database of prostate-specific protein expression information. The database is available via the World Wide Web at http://www.mbt.washington.edu/PEDB/. We encourage users of PEDB to cite this paper as the primary reference.

ACKNOWLEDGEMENTS

We thank collaborators in the CaPCURE Genetics Consortium for helpful advice, Xiaoqiu Huang for the CAP2 program, Matthew Huang and Mark Adams for providing prostate ESTs, the University of Washington Molecular Biotechnology sequencing group for sequencing support, Michel Schummer and Harold Miller for software and database advice, Jeff Gerber for software development and Victor Ng for critical reading of the manuscript. This work was supported by the CaPCURE Foundation and a grant (K08 CA75173-01A1) from the National Cancer Institute (to P.S.N.).

REFERENCES

1. Landis,S.H., Murray,T., Bolden,S. and Wingo,P.A. (1998) CA Cancer J. Clin., 48, 6-29.

2. Isaacs,W.B., Bova,G.S., Morton,R.A., Bussemakers,M.J.G., Brooks,J.D. and Ewing,C.M. (1994) Seminars Oncol., 21, 514-521.

3. Visakorpi,T., Hyytinen,E., Koivisto,P., Tanner,M., Keinanen,R., Palmberg,C., Palotie,A., Tammela,T., Isola,J. and Kallioniemi,O.-P. (1995) Nature Genet., 9, 401-406.

4. Fields,C., Adams,M.D., White,O. and Venter,J.C. (1994) Nature Genet., 7, 345-346.

5. Nelson,P.S., Ng,W.L., Schummer,M., True,L.D., Liu,A.Y., Bumgarner,R.E., Ferguson,C., Dimak,A. and Hood,L. (1998) Genomics, 47, 12-25.

6. Adams,M.D., Kerlavage,A.R., Fleischman,R.D., Fuldner,R.A., Bult,C.J., Lee,N.H., Kirkness,E.F., Weinstock,K.G., Gocayne,J.D. and White,O. (1995) Nature, 377 (Supp), 3-174.

7. Huang,X. (1996) Genomics, 33, 21-31.

8. Benson,D.A., Boguski,M.S., Lipman,D.J., Ostell,J. and Ouellette,B.F.F. (1998) Nucleic Acids Res., 26, 1-7.

9. Boguski,M.S., Lowe,T.M.J. and Tolstoshev,C.M. (1993) Nature Genet., 4, 332-333.

10. Schuler,G.D. (1997) J. Mol. Med., 75, 694-698.

11. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) J. Mol. Biol., 215, 403-410.

12. Strausberg,R.L., Dahl,C.A. and Klausner,R.D. (1997) Nature Genet., 15, 415-416.


*To whom correspondence should be addressed at: Department of Molecular Biotechnology, Box 357730 HSB K360, University of Washington, Seattle, WA 98195, USA. Tel: +1 206 221 4195; Fax: +1 206 685 7301; Email: psnels@u.washington.edu
+Present address: Geospiza Inc., Seattle, WA 98107, USA


This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Comments and feedback: www-admin{at}oup.co.uk
Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.



This article has been cited by other articles:


S. R. Plymate, K. Haugk, I. Coleman, L. Woodke, R. Vessella, P. Nelson, R. B. Montgomery, D. L. Ludwig, and J. D. Wu
An Antibody Targeting the Type I Insulin-like Growth Factor Receptor Enhances the Castration-Induced Response in Androgen-Dependent Prostate Cancer
Clin. Cancer Res., November 1, 2007; 13(21): 6429 - 6439.

S. H. Nagaraj, R. B. Gasser, and S. Ranganathan
A hitchhiker's guide to expressed sequence tag (EST) analysis
Brief Bioinform, January 1, 2007; 8(1): 6 - 21.

J. D. Wu, K. Haugk, I. Coleman, L. Woodke, R. Vessella, P. Nelson, R. B. Montgomery, D. L. Ludwig, and S. R. Plymate
Combined In vivo Effect of A12, a Type 1 Insulin-Like Growth Factor Receptor Antibody, and Docetaxel against Prostate Cancer Tumors.
Clin. Cancer Res., October 15, 2006; 12(20): 6153 - 6160.

C. Bavik, I. Coleman, J. P. Dean, B. Knudsen, S. Plymate, and P. S. Nelson
The Gene Expression Program of Prostate Fibroblast Senescence Modulates Neoplastic Epithelial Cell Proliferation through Paracrine Mechanisms
Cancer Res., January 15, 2006; 66(2): 794 - 802.

L.-C. Li, H. Zhao, H. Shiina, C. J. Kane, and R. Dahiya
PGDB: a curated and integrated database of genes related to the prostate
Nucleic Acids Res., January 1, 2003; 31(1): 291 - 293.

M. J. Bonham, A. Galkin, B. Montgomery, W. L. Stahl, D. Agus, and P. S. Nelson
Effects of the Herbal Extract PC-SPES on Microtubule Dynamics and Paclitaxel-Mediated Prostate Tumor Growth Inhibition
J Natl Cancer Inst, November 6, 2002; 94(21): 1641 - 1647.

P. S. Nelson, N. Clegg, H. Arnold, C. Ferguson, M. Bonham, J. White, L. Hood, and B. Lin
The program of androgen-responsive genes in neoplastic prostate epithelium
PNAS, September 3, 2002; 99(18): 11890 - 11895.

L. D. True, K. Buhler, J. Quinn, E. Williams, P. S. Nelson, N. Clegg, J. A. Macoska, T. Norwood, A. Liu, W. Ellis, et al.
A Neuroendocrine/Small Cell Prostate Carcinoma Xenograft--LuCaP 49
Am. J. Pathol., August 1, 2002; 161(2): 705 - 715.

M. Bonham, H. Arnold, B. Montgomery, and P. S. Nelson
Molecular Effects of the Herbal Compound PC-SPES: Identification of Activity Pathways in Prostate Carcinoma
Cancer Res., July 15, 2002; 62(14): 3920 - 3924.

P. S. Nelson, C. Pritchard, D. Abbott, and N. Clegg
The human (PEDB) and mouse (mPEDB) Prostate Expression Databases
Nucleic Acids Res., January 1, 2002; 30(1): 218 - 220.

A. Akalin, L. W. Elmore, H. L. Forsythe, B. A. Amaker, E. D. McCollum, P. S. Nelson, J. L. Ware, and S. E. Holt
A Novel Mechanism for Chaperone-mediated Telomerase Regulation during Prostate Cancer Progression
Cancer Res., June 1, 2001; 61(12): 4791 - 4796.

B. Lin, J. T. White, C. Ferguson, S. Wang, R. Vessella, R. Bumgarner, L. D. True, L. Hood, and P. S. Nelson
Prostate Short-Chain Dehydrogenase Reductase 1 (PSDR1): A New Member of the Short-Chain Steroid Dehydrogenase/Reductase Family Highly Expressed in Normal and Neoplastic Prostate Epithelium
Cancer Res., February 1, 2001; 61(4): 1611 - 1618.

B. A. Yoshida, M. M. Sokoloff, D. R. Welch, and C. W. Rinker-Schaeffer
Metastasis-Suppressor Genes: a Review and Perspective on an Emerging Field
J Natl Cancer Inst, November 1, 2000; 92(21): 1717 - 1730.

B. Lin, J. T. White, C. Ferguson, R. Bumgarner, C. Friedman, B. Trask, W. Ellis, P. Lange, L. Hood, and P. S. Nelson
PART-1: A Novel Human Prostate-specific, Androgen-regulated Gene that Maps to Chromosome 5q12
Cancer Res., February 1, 2000; 60(4): 858 - 863.

P. S. Nelson, N. Clegg, B. Eroglu, V. Hawkins, R. Bumgarner, T. Smith, and L. Hood
The Prostate Expression Database (PEDB): status and enhancements in 2000
Nucleic Acids Res., January 1, 2000; 28(1): 212 - 213.

B. Lin, C. Ferguson, J. T. White, S. Wang, R. Vessella, L. D. True, L. Hood, and P. S. Nelson
Prostate-localized and Androgen-regulated Expression of the Membrane-bound Serine Protease TMPRSS2
Cancer Res., September 1, 1999; 59(17): 4180 - 4184.
