Skip Navigation

This Article
Right arrow Abstract Freely available
Right arrow Print PDF (102K) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (89)
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Barker, W. C.
Right arrow Articles by Wu, C.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Barker, W. C.
Right arrow Articles by Wu, C.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Nucleic Acids Research, 2000, Vol. 28, No. 1 41-44
© 2000 Oxford University Press

The Protein Information Resource (PIR)

Winona C. Barker*, John S. Garavelli, Hongzhan Huang, Peter B. McGarvey, Bruce C. Orcutt, Geetha Y. Srinivasarao, Chunlin Xiao, Lai-Su L. Yeh, Robert S. Ledley, Joseph F. Janda, Friedhelm Pfeiffer1, Hans-Werner Mewes1, Akira Tsugita2 and Cathy Wu

Protein Information Resource, National Biomedical Research Foundation, 3900 Reservoir Road, NW, Washington, DC 20007, USA, 1GSF-Forschungszentrum für Umwelt und Gesundheit, Munich Information Center for Protein Sequences am Max-Planck-Instut für Biochemie, Am Klopferspitz 18, D-82152 Martinsried, Germany and 2Japan International Protein Information Database, Amakubo 1-16-1, Tsukuba 305-0005, Japan

Received October 1, 1999; Accepted October 4, 1999.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 THE PIR-INTERNATIONAL PROTEIN...
 THE PIR SEARCH AND...
 AVAILABILITY
 REFERENCES
 
The Protein Information Resource (PIR) produces the largest, most comprehensive, annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Sequence Database (JIPID). The expanded PIR WWW site allows sequence similarity and text searching of the Protein Sequence Database and auxiliary databases. Several new web-based search engines combine searches of sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. New capabilities for searching the PIR sequence databases include annotation-sorted search, domain search, combined global and domain search, and interactive text searches. The PIR-International databases and search tools are accessible on the PIR WWW site at http://pir.georgetown.edu and at the MIPS WWW site at http://www.mips.biochem.mpg.de . The PIR-International Protein Sequence Database and other files are also available by FTP.


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 THE PIR-INTERNATIONAL PROTEIN...
 THE PIR SEARCH AND...
 AVAILABILITY
 REFERENCES
 
The accelerating pace of genome sequencing projects has greatly increased the volume and complexity of available molecular data. To realize the fullest possible value from the data and to gain a better understanding of the genome, databases and the computational tools for analyzing them are required to allow biologically relevant features in the sequences to be identified and to provide insight on their structure and function. For over 30 years, the Protein Information Resource (PIR) has been providing the scientific community with databases and tools for the organization and analysis of protein sequence data (1,2). Together with MIPS and JIPID, we have undertaken a major restructuring to meet the challenges presented by the rapid growth of largely uncharacterized sequence data and the opportunities provided by the nearly universal access of scientists to the resources available on the WWW. Among the key developments are complete protein family organization for the PIR-International Protein Sequence Database (PSD) and integrated WWW interfaces for user-friendly sequence analysis, database searching and information retrieval.


    THE PIR-INTERNATIONAL PROTEIN DATABASES
 TOP
 ABSTRACT
 INTRODUCTION
 THE PIR-INTERNATIONAL PROTEIN...
 THE PIR SEARCH AND...
 AVAILABILITY
 REFERENCES
 
PIR, MIPS and JIPID constitute the PIR-International consortium that maintains the PIR-International Protein Sequence Database (PSD), the largest publicly distributed and freely available protein sequence database. The database has the following distinguishing features.

• It is a comprehensive, annotated, and non-redundant protein sequence database, containing over 142 000 sequences as of September 1999. Included are sequences from the completely sequenced genomes of 16 prokaryotes, six archaebacteria, 17 viruses and phages, >100 eukaryote organelles and Saccharomyces cerevisiae.

• The collection is well organized with >99% of entries classified by protein family and >57% classified by protein superfamily.

• PSD annotation includes concurrent cross-references to other sequence, structure, genomic and citation databases, including the public nucleic acid sequence databases ENTREZ, MEDLINE, PDB, GDB, OMIM, FlyBase, MIPS/Yeast, SGD/Yeast, MIPS/Arabidopsis and TIGR. Where these databases are publicly and freely accessible and provide suitable WWW access, the cross-references presented on the PIR WWW site are hot-linked so that searchers can consult the most current data.

• The PIR is the only sequence database to provide context cross-references between its own database entries. These cross-references assist searchers in exploring relationships such as subunit associations in molecular complexes, enzyme–substrate interactions, activation and regulation cascades, as well as in browsing entries with shared features and annotations.

• Interim updates are made publicly available on a weekly basis, and full releases have been published quarterly since 1984.

In addition to the PSD, PIR-International distributes or provides WWW access to other sequence and auxiliary databases (Table 1), briefly described below, and maintains several internal data collections used for sequence annotation and integrity checks.


View this table:
[in this window]
[in a new window]
 
Table 1. PIR-International sequence and auxiliary databases
 
• PATCHX (3) is a non-redundant database assembled by MIPS of publicly available protein sequences not yet in the PIR-International PSD. PIR+PATCHX, a combination of the PSD and PATCHX containing ~300 000 sequences available for similarity searches, is the most complete non-redundant collection of protein sequences available in the public domain.

• ARCHIVE is a database of protein sequences as originally reported in a publication or submission, the only such collection of ‘as published’ unmerged sequences.

• NRL_3D (4) sequence-structure database is produced from sequence and annotation in the Protein Data Bank (PDB) of three-dimensional structures (5).

• FAMBASE is a collection of representative sequences from each protein family that can be used in a similarity search to reduce search time and improve sensitivity for identifying distant families.

• PIR-ALN (6) is a curated database of sequence alignments of superfamilies, families and homology domains, with annotation information derived from PSD and consensus patterns calculated from the alignments.

• RESID (7) is a database of post-translational modifications with descriptive, chemical, structural and bibliographic information based on feature information in the PSD.

• ProClass (8) is a protein family database that organizes non-redundant PIR-International PSD and SWISS-PROT sequences according to PIR superfamilies and PROSITE patterns.

• ProtFam (9) is a curated database of homology clusters with automatically generated multiple sequence alignments for families, superfamilies and homology domains.

To support both data management and data mining and assist knowledge discovery, the PIR databases are being migrated to an object-relational database management system. A three-tier network computing, architecture provides a framework for distributed object computing and Java-based WWW interfaces connect with the database server for database query and update tasks.

Superfamily and family classification
The pioneering work of Margaret Dayhoff on protein classification based on the superfamily concept (10,11) was refined by PIR-International (12) to assist database organization and molecular evolution studies. Central to the organization and annotation of the PIR-International sequence and auxiliary databases are the protein family relationships which are structured at three levels: (i) superfamilies and families (for full-length sequence similarity), (ii) homology domains (for local functional or structural units), and (iii) motifs (for functional or structural sites). PIR-International has maintained the highest classification rate and provided the most comprehensive classification and alignments of proteins among all major public domain databases.

To deal efficiently with the many new sequences from genome sequencing projects, procedures for family and superfamily classification have been automated. Over 99% of sequences are routinely classified shortly after entry into the database into protein families of sequences that are at least 45% identical. Subsequently, entries are further clustered into regular superfamilies of sequences that share end-to-end homology (but may be rather distantly related) and also into domain superfamilies of proteins sharing at least one common homology domain. There are currently >76 000 sequences in >8900 superfamilies, and 30 000 entries with 370 recognized homology domains in the PSD. Corresponding to the classification are 1500 superfamily, 2100 family and 400 domain alignments in the PIR-ALN database, and 15 000 family and 4500 superfamily alignments in the MIPS ProtFam database. Also available from PIR is the ProClass protein family database containing 92 000 classified entries as well as 1300 motif alignments of ~44 000 PIR-International PSD entries.


    THE PIR SEARCH AND ANALYSIS SYSTEM
 TOP
 ABSTRACT
 INTRODUCTION
 THE PIR-INTERNATIONAL PROTEIN...
 THE PIR SEARCH AND...
 AVAILABILITY
 REFERENCES
 
The PIR search and analysis system provides search engines of three types (Table 2): (i) interactive text-based search engines, which allow Boolean queries of text fields; (ii) standard sequence similarity search engines, including Peptide Match, Pattern Match, BLAST, FASTA, Pairwise Alignment and Multiple Alignment; and (iii) advanced search engines that combine sequence similarity and annotation searches or evaluate gene family relationships, including Annotation-Sorted Similarity Search, Domain Search, Global and Domain Similarity Search and GeneFIND. Sequence searching can be performed against the PSD, NRL_3D, PATCHX, FAMBASE, ProClass and the combined PSD+PATCHX collections. Text and entry searching is provided for the PSD, NRL_3D, PIR-ALN and RESID databases.


View this table:
[in this window]
[in a new window]
 
Table 2. PIR Search and Analysis systems
 
Sequence search and alignment
BLAST (13) and FASTA (14) searches for sequence similarity are available for all sequence databases. The output of these search engines employs a graphical interface showing location of hits within the query sequence and full-length alignments generated by SSEARCH (15). Multiple or pairwise alignments of PSD or user-supplied sequences can be done using CLUSTALW (16) or SSEARCH. PIR pattern or peptide matching programs can (i) match a query sequence against a database of regular expressions (i.e., patterns); (ii) search a user-specified regular expression against a sequence database; or (iii) find an exact match for a user-specified peptide sequence in one of the sequence databases, including the ARCHIVE database of ‘as published’ sequences.

PIR Similarity Search system
Combining sequence and annotation search, the Annotation-Sorted Similarity Search facility displays BLAST or FASTA matches along with the user-selected annotation (superfamily, family, species, taxonomic group, keyword or all five) in the annotation-sorted order. The matched entries can be selected for multiple alignments against the query sequence using CLUSTALW and displayed using MView (17).

The Domain Similarity Search engine uses FASTA to search against domain sequences compiled from the PIR-International PSD, and displays the PSD entry and domain annotation with a graphical representation of the matched region with links to domain alignments in PIR-ALN. The Global and Domain Similarity Search uses BLAST to search the PSD for global similarity and FASTA to search the domain sequence collection for local similarity. The results are ranked by the global score and show the extent of matches at both the global and domain levels. Any combination of complete sequences and domains can be selected and viewed in a multiple alignment.

The PIR Integrated Environment for Sequence Analysis provides an integrated environment for all above protein analysis tasks, including sequence similarity search, pattern and peptide match, multiple sequence alignment and advanced PIR similarity searches, as well as for entry retrieval by unique superfamily, family, title, species, taxonomic group, domains or keywords.

GeneFIND (18) provides protein family classification and information retrieval by combining several search/alignment tools and the ProClass database in a multi-level filter system, including the MOTIFIND neural networks, BLAST search, SSEARCH sequence alignment, motif pattern matching, hidden Markov motif modeling and CLUSTALW multiple motif alignment.


    AVAILABILITY
 TOP
 ABSTRACT
 INTRODUCTION
 THE PIR-INTERNATIONAL PROTEIN...
 THE PIR SEARCH AND...
 AVAILABILITY
 REFERENCES
 
PIR provides free public access to value-added protein information through its WWW site at http://pir.georgetown.edu and direct file transfer at ftp://nbrfa.georgetown.edu/pir . In addition to the databases (Table 1) and search tools (Table 2), the PIR WWW site also provides associated metadata, including technical bulletins and documentation that serves as metadata dictionaries for the PIR-International PSD. Accessible from the PIR anonymous FTP site are PIR-International databases and many other documents, files and software tools, including the weekly interim updates of the PSD (in NBRF format) and the corresponding sequence file (in FASTA format). The PIR-International PSD quarterly releases (in both NBRF and CODATA formats) are also available at the NCBI FTP server. Other sites and data depositories do not always have the most recent quarterly release of the PSD.


    ACKNOWLEDGEMENTS
 
PIR is a registered mark of NBRF. The work at NBRF is supported by grant number P41 LM05978 from the National Library of Medicine and by gifts from COMPAQ, Pfizer and Dupont. The work at MIPS is supported by the Federal Ministry of Education, Science, Research and Technology (BMBF, FKZ 03311670, 01KW9703/7), the Max-Planck-Society and the European Commission (BIO4-CT96-0110, 0338,0558).


    FOOTNOTES
 
* To whom correspondence should be addressed. Tel. +1 202 687 2121; Fax: +1 202 687 1662; Email: pirmail@nbrf.georgetown.edu Back


    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 THE PIR-INTERNATIONAL PROTEIN...
 THE PIR SEARCH AND...
 AVAILABILITY
 REFERENCES
 

    1 Dayhoff,M.O., Eck,R.V., Chang,M.A. and Sochard,M.R. (1965) Atlas of Protein Sequence and Structure, Vol. 1. National Biomedical Research Foundation, Silver Spring, MD.

    2 Dayhoff,M.O. (1979) Atlas of Protein Sequence and Structure, Vol. 5, Supplement 3. National Biomedical Research Foundation, Washington, DC.

    3 Barker,W.C., George,D.G., Mewes,H.-W., Pfeiffer,F. and Tsugita,A. (1993) Nucleic Acids Res., 21, 3089–3092.[Abstract/Free Full Text]

    4 Pattabiraman,N., Namboodiri,K., Lowrey,A. and Gaber,B.P. (1990) Protein Seq. Data Anal., 3, 387–405.[Medline]

    5 Abola,E.E., Manning,N.O., Prilusky,J., Stampf,D.R. and Sussman,J.L. (1996) Res. Natl Stand. Technol., 101, 231–241.

    6 Srinivasarao,G.Y., Yeh,L.-S., Marzec,C.R., Orcutt,B.C. and Barker,W.C. (1999) Bioinformatics, 15, 382–390.[Abstract/Free Full Text]

    7 Garavelli,J.S. (2000) Nucleic Acids Res., 28, 209–211 (this issue).[Abstract/Free Full Text]

    8 Wu,C., Xiao,C. and Huang,H. (2000) Nucleic Acids Res., 28, 273–276 (this issue).[Abstract/Free Full Text]

    9 Mewes,H.W., Frishman,D., Haase,D., Kaps,A., Lemcke,K., Mannhaupt,G., Pfeiffer,F., Schüller C., Stocker,S. and Weil,B. (2000) Nucleic Acids Res., 28, 37–40 (this issue).[Abstract/Free Full Text]

    10 Dayhoff,M.O. (1976) Fed. Proc., 35, 2132–2138.[ISI][Medline]

    11 Dayhoff,M.O., McLaughlin,P.J., Barker,W.C. and Hunt,L.T. (1975) Naturwissenschaften, 62, 154–161.

    12 Barker,W.C., Pfeiffer,F. and George,D. (1996) Methods Enzymol., 266, 59–71.[ISI][Medline]

    13 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.[Abstract/Free Full Text]

    14 Pearson,W.R. and Lipman,D.J. (1988) Proc. Natl Acad. Sci. USA, 85, 2444–2448.[Abstract/Free Full Text]

    15 Smith,T.F. and Waterman,M.S. (1981) Adv. Appl. Math., 2, 482–489.

    16 Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673–4680.[Abstract/Free Full Text]

    17 Brown,N.P., Leroy,C. and Sander,C. (1998) Bioinformatics, 14, 380–381.[Abstract/Free Full Text]

    18 Wu,C.H., Huang,H. and Shivakumar,S. (1999) Int. J. Artificial Intelligence Tools, in press.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Appl. Environ. Microbiol.Home page
A. Saha, A. Sharma, A. Dhar, B. Bhattacharyya, S. Roy, and S. K. Das Gupta
Antagonists of Hsp16.3, a Low-Molecular-Weight Mycobacterial Chaperone and Virulence Factor, Derived from Phage-Displayed Peptide Libraries
Appl. Envir. Microbiol., November 1, 2005; 71(11): 7334 - 7344.
[Abstract] [Full Text] [PDF]


Home page
Mol Cancer ResHome page
A. R. Meloni, C.-H. Lai, T.-P. Yao, and J. R. Nevins
A Mechanism of COOH-Terminal Binding Protein-Mediated Repression
Mol. Cancer Res., October 1, 2005; 3(10): 575 - 583.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. Ruepp, A. Zollner, D. Maier, K. Albermann, J. Hani, M. Mokrejs, I. Tetko, U. Guldener, G. Mannhaupt, M. Munsterkotter, et al.
The FunCat, a functional annotation scheme for systematic classification of proteins from whole genomes
Nucleic Acids Res., October 14, 2004; 32(18): 5539 - 5545.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
D. L. Wheeler, D. M. Church, S. Federhen, A. E. Lash, T. L. Madden, J. U. Pontius, G. D. Schuler, L. M. Schriml, E. Sequeira, T. A. Tatusova, et al.
Database resources of the National Center for Biotechnology
Nucleic Acids Res., January 1, 2003; 31(1): 28 - 33.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
D. Frishman, M. Mokrejs, D. Kosykh, G. Kastenmuller, G. Kolesov, I. Zubrzycki, C. Gruber, B. Geier, A. Kaps, K. Albermann, et al.
The PEDANT genome database
Nucleic Acids Res., January 1, 2003; 31(1): 207 - 211.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
D. L. Wheeler, D. M. Church, A. E. Lash, D. D. Leipe, T. L. Madden, J. U. Pontius, G. D. Schuler, L. M. Schriml, T. A. Tatusova, L. Wagner, et al.
Database resources of the National Center for Biotechnology Information: 2002 update
Nucleic Acids Res., January 1, 2002; 30(1): 13 - 16.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
M. M. Gromiha, H. Uedaira, J. An, S. Selvaraj, P. Prabakaran, and A. Sarai
ProTherm, Thermodynamic Database for Proteins and Mutants: developments in version 3.0
Nucleic Acids Res., January 1, 2002; 30(1): 301 - 302.
[Abstract] [Full Text] [PDF]


Home page
Protein Sci.Home page
N. Antcheva, A. Pintar, A. Patthy, A. Simoncsits, E. Barta, B. Tchorbanov, and S. Pongor
Proteins of circularly permuted sequence present within the same organism: The major serine proteinase inhibitor from Capsicum annuum seeds
Protein Sci., November 1, 2001; 10(11): 2280 - 2290.
[Abstract] [Full Text] [PDF]


Home page
Genome Res.Home page
J. Murvai, K. Vlahovicek, C. Szepesvari, and S. Pongor
Prediction of Protein Functional Domains from Sequences Using Artificial Neural Networks
Genome Res., August 1, 2001; 11(8): 1410 - 1417.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
P. Bertone, Y. Kluger, N. Lan, D. Zheng, D. Christendat, A. Yee, A. M. Edwards, C. H. Arrowsmith, G. T. Montelione, and M. Gerstein
SPINE: an integrated tracking database and data mining approach for identifying feasible targets in high-throughput structural proteomics
Nucleic Acids Res., July 1, 2001; 29(13): 2884 - 2898.
[Abstract] [Full Text] [PDF]


Home page
Protein Sci.Home page
S. Jayasinghe, K. Hristova, and S. H. White
MPtopo: A database of membrane protein topology
Protein Sci., February 1, 2001; 10(2): 455 - 458.
[Abstract] [Full Text]


Home page
Nucleic Acids ResHome page
D. L. Wheeler, D. M. Church, A. E. Lash, D. D. Leipe, T. L. Madden, J. U. Pontius, G. D. Schuler, L. M. Schriml, T. A. Tatusova, L. Wagner, et al.
Database resources of the National Center for Biotechnology Information
Nucleic Acids Res., January 1, 2001; 29(1): 11 - 16.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
K. A. T. Silverstein, E. Shoop, J. E. Johnson, A. Kilian, J. L. Freeman, T. M. Kunau, I. A. Awad, M. Mayer, and E. F. Retzel
The MetaFam Server: a comprehensive protein family resource
Nucleic Acids Res., January 1, 2001; 29(1): 49 - 51.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
J. Murvai, K. Vlahovicek, E. Barta, and S. Pongor
The SBASE protein domain library, release 8.0: a collection of annotated protein sequence segments
Nucleic Acids Res., January 1, 2001; 29(1): 58 - 60.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
H. Kong, L.-F. Lin, N. Porter, S. Stickel, D. Byrd, J. Posfai, and R. J. Roberts
Functional analysis of putative restriction-modification system genes in the Helicobacter pylori J99 genome
Nucleic Acids Res., September 1, 2000; 28(17): 3216 - 3223.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
J. S. Garavelli
The RESID Database of protein structure modifications: 2000 update
Nucleic Acids Res., January 1, 2000; 28(1): 209 - 211.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
H. Huang, C. Xiao, and C. H. Wu
ProClass protein family database
Nucleic Acids Res., January 1, 2000; 28(1): 273 - 276.
[Abstract] [Full Text] [PDF]


Home page
Genome Res.Home page
D. Zhuo, W. D. Zhao, F. A. Wright, H.-Y. Yang, J.-P. Wang, R. Sears, T. Baer, D.-H. Kwon, D. Gordon, S. Gibbs, et al.
Assembly, Annotation, and Integration of UNIGENE Clusters into the Human Genome Draft
Genome Res., May 1, 2001; 11(5): 904 - 918.
[Abstract] [Full Text]


This Article
Right arrow Abstract Freely available
Right arrow Print PDF (102K) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (89)
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Barker, W. C.
Right arrow Articles by Wu, C.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Barker, W. C.
Right arrow Articles by Wu, C.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?