| Nucleic Acids Research | Pages |
Database of protein sequence alignments: PIR-ALN
ABSTRACT
INTRODUCTION
PIR-ALN is a database of curated and annotated protein sequence alignments derived from the PIR-International Protein Sequence Database. The database includes alignments of protein sequences as superfamilies, families and homology domains. Sequences belong to the same homeomorphic superfamily if they are homologous from end-to-end (1). Each superfamily is further classified into families containing sequences that are at least 45% identical. Many protein sequences are composed of a number of distinct functional regions or 'domains', or multiple copies of the same domain. The sequence segments corresponding to the same homology domain in two or more superfamilies are extracted and aligned to form the homology domain alignments in PIR-ALN.
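The 45% identity criterion for families can be illustrated with a small sketch. The text does not say how PIR computes percent identity, so this example assumes it is measured over aligned columns in which both sequences have a residue. The function names and the gap character are hypothetical.

```python
# Assumption: identity is counted over columns where neither sequence has a gap.
# The 45% threshold comes from the text; everything else here is illustrative.
FAMILY_IDENTITY_THRESHOLD = 45.0


def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Return percent identity between two rows of a pairwise alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def same_family(aligned_a: str, aligned_b: str,
                threshold: float = FAMILY_IDENTITY_THRESHOLD) -> bool:
    """True if the pair meets the family-level identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

Other definitions of percent identity (for example, dividing by the length of the shorter sequence) would give different values near the threshold, so the denominator matters for borderline pairs.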
DATA SELECTION
The selection of data is tied very closely to the task of classifying sequences into families and superfamilies based on sequence comparison (2). Closely related sequences are first clustered into families and then the families are clustered into superfamilies. As a first step in clustering the incoming sequences, our collaborators at the Munich Information Center for Protein Sequences (MIPS) dynamically maintain a database of FASTA scores from searching every sequence against the PIR-International Protein Sequence Database (3). High-scoring sequences are examined for percent identity and length of overlap. Software has been developed to determine if a new sequence belongs to an existing family or if a new family must be created. Using this approach, 95.2% of the sequences in PIR have been clustered at the family level. About 30% of the sequences fall into single-member families. For groups that have at least two members, multiple sequence alignments have been generated. The PROT-FAM database of family alignments is available for browsing and searching at MIPS, http://www.mips.biochem.mpg.de (4).
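The family-assignment decision described above might be sketched as follows. This is not the MIPS software. It is a minimal illustration that assumes each high-scoring hit carries a percent identity and an overlap length, and that a hit must cover most of both sequences to count. The `Hit` record, the `assign_family` name and the 0.8 coverage cut-off are all assumptions. Only the 45% identity figure comes from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hit:
    """One high-scoring database match for a new sequence (hypothetical record)."""
    family_id: str           # family of the matched database sequence
    percent_identity: float  # identity within the overlap
    overlap_length: int      # number of residues in the aligned overlap
    subject_length: int      # length of the matched database sequence


def assign_family(query_length: int, hits: Iterable[Hit],
                  min_identity: float = 45.0,
                  min_coverage: float = 0.8) -> Optional[str]:
    """Return the family of the best qualifying hit, or None for a new family.

    min_coverage is an assumed cut-off: the overlap must span this fraction
    of the longer of the two sequences.
    """
    best: Optional[Hit] = None
    for hit in hits:
        longer = max(query_length, hit.subject_length)
        coverage = hit.overlap_length / longer if longer else 0.0
        if hit.percent_identity >= min_identity and coverage >= min_coverage:
            if best is None or hit.percent_identity > best.percent_identity:
                best = hit
    return best.family_id if best else None
```

A return value of `None` corresponds to the case in which a new, possibly single-member, family must be created.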
Sequence families are then clustered into superfamilies, and family, superfamily and domain alignments are constructed at the PIR. An overview of the process is shown in Figure 1.
Figure 1. Flow of information to the PIR-ALN database.
The DOMAINDB database contains the sequence segments represented in all the homology domain alignments in PIR-ALN. This database is searched to screen new sequences for already defined homology domains. Once the sequences to be clustered are identified, the CLUSTALW (5) program is used to generate multiple sequence alignments. Since computer-generated alignments are not always biologically correct, ALNED, an interactive alignment editor, has been developed in-house to view, edit and update the alignments.
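As an illustration of the alignment step, the sketch below builds a CLUSTALW command line for a file of clustered protein sequences. The options actually used at PIR are not given in the text. The executable name, file names and function names are assumptions, and only the standard `-INFILE`, `-ALIGN`, `-TYPE` and `-OUTFILE` options are used.

```python
import shutil
import subprocess
from typing import List, Optional


def build_clustalw_command(infile: str, outfile: str,
                           executable: str = "clustalw") -> List[str]:
    """Build a command line for a full multiple alignment of protein sequences.

    The executable name is an assumption; some installations call it clustalw2.
    """
    return [
        executable,
        f"-INFILE={infile}",    # input sequences, e.g. in FASTA or PIR format
        "-ALIGN",               # perform a full multiple alignment
        "-TYPE=PROTEIN",        # treat the input as protein sequences
        f"-OUTFILE={outfile}",  # where to write the alignment
    ]


def run_clustalw(infile: str, outfile: str,
                 executable: str = "clustalw") -> Optional[int]:
    """Run the alignment if the program is installed; return its exit code."""
    if shutil.which(executable) is None:
        return None  # program not found on PATH, nothing is run
    command = build_clustalw_command(infile, outfile, executable)
    return subprocess.run(command, check=False).returncode
```

The resulting alignment would still need manual review, which is the role the text assigns to the ALNED editor.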
CURRENT STATUS AND DATABASE ACCESS
The alignments in PIR-ALN contain a selection of sequences both to keep the alignments at a reasonable size and to ensure that there is no bias towards a group that has many sequences. Superfamily alignments are made when the superfamily has at least two members from different families. The PIR-ALN database has alignments of each type of homology domain defined as a feature in the PIR Protein Sequence Database.
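The two selection rules above can be sketched in a few lines. The text does not say how representatives are chosen, so this example simply takes a fixed number of sequences per family in sorted order. The `Member` tuple and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Member = Tuple[str, str]  # (sequence_id, family_id), a hypothetical record


def group_by_family(members: Iterable[Member]) -> Dict[str, List[str]]:
    """Collect sequence identifiers under their family identifier."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for sequence_id, family_id in members:
        groups[family_id].append(sequence_id)
    return dict(groups)


def qualifies_for_superfamily_alignment(members: Iterable[Member]) -> bool:
    """Rule from the text: at least two members from different families."""
    return len(group_by_family(members)) >= 2


def pick_representatives(members: Iterable[Member],
                         per_family: int = 1) -> List[str]:
    """Take up to per_family sequences from each family (assumed policy),
    so that a large family does not dominate the alignment."""
    chosen: List[str] = []
    for _family_id, sequence_ids in sorted(group_by_family(members).items()):
        chosen.extend(sorted(sequence_ids)[:per_family])
    return chosen
```

Capping the number of sequences per family addresses both goals named in the text: the alignment stays at a manageable size, and no heavily sampled group biases it.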
For some superfamilies and homology domains with a large number of highly divergent sequences, several alignments containing representative sequences have been constructed. Examples include the immunoglobulin homology domain, the SH3 homology domain and the kinase-related transforming protein superfamily. Currently, PIR-ALN has over 3500 alignments, including more than 1000 superfamily alignments and more than 350 homology domain alignments.
The URLs for current statistics, documentation of different fields, sample entry and database access are included in Table 1. The alignment database has been integrated with the ATLAS multidatabase information retrieval system, developed at PIR, which provides full access to the data.
Table 1.
| Information | URL |
| Database description | http://www-nbrf.georgetown.edu/pir/alndb.html |
| Statistics | http://www-nbrf.georgetown.edu/cgi-bin/nbrfbase |
| PIR search page | http://www-nbrf.georgetown.edu/pir/searchdb.html |
| Superfamily document | http://www-nbrf.georgetown.edu/pir/doc/sfdef.html |
| PIR-ALN search page | http://www-nbrf.georgetown.edu/nbrf/getaln.html |
| ATLAS CD distribution | http://www-nbrf.georgetown.edu/pir/atcd.html |
| ATLAS manual | http://www-nbrf.georgetown.edu/pir/doc/atlas.html |
The quarterly and weekly updates of the PIR-ALN alignment database can be accessed via the WWW. PIR-ALN is included on the 'Atlas of Protein and Genomic Sequences' CD-ROM available from the PIR-International centers in the US, Europe and Japan. It can also be obtained by anonymous FTP from the PIR FTP site at nbrf.georgetown.edu, directory [anonymous.pir.alignment].
The PIR-ALN database can be accessed on the PIR Web site in two ways. From the PIR entry request page, a retrieved PIR sequence entry cross-references a PIR-ALN entry if the sequence is a member of that alignment. Alternatively, the alignments can be accessed directly through the PIR-ALN request page. The members and classification fields are hypertext linked to the PIR sequence database, so the user can move between the two databases.
ACKNOWLEDGEMENTS
This work has been supported by NLM grant LM05798. The authors thank PIR staff for their contributions of alignments to the database and Katie Sidman and Desiree Goins for administrative support and help with Web page design. Work by MIPS was supported by grants from the Bundesministerium f. Bildung, Forschung und Technologie (BMBF, FKZ 0311670, 01KW9703/7) and the European Commission (BIOCT-CT-96110).
REFERENCES
This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.