| Nucleic Acids Research | Pages |
The SBASE protein domain library, release 6.0: a collection of annotated protein sequence segments
Introduction
Description Of The Data
Definition of protein domains
Source and origin of data
Cross-references
Record structure
Distribution And Access
Distribution
Access by WWW: record retrieval and BLAST search
Citation
Acknowledgements
References
ABSTRACT
INTRODUCTION
Detection of domains in newly determined sequences is usually based on pattern collections that contain consensus representations of domain types deduced from multiple alignments. Consensus descriptions come in different varieties such as regular expressions, sequence profiles, hidden Markov models, etc. Developing such a consensus description requires expertise and careful judgement, hence pattern collections can hardly keep pace with the flow of new genome data. Another problem is the inevitable statistical bias of the consensus: atypical domains, for which there are too few known examples, may not fit well with a consensus pattern developed from a large dataset of similar domains. Finally, there are domain types for which it is not easy to develop consensus representations because of weak similarity.
SBASE is a collection of protein domain sequences designed to facilitate the detection of domain homologies without the above problems (1,2). Here the method of domain recognition is database search rather than pattern search, so atypical and typical domains are equally well recognized. The underlying database, SBASE, is preprocessed by BLAST similarity search (3), and the similarity groups (which can best be pictured as densely connected graphs) form the basis of domain recognition.
The current release 6.0 of SBASE contains over 100 000 annotated protein sequence segments consistently named by structure, function, biased composition, binding-specificity and/or similarity to other proteins.
The main developments with respect to the previous release can be summarized as follows. (i) Release 6.0 contains 130 703 sequence entries, 63% more than release 5.0 (Table 1). (ii) All records are now provided with standard names, and an effort was made to use domain names that are also used by other sequence databases and pattern collections such as Prosite (4) and PFAM (5). (iii) The entries were grouped by standard name (2312 groups), and those groups with at least three entries (1039 groups) were further classified on the basis of BLAST similarity. A total of 2463 clusters with at least three members are deposited in a separate database, SBASE-CLUSTERS, which is now available through anonymous ftp as well as through links on the WWW-server (a description of the clustering procedure is given at the web-site). Within each standard name group the clusters are numbered in such a way that clusters with more inter-member similarity have larger numbers. (iv) A new graphic output facility has been added to the server, whereby local domain similarity can be plotted along the sequence.
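The grouping and numbering described in (iii) can be pictured with a small sketch. The actual clustering procedure is documented at the web-site and is not reproduced here; the Python code below is only an illustration which assumes that clusters are connected components of a thresholded similarity graph and that inter-member similarity is measured as the mean pairwise score. The function name `cluster_group`, the `threshold` value and the input layout are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def cluster_group(entries, scores, threshold=50.0, min_size=3):
    """Cluster the entries of one standard-name group (illustrative only).

    entries: list of entry identifiers.
    scores: dict mapping frozenset({a, b}) to a pairwise similarity score.
    threshold and min_size are assumed values, not taken from SBASE.
    """
    # Link two entries whenever their pairwise score passes the threshold.
    neighbours = defaultdict(set)
    for pair, score in scores.items():
        if score >= threshold:
            a, b = tuple(pair)
            neighbours[a].add(b)
            neighbours[b].add(a)

    # Collect connected components with a depth-first search.
    seen, clusters = set(), []
    for entry in entries:
        if entry in seen:
            continue
        seen.add(entry)
        stack, component = [entry], []
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in neighbours[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        # Keep only clusters with at least min_size members.
        if len(component) >= min_size:
            clusters.append(sorted(component))

    def mean_similarity(members):
        # Missing pairs count as zero similarity.
        pairs = list(combinations(members, 2))
        return sum(scores.get(frozenset(p), 0.0) for p in pairs) / len(pairs)

    # Number the clusters so that higher similarity gets a larger number.
    clusters.sort(key=mean_similarity)
    return {number: members for number, members in enumerate(clusters, start=1)}
```

Connected components are the loosest reading of "densely connected graphs"; a stricter sketch could require every member to be linked to most other members before accepting a cluster.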
DESCRIPTION OF THE DATA
Definition of protein domains
Domains included in SBASE are protein sequence segments with known structure and/or function. The main entry classes are summarized in Table 2. The boundaries of the domains are either as previously defined in the original publications or determined by homology to domains with known boundaries. In this release, the boundaries used by PFAM (5) were adopted for a number of domain types.
Source and origin of data
SBASE data originate from three main sources: (i) from the SWISS-PROT protein sequence databank (6); (ii) from the Protein Sequence Database of the PIR International Protein sequence database (PIR) (7); and (iii) from the literature. From a total of 130 703 records in SBASE 6.0, 96 305 (73%), 27 089 (21%) and 6656 (5%) are of eukaryotic, prokaryotic and viral origin, respectively. Domain sizes vary in length between 5 and 1000 amino acids.
Redundancy of sequences in SBASE 6.0 is kept at a minimal level. In some cases, the domain definitions overlap.
Cross-references
SBASE 6.0 has cross-references to several protein and nucleic acid databanks, as well as to the PROSITE (4), PRINTS (8), PRODOM (9) and BLOCKS (10) databases (Table 3). In each record, the DR-lines contain the cross-reference data.
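As an illustration of how the DR-lines might be read, the following Python sketch extracts cross-references from one record. It assumes a SWISS-PROT-style flat-file layout in which each cross-reference line starts with the line code `DR` followed by semicolon-separated fields; this layout, the function name `parse_dr_lines` and the identifiers in the usage example are assumptions, not taken from the SBASE format description.

```python
def parse_dr_lines(record_text):
    """Return (databank, identifiers) tuples from the DR-lines of one record.

    Assumes lines of the form 'DR   DATABANK; ID1; ID2.' (hypothetical layout).
    """
    references = []
    for line in record_text.splitlines():
        if not line.startswith("DR"):
            continue
        # Drop the two-letter line code, then split the rest on semicolons.
        fields = [field.strip().rstrip(".") for field in line[2:].split(";")]
        fields = [field for field in fields if field]
        if fields:
            references.append((fields[0], fields[1:]))
    return references
```

For an invented record containing the line `DR   PROSITE; PS00001; EXAMPLE_PATTERN.`, the function would return the databank name `PROSITE` together with the two identifiers that follow it.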
Record structure
The format of SBASE 6.0 (Fig.
DISTRIBUTION AND ACCESS
Distribution
SBASE 6.0 (23 October 1998) is distributed by anonymous ftp file transfer from ftp.icgeb.trieste.it. The complete database (including the records and the list of clusters) is 75 Mb; its compressed form is 8.3 Mb.
Access by WWW: record retrieval and BLAST search
SBASE 6.0 and SBASE-CLUSTERS can be searched at the WWW-server http://base.icgeb.trieste.it/sbase and at the mirror site http://sbase.abc.hu/sbase . Record retrieval is with the SRS system. At present, cross-references to SBASE-CLUSTERS, EMBL, MEDLINE, MIM, PRINTS, PRODOM, PROSITE and SWISS-PROT can be directly accessed through the WWW-server. Prediction of domain homologies via BLAST searching is possible either by (i) running a search against SBASE, or (ii) running a search against SWISS-PROT and reprocessing the search output (11,12). In the output of the latter, local domain similarities are also graphically represented as a sequence-plot (Fig.
Figure 2. (A) Graphic output of the domain similarity server (www.icgeb.trieste.it/sbase) in response to the query sequence C1S_HUMAN from SWISS-PROT. The known domain structure of this query is CUB-EGF-CUB-SUSHI-SUSHI-SPR (where S = signal, P = propeptide, SPR = serine protease). The output shows the plot of the BLAST similarities along with the SBASE standard names. Arrows have been added to help identification in black and white (original is in color). (B) Output of the domain homology WWW server (www.icgeb.trieste.it/sbase) in response to the annexin sequence shown in Figure 1 (detail). NSD: number of significant similarities found in the BLAST output; GN.: number of occurrences of the given domain in the database; Sum.Score: cumulative sum of BLAST scores belonging to a domain name in the output; Overlap Max: maximum similarity score found (11). The server output contains annotated alignments and a detailed explanation of the evaluation (not shown).
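The per-domain quantities named in the legend of Figure 2B can be pictured with a short Python sketch. This is not the server's code: it assumes the BLAST output has already been reduced to pairs of SBASE standard name and score, and it takes the legend's definitions at face value (the exact evaluation is described in ref. 11). The function name `summarize_hits` and the `min_score` cutoff are hypothetical, and GN. is omitted because it requires counts from the database itself.

```python
from collections import defaultdict


def summarize_hits(hits, min_score=0):
    """Summarize BLAST hits per domain name (illustrative only).

    hits: iterable of (domain_name, blast_score) pairs.
    min_score: assumed significance cutoff; hits below it are ignored.
    """
    # Group the scores of significant hits by domain name.
    by_domain = defaultdict(list)
    for name, score in hits:
        if score >= min_score:
            by_domain[name].append(score)

    rows = []
    for name, scores in by_domain.items():
        rows.append({
            "name": name,
            "NSD": len(scores),          # number of significant similarities
            "Sum.Score": sum(scores),    # cumulative sum of BLAST scores
            "Max": max(scores),          # maximum similarity score found
        })

    # List the domain names with the largest cumulative score first.
    rows.sort(key=lambda row: row["Sum.Score"], reverse=True)
    return rows
```

Sorting by cumulative score is a presentation choice made for this sketch; the ordering used in the actual server output is not stated in the text.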
Citation
Users of SBASE and of the WWW/Email servers are asked to cite this article in their publications.
ACKNOWLEDGEMENTS
SBASE was established in 1990 and is maintained collaboratively by the International Center for Genetic Engineering and Biotechnology, Trieste, Italy and the ABC Institute for Biochemistry and Protein Research, Gödöllö, Hungary. The authors gratefully acknowledge the support of EMBnet, the European Molecular Biology Network. The Protein Structure and Function Group is supported by EMBnet in the framework of EU grant ERBBIO4-CT96-0030. Work at ABC was supported by ICGEB collaborative research grant no. CRP/HUN9603.
REFERENCES
This article has been cited by other articles:
M. Novatchkova, G. Schneider, R. Fritz, F. Eisenhaber, and A. Schleiffer. DOUTfinder--identification of distant domain outliers using subsignificant sequence similarity. Nucleic Acids Res., July 1, 2006; 34(Web Server issue): W214-W218.
N. Antcheva, A. Pintar, A. Patthy, A. Simoncsits, E. Barta, B. Tchorbanov, and S. Pongor. Proteins of circularly permuted sequence present within the same organism: The major serine proteinase inhibitor from Capsicum annuum seeds. Protein Sci., November 1, 2001; 10(11): 2280-2290.
J. Murvai, K. Vlahovicek, E. Barta, B. Cataletto, and S. Pongor. The SBASE protein domain library, release 7.0: a collection of annotated protein sequence segments. Nucleic Acids Res., January 1, 2000; 28(1): 260-262.
This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Last modification: 9 Dec 1998
Copyright © Oxford University Press, 1998.