
Nucleic Acids Research Pages 257-259  


The SBASE protein domain library, release 6.0: a collection of annotated protein sequence segments
Introduction
Description Of The Data
   Definition of protein domains
   Source and origin of data
   Cross-references
   Record structure
Distribution And Access
   Distribution
   Access by WWW: record retrieval and BLAST search
   Citation
Acknowledgements
References


The SBASE protein domain library, release 6.0: a collection of annotated protein sequence segments

János Murvai1, Kristian Vlahovicek1, Endre Barta2, Csaba Szepesvári3, Cristina Acatrinei1 and Sándor Pongor1,2,*

1International Centre for Genetic Engineering and Biotechnology, Area Science Park, 34012 Trieste, Italy, 2ABC Institute for Biochemistry and Protein Research, 2100 Gödöllö, Hungary and 3Research Group on Artificial Intelligence, József Attila University, 6700 Szeged, Hungary

Received October 2, 1998; Accepted October 7, 1998

ABSTRACT

The sixth release of the SBASE library of protein domain sequences contains 130 703 annotated and cross-referenced entries corresponding to structural, functional, ligand-binding and topogenic segments of proteins. The entries were grouped based on standard names (2312 groups) and further classified on the basis of BLAST similarity (2463 clusters). Automated searching with BLAST and a new sequence-plot representation of local domain similarities are available at the WWW-server http://www.icgeb.trieste.it/sbase . A mirror site is at http://sbase.abc.hu/sbase . The database is freely available by anonymous `ftp' file transfer from ftp.icgeb.trieste.it.

INTRODUCTION

Detection of domains in newly determined sequences is usually based on pattern collections that contain consensus representations of domain types deduced from multiple alignments. Consensus descriptions come in different varieties, such as regular expressions, sequence profiles and hidden Markov models. Developing such a consensus description requires expertise and careful judgement, hence pattern collections can hardly keep pace with the flow of new genome data. Another problem is the inevitable statistical bias of the consensus: atypical domains, for which there are too few known examples, may not fit well with a consensus pattern developed from a large dataset of similar domains. Finally, there are domain types for which it is not easy to develop consensus representations because of weak similarity.

SBASE is a collection of protein domain sequences designed to facilitate the detection of domain homologies without the above problems (1,2). Here the method of domain recognition is database search rather than pattern search, so atypical and typical domains are equally well recognized. The underlying database, SBASE, is preprocessed by BLAST similarity search (3), and the similarity groups (which can best be pictured as densely connected graphs) form the basis of domain recognition.

The current release 6.0 of SBASE contains over 100 000 annotated protein sequence segments consistently named by structure, function, biased composition, binding-specificity and/or similarity to other proteins.

The main developments with respect to the previous release can be summarized as follows. (i) Release 6.0 contains 130 703 sequence entries, 63% more than release 5.0 (Table 1). (ii) All records are now provided with standard names, and an effort was made to use domain names that are also used by other sequence databases and pattern collections such as Prosite (4) and PFAM (5). (iii) The entries were grouped based on standard names (2312 groups), and those groups with at least three entries (1039 groups) were further classified on the basis of BLAST similarity. A total of 2463 clusters with at least three members are deposited in a separate database, SBASE-CLUSTERS, which is now available through anonymous ftp as well as through links on the WWW-server (a description of the clustering procedure is given at the web-site). Within each standard name group the clusters are numbered in such a way that clusters with more inter-member similarity have larger numbers. (iv) A new graphic output facility has been added to the server, whereby local domain similarity can be plotted along the sequence.
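The grouping and clustering outlined in point (iii) can be illustrated with a minimal sketch. This is not the procedure used to build SBASE-CLUSTERS (that procedure is described at the web-site and is not reproduced here). The sketch assumes that pairwise BLAST similarities have already been reduced to a list of entry pairs, that a cluster is a connected component of the resulting graph, and that "inter-member similarity" can be approximated by edge density. The function name `cluster_entries` and the entry identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_entries(entries, similar_pairs, min_size=3):
    """Group entries by standard name, then split each group into clusters.

    entries: dict mapping entry id -> standard name.
    similar_pairs: iterable of (id_a, id_b) pairs judged similar (assumed to
        come from a BLAST search; the cut-off is not specified here).
    Returns {standard name: [(cluster number, sorted member ids), ...]}.
    """
    by_name = defaultdict(set)
    for entry_id, name in entries.items():
        by_name[name].add(entry_id)

    # Undirected similarity graph, restricted to pairs sharing a standard name.
    neighbours = defaultdict(set)
    for a, b in similar_pairs:
        if a != b and entries.get(a) is not None and entries.get(a) == entries.get(b):
            neighbours[a].add(b)
            neighbours[b].add(a)

    def density(component):
        # Fraction of possible member pairs that are linked (assumed measure).
        n = len(component)
        if n < 2:
            return 0.0
        links = sum(len(neighbours[m] & component) for m in component)
        return links / (n * (n - 1))

    clusters = {}
    for name, members in by_name.items():
        if len(members) < min_size:
            continue  # groups with too few entries are not clustered
        seen, components = set(), []
        for start in sorted(members):
            if start in seen:
                continue
            stack, component = [start], set()
            while stack:  # depth-first walk over one connected component
                node = stack.pop()
                if node in seen:
                    continue
                seen.add(node)
                component.add(node)
                stack.extend(neighbours[node] - seen)
            if len(component) >= min_size:
                components.append(component)
        # Denser clusters receive larger numbers.
        components.sort(key=lambda c: (density(c), sorted(c)))
        if components:
            clusters[name] = [(i, sorted(c)) for i, c in enumerate(components, 1)]
    return clusters
```

Connected components are the simplest possible choice here; a stricter definition of "densely connected" would split loosely linked chains into separate clusters.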


Table 1. Increase of data in SBASE 6.0

DESCRIPTION OF THE DATA

Definition of protein domains

Domains included in SBASE are protein sequence segments with known structure and/or function. The main entry classes are summarized in Table 2. The boundaries of the domains are either as previously defined in the original publications or determined by homology to domains with known boundaries. In this release, the boundaries used by PFAM (5) were adopted for a number of domain types.


Table 2. Examples of domains in SBASE 6.0

Source and origin of data

SBASE data originate from three main sources: (i) from the SWISS-PROT protein sequence databank (6); (ii) from the Protein Sequence Database of the PIR International Protein sequence database (PIR) (7); and (iii) from the literature. From a total of 130 703 records in SBASE 6.0, 96 305 (73%), 27 089 (21%) and 6656 (5%) are of eukaryotic, prokaryotic and viral origin, respectively. Domain sizes vary in length between 5 and 1000 amino acids.

Redundancy of sequences in SBASE 6.0 is kept at a minimal level. In some cases, the domain definitions overlap.

Cross-references

SBASE 6.0 has cross-references to several protein and nucleic acid databanks, as well as to the PROSITE (4), PRINTS (8), PRODOM (9) and BLOCKS (10) databases (Table 3). In each record, the DR-lines contain the cross-reference data.
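As an illustration of how the DR-lines might be read, the following minimal sketch collects cross-references per target database. The exact layout of SBASE DR-lines is not reproduced in this text, so the sketch assumes the SWISS-PROT convention (line code `DR`, semicolon-separated fields, database name first, trailing full stop). The function name and the sample record are hypothetical.

```python
from collections import defaultdict


def parse_cross_references(record_text):
    """Return {database name: [list of identifier fields, ...]} from DR-lines."""
    refs = defaultdict(list)
    for line in record_text.splitlines():
        if not line.startswith("DR "):
            continue  # only DR-lines carry cross-references
        body = line[2:].strip().rstrip(".")
        fields = [f.strip() for f in body.split(";") if f.strip()]
        if fields:
            refs[fields[0]].append(fields[1:])
    return dict(refs)


# Hypothetical record fragment, for illustration only.
sample = """ID   HYPOTHETICAL_ENTRY
DR   SWISS-PROT; P00000; EXAMPLE_HUMAN.
DR   PROSITE; PS00000; EXAMPLE_PATTERN.
"""
print(parse_cross_references(sample))
```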


Table 3. Cross-references to other databases in SBASE

Record structure

The format of SBASE 6.0 (Fig. 1) follows that of the EMBL and SWISS-PROT databases and can be directly formatted under the GCG package. The field types used are listed in Table 4. The clusters to which a sequence belongs are determined by (i) the standard name and (ii) the (optional) subclass number included in the CL field, e.g. ANNEXINS/8 (the CE field of previous releases is now abandoned).
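A CL value such as ANNEXINS/8 can be split into its two parts with a few lines of code. The sketch below is illustrative only: it assumes that the subclass number, when present, is the purely numeric part after the last slash, and the function name `parse_cl_field` is hypothetical.

```python
def parse_cl_field(value):
    """Split a CL value into (standard name, subclass number or None)."""
    value = value.strip()
    head, sep, tail = value.rpartition("/")
    if sep and head and tail.isdigit():
        return head, int(tail)
    return value, None  # no subclass number given


print(parse_cl_field("ANNEXINS/8"))
print(parse_cl_field("ANNEXINS"))
```

Splitting at the last slash keeps the standard name intact should a name itself contain a slash.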


Figure 1. A sample entry from the SBASE 6.0 protein domain library. An annexin repeat domain. The underlined items are linked in the SBASE World Wide Web server so that the corresponding records can be viewed on the screen by `clicking' on them.


Table 4. Types of comment lines in SBASE 6.0 records

DISTRIBUTION AND ACCESS

Distribution

SBASE 6.0 (23 October, 1998) is distributed by anonymous `ftp' file transfer from ftp.icgeb.trieste.it. The complete database (including the records and the list of clusters) is 75 Mb; its compressed form is 8.3 Mb.
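An anonymous ftp session of this kind can be scripted with the Python standard library. The sketch below only lists the top-level directory of the server; the directory layout and file names of the distribution are not given in this article, so none are assumed, and the host may no longer be reachable. The function name `list_ftp_root` is hypothetical.

```python
from ftplib import FTP


def list_ftp_root(host="ftp.icgeb.trieste.it", timeout=30):
    """Log in anonymously and return the names in the server's root directory."""
    with FTP(host, timeout=timeout) as ftp:
        ftp.login()  # no arguments means an anonymous login
        return ftp.nlst()


if __name__ == "__main__":
    for name in list_ftp_root():
        print(name)
```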

Access by WWW: record retrieval and BLAST search

SBASE 6.0 and SBASE-CLUSTERS can be searched at the WWW-server http://base.icgeb.trieste.it/sbase and at the mirror site http://sbase.abc.hu/sbase . Record retrieval is with the SRS system. At present, cross-references to SBASE-CLUSTERS, EMBL, MEDLINE, MIM, PRINTS, PRODOM, PROSITE and SWISS-PROT can be directly accessed through the WWW-server. Prediction of domain homologies via BLAST searching is possible either by (i) running a search against SBASE, or (ii) running a search against SWISS-PROT and reprocessing the search output (11,12). In the output of the latter, local domain similarities are also graphically represented as a sequence-plot (Fig. 2).
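The idea behind a sequence-plot can be sketched as a per-residue score track for each domain name. How the server actually computes its plot is not described here; the sketch assumes that each BLAST hit is reduced to a domain name, a 1-based inclusive query range and a score, and that overlapping hits of the same domain contribute their maximum score. The function name and the example values are hypothetical.

```python
def similarity_profile(query_length, hits):
    """Return {domain name: list of per-residue scores along the query}.

    hits: iterable of (domain_name, start, end, score), with 1-based
    inclusive coordinates on the query sequence.
    """
    profile = {}
    for name, start, end, score in hits:
        track = profile.setdefault(name, [0] * query_length)
        # Clip the hit to the query and keep the best score per position.
        for pos in range(max(start, 1), min(end, query_length) + 1):
            track[pos - 1] = max(track[pos - 1], score)
    return profile


example = similarity_profile(10, [("CUB", 1, 3, 50), ("CUB", 2, 5, 80), ("EGF", 6, 8, 40)])
for name, track in example.items():
    print(name, track)
```

Plotting each track against residue position gives one curve per domain name, so that neighbouring domains appear as separate peaks along the query.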


Figure 2. (A) Graphic output of the domain similarity server (www.icgeb.trieste.it/sbase ) in response to the query sequence C1S_HUMAN from SWISS-PROT. The known domain structure of this query is CUB-EGF-CUB-SUSHI-SUSHI-SPR (where S = signal, P = propeptide, SPR = serine protease). The output shows the plot of the BLAST similarities along with the SBASE standard names. Arrows have been added to help identification in black and white (original is in color). (B) Output of the domain homology WWW server (www.icgeb.trieste.it/sbase ) in response to the annexin sequence shown in Figure 1 (detail). NSD: number of significant similarities found in the BLAST output; GN.: number of the given domain occurring in the database; Sum.Score: cumulative sum of BLAST scores belonging to a domain-name in the output; Overlap Max: maximum similarity score found (11). The server output contains alignments provided with annotations and a detailed explanation about evaluation (not shown).
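The summary columns named in the caption of Figure 2B can be illustrated with a short sketch that aggregates BLAST hits per domain name. It is not the evaluation procedure of the server (see ref. 11). It assumes each hit has been reduced to a domain name and a score, and it treats the significance cut-off `min_score` as a hypothetical parameter. GN. is omitted because it requires per-domain counts from the database.

```python
from collections import defaultdict


def summarise_hits(hits, min_score=0.0):
    """Aggregate (domain_name, score) hits into per-domain summary rows.

    Returns a list of (domain name, row) sorted by decreasing Sum.Score,
    where row holds NSD, Sum.Score and Overlap Max.
    """
    rows = defaultdict(lambda: {"NSD": 0, "Sum.Score": 0.0, "Overlap Max": 0.0})
    for name, score in hits:
        if score < min_score:
            continue  # hit below the assumed significance cut-off
        row = rows[name]
        row["NSD"] += 1                      # number of significant similarities
        row["Sum.Score"] += score            # cumulative score for this domain
        row["Overlap Max"] = max(row["Overlap Max"], score)
    return sorted(((n, dict(r)) for n, r in rows.items()),
                  key=lambda item: item[1]["Sum.Score"], reverse=True)


hits = [("ANNEXINS", 120), ("ANNEXINS", 95), ("EGF", 40), ("ANNEXINS", 10)]
for name, row in summarise_hits(hits, min_score=30):
    print(name, row)
```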

Citation

Users of SBASE and of the WWW/Email servers are asked to cite this article in their publications.

ACKNOWLEDGEMENTS

SBASE was established in 1990 and is maintained collaboratively by the International Centre for Genetic Engineering and Biotechnology, Trieste, Italy and the ABC Institute for Biochemistry and Protein Research, Gödöllö, Hungary. The authors gratefully acknowledge the support of EMBnet, the European Molecular Biology Network. The Protein Structure and Function Group is supported by EMBnet in the framework of EU grant ERBBIO4-CT96-0030. Work at ABC was supported by ICGEB collaborative research grant no. CRP/HUN9603.

REFERENCES

1. Pongor,S., Skerl,V., Cserzo,M., Hatsagi,Z., Simon,G. and Bevilacqua,V. (1993) Protein Engng., 6, 391-395.

2. Fabian,P., Murvai,J., Hatsagi,Z., Vlahovicek,K., Hegyi,H. and Pongor,S. (1997) Nucleic Acids Res., 25, 240-243.

3. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402.

4. Bairoch,A., Bucher,P. and Hofmann,K. (1996) Nucleic Acids Res., 24, 189-196.

5. Sonnhammer,E.L., Eddy,S.R., Birney,E., Bateman,A. and Durbin,R. (1998) Nucleic Acids Res., 26, 320-322.

6. Bairoch,A. and Apweiler,R. (1998) Nucleic Acids Res., 26, 38-42.

7. Barker,W.C., Garavelli,J.S., Haft,D.H., Hunt,L.T., Marzec,C.R., Orcutt,B.C., Srinivasarao,G.Y., Yeh,L.S.L., Ledley,R.S., Mewes,H.W., Pfeiffer,F. and Tsugita,A. (1998) Nucleic Acids Res., 26, 27-32.

8. Attwood,T.K., Beck,M.E., Flower,D.R., Scordis,P. and Selley,J.N. (1998) Nucleic Acids Res., 26, 304-308.

9. Corpet,F., Gouzy,J. and Kahn,D. (1998) Nucleic Acids Res., 26, 323-326.

10. Henikoff,S., Pietrokovski,S. and Henikoff,J.G. (1998) Nucleic Acids Res., 26, 309-312.

11. Murvai,J., Vlahovicek,K., Barta,E., Pfeiffer,F., Hegyi,H. and Pongor,S. (1998) Bioinformatics, in press.

12. Hegyi,H. and Pongor,S. (1993) Comput. Applic. Biosci., 9, 371-372.

13. Stoesser,G., Moseley,M.A., Sleep,J., McGowran,M., Garcia-Pastor,M. and Sterk,P. (1998) Nucleic Acids Res., 26, 8-15.

14. Bernstein,F.C., Koetzle,T.F., Williams,G.J., Meyer,E.E.,Jr, Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535-542.

15. Pearson,P., Francomano,C., Foster,P., Bocchini,C., Li,P. and McKusick,V. (1994) Nucleic Acids Res., 22, 3470-3473.

16. Flybase Consortium (1998) Nucleic Acids Res., 26, 85-88.

17. Rudd,K.E., Bouffard,G. and Miller,G. (1992) In Davies,K.E. and Tilghman,S.M. (eds), Genome Analysis. Cold Spring Harbor Laboratory Press, New York, pp. 1-38.

18. Myers,F. (1990) Human Retrovirus and Aids Database. Los Alamos National Laboratory, Los Alamos, NM, USA.

19. Roberts,R.J. and Macelis,D. (1998) Nucleic Acids Res., 26, 338-350.


*To whom correspondence should be addressed at: ICGEB, Area Science Park, 34012 Trieste, Italy. Tel: +39 040 375 7300; Fax: +39 040 226 555; Email: pongor@icgeb.trieste.it


This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Comments and feedback: www-admin{at}oup.co.uk
Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.




