Nucleic Acids Research, 2000, Vol. 28, No. 1 260-262
© 2000 Oxford University Press

The SBASE protein domain library, release 7.0: a collection of annotated protein sequence segments

János Murvai¹, Kristian Vlahovicek¹, Endre Barta², Bruno Cataletto¹ and Sándor Pongor¹,²,*

¹International Centre for Genetic Engineering and Biotechnology, Area Science Park, 34012 Trieste, Italy and ²ABC Institute for Biochemistry and Protein Research, 2100 Gödöllö, Hungary

Received September 29, 1999; Revised and Accepted October 15, 1999.


    ABSTRACT
 
SBASE 7.0 is the seventh release of the SBASE protein domain sequence library. It contains 237 937 annotated structural, functional, ligand-binding and topogenic segments of proteins, cross-referenced to all major sequence databases and sequence pattern collections. The entries are clustered into 1811 groups and are provided with two WWW-based search facilities for on-line use. SBASE 7.0 is freely available by anonymous ‘ftp’ file transfer from ftp.icgeb.trieste.it. Automated searching of SBASE with BLAST can be carried out with the WWW servers http://www.icgeb.trieste.it/sbase/ and http://sbase.abc.hu/sbase/.


    INTRODUCTION
 
Prediction of domains is usually based on pattern collections that contain consensus representations of domain types deduced from multiple alignments. Building consensus representations of sequences (such as consensus sequences, regular expressions, sequence profiles or hidden Markov models) requires human expertise and careful judgement, hence pattern collections can hardly keep pace with the flow of new genome data. Another problem is the inevitable statistical bias of consensus representations: reliable multiple alignments require a good number of domain examples and, as a consequence, atypical domains for which too few examples are known may be difficult to recognize. Finally, there are domain types for which it is not easy to develop consensus representations because the similarity among their members is weak.

SBASE is a collection of protein domain sequences designed to facilitate detection of domain homologies without the above problems (1,2). The method of domain recognition is database search rather than pattern search, so atypical and typical domains are equally well recognized (3). The central concept is the ‘similarity group’, i.e. a group of domain sequences that have BLAST similarity to each other. One can distinguish tight and loose groups depending on how many significant similarity connections exist, on average, between the members. Briefly, a new sequence is considered a member of a given group if its similarity parameters (3) are above the threshold levels automatically established for that group, and if it has no sequential overlap with any other domain group. Validated domain groups, i.e. the 1550 groups that satisfy these criteria, are deposited in SBASE-A; these are the well-known structural and functional domain types. SBASE-B contains 261 groups that are either (i) less well characterized than the groups of SBASE-A, or (ii) defined by composition (e.g. glycine-rich) or cellular location (e.g. transmembrane). These groups are sometimes defined in an overlapping manner, e.g. an extracellular domain (SBASE-B) may contain an EGF-module (SBASE-A).
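The membership rule described above can be illustrated with a minimal sketch. It is not the SBASE implementation: a single BLAST score stands in for the ‘similarity parameters’ of (3), the per-group thresholds are supplied by the caller, and the overlap test is read as ‘the matched region must not overlap a region that qualifies for a different group’. The `Hit` type, the group names and all numbers are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    group: str    # domain group of the matched database segment (hypothetical names)
    score: float  # BLAST score, used here as the only similarity parameter (assumption)
    start: int    # first matched residue on the query, 1-based inclusive
    end: int      # last matched residue on the query, inclusive

def overlaps(a, b):
    """True if the two matched regions share at least one residue."""
    return a.start <= b.end and b.start <= a.end

def assign_groups(hits, thresholds):
    """Keep hits that pass their group's threshold and do not overlap
    a passing hit that belongs to a different group."""
    passing = [h for h in hits
               if h.group in thresholds and h.score >= thresholds[h.group]]
    accepted = []
    for h in passing:
        clash = any(o.group != h.group and overlaps(h, o) for o in passing)
        if not clash:
            accepted.append(h)
    return accepted

# Hypothetical query with four candidate matches.
hits = [
    Hit("EGF", 120.0, 10, 45),
    Hit("KRINGLE", 90.0, 40, 120),   # overlaps the EGF region
    Hit("SH3", 200.0, 150, 210),
    Hit("SH2", 10.0, 300, 400),      # below its threshold
]
thresholds = {"EGF": 50.0, "KRINGLE": 50.0, "SH3": 100.0, "SH2": 60.0}
accepted = assign_groups(hits, thresholds)
```

In this toy input only the SH3 match survives: the SH2 match falls below its threshold, and the EGF and KRINGLE regions overlap each other. A real assignment would presumably resolve such conflicts using the full set of similarity parameters rather than discarding both.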

The current release 7.0 of SBASE contains over 230 000 annotated protein sequence segments consistently named by structure, function, biased composition, binding-specificity and/or similarity to other proteins.

The main developments with respect to the previous release (release 6.0) can be summarized as follows:

(i) Release 7.0 contains 237 937 sequence entries, 82% more than release 6.0 (Table 1).


 
Table 1. Increase of data in SBASE release 7.0
 
(ii) The entries were grouped by standard names and further classified on the basis of the BLAST similarity scores. The list of all clusters having at least two members is deposited in a separate database, SBASE-CLUSTERS, which is now available through anonymous ftp as well as through links on the WWW-server. A total of 1811 domain groups were found, of which 1550 validated groups (1936 clusters) are in SBASE-A and 261 groups (382 clusters) are in SBASE-B. The clusters are identified by the standard name and by the (optional) subclass number included in the SC field. The CL and CE fields of previous releases have been abandoned.


    DESCRIPTION OF THE DATA
 
Definition of protein domains
Domains included in SBASE are protein sequence segments with known structure and/or function. The main entry classes are summarized in Table 2. The boundaries of the domains are either as previously defined in the original publications or determined by homology to domains with known boundaries such as given in the PROT-FAM (4) and in the PFAM databases (5).


 
Table 2. Examples of domains in SBASE 7.0
 
Source and origin of data
SBASE data originate from three main sources: (i) from the SWISS-PROT protein sequence databank (6); (ii) from the Protein Sequence Database of the Protein Identification Resource (PIR International) (7); and (iii) from the literature. The sequences are either translated from nucleotide sequence databases (8,9) or directly keyed in at the protein level. From a total of 237 937 records in SBASE 7.0, 136 367 (57%), 53 307 (22.4%) and 38 083 (20.6%) are of eukaryotic, prokaryotic and viral origin, respectively. Domain sizes vary in length between 5 and 1000 amino acids.

Cross-references
SBASE 7.0 has cross-references to several protein and nucleic acid databanks, as well as to the PROSITE (10), PRINTS (11), ProDom (12), BLOCKS (13) and PFAM (5) domain databases, the Protein Structure Data Bank (14) and the database of human Mendelian inheritance (15) (Table 3). In each record, the DR-lines contain the cross-reference data.


 
Table 3. Cross-references to other databases in SBASE
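As an illustration of how the DR-lines might be read, the sketch below assumes that they follow the SWISS-PROT flat-file convention (line code `DR`, then semicolon-separated fields ending in a period, with the database name first and the primary identifier second). The exact SBASE layout is not given here, so this is an assumption; the sample record and all identifiers in it are invented placeholders.

```python
def parse_dr_lines(record_text):
    """Collect primary cross-reference identifiers per database from DR-lines.

    Assumes SWISS-PROT style lines such as 'DR   PROSITE; PS00000; NAME.'
    Returns a dict mapping database name to a list of primary identifiers.
    """
    xrefs = {}
    for line in record_text.splitlines():
        if not line.startswith("DR "):
            continue
        # Drop the line code, the trailing period, then split on semicolons.
        body = line[2:].strip().rstrip(".")
        fields = [f.strip() for f in body.split(";") if f.strip()]
        if len(fields) < 2:
            continue  # malformed line: no identifier after the database name
        database, primary_id = fields[0], fields[1]
        xrefs.setdefault(database, []).append(primary_id)
    return xrefs

# Invented record fragment; the identifiers are placeholders, not real entries.
sample = """ID   EXAMPLE_SEGMENT
DR   SWISS-PROT; P00000; EXAMPLE_HUMAN.
DR   PROSITE; PS00000; EXAMPLE_1.
DR   PROSITE; PS00001; EXAMPLE_2.
DR   BROKEN.
"""
xrefs = parse_dr_lines(sample)
```

Only the primary identifier is kept per line; secondary fields could be retained by storing `fields[1:]` instead.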
 
Record structure
The format of SBASE 7.0 follows that of the EMBL and SWISS-PROT databases and can be directly formatted under the GCG program package using (16).


    DISTRIBUTION AND ACCESS
 
Distribution
SBASE 7.0 (6 October, 1999) is distributed by anonymous ‘ftp’ file transfer from ftp.icgeb.trieste.it. The complete database, including the records and the list of clusters, is 221 Mb; its compressed form is 16.3 Mb.
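An anonymous ftp session of the kind described can be scripted with Python's standard `ftplib` module. The sketch below only lists the files in a directory. The directory that holds the SBASE files is not stated in the text, so it is left as a parameter, and the `ftp_factory` argument is a hypothetical hook that lets the function be exercised without a network connection.

```python
from ftplib import FTP

def list_remote_files(host, directory, ftp_factory=FTP):
    """Log in anonymously, change to `directory` and return its sorted listing."""
    ftp = ftp_factory(host)      # ftplib.FTP connects when given a host
    try:
        ftp.login()              # no arguments: anonymous login
        ftp.cwd(directory)
        return sorted(ftp.nlst())
    finally:
        ftp.quit()

# Offline stand-in for ftplib.FTP, used only to demonstrate the call sequence.
class FakeFTP:
    def __init__(self, host):
        self.host = host
        self.closed = False
    def login(self):
        return "230 ok"
    def cwd(self, directory):
        return "250 ok"
    def nlst(self):
        return ["release.Z", "README"]
    def quit(self):
        self.closed = True

listing = list_remote_files("ftp.example.org", "/pub", ftp_factory=FakeFTP)
```

Against the real server the call would take the form `list_remote_files("ftp.icgeb.trieste.it", directory)`, with `directory` set to wherever the release is stored, assuming the host is reachable.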

BLAST search by WWW-server
SBASE 7.0 can be searched by the BLAST program using the WWW-server http://www.icgeb.trieste.it/sbase. A related server was created in order to assign SBASE domain homologies on the basis of BLAST searches performed on the SWISS-PROT database and on the PIR International databases (7). This service (available at http://www.abc.hu/blast.html and at domain@abc.hu) returns the best potential domain homologies ranked according to BLAST score.
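Ranking by BLAST score can be sketched as follows. This is an illustration rather than the server's code: it assumes that each hit has already been reduced to a (domain name, score) pair and that only the best score per domain is reported. The domain names and scores are made up.

```python
def rank_domain_hits(hits, top_n=None):
    """Return (domain, best_score) pairs sorted by descending score.

    `hits` is an iterable of (domain_name, blast_score) pairs; when a domain
    occurs more than once, only its highest score is kept.
    """
    best = {}
    for name, score in hits:
        if name not in best or score > best[name]:
            best[name] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return ranked if top_n is None else ranked[:top_n]

# Made-up hits for a single query sequence.
hits = [("EGF", 80.0), ("SH3", 150.0), ("EGF", 95.0), ("KRINGLE", 60.0)]
ranked = rank_domain_hits(hits)
top_two = rank_domain_hits(hits, top_n=2)
```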

Access by WWW-server
Record retrieval and the above services can also be accessed using the WWW-server at http://www.icgeb.trieste.it/sbase. At present, cross-references to SBASE-CLUSTERS, EMBL, MEDLINE, MIM, PRINTS, ProDom, PROSITE and SWISS-PROT can be directly accessed through the WWW-server.

Citation
Users of SBASE and of the WWW servers are asked to cite this article in their publications, e.g. in the following form: ‘The sequence homologies were analyzed by searching the SBASE protein domain sequence library release 7.0 via the automated electronic mail (WWW) server’.


    ACKNOWLEDGEMENTS
 
This work was supported in part by EMBnet, the European Molecular Biology Network, in the framework of EU grant ERBBIO4-CT96-0030. SBASE was established in 1990 and is maintained collaboratively by the International Centre for Genetic Engineering and Biotechnology, Trieste, Italy and the Agricultural Biotechnology Center, Gödöllö, Hungary. The help of Suzanne Kerbavcic with the manuscript is gratefully acknowledged.


    FOOTNOTES
 
* To whom correspondence should be addressed at: International Centre for Genetic Engineering and Biotechnology, Area Science Park, 34012 Trieste, Italy. Tel: +39 040 3757 300; Fax: +39 040 226 555; Email: pongor@icgeb.trieste.it


    REFERENCES
 

    1 Pongor,S., Skerl,V., Cserzo,M., Hatsagi,Z., Simon,G. and Bevilacqua,V. (1993) Protein Eng., 6, 391–395.

    2 Murvai,J., Vlahovicek,K., Barta,E., Szepesvári,C., Acatrinei,C. and Pongor,S. (1999) Nucleic Acids Res., 27, 257–259.

    3 Murvai,J., Vlahovicek,K., Barta,E., Parthasarathy,S., Hegyi,H., Pfeiffer,F. and Pongor,S. (1999) Bioinformatics, 15, 343–344.

    4 Mewes,H.W., Heumann,K., Kaps,A., Mayer,K., Pfeiffer,F., Stocker,S. and Frishman,D. (1999) Nucleic Acids Res., 27, 44–48. Updated article in this issue: Nucleic Acids Res. (2000), 28, 37–40.

    5 Bateman,A., Birney,E., Durbin,R., Eddy,S.R., Finn,R.D. and Sonnhammer,E.L. (1999) Nucleic Acids Res., 27, 260–262. Updated article in this issue: Nucleic Acids Res. (2000), 28, 263–266.

    6 Bairoch,A. and Apweiler,R. (1999) Nucleic Acids Res., 27, 49–54. Updated article in this issue: Nucleic Acids Res. (2000), 28, 45–48.

    7 Barker,W.C., Garavelli,J.S., McGarvey,P.B., Marzec,C.R., Orcutt,B.C., Srinivasarao,G.Y., Yeh,L.S., Ledley,R.S., Mewes,H.W., Pfeiffer,F., Tsugita,A. and Wu,C. (1999) Nucleic Acids Res., 27, 39–43.

    8 Benson,D.A., Boguski,M.S., Lipman,D.J., Ostell,J., Ouellette,B.F., Rapp,B.A. and Wheeler,D.L. (1999) Nucleic Acids Res., 27, 12–17. Updated article in this issue: Nucleic Acids Res. (2000), 28, 15–18.

    9 Stoesser,G., Tuli,M.A., Lopez,R. and Sterk,P. (1999) Nucleic Acids Res., 27, 18–24. Updated article in this issue: Nucleic Acids Res. (2000), 28, 19–23.

    10 Hofmann,K., Bucher,P., Falquet,L. and Bairoch,A. (1999) Nucleic Acids Res., 27, 215–219.

    11 Attwood,T.K., Flower,D.R., Lewis,A.P., Mabey,J.E., Morgan,S.R., Scordis,P., Selley,J.N. and Wright,W. (1999) Nucleic Acids Res., 27, 220–225. Updated article in this issue: Nucleic Acids Res. (2000), 28, 225–227.

    12 Corpet,F., Gouzy,J. and Kahn,D. (1999) Nucleic Acids Res., 27, 263–267. Updated article in this issue: Nucleic Acids Res. (2000), 28, 267–269.

    13 Henikoff,J.G., Henikoff,S. and Pietrokovski,S. (1999) Nucleic Acids Res., 27, 226–228. Updated article in this issue: Nucleic Acids Res. (2000), 28, 228–230.

    14 Bernstein,F.C., Koetzle,T.F., Williams,G.J., Meyer,E.E.,Jr, Brice,M.D., Rodgers,J.R., Kennard,O., Shimanouchi,T. and Tasumi,M. (1977) J. Mol. Biol., 112, 535–542.

    15 Pearson,P., Francomano,C., Foster,P., Bocchini,C., Li,P. and McKusick,V. (1994) Nucleic Acids Res., 22, 3470–3473.

    16 Flybase-Consortium (1999) Nucleic Acids Res., 27, 85–88.

    17 Rudd,K.E., Bouffard,G. and Miller,G. (1992) In Davies,K.E. and Tilghman,S.M. (eds), Genome Analysis. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp. 1–38.

    18 Roberts,R.J. and Macelis,D. (1999) Nucleic Acids Res., 27, 312–313. Updated article in this issue: Nucleic Acids Res. (2000), 28, 306–307.





