Nucleic Acids Research, 2002, Vol. 30, No. 1 405-408
© 2002 Oxford University Press
SWEET-DB: an attempt to create annotated data collections for carbohydrates
University Hildesheim, Institute of Physics and Technical Informatics, Marienburger Platz 22, 31141 Hildesheim, Germany; ¹German Cancer Research Center, Spectroscopic Department, Im Neuenheimer Feld 240, 69120 Heidelberg, Germany and ²University of Applied Sciences Darmstadt, Department for Information and Knowledge Management, Schöfferstraße 1-3, D-64295 Darmstadt, Germany
Received August 20, 2001; Revised and Accepted October 10, 2001.
| ABSTRACT |
|---|
Complex carbohydrates are known as mediators of complex cellular events. Owing to their structural diversity, the potential information content of a short carbohydrate sequence is several orders of magnitude higher than that of any other biological macromolecule. SWEET-DB (http://www.dkfz.de/spec2/sweetdb/) is an attempt to use modern web techniques to annotate and/or cross-reference carbohydrate-related data collections. The aim is to let glycoscientists find important data for compounds of interest in a compact and well-structured representation. Currently, data from three sources can be retrieved for a given carbohydrate (sub)structure: CarbBank structures and literature references (linked to the NCBI PubMed service), NMR data taken from SugaBase, and 3D co-ordinates generated with SWEET-II. The main purpose of SWEET-DB is to provide easy access to all data stored for a carbohydrate structure by entering its complete sequence or parts thereof. Access to SWEET-DB contents is provided through separate input spreadsheets for (sub)structures, bibliographic data, general structural data such as molecular weight, NMR spectra and biological data. A detailed online tutorial is available at http://www.dkfz.de/spec2/sweetdb/nar/.
| INTRODUCTION |
|---|
The human genome appears to encode no more than 30 000–40 000 proteins. This relatively small number of human genes compared with the genomes of other species has been one of the big surprises to emerge from the analysis of the human genome project (1). A major challenge is to understand how post-translational events, such as glycosylation, affect the activities and functions of these proteins in health and disease. Glycosylated proteins are ubiquitous components of extracellular matrices and cell surfaces, where their oligosaccharide moieties are implicated in a wide range of cell–cell and cell–matrix recognition events (2–4). Carbohydrate modifications of proteins and lipids are key factors in modulating their structure and function within cells. In the extracellular milieu, they affect cellular recognition in infection, cancer and the immune response. However, knowledge of the specific mechanisms mostly remains rudimentary.
Proteomics databases have become indispensable in the daily work of molecular biologists, but carbohydrate applications have not yet reached this stage. The number of scientists working on various topics in the glycosciences is considerably smaller than the number of molecular biologists working with proteins and nucleic acids (5). Even taking this into account, carbohydrate-related data collections are clearly far less accepted among glycoscientists than the various proteomics data collections and tools are among molecular biologists. Moreover, the situation even appeared to be deteriorating. The CarbBank project (6–8) was the largest collection of carbohydrate-related references, built up during the 1980s and 1990s. It entered shutdown mode in 1999 due to lack of funding.
Recently, new efforts have been described that aim to create tools linking available information on glycans from various sources. BOLD and GlycoSuite (9,10) are databases that contain annotated information, extracted from the scientific literature, on glycoprotein-derived glycan structures. They are currently cross-linked with MEDLINE and SWISS-PROT/TrEMBL. The company Glycominds (http://www.glycominds.com) is currently building a database that compiles information about glycoconjugates, including their structures, functions and interactions with other molecules.
Cross-linking between proteomics tools is mainly achieved on the basis of identity or similarity of gene or protein sequences. Sequences of complex carbohydrates differ significantly from the simple linear form that describes genes and proteins:

- the number of naturally occurring residues is much larger for carbohydrates;
- each pair of monosaccharide residues can be linked in several ways;
- one residue can be connected to three or four others (branching).

Thus, a carbohydrate structure database must use more elaborate encoding schemes to describe both the identity and the similarity of such structures. A short carbohydrate sequence can potentially carry an information content several orders of magnitude higher than that of any other biological macromolecule (11). Typical N-glycan structures contain 9 to 20 residues. The average sequence length in CarbBank is 6 residues.
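The combinatorial argument can be made concrete with a toy count. The Python sketch below compares linear peptides with unbranched glycans of the same length. It is a deliberately simplified model, not the calculation of ref. 11, and rests on these assumptions:

- both polymer types draw residues from an alphabet of the same size;
- every glycosidic bond has two anomeric configurations and four possible attachment positions on the acceptor;
- branching, ring size and substituents are ignored.

The function names are hypothetical.

```python
"""Toy comparison of sequence-space sizes (simplified assumptions, see lead-in)."""


def peptide_space(n: int, alphabet: int = 20) -> int:
    """Number of linear peptides of length n over `alphabet` amino acids."""
    return alphabet ** n


def linear_glycan_space(n: int, residues: int, anomers: int = 2, positions: int = 4) -> int:
    """Number of *unbranched* glycans of length n in a toy model.

    Each of the n residues is one of `residues` types, and each of the n - 1
    glycosidic bonds has `anomers` configurations (alpha/beta) and `positions`
    possible attachment sites on the acceptor residue (assumed values).
    """
    if n < 1:
        return 0
    return residues ** n * (anomers * positions) ** (n - 1)


if __name__ == "__main__":
    for n in (2, 4, 6):
        p = peptide_space(n)
        g = linear_glycan_space(n, residues=20)
        print(f"n={n}: peptides={p:,} glycans={g:,} ratio={g // p:,}")
```

With 20 residue types and n = 6, the peptide count is 20^6 = 64 000 000. The toy glycan count is larger by a factor of 8^5 = 32 768. Allowing branches would enlarge the glycan space further.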
| SWEET-DB CONTENT |
|---|
SWEET-DB is an attempt to use modern web techniques to annotate and/or cross-reference carbohydrate-related data collections, so that glycoscientists can find important data for compounds of interest in a compact and well-structured representation. Currently, three kinds of data can be retrieved for a given carbohydrate (sub)structure: structures and literature references taken from CarbBank (linked to the NCBI PubMed service), NMR data taken from SugaBase (12,13), and 3D co-ordinates generated with SWEET-II (14).
About 50 000 CarbBank entries (6–8) and 1600 1H- and 13C-NMR spectra taken from SugaBase (12,13) constitute the data for our SWEET-DB implementation. Both collections can be linked using the Linear Notation for Unique Description of Carbohydrate Sequences (LINUCS) (15). LINUCS describes carbohydrate structures in a form close to the IUPAC-IUBMB nomenclature recommendations (http://www.chem.qmw.ac.uk/iupac/2carb/) (16). Spatial representations were generated with SWEET-II (14) and subsequently optimised using the MM3 force field as implemented in the TINKER package (17). An automatic link to the NCBI PubMed (18) service was established based on the reference information provided in the original CarbBank fields.
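Linking two collections requires a unique, order-independent string for each branched structure, which LINUCS provides. The Python sketch below illustrates that idea under simplifying assumptions:

- it is not the LINUCS algorithm, whose ordering rules are defined in ref. 15;
- the residue names and entry identifiers are invented;
- `Residue`, `canonical` and `link_collections` are hypothetical names.

Branches are sorted by their serialised text. As a result, the same structure always yields the same key, whatever order its branches were entered in, and the key can then serve as a join field.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Residue:
    """A monosaccharide; children are (linkage, residue) pairs toward the non-reducing end."""
    name: str
    children: List[Tuple[str, "Residue"]] = field(default_factory=list)


def canonical(res: Residue) -> str:
    """Serialise a glycan tree independently of the order in which branches were entered.

    Simplified stand-in for a canonical notation such as LINUCS (NOT its real rules):
    branch strings start with the linkage label and are sorted lexically.
    """
    branches = sorted(f"[{link}]{{{canonical(child)}}}" for link, child in res.children)
    return f"[{res.name}]" + "".join(branches)


def link_collections(a: Dict[str, Residue], b: Dict[str, Residue]) -> List[Tuple[str, str]]:
    """Return (id_in_a, id_in_b) pairs whose structures share the same canonical key."""
    index: Dict[str, List[str]] = {}
    for entry_id, tree in a.items():
        index.setdefault(canonical(tree), []).append(entry_id)
    return [
        (match, entry_id)
        for entry_id, tree in b.items()
        for match in index.get(canonical(tree), [])
    ]


if __name__ == "__main__":
    # The same branched trimannoside, with its two branches entered in opposite order.
    core_a = Residue("b-D-Manp", [("a1-3", Residue("a-D-Manp")), ("a1-6", Residue("a-D-Manp"))])
    core_b = Residue("b-D-Manp", [("a1-6", Residue("a-D-Manp")), ("a1-3", Residue("a-D-Manp"))])
    carbbank_like = {"CB-1": core_a}   # hypothetical entry IDs
    sugabase_like = {"SB-7": core_b}
    print(link_collections(carbbank_like, sugabase_like))  # [('CB-1', 'SB-7')]
```

Building a dictionary index on the canonical key turns the cross-collection match into a hash lookup per entry instead of a pairwise structure comparison.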
| SWEET-DB ACCESS |
|---|
The main page of SWEET-DB (http://www.dkfz.de/spec2/sweetdb/) provides several levels of access to its contents (Fig. 1, top border). Activating the bibliographic search mode produces a spreadsheet for searching by author, keywords in the title, journal name and year of publication. Searching for data associated with a complete glycan sequence or parts thereof is clearly the most frequently used way to access SWEET-DB. Different input spreadsheets are provided depending on the size of the structure to be retrieved. They help the user specify monosaccharide units and linkage information in the required format (Fig. 1). The 4 × 1 input matrix is intended for the novice user: frequently occurring monosaccharides are predefined and can be selected from pull-down menus. The 4 × 3 and 6 × 5 input matrices are suitable for entering larger, branched sequences. Here the user has to enter the monosaccharide units and linkage types to retrieve all matching glycan structures.
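A structure search of this kind can be pictured as a subtree test on branched glycan trees. The Python sketch below is a hypothetical illustration, not the SWEET-DB query code. A glycan is a tree whose root is the reducing-end residue, and each branch carries a linkage label such as "b1-4". A query, as it might be entered in one of the input matrices, matches if its residues and linkages can be mapped onto a connected part of the target, with each query branch assigned to a distinct target branch. The wildcard "*" for an unspecified linkage, the residue names and all identifiers are assumptions.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

WILDCARD = "*"  # hypothetical: a linkage left unspecified in the query


@dataclass
class Residue:
    """A monosaccharide; children are (linkage, residue) pairs toward the non-reducing end."""
    name: str
    children: List[Tuple[str, "Residue"]] = field(default_factory=list)


def _link_ok(query_link: str, target_link: str) -> bool:
    return query_link == WILDCARD or query_link == target_link


def _match_here(q: Residue, t: Residue) -> bool:
    """True if query tree q maps onto target tree t with q's root placed on t's root."""
    return q.name == t.name and _assign(q.children, t.children, frozenset())


def _assign(q_kids, t_kids, used: FrozenSet[int]) -> bool:
    """Backtracking: map each query branch onto a distinct, compatible target branch."""
    if not q_kids:
        return True
    (q_link, q_child), rest = q_kids[0], q_kids[1:]
    for i, (t_link, t_child) in enumerate(t_kids):
        if i not in used and _link_ok(q_link, t_link) and _match_here(q_child, t_child):
            if _assign(rest, t_kids, used | {i}):
                return True
    return False


def contains(target: Residue, query: Residue) -> bool:
    """Substructure test: does `query` occur anywhere in `target`?"""
    if _match_here(query, target):
        return True
    return any(contains(child, query) for _, child in target.children)


if __name__ == "__main__":
    # Illustrative target: NeuAc(a2-6)Gal(b1-4)GlcNAc, rooted at GlcNAc.
    target = Residue("b-D-GlcpNAc", [
        ("b1-4", Residue("b-D-Galp", [("a2-6", Residue("a-D-Neup5Ac"))])),
    ])
    q1 = Residue("b-D-GlcpNAc", [("b1-4", Residue("b-D-Galp"))])
    q2 = Residue("b-D-GlcpNAc", [("b1-3", Residue("b-D-Galp"))])
    q3 = Residue("b-D-Galp", [(WILDCARD, Residue("a-D-Neup5Ac"))])
    print(contains(target, q1), contains(target, q2), contains(target, q3))  # True False True
```

Backtracking over branch assignments stays cheap because each residue has at most three or four branches. A larger system could pre-filter candidates, for example by composition, before running such a test.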
The matched sequences are displayed in a simple ASCII representation (similar to CarbBank notation). For each sequence, the user can access all stored data (references, NMR data, molecular weight, glycan composition). In addition, a 3D model is available that can be displayed with freely available programs such as RasMol or Chime. The general structure information retrieval option gives access to sequences by specifying:

- a molecular weight range;
- atom frequencies;
- monosaccharide composition;
- specific residues;
- the number of residues and branches.
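Molecular weight, atom frequencies and residue counts can all be derived from the monosaccharide composition. The Python sketch below makes these assumptions:

- average masses from rounded standard atomic weights;
- a free reducing end;
- four generic residue classes (Hex, HexNAc, dHex, NeuAc).

Each glycosidic bond removes one water molecule. The record layout and the `search` function are hypothetical, not the SWEET-DB implementation.

```python
from collections import Counter
from typing import Dict

# Rounded standard atomic weights and formulas of the free monosaccharides (assumed classes).
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
MONOSACCHARIDES = {
    "Hex": {"C": 6, "H": 12, "O": 6},             # e.g. Glc, Gal, Man
    "HexNAc": {"C": 8, "H": 15, "N": 1, "O": 6},  # e.g. GlcNAc, GalNAc
    "dHex": {"C": 6, "H": 12, "O": 5},            # e.g. Fuc
    "NeuAc": {"C": 11, "H": 19, "N": 1, "O": 9},
}


def glycan_formula(composition: Dict[str, int]) -> Counter:
    """Elemental formula of an oligosaccharide with a free reducing end.

    Each glycosidic bond is a condensation, so n residues lose n - 1 water molecules.
    """
    n = sum(composition.values())
    if n < 1:
        raise ValueError("composition must contain at least one residue")
    formula = Counter()
    for residue, count in composition.items():
        for element, k in MONOSACCHARIDES[residue].items():
            formula[element] += k * count
    formula["H"] -= 2 * (n - 1)
    formula["O"] -= n - 1
    return formula


def average_mass(composition: Dict[str, int]) -> float:
    """Average molecular weight (Da) computed from the elemental formula."""
    return sum(ATOMIC_WEIGHTS[el] * k for el, k in glycan_formula(composition).items())


def search(entries, mw_min, mw_max, n_residues=None, max_branches=None):
    """Filter hypothetical (id, composition, branch_points) records by mass, size and branching."""
    hits = []
    for entry_id, composition, branches in entries:
        if not mw_min <= average_mass(composition) <= mw_max:
            continue
        if n_residues is not None and sum(composition.values()) != n_residues:
            continue
        if max_branches is not None and branches > max_branches:
            continue
        hits.append(entry_id)
    return hits


if __name__ == "__main__":
    entries = [("E1", {"Hex": 2}, 0), ("E2", {"Hex": 3, "HexNAc": 2}, 1)]  # invented records
    print(round(average_mass({"Hex": 2}), 2))  # ~342.30
    print(search(entries, 300, 400))           # ['E1']
```

For lactose ({"Hex": 2}) this gives C12H22O11 and about 342.30 Da.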
To find spectroscopic data stored for a certain (sub)structure, the user can select the structure search option. The input spreadsheet offers the option to display only entries that contain NMR data (Fig. 1). Matching entries are output in two steps. In the first step, the sequences of all matched glycans are displayed. In the second step, NMR data for user-selected entries are provided as a list of assigned shifts and coupling constants (Fig. 2, bottom). To retrieve all glycan sequences that match given spectroscopic data (e.g. 1H-NMR shifts), SWEET-DB offers two retrieval options:

(i) Up to 10 NMR shifts, each within a specified tolerance, can be entered. A list of sequences with matching spectra is presented in descending order of a score factor. The score factor takes into account the number of matched shifts and their deviations from the input shifts.

(ii) All NMR shifts assigned to a certain atom within a given monosaccharide unit (for example H1 in -D-Galp) can be retrieved and visualised as a shift-frequency histogram (Fig. 2). Additionally, the shift range of interest can be specified. In this way, the chemical environment of each individual NMR shift can be analysed in detail. Both 1H- and 13C-NMR data can be retrieved.
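The score factor for option (i) is not specified in detail. The Python sketch below therefore assumes a formula with the two stated properties: more matched shifts and smaller deviations both raise the score. Each query shift contributes 1 - deviation/tolerance for the nearest shift in a candidate spectrum. It contributes nothing if no shift lies within the tolerance. A small binning function illustrates the histogram of option (ii). Function names, the 0.02 ppm default tolerance and the example spectra are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

MAX_QUERY_SHIFTS = 10  # the text allows up to 10 input shifts


def score_spectrum(query: Sequence[float], spectrum: Sequence[float], tol: float) -> float:
    """Assumed score: sum of (1 - deviation/tol) over query shifts matched within tol."""
    score = 0.0
    for q in query:
        deviation = min((abs(q - s) for s in spectrum), default=None)
        if deviation is not None and deviation <= tol:
            score += 1.0 - deviation / tol
    return score


def rank_entries(query: Sequence[float], spectra: Dict[str, Sequence[float]],
                 tol: float = 0.02) -> List[Tuple[str, float]]:
    """Entries with a positive score, in descending order of score."""
    if not 1 <= len(query) <= MAX_QUERY_SHIFTS:
        raise ValueError(f"enter between 1 and {MAX_QUERY_SHIFTS} shifts")
    scored = [(entry, score_spectrum(query, shifts, tol)) for entry, shifts in spectra.items()]
    hits = [pair for pair in scored if pair[1] > 0]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def shift_histogram(shifts: Sequence[float], lo: float, hi: float, width: float) -> List[int]:
    """Count shifts in [lo, hi) in bins of the given width (e.g. all H1 shifts of one residue type)."""
    n_bins = round((hi - lo) / width)
    counts = [0] * n_bins
    for s in shifts:
        if lo <= s < hi:
            counts[min(int((s - lo) / width), n_bins - 1)] += 1  # guard against float edge cases
    return counts


if __name__ == "__main__":
    spectra = {"A": [4.45, 3.53, 3.90], "B": [4.52, 3.70], "C": [5.10]}  # invented 1H shifts (ppm)
    print(rank_entries([4.45, 3.54], spectra))
    print(shift_histogram([4.1, 4.2, 4.3, 4.9], 4.0, 5.0, 0.25))  # [2, 1, 0, 1]
```

In this simple version, one spectrum shift may be matched by several query shifts. A stricter variant could enforce a one-to-one assignment between query and spectrum shifts.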
A demonstration of SWEET-DB is provided as a tutorial at http://www.dkfz.de/spec2/sweetdb/nar/.
| IMPLEMENTATION |
|---|
In order to better encode and retrieve the complexity of carbohydrate structures, SWEET-DB has been implemented as a relational database rather than as a flat file. A LAMP (Linux, Apache, MySQL, PHP) system is used to store, retrieve and output the data.
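A relational layout suits this data because one question can draw on several linked tables. The sketch below defines three such tables (structures, literature references and assigned NMR shifts) and answers a histogram-style question with a single join. It uses Python's built-in sqlite3 module as a self-contained stand-in for MySQL. The table and column names, the sequence string and the shift values are placeholders, not the actual SWEET-DB schema or data.

```python
import sqlite3

# Hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE structure (
    id         INTEGER PRIMARY KEY,
    sequence   TEXT NOT NULL UNIQUE,   -- canonical sequence string
    mol_weight REAL
);
CREATE TABLE citation (
    id           INTEGER PRIMARY KEY,
    structure_id INTEGER NOT NULL REFERENCES structure(id),
    authors TEXT, title TEXT, journal TEXT, year INTEGER,
    pubmed_id    INTEGER               -- NULL if no PubMed link was found
);
CREATE TABLE nmr_shift (
    structure_id INTEGER NOT NULL REFERENCES structure(id),
    residue      TEXT NOT NULL,        -- residue the atom belongs to
    atom         TEXT NOT NULL,        -- e.g. 'H1' or 'C1'
    shift_ppm    REAL NOT NULL         -- chemical shift in ppm
);
"""

SHIFT_QUERY = """
SELECT s.id, s.sequence, n.shift_ppm
FROM structure AS s JOIN nmr_shift AS n ON n.structure_id = s.id
WHERE n.residue = ? AND n.atom = ? AND n.shift_ppm BETWEEN ? AND ?
ORDER BY n.shift_ppm
"""


def demo_db() -> sqlite3.Connection:
    """In-memory database with placeholder rows (not SWEET-DB data)."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO structure VALUES (1, 'b-D-Galp-(1-4)-b-D-Glcp', 342.30)")
    con.executemany(
        "INSERT INTO nmr_shift VALUES (?, ?, ?, ?)",
        [(1, "b-D-Galp", "H1", 4.45), (1, "b-D-Glcp", "H1", 4.66)],
    )
    return con


def structures_with_shift(con: sqlite3.Connection, residue: str, atom: str, lo: float, hi: float):
    """All structures with a shift for `atom` of `residue` inside [lo, hi] ppm."""
    return con.execute(SHIFT_QUERY, (residue, atom, lo, hi)).fetchall()


if __name__ == "__main__":
    con = demo_db()
    print(structures_with_shift(con, "b-D-Galp", "H1", 4.40, 4.50))
    # [(1, 'b-D-Galp-(1-4)-b-D-Glcp', 4.45)]
```

The same SQL should carry over to MySQL with at most minor changes. The parameter placeholder style, however, depends on the database driver used.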
| FUTURE DEVELOPMENTS |
|---|
SWEET-DB is intended as a regular service: a relational carbohydrate sequence database that strives to provide a high level of annotation, minimal redundancy and a high level of integration with other databases. Additionally, we will try to implement AUTO-SWEET-DB as a computer-annotated supplement to SWEET-DB, using essentially the same format for both databases. AUTO-SWEET-DB is intended to explore how far a data collection can be built up and continuously updated (similar to the TrEMBL database as an extension of SWISS-PROT for proteins). It would do so by scanning and extracting all electronically available resources, such as abstracts, publications and web pages, that contain information relevant to the glycosciences.
Currently we are working to establish input facilities for SWEET-DB based on a user-friendly web-based interface. A prototype to input new carbohydrate sequences, references and annotations as well as NMR spectra (http://www.dkfz.de/spec2/nmr_eingabe/) is already available. Integration with other online databases is planned.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +49 6221 42 4541; Fax: +49 6221 42 4554; Email: w.vonderlieth{at}dkfz.de
Present addresses: Alexander Loß and Annika Loß, Gebrüder Gerstenberg GmbH & Co, 31134 Hildesheim, Germany; Peter Bunsmann, Bosch-Blaupunkt, 31134 Hildesheim, Germany
| REFERENCES |
|---|
1 Venter,J. (2001) The sequence of the human genome. Science, 291, 1304–1351.
2 Helenius,A. and Aebi,M. (2001) Intracellular functions of N-linked glycans. Science, 291, 2364–2369.
3 Rudd,P., Elliott,T., Cresswell,P., Wilson,I. and Dwek,R. (2001) Glycosylation and the immune system. Science, 291, 2370–2376.
4 Wells,L., Vosseller,K. and Hart,G. (2001) Glycosylation of nucleocytoplasmic proteins: signal transduction and O-GlcNAc. Science, 291, 2376–2378.
5 Hardy,B. and Wilson,I. (1996) Virtual resource development in the glycosciences. Glycoconj. J., 13, 865–872.
6 Albersheim,P. (1991) Complex carbohydrate structural database. Glycobiology, 113, 113.
7 Doubet,S., Bock,K., Smith,D., Darvill,A. and Albersheim,P. (1989) The complex carbohydrate structure database. Trends Biochem. Sci., 14, 475–477.
8 Doubet,S. and Albersheim,P. (1992) CarbBank. Glycobiology, 2, 505.
9 Cooper,C., Wilkins,M., Williams,K. and Packer,N. (1999) BOLD – a biological O-linked glycan database. Electrophoresis, 20, 3589–3598.
10 Cooper,C., Harrison,M., Wilkins,M. and Packer,N. (2001) GlycoSuiteDB: a new curated relational database of glycoprotein glycan structures and their biological sources. Nucleic Acids Res., 29, 332–335.
11 Laine,R. (1994) A calculation of all possible oligosaccharide isomers both branched and linear yield 1.05 × 10¹² structures for a reducing hexasaccharide: the Isomer Barrier to development of single method saccharide sequencing or synthesis system. Glycobiology, 4, 759–767.
12 van Kuik,J. and Vliegenthart,J.F. (1992) Databases of complex carbohydrates. Trends Biotechnol., 10, 182–185.
13 van Kuik,J., Hard,K. and Vliegenthart,J.F. (1992) A 1H NMR database computer program for the analysis of the primary structure of complex carbohydrates. Carbohydr. Res., 235, 53–68.
14 Bohne,A., Lang,E. and von der Lieth,C.-W. (1998) W3-SWEET: carbohydrate modeling by Internet. J. Mol. Model., 4, 33–43.
15 Bohne,A., Lang,E., Förster,T. and von der Lieth,C.-W. (2001) LINUCS: LInear Notation for Unique Description of Carbohydrate Sequences. Carbohydr. Res., 336, 1–11.
16 McNaught,A. (1997) International Union of Biochemistry and Molecular Biology. Joint Commission on Biochemical Nomenclature. Nomenclature of carbohydrates. Carbohydr. Res., 297, 1–92.
17 Pappu,R., Hart,R. and Ponder,J. (1998) Analysis and application of potential energy smoothing for global optimization. J. Phys. Chem. B, 102, 9725–9742.
18 Wheeler,D., Church,D., Lash,A., Leipe,D., Madden,T., Pontius,J., Schuler,G., Schriml,L., Tatusova,T., Wagner,L. et al. (2001) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 29, 11–16. Updated article in this issue: Nucleic Acids Res. (2002), 30, 13–16.
This article has been cited by other articles:

D. Goldberg, M. Bern, S. J. North, S. M. Haslam, and A. Dell. Glycan family analysis for deducing N-glycan topology from single MS. Bioinformatics, February 1, 2009; 25(3): 365–371.

A. Loss, R. Stenutz, E. Schwarzer, and C.-W. von der Lieth. GlyNest and CASPER: two independent approaches to estimate 1H and 13C NMR shifts of glycans available through a common web-interface. Nucleic Acids Res., July 1, 2006; 34(suppl_2): W733–W737.

K. Hashimoto, S. Goto, S. Kawano, K. F. Aoki-Kinoshita, N. Ueda, M. Hamajima, T. Kawasaki, and M. Kanehisa. KEGG as a glycome informatics resource. Glycobiology, May 1, 2006; 16(5): 63R–70R.

T. Lütteke, A. Bohne-Lang, A. Loss, T. Goetz, M. Frank, and C.-W. von der Lieth. GLYCOSCIENCES.de: an Internet portal to support glycomics and glycobiology research. Glycobiology, May 1, 2006; 16(5): 71R–81R.

K. F. Aoki, H. Mamitsuka, T. Akutsu, and M. Kanehisa. A score matrix to reveal the hidden links in glycans. Bioinformatics, April 15, 2005; 21(8): 1457–1463.

T. Lütteke, M. Frank, and C.-W. von der Lieth. Carbohydrate Structure Suite (CSS): analysis of carbohydrate 3D structures derived from the PDB. Nucleic Acids Res., January 1, 2005; 33(suppl_1): D242–D246.

K. K. Lohmann and C.-W. von der Lieth. GlycoFragment and GlycoSearchMS: web tools to support the interpretation of mass spectra of complex carbohydrates. Nucleic Acids Res., July 1, 2004; 32(suppl_2): W261–W266.

K. F. Aoki, A. Yamaguchi, N. Ueda, T. Akutsu, H. Mamitsuka, S. Goto, and M. Kanehisa. KCaM (KEGG Carbohydrate Matcher): a software tool for analyzing the structures of carbohydrate sugar chains. Nucleic Acids Res., July 1, 2004; 32(suppl_2): W267–W272.

M. Kapoor, H. Srinivas, E. Kandiah, E. Gemma, L. Ellgaard, S. Oscarson, A. Helenius, and A. Surolia. Interactions of Substrate with Calreticulin, an Endoplasmic Reticulum Chaperone. J. Biol. Chem., February 14, 2003; 278(8): 6194–6200.