
Nucleic Acids Research, 2002, Vol. 30, No. 1 299-300
© 2002 Oxford University Press

SYSTERS, GeneNest, SpliceNest: exploring sequence space from genome to protein

Antje Krause*, Stefan A. Haas, Eivind Coward and Martin Vingron

Max-Planck-Institute for Molecular Genetics, Computational Molecular Biology, Ihnestrasse 73, 14195 Berlin, Germany

Received September 21, 2001; Accepted October 1, 2001.


    ABSTRACT
We have integrated the protein families from SYSTERS and the expressed sequence tag (EST) clusters from our database GeneNest with SpliceNest, a new database mapping EST contigs into genomic DNA. The SYSTERS protein sequence cluster set provides an automatically generated classification of all sequences of the SWISS-PROT, TrEMBL and PIR databases into disjoint protein family and superfamily clusters. GeneNest is a database and software package for producing and visualizing gene indices from ESTs and mRNAs. Currently, the database comprises gene indices of human, mouse, Arabidopsis thaliana and zebrafish. SpliceNest is a web-based graphical tool to explore gene structure, including alternative splicing, based on a mapping of the EST consensus sequences from GeneNest to the complete human genome. The integration of SYSTERS, GeneNest and SpliceNest into one framework now permits an overall exploration of the whole sequence space covering protein, mRNA and EST sequences, as well as genomic DNA. The databases are available for querying and browsing at http://cmb.molgen.mpg.de.


    INTEGRATED DATABASES
SYSTERS
The SYSTERS protein sequence cluster set (1) is a hierarchical classification of all known sequences from the SWISS-PROT (2), TrEMBL and PIR (3) sequence databases into disjoint protein family clusters and superfamilies. The classification is based on an all-against-all database search using gapped BLAST (4) with a subsequent hierarchical clustering. The sequences in every cluster have been multiply aligned using CLUSTALW (5) and for each cluster an unrooted phylogenetic tree is available. All multiple alignments are annotated with known domains from the Pfam database of protein domain families (6) and clusters can be selected directly from a list of Pfam domains. A new protein sequence can be searched against the database of multiple alignments using the similarity searching tool SSMAL (7). For each cluster, an MView (8) output is generated and from the resulting partial multiple alignment a majority consensus sequence is calculated. Together, the consensus sequences form a database searchable with BLAST. Precomputed BLAST searches of the GeneNest consensus sequences against the SYSTERS protein consensus sequences were evaluated to generate links from SYSTERS to GeneNest and vice versa.
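
The majority consensus step can be illustrated with a minimal sketch. The exact rule applied to the MView output is not described here, so the gap handling, the `min_fraction` threshold and the placeholder residue `X` below are assumptions made for illustration only, and the function name is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned, gap="-", min_fraction=0.5):
    """Return a majority consensus for equal-length aligned sequences.

    Assumed rule (not taken from SYSTERS or MView): a column is skipped
    when fewer than half of its rows carry a residue; otherwise the most
    frequent residue is emitted if it accounts for at least `min_fraction`
    of the residues in that column, else 'X'.
    """
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    consensus = []
    for column in zip(*aligned):
        residues = [c for c in column if c != gap]
        # Skip columns that are mostly gaps.
        if 2 * len(residues) < len(column):
            continue
        residue, count = Counter(residues).most_common(1)[0]
        consensus.append(residue if count / len(residues) >= min_fraction else "X")
    return "".join(consensus)


if __name__ == "__main__":
    alignment = ["MKT-A", "MKTLA", "MRT-A"]
    print(majority_consensus(alignment))
```

Skipping gap-dominated columns keeps the consensus close to the length of a typical family member, which seems desirable when the consensus is later used as a BLAST target; a different threshold would be equally defensible.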

GeneNest
GeneNest (9) is a database and software package for the generation and visualization of gene indices based on EST and mRNA sequences. Currently, the database comprises gene indices of human (based on UniGene), mouse, Arabidopsis thaliana and zebrafish. All cDNA/mRNA sequences related to an organism are extracted either directly from the EMBL (10) database or from an already clustered UniGene (11) database. A preprocessing step includes vector clipping, repeat annotation and marking of regions of low sequence quality in order to restrict processing to data of high quality. In further steps, these sequences are clustered and all members of each cluster are assembled into one or more contigs. Roughly speaking, each cluster represents a single gene, whereas the contigs of a cluster reflect different transcripts of that gene. A schematic view of the assembled clusters is presented on the GeneNest web site. Detailed information about sequences and their preprocessing results, as well as information about open reading frames, similarities between clusters or protein homologies, can be accessed interactively. GeneNest can be queried using BLAST against the consensus sequences or by keyword search. GeneNest is tightly linked to SYSTERS and SpliceNest as well as to external resources like EMBL.
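
To make the cluster notion concrete, the following sketch groups sequences into clusters by single linkage over pairwise similarity links, using a union-find structure. This is not the GeneNest clustering procedure, which is not detailed here; the function name, the sequence identifiers and the assumption that significant pairwise matches are already available as `links` are all hypothetical.

```python
def cluster_by_links(sequence_ids, links):
    """Group sequence IDs into clusters connected by pairwise links.

    `links` is an iterable of (id_a, id_b) pairs, assumed to come from
    some upstream similarity search. Two sequences end up in the same
    cluster if a chain of links connects them (single linkage).
    """
    parent = {seq_id: seq_id for seq_id in sequence_ids}

    def find(x):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for seq_id in sequence_ids:
        clusters.setdefault(find(seq_id), []).append(seq_id)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    ids = ["est1", "est2", "est3", "mrna1", "est4"]
    links = [("est1", "est2"), ("est2", "mrna1")]
    for members in cluster_by_links(ids, links):
        print(members)
```

In this picture each returned group would correspond to one gene; assembling the members of a group into one or more contigs is a separate step that the sketch does not attempt.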

SpliceNest
SpliceNest (12) is a web-based graphical tool to explore gene structure based on a mapping of the expressed sequence tag (EST) consensus sequences (contigs) from GeneNest to the complete human genome. Assuming that a cluster normally represents a single gene, every contig of a cluster is aligned separately to the same genomic region, using the spliced alignment program sim4 (13). Differences between the contigs may correspond to alternative splicing, but they can also be due to low sequence quality, genomic contamination or other artifacts. The alignments are visualized in a diagram showing the exon/intron structure of all contigs of a single cluster (i.e. gene) simultaneously, mapped on the common genomic sequence. Exons are represented as colored bars and introns as arrows. The visualization facilitates the identification of genuine splice variants. Furthermore, candidate loci of alternative splicing are automatically identified and highlighted. If a cluster has several matches in the genome, a ranked list of all matches is provided. Each contig is linked to the corresponding GeneNest assembly, giving easy access to information about individual EST and mRNA sequences. Other links point to detailed alignments, related entries in the EMBL database or raw sequences. A toolbar allows zooming into the alignment. The current version of SpliceNest uses the GeneNest assembly based on human UniGene and the Golden Path genomic sequence (14).
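
One simple way to flag candidate loci of alternative splicing is to compare the introns implied by each contig's exon coordinates on the common genomic sequence. The sketch below is an illustration under stated assumptions, not the SpliceNest algorithm: exon intervals are taken as inclusive genomic coordinates already extracted from the spliced alignments, every contig has at least one exon, and the function names are hypothetical.

```python
def introns(exons):
    """Derive inclusive intron intervals from a list of (start, end) exons."""
    ordered = sorted(exons)
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(ordered, ordered[1:])
    ]


def candidate_splice_differences(contigs):
    """Report introns of one contig that another contig spans but lacks.

    `contigs` maps a contig name to its exon intervals. An intron is
    reported as (start, end, contig_with_intron, contig_without_intron)
    when the second contig's aligned span covers the intron but does not
    contain an identical intron. Such differences are only candidates:
    they may also stem from low sequence quality or genomic contamination.
    """
    intron_sets = {name: set(introns(exons)) for name, exons in contigs.items()}
    spans = {
        name: (min(start for start, _ in exons), max(end for _, end in exons))
        for name, exons in contigs.items()
    }

    candidates = []
    for name, own_introns in intron_sets.items():
        for i_start, i_end in own_introns:
            for other, (o_start, o_end) in spans.items():
                if other == name:
                    continue
                covered = o_start <= i_start and i_end <= o_end
                if covered and (i_start, i_end) not in intron_sets[other]:
                    candidates.append((i_start, i_end, name, other))
    return sorted(candidates)


if __name__ == "__main__":
    cluster = {
        "contig1": [(100, 200), (300, 400), (500, 600)],
        "contig2": [(100, 200), (500, 600)],  # middle exon skipped
    }
    for candidate in candidate_splice_differences(cluster):
        print(candidate)
```

Restricting the comparison to introns that lie within the other contig's aligned span avoids reporting differences that arise only because one contig is shorter than another.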


    SUMMARY
The three otherwise independent databases GeneNest, SpliceNest and SYSTERS are now fully linked with each other and to other major databases (Fig. 1). This allows navigating, e.g. from a protein to its UniGene cluster assembly and on to its genomic position and structure. Alternatively, one might enter via a sorted list of UniGene clusters on a chromosome and link from a particular cluster to its gene product in the context of a protein family. Thus, the linking of these databases facilitates navigation of sequence space between genomic DNA and protein sequences and families.



Figure 1. The integration of SYSTERS, GeneNest and SpliceNest into one framework. Possible queries to the databases are given on the left (blue), followed by the underlying query tools BLAST and SSMAL (green). The features and interactions of the SYSTERS, GeneNest and SpliceNest databases are shown in the middle (yellow) and links to external resources on the right (red).

 

    ACKNOWLEDGEMENTS
 
We acknowledge financial support from Bundesministerium für Bildung und Forschung (BMBF) and Deutsches Human Genom Projekt (DHGP).


    FOOTNOTES
 
* To whom correspondence should be addressed. Tel: +49 30 8413 1404; Fax: +49 30 8413 1152; Email: krause_a@molgen.mpg.de


    REFERENCES

    1 Krause,A. and Vingron,M. (1998) A set-theoretic approach to database searching and clustering. Bioinformatics, 14, 430–438.

    2 Bairoch,A. and Apweiler,R. (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res., 28, 45–48.

    3 Barker,W.C., Garavelli,J.S., Hou,Z., Huang,H., Ledley,R.S., McGarvey,P.B., Mewes,H.W., Orcutt,B.C., Pfeiffer,F., Tsugita,A. et al. (2001) Protein Information Resource: a community resource for expert annotation of protein data. Nucleic Acids Res., 29, 29–32. Updated article in this issue: Nucleic Acids Res. (2002), 30, 35–37.

    4 Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

    5 Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

    6 Bateman,A., Birney,E., Durbin,R., Eddy,S.R., Howe,K.L. and Sonnhammer,E.L.L. (2000) The Pfam protein families database. Nucleic Acids Res., 28, 263–266. Updated article in this issue: Nucleic Acids Res. (2002), 30, 276–280.

    7 Nicodème,P. (1998) SSMAL: similarity searching with alignment graphs. Bioinformatics, 14, 508–515.

    8 Brown,N.P., Leroy,C. and Sander,C. (1998) MView: a web-compatible database search or multiple alignment viewer. Bioinformatics, 14, 380–381.

    9 Haas,S.A., Beissbarth,T., Rivals,E., Krause,A. and Vingron,M. (2000) GeneNest: automated generation and visualization of gene indices. Trends Genet., 16, 521–523.

    10 Stoesser,G., Baker,W., van den Broek,A., Camon,E., Garcia-Pastor,M., Kanz,C., Kulikova,T., Lombard,V., Lopez,R., Parkinson,H. et al. (2001) The EMBL nucleotide sequence database. Nucleic Acids Res., 29, 17–21. Updated article in this issue: Nucleic Acids Res. (2002), 30, 21–26.

    11 Schuler,G.D. (1997) Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med., 75, 694–698.

    12 Coward,E., Haas,S.A. and Vingron,M. (2002) SpliceNest: visualization of gene structure and alternative splicing based on EST clusters. Trends Genet., 18, in press.

    13 Florea,L., Hartzell,G., Zhang,Z., Rubin,G.M. and Miller,W. (1998) A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Res., 8, 967–974.

    14 International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature, 409, 860–921.

