Nucleic Acids Research, 2003, Vol. 31, No. 8 2187-2195
© 2003 Oxford University Press
GenDB: an open source genome annotation system for prokaryote genomes
Center for Genome Research, 1 Technische Fakultät and 2 Department of Biology, Bielefeld University, Bielefeld, Germany
*To whom correspondence should be addressed. Tel: +49 521 106 4827; Fax: +49 521 106 5626; Email: fm{at}genetik.uni-bielefeld.de
Received October 30, 2002; Revised and Accepted February 4, 2003
ABSTRACT
The flood of sequence data resulting from the large number of current genome projects has increased the need for a flexible, open source genome annotation system, which so far has not existed. To account for the individual needs of different projects, such a system should be modular and easily extensible. We present a genome annotation system for prokaryote genomes, which is well tested and readily adaptable to different tasks. The modular system was developed using an object-oriented approach, and it relies on a relational database backend. Using a well defined application programmers' interface (API), the system can be linked easily to other systems. GenDB supports manual as well as automatic annotation strategies. The software is currently in use in more than a dozen microbial genome annotation projects. In addition to its use as a production genome annotation system, it can be employed as a flexible framework for the large-scale evaluation of different annotation strategies. The system is open source.
INTRODUCTION
The process of genome annotation can be defined as assigning meaning to sequence data that would otherwise be almost devoid of information. By identifying regions of interest and defining putative functions for those areas, the genome can be understood and further research initiated. Annotation generally is thought to be of best quality when performed by a human expert. The vast amount of data which has to be evaluated in any whole-genome annotation project, however, has led to the (partial) automation of the procedure. Due to this, software assistance for computation, storage, retrieval and analysis of relevant data has become essential for the success of any genome project.
Comparison of existing tools
A number of genome annotation systems intended for the analysis of prokaryotic and eukaryotic organisms have been designed and presented in the last few years. The first generation was published in 1996 and consisted of the MAGPIE (1), GeneQuiz (2) and Pedant (3) systems. These focused primarily on generating human-readable HTML documents based on tables and sometimes in-line graphics. A number of good ideas that originated from this first generation of genome annotation systems have made their way into today's systems. Examples are the intuitive visualizations provided by MAGPIE and the splitting of results by significance levels to enable comparison of different tools (also MAGPIE). Since then, a second generation of mostly commercial genome annotation systems has been published, including ERGO (Integrated Genomics, Inc.), Pedant-Pro (successor to Pedant, Biomax Informatics AG), Phylosopher (Gene Data, Inc.), BioScout (Genequiz, Lion AG), WIT (4) and the open source system Artemis (5). Some systems (MAGPIE, Artemis and Phylosopher) contain extensive visualizations or include multiple genome comparison-based annotation strategies [most notably by ERGO (6)]. With the exception of Artemis, all systems provide an automatic annotation feature. To the best of our knowledge, except ERGO, all systems use a variant of 'best BLAST hit' as their fixed, built-in annotation strategy. Only MAGPIE, Artemis and the newer versions of Pedant allow the integration of expert knowledge through manual annotation. (In the last few weeks, the Manatee system has been made public by TIGR. The authors have not yet had the opportunity to evaluate this system.)
The substantial commercial interest in the area of genome annotation has led to a situation where, with the noted exception of Artemis, no genome annotation system is in the public domain. Therefore, only the source code of Artemis is available for further analysis by the research community. Even in-depth technical information, such as details about the annotation strategy implemented, is very hard to obtain. This lack of access is a major hurdle when trying to evaluate these complex systems. Together with the omission of well defined application programmers' interfaces (APIs), this prevents the extension of existing systems. This situation is counter-productive for science in this field: the best experts in the field have no medium to contribute their experience to the cooperative evolution of better and better annotation systems. The resulting need for a well designed and documented open source genome annotation system led us to develop GenDB. GenDB is a flexible and easily extensible system, which is currently in worldwide use for the annotation of more than a dozen novel microbial genomes.
As with the very successful Linux computer operating system, the open source license of GenDB enables the cooperative development of high quality software for genome annotation. The system is intended to provide a flexible, transparent infrastructure for genome projects, which other groups can adopt and modify to meet their requirements.
The System Architecture and Implementation sections describe in detail the GenDB software system; they are intended to enable bioinformatics scientists to evaluate the system. The next section outlines the bioinformatics methods currently implemented by the system; here the target audience is the biologist looking for a tool to annotate a genome. Finally, the section on applications is intended for a general audience to show the scope of projects in which GenDB currently is being used.
SYSTEM ARCHITECTURE
A surprising lesson learned from the analysis of the existing systems (as far as they are known to the authors) is the lack of consistent internal data representation. However, in our opinion, an internal data representation using a well defined data model is the prerequisite needed to provide an API for any larger software system.
Data model design
We chose a very simple data model, based on only three core types of objects. Regions describe arbitrary (sub-) sequences. A region can be related to a parent region, e.g. a CDS is part of a contig. Observations correspond to information computed by various tools [e.g. BLAST (7) or InterPro (8)] for those regions. Annotations store the interpretation of a (human) annotator. They describe regions based on the evidence stored in the observations. Figure 1 shows the relationships between the different core objects. As can be seen, there is a clear distinction between the results from various bioinformatics tools (observations) and their interpretation (annotations), implemented in the data model. While this data model seems very generic, it represents a hierarchy of classes, including the complete EMBL feature set with several extensions. There are additional classes (e.g. tools and annotators) that complement the three core classes.
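The three core types can be illustrated with a minimal sketch. The Python classes below are hypothetical and are not the GenDB Perl API; the class and field names (`Region`, `Observation`, `Annotation`, `parent`, `tool`, `annotator`) are assumptions chosen only to mirror the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observation:
    """A raw tool result for a region (hypothetical fields)."""
    tool: str
    description: str
    score: float


@dataclass
class Annotation:
    """An annotator's interpretation, based on the observations."""
    annotator: str
    text: str


@dataclass
class Region:
    """An arbitrary (sub-)sequence, optionally nested in a parent region."""
    name: str
    start: int
    stop: int
    kind: str = "contig"  # e.g. "contig", "CDS", "tRNA"
    parent: Optional["Region"] = None
    observations: List[Observation] = field(default_factory=list)
    annotations: List[Annotation] = field(default_factory=list)


# A CDS is part of a contig; tool output and interpretation stay separate.
contig = Region("contig1", 1, 5000)
cds = Region("cds0001", 100, 1000, kind="CDS", parent=contig)
cds.observations.append(Observation("BLAST", "hit to a putative kinase", 250.0))
cds.annotations.append(Annotation("annotator1", "putative kinase"))
```

Keeping observations and annotations in separate lists reflects the distinction between tool results and their interpretation that the data model is built around.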
Since data access is via the objects described above, the classes in GenDB themselves form the API.
This object-oriented approach makes code maintenance easy, and also makes the data and methods in our system accessible to other programs. At the same time, we provide a means to extend the GenDB system.
General overview
Figure 2 illustrates the architecture of the GenDB system: the GenDB objects are mapped onto tables via O2DBI and stored in an SQL database. All access to these data via a Perl client or server API, or via a C++ client interface is again managed by the O2DBI module. On the client side, user interfaces can be implemented that use the functionality of these APIs.
On the server side, sequence databases can be accessed with the SRS (9) system or via the BioPerl (http://www.bioperl.org) interfaces. Computationally intensive tools such as BLAST or InterPro can be managed and scheduled via a BioGrid (e.g. Sun GridEngine http://www.sun.com/gridware).
Plug-in architecture
As all data in the system are accessible, almost any task can be performed by a plug-in, defined as a tool that operates on the GenDB data structures. While the core GenDB system provides a mechanism for manual annotation, an automatic annotation plug-in performs automatic assignment of regions (e.g. genes) and/or functional annotation for those regions. Another example of the plug-in architecture is the inclusion of the PathFinder (10) component for the analysis of metabolic data.
Wizards
Repetitive tasks such as updating the position of every downstream gene after a frameshift correction are performed by wizards. These are software agents, modeling repetitive tasks and/or tasks that require complex and synchronized changes to several data objects. All actions performed using wizards are modeled as annotations. Currently, wizards are implemented for frameshift and sequence data correction, CDS-start correction and reload (update) of contig sequences.
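The kind of synchronized update a wizard performs can be sketched as follows. This is a simplified illustration, not GenDB code: the `Gene` class, the `apply_indel` function and the returned log are hypothetical, and only forward-strand coordinates are considered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int
    stop: int


def apply_indel(genes: List[Gene], position: int, delta: int) -> List[str]:
    """Adjust gene coordinates after inserting (delta > 0) or deleting
    (delta < 0) bases at `position`.

    Returns a log of the changes, mirroring the idea that every wizard
    action is recorded (in GenDB, as an annotation).
    """
    log = []
    for gene in genes:
        if gene.start > position:
            # Gene lies entirely downstream: shift both ends.
            gene.start += delta
            gene.stop += delta
            log.append(f"{gene.name}: shifted by {delta:+d}")
        elif gene.stop >= position:
            # The correction lies inside this gene: only its end moves.
            gene.stop += delta
            log.append(f"{gene.name}: stop shifted by {delta:+d}")
    return log


genes = [Gene("g1", 10, 90), Gene("g2", 100, 400), Gene("g3", 500, 800)]
log = apply_indel(genes, position=250, delta=1)
```

Genes upstream of the corrected position are left untouched, while the affected gene and every downstream gene are updated in one pass.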
IMPLEMENTATION
We chose Perl (http://www.perl.org) as the implementation language, using a multitude of existing Perl modules from the BioPerl project. The widespread use of Perl in bioinformatics will enable many researchers to use GenDB as a platform for their implementation of further genome analysis pipelines. Using Perl with GenDB allows the incorporation of additional tools and methods from this area of research. To be able to offer an API to the outside world, the system requires a persistent storage layer. We elected to use a relational storage backend (SQL), which provides a fast, reliable and well tested storage subsystem.
O2DBIv2 (objects to database interface)
The complexity of our system encourages using an object-oriented approach not only in designing (see Fig. 1) but also in implementing the system. While Perl offers various interfaces to DBMS systems, there was no previous tool available for the mapping of Perl objects to relational tables, applicable for our purposes. We therefore used at first the original O2DBI system (O2DBI, J.Clausen, Technical Report, Bielefeld University, 2002) which was then enhanced substantially by B.Linke as O2DBIv2 (B.Linke, in preparation) to map Perl objects automatically to relational tables. Object descriptions in UML (XMI) format are now translated into a library of Perl objects with Perl and C++ client-server bindings. All objects are stored in a relational database [e.g. MySQL (http://www.mysql.com) or PostgreSQL (http://www.postgresql.org)].
Figure 3 shows a simplified version of the role of O2DBI. Classes are described as Perl hashes (denoting objects) which are mapped to relational tables. Perl source code is generated that implements standard methods (create, delete, init, get/set, etc.) for the objects. These automatically generated object methods are stored in Perl modules. Extension of the object functionality is possible in separate Perl modules.
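The general idea of generating accessor methods from a class description and mapping them onto a relational table can be sketched in a few lines. The example below is not O2DBI: it is a Python illustration using the standard `sqlite3` module, and the description format, the `make_class` helper and the column names are assumptions made for this sketch.

```python
import sqlite3

# Hypothetical class description, loosely analogous to a hash-based
# description. Attribute names and SQL types are assumptions.
REGION_DESCRIPTION = {
    "table": "region",
    "attributes": {"name": "TEXT", "start_pos": "INTEGER", "stop_pos": "INTEGER"},
}


def make_class(description, connection):
    """Create the table and return a class with generated create/get/set."""
    table = description["table"]
    attrs = description["attributes"]
    columns = ", ".join(f"{name} {sqltype}" for name, sqltype in attrs.items())
    connection.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, {columns})")

    class Mapped:
        def __init__(self, row_id):
            self.id = row_id

        @classmethod
        def create(cls, **values):
            # Insert a new row and return an object bound to its id.
            names = ", ".join(values)
            marks = ", ".join("?" for _ in values)
            cursor = connection.execute(
                f"INSERT INTO {table} ({names}) VALUES ({marks})",
                tuple(values.values()),
            )
            return cls(cursor.lastrowid)

        def get(self, attribute):
            if attribute not in attrs:
                raise AttributeError(attribute)
            row = connection.execute(
                f"SELECT {attribute} FROM {table} WHERE id = ?", (self.id,)
            ).fetchone()
            return row[0]

        def set(self, attribute, value):
            if attribute not in attrs:
                raise AttributeError(attribute)
            connection.execute(
                f"UPDATE {table} SET {attribute} = ? WHERE id = ?",
                (value, self.id),
            )

    Mapped.__name__ = table.capitalize()
    return Mapped


db = sqlite3.connect(":memory:")
Region = make_class(REGION_DESCRIPTION, db)
cds = Region.create(name="cds0001", start_pos=100, stop_pos=1000)
cds.set("stop_pos", 1003)
```

Because the object holds only its row id, every read goes to the database, so several clients working on the same table would see each other's changes.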
Interfaces
There are several ways of accessing the system: an API, user interfaces and a new client-server interface.
User interfaces. The more widely used frontend is a Gtk-Perl (http://www.gtkperl.org) based graphical user interface (GUI) that offers access to the data in the system by a variety of navigation metaphors (see Figs 4 and 5). Since not all users have access to a platform with Perl/Gtk, a web interface is also provided. The web interface offers somewhat restricted functionality with respect to the GUI. However, due to its HTML standard compliance, the web interface provides access to GenDB for a wide range of platforms.
As stated above, the GenDB classes form the API. Documentation of each class and object property or method is available on our web site. The relative simplicity of our object model, together with the documentation, has led several groups to use GenDB as a platform for their research. The web site has several sample scripts that show the functionality of the GenDB API. Using this interface, programmers are able to extract or manipulate the GenDB data objects. This allows, for example, the user to write simple Perl scripts that compute the molecular weight for every protein in a given genome and generate a table.
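The molecular weight example can be sketched as follows. The sketch is written in Python rather than Perl and does not call the GenDB API: the `proteins` dictionary is a hypothetical stand-in for sequences that would be fetched through the API, and the masses are standard average residue masses in daltons.

```python
# Average residue masses in daltons (amino acid minus one water).
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.0153  # one water is added back for the free termini


def molecular_weight(protein: str) -> float:
    """Average molecular weight of an unmodified peptide chain."""
    return sum(RESIDUE_MASS[aa] for aa in protein.upper()) + WATER


# Hypothetical input; in practice the sequences would come from the database.
proteins = {"cds0001": "MKT", "cds0002": "GG"}

# One tab-separated row per protein: identifier and weight.
table = [
    f"{name}\t{molecular_weight(sequence):.2f}"
    for name, sequence in sorted(proteins.items())
]
```

Residues outside the 20 standard amino acids raise a `KeyError` here; a production script would need a policy for ambiguity codes such as `X`.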
SOAP interface. In addition to the Perl API, we are in the final development stages of a client-server programmers' interface. This will not only allow non-Perl platforms to connect to the GenDB system, but will also allow clients to run on remote machines. We use a SOAP (http://www.w3.org/2000/xp/Group/) interface to make our GenDB API available to languages such as C++, Python or Java.
System requirements
Since one aim of the GenDB project is to provide a platform for end users and developers, the system has very modest system requirements. A Unix system with Perl, an SQL database and BioPerl are necessary. If the user wants to compute new observations with GenDB, the required tools will have to be installed on the system or have to be available via some kind of queuing system. For a complete local installation, the sequence databases used by the tools and some sequence retrieval mechanism are required. We currently use SRS and BioPerl for this purpose. Of the systems available today, only SRS provides user-friendly views on the sequence databases.
The system can be installed on a single (e.g. Linux) server or can be spread out over multiple machines, creating a client-server installation. Locally, several test and development installations exist on single CPU Linux platforms, while our production environment includes a client-server environment with a server for the frontend, a dedicated database server and a BioGrid to perform the computation of observations.
License
To provide a resource to the academic community, we distribute the complete system (including source code) to non-commercial users under an open source license. Special commercial licenses are available on request.
Documentation and availability
The complete system including the source code, documentation, a guided tour and installation instructions is available from our web site: http://gendb.Genetik.Uni-Bielefeld.DE. The documentation includes the details on the system architecture, the API and data model. An XML file describing the complete data model in great detail and hyperlinks to both versions of O2DBI can also be found on the web site.
BIOINFORMATICS METHODS
Data import and export
An important requirement for any genome analysis project is the availability of good import and export facilities in the genome annotation system. Currently, the GenDB system allows data import from GenBank, EMBL and fasta format files. Supported export formats are GenBank, EMBL, fasta format files and GFF (genome feature format; see http://www.sanger.ac.uk/Software/formats/GFF). A user-configurable linear or circular whole-genome view (see Fig. 5), which can be exported as a PNG file, complements the export formats. For each gene annotated with GenDB, a printable gene report can be generated.
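As an illustration of what a feature export involves, the sketch below formats one feature as a GFF record, i.e. nine tab-separated columns. The `gff_line` helper and the example values are hypothetical; the exact GFF dialect and attribute conventions written by GenDB are not specified here.

```python
def gff_line(seqname, source, feature, start, end,
             score=None, strand="+", frame=".", attributes=""):
    """Format one feature as a tab-separated, nine-column GFF record."""
    score_text = "." if score is None else f"{score:g}"
    fields = [
        seqname, source, feature, str(start), str(end),
        score_text, strand, str(frame), attributes,
    ]
    return "\t".join(fields)


# Hypothetical CDS on the forward strand of a contig.
line = gff_line("contig1", "GenDB", "CDS", 100, 1000,
                frame=0, attributes='gene "cds0001"')
```

Missing values such as an undefined score are written as a single dot, which keeps the column count constant for downstream parsers.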
Integration of tools
As described in the System Architecture section, GenDB allows the incorporation of arbitrary programs for different kinds of bioinformatics analysis. According to the system design, these programs are integrated as tools, which create observations for a specific kind of region. The inclusion of such tools in GenDB is very easy, with the most time-consuming step typically being the implementation of a parser for the result files. For the prediction of regions, such as coding sequences (CDS) or tRNA-encoding genes, GLIMMER (11), CRITICA (12) and tRNAscan-SE (13) have been integrated into the system. Homology searches at the DNA or amino acid level against arbitrary sequence databases can be done using the BLAST program suite. In addition to using HMMer (14) for motif searches, we also search the BLOCKS (15) and InterPro databases to classify sequence data based on a combination of different kinds of motif search tools. A number of additional tools have been integrated for the characterization of certain features of coding sequences, such as TMHMM (16) for the prediction of α-helical transmembrane regions, SignalP (17) for signal peptide prediction, or CoBias (A.C.McHardy et al., in preparation) for analyzing trends in codon usage.
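Since writing a result parser is described as the main effort of integrating a tool, a small sketch may help. It parses BLAST tabular output (the twelve standard columns) into observation-like dictionaries; the function name, the dictionary keys and the example hit are assumptions for illustration and do not reproduce GenDB's own parsers.

```python
import io

# The twelve standard columns of BLAST tabular output.
COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_blast_tabular(handle, tool="BLAST"):
    """Turn tabular BLAST hits into minimal observation records."""
    observations = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        values = line.split("\t")
        if len(values) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns: {line!r}")
        record = dict(zip(COLUMNS, values))
        # Keep only a small subset of the result, not the full alignment.
        observations.append({
            "tool": tool,
            "region": record["qseqid"],
            "hit": record["sseqid"],
            "evalue": float(record["evalue"]),
            "score": float(record["bitscore"]),
        })
    return observations


# Hypothetical single hit for region cds0001.
example = io.StringIO(
    "cds0001\tdbseq1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t380\n"
)
observations = parse_blast_tabular(example)
```

Only the identifiers, E-value and score are retained per hit, which keeps each stored record small.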
Whereas some tools only return a numeric score and/or an E-value as a result, other tools such as BLAST or HMMer additionally provide more detailed information, such as an alignment. Although the complete tool results are available to the annotator, only a minimum data subset is stored in the form of observations. Based on this subset, the complete tool result record can be recomputed on demand. Storing only a minimal subset of data reduces the storage demands by two orders of magnitude when compared with the traditional 'store everything' approach. Our performance measurements have shown this also to be more time efficient than data retrieval from a disk subsystem for any realistic genome project. The computation of tool results is done via a plug-in that connects to a BioGrid using the Sun GridEngine software. The graphical user interface for the display of tool results is depicted in Figure 6. Upon selection of a certain region, all available tool results for this region are visualized in a completely customizable list. More information about the underlying database record is available by a cross-link to the corresponding sequence databases with the SRS system.
Data navigation metaphors
The design of the GenDB system allows the projection of data from any component or plug-in onto all views (see also Fig. 7). This allows the user to navigate the genome with a wide variety of synchronized views.
Annotation
As already mentioned, the GenDB data model features a strict separation of tool results (observations) and their interpretation (annotation). This confers a large amount of flexibility and enables researchers to define their application-specific annotation strategies freely. The GenDB system supports both manual annotation and the application of automated annotation strategies. For manual annotation, the user interface provides a 'one-click' infrastructure; for automatic annotation, the API can be used.
The core GenDB system offers simple automatic annotation functions which allow the application of user-defined 'best tool result' strategies. In addition to this, the GenDB-Annotate plug-in provides more complex annotation strategies based on the integration of an expert system. Here, the user can define a set of rules to be used for automatic annotation of regions, or assignment of function to those regions. Owing to the consistent, internal data representation of GenDB, all GenDB objects can be accessed directly by an expert system. While implementing a new annotation strategy currently entails writing programming code, we are in the process of establishing a graphical editor (with XML export) for editing of annotation rules and a processor for computing annotations based on these rules.
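A strategy of the 'best tool result' kind can be sketched in a few lines. The function below is a hypothetical illustration, not the GenDB implementation: the dictionary keys, the E-value cutoff and the `best_result_annotation` name are assumptions.

```python
def best_result_annotation(observations, max_evalue=1e-10):
    """Derive an annotation from the most significant observation.

    Returns None when no observation passes the E-value cutoff, so that
    the region can be left for manual annotation.
    """
    candidates = [obs for obs in observations if obs["evalue"] <= max_evalue]
    if not candidates:
        return None
    best = min(candidates, key=lambda obs: obs["evalue"])
    return {
        "annotator": "auto",
        "function": best["description"],
        "evidence": best["tool"],
    }


# Hypothetical observations for one region.
observations = [
    {"tool": "BLAST", "description": "hypothetical protein", "evalue": 1e-5},
    {"tool": "BLAST", "description": "putative ABC transporter", "evalue": 1e-80},
    {"tool": "HMMer", "description": "ABC transporter domain", "evalue": 1e-30},
]
annotation = best_result_annotation(observations)
```

Because the rule reads only stored observations and writes a separate annotation record, it can be swapped for a different strategy without recomputing any tool results.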
For annotation projects, the linear contig with its list of genes often is only a starting point. The knowledge about metabolic pathways and the enzymes contained in them is connected to the data in GenDB via the GOPArc (Gene Ontology and Pathway Architecture) module. GOPArc integrates our previously described PathFinder system. It is a tool for the integration of metabolic pathway and ontology knowledge into GenDB. Using O2DBI, we created an object model representation of the complete KEGG database. Knowledge from other databases [e.g. MetaCyc (http://www.metacyc.org), BRENDA (http://www.brenda.uni-koeln.de)] can be incorporated. In addition to that, the system also provides access to the complete Gene Ontologies (GO) (http://www.geneontology.org) and navigation metaphors that allow browsing genomic data via the GO categories.
Annotation pipeline
Figure 8 shows an example of a genome annotation pipeline that has been implemented with GenDB. Upon import of the raw sequence data, a parent region object describing the genome sequence is created. Following this step, user-defined tools for the prediction of different kinds of regions, such as coding sequences (CDS) or tRNA-encoding genes, can be run. The output of these tools is stored as observations which refer to the parent region object. Based on these observations, an annotator, human or machine, performs region annotation. This means confirming or disregarding the results of gene prediction tools by creating region objects such as CDSs or tRNAs. The annotations form a complete protocol of all region annotation events. Following the creation of different kinds of regions, additional tools such as BLAST, HMMer or CoBias can be run, creating information related to their potential function. Finally, a function annotation step can be performed by an annotator in which a putative function is assigned to these regions by an interpretation of the observations.
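The first half of this pipeline (sequence import, region prediction, region annotation) can be sketched as follows. Everything in the sketch is a stand-in: `predict_orfs` is a toy forward-strand ORF scanner that replaces a real gene finder such as GLIMMER, and the dictionary layout and `accept` callback are assumptions made for illustration.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def predict_orfs(sequence, min_codons=3):
    """Toy gene finder: forward-strand ATG..stop ORFs in all three frames.

    Returns 1-based, inclusive (start, stop) pairs; the length in codons
    includes the stop codon.
    """
    predictions = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    predictions.append((start + 1, i + 3))
                start = None
    return predictions


def annotate_regions(sequence, accept):
    """Import a sequence, store predictions as observations, then let an
    annotator (the `accept` callback) confirm or reject each prediction."""
    genome = {"kind": "genome", "start": 1, "stop": len(sequence)}
    observations = [
        {"tool": "toy-orf-finder", "start": start, "stop": stop}
        for start, stop in predict_orfs(sequence)
    ]
    regions, protocol = [], []
    for obs in observations:
        accepted = accept(obs)
        decision = "confirmed" if accepted else "rejected"
        protocol.append((obs["start"], obs["stop"], decision))
        if accepted:
            regions.append({"kind": "CDS", "start": obs["start"], "stop": obs["stop"]})
    return genome, observations, regions, protocol


sequence = "CCATGAAATTTTAAGG"
genome, observations, regions, protocol = annotate_regions(
    sequence, accept=lambda obs: obs["stop"] - obs["start"] + 1 >= 9
)
```

The `protocol` list records every decision, confirmed or rejected, in the spirit of annotations forming a complete log of region annotation events. Function annotation would follow the same pattern with tools run on the accepted regions.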
APPLICATIONS
The GenDB system can be used for the annotation of novel genomes, as a model organism database (MOD) for the curation of already annotated genomes, or as a platform for software development.
Using GenDB for genome annotation
The GenDB system has already been installed at a number of European and worldwide institutions, including the German Max Planck network. GenDB is currently being used for the annotation of a number of microbial genomes. The genomes of Sinorhizobium meliloti (18) and Corynebacterium glutamicum ATCC 13032 (J.Kalinowski et al., in preparation) in addition to a large number of bacterial artificial chromosomes, cosmids and plasmids [e.g. Tauch et al. (19,20)] have already been analyzed with GenDB at Bielefeld University. Six novel genomes are currently being analyzed by other European groups with their own installations of GenDB. An additional five genomes (Sorangium cellulosum, Xanthomonas campestris pv. vesicatoria, Alcanivorax borkumensis, Azoarcus sp. and Clavibacter michiganensis) are being analyzed by a network of German groups, which use the GenDB platform established at Bielefeld University.
GenDB as model organism database
Already annotated genomes can be imported into the system from EMBL or GenBank format files for curation. Any annotation information contained in these files is stored in the form of GenDB objects. The data corresponding to these objects are again available via the GenDB API and user interfaces. Once there is a standard data model for prokaryote genomes (such as GMOD for the eukaryote world, see http://www.gmod.org), GenDB will be updated to support that data model.
GenDB as a platform for software development
Due to its versatility, the system is also well suited for use as a platform for novel software development, for which it has already been employed for 2 years at Bielefeld University. Recently, a number of German groups have started to implement their algorithms in the framework of GenDB, e.g. groups in the Max Planck Institutes in Tübingen and Bremen have implemented individual gene prediction strategies for their microbial genome projects using the GenDB framework.
DISCUSSION
We present a new open source platform, for both biologists and bioinformatics researchers, that implements the state of the art for genome annotation systems and enhances it in several areas. The system has been in use for 2 years at Bielefeld University and for more than a year at various other institutions. The key features of the system are its flexibility and extensibility. With respect to the genome annotation process, the system provides a flexible framework for implementing various user-defined annotation strategies, instead of relying on a single built-in annotation approach. Our past experiences have also shown the system to be well suited as an extensible platform for the integration of different kinds of functionality. It currently is used for the implementation of a system which links microarray data to gene annotation. We have implemented a wide range of metaphors for data navigation, which allow fast and easy access to different kinds of information during the genome annotation process. We hope that the positive features of the system which we provide to the research community will help to initiate research in new directions and will also be used for generating novel knowledge.
The well designed and documented API has also enabled other researchers to build their own tools based on GenDB. This proves that the main benefit of the open source approach, the cooperative development of high quality software, is already emerging. The ongoing work on GenDB is in the direction of more sophisticated automatic annotation methods. Another direction is the integration of GenDB with other programs and data sources to build a platform for systems biology.
Since only 60-70% of the genes typically found in a bacterial genome can be characterized functionally using a purely sequence-based approach, there is a clear need for adding more information to the analysis process. The GenDB system is an ideal platform to link transcriptome and proteome evidence to the genome, facilitating further analysis of previously uncharacterized genes.
ACKNOWLEDGEMENTS
The authors would like to thank all GenDB users for their time, patience and feedback that helped greatly in optimizing numerous details of the system. A.C.M. was supported by the DFG-Graduiertenkolleg 635 Bioinformatik. The work of F.M. and A.G. is financed by the BMBF grant 031U213D.
REFERENCES
1. Gaasterland,T. and Sensen,C.W. (1996) MAGPIE: automated genome interpretation. Trends Genet., 12, 76-78.
2. Andrade,M.A., Brown,N.P., Leroy,C., Hoersch,S., de Daruvar,A., Reich,C., Franchini,A., Tamames,J., Valencia,A., Ouzounis,C. and Sander,C. (1999) Automated genome sequence analysis and annotation. Bioinformatics, 15, 391-412.
3. Frishman,D., Albermann,K., Hani,J., Heumann,K., Metanomski,A., Zollner,A. and Mewes,H.W. (2001) Functional and structural genomics using PEDANT. Bioinformatics, 17, 44-57.
4. Overbeek,R., Larsen,N., Pusch,G.D., D'Souza,M., Selkov,E., Kyrpides,N., Fonstein,M., Maltsev,N. and Selkov,E. (2002) WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res., 28, 123-125.
5. Rutherford,K.M., Parkhill,J., Crook,J., Horsnell,T., Rice,P., Rajandream,M.-A. and Barrell,B. (2000) Artemis: sequence visualisation and annotation. Bioinformatics, 16, 944-945.
6. Overbeck,R., Fontstein,M., D'Souza,M., Pusch,G.D. and Maltsev,N. (1999) The use of gene clusters to infer functional coupling. Proc. Natl Acad. Sci. USA, 96, 2896-2901.
7. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
8. Apweiler,R., Attwood,T.K., Bairoch,A., Bateman,A., Birney,E., Biswas,M., Bucher,P., Cerutti,L., Corpet,F., Croning,M.D.R., Durbin,R., Falquet,L., Fleischmann,W., Gouzy,J., Hermjakob,H., Hulo,N., Jonassen,I., Kahn,D., Kanapin,A., Karavidopoulou,Y., Lopez,R., Marx,B., Mulder,T.M., Oinn,N.J., Pagni,M., Servant,F., Sigrist,C.J.A. and Zdobnov,E.M. (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Res., 29, 37-40.
9. Etzold,T. and Argos,P. (1993) SRS: an indexing and retrieval tool for flat file data libraries. CABIOS, 9, 49-57.
10. Goesmann,A., Haubrock,M., Meyer,F., Kalinowski,J. and Giegerich,R. (2002) Pathfinder: reconstruction and dynamic visualization of metabolic pathways. Bioinformatics, 18, 124-129.
11. Delcher,A.L., Harmon,D., Kasif,S., White,O. and Salzberg,S.L. (1999) Improved microbial gene identification with GLIMMER. Nucleic Acids Res., 27, 4636-4641.
12. Badger,H. and Olsen,G.J. (1999) CRITICA: coding region identification tool invoking comparative analysis. Mol. Biol. Evol., 16, 512-524.
13. Lowe,T.M. and Eddy,S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res., 25, 955-964.
14. Eddy,S.R. (1998) Profile hidden Markov models. Bioinformatics, 14, 755-763.
15. Henikoff,S. and Henikoff,J.G. (1991) Automated assembly of protein blocks for database searching. Nucleic Acids Res., 19, 6565-6572.
16. Sonnhammer,E.L.L., von Heijne,G. and Krogh,A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. In Glasgow,J., Littlejohn,T., Major,R., Lathrop,F., Sankoff,D. and Sensen,C. (eds), Proceedings of the Sixth International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp. 175-182.
17. Nielsen,H., Engelbrecht,J., Brunak,S. and von Heijne,G. (1997) Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng., 10, 1-6.
18. Galibert,F., Finan,T.M., Long,S.R., Pühler,A., Abola,P., Ampe,F., Barloy-Hubler,F., Barnett,M.J., Becker,A., Boistard,P., Bothe,G., Boutry,M., Bowser,L., Buhrmester,J., Cadieu,E., Capela,D., Chain,P., Cowie,A., Davis,R.W., Dreano,S., Federspiel,N.A., Fisher,R.F., Gloux,S., Godrie,T., Goffeau,A., Golding,B., Gouzy,J., Gurjal,M., Hernandez-Lucas,I., Hong,A., Huizar,L., Hyman,R.W., Jones,T., Kahn,D., Kahn,M.L., Kalman,S., Keating,D.H., Kiss,E., Komp,C., Lelaure,V., Masuy,D., Palm,C., Peck,M.C., Pohl,T.M., Portetelle,D., Purnelle,B., Ramsperger,U., Surzycki,R., Thebault,P., Vandenbol,M., Vorhölter,F.J., Weidner,S., Wells,D.H., Wong,K., Yeh,K.C. and Batut,J. (2001) The composite genome of the legume symbiont Sinorhizobium meliloti. Science, 29, 668-672.
19. Tauch,A., Schneiker,S., Selbitschka,W., Pühler,A., van Overbeek,L.S., Smalla,K., Thomas,C.M., Bailey,M.J., Forney,L.J., Weightman,A., Ceglowski,P., Pembroke,T., Tietze,E., Schroder,G., Lanka,E. and van Elsas,J.D. (2002) The complete nucleotide sequence and environmental distribution of the cryptic, conjugative, broad-host-range plasmid pIPO2 isolated from bacteria of the wheat rhizosphere. Microbiology, 148, 1637-1653.
20. Tauch,A., Schlüter,A., Bischoff,N., Goesmann,A., Meyer,F. and Pühler,A. (2002) The 79,370-bp conjugative plasmid pb4 consists of an incp-beta backbone loaded with a chromate resistance transposon, the stra-strb streptomycin resistance gene pair, the oxacillinase gene bla(nps-1), and a tripartite antibiotic efflux system of the resistance-nodulation-division family. Mol. Gen. Genomics, 268, 570-584.




















