Nucleic Acids Research Advance Access published online on November 13, 2009

Nucleic Acids Research, doi:10.1093/nar/gkp848
© The Author(s) 2009. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Database Issue

The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata

Konstantinos Liolios1, I-Min A. Chen2, Konstantinos Mavromatis1, Nektarios Tavernarakis3, Philip Hugenholtz4, Victor M. Markowitz2 and Nikos C. Kyrpides1,*

1Genome Biology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, 2Biological Data Management and Technology Center, Lawrence Berkeley National Laboratory, Berkeley, CA, USA, 3Institute of Molecular Biology and Biotechnology, Foundation for Research and Technology, Heraklion, Crete, Greece and 4Microbial Ecology Program, DOE Joint Genome Institute, 2800 Mitchell Drive, Walnut Creek, CA, USA

*To whom correspondence should be addressed. Tel: +1 925 296 5718; Fax: +1 925 296 5850; Email: nckyrpides{at}lbl.gov

Received September 19, 2009. Accepted September 22, 2009.


    ABSTRACT
 TOP
 ABSTRACT
 HISTORY AND GROWTH
 CURRENT STATUS OF GOLD
 NEW DEVELOPMENTS
 OVERVIEW STATISTICS
 FUTURE DIRECTIONS
 DATABASE AVAILABILITY
 SUPPLEMENTARY DATA
 FUNDING
 REFERENCES
 
The Genomes On Line Database (GOLD) is a comprehensive resource for centralized monitoring of genome and metagenome projects worldwide. Both complete and ongoing projects, along with their associated metadata, can be accessed in GOLD through precomputed tables and a search page. As of September 2009, GOLD contains information for more than 5800 sequencing projects, of which 1100 have been completed and their sequence data deposited in a public repository. GOLD continues to expand, moving toward the goal of providing the most comprehensive repository of metadata information related to the projects and their organisms/environments in accordance with the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) specification. GOLD is available at: http://www.genomesonline.org and has a mirror site at the Institute of Molecular Biology and Biotechnology, Crete, Greece, at: http://gold.imbb.forth.gr/


    HISTORY AND GROWTH
 
The Genomes OnLine Database (GOLD) provides a centralized resource for the continuous monitoring of genome and metagenome sequencing projects worldwide, uniquely integrated with their associated metadata. Since its founding in 1997 (1–4), GOLD has grown dramatically, now hosting information regarding over 5800 sequencing projects (Figure 1A).


 
Figure 1. Statistical information available from GOLD. (A) Evolution of the complete and ongoing genome projects monitored in GOLD from December 1997 through September 2009. (B) Distribution of the 5831 genome projects across the major sequencing centers. Abbreviations: JGI, Joint Genome Institute; JCVI, J. Craig Venter Institute; Broad, Broad Institute; WashU, Washington University; Sanger, the Wellcome Trust Sanger Institute; BCM-HGSC, Baylor College of Medicine Human Genome Sequencing Center; WORLD, all other sequencing centers. (C) Distribution of the 200 current metagenome projects across the three major metagenome classification categories. (D) Phylogenetic distribution of the 4172 bacterial genome projects as of September 2009.

 
The number of registered sequencing projects has doubled since the publication of the previous report two years ago (4). As of September 2009, 5843 projects have been recorded, versus 2905 as of September 2007 and 1575 as of September 2005 (3, 4). This rapid growth has been fueled by decreasing sequencing costs combined with technological advances, and was significantly augmented by the launching and successful execution of several large-scale microbial genome sequencing initiatives, e.g. the Human Microbiome Project (http://www.hmpdacc.org/) and the Genomic Encyclopedia of Bacteria and Archaea (http://www.jgi.doe.gov/programs/GEBA/). During this period, GOLD has also expanded its scope beyond standard genomic and metagenomic projects to now encompass data from the growing number of resequencing, transcriptome and metatranscriptome projects.
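
The growth figures quoted above can be checked with a few lines of arithmetic. The sketch below is only an illustration: it uses the three project counts given in the text, and the variable and function names are hypothetical, not part of GOLD.

```python
# Project counts reported in the text (September of each year).
projects_by_year = {2005: 1575, 2007: 2905, 2009: 5843}


def fold_changes(counts):
    """Return {(earlier_year, later_year): ratio} for consecutive reports."""
    years = sorted(counts)
    return {(a, b): counts[b] / counts[a] for a, b in zip(years, years[1:])}


for (start, end), ratio in fold_changes(projects_by_year).items():
    print(f"{start} -> {end}: {ratio:.2f}-fold")
```

The 2007 to 2009 ratio comes out at roughly 2, which is the doubling described in the text.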

In parallel with this doubling in the number of genome projects has come an increase in the number of captured metadata fields from 56 in 2007 (4) to 135 today. This is an area of active development; thus, we anticipate further increases as more metadata types are described and captured in published studies. Some of the new metadata types are described below.

Among the most important developments of the database during the last 2 years are those coupled to the growth of the metadata. These include the implementation of GOLD-specific Controlled Vocabularies (CVs) for the representation of the associated data, as well as coordination with the Genomics Standards Consortium (GSC) (http://gensc.org/) and compliance with its recommendations for the Minimum Information about a (Meta)Genome Sequence (MIGS/MIMS) (5).

As the rate of launching new projects accelerates, the task of monitoring and recording their data along with their metadata grows ever more difficult. Therefore, the sequencing centers and the community at large are strongly encouraged to register their own sequencing projects in GOLD to ensure complete and accurate project tracking.


    CURRENT STATUS OF GOLD
 
Published complete genomes
The year 2009 represents a landmark in the history of genome sequencing projects: the completed sequencing of the first 1000 genomes. As of September 2009, GOLD documents 1100 completed genome projects, a 1.7-fold increase from 2 years ago (4). These comprise 914 bacterial, 68 archaeal and 118 eukaryotic genomes. Thus, the completely sequenced archaeal and bacterial genomes currently total 982, leading one to confidently predict that the community will celebrate yet another 1000 genome milestone before the end of the year.

For all of these projects, the complete genome sequence is ‘published’ by being deposited in one of the public archival databases such as GenBank (6), EMBL (7) and DDBJ (8). However, a rapidly increasing proportion of the projects do not have an associated publication in the literature. That fraction currently stands at 37% (408 of 1100). This shift is partly attributable to the more frequent release of sequence data to the community prior to publication, in compliance with the rapid pre-publication data release policies and recommendations (9). Another factor is the increase in larger-scale efforts that involve the parallel sequencing of several hundred organisms (e.g. the HMP and GEBA). Here, preparation of the typical detailed publication describing the genome of every single organism would be virtually impossible (4,10). This situation calls for a new mechanism that can provide a GSC-compliant citable record for every completed genome project and its metadata. To that end, an open access scientific journal, Standards in Genomic Sciences (SIGS) (http://standardsingenomics.org/), has recently been launched (11), with the goal of cataloging and maintaining the data from completed genome projects in an orderly and standardized manner (10).

In addition to publication of each complete genome sequence, GSC also strongly recommends that the source organism be available from a culture collection center. It is unfortunate that after so many years and so many genome sequences, the widely accepted policies for publication of genome sequencing projects require the submission to a public repository of only the sequence data, not also the biological material itself. As a result, from the current list of 982 completed archaeal and bacterial genomes, only 518 (53%) appear to be available from a culture collection center (12), and only half of those genomes (27% of the total) represent a type strain of the sequenced species.
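
The two percentages quoted in this subsection follow directly from the counts given in the text. A minimal sketch of that arithmetic, with hypothetical names chosen only for illustration:

```python
def percent(part, whole):
    """Share of `whole` represented by `part`, in percent."""
    return 100.0 * part / whole


completed_projects = 1100      # completed genome projects in GOLD
without_publication = 408      # of those, lacking a literature publication
prokaryotic_completed = 982    # completed archaeal + bacterial genomes
in_culture_collection = 518    # of those, available from a culture collection

print(f"No associated publication: {percent(without_publication, completed_projects):.0f}%")
print(f"Available from a culture collection: {percent(in_culture_collection, prokaryotic_completed):.0f}%")
```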

Ongoing genome projects
In addition to the 1100 completed projects, there are currently 4543 ongoing sequencing projects, of which 3271 are bacterial, 110 archaeal and 1162 eukaryotic. This total is more than double the 2158 reported 2 years ago. Until recently, the projects monitored for GOLD were predominantly ‘Genome’ and ‘EST’ sequencing projects, supplemented by a small number of ‘Genome-Surveys’ and ‘Genome-Regions’ (the latter representing some eukaryotic projects focused on specific genomic regions). The increasing number of ‘Resequencing’ and ‘Transcriptome’ projects prompted the addition of these two new project types during the past year (Table 1).



 
Table 1. Project type distribution

 
The current Sequencing Status distribution tallied by domain is shown in Table 2. The Sequencing Status designations and current tallies are as follows:
  • Complete: DNA sequencing has been completed; 288 projects in addition to the 1100 already published.
  • Draft: a draft sequence has been deposited in a public repository; 1164 projects.
  • In progress: the DNA sequence has been received by the sequencing center but there is not yet public data release; 442 projects.
  • Awaiting DNA: an organism selection has been made, but the DNA has not yet arrived at the DNA sequencing center; 236 projects.
  • Targeted: a project has been identified but further work has not yet begun; 527 projects.



 
Table 2. Project status distribution

 
The distributions of all projects by Project Type and by Sequencing Status are now dynamically tracked with every GOLD update and can be viewed online through the main page at: http://www.genomesonline.org/gold.cgi.

Metagenome projects
The past 2 years have seen a growing number of metagenomic projects added to GOLD, and the expectation is that this trend will continue, reinforced by further advances in the sequencing technology. The database currently reports 200 distinct metagenomic projects, embracing 453 samples.
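
As a consistency check, the per-domain counts reported in this section can be summed and compared with the overall total of 5843 projects. The sketch below takes the completed counts from the "Published complete genomes" subsection (914 bacterial, 68 archaeal, 118 eukaryotic) and the ongoing counts from "Ongoing genome projects"; the dictionary layout and names are illustrative assumptions.

```python
# Counts as reported in the text (September 2009).
completed = {"bacterial": 914, "archaeal": 68, "eukaryotic": 118}
ongoing = {"bacterial": 3271, "archaeal": 110, "eukaryotic": 1162}
metagenome_projects = 200

completed_total = sum(completed.values())
ongoing_total = sum(ongoing.values())
prokaryotic_completed = completed["bacterial"] + completed["archaeal"]
all_projects = completed_total + ongoing_total + metagenome_projects

print("completed:", completed_total)
print("completed archaeal + bacterial:", prokaryotic_completed)
print("ongoing:", ongoing_total)
print("all projects:", all_projects)
print(f"ongoing vs. 2 years ago: {ongoing_total / 2158:.2f}-fold")
```

Under these figures the completed, ongoing and metagenome projects add up to the 5843 projects cited in the text.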

During curation, careful attention is paid to ensure that project names follow the standardized schema previously described (4). All metagenome projects are classified under three major categories: environmental (137 projects), endobiotic or host-associated (53 projects) and synthetic (10 projects) (Figure 1C). A hierarchical classification schema covering all the metagenome projects captured in GOLD is also under development and will soon be released from the database. A prototype of this classification has already been adopted by the Integrated Microbial Genomes with Microbiome Samples (IMG/M) database (13) and is available for browsing online (http://img.jgi.doe.gov/cgi-bin/m/main.cgi?section=TaxonList&page=taxonListPhylo&domain=*Microbiome&genome_type=metagenome).

Metadata
The genome/metagenome associated metadata have also undergone significant expansion in GOLD during the last 2 years. The number of metadata categories has increased from two in the previous release to six in GOLD v.3: (i) organism information; (ii) project information; (iii) sequencing information; (iv) environmental metadata; (v) host metadata; and (vi) organism metadata. Likewise, the number of metadata fields assigned to those categories has grown from 56 to 135.

The current status of the different fields and the number of projects with associated data for each field are shown in Table 3. Some of the metadata fields are populated for all or most of the projects, while other fields (particularly newer ones) have yet to be curated for the majority of the projects. Although the number of metadata fields is expected to continue to grow, the current list has already been put to use in microbial comparative analysis systems such as Integrated Microbial Genomes (IMG) (14) and IMG/M (13).

Particularly important developments currently underway involve the integration and mapping of several of the available metadata fields in GOLD to well-developed, publicly available metadata ontologies and controlled vocabularies such as ‘Habitat-Lite’ (15).



 
Table 3. Metadata categories and fields

 

    NEW DEVELOPMENTS
 
New user interface implementing new technologies
The burgeoning array of new types of data recorded in GOLD necessitated a major revamping of the graphical user interface. The GOLD tables have been visually enhanced using advanced graphical technologies such as the Ext JS JavaScript library for the grids, the Yahoo User Interface Library for the pie charts and data tables, the Google Maps API for geographical location display, Google MarkerClusterer for improved visual display of multiple map locations, and the JavaScript Object Notation (JSON) data format for rapid data loading.

On the main page (http://www.genomesonline.org/gold.cgi), three links have been added to connect to new pages displaying the current distribution of projects by type, sequencing status and phylogeny (Figure 2, right). On each of these new pages, the same technologies are used to convey key breakdown data in a visually intuitive manner. Below the links, the Google map is displayed showing all projects individually or in clusters (Figure 2, left). Clicking on a project displays information about the collection location, an image (if available), and a link to the project’s GOLD CARD page.


 
Figure 2. Graphical displays in GOLD. (Left) Geographical display of the collection location for organisms and environmental samples. Click on a project to view the detailed information window showing the name of the project, an image (if available), a GOLD CARD link, and a short description identifying the location. (Right) Phylogenetic distribution of archaeal, bacterial and eukaryotic projects with accompanying data tables.

 
The same entry page provides access to the enhanced tables for the five major GOLD project categories (published complete genomes, archaeal ongoing genomes, bacterial ongoing genomes, eukaryotic ongoing genomes and metagenomes). Each table displays information for 12 primary metadata fields for each project. By default, projects are sorted by GOLDSTAMP ID, a number assigned sequentially as projects are entered in GOLD. To sort by the data in any other column, click the column header. To display advanced options, mouse over the column header and click to open the dropdown list. These options enable you to sort in ascending or descending order, to show or hide columns and to filter the projects displayed based on the data in that column.

The Search GOLD page has been completely rewritten. There are currently four tab pages, each corresponding to a different search mode and each offering new capabilities for more effective searching. The first tab, the basic search, provides commonly used Boolean queries for the most frequently searched fields in three main data categories. The Advanced Search tab offers a more extended list of search criteria from eight major data categories. The Metadata Search tab can be used to query the database metadata and view the results in tables and graphical displays of statistics and rankings. A fourth tab that is currently under development, Custom (SQL) Search, will enable users to construct and execute their own SQL queries. The aforementioned interface technologies are also employed here to provide an enhanced visual display of the search results and enable further manipulations. The user can export the search results to a Microsoft® Excel file or redirect them to the metadata analysis page. At that page, charts and statistics can be derived from the breakdown of the search results based on more than 40 metadata fields.

Finally, the GOLD CARD page has also been extensively redesigned, making for more intuitive navigation (Figure 3). Genome project data are now organized into seven major categories for easier access. Google map location and images of the organism(s) are provided when available. Empty data rows can be hidden by clicking the arrow located at the upper right corner of the card. The GOLD CARD page complies with the GSC standards (5) and provides IDs and links for all the compliant data fields. The list of metadata fields provided by GOLD, now more than 100, includes those currently part of the MIGS specifications plus many more that are now candidates for inclusion in the MIGS list.

The prefix in the GOLDSTAMP identifier assigned to each project encodes additional project information: Gc, GOLD complete; Gi, GOLD incomplete; Gm, GOLD metagenome; Ge, GOLD EST; Gr, GOLD resequencing; and Gt, GOLD transcriptome.
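
The prefix scheme above lends itself to a simple lookup. The following is a minimal sketch, not GOLD's own code: it assumes only that the two-letter prefix comes first in the identifier, and the function name and the example identifier `Gc00001` are hypothetical.

```python
# Two-letter GOLDSTAMP prefixes and their meanings, as listed in the text.
GOLDSTAMP_PREFIXES = {
    "Gc": "GOLD complete",
    "Gi": "GOLD incomplete",
    "Gm": "GOLD metagenome",
    "Ge": "GOLD EST",
    "Gr": "GOLD resequencing",
    "Gt": "GOLD transcriptome",
}


def project_class(goldstamp_id: str) -> str:
    """Return the project class encoded by the identifier's two-letter prefix."""
    prefix = goldstamp_id[:2]
    if prefix not in GOLDSTAMP_PREFIXES:
        raise ValueError(f"Unrecognized GOLDSTAMP prefix: {prefix!r}")
    return GOLDSTAMP_PREFIXES[prefix]


# Hypothetical identifier, used only to exercise the lookup.
print(project_class("Gc00001"))
```

Raising an error for unknown prefixes, rather than returning a default, keeps malformed identifiers from being silently classified.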


 
Figure 3. The GOLD CARD page, with the list of available metadata organized into six major categories. The corresponding MIGS/MIMS IDs are also shown for each GOLD field.

 
Metadata collection and management system
The number of genome projects initiated is increasing exponentially, bringing with it an exponential increase in the task of curating the GOLD data. To help cope with this flood, a new project management system (IMG-GOLD) was created to interface between GOLD and the Integrated Microbial Genomes (IMG) system (14). IMG is a widely used community resource for comparative analysis of publicly available genome data. The Expert Review version of IMG (IMG ER) (16) allows users to enter their own genome sequence data sets so that they can review and curate the annotations prior to their public release. Metadata accompanying those genome data sets are now captured via the IMG ER submission site (http://img.jgi.doe.gov/submit) and recorded in the new IMG-GOLD system (http://img.jgi.doe.gov/gold). IMG-GOLD now serves not only as the database underlying GOLD, but also as the source of metadata for IMG and IMG ER (and their metagenome counterparts, IMG/M and IMG/M ER). An example of how the metadata from GOLD can support and be presented through a metagenome analysis system, such as IMG/M (13), is presented in Supplementary Figure S1 (Supplementary Data). We anticipate that similar data exchange and interoperability between GOLD and other analytical systems, such as RAST (17) and CAMERA (18), will be developed in the near future.

Other systems already powered by GOLD include the NIH-funded Human Microbiome Project Catalog (HMPC) provided through the Data Analysis and Coordination Center (DACC) (http://www.hmpdacc.org/). DACC connects directly to the GOLD database and accesses the HMP-specific data subset. To enable monitoring of the status of an HMP genome project, a new set of attributes and data types was added to GOLD and the existing controlled vocabularies were expanded. The HMPC page enables the DACC collaborators to choose and view targeted genome strains for sequencing. The wider community can also use this page to query the reference genomes and retrieve extensive metadata.

IMG-GOLD also provides a web-based data entry mechanism that enables genome project submitters and curators to create/update/delete GOLD genome projects, provide associated metadata and create/edit controlled vocabularies for new metadata attributes. For users who prefer to provide metadata in file format, preformatted Excel spreadsheets are provided on the GOLD site (http://genomesonline.org/Project_submission.htm) for both genome and metagenome projects.

Data availability
All GOLD data are available according to the Creative Commons License of Attribution-NonCommercial-ShareAlike (http://creativecommons.org/licenses/by-nc-sa/2.5/). All of the available metadata types in GOLD can be downloaded to an Excel file to facilitate wider distribution and use of the data.


    OVERVIEW STATISTICS
 
Several different types of statistics, related to each of the data fields, can be derived using GOLD’s advanced search engine, the new metadata search capability, and the data download capability. In addition, graphical overviews for specific data types are provided via the ‘Gold Statistics’ link on the database home page (http://genomesonline.org/gold_statistics.htm). This feature is supported for the data fields discussed in the following paragraphs.

Evolution of genome projects
Genome project tracking in GOLD has been steadily increasing over time, with an average 2.25-fold increase every 2 years for the past 12 years (Figure 1A). Microbial genome projects account for the majority of that increase. This systematic and comprehensive genome project tracking can help address two major questions: (i) where are the remaining gaps in sequencing along the bacterial and archaeal branches of the tree of life, and how numerous are they; and (ii) how accurately can we predict the number of genome projects that will be sequenced over the next 3–5 years?
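
To make the quoted growth rate concrete, the sketch below converts a 2.25-fold increase every 2 years into an equivalent annual factor and extrapolates from the current total of 5843 projects. This assumes steady compound growth purely for illustration; it is not the method behind the authors' own predictions, and the function names are hypothetical.

```python
def annual_factor(fold, period_years):
    """Equivalent per-year growth factor for `fold` growth every `period_years`."""
    return fold ** (1.0 / period_years)


def extrapolate(current, fold, period_years, years_ahead):
    """Project a count forward, assuming the same compound growth continues."""
    return current * fold ** (years_ahead / period_years)


FOLD, PERIOD = 2.25, 2      # growth rate quoted in the text
CURRENT_PROJECTS = 5843     # projects tracked as of September 2009

print(f"annual factor: {annual_factor(FOLD, PERIOD):.2f}")
print(f"cumulative growth over 12 years: {FOLD ** (12 / PERIOD):.1f}-fold")
for years in (3, 5):
    projected = extrapolate(CURRENT_PROJECTS, FOLD, PERIOD, years)
    print(f"in {years} years: about {projected:,.0f} projects")
```

A 2.25-fold increase every 2 years corresponds to a 1.5-fold increase per year, so even small changes in the assumed rate shift the 5-year figure substantially.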

Table 4 addresses the first question by reporting the taxonomic distribution of genome projects, showing for each taxon the number of genome projects compared with the total number of described taxonomic units (filtering out the environmental and the unknown entries). In effect, it identifies the taxonomic groups in each domain of life for which there are no currently registered genome projects. These taxonomic groups should eventually become targets for new sequencing projects. Further, we hope that the availability of this systematic project monitoring will not only help identify the next sequencing targets, but also help the sequencing centers to avoid unnecessary redundancy and duplication of efforts.



 
Table 4. Taxonomic distribution of genome projects

 
Table 5 addresses the second question: what is the anticipated growth of microbial genome projects over the next 5 years? By a very conservative estimate, we would expect a 3-fold increase in the number of complete, and a 10-fold increase in the number of draft, microbial genome projects relative to those sequenced during the last 15 years. However, if we extrapolate a linear increase in the number of finished and draft genomes based on Figure 1A, those predictions would be realized within the next 3 years.



 
Table 5. Predicted increase of microbial genome sequencing projects

 
Sequencing centers
Four major sequencing centers account for about 50% of the 5843 sequencing projects currently monitored in GOLD (Figure 1B), a situation that has not changed over the last 2 years. However, when considering only archaeal and bacterial projects, the two leading sequencing centers (JGI and JCVI) now represent a smaller share: about 35%, compared to more than half 2 years ago. The fact that a much larger community is now carrying out these projects compared to 2 years ago also reflects the increasing democratization of the sequencing technology.

Phylogenetic distribution
The sampling bias favoring three major bacterial lineages—Proteobacteria, Firmicutes and Actinobacteria—has decreased only slightly during the last couple of years (Figure 1D). The above three lineages now comprise 80% of all genome projects compared to 82% 2 years ago. This small shift is due mostly to large-scale sequencing efforts, such as the GEBA and HMP, which target previously neglected phylogenetic lineages. Clearly, there remains much room for improvement here, and further progress can be expected if similar large-scale biodiversity sequencing efforts continue.


    FUTURE DIRECTIONS
 
The challenges facing GOLD have increased dramatically as GOLD continues to evolve from a genome/metagenome project monitoring system into a universal genome project core catalog/indexer charged with the task of providing data interconnectivity, exchange and dissemination. In this new role, GOLD is required to efficiently store, process and automatically track metadata that is rapidly increasing in scope and complexity. All the while, there is a great expectation for GOLD to pioneer future genomic standards.

To meet these challenges will require the creation of a shared genome project conceptual model and a database schema to handle the genome-project-associated metadata. The genome/metagenome data continue to be somewhat structured and hierarchical, but the rich associated metadata information becoming available requires the creation of a ‘Genome Project Ontology’ for effective management. Incorporation of other available ontologies, such as existing medical and environmental ontologies, is part of the immediate plan.

Furthermore, numerous other bioinformatics databases and researchers will need to acquire and/or synchronize with GOLD data. To address their needs, GOLD will provide access for client programs via web services using SOAP, GOLDXML and other RESTful technologies, as well as communicate with subscribers via RSS feeds. To further increase community access to GOLD, a GOLD-wiki site will be established where genome project curators can contribute additional project information using various media-rich data formats. We also plan to employ data warehousing tools to facilitate reporting and analysis of the GOLD data on the statistics page, thereby eliminating the need for the manual creation of Excel charts that become quickly outdated. To improve data mining, the GOLD search engine will provide an advanced query mechanism wherein the search criteria available will depend on the meta-properties of the input objects.


    DATABASE AVAILABILITY
 
GOLD can be accessed at: http://www.genomesonline.org/.

Further comments and feedback are welcome at: mail{at}genomesonline.org.


    SUPPLEMENTARY DATA
 
Supplementary Data are available at NAR Online.


    FUNDING
 
The US Department of Energy’s Office of Science, Biological and Environmental Research Program; and by the University of California, Lawrence Berkeley National Laboratory under Contract No. DE-AC02-05CH11231, Lawrence Livermore National Laboratory under Contract No. DE-AC52-07NA27344; and Los Alamos National Laboratory under Contract No. DE-AC02-06NA25396. Funding for open access charge: Department of Energy.

Conflict of interest statement. None declared.


    ACKNOWLEDGEMENTS
 
We would like to thank Merry Youle for her excellent editorial assistance. GOLD has been maintained and developed mostly based on the volunteer work of its small team. We are grateful to all the colleagues who kindly provide information for the more accurate monitoring of the genome projects and particularly to Michelle Giglio and Heather Huot from University of Maryland. The support of Rashida Lathan, Stella Proukaki and Tatiana Drakakis is especially acknowledged. The full list of all contributors is available at: (http://www.genomesonline.org/acknowledgments.html).


    REFERENCES
 

  1. Kyrpides N. Genomes OnLine Database (GOLD 1.0): a monitor of complete and ongoing genome projects world-wide. Bioinformatics (1999) 15:773–774.

  2. Bernal A, Ear U, Kyrpides N. Genomes OnLine Database (GOLD): a monitor of genome projects world-wide. Nucleic Acids Res. (2001) 29:126–127.

  3. Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides NC. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. Nucleic Acids Res. (2006) 34:D332–D334.

  4. Liolios K, Mavromatis K, Tavernarakis N, Kyrpides NC. The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata. Nucleic Acids Res. (2007) 36:D475–D479.

  5. Field D, Garrity G, Gray T, Morrison N, Selengut J, Sterk P, Tatusova T, Thompson N, Allen MJ, Ashburner M, et al. Towards a richer description of our complete collection of genomes and metagenomes: the "Minimum Information about a Genome Sequence" (MIGS) specification. Nat. Biotechnol. (2008) 26:541–547.

  6. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. (2007) 35:D21–D25.

  7. Kulikova T, Akhtar R, Aldebert P, Althorpe N, Andersson M, Baldwin A, Bates K, Bhattacharyya S, Bower L, Browne P, et al. EMBL Nucleotide Sequence Database in 2006. Nucleic Acids Res. (2007) 35:D16–D20.

  8. Okubo K, Sugawara H, Gojobori T, Tateno Y. DDBJ in preparation for overview of research activities behind data submissions. Nucleic Acids Res. (2006) 34:D6–D9.

  9. Birney E, Hudson TJ, Green ED, Gunter C, Eddy S, Rogers J, Harris JR, Ehrlich SD, Apweiler R, Austin CP, et al. Prepublication data sharing. Nature (2009) 461:168–170.

  10. Kyrpides NC. Fifteen years of microbial genomics: meeting the challenges and fulfilling the dream. Nat. Biotechnol. (2009) 27:627–632.

  11. Garrity GM, Field D, Kyrpides NC. Standards in genomic sciences. Stand. Genomic Sci. (2009) 1:1–2.

  12. Dawyndt P, Vancanneyt M, DeMeyer H, Swings J. Knowledge accumulation and resolution of data inconsistencies during the integration of microbial information sources. IEEE Transactions on Knowledge and Data Engineering (2005) 17:1111–1126.

  13. Markowitz VM, Ivanova NN, Szeto E, Palaniappan K, Chu K, Dalevi D, Chen IMA, Grechkin Y, Dubchak I, Anderson I, et al. IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res. (2008) 36:D534–D538.

  14. Markowitz VM, Szeto E, Palaniappan K, Grechkin Y, Chu K, Chen IMA, Dubchak I, Anderson I, Lykidis A, Mavromatis K, et al. The Integrated Microbial Genomes (IMG) system in 2007: data content and analysis tool extensions. Nucleic Acids Res. (2008) 36:D528–D533.

  15. Hirschman L, Clark C, Cohen KB, Mardis S, Luciano J, Kottmann R, Cole J, Markowitz V, Kyrpides N, Morrison N, et al. Habitat-Lite: a GSC case study based on free text terms for environmental metadata. OMICS (2008) 12:129–136.

  16. Markowitz VM, Mavromatis K, Ivanova NN, Chen IMA, Chu K, Kyrpides NC. IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics (2009) 25:2271–2278.

  17. Meyer F, Paarmann D, D'Souza M, Olson R, Glass EM, Kubal M, Paczian T, Rodriguez A, Stevens R, Wilke A, et al. The metagenomics RAST server – a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics (2008) 9:386.

  18. Seshadri R, Kravitz SA, Smarr L, Gilna P, Frazier M. CAMERA: a community resource for metagenomics. PLoS Biol. (2007) 5:e75.

