Nucleic Acids Research, 2001, Vol. 29, No. 1 126-127
© 2001 Oxford University Press
Genomes OnLine Database (GOLD): a monitor of genome projects world-wide
Integrated Genomics, Chicago Technology Park, 2201 West Campbell Park Drive, Chicago, IL 60612, USA
Received September 5, 2000; Revised and Accepted November 8, 2000.
| ABSTRACT |
|---|
GOLD is a comprehensive resource for accessing information related to completed and ongoing genome projects world-wide. The database currently provides information on 350 genome projects, of which 48 have been completely sequenced and their analysis published. GOLD was created in 1997 and since April 2000 it has been licensed to Integrated Genomics. The database is freely available through the URL: http://igweb.integratedgenomics.com/GOLD/.
| INTRODUCTION |
|---|
The year 1995 marked the birth of a new era in biology. During that year, the complete DNA sequence of an autonomous living organism, Haemophilus influenzae, became available (1), although the entire genetic material of a phage (bacteriophage φX174) had been completely sequenced eighteen years earlier (2). Since 1995, >30 genomes have been completely sequenced and published (3), including three eukaryotic, six archaeal and 26 bacterial. Each of these sequencing projects has been accompanied by a variety of analyses and studies performed by institutes and universities around the world. In many cases, the results are made freely available to the public through Internet-based databases, such as WIT (http://wit.integratedgenomics.com/IGwit/) (4), KEGG (http://www.genome.ad.jp/kegg/kegg2.html) (5), GeneQuiz (http://www.sander.ebi.ac.uk/gqsrv/submit/) (6) and MIPS (http://www.mips.biochem.mpg.de/) (7).
| SCOPE OF THE DATABASE |
|---|
Researchers often lack easy access to adequate information, or do not readily learn of new developments and newly available data in their field. Genomics is arguably one of the most rapidly developing areas in biology today, with actual data from sequencing projects increasing exponentially over the past five years. Coupled to this development is the growing number of databases that explore and analyze these data. The drop in sequencing cost and the improvement in sequencing technology have made possible a very large number of sequencing projects around the world (8). This proliferation of sequencing projects has created a need for a maintained resource that monitors and displays the information pertinent to each genome project.
| HISTORY AND GROWTH |
|---|
GOLD was established in 1997 as a web resource that would collect and provide information for all publicly available genome projects. At its inception, GOLD held information on six complete genomes and a handful of ongoing genome projects. Today, GOLD provides information on 197 genomic sequencing centers and 74 funding agencies covering 350 genome projects. Of those genomes, 48 have been completely sequenced and their analysis published. GOLD also lists 176 prokaryotic and 126 eukaryotic genome projects as in progress. Furthermore, the database provides over 3200 hypertext links.
| DATABASE ACCESS |
|---|
GOLD is available through Integrated Genomics' web server at the entry point: http://igweb.integratedgenomics.com/GOLD/. Users have direct access to the database's pre-computed tables, which include the complete published genomes and the ongoing prokaryotic and eukaryotic genome projects. They can also use the search form to query specific features or information about a genome project. Corrections, suggestions and feedback are most welcome at gold{at}integratedgenomics.com.
| DATABASE FORMAT |
|---|
GOLD is currently built as a set of ASCII (text) files. The information for every reported genome is organized into specific fields to record the various data types: the organism name (Organism), the phylogenetic position of an organism (Tree), general information on an organism (Information) and the size of the genome and the number of the predicted ORFs (Size). In addition, there are hyperlinks to the actual data such as the DNA sequence, the lists of ORFs and functions (DATA), the organizations that completed (or are completing) the sequencing of an organism (Institution) and the agencies that provided the funds for the sequencing project (Funding). There is also a collection of links to different databases that provide online analysis for a particular genome (Genome Database). Finally, the current status of ongoing projects (Status) and the references for completed and published genomes (Publication) are also reported.
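The field layout described above can be pictured as one record per genome project. The following is a minimal sketch only: the field names are the labels given in the text, but the tab-delimited layout, the `GenomeRecord` class, the `parse_record` helper and all sample values are assumptions made for illustration, since the actual file format is not specified here.

```python
from dataclasses import dataclass

# Field labels as named in the text. The tab-delimited layout is an
# assumption; the real GOLD text files may be organized differently.
FIELDS = [
    "Organism", "Tree", "Information", "Size", "DATA",
    "Institution", "Funding", "Genome Database", "Status", "Publication",
]


@dataclass
class GenomeRecord:
    """One genome project, with one attribute per field (hypothetical class)."""
    organism: str
    tree: str
    information: str
    size: str
    data: str
    institution: str
    funding: str
    genome_database: str
    status: str
    publication: str


def parse_record(line: str) -> GenomeRecord:
    """Split one tab-delimited line into the ten fields, in FIELDS order."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return GenomeRecord(*values)


# Placeholder values only; none of these describe a real genome project.
sample = "\t".join([
    "Example organism", "Bacteria", "placeholder note", "placeholder size",
    "http://example.org/data", "Example Institute", "Example Agency",
    "http://example.org/db", "Complete", "placeholder reference",
])
record = parse_record(sample)
print(record.organism, "|", record.status)
```

Keeping every field as a plain string mirrors a text-file store; a stricter model might split Size into a genome size and an ORF count.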
| DATABASE SEARCH ENGINE |
|---|
As of March 2000, GOLD has taken on a relational model that allows for indexing of the fields described above. An engine that generates a Text-Based Relational Database (the TBRD engine) was created using Perl. The TBRD engine is capable of resolving any Structured Query Language (SQL) statement and displaying the index in the GOLD relational database. Before the TBRD generation, a program was written to extract as much data as possible from the former GOLD database. Manual curation then followed to fill in omissions and fix inconsistencies. This process eliminated duplications of data and standardized the storage and display formats of the tables and their contents. The present GOLD interface, built with CGI scripts, allows a user to input queries and receive results. Once the input is received, the CGI scripts build an SQL statement and send it to the TBRD engine. The CGI scripts then wrap the results from the TBRD engine in HTML format and return them to the user.
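The query flow described above (form input, SQL statement, text-based engine, HTML output) can be illustrated with a toy example. This is not the TBRD engine: it is written in Python rather than Perl, it handles only a single `SELECT ... FROM ... WHERE column = 'value'` form, and the table contents and the names `build_sql`, `run_query` and `to_html` are all hypothetical.

```python
import html
import re
from typing import Dict, List

# Hypothetical text table: the first line holds column names, fields are
# tab-separated. All rows are placeholders, not GOLD data.
TABLE_TEXT = (
    "Organism\tStatus\tInstitution\n"
    "Organism A\tComplete\tInstitute X\n"
    "Organism B\tIn progress\tInstitute Y\n"
    "Organism C\tComplete\tInstitute Y\n"
)


def load_table(text: str) -> List[Dict[str, str]]:
    """Read a tab-delimited text table into a list of row dictionaries."""
    lines = text.strip().split("\n")
    header = lines[0].split("\t")
    return [dict(zip(header, row.split("\t"))) for row in lines[1:]]


TABLES = {"genomes": load_table(TABLE_TEXT)}


def build_sql(column: str, value: str) -> str:
    """Turn form input into a SELECT statement (the role of the CGI layer)."""
    if column not in TABLES["genomes"][0]:
        raise ValueError(f"unknown column: {column}")
    escaped = value.replace("'", "''")  # standard SQL quote doubling
    return f"SELECT Organism, Institution FROM genomes WHERE {column} = '{escaped}'"


# Only this one statement shape is recognized by the toy engine.
PATTERN = re.compile(
    r"^SELECT\s+(?P<cols>.+?)\s+FROM\s+(?P<table>\w+)"
    r"\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>(?:[^']|'')*)'$",
    re.IGNORECASE,
)


def run_query(sql: str) -> List[Dict[str, str]]:
    """Resolve the statement against the in-memory text table."""
    match = PATTERN.match(sql.strip())
    if match is None:
        raise ValueError("unsupported statement")
    cols = [c.strip() for c in match.group("cols").split(",")]
    value = match.group("val").replace("''", "'")
    rows = TABLES[match.group("table")]
    return [{c: r[c] for c in cols} for r in rows if r[match.group("col")] == value]


def to_html(rows: List[Dict[str, str]]) -> str:
    """Wrap query results in a minimal HTML table."""
    if not rows:
        return "<p>No matches.</p>"
    head = "".join(f"<th>{html.escape(c)}</th>" for c in rows[0])
    body = "".join(
        "<tr>" + "".join(f"<td>{html.escape(v)}</td>" for v in r.values()) + "</tr>"
        for r in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"


rows = run_query(build_sql("Status", "Complete"))
page = to_html(rows)
print(page)
```

Separating statement construction from statement resolution, as the text describes, means the storage layer could be swapped without touching the web-facing scripts.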
| ACKNOWLEDGEMENTS |
|---|
We are grateful to our colleagues who have contributed information for a more accurate monitoring of genome projects. GOLD has been supported by Integrated Genomics.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +1 312 226 9435; Fax: +1 312 226 9446; Email: nikos{at}integratedgenomics.com
| REFERENCES |
|---|
1 Fleischmann,R.D., Adams,M.D., White,O., Clayton,R.A., Kirkness,E.F., Kerlavage,A.R., Bult,C.J., Tomb,J.F., Dougherty,B.A., Merrick,J.M. et al. (1995) Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science, 269, 496–512.
2 Sanger,F., Air,G.M., Barrell,B.G., Brown,N.L., Coulson,A.R., Fiddes,C.A., Hutchison,C.A., Slocombe,P.M. and Smith,M. (1977) Nucleotide sequence of bacteriophage phi X174 DNA. Nature, 265, 687–695.
3 Kyrpides,N.C. (1999) Genomes OnLine Database (GOLD 1.0): a monitor of complete and ongoing genome projects world-wide. Bioinformatics, 15, 773–774.
4 Overbeek,R., Larsen,N., Pusch,G.D., D'Souza,M., Selkov,E.,Jr, Kyrpides,N., Fonstein,M., Maltsev,N. and Selkov,E. (2000) WIT: integrated system for high-throughput genome sequence analysis and metabolic reconstruction. Nucleic Acids Res., 28, 123–125.
5 Kanehisa,M. and Goto,S. (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res., 28, 27–30.
6 Hoersch,S., Leroy,C., Brown,N.P., Andrade,M.A. and Sander,C. (2000) The GeneQuiz web server: protein functional analysis through the Web. Trends Biochem. Sci., 25, 33–35.
7 Mewes,H.W., Frishman,D., Gruber,C., Geier,B., Haase,D., Kaps,A., Lemcke,K., Mannhaupt,G., Pfeiffer,F., Schuller,C., Stocker,S. and Well,B. (2000) MIPS: a database for genomes and protein sequences. Nucleic Acids Res., 28, 37–40.
8 Overbeek,R. (2000) Genomics: what is realistically achievable. Genome Biol., 1, 2002.1–2002.3.