Nucleic Acids Research Advance Access originally published online on October 30, 2008
Nucleic Acids Research 2009 37(Database issue):D455-D458; doi:10.1093/nar/gkn858
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
This article appears in the following Nucleic Acids Research issue: Database issue.
DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes
1Department of Epidemiology and Biostatistics, Tianjin Cancer Institute and Hospital, Tianjin 300060 and 2Department of Physics, Tianjin University, Tianjin 300072, China
*To whom correspondence should be addressed. Tel: +86 22 2740 2987; Fax: +86 22 2335 8329; Email: rzhang.cn{at}gmail.com
Received September 15, 2008. Revised October 14, 2008. Accepted October 16, 2008.
ABSTRACT
Essential genes are those indispensable for the survival of an organism, and their functions are therefore considered a foundation of life. Determining the minimal gene set needed to sustain a life form is a fundamental question in biology and plays a key role in the emerging field of synthetic biology. Five years after we constructed DEG, a database of essential genes, DEG 5.0 represents a significant advance over the 2004 version in both the number of essential genes and the number of organisms in which these genes have been determined. The number of prokaryotic essential genes in DEG has increased about 10-fold, mainly owing to genome-wide gene essentiality screens performed in a wide range of bacteria. The number of eukaryotic essential genes has increased more than 5-fold, because DEG 1.0 contained only yeast essential genes, whereas DEG 5.0 also includes those of humans, mice, worms, fruit flies, zebrafish and the plant Arabidopsis thaliana. These updates reflect not only significant advances in DEG but also the rapid progress of the essential-gene field. DEG is freely available at http://tubic.tju.edu.cn/deg or http://www.essentialgene.org.
INTRODUCTION
Essential genes are those indispensable for the survival of an organism under certain conditions, and the functions they encode are therefore considered a foundation of life. The essential genes of an organism constitute its minimal gene set, the smallest possible group of genes sufficient to sustain a functioning cellular life form under the most favorable conditions (1–3). Determining the minimal gene set of an organism addresses a conceptually important question, namely which basic functions are needed to sustain a life form; the minimal-gene-set concept therefore plays a key role in the emerging field of synthetic biology (4). Essential genes are of practical interest as well. For instance, because their disruption is lethal, essential genes are attractive antibiotic targets (5): essential genes conserved across species are candidates for broad-spectrum drug targets, whereas those specific to one bacterium are candidates for species-specific drug targets.
In 2004, we constructed DEG 1.0, a database of essential genes (6). In the past five years, fueled by the accumulation of sequenced genomes, sophisticated genome-wide mutagenesis techniques (7) and the burgeoning field of synthetic biology (8–10), significant advances have been made in determining essential genes in a wide range of organisms. This paper describes an update, DEG 5.0, a database of essential genes in both prokaryotes and eukaryotes.
SUMMARY OF DATABASE UPDATES
In parallel with the rapid progress of the essential-gene field, DEG 5.0 improves on DEG 1.0 in the following ways:
- The number of prokaryotic essential genes has increased about 10-fold, from 543 to 5260 (Table 1). This reflects three changes. (i) In DEG 1.0, some essential genes, e.g. those of Escherichia coli, were collected from literature searches; in DEG 5.0 these records have been replaced by genes identified in genome-wide studies using the genetic footprinting technique (11) and systematic gene knockout experiments (12). (ii) In DEG 1.0, some essential genes, e.g. those of Haemophilus influenzae, were predicted theoretically by comparative genomics (13); in DEG 5.0 these records have been replaced by genes identified in a genome-wide study using global transposon mutagenesis (14). (iii) In 2004, only two genome-wide studies identifying bacterial essential genes had been completed; now 12 have been completed.
- The number of eukaryotic essential genes has increased more than 5-fold, from 878 to 4808, because DEG 1.0 contained only yeast essential genes, whereas DEG 5.0 also includes those of humans, mice, worms, fruit flies, zebrafish and the plant Arabidopsis thaliana.
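As a quick check on the fold changes quoted above, dividing each DEG 5.0 count by the corresponding DEG 1.0 count gives:

```latex
\frac{5260}{543} \approx 9.7 \ (\text{prokaryotes}), \qquad
\frac{4808}{878} \approx 5.5 \ (\text{eukaryotes})
```

These ratios are consistent with the "about 10-fold" and "more than 5-fold" increases stated in the list.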
DATABASE DESCRIPTION
Essential genes in prokaryotes
Determination of a minimal gene set for cellular life was made possible by the availability of the first two completely sequenced genomes, from the bacteria Mycoplasma genitalium (15) and H. influenzae (16). Koonin and coworkers pioneered an attempt to determine a minimal gene set by comparing these two genomes, which belong to two ancient bacterial lineages, on the premise that genes conserved between them are likely essential for cellular functions (13).
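The comparative premise described above can be sketched as a set intersection. This is a minimal illustration, assuming each gene has already been assigned to an orthologous-group label; the function name `conserved_core`, the gene identifiers and the group labels are all hypothetical and do not come from the original analysis.

```python
def conserved_core(groups_a: dict[str, str], groups_b: dict[str, str]) -> set[str]:
    """Return the orthologous-group labels present in both genomes.

    Each argument maps a gene identifier to an orthologous-group label.
    Genes sharing a label are treated as conserved between the genomes,
    the premise of the comparative approach described in the text.
    """
    return set(groups_a.values()) & set(groups_b.values())


# Hypothetical toy assignments; gene IDs and group labels are invented.
mg_groups = {"mg_gene1": "OG1", "mg_gene2": "OG2", "mg_gene3": "OG3"}
hi_groups = {"hi_gene1": "OG2", "hi_gene2": "OG3", "hi_gene3": "OG4"}

core = conserved_core(mg_groups, hi_groups)
print(sorted(core))  # ['OG2', 'OG3']
```

A plain intersection is only a first approximation: it misses cases in which the same essential function is carried out by unrelated genes in the two genomes (non-orthologous gene displacement), which a real minimal-gene-set analysis has to account for.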
In 1999, Venter's group performed the first global transposon mutagenesis, in M. genitalium, to address experimentally the question of what constitutes the minimal gene set of a living organism (17); about 300 genes were estimated to be essential, and these were included in DEG 1.0. However, global transposon mutagenesis in fact identifies non-essential genes: genes disrupted by transposons are identified, and those not disrupted are considered essential. To obtain direct proof of gene dispensability in M. genitalium, Venter's group therefore isolated and characterized every Tn4001 insertion mutant present in individual colonies picked from agar plates (18). As a result, 382 genes were shown to be essential, and in DEG 5.0 these genes replace those in version 1.0. A high-density transposon mutagenesis strategy was also applied to H. influenzae (14), and the essential genes so obtained replaced the corresponding DEG 1.0 records, which had been determined by comparative genomics (13).
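The inference behind global transposon mutagenesis can be sketched as follows: a gene that carries at least one insertion tolerated disruption, and a gene with none is only inferred to be essential. This is a minimal sketch under simplifying assumptions; the `Gene` type, `classify_by_insertions`, and all coordinates are hypothetical, and real screens may apply further filters (for example, discounting insertions very close to a gene's 3′ end, which may leave a functional product).

```python
from bisect import bisect_left
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # first base of the gene (inclusive)
    end: int    # last base of the gene (inclusive)


def classify_by_insertions(genes, insertion_sites):
    """Split genes into (disrupted, not_disrupted) sets.

    A gene containing at least one insertion site tolerated disruption and is
    treated as non-essential. A gene with no insertion is only a candidate
    essential gene: the absence of a hit may also reflect an unsaturated screen.
    """
    sites = sorted(insertion_sites)
    disrupted, not_disrupted = set(), set()
    for gene in genes:
        # Index of the first insertion at or after the gene start.
        i = bisect_left(sites, gene.start)
        if i < len(sites) and sites[i] <= gene.end:
            disrupted.add(gene.name)
        else:
            not_disrupted.add(gene.name)
    return disrupted, not_disrupted


# Toy genome with invented coordinates and insertion positions.
genes = [Gene("geneA", 1, 900), Gene("geneB", 1001, 2500), Gene("geneC", 2601, 3000)]
hits = [450, 2700, 2750]
disrupted, candidates = classify_by_insertions(genes, hits)
print(sorted(disrupted), sorted(candidates))  # ['geneA', 'geneC'] ['geneB']
```

This makes the limitation noted in the text explicit: the `not_disrupted` set is defined only by the absence of evidence, which is why characterizing individual insertion mutants gives stronger proof of dispensability.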
In DEG 1.0, essential genes of E. coli were collected from http://magpie.genome.wisc.edu/~chris/essential.html, which compiled essential genes by searching the literature. Using a genetic footprinting technique, Gerdes et al. (11) conducted a comprehensive, genome-wide experimental assessment of the E. coli genes necessary for robust aerobic growth and identified 620 essential genes. In addition, the Keio collection contains 303 essential genes determined by systematic single-gene knockout experiments (12). In DEG 5.0, therefore, the essential-gene records obtained by literature search were replaced by those obtained from both the genome-wide mutagenesis study (11) and the systematic single-gene knockout experiments (12), with only one copy retained for each of the 205 genes shared by the two studies.
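Retaining a single copy of each shared gene amounts to taking the union of the two gene sets. Writing G_fp for the genetic-footprinting set (11) and G_Keio for the Keio set (12) (labels introduced here only for illustration), and assuming genes are matched one to one between the two studies, inclusion-exclusion gives the number of distinct E. coli records implied by the figures above:

```latex
|G_{\mathrm{fp}} \cup G_{\mathrm{Keio}}|
  = |G_{\mathrm{fp}}| + |G_{\mathrm{Keio}}| - |G_{\mathrm{fp}} \cap G_{\mathrm{Keio}}|
  = 620 + 303 - 205 = 718
```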
In Streptococcus pneumoniae, 113 essential genes were identified using a high-throughput gene disruption system (19), and later 133 essential genes were identified by allelic replacement mutagenesis (20). In DEG 5.0, the two results were combined and redundant records removed, yielding 244 essential genes in S. pneumoniae. DEG 1.0 contained 65 Staphylococcus aureus essential genes determined using an antisense RNA technique (21); by adding results from a study using a rapid shotgun antisense RNA method (22), DEG 5.0 now contains 302 S. aureus essential genes.
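Combining results from several studies while removing redundant records, as described for S. pneumoniae and S. aureus, can be sketched as a keyed merge that keeps one record per gene and notes which studies reported it. This is an illustrative sketch, not DEG's actual procedure: the function `merge_studies`, the study labels, the identifiers and the whitespace/case normalization are all assumptions. In practice the studies must first be mapped onto a shared gene identifier.

```python
def merge_studies(*studies):
    """Combine essential-gene lists from several studies, one record per gene.

    Each study is a (label, gene_ids) pair. IDs are normalised by stripping
    whitespace and upper-casing (an illustrative choice). Returns a dict
    mapping each distinct gene ID to the labels of the studies reporting it,
    in first-seen order (dicts preserve insertion order in Python 3.7+).
    """
    merged = {}
    for label, gene_ids in studies:
        for gene_id in gene_ids:
            key = gene_id.strip().upper()
            sources = merged.setdefault(key, [])
            if label not in sources:
                sources.append(label)
    return merged


# Hypothetical identifiers standing in for two overlapping screens.
merged = merge_studies(
    ("study_A", ["locus_001", "locus_002", "locus_003"]),
    ("study_B", ["LOCUS_003 ", "locus_004"]),
)
print(len(merged))          # 4
print(merged["LOCUS_003"])  # ['study_A', 'study_B']
```

Keeping the list of source studies, rather than just a set of gene IDs, preserves the provenance of each merged record.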
In the past several years, many genome-wide mutagenesis studies have been performed in a wide range of bacteria. In addition to those mentioned above, DEG 5.0 contains essential genes determined by large-scale single-gene deletion studies in Acinetobacter baylyi (23) and Bacillus subtilis (24), those determined by global transposon mutagenesis in Francisella novicida (25), Helicobacter pylori (26), Mycobacterium tuberculosis (27), Mycoplasma pulmonis (28) and Pseudomonas aeruginosa (29,30), and those determined by trapping lethal insertions in Salmonella typhimurium (31).
Essential genes in eukaryotes
Another major improvement in DEG 5.0 is the inclusion of essential genes from many eukaryotes, including animals and the plant A. thaliana, whereas the only eukaryotic species in DEG 1.0 was Saccharomyces cerevisiae (32). The goal of determining a minimal gene set extends from bacteria to eukaryotes, i.e. defining the minimal gene set needed to produce a living multicellular organism or a viable plant. Although this goal is clearly too ambitious at present, much effort has already been devoted to identifying essential genes in eukaryotes.
In Drosophila, the Berkeley Drosophila Genome Project disrupted about 25% of vital (essential) genes with single P-element insertions (33), and the genes whose disruption had lethal phenotypes were collected in DEG 5.0. In Caenorhabditis elegans, Kamath et al. (34) used RNA interference to inhibit the activity of about 86% of all genes and characterized the resulting phenotypes; genes whose inhibition was lethal were included in DEG 5.0. Hopkins and coworkers conducted a large-scale insertional mutagenesis screen in zebrafish to identify genes essential for embryonic and early larval development (35), and the identified essential genes were collected in DEG 5.0. The first large-scale identification of essential genes in a flowering plant was performed by Meinke and coworkers in A. thaliana by characterizing a large number of T-DNA insertion lines (36), and these essential genes were also collected in DEG 5.0.
Large-scale gene inactivation studies have not been performed in mice, probably because of technical difficulties and the labor involved. However, because the mouse is probably the most important model organism, a large number of genes have already been inactivated by individual laboratories. In a study comparing the essentiality of duplicate and singleton genes, Liao and Zhang (37) analyzed nearly 3900 individually inactivated mouse genes and found that about 55% were essential among both singletons and duplicates. The essential genes analyzed in this study were collected in DEG 5.0. In another study comparing human and mouse essential genes, Liao and Zhang (38) reviewed the literature extensively to find genes whose null mutations are lethal in humans, and these human essential genes were also collected in DEG 5.0.
User interface and data access
The database is divided into two subdatabases, one for prokaryotic and one for eukaryotic essential genes. Each entry has a unique DEG identification number, gene name, gene reference number, gene function, and DNA and protein sequences; entries for prokaryotic essential genes also link to COG information (39). All information is stored in and managed by the open-source database management system MySQL, which allows rapid data retrieval. Users can access the data in several ways: they can browse the essential-gene records, search for essential genes by name, function, accession number or organism, and perform BLAST searches against DEG with query DNA or protein sequences. Because the database consists of two subdatabases, Browse, Search and BLAST must be run within each subdatabase separately. The whole database can also be downloaded on request.
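The record structure and search functions described above can be sketched with a single table. This is a hedged illustration only: it uses Python's built-in `sqlite3` module instead of MySQL so that it is self-contained, and the table name, column names, `subdatabase` flag, `search` helper and demo records are all hypothetical rather than DEG's real schema. BLAST searching is not covered.

```python
import sqlite3

# Illustrative layout only; column names are not taken from DEG itself.
SCHEMA = """
CREATE TABLE essential_gene (
    deg_id        TEXT PRIMARY KEY,
    subdatabase   TEXT NOT NULL CHECK (subdatabase IN ('prokaryote', 'eukaryote')),
    organism      TEXT NOT NULL,
    gene_name     TEXT,
    gene_ref      TEXT,
    gene_function TEXT,
    dna_seq       TEXT,
    protein_seq   TEXT,
    cog_id        TEXT  -- only meaningful for prokaryotic entries
)
"""

# Column names cannot be bound as SQL parameters, so searchable fields are whitelisted.
SEARCH_FIELDS = {"gene_name", "gene_function", "gene_ref", "organism"}


def search(conn, subdatabase, field, term):
    """Case-insensitive (for ASCII) substring search within one subdatabase."""
    if field not in SEARCH_FIELDS:
        raise ValueError(f"unsupported search field: {field}")
    sql = (
        f"SELECT deg_id, organism, gene_name FROM essential_gene "
        f"WHERE subdatabase = ? AND {field} LIKE ? ORDER BY deg_id"
    )
    return conn.execute(sql, (subdatabase, f"%{term}%")).fetchall()


def build_demo():
    """In-memory database populated with invented records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO essential_gene (deg_id, subdatabase, organism, gene_name, gene_function) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            ("DEMO_P1", "prokaryote", "Escherichia coli", "geneA", "replication initiation"),
            ("DEMO_P2", "prokaryote", "Bacillus subtilis", "geneB", "cell division"),
            ("DEMO_E1", "eukaryote", "Saccharomyces cerevisiae", "geneC", "replication initiation"),
        ],
    )
    return conn


conn = build_demo()
print(search(conn, "prokaryote", "gene_function", "REPLICATION"))
# [('DEMO_P1', 'Escherichia coli', 'geneA')]
```

Here a `subdatabase` column stands in for the prokaryotic/eukaryotic split, so every query is scoped to one subdatabase, mirroring the requirement that Browse, Search and BLAST be run separately in each; a deployment could equally use two separate tables.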
CONCLUSION AND FUTURE DEVELOPMENT
DEG 5.0 represents a significant advance over DEG 1.0 in both the number of essential genes and the number of organisms in which these genes have been determined. These updates reflect not only the development of DEG since its 2004 version but also the rapid progress of the essential-gene field. In prokaryotes, as more complete genomes become available and synthetic biology develops, the number of known essential genes is expected to grow at an accelerating rate; in eukaryotic model organisms, because most gene essentiality screens are far from saturated, the number of essential genes is also expected to grow. Future DEG updates will reflect these advances promptly. We welcome users' comments, corrections and new information, which will be used for updating. DEG is freely available at http://tubic.tju.edu.cn/deg or http://www.essentialgene.org.
FUNDING
The present work was supported in part by the National Natural Science Foundation of China (NNSF 90408028). Funding for open access charge: NNSF 90408028.
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
We would like to thank the anonymous referees for their constructive comments.
REFERENCES
1. Koonin EV. How many genes can make a cell: the minimal-gene-set concept. Annu. Rev. Genomics Hum. Genet. (2000) 1:99–116.
2. Koonin EV. Comparative genomics, minimal gene-sets and the last universal common ancestor. Nat. Rev. Microbiol. (2003) 1:127–136.
3. Gerdes S, Edwards R, Kubal M, Fonstein M, Stevens R, Osterman A. Essential genes on metabolic maps. Curr. Opin. Biotechnol. (2006) 17:448–456.
4. Lartigue C, Glass JI, Alperovich N, Pieper R, Parmar PP, Hutchison CA 3rd, Smith HO, Venter JC. Genome transplantation in bacteria: changing one species to another. Science (2007) 317:632–638.
5. Galperin MY, Koonin EV. Searching for drug targets in microbial genomes. Curr. Opin. Biotechnol. (1999) 10:571–578.
6. Zhang R, Ou HY, Zhang CT. DEG: a database of essential genes. Nucleic Acids Res. (2004) 32:D271–D272.
7. Judson N, Mekalanos JJ. Transposon-based approaches to identify essential bacterial genes. Trends Microbiol. (2000) 8:521–526.
8. Benner SA, Sismour AM. Synthetic biology. Nat. Rev. Genet. (2005) 6:533–543.
9. Andrianantoandro E, Basu S, Karig DK, Weiss R. Synthetic biology: new engineering rules for an emerging discipline. Mol. Syst. Biol. (2006) 2:2006.0028.
10. Galperin MY. The dawn of synthetic genomics. Environ. Microbiol. (2008) 10:821–825.
11. Gerdes SY, Scholle MD, Campbell JW, Balazsi G, Ravasz E, Daugherty MD, Somera AL, Kyrpides NC, Anderson I, Gelfand MS, et al. Experimental determination and system level analysis of essential genes in Escherichia coli MG1655. J. Bacteriol. (2003) 185:5673–5684.
12. Baba T, Ara T, Hasegawa M, Takai Y, Okumura Y, Baba M, Datsenko KA, Tomita M, Wanner BL, Mori H. Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol. (2006) 2:2006.0008.
13. Mushegian AR, Koonin EV. A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc. Natl Acad. Sci. USA (1996) 93:10268–10273.
14. Akerley BJ, Rubin EJ, Novick VL, Amaya K, Judson N, Mekalanos JJ. A genome-scale analysis for identification of genes required for growth or survival of Haemophilus influenzae. Proc. Natl Acad. Sci. USA (2002) 99:966–971.
15. Fraser CM, Gocayne JD, White O, Adams MD, Clayton RA, Fleischmann RD, Bult CJ, Kerlavage AR, Sutton G, Kelley JM, et al. The minimal gene complement of Mycoplasma genitalium. Science (1995) 270:397–403.
16. Fleischmann RD, Adams MD, White O, Clayton RA, Kirkness EF, Kerlavage AR, Bult CJ, Tomb JF, Dougherty BA, Merrick JM, et al. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd. Science (1995) 269:496–512.
17. Hutchison CA, Peterson SN, Gill SR, Cline RT, White O, Fraser CM, Smith HO, Venter JC. Global transposon mutagenesis and a minimal Mycoplasma genome. Science (1999) 286:2165–2169.
18. Glass JI, Assad-Garcia N, Alperovich N, Yooseph S, Lewis MR, Maruf M, Hutchison CA 3rd, Smith HO, Venter JC. Essential genes of a minimal bacterium. Proc. Natl Acad. Sci. USA (2006) 103:425–430.
19. Thanassi JA, Hartman-Neumann SL, Dougherty TJ, Dougherty BA, Pucci MJ. Identification of 113 conserved essential genes using a high-throughput gene disruption system in Streptococcus pneumoniae. Nucleic Acids Res. (2002) 30:3152–3162.
20. Song JH, Ko KS, Lee JY, Baek JY, Oh WS, Yoon HS, Jeong JY, Chun J. Identification of essential genes in Streptococcus pneumoniae by allelic replacement mutagenesis. Mol. Cells (2005) 19:365–374.
21. Ji Y, Zhang B, Van Horn SF, Warren P, Woodnutt G, Burnham MK, Rosenberg M. Identification of critical staphylococcal genes using conditional phenotypes generated by antisense RNA. Science (2001) 293:2266–2269.
22. Forsyth RA, Haselbeck RJ, Ohlsen KL, Yamamoto RT, Xu H, Trawick JD, Wall D, Wang L, Brown-Driver V, Froelich JM, et al. A genome-wide strategy for the identification of essential genes in Staphylococcus aureus. Mol. Microbiol. (2002) 43:1387–1400.
23. de Berardinis V, Vallenet D, Castelli V, Besnard M, Pinet A, Cruaud C, Samair S, Lechaplais C, Gyapay G, Richez C, et al. A complete collection of single-gene deletion mutants of Acinetobacter baylyi ADP1. Mol. Syst. Biol. (2008) 4:174.
24. Kobayashi K, Ehrlich SD, Albertini A, Amati G, Andersen KK, Arnaud M, Asai K, Ashikaga S, Aymerich S, Bessieres P, et al. Essential Bacillus subtilis genes. Proc. Natl Acad. Sci. USA (2003) 100:4678–4683.
25. Gallagher LA, Ramage E, Jacobs MA, Kaul R, Brittnacher M, Manoil C. A comprehensive transposon mutant library of Francisella novicida, a bioweapon surrogate. Proc. Natl Acad. Sci. USA (2007) 104:1009–1014.
26. Salama NR, Shepherd B, Falkow S. Global transposon mutagenesis and essential gene analysis of Helicobacter pylori. J. Bacteriol. (2004) 186:7926–7935.
27. Sassetti CM, Boyd DH, Rubin EJ. Genes required for mycobacterial growth defined by high density mutagenesis. Mol. Microbiol. (2003) 48:77–84.
28. French CT, Lao P, Loraine AE, Matthews BT, Yu H, Dybvig K. Large-scale transposon mutagenesis of Mycoplasma pulmonis. Mol. Microbiol. (2008) 69:67–76.
29. Liberati NT, Urbach JM, Miyata S, Lee DG, Drenkard E, Wu G, Villanueva J, Wei T, Ausubel FM. An ordered, nonredundant library of Pseudomonas aeruginosa strain PA14 transposon insertion mutants. Proc. Natl Acad. Sci. USA (2006) 103:2833–2838.
30. Jacobs MA, Alwood A, Thaipisuttikul I, Spencer D, Haugen E, Ernst S, Will O, Kaul R, Raymond C, Levy R, et al. Comprehensive transposon mutant library of Pseudomonas aeruginosa. Proc. Natl Acad. Sci. USA (2003) 100:14339–14344.
31. Knuth K, Niesalla H, Hueck CJ, Fuchs TM. Large-scale identification of essential Salmonella genes by trapping lethal insertions. Mol. Microbiol. (2004) 51:1729–1744.
32. Mewes HW, Frishman D, Guldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Munsterkotter M, Rudd S, Weil B. MIPS: a database for genomes and protein sequences. Nucleic Acids Res. (2002) 30:31–34.
33. Spradling AC, Stern D, Beaton A, Rhem EJ, Laverty T, Mozden N, Misra S, Rubin GM. The Berkeley Drosophila Genome Project gene disruption project: single P-element insertions mutating 25% of vital Drosophila genes. Genetics (1999) 153:135–177.
34. Kamath RS, Fraser AG, Dong Y, Poulin G, Durbin R, Gotta M, Kanapin A, Le Bot N, Moreno S, Sohrmann M, et al. Systematic functional analysis of the Caenorhabditis elegans genome using RNAi. Nature (2003) 421:231–237.
35. Amsterdam A, Nissen RM, Sun Z, Swindell EC, Farrington S, Hopkins N. Identification of 315 genes essential for early zebrafish development. Proc. Natl Acad. Sci. USA (2004) 101:12792–12797.
36. Tzafrir I, Pena-Muralla R, Dickerman A, Berg M, Rogers R, Hutchens S, Sweeney TC, McElver J, Aux G, Patton D, et al. Identification of genes required for embryo development in Arabidopsis. Plant Physiol. (2004) 135:1206–1220.
37. Liao BY, Zhang J. Mouse duplicate genes are as essential as singletons. Trends Genet. (2007) 23:378–381.
38. Liao BY, Zhang J. Null mutations in human and mouse orthologs frequently result in different phenotypes. Proc. Natl Acad. Sci. USA (2008) 105:6987–6992.
39. Tatusov RL, Koonin EV, Lipman DJ. A genomic perspective on protein families. Science (1997) 278:631–637.
40. Judson N, Mekalanos JJ. TnAraOut, a transposon-based approach to identify and characterize essential bacterial genes. Nat. Biotechnol. (2000) 18:740–745.