Nucleic Acids Research, 2003, Vol. 31, No. 1 187-189
© 2003 Oxford University Press
HGT-DB: a database of putative horizontally transferred genes in prokaryotic complete genomes
Evolutionary Genomics Group, Biochemistry and Biotechnology Department, Rovira i Virgili University, Pl Imperial Tàrraco 1, E-43005 Tarragona, Spain
*To whom correspondence should be addressed. Tel: +34 977559565; Fax: +34 977558232; Email: vallve{at}quimica.urv.es
Received July 17, 2002; Revised and Accepted September 12, 2002
ABSTRACT
The Horizontal Gene Transfer DataBase (HGT-DB) is a genomic database that includes statistical parameters such as G+C content, codon and amino-acid usage, as well as information about which genes deviate in these parameters for prokaryotic complete genomes. Under the hypothesis that genes from distantly related species have different nucleotide compositions, these deviated genes may have been acquired by horizontal gene transfer. The current version of the database contains 88 bacterial and archaeal complete genomes, including multiple chromosomes and strains. For each genome, the database provides statistical parameters for all the genes, as well as averages and standard deviations of G+C content, codon usage, relative synonymous codon usage and amino-acid content. It also provides information about correspondence analyses of the codon usage, plus lists of extraneous groups of genes in terms of G+C content and lists of putatively acquired genes. With this information, researchers can explore the G+C content and codon usage of a gene when they find incongruities in sequence-based phylogenetic trees. A search engine that allows searches for gene names or keywords for a specific organism is also available. HGT-DB is freely accessible at http://www.fut.es/~debb/HGT.
INTRODUCTION
Horizontal Gene Transfer (HGT), the transfer of genes between different species, is recognized as one of the major forces in prokaryotic genome evolution (1). Acquired genes may provide novel metabolic capabilities and catalyze the diversification of microbial lineages. HGT events can be detected from patterns of best matches to different species and the distribution of genes, or by identifying regions of the genome with unusual compositions or incongruities between phylogenetic trees (2,3). Each of these methods has its advantages and disadvantages (2). The prediction of horizontally transferred genes using atypical nucleotide composition is based on the genome hypothesis (4) that assumes that codon usage and G+C content are distinct global features of each prokaryotic genome. With this method, a significant number of prokaryotic genes have been proposed as having been acquired by HGT (5,6). However, it cannot predict all acquired genes unambiguously (7) because genes may have adjusted to the base composition and codon usage of the host genome (this is called the amelioration process) or because an unusual composition may be due to factors other than HGT (6). Despite these limitations, atypical G+C content and patterns of codon usage are especially useful for detecting the putative origin of the transferred genes (8–10).
To confirm whether a gene or group of genes has been acquired by HGT, it can be useful to combine multiple lines of evidence (2). If researchers have access to the compositional parameters for each gene from complete genomes, they will be able to explore for themselves the G+C content and codon usage of genes when they find incongruences among sequence-based phylogenetic trees or when they detect putatively transferred genes with other methods. We have, therefore, created the Horizontal Gene Transfer DataBase (HGT-DB) to facilitate compositional analyses and provide additional evidence for discussing the possible foreign origin of the genes of a genome and detecting whether acquired genes have been ameliorated. For each prokaryotic complete genome, the HGT-DB provides averages and standard deviations of G+C content, codon usage, relative synonymous codon usage and amino-acid content, as well as lists of putative horizontally transferred genes, correspondence analyses of the codon usage and lists of extraneous groups of genes in terms of G+C content. For each gene, the database lists several statistical parameters, including total and positional G+C content, and determines whether the gene deviates from the mean values of its own genome. The HGT-DB has so far been used to study strain-specific genes of Helicobacter pylori (11,12) and to exclude putative horizontally transferred genes in genomic or proteomic analyses (13).
SOURCES OF GENOMIC DATA AND METHODS
Sequence files of prokaryotic complete genomes are retrieved from the NCBI ftp server. Total and positional G+C content, codon usage, relative synonymous codon usage and amino-acid content are calculated for each gene. The averages and standard deviations of these parameters are then calculated for each genome, excluding genes under 300 bp, which can have extraneous compositional values. The methods used to decide whether a gene is extraneous in terms of G+C content or codon usage, and therefore a candidate for acquisition by HGT, are described in Garcia-Vallve et al. (6). Briefly, genes are considered as extraneous in terms of G+C content or codon usage if they deviate by more than 1.5 standard deviations from the mean values. Genes are considered to be putative horizontally transferred genes if they have extraneous G+C content and codon usage, they are over 300 bp and they do not deviate from the average amino-acid composition. Clusters of genes with a high or low G+C content are also considered to be acquired genes, regardless of their length or codon usage (6). It is important to distinguish highly expressed genes from horizontally transferred genes (6). Highly expressed genes may deviate from the mean values of codon usage because they adapt their codon usage to the more abundant tRNAs. For this reason, ribosomal proteins, a group of highly expressed genes, are filtered and not included in the database predictions. Other groups of highly expressed genes will be included in future versions of the database, but individual analyses to define the group of highly expressed genes for each genome, if there are any, will probably be needed.
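The G+C part of this procedure can be illustrated with a minimal Python sketch. It is not the database's own code: the function names, the input format (a dictionary of gene name to coding sequence) and the restriction to total G+C content are assumptions made for illustration, and the complete criteria are those of reference (6).

```python
from statistics import mean, stdev

MIN_LENGTH = 300      # genes shorter than this are left out of the genome statistics
THRESHOLD_SD = 1.5    # deviation cut-off, in standard deviations

def gc_fraction(seq):
    """Fraction of G or C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0

def positional_gc(seq):
    """G+C fraction at the first, second and third codon positions."""
    return tuple(gc_fraction(seq[offset::3]) for offset in range(3))

def flag_extraneous_gc(genes):
    """Map gene name -> True when total G+C deviates by more than 1.5 SD.

    `genes` (hypothetical input format) maps a gene name to its coding
    sequence. The mean and SD are computed only from genes of at least
    MIN_LENGTH bp, so at least two such genes are required.
    """
    reference = [gc_fraction(s) for s in genes.values() if len(s) >= MIN_LENGTH]
    mu, sd = mean(reference), stdev(reference)
    return {name: abs(gc_fraction(s) - mu) > THRESHOLD_SD * sd
            for name, s in genes.items()}
```

In this sketch, short genes are still tested against the genome statistics even though they do not contribute to them; whether the database treats them the same way is not stated here.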
Genes proposed as being acquired horizontally are represented in a correspondence analysis in which protein-coding sequences are considered as points in a 59-dimensional space (the stop codons and codons for methionine and tryptophan are not included), and each dimension corresponds to the relative frequency of use of each codon measured with the relative synonymous codon usage (RSCU) values. Correspondence analysis reduces this multidimensional space to a two- or three-dimensional space that can be represented graphically. In these graphs, vertically descended genes are expected to cluster together around the origin, whereas genes predicted as acquisitions are expected to be on the periphery.
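As an illustration of the input to such an analysis, the following Python sketch computes the 59 RSCU values of one coding sequence with the standard genetic code. The names are hypothetical, and assigning 0.0 to the codons of an amino acid that does not occur in the gene is a choice made here, not one taken from the source. The correspondence analysis itself is not implemented.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous families, leaving out stop codons (*), Met (M) and Trp (W):
# 64 - 3 - 1 - 1 = 59 codons remain.
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES.setdefault(aa, []).append(codon)

def rscu(seq):
    """Return a dict codon -> RSCU for the 59 informative codons.

    RSCU = observed count / (family total / family size). Codons of an
    amino acid absent from the sequence get 0.0 (an assumption).
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3          # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            values[c] = counts[c] * len(codons) / total if total else 0.0
    return values
```

Each gene then becomes one point whose 59 coordinates are these values, which is the kind of table a correspondence analysis reduces to two or three axes.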
ORGANIZATION OF THE DATABASE
The HGT-DB is organized by genome, i.e. every prokaryotic genome that has been completely sequenced forms a new entry. Different chromosomes from the same organism, or genomes from the same species but different strains, are found in different entries. The current version of the database contains 88 genomes that are sorted alphabetically and classified taxonomically. Table 1 shows the archaeal and bacterial genomes included in the current version of the database, as well as the number of extraneous genes in terms of G+C content and codon usage. The main page for each genome contains links to additional sections and the mean values and standard deviations of total and positional G+C content, codon usage, relative synonymous codon usage and amino-acid content. The other sections available for each genome are: a correspondence analysis of the codon usage, a list of extraneous regions in terms of G+C content and a list of the proposed horizontally acquired genes. The database also provides access to a tab-delimited file with all the statistical calculations for each gene of a genome. The fields available for each gene in these files are: information about its position (coordinates, strand and length), gene name, function, the Cluster of Orthologous Groups (COG) (14) it belongs to, total and positional G+C content, the Mahalanobis distance to the average codon usage (6), amino-acid content deviations, if any, and a prediction of whether the gene belongs to a region with a high or low G+C content or whether it has been acquired by HGT. This information can also be accessed via a search engine that allows searches for gene names or keywords for a specific organism. When searching for a gene name, one can also view the upstream and downstream genes.
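For reference, the Mahalanobis distance listed among these fields has the following general form. The symbols are assumptions introduced here: x stands for the codon usage vector of a gene, and μ and S for the mean vector and covariance matrix over the genes of the genome. The exact vector and gene set used by the database are those described in reference (6).

```latex
D(\mathbf{x}) = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \, \mathbf{S}^{-1} \, (\mathbf{x} - \boldsymbol{\mu})}
```

Unlike a plain Euclidean distance, this measure rescales each direction by the variability observed across the genome, so a deviation in a codon whose usage is normally very stable counts for more.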
Forces other than HGT are also responsible for the heterogeneity in the codon usage of all the genes of a genome. The HGT-DB, therefore, has a section containing the correspondence analysis of the relative synonymous codon usage for each genome. This section contains a table with the percentage variability of the six axes that account for the greatest variation in codon usage, a graphical representation of the coordinates of each gene in the first and second axes (the genes proposed as being acquired by HGT and putative highly expressed genes are shown in different colors) and a table with the correlation values between the position of genes in the first or second axis, and the G+C content and several indices of codon bias. These indices are: the effective number of codons (Nc) (15), the intrinsic codon deviation index (ICDI) (16), the translational efficiency index (P2) (17) and the scaled χ² index (18).
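As an example of one of these indices, the effective number of codons of reference (15) is usually written as follows. The notation is introduced here for illustration: for an amino acid with k synonymous codons observed n times in the gene, p_i is the relative frequency of codon i, F is the resulting homozygosity, and F̄_k is the average of F over all amino acids with degeneracy k.

```latex
F = \frac{n \sum_{i=1}^{k} p_i^{2} - 1}{n - 1},
\qquad
N_c = 2 + \frac{9}{\bar{F}_2} + \frac{1}{\bar{F}_3} + \frac{5}{\bar{F}_4} + \frac{3}{\bar{F}_6}
```

Nc ranges from 20, when a single codon is used for each amino acid, to 61, when all synonymous codons are used equally, so low values indicate strong codon bias.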
DATABASE ACCESS
HGT-DB is freely accessible at http://www.fut.es/~debb/HGT/. The database will be updated several times each year. Changes and new additions to the database can be viewed in the news and previous release section.
ACKNOWLEDGEMENTS
We thank Kevin Costello of the Language Service of the Rovira i Virgili University for his help with writing the manuscript, and TINET for hosting the database.
REFERENCES
1. Koonin,E.V., Makarova,K.S. and Aravind,L. (2001) Horizontal gene transfer in prokaryotes: quantification and classification. Annu. Rev. Microbiol., 55, 709–742.
2. Eisen,J.A. (2000) Horizontal gene transfer among microbial genomes: new insights from complete genome analysis. Curr. Opin. Genet. Dev., 10, 606–611.
3. Ragan,M.A. (2001) Detection of lateral gene transfer among microbial genomes. Curr. Opin. Genet. Dev., 11, 620–626.
4. Grantham,R., Gautier,C., Gouy,M., Mercier,R. and Pave,A. (1980) Codon catalog usage and the genome hypothesis. Nucleic Acids Res., 8, r49–r62.
5. Ochman,H., Lawrence,J.G. and Groisman,E.A. (2000) Lateral gene transfer and the nature of bacterial innovation. Nature, 405, 299–304.
6. Garcia-Vallve,S., Romeu,A. and Palau,J. (2000) Horizontal gene transfer in bacterial and archaeal complete genomes. Genome Res., 10, 1719–1725.
7. Lawrence,J.G. and Ochman,H. (2002) Reconciling the many faces of lateral gene transfer. Trends Microbiol., 10, 1–4.
8. Garcia-Vallve,S., Palau,J. and Romeu,A. (1999) Horizontal gene transfer in glycosyl hydrolases inferred from codon usage in Escherichia coli and Bacillus subtilis. Mol. Biol. Evol., 16, 1125–1134.
9. Garcia-Vallve,S., Romeu,A. and Palau,J. (2000) Horizontal gene transfer of glycosyl hydrolases of the rumen fungi. Mol. Biol. Evol., 17, 352–361.
10. Garcia-Vallve,S., Simó,F.X., Montero,M.A., Arola,L. and Romeu,A. (2002) Simultaneous horizontal gene transfer of a gene coding for ribosomal protein L27 and operational genes in Arthrobacter sp. J. Mol. Evol., in press.
11. Israel,D.A., Salama,N., Krishna,U., Rieger,U.M., Atherton,J.C., Falkow,S. and Peek,R.M.,Jr (2001) Helicobacter pylori genetic diversity within the gastric niche of a single human host. Proc. Natl Acad. Sci. USA, 98, 14625–14630.
12. Garcia-Vallve,S., Janssen,P.J. and Ouzounis,C.A. (2002) Genetic variation between Helicobacter pylori strains: gene acquisition or loss? Trends Microbiol., 10, 445–447.
13. Akashi,H. and Gojobori,T. (2002) Metabolic efficiency and amino acid composition in the proteomes of Escherichia coli and Bacillus subtilis. Proc. Natl Acad. Sci. USA, 99, 3695–3700.
14. Tatusov,R.L., Natale,D.A., Garkavtsev,I.V., Tatusova,T.A., Shankavaram,U.T., Rao,B.S., Kiryutin,B., Galperin,M.Y., Fedorova,N.D. and Koonin,E.V. (2001) The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res., 29, 22–28.
15. Wright,F. (1990) The effective number of codons used in a gene. Gene, 87, 23–29.
16. Freire-Picos,M.A., Gonzalez-Siso,M.I., Rodriguez-Belmonte,E., Rodriguez-Torres,A.M., Ramil,E. and Cerdan,M.E. (1994) Codon usage in Kluyveromyces lactis and in yeast cytochrome c-encoding genes. Gene, 139, 43–49.
17. Gouy,M. and Gautier,C. (1982) Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res., 10, 7055–7074.
18. Shields,D.C. and Sharp,P.M. (1987) Synonymous codon usage in Bacillus subtilis reflects both translational selection and mutational biases. Nucleic Acids Res., 15, 8023–8040.