Nucleic Acids Research Advance Access published online on July 9, 2009
Nucleic Acids Research, doi:10.1093/nar/gkp576
Computational Biology
Detection of genomic islands via segmental genome heterogeneity
¹Department of Computer Science, University of California San Diego, 9500 Gilman Drive, Mail Code 0404, La Jolla, CA 92093, ²Department of Biological Sciences, University of Pittsburgh, Pittsburgh, PA 15260, ³Keck Graduate Institute of Applied Life Sciences, 535 Watson Drive, Claremont and ⁴School of Mathematical Sciences, Claremont Graduate University, 711 North College Avenue, Claremont, CA 91711, USA
*To whom correspondence should be addressed. Tel: +1 412 624 4204; Fax: +1 412 624 4759; Email: jlawrenc@pitt.edu
Received April 30, 2009. Revised June 19, 2009. Accepted June 22, 2009.
Although the recognition of genomic islands can be a powerful way to identify genes that distinguish related bacteria, few methods have been developed specifically to detect them. Instead, island identification often begins by cataloging individual genes likely to have been introduced recently into the genome; regions containing many putative alien genes are then examined for other features suggestive of the recent acquisition of a large genomic region. When few phylogenetic relatives are available, alien genes must be identified by features that are atypical relative to the bulk of the genes in the genome. The weakness of these bottom–up approaches lies in the difficulty of robustly identifying genes that are atypical, or phylogenetically restricted, because of recent foreign ancestry. Here, we apply an alternative top–down approach in which bacterial genomes are recursively divided into progressively smaller regions, each of uniform composition. Because many genes are analyzed simultaneously, large chromosomal regions with atypical features can be identified with high confidence. The approach uses a generalized divergence measure to quantify the compositional difference between segments within a hypothesis-testing framework. We tested the proposed genomic island prediction algorithm on both artificial chimeric genomes and genuine bacterial genomes.
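The top–down idea of recursively cutting a sequence into compositionally uniform pieces can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper. Sequences are summarized by single-nucleotide composition only. The divergence is the length-weighted Jensen-Shannon divergence, used here as one concrete instance of a divergence measure and not necessarily the authors' generalized measure. The hypothetical parameters `min_len` and `min_stat` stand in for a minimum segment length and a significance threshold. For a fixed cut point, G = 2N·D_JS is the log-likelihood-ratio statistic for "two compositions" versus "one", which under a homogeneous null is asymptotically chi-square with (alphabet size minus 1) degrees of freedom. Because the cut point is chosen to maximize G, a real test would need a stricter, corrected threshold, so `min_stat` is only a placeholder.

```python
import math
from collections import Counter


def log_likelihood(counts: Counter) -> float:
    """Maximized multinomial log-likelihood, sum of c*ln(c/n); equals -n*H in nats."""
    n = sum(counts.values())
    return sum(c * math.log(c / n) for c in counts.values() if c > 0)


def best_split(seq: str, min_len: int):
    """Scan every allowed cut point and return (index, G) maximizing
    G = 2 * [LL(left) + LL(right) - LL(whole)] = 2 * N * D_JS,
    where D_JS is the length-weighted Jensen-Shannon divergence (nats).
    Returns (None, 0.0) if no cut point yields a positive G."""
    total = Counter(seq)
    ll_whole = log_likelihood(total)
    left = Counter()
    best_i, best_g = None, 0.0
    for i in range(1, len(seq)):
        left[seq[i - 1]] += 1          # left now holds the counts of seq[:i]
        if i < min_len or len(seq) - i < min_len:
            continue                   # hypothetical minimum segment length
        right = total - left           # counts of seq[i:]
        g = 2.0 * (log_likelihood(left) + log_likelihood(right) - ll_whole)
        if g > best_g:
            best_i, best_g = i, g
    return best_i, best_g


def segment(seq: str, start: int = 0, min_len: int = 50, min_stat: float = 30.0):
    """Recursively bisect seq at the most divergent cut point; stop when no cut
    reaches min_stat (a placeholder threshold, not a corrected significance level).
    Returns half-open (start, end) coordinates of the resulting segments."""
    i, g = best_split(seq, min_len)
    if i is None or g < min_stat:
        return [(start, start + len(seq))]
    return (segment(seq[:i], start, min_len, min_stat)
            + segment(seq[i:], start + i, min_len, min_stat))


if __name__ == "__main__":
    host = "AT" * 200      # 400 bp toy "native" composition
    island = "GC" * 100    # 200 bp compositionally atypical insert
    chimera = host + island + host
    print(segment(chimera))  # [(0, 400), (400, 600), (600, 1000)]
```

In this toy chimera the two host blocks and the GC-rich insert each end up as one segment, because no cut inside a homogeneous block comes close to `min_stat`. Deciding which of the resulting segments are atypical enough to call a genomic island would be a further step not shown here.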