Nucleic Acids Research Advance Access published online on July 17, 2007
Nucleic Acids Research, doi:10.1093/nar/gkm515
Computational Biology
Automated recognition of retroviral sequences in genomic data—RetroTector©
1Department of Neuroscience, Physiology and 2Section of Virology, Department of Medical Sciences, Uppsala University, Uppsala and 3Department of Biology and Chemical Engineering, Mälardalens Högskola, Eskilstuna, Sweden
*To whom correspondence should be addressed. Tel: +46 18 611 55 93; Fax: +46 18 55 10 12; Email: Jonas.Blomberg@medsci.uu.se
Received March 2, 2007. Revised June 13, 2007. Accepted June 15, 2007.
Eukaryotic genomes contain many endogenous retroviral sequences (ERVs). ERVs are often severely mutated and therefore difficult to detect. A platform-independent (Java) program package, RetroTector© (ReTe), was constructed. It has three basic modules: (i) detection of candidate long terminal repeats (LTRs), (ii) detection of chains of conserved retroviral motifs fulfilling distance constraints and (iii) attempted reconstruction of original retroviral protein sequences, combining alignment, codon statistics and properties of protein ends. Other features are prediction of additional open reading frames, automated database collection, graphical presentation and automatic classification. Because it depends on the order of, and distances between, retroviral fragments, ReTe favors elements longer than 1000 bp. It detects single or low-copy-number elements. ReTe assigned a retroviral score of 890–2827 to 10 exogenous retroviruses from seven genera, and accurately predicted their genes. In a simulated model, ReTe was robust against mutational decay. The human genome was analyzed in 1–2 days on a Linux cluster. Retroviral sequences were detected in divergent vertebrate genomes. Most ReTe-detected chains coincided with RepeatMasker output and the HERVd database. ReTe did not report most of the evolutionarily old HERV-L-related and MaLR sequences, and is not yet tailored for single LTR detection. Nevertheless, ReTe rationally detects and annotates many retroviral sequences.
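The idea behind module (ii), chaining conserved motif hits that occur in the expected order and within plausible distances of each other, can be illustrated with a small dynamic-programming sketch. This is not ReTe's implementation (ReTe is written in Java and its scoring is not described here); the motif labels, the gap limits `MIN_GAP` and `MAX_GAP`, the `Hit` type and the function `best_chain` are all hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Hit(NamedTuple):
    motif: str    # hypothetical motif label, e.g. "GAG", "PRO", "POL", "ENV"
    pos: int      # position of the hit in the genomic sequence (bp)
    score: float  # hypothetical per-hit score


# Assumed 5'-to-3' order of motifs; illustrative only.
ORDER = ["GAG", "PRO", "POL", "ENV"]
# Assumed limits on the distance between consecutive hits in a chain (bp).
MIN_GAP, MAX_GAP = 200, 4000


def best_chain(hits: List[Hit]) -> Tuple[List[Hit], float]:
    """Return the highest-scoring chain of hits that follows ORDER
    and keeps every consecutive gap within [MIN_GAP, MAX_GAP]."""
    rank = {m: i for i, m in enumerate(ORDER)}
    hits = sorted((h for h in hits if h.motif in rank), key=lambda h: h.pos)
    if not hits:
        return [], 0.0

    best = [h.score for h in hits]            # best chain score ending at hit j
    prev: List[Optional[int]] = [None] * len(hits)

    for j, hj in enumerate(hits):
        for i in range(j):
            hi = hits[i]
            gap = hj.pos - hi.pos
            in_order = rank[hi.motif] < rank[hj.motif]
            if in_order and MIN_GAP <= gap <= MAX_GAP:
                if best[i] + hj.score > best[j]:
                    best[j] = best[i] + hj.score
                    prev[j] = i

    # Trace back from the best-scoring end point.
    end = max(range(len(hits)), key=best.__getitem__)
    total = best[end]
    chain = []
    k: Optional[int] = end
    while k is not None:
        chain.append(hits[k])
        k = prev[k]
    return chain[::-1], total
```

Because a chain needs several hits in the right order and at the right spacing, short or heavily truncated elements yield few usable links, which is consistent with the stated preference for elements longer than 1000 bp.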
Present address: Patric Jern, Department of Molecular Biology and Microbiology, Tufts University School of Medicine, Boston, MA, USA.