Nucleic Acids Research, Vol 27, Issue 22 4405-4408, Copyright © 1999 by Oxford University Press
S Raghavan and CA Ouzounis
In the process of analysing the four available complete archaeal genomes,
we have noted that certain regions characterised as 'non-coding' exhibit
significant sequence similarity to other protein sequences from Archaea and
other species. Using established technology, we have identified a number of
potential protein coding regions in these putative 'non-coding' regions. We
have detected 524 such cases, of which 113 regions appear to code for
proteins present in archaeal or other species, while the remaining 411
regions are mostly start/stop definition conflicts. Of the 113 protein
coding regions, only 21 code for proteins with homologues of known
function. The number of novel coding sequences identified herein amounts to
1.5% of the total genome entries, while the conflicting cases represent an
additional 5%. The observed differences between the four complete archaeal
genomes seem to reflect disparate approaches to genome annotation. Genome
sequence collections should be regularly checked to improve gene prediction
by sequence similarity, and greater effort is required to make gene
definitions consistent across related species.
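
The counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration built from the figures stated above (524 cases, 113 coding regions, 411 boundary conflicts, 21 with homologues of known function); the variable names and the `share` helper are hypothetical and are not part of the authors' analysis.

```python
# Counts quoted in the abstract; names are illustrative, not the authors'.
total_cases = 524          # all regions detected in putative 'non-coding' DNA
coding_regions = 113       # regions that appear to code for proteins
boundary_conflicts = 411   # regions that are mostly start/stop conflicts
known_function = 21        # coding regions with homologues of known function


def share(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# The two categories should account for every detected case.
assert coding_regions + boundary_conflicts == total_cases

summary = {
    "coding_share_of_cases": share(coding_regions, total_cases),
    "conflict_share_of_cases": share(boundary_conflicts, total_cases),
    "known_function_share": share(known_function, coding_regions),
    "unknown_function_regions": coding_regions - known_function,
}
print(summary)
```

On these figures, roughly one detected case in five is a candidate coding region, and 92 of the 113 candidates have no homologue of known function.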
Novel coding regions in four complete archaeal genomes
Computational Genomics Group, Research Programme, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK.