Nucleic Acids Research Advance Access originally published online on November 28, 2008
Nucleic Acids Research 2009 37(1):289-297; doi:10.1093/nar/gkn916
Nucleic Acids Research, 2009, Vol. 37, No. 1 289-297
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Genomics
Assessing the gene space in draft genomes
1 UC Davis Genome Center, University of California Davis, Davis, CA, USA and 2 The Wellcome Trust Sanger Institute, Genome Campus, Hinxton, CB10 1SA, UK
*To whom correspondence should be addressed. Tel: +1 530 754 4989; Email: ifkorf{at}ucdavis.edu
Received July 17, 2008. Revised October 28, 2008. Accepted October 30, 2008.
Genome sequencing projects have been initiated for a wide range of eukaryotes. A few projects have reached completion, but most exist as draft assemblies. Because one of the main reasons to sequence a genome is to obtain its catalog of genes, an important question is how complete, or completable, that catalog is in unfinished genomes. To answer this question, we have identified a set of core eukaryotic genes (CEGs) that are extremely highly conserved and that we believe are present in low copy numbers in higher eukaryotes. From an analysis of a phylogenetically diverse set of eukaryotic genome assemblies, we found that the proportion of CEGs mapped in draft genomes provides a useful metric for describing the gene space, and that it complements the commonly used N50 length and x-fold coverage values.
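The two assembly metrics contrasted above can be illustrated with a minimal sketch. It assumes the contig lengths and the set of CEGs found in the assembly are already available; the function names `n50` and `ceg_proportion` are hypothetical and are not taken from the authors' software, and the mapping step itself (how a CEG is judged to be present) is not shown.

```python
from typing import Iterable, Set


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 length: the largest length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly


def ceg_proportion(mapped: Set[str], core_set: Set[str]) -> float:
    """Return the fraction of the core gene set that was mapped in the assembly.

    `mapped` may contain identifiers outside the core set; they are ignored.
    """
    if not core_set:
        raise ValueError("core_set must not be empty")
    return len(mapped & core_set) / len(core_set)


if __name__ == "__main__":
    # Illustrative values only, not data from the article.
    contigs = [10, 8, 6, 4, 2]
    core = {"ceg_a", "ceg_b", "ceg_c", "ceg_d"}
    found = {"ceg_a", "ceg_b", "other_gene"}
    print("N50:", n50(contigs))
    print("CEG proportion:", ceg_proportion(found, core))
```

N50 depends only on how contiguous the assembly is, whereas the CEG proportion depends on whether expected genes can actually be located in it, which is why the two can be read as complementary.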
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.