Published online 20 July 2005
Article
Genome annotation errors in pathway databases due to semantic ambiguity in partial EC numbers
Bioinformatics Research Group, Artificial Intelligence Center, SRI International, Menlo Park, CA 94025, USA
*To whom correspondence should be addressed. Tel: +1 650 859 5669; Fax: +1 650 859 3735; Email: green{at}ai.sri.com
Received April 11, 2005. Revised June 29, 2005. Accepted June 29, 2005.
We report on a new type of systematic annotation error in genome and pathway databases that results from the misinterpretation of partial Enzyme Commission (EC) numbers such as 1.1.1.-. The error causes genes annotated with a partial EC number to be assigned to many or all biochemical reactions that are annotated with the same partial EC number. That inference is faulty because partial EC numbers are ambiguous. We have observed this type of error in multiple databases, including KEGG, VIMSS and IMG, all of which assign genes to KEGG pathways. The Escherichia coli subset of the KEGG database exhibits this error for 6.8% of its gene-reaction assignments. For example, KEGG contains 17 reactions that are annotated with EC 1.1.1.-. A group of three E. coli genes, b1580 [putative dehydrogenase, NAD(P)-binding, starvation-sensing protein], b3787 (UDP-N-acetyl-D-mannosaminuronic acid dehydrogenase) and b0207 (2,5-diketo-D-gluconate reductase B), is assigned to 15 of those reactions, despite experimental evidence indicating different single functions for two of the three genes. Furthermore, the databases (DBs) are internally inconsistent: the described functions of genes with partial EC numbers conflict with the activities implied by the reactions to which those genes are assigned. We infer that these inconsistencies result from the processing used to match gene products to reactions within KEGG's metabolic pathways. These errors affect scientists who use these DBs as online encyclopedias, and they affect bioinformaticists who use these DBs to train and validate newly developed algorithms.
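The failure mode described in the abstract can be illustrated with a small sketch. It assumes, purely for illustration, that genes are matched to reactions by comparing EC strings; the gene names, reaction IDs and function names below are hypothetical toy data, not KEGG's actual records or pipeline. The "cautious" variant is one possible safeguard, not a method taken from the article.

```python
from collections import defaultdict


def is_partial(ec):
    """Return True if any of the four EC fields is the placeholder '-'."""
    return "-" in ec.split(".")


def naive_assign(gene_ec, reaction_ec):
    """Faulty matching: treat a partial EC number like any other label,
    so a gene tagged 1.1.1.- is linked to every reaction tagged 1.1.1.-."""
    reactions_by_ec = defaultdict(list)
    for reaction, ec in reaction_ec.items():
        reactions_by_ec[ec].append(reaction)
    return {gene: sorted(reactions_by_ec.get(ec, []))
            for gene, ec in gene_ec.items()}


def cautious_assign(gene_ec, reaction_ec):
    """Safer matching (an assumption of this sketch): only fully specified
    EC numbers create gene-reaction links; partial ones stay unassigned."""
    reactions_by_ec = defaultdict(list)
    for reaction, ec in reaction_ec.items():
        if not is_partial(ec):
            reactions_by_ec[ec].append(reaction)
    return {gene: ([] if is_partial(ec) else sorted(reactions_by_ec.get(ec, [])))
            for gene, ec in gene_ec.items()}


# Hypothetical toy data: three distinct reactions share the partial label.
reaction_ec = {"R1": "1.1.1.-", "R2": "1.1.1.-", "R3": "1.1.1.-", "R4": "1.1.1.1"}
gene_ec = {"geneA": "1.1.1.-", "geneB": "1.1.1.1"}

naive = naive_assign(gene_ec, reaction_ec)
cautious = cautious_assign(gene_ec, reaction_ec)
print(naive)
print(cautious)
```

Under naive matching, `geneA` is linked to all three reactions that carry 1.1.1.-, even though the shared label says only that each belongs somewhere in the 1.1.1 sub-subclass, not that they are the same activity. The cautious variant leaves `geneA` unassigned until a complete EC number or other evidence is available.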



