Published online 14 March 2005
Integration of text- and data-mining using ontologies successfully selects disease gene candidates
South African National Bioinformatics Institute, University of the Western Cape, Bellville 7535, South Africa
1 Knowledge Extraction Laboratory, Institute for Infocomm Research, 21 Heng Mui Keng Terrace, Singapore 119613
*To whom correspondence should be addressed. Tel: +27219592611; Fax: 27219592512; Email: nicki{at}sanbi.ac.za
Received November 19, 2004. Revised January 26, 2005. Accepted February 22, 2005.
Genome-wide techniques such as microarray analysis, Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS), linkage analysis and association studies are used extensively in the search for genes that cause diseases, and often identify many hundreds of candidate disease genes. Selection of the most probable of these candidate disease genes for further empirical analysis is a significant challenge. Additionally, identifying the genes that cause complex diseases is problematic due to low penetrance of multiple contributing genes. Here, we describe a novel bioinformatic approach that selects candidate disease genes according to their expression profiles. We use the eVOC anatomical ontology to integrate text-mining of biomedical literature and data-mining of available human gene expression data. To demonstrate that our method is successful and widely applicable, we apply it to a database of 417 candidate genes containing 17 known disease genes. We successfully select the known disease gene for 15 out of 17 diseases and reduce the candidate gene set to 63.3% (±18.8%) of its original size. This approach facilitates direct association between genomic data describing gene expression and information from biomedical texts describing disease phenotype, and successfully prioritizes candidate genes according to their expression in disease-affected tissues.
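The selection idea in the abstract can be illustrated with a minimal sketch. It assumes that text-mining has already produced a set of anatomical terms associated with a disease, and that each candidate gene carries a set of anatomical terms from expression data. The gene names, tissue terms and function names below are hypothetical, and the plain set-overlap rule is an illustrative stand-in: the abstract does not specify how eVOC terms are scored or ranked.

```python
def select_candidates(gene_tissues, disease_tissues):
    """Keep genes annotated with at least one disease-associated tissue term.

    gene_tissues: dict mapping gene name -> iterable of anatomical terms
                  (hypothetical stand-in for expression annotations).
    disease_tissues: iterable of anatomical terms linked to the disease
                     (hypothetical stand-in for text-mining output).
    """
    wanted = {term.lower() for term in disease_tissues}
    return sorted(
        gene
        for gene, tissues in gene_tissues.items()
        if wanted & {term.lower() for term in tissues}
    )


def fraction_retained(selected, candidates):
    """Size of the selected set as a fraction of the original candidate set."""
    if not candidates:
        raise ValueError("candidate set is empty")
    return len(selected) / len(candidates)


# Toy data, invented for illustration only.
gene_tissues = {
    "GENE_A": {"heart", "liver"},
    "GENE_B": {"brain"},
    "GENE_C": {"kidney", "Heart"},
    "GENE_D": {"skin"},
}
disease_tissues = ["heart", "muscle"]

selected = select_candidates(gene_tissues, disease_tissues)
print(selected)
print(f"{fraction_retained(selected, gene_tissues):.1%} of candidates retained")
```

Under this toy rule, a known disease gene counts as successfully selected if it appears in the returned list, and the retained fraction plays the role of the reduced candidate set size reported in the abstract.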
Present address: Janet F.Kelso, Max-Planck-Institute for Evolutionary Anthropology, Deutscher Platz 6, D-04103 Leipzig, Germany






