Nucleic Acids Research Advance Access originally published online on May 8, 2009
Nucleic Acids Research 2009 37(11):e79; doi:10.1093/nar/gkp310
Nucleic Acids Research, 2009, Vol. 37, No. 11 e79
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Methods Online

Text-based over-representation analysis of microarray gene lists with annotation bias

Hui Sun Leong and David Kipling*

Department of Pathology, School of Medicine, Cardiff University, Heath Park, Cardiff CF14 4XN, UK

*To whom correspondence should be addressed. Tel: +44 (29) 206 87037; Email: kiplingd{at}cardiff.ac.uk

Received December 1, 2008. Revised April 14, 2009. Accepted April 16, 2009.

A major challenge in microarray data analysis is the functional interpretation of gene lists. A common approach is over-representation analysis (ORA), which uses the hypergeometric test (or its variants) to evaluate whether a particular functionally defined group of genes is represented within a gene list more often than expected by chance. Existing applications of ORA have been largely limited to pre-defined terminologies such as GO and KEGG. We report our explorations of whether ORA can be applied to a wider mining of free text. We found that a hitherto underappreciated feature of experimentally derived gene lists is that their constituents have substantially more annotation associated with them, because they have been studied for a longer period of time. This bias, a result of patterns of research activity within the biomedical community, is a major problem for classical hypergeometric test-based ORA approaches, which cannot account for it. We have therefore developed three approaches to overcome this bias and demonstrate their usability on a wide range of published datasets covering different species. A comparison with existing tools that use GO terms suggests that mining PubMed abstracts can reveal additional biological insight that may not be obtainable by mining pre-defined ontologies alone.
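
For readers unfamiliar with the classical test mentioned in the abstract, the following is a minimal sketch of a hypergeometric over-representation p-value. It illustrates only the standard ORA calculation, not the three bias-correcting approaches developed in the article, which the abstract does not describe in detail. The function name, parameter names, and the example counts are assumptions chosen for illustration.

```python
from math import comb


def ora_p_value(population_size, annotated_in_population, list_size, annotated_in_list):
    """Upper-tail hypergeometric probability P(X >= annotated_in_list).

    Hypothetical helper: X is the number of annotated genes seen when
    list_size genes are drawn without replacement from a population of
    population_size genes, of which annotated_in_population carry the
    annotation (e.g. a GO term or a word from an abstract).
    """
    # The overlap cannot exceed either the list size or the number of annotated genes.
    max_hits = min(annotated_in_population, list_size)
    # Number of ways to draw any gene list of this size from the population.
    total = comb(population_size, list_size)
    # Count the draws containing at least the observed number of annotated genes.
    tail = sum(
        comb(annotated_in_population, k)
        * comb(population_size - annotated_in_population, list_size - k)
        for k in range(annotated_in_list, max_hits + 1)
    )
    return tail / total


# Illustrative numbers only: 20 genes in total, 5 annotated with a term,
# a gene list of 5 genes, 3 of which carry the term.
p = ora_p_value(20, 5, 5, 3)
print(f"p = {p:.4f}")
```

Note that this calculation treats every gene as equally likely to carry any given annotation, which is the assumption the abstract identifies as problematic when genes in experimentally derived lists are more heavily annotated than average.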

