Published online 2 January 2004
Nucleic Acids Research, 2004, Vol. 32, No. 1, 135-142
© 2004 Oxford University Press
Automatic extraction of mutations from Medline and cross-validation with OMIM
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton CB10 1SD, UK, ¹LION Bioscience AG, Waldhoferstrasse 98, D-69123 Heidelberg, Germany, ²PheneX Pharmaceuticals AG, Im Neuenheimer Feld 515, D-69120 Heidelberg, Germany and ³Cellzome AG, Meyerhofstrasse 1, D-69117 Heidelberg, Germany
*To whom correspondence should be addressed. Tel: +44 1223 492594; Fax: +44 1223 444468; Email: rebholz{at}ebi.ac.uk
Mutations help us to understand the molecular origins of diseases. Researchers therefore both publish and seek disease-relevant mutations in public databases and in the scientific literature, e.g. Medline. Manual retrieval tends to be time-consuming and incomplete; automated screening of the literature is more efficient. We developed extraction methods (called MEMA) that scan Medline abstracts for mutations. MEMA identified 24 351 singleton mutations co-occurring with a HUGO gene name in 16 728 abstracts. From a sample of 100 abstracts, we estimated the recall for the identification of mutation-gene pairs at 35%, with a precision of 93%. For mutation detection alone, recall was >67% and precision was >96%. These figures indicate that our system produces reliable data. The subset of protein sequence mutations (PSMs) extracted by MEMA was compared with the entries in OMIM (20 503 and 6699 entries, respectively). We found 1826 PSM-gene pairs common to both datasets (cross-validated). This is 27% of all PSM-gene pairs in OMIM and 91% of those OMIM pairs that co-occur in at least one Medline abstract. We conclude that Medline covers a large portion of the mutations known to OMIM. Another large portion could be artificially produced mutations from mutagenesis experiments. The database of extracted mutation-gene pairs is accessible through the web pages of the EBI (see http://www.ebi.ac.uk/rebholz/index.html).
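The abstract does not spell out MEMA's extraction rules, so the following is only a minimal sketch of the general idea, under loudly stated assumptions: simplified regular expressions for protein sequence mutations in one-letter (e.g. R117H) and three-letter (e.g. Gly551Asp) notation, a naive rule that pairs every mutation with every HUGO gene symbol found in the same abstract, and set intersection for cross-validation against a reference set such as OMIM. All function names, patterns and the example sentence are hypothetical; this is not MEMA's implementation.

```python
import re

# Hypothetical, simplified PSM patterns; MEMA's actual rules are not given here.
AA3 = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
AA1 = "ACDEFGHIKLMNPQRSTVWY"

# One-letter notation: wild-type residue, position, mutant residue (e.g. R117H).
ONE_LETTER = re.compile(rf"\b([{AA1}])(\d+)([{AA1}])\b")
# Three-letter notation, optionally hyphenated (e.g. Gly551Asp, Gly-551-Asp).
AA3_ALT = "|".join(AA3)
THREE_LETTER = re.compile(rf"\b({AA3_ALT})-?(\d+)-?({AA3_ALT})\b")


def extract_psms(text):
    """Return PSMs normalized to one-letter form, e.g. {'R117H', 'G551D'}."""
    found = set()
    for wt, pos, mut in ONE_LETTER.findall(text):
        found.add(f"{wt}{pos}{mut}")
    for wt, pos, mut in THREE_LETTER.findall(text):
        found.add(f"{AA3[wt]}{pos}{AA3[mut]}")
    return found


def find_genes(text, hugo_symbols):
    """Return the gene symbols that occur as whole words in the text."""
    return {s for s in hugo_symbols if re.search(rf"\b{re.escape(s)}\b", text)}


def mutation_gene_pairs(abstract, hugo_symbols):
    """Naive co-occurrence: pair every PSM with every gene in the abstract."""
    psms = extract_psms(abstract)
    genes = find_genes(abstract, hugo_symbols)
    return {(gene, psm) for gene in genes for psm in psms}


def overlap_share(extracted, reference):
    """Fraction of reference pairs that also appear in the extracted set."""
    return len(extracted & reference) / len(reference)


if __name__ == "__main__":
    # Invented example sentence, not taken from Medline.
    text = "Patients carried the R117H and Gly551Asp substitutions in CFTR."
    pairs = mutation_gene_pairs(text, {"CFTR", "TP53"})
    print(sorted(pairs))
    # Hypothetical reference set standing in for OMIM entries.
    reference = {("CFTR", "R117H"), ("CFTR", "F508C")}
    print(overlap_share(pairs, reference))
    # The abstract's own figures: 1826 shared pairs out of 6699 OMIM pairs.
    print(round(1826 / 6699, 2))
```

The last line reproduces the abstract's arithmetic: 1826 of 6699 OMIM PSM-gene pairs is about 27%. Patterns this simple will also match non-mutation tokens that happen to fit the letter-digits-letter shape, and same-abstract co-occurrence will mispair mutations with unrelated genes, which is one reason pair-level recall and precision sit below those for mutation detection alone.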
