Nucleic Acids Research, 2002, Vol. 30, No. 1 149-151
© 2002 Oxford University Press
Homophila: human disease gene cognates in Drosophila
¹San Diego Supercomputer Center and ²Department of Biology, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0349, USA
Received August 16, 2001; Revised and Accepted October 10, 2001.
| ABSTRACT |
|---|
Although many human genes have been associated with genetic diseases, knowing which mutations result in disease phenotypes often does not explain the etiology of a specific disease. Drosophila melanogaster provides a powerful system in which to use genetic and molecular approaches to investigate human genetic diseases. Homophila is an intergenomic resource linking the human and fly genomes in order to stimulate functional genomic investigations in Drosophila that address questions about genetic disease in humans. Homophila provides a comprehensive linkage between the disease genes compiled in Online Mendelian Inheritance in Man (OMIM) and the complete Drosophila genomic sequence. Homophila is a relational database that allows searching based on human disease descriptions, OMIM number, human or fly gene names, and sequence similarity, and can be accessed at http://homophila.sdsc.edu.
| INTRODUCTION |
|---|
The continuing progress in the sequencing of the human genome will accelerate the identification of many genes involved in human diseases. Although a map location, nucleotide sequence and even the identity of the protein involved in a specific disease may be known, it is often difficult to decipher the etiology of the disease without employing an experimental organism. One approach to deciphering the role of these genes in specific diseases is to investigate the function of cognate genes in model organisms. A number of groups (1-4) have examined various sets of genes for cognates in Drosophila, and it is clear that other groups will employ this powerful genetic model organism in the future.
Online Mendelian Inheritance in Man (OMIM) (5) is a catalog of human genes and genetic disorders. The OMIM Morbid Map describes those disease genes with known cytogenetic positions. Additional disease-related genes can be found in OMIM entries as allelic variants of a given gene. The combination of these two types of OMIM entries gives a relatively complete view of known genes involved in human diseases.
Homophila is a systematic examination of these human disease-related genes and their Drosophila cognates. This cross-genomic analysis bridges the gap between the human disease and the Drosophila genome databases (6). Furthermore, this information is available online in a searchable format supported by a relational database management system (RDBMS).
| DATABASE CONTENT |
|---|
Homophila integrates information from four main sources: human disease gene information from OMIM, information relating OMIM entries to specific sequences from LocusLink (7), Drosophila nucleotide and protein sequence data (8), and annotation of Drosophila genes from FlyBase (9).
Construction of Homophila began with a list of OMIM disease entries (ones that either appear in the Morbid Map or contain an allelic variant notation). Because of the narrative nature of the OMIM database, which often discusses entirely unrelated proteins that may have been excluded as the causes of the disease, it is not possible to simply look up the sequences related to each disease in OMIM. A more involved procedure relying on the NCBI LocusLink database was required. Each OMIM disease entry was looked up in the LocusLink mim2loc table, which relates OMIM entries to NCBI locus records. Each locus record was then used to locate the correct protein sequence records using the LocusLink loc2UG, loc2acc and loc2ref tables, which specify entries in the NCBI UniGene, protein and RefSeq databases, respectively.
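The lookup chain described above (OMIM entry, then locus record, then protein sequence records) can be sketched as a pair of joins. This is only an illustration: the table contents and identifiers below are invented placeholders, the real LocusLink tables carry more columns, and only two of the three locus-to-sequence tables are mocked.

```python
from collections import defaultdict

# Hypothetical, simplified stand-ins for the LocusLink tables named in the text.
# All identifiers are invented placeholders, not real OMIM or NCBI records.
mim2loc = [("MIM_A", "LOC_1"), ("MIM_A", "LOC_2"), ("MIM_B", "LOC_3")]
loc2acc = [("LOC_1", "PROT_1"), ("LOC_2", "PROT_2")]
loc2ref = [("LOC_1", "REF_1"), ("LOC_3", "REF_3")]


def index_pairs(pairs):
    """Build a key -> set(values) lookup from (key, value) pairs."""
    lookup = defaultdict(set)
    for key, value in pairs:
        lookup[key].add(value)
    return lookup


def proteins_for_omim(omim_ids, mim_to_locus, *locus_to_sequence_tables):
    """Map each OMIM entry to sequence records by way of its locus records."""
    loci_by_mim = index_pairs(mim_to_locus)
    sequence_lookups = [index_pairs(table) for table in locus_to_sequence_tables]
    result = {}
    for mim in omim_ids:
        records = set()
        for locus in loci_by_mim.get(mim, ()):
            for lookup in sequence_lookups:
                records |= lookup.get(locus, set())
        result[mim] = sorted(records)
    return result


disease_proteins = proteins_for_omim(["MIM_A", "MIM_B"], mim2loc, loc2acc, loc2ref)
```

Going through the locus records, rather than scanning the OMIM narrative text, is what keeps unrelated proteins mentioned in an entry out of the probe set.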
Each of the protein sequence entries was compared to the complete Drosophila genome sequence using the BLASTP program (10). BLAST comparisons were performed using BLAST v.2.09 with the standard BLOSUM62 matrix and an expect setting of 1 × 10⁻¹⁰. The result of this procedure was a list of 5283 protein sequence entries associated with 911 OMIM disease loci and 666 matching Drosophila genes (Table 1).
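As a rough illustration of how an expect threshold prunes BLAST output, the sketch below filters a list of hits by E-value. It assumes the expect setting is read as 1 × 10⁻¹⁰ and that hits at exactly the cutoff are kept; the `Hit` record and all identifiers are hypothetical and do not reflect the actual BLAST output format or the Homophila pipeline.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One hypothetical BLASTP hit: human probe, fly protein, E-value."""
    query: str
    subject: str
    evalue: float


# Assumed reading of the expect setting quoted in the text.
E_VALUE_CUTOFF = 1e-10


def significant_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Keep only hits whose E-value is at or below the cutoff."""
    return [hit for hit in hits if hit.evalue <= cutoff]


# Invented example hits.
hits = [
    Hit("PROT_1", "FLY_A", 1e-50),
    Hit("PROT_1", "FLY_B", 1e-3),
    Hit("PROT_2", "FLY_A", 1e-10),
]
kept = significant_hits(hits)
```

In practice BLAST applies the expect threshold itself when it is passed as a search parameter, so a post-hoc filter like this is only needed when re-analyzing results at a stricter cutoff.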
A relational database has been implemented to allow queries on these results and is available online (http://homophila.sdsc.edu) using the MySQL RDBMS (11). PERL scripts using the DBI package are used to convert queries entered on the Homophila web pages to SQL queries to the actual RDBMS.
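To make the idea of translating a web-form query into SQL concrete, here is a minimal sketch. It uses Python's built-in `sqlite3` module as a stand-in for the MySQL and Perl DBI stack named above, and the two-table schema, column names and rows are assumptions made for illustration; the actual Homophila schema is not described in the text.

```python
import sqlite3

# In-memory database with a hypothetical two-table schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE disease (
        omim_id     TEXT PRIMARY KEY,
        description TEXT
    );
    CREATE TABLE hit (
        omim_id       TEXT REFERENCES disease(omim_id),
        human_protein TEXT,
        fly_gene      TEXT,
        evalue        REAL
    );
    """
)

# Invented placeholder rows.
conn.executemany(
    "INSERT INTO disease VALUES (?, ?)",
    [("MIM_A", "example disorder alpha"), ("MIM_B", "example disorder beta")],
)
conn.executemany(
    "INSERT INTO hit VALUES (?, ?, ?, ?)",
    [
        ("MIM_A", "PROT_1", "FLY_A", 1e-50),
        ("MIM_A", "PROT_2", "FLY_B", 1e-20),
        ("MIM_B", "PROT_3", "FLY_C", 1e-30),
    ],
)


def search_by_description(connection, keyword):
    """Return (omim_id, fly_gene, evalue) rows for diseases matching a keyword."""
    sql = """
        SELECT d.omim_id, h.fly_gene, h.evalue
        FROM disease AS d
        JOIN hit AS h ON h.omim_id = d.omim_id
        WHERE d.description LIKE ?
        ORDER BY h.evalue
    """
    # The keyword is bound as a parameter, never pasted into the SQL string.
    return connection.execute(sql, (f"%{keyword}%",)).fetchall()


rows = search_by_description(conn, "alpha")
```

Binding user input as a query parameter, as Perl DBI placeholders also allow, avoids SQL injection through the search form.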
A complete list of P-element locations in the Drosophila genomic sequence was kindly provided by FlyBase (9). This information is added to the results of the database searches in order to identify cognate genes for which null mutants already exist (e.g. the P-element falls within the protein coding sequence of a gene) or for which it would be straightforward to generate null deletion mutations (e.g. by imprecise P-element excision).
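The two cases above (an insertion inside the coding sequence versus one close enough for imprecise excision) amount to an interval test. The sketch below is a simplification under stated assumptions: the coding sequence is treated as a single start-to-end interval that ignores introns, the 1000 bp flank is an arbitrary illustrative value, and all gene and insertion records are invented.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    cds_start: int
    cds_end: int


class PElement(NamedTuple):
    name: str
    chrom: str
    position: int


def classify_insertions(gene, insertions, flank=1000):
    """Split insertions into those inside the CDS span and those within `flank` bp of it."""
    within, nearby = [], []
    for ins in insertions:
        if ins.chrom != gene.chrom:
            continue
        if gene.cds_start <= ins.position <= gene.cds_end:
            within.append(ins.name)
        elif gene.cds_start - flank <= ins.position <= gene.cds_end + flank:
            nearby.append(ins.name)
    return within, nearby


# Invented example records.
gene = Gene("FLY_A", "2L", 5000, 8000)
insertions = [
    PElement("P1", "2L", 6000),   # inside the coding span
    PElement("P2", "2L", 8500),   # within the assumed flank
    PElement("P3", "2L", 20000),  # too far away
    PElement("P4", "3R", 6000),   # different chromosome arm
]
within, nearby = classify_insertions(gene, insertions)
```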
| ACCESS |
|---|
Homophila is available for both browsing and searching online at http://homophila.sdsc.edu. The database content is also available in a relational version or as flat files upon request.
Many OMIM disease entries have multiple protein sequences linked to the disease through LocusLink. The BLAST search results for each of the probe sequences are merged and used to create a list of best matching sequences (Fig. 1).
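One way to picture the merge step is as a reduction over all probes for a disease, keeping the strongest hit per fly gene. The text does not say how "best" is scored, so the sketch below assumes the lowest E-value wins; the function name, data layout and identifiers are all hypothetical.

```python
def best_matches(probes_by_omim, hits):
    """Merge per-probe hits into one ranked list of fly genes per OMIM entry.

    probes_by_omim: dict mapping an OMIM id to its list of probe ids.
    hits: iterable of (probe, fly_gene, evalue) tuples.
    Returns a dict mapping each OMIM id to (fly_gene, evalue, probe) tuples,
    best (lowest E-value) first.
    """
    hits_by_probe = {}
    for probe, fly_gene, evalue in hits:
        hits_by_probe.setdefault(probe, []).append((fly_gene, evalue))

    merged = {}
    for omim, probes in probes_by_omim.items():
        best = {}  # fly_gene -> (evalue, probe) for the strongest hit seen so far
        for probe in probes:
            for fly_gene, evalue in hits_by_probe.get(probe, []):
                if fly_gene not in best or evalue < best[fly_gene][0]:
                    best[fly_gene] = (evalue, probe)
        merged[omim] = sorted(
            ((gene, evalue, probe) for gene, (evalue, probe) in best.items()),
            key=lambda row: row[1],
        )
    return merged


# Invented example: two probes for one disease entry hit the same fly gene.
probes_by_omim = {"MIM_A": ["PROT_1", "PROT_2"]}
hits = [
    ("PROT_1", "FLY_A", 1e-20),
    ("PROT_2", "FLY_A", 1e-60),
    ("PROT_2", "FLY_B", 1e-15),
]
merged = best_matches(probes_by_omim, hits)
```

Collapsing to one row per fly gene keeps the summary short, at the cost of hiding weaker probe-to-gene pairings that remain visible only in a direct database search.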
The precompiled list of best matches obviously gives an incomplete view of the correspondence between the gene probes for a specific disease and their Drosophila cognates. More complete information is available by directly searching the database. Searches based on OMIM entry number, human and Drosophila gene names and symbols, human disease description and text keywords are available. All entries matching the search query are displayed in a summary output (Fig. 2).
The information stored in Homophila is changing rapidly as new disease loci are sequenced. Homophila is updated approximately every 2 months using a semi-automated process to import source data and perform the requisite analyses.
| FUTURE DEVELOPMENT PLANS |
|---|
1. Complete automation of data update and analysis.
2. Extension of analyses to other genomes: Dictyostelium, Caenorhabditis elegans, Saccharomyces cerevisiae and Mus musculus.
3. Inclusion/linkage of more complete information about human diseases and Drosophila genes so that searches based on known human disease phenotypes and Drosophila mutant phenotypes can be used to identify potentially novel functional groupings of human and fly genes.
| ACKNOWLEDGEMENT |
|---|
This work is supported in part by the National Institutes of Health through a National Center for Research Resources program grant (P 41 RR08605-06) to the National Biomedical Computation Resource at the San Diego Supercomputer Center.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed at: San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0537, USA. Tel: +1 858 534 8312; Fax: +1 858 822 0873; Email: gribskov{at}sdsc.edu
| REFERENCES |
|---|
1 Fortini,M.E., Skupski,M.P., Boguski,M.S. and Hariharan,I.K. (2000) A survey of human disease gene counterparts in the Drosophila genome. J. Cell Biol., 150, F23-F30.
2 Littleton,J.T. and Ganetzky,B. (2000) Ion channels and synaptic organization: analysis of the Drosophila genome. Neuron, 26, 35-43.
3 Potter,C.J., Turenchalk,G.S. and Xu,T. (2000) Drosophila in cancer research: an expanding role. Trends Genet., 16, 33-39.
4 Rubin,G.M., Yandell,M.D., Wortman,J.R., Gabor Miklos,G.L., Nelson,C.R., Hariharan,I.K., Fortini,M.E., Li,P.W., Apweiler,R., Fleischmann,W. et al. (2000) Comparative genomics of the eukaryotes. Science, 287, 2204-2215.
5 Boyadjiev,S.A. and Jabs,E.W. (2000) Online Mendelian Inheritance in Man (OMIM) as a knowledgebase for human developmental disorders. Clin. Genet., 57, 253-266.
6 Reiter,L.T., Potocki,L., Chien,S., Gribskov,M. and Bier,E. (2001) A systematic analysis of human disease-associated gene sequences in Drosophila melanogaster. Genome Res., 11, 1114-1125.
7 Pruitt,K.D., Katz,K.S., Sicotte,H. and Maglott,D.R. (2000) Introducing RefSeq and LocusLink: curated human genome resources at the NCBI. Trends Genet., 16, 44-47.
8 Adams,M.D., Celniker,S.E., Holt,R.A., Evans,C.A., Gocayne,J.D., Amanatides,P.G., Scherer,S.E., Li,P.W., Hoskins,R.A., Galle,R.F. et al. (2000) The genome sequence of Drosophila melanogaster. Science, 287, 2185-2195.
9 The FlyBase Consortium (1999) The FlyBase database of the Drosophila genome projects and community literature. Nucleic Acids Res., 27, 85-88. Updated article in this issue: Nucleic Acids Res. (2002), 30, 106-108.
10 Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389-3402.
11 Dubois,P. (2000) MySQL. New Riders, IN.