Nucleic Acids Research Advance Access published online on April 29, 2008
Nucleic Acids Research, doi:10.1093/nar/gkn227
Computational Genomics
Designating eukaryotic orthology via processed transcription units
1Institute of Biomedical Informatics, National Yang-Ming University, 2Bioinformatics Program, Taiwan International Graduate Program, Academia Sinica, 3Institute of Biomedical Sciences, Academia Sinica, Taipei, 4Institute of Bioinformatics, National Chiao-Tung University, Hsinchu and 5Institute of Statistical Sciences, Academia Sinica, Taipei, Taiwan
*To whom correspondence should be addressed. Tel: +(886) 2 2652 3967; Fax: +(886) 2 2782 7654; Email: wenlin{at}ibms.sinica.edu.tw
Received January 4, 2008. Revised April 8, 2008. Accepted April 11, 2008.
Orthology is a widely used concept in comparative and evolutionary genomics. Beyond prokaryotic orthology, delineating eukaryotic orthology has provided insight into the evolution of higher organisms, and many eukaryotic ortholog databases have been established for this purpose. Unlike in prokaryotes, however, alternative splicing (AS) complicates orthology assignment in eukaryotes. Existing databases therefore likely contain ambiguous eukaryotic ortholog relationships and may misclassify alternatively spliced protein isoforms as in-paralogs, which are duplicated genes that arise following speciation. Here, we propose a new approach for designating eukaryotic orthology using processed transcription units, and we present an orthology database prototype built from the human and mouse genomes. Existing programs cover less than 69% of the human reference sequences when assigning human/mouse orthologs, whereas our method covers up to 80%. Moreover, the ortholog database presented here is more than 92% consistent with the existing databases. In addition to handling AS, the approach can identify orthologs of embedded genes and fusion genes using syntenic evidence. In summary, this new approach is sensitive and specific, and it can generate a more comprehensive and accurate compilation of eukaryotic orthologs.
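To illustrate why grouping isoforms by transcription unit can keep splice variants from being counted as separate genes, the following is a minimal sketch. It is not the pipeline described in this paper, whose details are not given in the abstract. It assumes that isoform-level similarity scores are already available, substitutes a plain reciprocal-best-hit rule at the unit level, and omits syntenic evidence entirely. The function name `unit_level_best_hits`, the identifiers and the scores are all hypothetical.

```python
def unit_level_best_hits(scores, unit_of):
    """Return reciprocal best (human_unit, mouse_unit) pairs.

    scores:  dict mapping (human_isoform, mouse_isoform) -> similarity score
    unit_of: dict mapping each isoform ID -> its transcription-unit ID
    Both inputs are hypothetical; this is an illustration only.
    """
    # Collapse isoform pairs: keep the best score seen for each unit pair.
    unit_scores = {}
    for (h_iso, m_iso), score in scores.items():
        key = (unit_of[h_iso], unit_of[m_iso])
        unit_scores[key] = max(unit_scores.get(key, float("-inf")), score)

    # Record the best-scoring partner in each direction.
    best_for_human = {}
    best_for_mouse = {}
    for (h, m), score in unit_scores.items():
        if h not in best_for_human or score > unit_scores[(h, best_for_human[h])]:
            best_for_human[h] = m
        if m not in best_for_mouse or score > unit_scores[(best_for_mouse[m], m)]:
            best_for_mouse[m] = h

    # Keep only pairs that are each other's best partner.
    return {(h, m) for h, m in best_for_human.items() if best_for_mouse[m] == h}


# Toy data (invented): H1 has two splice isoforms, H1.a and H1.b.
toy_unit_of = {"H1.a": "H1", "H1.b": "H1", "H2.a": "H2",
               "M1.a": "M1", "M2.a": "M2"}
toy_scores = {
    ("H1.a", "M1.a"): 95.0,
    ("H1.b", "M1.a"): 90.0,
    ("H1.a", "M2.a"): 40.0,
    ("H2.a", "M2.a"): 88.0,
    ("H2.a", "M1.a"): 35.0,
}
toy_pairs = unit_level_best_hits(toy_scores, toy_unit_of)
```

In this toy input, both isoforms of H1 match M1.a well. Because scores are collapsed per unit before pairing, H1 contributes a single candidate to the comparison, so its two isoforms cannot appear as two separate in-paralogs of M1.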