Nucleic Acids Research Advance Access published online on August 26, 2006
Nucleic Acids Research, doi:10.1093/nar/gkl578
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Computational Biology
Hematopoietic gene promoters subjected to a group-combinatorial study of DNA samples: identification of a megakaryocytic selective DNA signature
1 College of Engineering, Boston University, Boston, MA, USA; 2 Department of Biochemistry, Boston University School of Medicine, 715 Albany Street, K225, Boston, MA 02118, USA
*To whom correspondence should be addressed. Tel: +1 617 638 5053; Fax: +1 617 638 5054; Email: ravid{at}biochem.bumc.bu.edu
*Correspondence may also be addressed to Yehonathan Hazony. Tel: +1 617 353 3270; Email: hazony{at}bu.edu
Received March 5, 2006. Revised July 10, 2006. Accepted July 24, 2006.
Identifying sub-sequences common to a group of functionally related DNA sequences can shed light on the role of such elements in cell-specific gene expression. In the megakaryocytic lineage, no single transcription factor has been described as lineage-specific, raising the possibility that a cluster of gene promoter sequences presents a unique signature. Here, the megakaryocytic gene promoter group, which consists of both human and mouse 5' non-coding regions, served as a case study. A methodology for group-combinatorial search has been implemented as a customized software platform. It extracts the longest common sequences for a group of related DNA sequences and allows for single gaps of varying length, as well as double- and multiple-gap sequences. The results point to DNA sequences that are common to a group of genes selectively expressed in megakaryocytes and that do not appear in a large group of control, random and specific sequences. This suggests a role for a combination of these sequences in cell-specific gene expression in the megakaryocytic lineage. The data also point to an intrinsic cross-species difference in the organization of 5' non-coding sequences within mammalian genomes. This methodology may be used to identify regulatory sequences in other lineages.
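As a rough illustration of the kind of search described above, the following Python sketch finds the longest sub-sequences shared by every member of a group and checks whether two motifs occur separated by a single gap of bounded length. It is not the authors' software platform: the function names, the brute-force strategy, the restriction of gap bases to A, C, G and T, and the toy sequences are all assumptions made for illustration only.

```python
import re


def longest_common_substrings(seqs):
    """Return the set of longest substrings present in every sequence.

    Brute-force sketch (an assumption, not the published method): candidate
    substrings are taken from the shortest sequence, longest first.
    """
    if not seqs:
        return set()
    ref = min(seqs, key=len)
    for length in range(len(ref), 0, -1):
        hits = set()
        for start in range(len(ref) - length + 1):
            candidate = ref[start:start + length]
            if all(candidate in s for s in seqs):
                hits.add(candidate)
        if hits:
            return hits
    return set()


def occurs_with_gap(seq, left, right, min_gap, max_gap):
    """True if `left` is followed by `right` after min_gap..max_gap bases."""
    pattern = (
        re.escape(left)
        + "[ACGT]{%d,%d}" % (min_gap, max_gap)
        + re.escape(right)
    )
    return re.search(pattern, seq) is not None


# Toy sequences invented for this example; they are not promoter data.
toy_group = ["TTGATAAGCC", "ACGATAAGTT", "GGGATAAGAC"]
shared = longest_common_substrings(toy_group)

# Single-gap check on another invented sequence.
gapped = occurs_with_gap("AAGATACCCTTCCGG", "GATA", "TTCC", 1, 5)
```

A group-level single-gap search could then be approximated by requiring `occurs_with_gap` to hold for every sequence in the group and for none of the control sequences; double- and multiple-gap patterns would chain further motif and gap terms in the same way.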