Nucleic Acids Research, 2004, Vol. 32, Database issue D78-D81
© 2004 Oxford University Press
DBTSS, DataBase of Transcriptional Start Sites: progress report 2004
1 Human Genome Center, Institute of Medical Science, University of Tokyo, 4-6-1 Shirokanedai, Minato-ku, Tokyo, 108-8639, Japan, and 2 Undergraduate Program for Bioinformatics and Systems Biology, Faculty of Science, University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo, 113-0033, Japan
*To whom correspondence should be addressed. Tel: +81 3 5449 5343; Fax: +81 3 5449 5416; Email: ysuzuki{at}ims.u-tokyo.ac.jp
Received September 15, 2003; Revised and Accepted October 1, 2003
DDBJ/EMBL/GenBank accession nos BP192706–BP383670
ABSTRACT
DBTSS (http://dbtss.hgc.jp) was originally constructed from a collection of experimentally determined transcriptional start sites (TSSs) of human genes. Since its first release in 2002, it has been updated several times. First, the amount of stored data has increased significantly: e.g. the number of clones that match both the RefSeq mRNA set and the genome sequence has increased from 111 382 to 190 964, now covering 11 234 genes. Second, the positions of SNPs in dbSNP are now displayed on the upstream regions of the human genes in the database. Third, DBTSS now covers other species such as mouse and the human malaria parasite; it is intended to become a central database containing data for many more species obtained with oligo-capping and related methods. Lastly, the database now serves as a basis for comparative promoter analyses: in the current version, comparative views of potentially orthologous promoters from human and mouse are presented, together with a function for searching potential transcription-factor binding sites that are either conserved or diverged between the species.
INTRODUCTION
Knowledge of the exact transcriptional start sites (TSSs) of genes is valuable in many ways: it makes the prediction of translational start sites more accurate; it can be used for exploring sequence determinants of TSSs; and it makes the analysis of upstream regulatory regions (promoters) more precise. In principle, the position of a TSS is obtained by mapping the corresponding transcript onto the genome sequence. Nevertheless, it is widely known that many mRNA sequences stored in public databases lack information about their 5' ends because of the difficulty of obtaining full-length cDNAs. Thus, even after the completion of human genome sequencing, it is not easy to locate TSSs systematically. To overcome this problem, we have developed a method to construct full-length enriched cDNA libraries using a cap selection technique, the oligo-capping method, and have been systematically collecting full-length cDNA data with this method [(1); T.Ota et al. submitted]. Initial computational characterization of human TSSs has been carried out (2,3) and a database [DataBase of Transcriptional Start Sites (DBTSS)] containing the TSS information of 7889 human genes has been constructed (4). In this report, we summarize the updates of DBTSS since its first release, including its new role as a basis for comparative promoter analyses.
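The mapping step mentioned above can be illustrated with a minimal sketch. It assumes that each cDNA clone has already been aligned to the genome as a list of exon blocks with a strand; the `Block` type, the function names and the coordinates are hypothetical and are not taken from the DBTSS pipeline.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# Hypothetical representation of one aligned exon block:
# (start, end) in 1-based inclusive genomic coordinates.
Block = Tuple[int, int]


def tss_from_alignment(blocks: List[Block], strand: str) -> int:
    """Return the genomic position of the 5' end of an aligned transcript."""
    if not blocks:
        raise ValueError("at least one aligned block is required")
    if strand == "+":
        # On the forward strand the 5' end is the leftmost aligned base.
        return min(start for start, _ in blocks)
    if strand == "-":
        # On the reverse strand the 5' end is the rightmost aligned base.
        return max(end for _, end in blocks)
    raise ValueError("strand must be '+' or '-'")


def tally_tss(clones: Iterable[Tuple[List[Block], str]]) -> Counter:
    """Count how many clones support each candidate TSS position."""
    return Counter(tss_from_alignment(blocks, strand) for blocks, strand in clones)


# Made-up clones of a single forward-strand gene.
clones = [
    ([(1000, 1100), (5000, 5200)], "+"),
    ([(1003, 1100), (5000, 5150)], "+"),
    ([(1000, 1090)], "+"),
]
counts = tally_tss(clones)
```

Tallying the 5' ends per gene, rather than keeping a single position, leaves room for genes whose clones start at slightly different positions.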
NEW FEATURES
Compared with its initial version, the current DBTSS (version 3) has been upgraded in at least five ways. First, the number of processed one-pass human cDNA clones has increased significantly (from 217 402 to 400 225). Since one of the important findings from our TSS analysis was that the TSS position of a gene is not always fixed but rather often fluctuates by about 50 bp on average (3), the distribution of TSS positions should become clearer as the number of mapped cDNA clones increases. As before, we constructed a so-called RefFull sequence set (11 234 sequences) by extending the 5'-end sequences of RefSeq mRNA sequences (5) where necessary. In total, 6042 sequences were extended, by 71.6 bp on average. At the genomic level, the average difference between the 5'-ends of the two data sets (RefSeq and RefFull) becomes 4396 bp because of intervening introns. Thus, it is clear that our data make promoter analysis of human genes much easier. For more details of the statistics of DBTSS, see the Statistics section of the DBTSS web page.
Second, to facilitate promoter analysis of human genes, we mapped the positions of single nucleotide polymorphisms (SNPs) stored in a public database, dbSNP (5), onto the -1000:+200 region of the representative TSS of each human gene (a sample output is shown in Fig. 1). These SNPs are candidate functional regulatory SNPs (rSNPs) that affect promoter activity. We also plan to add SNP data from other sources. In DBTSS, it is also possible to list the names of genes located within a specified distance of each SNP.
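The coordinate arithmetic behind mapping SNPs onto a promoter window (1000 bp upstream to 200 bp downstream of a TSS) can be sketched as follows. This is an illustration only: the `Snp` record, the function names and the example positions are hypothetical, and the numbering convention (the TSS is +1, the base immediately upstream is -1, and there is no position 0) is an assumption rather than something stated in the text.

```python
from typing import List, NamedTuple, Tuple


class Snp(NamedTuple):
    """Hypothetical SNP record; pos is a 1-based genomic coordinate."""
    snp_id: str
    chrom: str
    pos: int


def relative_position(pos: int, tss: int, strand: str) -> int:
    """Position relative to the TSS (assumed convention: TSS = +1, no 0)."""
    # Distance measured in the direction of transcription.
    offset = (pos - tss) if strand == "+" else (tss - pos)
    return offset + 1 if offset >= 0 else offset


def snps_in_promoter(
    snps: List[Snp],
    chrom: str,
    tss: int,
    strand: str,
    upstream: int = 1000,
    downstream: int = 200,
) -> List[Tuple[str, int]]:
    """Return (snp_id, relative position) for SNPs inside the window."""
    hits = []
    for snp in snps:
        if snp.chrom != chrom:
            continue
        rel = relative_position(snp.pos, tss, strand)
        if -upstream <= rel <= downstream:
            hits.append((snp.snp_id, rel))
    return sorted(hits, key=lambda hit: hit[1])


# Made-up SNPs around a forward-strand TSS at chr1:5000.
snps = [
    Snp("snpA", "chr1", 4000),  # -1000, first base inside the window
    Snp("snpB", "chr1", 3999),  # -1001, just outside
    Snp("snpC", "chr1", 5199),  # +200, last base inside the window
    Snp("snpD", "chr2", 5000),  # different chromosome
]
promoter_snps = snps_in_promoter(snps, "chr1", 5000, "+")
```

For a gene on the reverse strand the same function applies with `strand="-"`, in which case genomic positions greater than the TSS are upstream.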
The third, and probably the most important, upgrade of DBTSS is that it now supports data from multiple species. To date, we have constructed many full-length cDNA libraries of various species at the request of many researchers. In addition, large-scale collections of cDNAs determined using a related method by Yoshihide Hayashizaki's group are publicly available (6,7). In the current version, we added the data of 2490 clones of Plasmodium falciparum, the human malaria parasite (8), and 580 209 full-length cDNA sequences of Mus musculus (7). The number of RefFull members for mouse is 6875 (for more details, see Y.Suzuki et al., submitted). We will add data for other species as soon as the necessary agreements are obtained. These include data for Caenorhabditis elegans, chimpanzee, macaque, Cyanidioschyzon merolae (a unicellular red alga), zebrafish and sorghum.
The remaining two novel features are described in the next section.
PROMOTER COMPARISON AND SEARCH OF CIS-ELEMENTS
The fourth novel feature of DBTSS (version 3) is that it provides users with comparative views of human and mouse promoters that are probably orthologous. The set of potentially orthologous genes was obtained from the LocusLink database (5) and from our own sequence comparison. As a result, the promoters of 3324 gene pairs can now be displayed. In each pair, locally similar sequence segments were detected with a local alignment program, LALIGN (9), and their correspondences are shown graphically (Fig. 2).
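To make the idea of detecting locally similar segments concrete, the sketch below implements the plain Smith-Waterman recursion with a linear gap penalty and reports only the single best local alignment. It is not LALIGN: LALIGN reports several local alignments using a space-efficient algorithm (9), whereas this version keeps the full score matrix in memory. The scoring parameters and the toy sequences are arbitrary assumptions.

```python
from typing import Tuple


def smith_waterman(
    a: str,
    b: str,
    match: int = 2,
    mismatch: int = -1,
    gap: int = -2,
) -> Tuple[int, Tuple[int, int], Tuple[int, int]]:
    """Return (score, (a_start, a_end), (b_start, b_end)) of the best local
    alignment, with 0-based half-open intervals into a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; scores never drop below zero,
    # which is what makes the alignment local.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                0,
                score[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,      # gap in b
                score[i][j - 1] + gap,      # gap in a
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    end_a, end_b = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if score[i][j] == score[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end_a), (j, end_b)


# Toy "promoters" sharing the segment GGGCGG.
human = "AAAAGGGCGGTTTT"
mouse = "CCGGGCGGCC"
best_score, human_span, mouse_span = smith_waterman(human, mouse)
```

The returned intervals can be used directly as slices, e.g. `human[human_span[0]:human_span[1]]`, which is convenient when the matched segments are to be drawn as corresponding blocks on two promoters.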
The fifth novel feature is a function for locating sequences similar to known transcription-factor binding sites stored in the TRANSFAC database (10). More specifically, we support searches based on TRANSFAC Public (for searches using TRANSFAC Professional, the commercial version, users should follow its conditions of use, which are shown on our web page). To reduce the number of potentially spurious hits, users can choose among various cut-off values and target regions/strands. It is also possible to restrict hits to regions conserved between the two species. Users can further list the genes that satisfy combinations of the above conditions: e.g. genes that harbor potential binding sites of both factors A and B in their upstream regions can be selected with arbitrary cut-off values. With this function, DBTSS can now be regarded as a platform for systematic promoter analyses.
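The combined query described above (genes with hits for both factors A and B above chosen cut-offs, optionally restricted to conserved regions) amounts to a filter followed by a set comparison. The sketch below assumes that binding-site hits have already been computed and are available as simple records; the `Hit` type, the field names, the scores and the gene names are all hypothetical and do not reflect the TRANSFAC or DBTSS data formats.

```python
from typing import Dict, Iterable, List, NamedTuple, Optional, Set, Tuple


class Hit(NamedTuple):
    """Hypothetical predicted binding site; coordinates relative to the TSS."""
    gene: str
    factor: str
    start: int
    end: int
    score: float


Regions = Dict[str, List[Tuple[int, int]]]  # gene -> conserved intervals


def genes_with_all_factors(
    hits: Iterable[Hit],
    cutoffs: Dict[str, float],
    conserved: Optional[Regions] = None,
) -> Set[str]:
    """Genes that have at least one passing hit for every factor in cutoffs."""
    found: Dict[str, Set[str]] = {}
    for hit in hits:
        # Ignore factors that were not requested and hits below the cut-off.
        if hit.factor not in cutoffs or hit.score < cutoffs[hit.factor]:
            continue
        # Optionally require the hit to lie entirely inside a conserved region.
        if conserved is not None and not any(
            start <= hit.start and hit.end <= end
            for start, end in conserved.get(hit.gene, [])
        ):
            continue
        found.setdefault(hit.gene, set()).add(hit.factor)
    required = set(cutoffs)
    return {gene for gene, factors in found.items() if factors >= required}


# Made-up hits for three genes.
hits = [
    Hit("gene1", "A", -120, -110, 0.90),
    Hit("gene1", "B", -60, -50, 0.85),
    Hit("gene2", "A", -300, -290, 0.95),
    Hit("gene3", "A", -80, -70, 0.90),
    Hit("gene3", "B", -40, -30, 0.60),
]
strict = genes_with_all_factors(hits, {"A": 0.8, "B": 0.8})
relaxed = genes_with_all_factors(hits, {"A": 0.8, "B": 0.5})
conserved_only = genes_with_all_factors(
    hits, {"A": 0.8, "B": 0.8}, conserved={"gene1": [(-130, -100)]}
)
```

Keeping a separate cut-off per factor mirrors the idea that each factor can be given its own threshold; lowering the threshold for B admits `gene3`, while the conserved-region restriction removes `gene1` because its B site lies outside the conserved interval.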
DBTSS is available at http://dbtss.hgc.jp/ and will continue to expand, incorporating our in-house data and others.
ACKNOWLEDGEMENTS
We thank T. Hasui, K. Abe, M. Morinaga, M. Ishizawa, M. Kawamura, T. Mizuno, A. Kanai and H. Hata for technical support; J. Mizushima-Sugano and E. Nakajima for helpful discussion; Y. Hayashizaki for permission to incorporate their mouse data into DBTSS; and E. Wingender and A. Kel for enabling TRANSFAC-based search. This study was supported by a Grant-in-Aid for Scientific Research on Priority Areas and by special coordination funds for promoting science and technology (SCF), both from the Ministry of Education, Culture, Sports, Science and Technology in Japan.
REFERENCES
1. Suzuki,Y. and Sugano,S. (2003) Construction of a full-length enriched and a 5'-end enriched cDNA library using the oligo-capping method. Methods Mol. Biol., 221, 73–91.
2. Suzuki,Y., Tsunoda,T., Sese,J., Taira,H., Mizushima-Sugano,J., Hata,H., Ota,T., Isogai,T., Tanaka,T., Nakamura,Y. et al. (2001) Identification and characterization of the potential promoter regions of 1031 kinds of human genes. Genome Res., 11, 677–684.
3. Suzuki,Y., Taira,H., Tsunoda,T., Mizushima-Sugano,J., Sese,J., Hata,H., Ota,T., Isogai,T., Tanaka,T., Morishita,S. et al. (2001) Diverse transcriptional initiation revealed by fine, large-scale mapping of mRNA start sites. EMBO Rep., 2, 388–393.
4. Suzuki,Y., Yamashita,R., Nakai,K. and Sugano,S. (2002) DBTSS: DataBase of human transcriptional start sites and full-length cDNAs. Nucleic Acids Res., 30, 328–331.
5. Wheeler,D.L., Church,D.M., Federhen,S., Lash,A.E., Madden,T.L., Pontius,J.U., Schuler,G.D., Schriml,L.M., Sequeira,E., Tatusova,T.A. and Wagner,L. (2003) Database resources of the National Center for Biotechnology. Nucleic Acids Res., 31, 28–33.
6. Carninci,P. and Hayashizaki,Y. (1999) High-efficiency full-length cDNA cloning. Methods Enzymol., 303, 19–44.
7. The FANTOM consortium and the RIKEN Genome Exploration Research Group Phase I & II Team (2002) Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature, 420, 563–573.
8. Watanabe,J., Sasaki,M., Suzuki,Y. and Sugano,S. (2002) Analysis of transcriptomes of human malaria parasite Plasmodium falciparum using full-length enriched library: identification of novel genes and diverse transcription start sites of messenger RNAs. Gene, 291, 105–113.
9. Huang,X.Q., Hardison,R.C. and Miller,W. (1990) A space-efficient algorithm for local similarities. Comput. Appl. Biosci., 6, 373–381.
10. Matys,V., Fricke,E., Geffers,R., Gossling,E., Haubrock,M., Hehl,R., Hornischer,K., Karas,D., Kel,A.E., Kel-Margoulis,O.V. et al. (2003) TRANSFAC: transcriptional regulation, from patterns to profiles. Nucleic Acids Res., 31, 374–378.