Nucleic Acids Research, 2000, Vol. 28, No. 1 320-322
© 2000 Oxford University Press
The Histone Database: a comprehensive WWW resource for histones and histone fold-containing proteins
Computational Biology Branch, National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Building 38A Room 8N805, Bethesda, MD 20894, USA, 1Department of Biology, Texas A&M University, College Station, TX 77843, USA and 2Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892-4470, USA
Received October 4, 1999; Accepted October 6, 1999.
ABSTRACT
The Histone Database (HDB) is an annotated and searchable collection of all full-length sequences and structures of histone and non-histone proteins containing the histone fold motif. These sequences are both eukaryotic and archaeal in origin. Several new histone fold-containing proteins have been identified, including Spt7p, and a few false positives have been removed from the earlier version of HDB. Database contents include compilations of post-translational modifications for each of the core and linker histones, as well as genomic information in the form of map loci for the human histone gene complement, with the genetic loci linked to Online Mendelian Inheritance in Man (OMIM). Conflicts between similar sequence entries from a number of source databases are also documented. Newly added to the HDB are multiple sequence alignments in which predicted functions of histone fold amino acid residues are annotated. The database is freely accessible through the WWW at http://genome.nhgri.nih.gov/histones/
INTRODUCTION
Histone proteins play a primary role in the compaction and accessibility of eukaryotic genomic DNA, and probably archaeal genomic DNA as well (1,2). Two molecules of each of the four core histones (H2A, H2B, H3 and H4) form an octamer around which ~146 bp of DNA are wrapped in repeating units called nucleosomes (3). The main chain and side chains of the basic octameric histones establish hydrogen and ionic bonds with the negatively charged phosphate backbone of DNA to effect nucleosomal packaging (4). Linker histones (H1 and H5) bind internucleosomal DNA and promote higher-order organization of chromatin (reviewed in 5), a function in which they may be aided by protruding domains of the core histones (4). Core histone sequences have been extraordinarily well conserved across evolution, indicating that there are strict structural constraints on histone function. Nevertheless, there has been enough latitude to allow the evolution of a handful of variant subclasses for each histone. Some of these variants are expressed in a tissue- or developmental stage-specific manner, indicating a specialized function.
A core histone structural motif dubbed the histone fold, consisting of three tandem α-helices connected by two short β-strand regions, is the primary site of histone-histone and histone-DNA binding (6,7). The histone fold also has been identified in a number of eukaryotic non-histone proteins, most of which are involved in functions related to DNA metabolism via protein-protein and protein-DNA interactions (8). These include several TATA-binding protein-associated factors (TAFs), which are components of the TFIID basal transcription complex (reviewed in 9). Histone fold-containing proteins have also been identified in archaea, which show many similarities to eukaryotes in their DNA replication and gene expression machinery (10).
The important roles played across the evolutionary spectrum by histones and histone-like proteins, and the proliferation of such sequences in various public databases, have led us to create and maintain a Web-based resource devoted exclusively to them. The Histone Database (HDB) represents a collection of all histone and histone fold-containing sequences available as of October 1999, with links for each to its GenBank flatfile and, where available, to its entry in a database of solved three-dimensional structures. The site also includes information on post-translational modifications extracted from database annotations for these proteins, human genomic locus information and sequence alignments in which the histone fold amino acid residues are functionally annotated according to the most recent crystallographic studies of the octamer-DNA complex (4,11), as well as being color-coded by physicochemical properties.
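To make the idea of color-coding residues by physicochemical properties concrete, the following is a minimal sketch of how alignment residues might be assigned to property classes before a color is chosen. The grouping of amino acids, the names `RESIDUE_CLASSES`, `classify` and `column_classes`, and the toy alignment are all assumptions for illustration; the classification scheme actually used by the HDB is not specified in this paper.

```python
from collections import Counter

# Assumed grouping of the 20 standard amino acids; other schemes exist
# and the HDB's own scheme is not documented here.
RESIDUE_CLASSES = {
    "acidic": "DE",
    "basic": "KRH",
    "polar": "STNQCY",
    "nonpolar": "AVLIMFWPG",
}

# Invert the table so each one-letter code maps to its class name.
CLASS_OF = {aa: name for name, letters in RESIDUE_CLASSES.items() for aa in letters}


def classify(residue: str) -> str:
    """Return the property class of one residue; gaps and unknowns are 'other'."""
    return CLASS_OF.get(residue.upper(), "other")


def column_classes(alignment, column: int) -> Counter:
    """Count the property classes seen in one column of an alignment."""
    return Counter(classify(seq[column]) for seq in alignment)


# Toy alignment rows (hypothetical, not taken from the database).
TOY_ALIGNMENT = ["KRDE-S", "KKDEAS", "RKEE-T"]

per_residue = [classify(aa) for aa in TOY_ALIGNMENT[0]]
first_column = column_classes(TOY_ALIGNMENT, 0)
```

A renderer could then map each class name to a color; a column such as the first one here, where every row falls in the same class, would appear as a uniformly colored stripe.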
DATABASE CONTENTS
The database is divided into 10 subject areas.
(i) Background and summary data, including the primary reference (this paper), the protein databases searched, and a tabulation of the number of sequences, structures and gene loci included in the database.
(ii) A search engine for the database. Selectable search parameters include protein type, sequence set, organism, definition line keyword or sequence pattern.
(iii) All of the eukaryotic histone protein sequences in the database in FASTA format.
(iv) A non-redundant set of the same sequences in FASTA format.
(v) All of the archaeal and non-histone protein sequences in the database, in FASTA format. Both the complete sequences and the histone fold regions alone are available for downloading as FASTA libraries.
(vi) Multiple protein sequence alignments of the full-length core and linker histones, and of the histone fold regions of archaeal and non-histone proteins, performed using CLUSTALW (12) and rendered in downloadable PostScript format. Histone fold residues are color-coded by physicochemical criteria (e.g. polarity, acidity or basicity), and annotated for protein-protein and protein-DNA binding functionality.
(vii) A table of histone and histone fold protein structures available in three-dimensional structure databases. For each structure accession number, links to the Protein Data Bank (PDB; http://www.rcsb.org/pdb/) and the Molecular Modelling Database (MMDB; http://www.ncbi.nlm.nih.gov/Structure/MMDB/mmdb.html) are provided, along with the protein name and source organism. We also provide a link to a molecular structure viewer (Cn3D; http://www.ncbi.nlm.nih.gov/Structure/CN3D/cn3d.html).
(viii) A summary of post-translational modifications to histone proteins, rendered as multiple alignments with the modifications color-coded by type.
(ix) A graphical view of the seven human chromosomes (I, IV, VII, XI, XVII, XXII) which contain the human histone gene complement. Color-coded histone locus markers on each chromosome are linked to a table providing the histone name, the OMIM (Online Mendelian Inheritance in Man) accession number and chromosome map location.
(x) A list of discrepancies between multiple entries of the same sequence in primary databases such as GenBank.
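The FASTA libraries in (iii) to (v) and the sequence-pattern option in (ii) lend themselves to simple scripted processing. The following is a minimal sketch, not the HDB's own implementation: it assumes FASTA text has already been downloaded, and the function names (`parse_fasta`, `non_redundant`, `search_pattern`) and the toy records are hypothetical. It also assumes that "non-redundant" means "identical sequence strings collapsed to one entry"; the paper does not state which criterion was used.

```python
import re
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (definition line, sequence)


def parse_fasta(text: str) -> Iterator[Record]:
    """Yield (definition line, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def non_redundant(records: Iterable[Record]) -> List[Record]:
    """Keep the first record for each distinct sequence (assumed criterion)."""
    seen, kept = set(), []
    for header, seq in records:
        if seq not in seen:
            seen.add(seq)
            kept.append((header, seq))
    return kept


def search_pattern(records: Iterable[Record], pattern: str) -> List[str]:
    """Return definition lines of records whose sequence matches a regex."""
    regex = re.compile(pattern)
    return [header for header, seq in records if regex.search(seq)]


# Toy records with arbitrary sequences (not real HDB entries).
EXAMPLE = """>toy1 hypothetical protein A
MKVLAAGRKS
TPEELKKRAG
>toy2 hypothetical protein A (duplicate entry)
MKVLAAGRKSTPEELKKRAG
>toy3 hypothetical protein B
MSDNQWERTY
"""

records = list(parse_fasta(EXAMPLE))
unique = non_redundant(records)
hits = search_pattern(unique, r"RK.T")
```

Here `records` holds three entries, `unique` drops the duplicate of the first sequence, and `hits` lists the definition lines whose sequence contains the pattern. A real non-redundant set might instead use a sequence-identity threshold, which this sketch does not attempt.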
DETECTION OF NOVEL HISTONE FOLD PROTEINS AND IMPLICATIONS FOR CHROMATIN ORGANIZATION
Using more modern profile searching methods, we have extended and revised the collection of histone fold-containing sequences. The use of the powerful profile search method PSI-BLAST (13) (http://www.ncbi.nlm.nih.gov/BLAST) vastly improves the ability to detect subtle members of the histone fold superfamily. Using the minimal histone fold represented by the histones from archaea as seeds for PSI-BLAST searches (inclusion threshold e-value 0.01; searches run to convergence), profiles were prepared for the detection of new histone fold proteins. These profiles were then used to search individual complete genomes of yeast and Caenorhabditis elegans. The newly found members were used in subsequent searches to detect homologs in other eukaryotes. Further, a hidden Markov model was made from the alignment of the originally identified histone fold domains using the HMMER2 package (14) (http://hmmer.wustl.edu/) and was used to search the protein sequences from different eukaryotes. This confirmed most of the findings obtained from the PSI-BLAST analysis. As a result, several previously unrecognized histone fold proteins were identified (Fig. 1). These proteins include the Spt7p and YPL011c proteins from Saccharomyces cerevisiae, Bip2 (POZ domain protein Bric-a-Brac binding protein 2) and Prodos from Drosophila melanogaster and several related proteins from Arabidopsis thaliana and other animals (Fig. 1). One notable feature was the co-occurrence of the histone fold with other modules in some of these polypeptides (Fig. 2). This modular organization had previously been observed only in Son of Sevenless (8), macroH2A (15) and a C. elegans gene product predicted to be a chromosomal protein (16).
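As a rough illustration of the two-pronged search strategy described above, the sketch below assembles command lines for an iterative profile search and for an HMM build-and-search step. It is not the authors' procedure: it uses the option names of the current NCBI BLAST+ `psiblast` program and the HMMER 3 programs `hmmbuild` and `hmmsearch`, whereas the original work used the 1999-era PSI-BLAST and HMMER2. All file and database names are hypothetical. The commands are only constructed as strings, not executed.

```python
import shlex
from typing import List


def psiblast_command(seed_fasta: str, database: str, pssm_out: str,
                     inclusion_evalue: float = 0.01,
                     num_iterations: int = 0) -> List[str]:
    """Build a BLAST+ psiblast call seeded with one query sequence.

    num_iterations=0 is documented by BLAST+ as "run until convergence",
    matching the text; 0.01 is the inclusion threshold quoted in the text.
    """
    return [
        "psiblast",
        "-query", seed_fasta,
        "-db", database,
        "-inclusion_ethresh", str(inclusion_evalue),
        "-num_iterations", str(num_iterations),
        "-out_pssm", pssm_out,  # saved profile, reusable in later searches
    ]


def hmmer_commands(alignment: str, hmm_out: str, target_fasta: str) -> List[List[str]]:
    """Build HMMER 3 calls: make a profile HMM, then search a proteome."""
    build = ["hmmbuild", hmm_out, alignment]
    search = ["hmmsearch", hmm_out, target_fasta]
    return [build, search]


# Hypothetical input and output names, for illustration only.
commands = [psiblast_command("archaeal_histone_seed.fasta",
                             "yeast_proteome",
                             "histone_fold.pssm")]
commands += hmmer_commands("histone_fold.sto",
                           "histone_fold.hmm",
                           "eukaryote_proteome.fasta")

command_lines = [shlex.join(cmd) for cmd in commands]
for line in command_lines:
    print(line)
```

Running the two approaches independently and comparing their hit lists mirrors the cross-validation described in the text, where the HMM search confirmed most of the PSI-BLAST findings.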
Analysis of these additional modules provides further evidence for a potential role for the histone fold proteins in the organization of chromatin structure. As shown in Figure 2, the histone fold is combined with a variety of other protein domain modules [e.g. PHD fingers (17), AT hooks (16) involved in DNA binding, the bromodomain that binds acetyl-lysine-containing peptides (18,19) and the POZ domains that mediate homophilic interactions in transcription factors and chromosomal proteins (20)]. Functional clues for such multidomain histone fold proteins can be derived from Spt7p in S. cerevisiae (Figs 1 and 2), which is part of a multiprotein chromatin remodelling complex named SAGA (21). The SAGA complex possesses Gcn5p-dependent histone acetylase activity and contains another histone fold protein, Spt3p (21). Mutational analysis has shown that the SPT7 gene is central to the integrity and function of this complex (22). The multiple domains suggest that Spt7p acts as an adaptor, forming a nucleosome-like structure (probably in collaboration with the Spt3p family of proteins) using its histone fold while binding to acetylated peptides through the bromodomain. This nucleosome-like structure could provide the basis for the association of the SAGA complex with chromatin. Similar alternative nucleosome-like structures could be formed by other multidomain proteins such as CCA3, BIP2 and C11G6.1 from C. elegans, which could additionally bind DNA with their alternative DNA binding domains or could organize protein complexes with the POZ and ankyrin repeats (Fig. 2). Thus, the identification of novel histone fold proteins provides a tool to further investigate the role of alternative nucleosome-like structures in the assembly of chromosomal protein complexes.
DATABASE AVAILABILITY
The HDB is available on the WWW at http://genome.nhgri.nih.gov/histones/. Studies utilizing this database should cite this paper as the primary reference.
FOOTNOTES
* To whom correspondence should be addressed. Tel: +1 301 435 5981; Fax: +1 301 480 9241; Email: landsman@ncbi.nlm.nih.gov
REFERENCES
1 Kornberg,R.D. and Thomas,J.O. (1974) Science, 184, 865–868.
2 Pereira,S.L., Grayling,R.A., Lurz,R. and Reeve,J.N. (1997) Proc. Natl Acad. Sci. USA, 94, 12633–12637.
3 Thomas,J.O. and Kornberg,R.D. (1975) Proc. Natl Acad. Sci. USA, 72, 2626–2630.
4 Luger,K., Mader,A.W., Richmond,R.K., Sargent,D.F. and Richmond,T.J. (1997) Nature, 389, 251–260.
5 Ramakrishnan,V. (1997) Crit. Rev. Eukaryot. Gene Exp., 7, 215–230.
6 Arents,G., Burlingame,R.W., Wang,B.C., Love,W.E. and Moudrianakis,E.N. (1991) Proc. Natl Acad. Sci. USA, 88, 10148–10152.
7 Arents,G. and Moudrianakis,E.N. (1995) Proc. Natl Acad. Sci. USA, 92, 11170–11174.
8 Baxevanis,A.D., Arents,G., Moudrianakis,E.N. and Landsman,D. (1995) Nucleic Acids Res., 23, 2685–2691.
9 Burley,S.K. and Roeder,R.G. (1996) Annu. Rev. Biochem., 65, 769–799.
10 Makarova,K.S., Aravind,L., Galperin,M.Y., Grishin,N.V., Tatusov,R.L., Wolf,Y.I. and Koonin,E.V. (1999) Genome Res., 9, 608–628.
11 Luger,K. and Richmond,T.J. (1998) Curr. Opin. Struct. Biol., 8, 33–40.
12 Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673–4680.
13 Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.
14 Eddy,S.R. (1998) Bioinformatics, 14, 755–763.
15 Pehrson,J.R. and Fried,V.A. (1992) Science, 257, 1398–1400.
16 Aravind,L. and Landsman,D. (1998) Nucleic Acids Res., 26, 4413–4421.
17 Aasland,R., Gibson,T.J. and Stewart,A.F. (1995) Trends Biochem. Sci., 20, 56–59.
18 Haynes,S.R., Dollard,C., Winston,F., Beck,S., Trowsdale,J. and Dawid,I.B. (1992) Nucleic Acids Res., 20, 2603.
19 Dhalluin,C., Carlson,J.E., Zeng,L., He,C., Aggarwal,A.K. and Zhou,M.M. (1999) Nature, 399, 491–496.
20 Aravind,L. and Koonin,E.V. (1999) J. Mol. Biol., 285, 1353–1361.
21 Grant,P.A., Duggan,L., Cote,J., Roberts,S.M., Brownell,J.E., Candau,R., Ohba,R., Owen-Hughes,T., Allis,C.D., Winston,F., Berger,S.L. and Workman,J.L. (1997) Genes Dev., 11, 1640–1650.
22 Sterner,D.E., Grant,P.A., Roberts,S.M., Duggan,L.J., Belotserkovskaya,R., Pacella,L.A., Winston,F., Workman,J.L. and Berger,S.L. (1999) Mol. Cell. Biol., 19, 86–98.
23 Ponting,C.P., Schultz,J., Milpetz,F. and Bork,P. (1999) Nucleic Acids Res., 27, 229–232. Updated article in this issue: Nucleic Acids Res. (2000) 28, 231–234.