Nucleic Acids Research Pages 372-375  


Histone Sequence Database: new histone fold family members

Andreas D. Baxevanis, David Landsman1,*

Genome Technology Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA and 1National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA

Received October 1, 1997; Accepted October 3, 1997

ABSTRACT

Searches of the major public protein databases with core and linker chicken and human histone sequences have resulted in the compilation of an annotated set of histone protein sequences. In addition, new database searches with two distinct motif search algorithms have identified several members of the histone fold family, including human DRAP1 and yeast CSE4. Database resources include information on conflicts between similar sequence entries in different source databases, multiple sequence alignments, links to the Entrez integrated information retrieval system, structures for histone and histone fold proteins, and the ability to visualize structural data through Cn3D. The database currently contains >1000 protein sequences, which are searchable by protein type, accession number, organism name, or any other free text appearing in the definition line of the entry. All sequences and alignments in this database are available through the World Wide Web at http://www.nhgri.nih.gov/DIR/GTB/HISTONES or http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES

INTRODUCTION

The histone proteins play a critical role in the compaction of DNA into nucleosomes as well as in the overall organization of eukaryotic chromosomes (1,2). The four core histones (H2A, H2B, H3 and H4) form a tripartite, octameric assembly (3). The basic nature of these proteins, which have a high proportion of lysine and arginine residues, facilitates the wrapping of 146 bp of DNA around the octamer to form the elementary unit of compaction in eukaryotic DNA, the nucleosomal core particle (4). In turn, the linker histones (H1 and H5) can bind to the DNA between nucleosomes (5), thereby stabilizing the nucleosomes and promoting the formation of higher-order chromatin structures. Because of their central role in the cell, the histones have been very highly conserved throughout evolution. This conservation is best illustrated by the >95% identity across all known H4 sequences. The core histones are conserved along their entire length, whereas the linker histones are conserved to this extent only within their central, globular domain.
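Identity figures such as the >95% quoted for H4 are usually computed pairwise from an alignment. As a minimal sketch, assuming two sequences already aligned to equal length with '-' marking gaps, and assuming identity is counted only over columns where neither sequence has a gap (one of several conventions; the text does not say which was used), percent identity can be computed as follows. The function name and toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percent identity between two pre-aligned sequences.

    Only columns where neither sequence has a gap are compared; this is an
    assumed convention (others divide by the full alignment length).
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not compared:
        return 0.0
    identical = sum(1 for x, y in compared if x == y)
    return 100.0 * identical / len(compared)


# Toy aligned fragments, not real histone sequences.
print(percent_identity("ACDEFGHIKL", "ACDEFGHIKM"))  # 90.0
```

Excluding gapped columns keeps short indels from lowering the score; dividing by the full alignment length instead would give a more conservative figure.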

The folding and association of each of the core histones within the octamer complex is driven by a common structural motif called the histone fold (6). This motif, which has been defined as an extended helix-loop-helix domain, is also found in a number of non-histone proteins that, like the histones, are involved in protein-protein and protein-DNA interactions (7). The existence and conservation of an extensive protein domain such as the histone fold across a wide span of taxonomic groups argue for an evolutionary relationship between the histone fold proteins, as well as for a critical role of this domain in cellular metabolism.

In this paper, we describe the Histone Sequence Database, a compilation of all of the histone and histone fold protein sequences and structures available as of October 1997. This database is intended to be a source of sequence information for these chromosomal proteins, with particular attention to conflicts between similar sequence entries in different source databases. The database, which will be updated as new histone sequence entries are processed, represents the most comprehensive collection and annotation of histone primary sequences and histone fold-containing sequences assembled to date.

DATABASE CONTENT

The histone protein sequences were compiled by searching the non-redundant (nr) protein sequence database at NCBI using both the BLASTP (8) and PSI-BLAST (9) algorithms. The nr database is a compilation of entries from SWISS-PROT (10), the Protein Identification Resource (PIR) (11), the Protein Data Bank (PDB) and CDS translations from GenBank (12). In each case, the chicken and human sequences were used as queries. Because H5 has no human counterpart, only the chicken H5 sequence was used. The sequence of Hho1p, found by TBLASTN searches against the complete yeast genome sequence in the Saccharomyces Genome Database, was added manually to the histone H1 sequence set. This sequence resides in an open reading frame on yeast chromosome XVI (13), and its ability to adopt the H1 structure has been confirmed by homology model building (unpublished results).

For each of the five histone classes, there are two protein sequence files (see Table 1 for statistics). The first file type contains all of the sequences found for that histone and is, therefore, redundant. The sequences are in FASTA format; the definition line of each entry contains, in order, the name of the source database, the accession number, the locus name or SWISS-PROT ID (as appropriate to the source database), a text description and a histone code, with the elements separated by vertical bars. The second file type contains only one entry for each unique sequence from a particular organism or variant thereof, making these files non-redundant. These sequences are also in FASTA format, but only the histone code appears on the definition line. The histone codes can be used to cross-reference entries in the complete, redundant set of protein sequences.
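The definition-line layout described above can be parsed with a split on the vertical bar. The sketch below assumes exactly the five fields listed, in that order, with any stray '|' characters belonging to the free-text description; it then groups redundant entries by histone code, which is one way to follow a non-redundant entry back to its source entries. The class, field names, helper functions and example definition lines are hypothetical, not real database records.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DefLine:
    source_db: str
    accession: str
    locus_or_id: str
    description: str
    histone_code: str


def parse_redundant_defline(line: str) -> DefLine:
    """Split a redundant-set definition line into its five '|'-separated fields."""
    if not line.startswith(">"):
        raise ValueError("FASTA definition lines start with '>'")
    fields = line[1:].rstrip("\n").split("|")
    if len(fields) < 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    # Any extra '|' characters are assumed to belong to the description.
    description = "|".join(fields[3:-1])
    return DefLine(fields[0].strip(), fields[1].strip(), fields[2].strip(),
                   description.strip(), fields[-1].strip())


def index_by_code(deflines: List[str]) -> Dict[str, List[DefLine]]:
    """Group redundant entries by histone code."""
    index: Dict[str, List[DefLine]] = defaultdict(list)
    for line in deflines:
        entry = parse_redundant_defline(line)
        index[entry.histone_code].append(entry)
    return dict(index)


# Hypothetical definition lines, not real database entries.
raw = [
    ">SRC1|ACC0001|LOCUS_A|histone H4|CODE1",
    ">SRC2|ACC0002|LOCUS_B|histone H4|CODE1",
]
nr_header = ">CODE1"  # a non-redundant entry carries only the code
matches = index_by_code(raw)[nr_header[1:]]
print([m.accession for m in matches])  # ['ACC0001', 'ACC0002']
```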

In the course of the database searches, conflicts were noted between individual sequence entries for a given histone. These sequence conflicts were identified using a majority rule; where there was no clear majority among the sequences, differences are reported relative to the SWISS-PROT entry, where available. For every entry with a discrepancy, a pairwise sequence alignment generated with CLUSTAL W (14) is presented together with the conflict information. Cases in which a protein sequence is in fact correct but has been incorrectly identified are also noted.
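The majority rule for flagging conflicts can be sketched column by column over an alignment of the entries for one histone. This minimal version assumes the entries are already aligned to equal length, treats a "clear majority" as more than half of the entries (the text does not define the threshold), falls back to the SWISS-PROT entry when there is no majority, and leaves the column unreported if no SWISS-PROT entry is present. It is an illustration of the rule, not the authors' code; names and data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (1-based alignment column, entry id, entry residue, reference residue)
Conflict = Tuple[int, str, str, str]


def find_conflicts(aligned: Dict[str, str],
                   swissprot_id: Optional[str] = None) -> List[Conflict]:
    """Flag residues that disagree with the per-column reference residue."""
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one entry, all aligned to the same length")
    n_entries = len(aligned)
    conflicts: List[Conflict] = []
    for col in range(lengths.pop()):
        column = {entry: seq[col] for entry, seq in aligned.items()}
        residue, count = Counter(column.values()).most_common(1)[0]
        if 2 * count > n_entries:
            reference = residue  # clear majority (assumed: more than half)
        elif swissprot_id is not None and swissprot_id in column:
            reference = column[swissprot_id]  # no majority: use SWISS-PROT entry
        else:
            continue  # no majority and no SWISS-PROT entry: not reported here
        for entry, res in column.items():
            if res != reference:
                conflicts.append((col + 1, entry, res, reference))
    return conflicts


# Hypothetical aligned entries for one histone from one organism.
aligned = {"entryA": "MKT", "entryB": "MKT", "entryC": "MRT"}
print(find_conflicts(aligned))  # [(2, 'entryC', 'R', 'K')]
```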

Multiple sequence alignments for each histone protein are available in PostScript format for downloading or viewing with an appropriate PostScript viewer. In each alignment, the major human histone sequence (chicken in the case of H5) is shown at the top of the alignment. The region of the histone fold motif, as described above, is boxed; alignment within the histone fold region was done manually. Regions outside the histone fold motif were aligned using CLUSTAL W (14). In the case of H1 and H5, the central, globular domain is boxed instead.

A search engine has been added with the current release of the database. It allows users to retrieve entries from different files by histone type, organism or data set. A free-text search is also available for the complete (redundant) data set, allowing users to search for any text present on the definition lines of individual entries. Search results are returned in FASTA format.
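The free-text search over definition lines can be approximated in a few lines. This sketch assumes case-insensitive substring matching and 60-residue output lines; the actual engine's matching rules and output formatting are not described in the text. Function names and the example records are hypothetical.

```python
from typing import List, Optional, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (definition line without '>', sequence) pairs."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def search_deflines(records: List[Tuple[str, str]], query: str,
                    width: int = 60) -> str:
    """Return matching records as FASTA (case-insensitive substring match)."""
    needle = query.lower()
    lines: List[str] = []
    for header, seq in records:
        if needle in header.lower():
            lines.append(">" + header)
            # Wrap the sequence into fixed-width lines.
            lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical records from the redundant set.
fasta = """>SRC1|ACC0001|LOCUS_A|histone H4, Homo sapiens|CODE1
ACDEFGHIKL
>SRC2|ACC0002|LOCUS_B|histone H4, Gallus gallus|CODE2
MNPQRSTVWY
"""
print(search_deflines(read_fasta(fasta), "sapiens"), end="")
```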

Table 1. Histone Sequence Database statistics

                             Total           Non-redundant    Structure
                             sequence set    sequence set
Histone H1                   208             77               1
Histone H2A                  224             78               0
Histone H2B                  207             72               0
Histone H3                   226             82               0
Histone H4                   179             59               0
Histone H5                   13              5                1
Histone Octamer              -               -                1
Nucleosomal Core Particle    -               -                1
Total Histone Entries        1057            373              4
Histone Fold Proteins        47              -                2

THE HISTONE FOLD MOTIF

The original sequence-based identification of non-histone proteins containing the histone fold motif relied on the Motif Search Tool (MoST) (15). Recently, a new method called PROBE (16) has been developed which, using a different algorithm, also detects subtly conserved sequence patterns. Both methods were used to re-examine the protein databases for new members of the histone fold family. As a result of these new searches, several new and biologically interesting proteins have been added to the histone fold family (Fig. 1).


Figure 1. Multiple sequence alignment of proteins containing the histone fold motif. Sequences were detected using the MoST (15) and PROBE (16) motif search algorithms. The abbreviation for each protein corresponds to those used in the Histone Sequence Database under the Histone Fold link. The major sequences of all four human core histones are shown at the top of the alignment. At each position, residues that agree with any of the four core histones are shown in inverse type. ALSCRIPT (26) was used to format the final alignment.
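The shading rule in the caption is a per-column comparison. As a minimal sketch, assuming every row comes from the same gapped alignment (gaps written as '-') and that gap characters are never shaded, the positions that would appear in inverse type can be computed as below; '#' marks a shaded residue and '.' an unshaded one, an arbitrary text stand-in for the ALSCRIPT rendering. The function name and rows are hypothetical.

```python
from typing import List


def agreement_mask(row: str, core_rows: List[str], gap: str = "-") -> str:
    """Mark positions where `row` matches any core histone row in the same column."""
    if any(len(core) != len(row) for core in core_rows):
        raise ValueError("all rows must come from the same alignment")
    marks = []
    for i, res in enumerate(row):
        agrees = res != gap and any(core[i] == res for core in core_rows)
        marks.append("#" if agrees else ".")
    return "".join(marks)


# Hypothetical rows: one query row and four core-histone rows.
print(agreement_mask("AC-F", ["AXXX", "XCXX", "XX-X", "XXXG"]))  # ##..
```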

Among the new family members is DRAP1, which associates with the TATA-binding protein-associated phosphoprotein DR1, itself a member of the histone fold family. DRAP1 and DR1 are capable of forming heterodimers, potentially through their histone fold motifs, and the association of DRAP1 with DR1 stabilizes the entire DRAP1-DR1-TATA complex, blocking the entry of TFIIA and/or TFIIB into preinitiation complexes (17). The chromatin-associated protein CSE4 from Saccharomyces cerevisiae has also been added to the alignment. CSE4 is essential for cell division, and mutations in CSE4 have been shown to increase non-disjunction of chromosomes bearing mutant centromeric DNA sequences (18). High-copy CSE4 has also been found to suppress the temperature sensitivity of lethal H4 mutants defective in mitotic chromosome transmission (19).

The sequence of transcription factor IIB (TFIIB) was manually removed from the sequence set returned by the motif search methods. The crystal structure of a preinitiation complex from the archaeon Pyrococcus woesei, which includes the core domain of the archaeal TFIIB homologue, was recently determined at a resolution of 2.1 Å (PDB:1AIS) (20). Comparison with previously solved structures of other histone fold proteins (21-23) showed that, despite the sequence-based evidence suggesting otherwise, TFIIB does not form the histone fold structure.

A pair of Web pages containing sequence data from non-histone proteins identified as containing the histone fold motif can be found within the Histone Fold section of the Database. One page contains the complete sequence of each protein, while the second contains only the histone fold motif portion of the sequence. Both files are in FASTA format. In addition, multiple sequence alignments of these histone fold motifs are available in PostScript format. With respect to structures, information is provided for all histone and histone fold proteins for which three-dimensional coordinates have been deposited in and are available through the PDB. If the coordinate data have been released, users can link to both MMDB and PDB to retrieve the files, as described below.

DATABASE AVAILABILITY

The Histone Sequence Database is available through the World Wide Web at either:
http://www.nhgri.nih.gov/DIR/GTB/HISTONES or
http://www.ncbi.nlm.nih.gov/Baxevani/HISTONES

A menu bar appears to the left of each page, allowing users to navigate the Web site easily without having to return to the home page to examine different parts of the site. To increase the utility of the database, hyperlinks have been integrated into all of the FASTA-formatted sequence files comprising the complete (redundant) data set. Clicking on the accession number for a particular entry allows the user to view the NCBI Entrez document report for that entry. In most cases, these document reports include links to other relevant data, such as literature citations in MEDLINE (PubMed), related sequence entries in GenBank and the Molecular Modeling Database (MMDB). Hyperlinks from the table of structures connect to the MMDB and PDB structure entries for that protein. From the MMDB entry, users can view the structure itself using Cn3D (24), a molecular viewing application that is bundled with Network Entrez and can be downloaded by following hyperlinks on any structure entry page. In this fashion, users can take advantage of the integrated nature of the Entrez retrieval system to gather large amounts of information on a particular sequence or set of sequences (25).

Database flatfiles can also be downloaded directly from the public FTP site at NCBI (ncbi.nlm.nih.gov, directory /pub/baxevanis/histones). There are two FASTA-format protein files for each major histone type, corresponding to the complete or redundant sequence set (*.raw) and to the non-redundant sequence set (*.nr). The definition lines of each entry follow the format described above for the Web site. The histone codes used in the definition lines of these entries are listed in a text file (codes.txt) in the same directory. Two FASTA-format files of the histone fold protein sequences are also in this directory: hf_seqs.txt contains the complete sequence of each protein, while hf_motif.txt contains only the histone fold motif portion of each sequence.
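Because hf_motif.txt holds the histone fold portion of each sequence in hf_seqs.txt, the motif boundaries within each full-length protein can be recovered by substring search. This is a minimal sketch assuming that both files use identical definition lines for the same protein and that each motif is an exact, contiguous substring of its parent sequence; neither assumption is stated in the text. Function names and example contents are hypothetical.

```python
from typing import Dict, Optional, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Map each definition line (without '>') to its concatenated sequence.

    A repeated definition line overwrites the earlier record.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is not None:
            records[header] += line
    return records


def motif_spans(full_text: str, motif_text: str) -> Dict[str, Tuple[int, int]]:
    """Return 1-based inclusive (start, end) of each motif within its full-length sequence."""
    full = read_fasta(full_text)
    spans: Dict[str, Tuple[int, int]] = {}
    for header, motif in read_fasta(motif_text).items():
        sequence = full.get(header)
        if sequence is None or not motif:
            continue  # no matching full-length entry, or empty motif
        start = sequence.find(motif)
        if start >= 0:  # motifs that are not exact substrings are skipped
            spans[header] = (start + 1, start + len(motif))
    return spans


# Hypothetical contents of the two files.
hf_seqs = ">protX\nMAAAKKK\nGGG\n"
hf_motif = ">protX\nKKKG\n"
print(motif_spans(hf_seqs, hf_motif))  # {'protX': (5, 8)}
```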

Studies utilizing the data within this database, obtained either through the World Wide Web site or the anonymous FTP site, should cite this paper as the primary reference.

ACKNOWLEDGEMENTS

We would like to thank Erik Ferlanti for his assistance in designing the new graphical front-end for the Database and developing the newly added sequence search engine.

REFERENCES

1. van Holde,K.E. (1989) Chromatin, Springer-Verlag, New York.

2. Wolffe,A. (1992) Chromatin: Structure and Function, Academic Press, San Diego.

3. Eickbush,T.H. and Moudrianakis,E.N. (1978) Biochemistry, 17, 4955-4964.

4. Kornberg,R. and Thomas,J.O. (1974) Science, 184, 865-868.

5. Noll,M. and Kornberg,R.D. (1977) J. Mol. Biol., 109, 393-404.

6. Arents,G. and Moudrianakis,E.N. (1993) Proc. Natl. Acad. Sci. USA, 90, 10489-10493.

7. Baxevanis,A.D., Arents,G., Moudrianakis,E.N. and Landsman,D. (1995) Nucleic Acids Res., 23, 2685-2691.

8. Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) J. Mol. Biol., 215, 403-410.

9. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389-3402.

10. Bairoch,A. and Apweiler,R. (1997) Nucleic Acids Res., 25, 31-36. [See also this issue Nucleic Acids Res. (1998) 26, 38-42.]

11. George,D.G., Dodson,R.J., Garavelli,J.S., Haft,D.H., Hunt,L.T., Marzec,C.R., Orcutt,B.C., Sidman,K.E., Srinivasarao,G.Y., Yeh,L.-S.L., Arminski,L.M., Ledley,R.S., Tsugita,A. and Barker,W.C. (1997) Nucleic Acids Res., 25, 24-27. [See also this issue Nucleic Acids Res. (1998) 26, 27-32.]

12. Benson,D., Boguski,M., Lipman,D.J. and Ostell,J. (1997) Nucleic Acids Res., 25, 1-6. [See also this issue Nucleic Acids Res. (1998) 26, 1-7.]

13. Landsman,D. (1996) Trends Biochem. Sci., 21, 287-288.

14. Higgins,D.G., Bleasby,A.J. and Fuchs,R. (1992) Comput. Appl. Biosci., 8, 189-191.

15. Tatusov,R.L., Altschul,S.F. and Koonin,E.V. (1994) Proc. Natl. Acad. Sci. USA, 91, 12091-12095.

16. Neuwald,A.F., Liu,J., Lipman,D. and Lawrence,C. (1997) Nucleic Acids Res., 25, 1665-1677.

17. Mermelstein,F., Yeung,K., Cao,J., Inostroza,J., Erdjument-Bromage,H., Landsman,D., Levitt,P., Tempst,P. and Reinberg,D. (1996) Genes Dev., 10, 1033-1048.

18. Stoler,S., Keith,K., Curnick,K. and Fitzgerald-Hayes,M. (1995) Genes Dev., 9, 573-586.

19. Smith,M.M., Yang,P., Santisteban,M.S., Boone,P.W., Goldstein,A.T. and Megee,P.C. (1996) Mol. Cell. Biol., 16, 1017-1026.

20. Kosa,P., Ghosh,G., DeDecker,B. and Sigler,P. (1997) Proc. Natl. Acad. Sci. USA, 94, 6042-6047.

21. Arents,G. and Moudrianakis,E. (1995) Proc. Natl. Acad. Sci. USA, 92, 11170-11174.

22. Starich,M.R., Sandman,K., Reeve,J.N. and Summers,M.F. (1996) J. Mol. Biol., 255, 187-203.

23. Xie,X., Kokubo,T., Cohen,S.L., Mirza,U.A., Hoffman,A., Chait,B.T., Roeder,R.G., Nakatani,Y. and Burley,S.K. (1996) Nature, 380, 316-322.

24. Hogue,C.W.V., Ohkawa,H. and Bryant,S.H. (1996) Trends Biochem. Sci., 21, 226-229.

25. Baxevanis,A. (1998) In Baxevanis,A., and Ouellette,B. (eds), Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. John Wiley & Sons, New York.

26. Barton,G.J. (1993) Protein Engng, 6, 37-40.


*To whom correspondence should be addressed. Tel: +1 301 435 5981; Fax: +1 301 480 9241; Email: landsman@ncbi.nlm.nih.gov


Last modification: 17 Dec 1997
Copyright© Oxford University Press, 1998.
