The complete nucleotide sequence of bacteriophage HP1 DNA
Dominic Esposito, Wayne P. Fitzmaurice+, Robert C. Benjamin§, Steven D. Goodman¶, Alan S. Waldman‡ and John J. Scocca*
Department of Biochemistry, The Johns Hopkins University School of Hygiene and Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA
Received February 20, 1996; Revised and Accepted April 22, 1996
GenBank accession no. U24159
ABSTRACT
The complete nucleotide sequence of the temperate phage HP1 of Haemophilus influenzae was determined. The phage contains a linear, double-stranded genome of 32 355 nt with cohesive termini. Statistical methods
were used to identify 41 probable protein coding segments organized into five
plausible transcriptional units. Regions encoding proteins involved in
recombination, replication, transcriptional control, host cell lysis and phage
production were identified. The sizes of proteins in the mature HP1 particle
were determined to assist in identifying genes for structural proteins.
Similarities between HP1 coding sequences and those in databases, as well as
similar gene organizations and control mechanisms, suggest that HP1 is a member
of the P2-like phage family, with strong similarities to coliphages P2 and 186 and
some similarity to the retronphage Ec67.
INTRODUCTION
HP1 is a temperate bacteriophage which infects and lysogenizes Haemophilus influenzae Rd. It was the first phage identified for this host (1), and HP1 and its temperature-sensitive mutants have been used to study repair and recombination in H.influenzae (2). The HP1 particle consists of a small icosahedral head and a relatively large and complex contractile tail (3). The genome is a single unique duplex DNA molecule with cohesive ends (4). In the lysogenic state, the circularized HP1 genome is inserted into the host chromosome at a single site (5), conforming to the Campbell model for maintenance of the prophage (6). Earlier we constructed a physical map of HP1 DNA and located several mutations on this map (7). The nucleotide sequences of two genomic segments were determined; together, these two blocks constitute ~50% of the HP1 genome. The first sequence was used to locate the targets for specific DNA-mediated transformation and their distribution in relation to coding segments (8). The sequence of the second segment was determined to support studies on the site-specific recombination reactions involved in formation and induction of HP1 lysogens (9-11). Three HP1 elements have been identified in this region: the attP site and the genes encoding HP1 integrase and the regulator Cox. The sequences of these genes and their arrangement clearly suggested that HP1 was related to the P2-186 group of temperate coliphages. HP1 integrase is remarkably similar in sequence to the integrase of 186, especially in the C-terminal region (9). The accessory protein needed to activate excisive recombination, Cox, appears to serve the additional function of repressing the production of the lysogenic repressor; this dual function is a characteristic of the P2-186 group. Furthermore, the presumptive early control elements in HP1 appeared to be organized like those of P2 and 186 (12,13).
To further our understanding of the biology of HP1, we determined the sequence
of the entire 32 kb genome. Sequence comparisons and initial experiments were
used to make plausible identifications of many of the open reading frames
(ORFs) encoded by HP1 DNA. The deduced sequences of these probable gene
products are quite similar to those encoded by P2 and 186 and the retronphage
Ec67. These results strongly support the assignment of HP1 to the P2 family of
temperate bacteriophage.
MATERIALS AND METHODS
Bacteria
Escherichia coli DH5α was used to propagate plasmids; strains were grown in solid or liquid LB medium supplemented with antibiotics as needed (14). Haemophilus influenzae L-10 (lysogenic for HP1) was grown as described (7).
Plasmids
Plasmids derived from pBR322 containing HaeIII fragments of HP1 DNA have been described (8). Derivatives of pUC19 containing the HP1 PstI fragments were described earlier (11).
To obtain HP1 DNA inserts with internal PstI sites, purified phage DNA was joined at the cohesive ends by ligation and then digested with BamHI and BglII; the resulting fragments were inserted into the BamHI site of pUC19 and transformant colonies containing the desired plasmids were isolated. The locations of the BglI, BglII, DraI, SspI, HaeIII, KpnI, EcoRI and HindIII sites were determined to permit identification of inserts.
Subclones for sequence determination
To produce fragments suitable for direct sequencing, plasmids containing the PstI fragments were digested with both DraI and SspI. Fragments containing HP1 DNA, ranging from 150 to 800 bp, were ligated into pUC19 which had been linearized with SmaI. Clones were selected based on their sizes and restriction maps; both orientations of a given insert were used. When inserts were too large for convenient sequencing, the fragment was subcloned further by digestion with additional restriction enzymes. Subclones were also created using single digests with DraI or SspI to ensure that all restriction site boundaries could be sequenced. Similar subclone panels were prepared from plasmids containing the BamHI-BglII or HaeIII fragments as needed.
Plasmid DNA was purified with QiaPrep kits (Qiagen Inc.) and Wizard Clean-up columns (Promega). Phage DNA was purified as described (7).
Sequence determinations
Parts of the sequence were determined manually using dideoxy termination (15) and Sequenase enzyme (United States Biochemical), but the majority of the new sequence reported here was determined using an Applied Biosystems Model 373A automated sequencer operated by the Johns Hopkins Medical Institutions CORE DNA Sequencing Facility. In most cases the M13 universal forward (5'-CCCAGTCACGACGTTGTAAAACG) and reverse (5'-AGCGGATAACAATTTCACACAGG) primers were used and in many cases these were sufficient to determine the sequences of both strands of a given insert. When necessary, sequencing reactions were primed with one of 44 oligonucleotide primers with sequences corresponding to internal positions in HP1 DNA; the sequences of these oligonucleotides will be provided on request. Where inconsistencies between two different determinations of the same segment were encountered, the region in question was sequenced using the polymerase chain reaction (16) with HP1 DNA as the template and a pair of primers from the panel of 44 above.
DNA sequence data were archived and manipulated with MacDNASIS 3.0 (Hitachi, Inc.) software. The complete sequence data have been submitted to GenBank (accession no. U24159). Programs used for genome analysis were written in Pascal and compiled using the Think Pascal (Symantec Inc.) environment for the Apple Macintosh. PROMSEARCH is based on the promoter tables of Hawley and McClure (17) and the search algorithm of Staden (18). The ORFSEARCH program scans input DNA for the presence of ORFs and then calculates codon correlation values using the algorithm of Stormo (19). RBSSCAN scans a DNA sequence and calculates ribosome binding site scores based on an input matrix derived from a Perceptron algorithm (20). These programs are available as part of the MacGUMBY package (GDE Enterprises).
Cloning of the HP1 rep gene
Primers flanking the HP1 rep gene were used to amplify a segment of DNA by PCR which contained the complete gene as well as an artificial ribosome binding site (AGGAGGTAATATAAATG) and restriction sites suitable for cloning purposes. The amplified segment (bp 5732-8059) was cut with HindIII and EcoRI and ligated into pUC19. The resulting pREP plasmid was transformed into H.influenzae Rd by electroporation and selected on 20 µg/ml ampicillin. As controls, pUC19 alone and pHPC414 (pUC19 with non-rep segments of the HP1 genome; 11) were also transformed under similar conditions.
Purification of HP1 phage and analysis of phage proteins
Cultures of H.influenzae L-10 (2 × 10^9 c.f.u./ml) were treated with mitomycin C (35 ng/ml) to induce HP1 production. Phage were purified as previously described (7). Purified infectious phage banded at a density of 1.42 g/ml and DNA-containing head particles (without tails) were recovered at a greater density. These were dialyzed to remove CsCl and then disrupted in loading buffer containing 1% SDS. Proteins were separated on SDS-polyacrylamide gels and stained with Coomassie Brilliant Blue R (14). Dried stained gels were imported into an Apple Macintosh computer using an Epson ES-1200C scanner and protein bands were quantitated with the MACBAS 2.0 program (Fuji BioImaging Systems). Molecular weights were calculated by distance measurement in relation to known molecular weight standards and band intensities were corrected for the mass of each protein to determine relative abundances.
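As an illustration of how a molecular weight can be read from a migration distance, the sketch below fits a straight line to log10(mass) against distance for a set of standards and interpolates an unknown band. The log-linear relationship is an assumption commonly made for SDS gels; the text does not state which calibration MACBAS applied, and the function names and numerical values are hypothetical.

```python
from math import log10


def fit_standard_curve(standards):
    """Least-squares fit of log10(mass in kDa) against migration distance.

    standards: list of (distance, mass_kda) pairs for marker proteins.
    Returns (slope, intercept) of the fitted line.
    """
    xs = [distance for distance, _ in standards]
    ys = [log10(mass) for _, mass in standards]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("standards must span more than one distance")
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return slope, mean_y - slope * mean_x


def estimate_mass(distance, slope, intercept):
    """Interpolate the mass (kDa) of a band from its migration distance."""
    return 10 ** (slope * distance + intercept)


# Hypothetical standards chosen to be exactly log-linear.
slope, intercept = fit_standard_curve([(10, 100.0), (20, 10.0), (30, 1.0)])
unknown = estimate_mass(15, slope, intercept)
```

Estimates of this kind are only reliable within the range spanned by the standards.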
RESULTS AND DISCUSSION
Determination of DNA sequence
The HP1 genomic segment between 51 and 79% has been sequenced using chemical termination (GenBank accession no. M12911) and the sequence of the 6.5 kb segment at the left end (0-20%) of the HP1 genome has been reported as well (GenBank accession no. U06847) (11). The balance was determined by a combination of automated and manual methods, using dideoxy termination, as described in Materials and Methods. The complete sequence contains 32 355 bp of double-stranded DNA with complementary 7 bp 5'-single-stranded cohesive ends (4). The HP1 genome has a G + C content of 39 mol%, a value identical to that of the host H.influenzae (21). The complete sequence has been deposited in GenBank under the single accession no. U24159.
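The G + C value quoted above is a simple base count. A minimal sketch of the calculation follows; the function name is hypothetical and ambiguous bases are ignored by assumption.

```python
def gc_percent(seq: str) -> float:
    """Return G + C content as a percentage of the unambiguous bases."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted
```

Applied to the sequence deposited as U24159, this calculation would be expected to return a value close to the 39 mol% reported here.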
Accuracy of the sequence
Considerable effort was devoted to minimizing mistakes and eliminating prior errors. Each residue was identified at least once on each strand; the average number of determinations per residue was ~3.2. Earlier sequence assignments based on data from one strand or containing possible ambiguities (e.g. 22) were determined again and several errors were corrected. Sequences near the ends of cloned fragments were confirmed as internal residues using clones containing overlapping fragments. Restriction sites predicted by the sequence conformed to the reported map (4), except for two pairs of HaeIII sites whose members were separated by 46 and 177 bp and which were originally mapped as single sites.
ORFs in the HP1 genome
The DNA sequence was first translated into the encoded strings of amino acids in all six reading frames. ORFs predicting polypeptides >7 kDa were analyzed further; 41 candidates met this criterion. Their positions are indicated in Figure 1 and their parameters are summarized in Table 1. Four of these correspond to functional HP1 genes and are designated accordingly. These genes encode HP1 integrase (int; 9), HP1 Cox (cox; 11), lysogenic repressor (cI; unpublished) and replication protein (rep; described below). In addition, genes encoding lysozyme (lys) and holin (hol) functions were identified by sequence comparisons and other arguments (22) and are named accordingly. The remaining ORFs are designated orf1-orf35; evidence bearing on their possible functions will be presented below.
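The six-frame scan can be sketched as follows. This is not the ORFSEARCH program; it is a minimal illustration that assumes ORFs run from the first ATG in a frame to the next in-frame stop, and that approximates the >7 kDa criterion as at least 64 codons (about 110 Da per residue, an assumed average). Frames are numbered 1-3 for left-to-right and 4-6 for right-to-left, and coordinates follow the Table 1 convention (first A of the ATG, final base of the stop codon).

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=64):
    """List (frame, start, end, n_codons) for ATG-to-stop ORFs.

    start and end are 1-based positions on the top strand; n_codons
    excludes the stop codon. min_codons=64 is an assumed stand-in for
    the >7 kDa size criterion.
    """
    seq = seq.upper()
    n = len(seq)
    results = []
    for strand, s in ((0, seq), (1, reverse_complement(seq))):
        for offset in range(3):
            start = None
            for i in range(offset, n - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first ATG after the previous stop
                elif start is not None and codon in STOPS:
                    n_codons = (i - start) // 3
                    if n_codons >= min_codons:
                        if strand == 0:
                            results.append((offset + 1, start + 1, i + 3, n_codons))
                        else:
                            # map bottom-strand indices back to top-strand positions
                            results.append((offset + 4, n - start, n - i - 2, n_codons))
                    start = None
    return results
```

For a right-to-left ORF the reported start is numerically larger than the end, matching the direction of translation.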
The likelihood that candidate coding segments corresponded to HP1 genes was examined by applying a series of statistical tests. The pattern of codon usage in HP1 ORFs was compared to the usage in 36 H.influenzae genes and to the codon usage compiled for E.coli genes (23). The findings supported two conclusions. First, the biases in codon usage in H.influenzae DNA reflected its base composition, since synonymous codons with A or U in the third position tend to appear more frequently. Second, the overall preferences found in the HP1 ORFs resemble those found in H.influenzae genes, indicating that host and phage share codon preferences (not shown).
To quantitate the codon preferences associated with each ORF, codon correlation values were computed. These compare the frequencies at which given residues occur at each position of a reference and a candidate codon. Reference values are derived from coding segments from the same organism. Candidate sequences with correlation scores above 0.6 (a score of 1 is perfect correlation) are almost always coding regions, while those scoring less than 0.3 are generally non-coding segments (19). The codon correlation scores for the 41 ORFs compared to a table based on 800 kb of coding sequences from H.influenzae (24) are presented in Table 1; 38 ORFs scored above 0.6, while translations of the complementary strands encoding the ORFs, or of computer randomized sequences, failed to produce any score above 0.28. Three ORFs had correlation scores less than 0.6. Two, orf10 and orf30, had the ambiguous score of 0.4, which might be due to their small sizes. The third, orf2, gave a correlation score of 0.22, clearly out of the range of probable HP1 coding segments. The status of orf2 will be considered further below.
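The idea behind such a score can be illustrated with a simplified calculation. The sketch below is an assumption-laden stand-in, not the algorithm of reference 19 as implemented in ORFSEARCH: it tabulates the frequency of each base at each of the three codon positions and reports the Pearson correlation between the candidate and reference tables.

```python
from math import sqrt

BASES = "ACGT"


def position_frequencies(coding_seq):
    """Return 12 values: frequency of A, C, G, T at codon positions 1, 2, 3."""
    seq = coding_seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = [[0] * 4 for _ in range(3)]
    for i in range(usable):
        base = seq[i]
        if base in BASES:
            counts[i % 3][BASES.index(base)] += 1
    freqs = []
    for row in counts:
        total = sum(row) or 1
        freqs.extend(count / total for count in row)
    return freqs


def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def codon_correlation(candidate, reference):
    """Simplified positional correlation between a candidate ORF and a reference."""
    return pearson(position_frequencies(candidate), position_frequencies(reference))
```

In practice the reference would be a large concatenation of known host coding sequences; short candidates give noisy tables, which is consistent with the ambiguous scores noted for the smallest ORFs.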
The sequence surrounding each predicted initiation codon was examined for the presence of an appropriately located ribosome binding site (19). A qualitative rule-based method constructed from E.coli ribosome binding sites (25) was applied to the regions immediately preceding each presumptive initiation codon. The results in Table 1 show that 35 of the 41 candidate ORFs had sequences which fit at least three of the seven rules for ribosome binding sites. Rule-matching has two drawbacks: it is qualitative and the rules are derived from sequences from E.coli. The alternative approach used matrix analysis to evaluate the potential ribosome binding site preceding each ORF. Using a Perceptron algorithm (20), a matrix was constructed using the 60 bp sequences upstream of the initiation codons of 725 H.influenzae genes (24). This matrix should contain the statistical rules for effective H.influenzae ribosome binding sites and was used to compute scores for candidate sequences. Positive scores indicate that the candidate is probably a ribosome binding site (20). The scores for the HP1 ORFs are listed in Table 1. Only three ORFs gave negative scores and two of these have adequate ribosome binding sites by the rule-based criteria. Neither test located an effective ribosome binding site upstream of orf16; however, the combination of codon correlation data and similarity to known proteins suggests that orf16 is a functional HP1 gene.
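A matrix of this kind can be sketched with a basic perceptron over one-hot encoded windows. The code below is illustrative only and is not RBSSCAN: the training windows are hypothetical 8 bp toy sequences rather than the 60 bp upstream regions used here, and the update rule is the textbook perceptron, assumed for the purpose of the example.

```python
BASES = "ACGT"


def one_hot(window):
    """Encode a DNA window as a bias term followed by 4 indicators per position."""
    vec = [1.0]
    for base in window.upper():
        vec.extend(1.0 if base == b else 0.0 for b in BASES)
    return vec


def train_perceptron(positives, negatives, epochs=100):
    """Learn a weight matrix separating site windows from non-site windows."""
    examples = [(one_hot(s), 1) for s in positives]
    examples += [(one_hot(s), -1) for s in negatives]
    weights = [0.0] * len(examples[0][0])
    for _ in range(epochs):
        errors = 0
        for vec, label in examples:
            score = sum(w * v for w, v in zip(weights, vec))
            if label * score <= 0:  # misclassified: move weights toward the label
                weights = [w + label * v for w, v in zip(weights, vec)]
                errors += 1
        if errors == 0:
            break
    return weights


def score_site(weights, window):
    """Positive scores mark probable sites, negative scores probable non-sites."""
    return sum(w * v for w, v in zip(weights, one_hot(window)))


# Hypothetical toy training data, all windows the same length.
sites = ["AGGAGGTA", "AGGAGATA", "TGGAGGTT"]
non_sites = ["TTTTCTTA", "CCATCATT", "ACTTCACA"]
weights = train_perceptron(sites, non_sites)
```

As in the analysis described above, the sign of the score is what is interpreted; its magnitude depends on the training set.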
Transcriptional signals
The HP1 genome contains four strong promoter sequences of the σ70 type (17,18); their locations are indicated in Figure 1. Three of these, PL, PR1 and PR2, are located in the early region (11). The fourth, P14, is located 9.7 kb from the left end and is directed rightward. Three ρ-independent transcription terminator sequences were found (26). One of these, TR, is located 9.7 kb from the left end and is positioned to terminate what is probably the early lytic transcript. A second terminator, T14, is situated 10.4 kb from the left end. The stem-loop of T14 is flanked on the right by a stretch of T residues and on the left by a stretch of A residues; this arrangement is consistent with T14 terminating transcription from either direction. An additional terminator, TL, is located at the right end of the phage and, like T14, can function on both strands. This terminator probably serves to terminate the lysogenic transcript during the early stages of infection and the late lytic transcript at later times. Sequences and locations of these features are presented in Table 2.
Reading frames 1-3 correspond to left-to-right frames, while 4-6 correspond to right-to-left frames. Start signifies the location
of the first A of the ATG start codon, end signifies the final base of the stop
codon. Codon correlation scores are calculated as described in the text.
Ribosome binding sites were scored using the Perceptron algorithm; positive
scores indicate good candidate ribosome binding sites. The number of ribosome
binding site rules (19) satisfied by each site is also tabulated. Known
proteins with >30% similarity to the amino acid sequences encoded by each ORF
are indicated. The function of certain ORFs is indicated; starred entries are
those for which experimental support is available.
σ70 promoters were located with the PROMSEARCH program. All four promoters scored above 1.5. No other regions of HP1 scored above 0.2. Terminators were identified with the TERMFIND program. Underlined regions indicate potential stem-loop structures.
Haemophilus influenzae transformation uptake sequences consist of the sequence 5'-AAGTGCGGT-3'. Numbers indicate the first base pair of the uptake sequence. Top strand refers to the DNA sequence listed in the GenBank entry, while Bottom strand refers to the complementary strand.
By taking into account the directions of the ORFs and the positions of transcription signals, a plausible organization for HP1 gene expression may be inferred, as shown in Figure 1. At 3.7 kb from the left end, a cluster of three overlapping promoters defines two transcription domains. The leftward promoter PL controls expression of the lysogenic repressor, several short ORFs and HP1 integrase, while the paired rightward promoters PR1 and PR2 govern expression of the multifunctional regulatory protein Cox and of several other genes; these are probably components of the early lytic pathway, as discussed below. This latter transcriptional domain ends at TR, immediately downstream of orf13. The P14 promoter would allow independent transcription of orf14 and this transcript would also terminate at the bidirectional stem-loop terminator T14.
The segment between orf16 and orf17 is an excellent candidate to contain promoters for leftward and rightward transcription, because of the two divergent sets of ORFs beginning there. This region contains two oppositely directed 28 bp stretches, from bp 13680 to 13707 and from bp 13720 to 13747, which each consist of two directly repeated copies of the sequence 5'-ATATCC, separated by 4 bp. In addition, each 28mer is 6 bp upstream of a stretch of four T residues. In all, the two 28 bp stretches contain 25 identical bases. We speculate that these two 28 residue sequences likely constitute part of the promoters for late gene expression and the 6 bp repeats may provide binding sites for one or more proteins activating late transcription. These sequence features differ substantially from the late promoter regions in P2 and 186, where long inverted repeats are centered at -57, with non-standard -35 and -10 sequences (27).
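Repeats of the kind described above can be located with a short pattern search on both strands. The sketch below looks for two copies of a 6 bp unit separated by a 4 bp spacer; the function names are hypothetical and coordinates are reported as 1-based top-strand positions by assumption.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_spaced_repeats(seq, unit="ATATCC", spacer=4):
    """Find unit-spacer-unit direct repeats on either strand.

    Returns (strand, start, end) tuples with 1-based, inclusive
    top-strand coordinates; strand is "top" or "bottom".
    """
    seq = seq.upper()
    n = len(seq)
    # Lookahead so that overlapping occurrences are all reported.
    pattern = re.compile(
        "(?=(" + re.escape(unit) + "[ACGT]{%d}" % spacer + re.escape(unit) + "))"
    )
    hits = []
    for match in pattern.finditer(seq):
        hits.append(("top", match.start() + 1, match.start() + len(match.group(1))))
    for match in pattern.finditer(reverse_complement(seq)):
        length = len(match.group(1))
        hits.append(("bottom", n - match.start() - length + 1, n - match.start()))
    return hits
```

Two oppositely directed elements, as reported for the orf16-orf17 interval, would appear as one "top" and one "bottom" hit.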
No other promoter sequences were obvious in the rightmost 17 kb of the HP1 genome. Either all late rightward transcription initiates at a single late promoter before orf17 or another class of late promoters is present but not detectable by the sequence comparisons used. The presence of a single ρ-independent terminator at the right end of this 17 kb stretch would suggest that the region consists of a single transcriptional unit.
Comparisons of encoded polypeptides with other protein sequences
The amino acid sequences deduced from the HP1 ORFs were compared to the contents of the GenBank database. The predicted products of 18 of the 41 HP1 ORFs resembled sequences in the archive; these are listed in Table 1. The similarity between HP1 and the P2-186 phage group was reinforced by these comparisons, since 15 of the similarities were with proteins encoded by either P2 or 186 or both. Four of these 15 also resembled polypeptides encoded by the retronphage Ec67. These sequence similarities allowed us to assign provisional functions to many of the HP1 ORFs. Where available, experimental data substantiate certain of these assignments. One important factor guiding these identifications is the way in which the genes are organized, and therefore the presumptive transcriptional units will be considered in turn.
The segment downstream from PL encodes the lysogenic transcript, containing functions needed for prophage formation and maintenance. Earlier, we proposed that the promoter-proximal ORF encoded a homolog of the 186 CI protein and consequently functioned as the lysogenic repressor (11). Recent studies showed that expression of this ORF repressed transcription of a gene cassette placed under the control of the PR promoters, as expected for this repressor (Esposito, Wilson and Scocca, in preparation). Accordingly, this gene has been designated cI. The promoter-distal ORF in this segment, the int gene, encodes the HP1 integrase (9).
The functions of the other ORFs in this segment are presently unknown. Three of these, orf1, orf3 and orf4, do not resemble any sequences in the database. The orf2 segment and its predicted product have several curious properties. The codon usage in this ORF differs substantially from that of H.influenzae or the other 40 HP1 ORFs. The amino acid sequence predicted by it resembles a series of bacterial ORFs called the 9 kDa proteins, which have been found adjacent to DNA segments containing homologs of the E.coli dnaA gene (28,29). To date, nine versions of 9 kDa protein ORFs have been reported. An alignment of the amino acid sequences common to these ORFs shows a high degree of identity, suggesting evolutionary conservation of this sequence across a wide range of species. Selective pressure maintaining the sequence implies that it has a function, but this remains a speculation, since no function has been demonstrated yet.
The segment from the two overlapping PR promoters to the TR terminator most probably constitutes the early transcript for the lytic phase of phage growth. It includes seven ORFs. Experimental support has been obtained for the functions of two of these. The promoter-proximal ORF cox encodes the Cox protein, which activates excision of the HP1 genome from its site in the host chromosome (11). HP1 Cox is similar in gene location and amino acid sequence to P2 Cox and Apl of 186 and consequently may be expected to regulate the expression of repressor, like its coliphage counterparts. This regulatory function for Cox has also been confirmed in recent studies and will be reported elsewhere (Esposito, Wilson and Scocca, in preparation).
The second major product predicted from this segment is a 90 kDa protein which we designated Rep, because of its probable role in phage DNA replication. HP1 Rep is similar in size and sequence to the P2 A protein, the 186 CP87 protein and the Ec67 ORF2 protein. The A protein is the only P2-encoded function required for phage DNA replication and similarly CP87 is the only 186 function needed for DNA replication (30,31). The A protein is believed to prime P2 DNA synthesis by introducing a specific nick at the replication origin, which lies within the A gene itself (32); this nick provides the initiation point for rolling circle replication. To explore the hypothesis that the HP1 rep gene encodes a similar function, we examined the capacity of this segment to serve as an origin of replication in H.influenzae. We took advantage of the fact that the pUC19 origin of replication does not function in this organism. A DNA segment including the complete HP1 rep ORF was inserted into pUC19 and introduced into H.influenzae as described in Materials and Methods. The transformants retained the plasmid and exhibited resistance to ampicillin, while H.influenzae transformed with pUC19 or derivatives of it containing other HP1-derived segments did not show ampicillin resistance and failed to retain the plasmid. The nick at the P2 origin occurs at a CG sequence within the P2 A gene (32) in a segment which is also present in the homologous 186 CP87 gene and in HP1 rep. This conserved sequence may provide the site for the initiating nick in HP1 replication. Together these findings make it likely that the HP1 rep gene encodes the protein required to initiate HP1 DNA replication and also includes the origin activated by this protein.
The product of orf5 shares limited homology with the 186 CII protein and may therefore have an analogous role in regulating early gene expression (33), but this remains to be established. The possible functions encoded by the small ORFs orf6-orf9 are unknown.
The amino acid sequence predicted by orf13 is 35% identical to that of the N6-adenine methyltransferase of phage T1 (34). Neither P2 nor 186 appears to encode a DNA methylase activity and the role of this activity in the HP1 life cycle is an open question.
A third recognizable E.coli σ70 promoter precedes orf14, which encodes a 14.9 kDa polypeptide of unknown function. This segment is located at the boundary between early and late functions, is isolated from other transcriptional units by the terminators and has a promoter that resembles the other early HP1 promoters. There are no sequences near this promoter with any resemblance to those neighboring the PR/PL region, suggesting that neither Cox nor the CI repressor interacts with this region. The orf14 gene may be expressed in lysogens and have some function there. Alternatively, it may be regulated by an unidentified mechanism; in this case it might be a candidate for a late control gene. Transcription of late genes in P2 is regulated by the Ogr protein (35) and in 186 by the homologous late regulator B, which is controlled by the CI repressor (36). No HP1 ORF encodes a product resembling these late control proteins.
The late genes
The protein products of the leftward transcribing orf16 and orf15 genes show significant identity to the P2 P and Q proteins (37% and 46% respectively). The HP1 ORF16 protein is 35% identical to the 186 CP12 protein. The product encoded by HP1 orf15 is 34% identical to the Ec67 ORF5 protein and the sequence of the orf16 protein product resembles a portion of the Ec67 genome (GenBank accession no. M55249), which is listed as two separate ORFs. However, if 1 bp is inserted into the reported sequence (after bp 11077), the new combined ORF is extensively similar to 186 CP12, HP1 ORF16 and P2 P, suggesting that a sequencing error is likely to be present in the Ec67 sequence. It has been shown that P2 P is the terminase catalyzing the staggered cleavages that produce the cohesive ends of the mature linear DNA and that P2 Q is a portal protein (37,38). These similarities in sequence suggest that the products of HP1 orf15 and orf16 play equivalent roles in the maturation and packaging of the phage genome.
Proteins of the phage particle
DNA recognition sites for specific transformation
The interaction of transformable H.influenzae with DNA is specific for DNA from members of the genus Haemophilus (47,48). Specificity involves the recognition of specific sequence elements. The core target required for high affinity uptake is the 9 bp sequence 5'-AAGTGCGGT-3' (4). The HP1 genome contains 17 such sites; the locations are indicated in Figure 1 and summarized in Table 2. All but two of these target sites were within presumed coding regions and the two situated outside ORFs were single and did not appear to be associated with any obvious signals. Uptake sites in the host are also largely single, though ~15% are paired and have the potential to form stem-loop structures (49). The analogous targets in Neisseria gonorrhoeae occur frequently in stem-loop arrangements and have locations consistent with terminators of transcription (50). The frequency of recognition site sequences is lower in HP1 (0.53 per kb) than in H.influenzae (0.8 per kb; 49). This is somewhat surprising, since HP1 DNA is slightly more efficient than host DNA in interacting with transformable cells (8), and suggests that the uptake sites in HP1 DNA occur in optimized arrangements.
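The site count and density quoted above follow from a scan of both strands for the 9 bp core. A minimal sketch is given below; the function names are hypothetical, and a bottom-strand site is reported at its leftmost top-strand coordinate, which is an assumption about the numbering convention.

```python
UPTAKE_SITE = "AAGTGCGGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_uptake_sites(seq, site=UPTAKE_SITE):
    """Return sorted (strand, position) pairs, 1-based, for both strands."""
    seq = seq.upper()
    hits = []
    for strand, motif in (("top", site), ("bottom", reverse_complement(site))):
        pos = seq.find(motif)
        while pos != -1:
            hits.append((strand, pos + 1))
            pos = seq.find(motif, pos + 1)
    return sorted(hits, key=lambda hit: hit[1])


def sites_per_kb(n_sites, genome_length_bp):
    """Density of sites per kilobase of sequence."""
    return n_sites / (genome_length_bp / 1000.0)


# 17 sites in 32 355 bp, as reported for HP1.
hp1_density = sites_per_kb(17, 32355)
```

With 17 sites in 32 355 bp the density is 17/32.355, or about 0.53 per kb, in agreement with the figure given in the text.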
Relationship of HP1 to other P2-like phage
HP1 shows significant similarity to phages P2 and 186 in gene organization and in the sequences of the encoded proteins, which are similar enough to be identified from the database with simple search algorithms. In addition, the ultrastructures of HP1 and P2 are very similar. Both have heads ~60-65 nm in diameter, with tails 122 (HP1) or 135 (P2) nm in length and 18-19 nm in width (27; H.W.Ackermann, personal communication). To illustrate further the strong similarities between phages HP1 and P2, the amount of stained protein in each band in Figure 2 was quantitated by densitometry and the values were corrected for the mass of each protein, giving relative molar amounts. These values were then normalized to the values published for P2 protein abundances (27,40) by setting the 34 kDa band (the ORF18* band) equal to the 415 copies of the P2 N* protein. In this way, the relative amounts of proteins constituting the two phage particles could be compared. Clearly, the ratios of tail tube to tail sheath proteins in the two phage are very close, as are the abundances of the predicted HP1 tail fiber protein and the scaffold protein (ORF17*). These quantitations and the probable relationships between the P2 and HP1 proteins are presented in Table 3.
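The normalization described above amounts to dividing each band intensity by the protein mass and scaling so that the reference band equals 415 copies. The sketch below shows the arithmetic; the band names, intensities and masses in the example are hypothetical and are not the measured values.

```python
def copies_per_particle(bands, reference, reference_copies=415):
    """Convert stained band intensities to estimated copies per particle.

    bands: dict mapping band name to (intensity, mass_kda).
    reference: name of the band fixed at reference_copies.
    """
    # Intensity scales with mass of protein, so divide by mass for molar amounts.
    molar = {name: intensity / mass for name, (intensity, mass) in bands.items()}
    scale = reference_copies / molar[reference]
    return {name: value * scale for name, value in molar.items()}


# Hypothetical densitometry values for two bands.
example = copies_per_particle(
    {"34 kDa band": (340.0, 34.0), "20 kDa band": (100.0, 20.0)},
    reference="34 kDa band",
)
```

The approach assumes that all proteins bind the stain in proportion to their mass, so the resulting copy numbers are estimates.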
Though many similar genes are found in HP1 and P2, the organization of the genes differs in several respects. P2 and 186 each contain at least four late promoters, while HP1 appears to have only two. Some of the HP1 genes are in different orientations. The order of the P2 QPONML operon is reproduced precisely in HP1, but the order of other clusters, such as the tail protein and lysis genes, is reversed. Perhaps, since most HP1 late genes are transcribed from a single promoter, proteins required in large quantities, like tail proteins, have genes situated close to the promoter. In P2, since the FI and FII proteins are expressed using an independent promoter, they can be placed much later in the genome. The diagram in Figure 3 highlights several of the similarities and differences in the gene organizations of HP1 and P2, for which a large portion of the genome has been sequenced (27).
ACKNOWLEDGEMENTS
We thank Ole Skövgaard for helpful insights on the 9 kDa proteins, H.-W.Ackermann of the F.d'Herelle Center of Laval University for
electron micrography and the NCBI for the use of their network BLAST and FASTA
search services. This work was supported by Grant no. NP830 from the American
Cancer Society. DE was supported under a National Science Foundation Graduate
Research Fellowship.
REFERENCES
1 Harm,W. and Rupert,C. (1963) Z. Vererbungsl., 94, 336-348.
2 Boling,M.E. and Setlow,J.K. (1969) J. Virol., 4, 240-243.
3 Boling,M., Allison,D. and Setlow,J.K. (1973) J. Virol., 11, 585-591.
14 Maniatis,T., Fritsch,E.F. and Sambrook,J. (1982) Molecular Cloning: A Laboratory Manual. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
15 Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl. Acad. Sci. USA, 74, 5463-5467.
19 Stormo,G.D. (1987) In Bishop,M.J. and Rawlings,C.J. (eds), Nucleic Acid and Protein Sequence Analysis: A Practical Approach. IRL Press, Oxford, UK, pp. 231-257.
* To whom correspondence should be addressed
Present addresses: +Biosource Technologies, Vacaville, CA, USA, §Department of Biological Sciences, University of North Texas, Denton, TX, USA, ¶Department of Biochemistry, USC School of Dental Medicine, Los Angeles, CA, USA and ‡Department of Biological Sciences, University of South Carolina, Columbia, SC, USA