ABSTRACT
POU genes encode a family of transcription factors involved in a wide variety of cell fate decisions and in the regulation of differentiation pathways. We have searched for POU genes in the zebrafish, a popular model organism for the study of early development of vertebrates. Besides five putative pseudogenes we have identified five POU genes that are expressed during embryogenesis. Probes obtained by PCR were used to isolate full-length cDNAs. Four of the isolated genes encode proteins with class III POU domains. Analysis of genomic clones suggests that the fish genes in general do not contain introns, similar to class III genes of mammals. However, the C-termini of two of the encoded proteins vary due to facultative splicing of a short intervening sequence. These two genes show very strong similarities in their sequence. They have probably arisen by gene duplication, possibly as part of a larger scale duplication of part of the zebrafish genome. Analysis of the expression of the class III genes shows that they are predominantly expressed in the central nervous system and that they may play important roles in patterning the embryonic brain.
The zebrafish (Danio rerio) is a popular model organism to study early embryonic development of vertebrates (1-3). The completely transparent embryos develop rapidly and are easily accessible. Furthermore, large-scale mutagenesis screens have been performed in several laboratories identifying hundreds of genes involved in controlling early development (4,5). Analysis of these mutants will be instrumental for an increased genetic understanding of developmental pathways in vertebrates. Many genes involved in the regulation of early embryogenesis have been isolated from other species. Molecular characterization of homologs of many such genes in zebrafish is complementing the genetic approaches and will provide possible candidate genes for some of the obtained mutants (see e.g. ref. 6).
POU genes encode a class of DNA-binding proteins interacting with DNA through a bipartite domain of ~150 amino acids. The POU domain consists of an N-terminal POU-specific region separated by a short linker from a particular type of homeodomain (reviewed in refs 7,8). Both the POU-specific domain and the homeodomain contribute to DNA binding, each via helix-turn-helix motifs. The two subdomains are also involved in several types of protein-protein interactions (reviewed in ref. 8). POU domain proteins generally act as transcription factors but some of them also have additional functions, e.g. in DNA replication. POU genes are found all across the animal kingdom. According to the sequence of their POU domain these genes have been grouped into at least six classes (reviewed in ref. 9).
Some POU genes, like oct-3/4 in mice or pou-2 in zebrafish, are expressed very early during embryogenesis and are possibly involved in controlling some of the first steps of development (10-13). Expression of other POU genes is initiated later during development. Many of those are transcribed in the forming central nervous system, especially the genes with a class III POU domain. In mammals each of these genes shows a very elaborate expression pattern in the brain (reviewed in ref. 14). Similarly, the zebrafish class III gene zp-50 shows a very complex and dynamic expression pattern in the embryonic brain (15). Genetic analysis of naturally occurring or genetically engineered mutations demonstrates that POU genes are involved in cell fate decisions and in the control of terminal differentiation. For example, genetic defects in several class III genes have been investigated: naturally occurring mutations of brn-4 in human patients lead to profound sensorineural deafness (16) whereas knock-outs of the mouse brn-2 gene cause the loss of several types of neurons in the hypothalamus (17,18).
To investigate POU genes that are potentially involved in the control of zebrafish embryogenesis we isolated several cDNAs with POU domain probes obtained by PCR. We have previously described the sequences of two of the identified genes (13,15). In this communication we report the sequences of three more genes encoding putative transcription factors containing class III POU domains. One of these genes has been independently found by others (19). Some of the genes appear to have arisen by gene duplication events that happened after the separation of the actinopterygian lineage from higher vertebrates. The transcripts of two class III genes are alternatively spliced, which leads to the formation of POU domain proteins with variable C-termini. This appears to cause a diversity of class III proteins in zebrafish that is larger than in mammals. All these POU genes are expressed predominantly in the central nervous system and may thereby be involved in patterning the developing brain.
PCR reactions on genomic DNA were done first with 35 cycles using the two following primers: 5'-TITWYGGIAAIGTITTYWSICARACIAC-3' (64-fold degeneracy) and 5'-TGGTTYTGYAAYMGIMGICARAAR-3' (256-fold degeneracy). The annealing temperature was 50°C. An aliquot of amplified material was then further amplified over 38 cycles with the nested primers 5'-CCGGAATTCYTIAAIAAYATGTGYAARYT-3' (32-fold degeneracy) and 5'-GTIRTIMGIGTITGGTTYTGYAAGGATCCGGG-3' (16-fold degeneracy) and annealing at 52°C. For PCR of cDNA libraries 3-6 × 10^7 phage particles were heated to 100°C for 10 min. Their DNA was then amplified with either of the two primer pairs mentioned above. PCR products were treated with Geneclean™ (Bio101) glass beads to remove low molecular weight DNA. The ends were polished with T4 DNA polymerase and the DNA fragments were phosphorylated with T4 polynucleotide kinase before subcloning and sequencing.
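The degeneracy of a primer is the product of the number of bases allowed at each position. The following minimal sketch illustrates that arithmetic for three of the primers above. It assumes that "I" denotes inosine and counts as a single, non-degenerate position; the function and variable names are illustrative only and are not part of the original protocol.

```python
# Number of alternative bases represented by each IUPAC nucleotide code.
# Assumption: "I" (inosine) is treated as one non-degenerate position.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(primer: str) -> int:
    """Return the number of distinct sequences a degenerate primer represents."""
    degeneracy = 1
    for base in primer.upper():
        degeneracy *= IUPAC_CHOICES[base]
    return degeneracy


# Primer sequences copied from the text; the variable names are hypothetical.
first_round_primer = "TITWYGGIAAIGTITTYWSICARACIAC"
nested_primer_a = "CCGGAATTCYTIAAIAAYATGTGYAARYT"
nested_primer_b = "GTIRTIMGIGTITGGTTYTGYAAGGATCCGGG"

print(primer_degeneracy(first_round_primer))  # 64
print(primer_degeneracy(nested_primer_a))     # 32
print(primer_degeneracy(nested_primer_b))     # 16
```

Under the inosine assumption these three values agree with the degeneracies stated for the corresponding primers.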
Standard procedures were used to isolate cDNAs and genomic clones for the different zp genes (20). Probes obtained in the original PCR screens were used to probe 5 × 10^5 plaques of 9-16 h post fertilization (hpf) (neurula) or 20-28 hpf (postsomitogenesis) cDNA libraries constructed in λZAPII (prepared by R. Riggleman and K. Helde, a kind gift from D.J. Grunwald). Inserts of the phages were subcloned into BSSK- by in vivo excision. Nested deletions were generated from both ends of the longest cDNAs by the Erase-a-Base system (Promega). These deletions were used for sequence determination using the Sequenase kit and 35S-dATP (USB/Amersham). Sequences were analyzed by the GCG program suite of DNA analysis programs. A genomic library prepared in λFIXII (Stratagene) was screened using full-length cDNAs of the class III POU genes. Inserts of positive phages were characterized by restriction analysis.
RNA was prepared from embryos or adult tissues. For Northern blot analysis 10 µg total RNA were run on formaldehyde agarose gels (20), blotted to nylon filters and hybridized to 32P-labelled cDNA probes.
In situ hybridization analysis was carried out as described (21). Digoxigenin-labelled RNA probes were generated by T7 RNA polymerase transcription of subcloned restriction fragments. In the case of zp-12 we used a ~570 nucleotide (nt) probe specific for the 5'-UTR reaching from the 5'-end of clone #20 to the XmnI site at -27. The ~840 nt probe for zp-23 extended from the BsmI site downstream of the second stop codon through most of the 3'-UTR to a TaqI site just before the polyadenylation site. The ~2115 nt long zp-47 probe was derived from an exonuclease deletion clone removing the poly-A tail. That probe started at the Asp718 site 59 nucleotides upstream of the end of the open reading frame and extended through the entire 3'-UTR region.
To identify zebrafish POU genes we performed two consecutive rounds of PCR reactions on genomic DNA. We used nested primers targeting conserved regions in POU domains of other species. Sixty randomly picked clones of the subcloned population of PCR products were analyzed by sequencing. Forty-one clones were related to POU domains and could be grouped into 11 different classes (data not shown). Sequence inspection of five classes suggested that they either represented pseudogenes or arose only during PCR amplification, e.g. by 'polymerase jumping' (22).
Since the initial PCR screen was carried out using genomic DNA, intron-containing POU genes might have been missed. Therefore, we repeated the PCR reactions on phage DNA from different embryonic cDNA libraries using either of the two primer pairs. The frequency of (mostly artefactual) non-POU sequences was higher in the cDNA screens, probably because only one round of amplification was performed in these experiments. Of the 25 subcloned PCR products from a 6-9 hpf library four isolates seemed to encode a novel type of POU domain. This GP-9 group of isolates was subsequently used to clone full-length cDNAs of the zebrafish pou-2 gene as described elsewhere (12,13). Of the total 37 isolates analyzed from 9-16 hpf neurula or 20-28 hpf postsomitogenesis libraries, three more clones were derived from the pou-2 gene and nine isolates were virtually identical to sequences present in three groups of PCR products identified previously in the genomic screen. These groups were represented by isolates termed ZP-12, ZP-23 and ZP-47.
To determine whether any other POU gene is expressed at some point during embryogenesis, we 32P-labelled uncloned PCR products amplified from the cDNA libraries with both sets of amplification primers. These cDNA-derived probes were hybridized to dot blots containing DNA from all our zebrafish POU isolates. Using the three different embryonic libraries we found positive hybridization signals in the cases of pou-2 and of four (ZP-12, ZP-23, ZP-47 and ZP-50) of the eight potential POU genes identified by amplification of genomic DNA (data not shown). The other four genes that were apparently not expressed encoded potential POU domains whose sequence deviated considerably from those found in the databases. Therefore, we consider them to be pseudogenes although we cannot exclude that these genes are expressed at a very low level or at some later point during development. The four genes identified by PCR amplification of genomic DNA that are expressed during embryogenesis all encode class III POU domains. We used representative PCR isolates for each gene (ZP-12, ZP-23, ZP-47 and ZP-50) to isolate full-length cDNAs. We report below the molecular characterization of the zp-12, zp-23 and zp-47 genes, whereas the sequence and expression of zp-50 have been described earlier (15).
The postsomitogenesis cDNA library was screened with the ZP-12 isolate of the initial PCR screen. A PCR fragment with high similarity to ZP-12 was recently identified also by others (23). From screening 500 000 plaques we identified 36 plaques which repeatedly hybridized to the ZP-12 probe. The DNA sequence of three overlapping clones (#3, #20, #29) was determined (EMBL accession Y07906). Clones #20 and #3 contain an apparently full-length open reading frame (Fig. 1). The frame begins with a motif (MATAA) highly conserved among vertebrate class III POU domain genes. This presumptive start ATG is preceded three codons upstream by a stop codon. cDNA #29 starts 28 bp downstream of the putative initiation codon. In the second half of the reading frame we find a class III POU domain that is 98% conserved with that of mouse brn-1. Except for two nucleotide mismatches the three cDNAs are identical over the entire length of the reading frame. Clones #3 and #29 differ significantly from #20, however, in that a short intron has been spliced out from their corresponding RNAs (Fig. 1). This 184 bp sequence starts with a canonical GT motif and ends with a consensus AG sequence (24). Most of the open reading frame is unaffected by the splicing event but the resulting proteins have slightly different C-termini: in the unspliced version of the mRNA represented by clone #20 the reading frame is extended by five codons encoded by the facultative intron. If the intron is spliced out, as in cDNAs #3 and #29, the second exon contributes 14 amino acids to the body of the reading frame shared by all three clones. Restriction analysis of additional isolated cDNAs suggests that the intron is spliced out in the majority of the cases (data not shown). Although Northern blot analysis (Fig. 2) suggests a homogeneous size of zp-12 mRNAs we found cDNAs with variable 3'-ends: cDNA #3 ends 14 nucleotides downstream of a perfect AATAAA polyadenylation motif. In contrast, cDNA #20 ends just upstream of an A-rich sequence between nucleotides 1861 and 1880 which may have erroneously served as template for the oligo-dT primers during library preparation. Partial sequencing of additional isolated cDNAs (data not shown) showed 3'-ends with short poly-A stretches at several positions which, however, were not preceded by obvious AATAAA motifs.
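How a facultative intron near the end of a reading frame yields two alternative C-termini can be illustrated with a toy calculation. The sketch below translates a transcript with the intron retained and with it removed. The three short sequences are hypothetical and were invented for illustration; they are not the zp-12 sequence, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code in the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate from the first base and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequences (NOT taken from zp-12).
EXON_1 = "ATGGCTACTGCTGCT"   # shared body of the reading frame
INTRON = "GTAAGTTGACAG"      # starts with GT, ends with AG, contains a stop
EXON_2 = "AAACCCTAA"         # read only when the intron is removed

unspliced_protein = translate(EXON_1 + INTRON + EXON_2)
spliced_protein = translate(EXON_1 + EXON_2)

print(unspliced_protein)  # MATAAVS
print(spliced_protein)    # MATAAKP
```

In this toy case the retained intron adds two codons before an in-frame stop, whereas splicing joins the shared body to a different two-residue tail. The lengths differ from those reported for zp-12, but the principle is the same.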
We screened the neurula library prepared from 9-16 hpf embryos with the ZP-23 isolate of the initial PCR screen. Fifteen of 500 000 screened plaques repeatedly hybridized with the ZP-23 probe. The two longest inserts of 2.1 and 2.3 kb were entirely sequenced. Clone #10 is shorter at the 5'-end (Fig. 1), resulting in a truncation of the open reading frame by more than 150 amino acids in comparison to the ORF of clone #7. The predicted N-terminus encoded by clone #7 starts with a MATAA motif conserved among other class III POU domain genes. Besides the 5'-end variation, clone #7 differs from #10 downstream of the POU domain by the removal of an intron sequence of 309 nt. Thereby 37 amino acids of clone #7 replace 19 amino acids at the C-terminus encoded by the unspliced cDNA #10. Restriction analysis of additional cDNAs suggests that the majority of zp-23 transcripts are unspliced (not shown). Since Northern analysis (Fig. 2) suggested a considerably longer transcript size of around 3 kb, we screened the cDNA library prepared from 1-day-old embryos with the entire zp-23 #7 cDNA. We found nearly 100 positive phages in the 500 000 screened plaques. The longest insert (#18) was sequenced (EMBL accession Y07907). This cDNA was derived from an unspliced mRNA and extended the 5' untranslated region by 0.5 kb (Fig. 1).
Apart from the spliced-out intron, isolate #7 is nearly identical to the sequenced cDNAs #10 and #18, except for the insertion of a CAG trinucleotide at amino acid position 116. An additional five mismatches or deletions both in and outside the coding region have no functional consequences. There is a microheterogeneity at the immediate 3'-end of the three cDNAs resulting in the addition of the poly-A tail either 12 or 16 nucleotides downstream of a canonical AATAAA polyadenylation signal. An RNA (zfpou1) with strong similarity to the unspliced zp-23 #18 has been reported by others (19). However, its open reading frame deviates at 12 nucleotide positions from the sequences of our three zp-23 cDNA isolates, thereby changing eight amino acids in the encoded protein.
The cDNA library prepared from 1-day-old embryos was also screened with the ZP-47 isolate of the initial PCR screen. From 500 000 screened plaques eight phages were recovered that repeatedly hybridized to the ZP-47 probe. The clone with the longest insert was subjected to the generation of nested deletions and these were subsequently used for sequence determination (Fig. 1; EMBL accession Y07905). Although the long open reading frame extends up to the 5'-end of the cDNA we believe that translation starts at the first ATG codon found 41 nt downstream of the 5'-end. This start codon is part of a MATTA motif as found in other class III POU genes. The POU domain itself is found, as in other class III genes, in the second half of the protein. The cDNA ends with a long 3' untranslated region of over 2 kb in length. This 3'-UTR terminates with an AATAAA polyadenylation signal followed 27 nucleotides downstream by a long poly-A tail.
We have used the longest cDNA isolates for each of the four class III POU genes to screen a zebrafish genomic library. Inserts of repeatedly hybridizing phages were subcloned and analyzed by restriction digestion (not shown). For this purpose we used restriction enzymes that cleave at sites close to the 5'- and 3'-ends of the cDNAs. We compared the size of several restriction fragments of genomic clones with fragments derived from the corresponding cDNAs. In no case could we detect significant differences in size. This suggests that the genomic sequences are colinear with their respective cDNAs. Therefore, the four genes zp-12, zp-23, zp-47 and zp-50 apparently do not contain intron sequences, with the exception of the facultatively spliced small intervening sequence at the end of the open reading frames of zp-12 and zp-23. Our finding that all four class III POU domain genes isolated in the zebrafish lack introns mirrors the absence of introns in the murine class III genes brn-1, brn-2, brn-4 and oct-6 (25). The absence of introns in the class III POU genes contrasts, however, with the presence of multiple introns in other POU genes (see e.g. 12,13,26,27).
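The logic of this comparison can be sketched as follows: an intron lying between two restriction sites would make the genomic fragment longer than the matching cDNA fragment by the length of the intron. The fragment sizes and the 50 bp tolerance in this minimal example are hypothetical values chosen for illustration; they are not measurements from this study.

```python
def fragment_size_differences(genomic_bp, cdna_bp):
    """Return genomic minus cDNA size (bp) for each paired restriction fragment."""
    if len(genomic_bp) != len(cdna_bp):
        raise ValueError("fragment lists must be paired one-to-one")
    return [g - c for g, c in zip(genomic_bp, cdna_bp)]


def looks_colinear(genomic_bp, cdna_bp, tolerance_bp=50):
    """True if no paired fragment differs by more than the gel-sizing tolerance.

    tolerance_bp is an assumed resolution limit, not a value from the text.
    """
    differences = fragment_size_differences(genomic_bp, cdna_bp)
    return all(abs(d) <= tolerance_bp for d in differences)


# Hypothetical fragment sizes in bp, for illustration only.
cdna_fragments = [1200, 850]
genomic_without_intron = [1210, 845]
genomic_with_intron = [1200, 1600]   # second fragment carries an extra 750 bp

print(looks_colinear(genomic_without_intron, cdna_fragments))  # True
print(looks_colinear(genomic_with_intron, cdna_fragments))     # False
```

A comparison of this kind cannot detect introns shorter than the sizing tolerance, which is consistent with the cautious wording that the genes "apparently" lack introns.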
Total RNA was prepared from various developmental stages and from several adult tissues. In all cases single major RNA species were detected after hybridization with probes specific for the longest available cDNA sequences (Fig. 2). For zp-47 an RNA was detected that had an estimated size of ~3.1 kb. For the zp-50 gene we observed an RNA of ~2.9 kb. In the case of the highly related zp-12 and zp-23 genes we estimated transcript sizes of ~2.7 and ~3.2 kb. The two bands predicted from the alternative splicing of the latter two genes were apparently not resolved in these blots.
Transcripts of all four identified class III genes begin to accumulate between 10 and 12 hpf, just after completion of the gastrula period. Maximal RNA levels are observed in all cases after 1 to 2 days of development. zp gene expression slightly decreases during the following days but is still clearly detected 2 weeks after fertilization. In adult zebrafish we could detect transcripts of all four genes only in the brain. At the sensitivity of these Northern blots we cannot, however, exclude that these genes might also be expressed in other adult tissues, albeit at lower levels.
To investigate the sites of embryonic expression of the zp genes we began to study their expression pattern by in situ hybridization. In the case of zp-50 we have found previously that this gene is expressed in a highly dynamic and complex pattern in several regions of the central nervous system (15). To investigate the expression patterns of the other genes we prepared probes specific mostly for untranslated regions, thus ensuring that the probes do not crossreact with other class III mRNAs. Similarly to zp-50, we found predominant expression of zp-12, zp-23 and zp-47 in various portions of the CNS (Fig. 3). All zp genes are thus expressed in all the major subdivisions of the embryonic CNS, including the fore-, mid- and hindbrain and the spinal cord. Interestingly, the precise arrangement of the expression domains varies between the different genes. Whereas zp-47 shows some similarities of expression to the zp-50 pattern, it differs from zp-12 and zp-23 expression. For example, zp-47 and zp-50 are strongly expressed in the cerebellum whereas the other two genes are not. Although zp-12 and zp-23 show in general very similar expression profiles, there are also differences, e.g. in the telencephalon where zp-23 expression is more widespread. Besides their predominant expression in the CNS, zp-12 and zp-23 transcripts were detected at notable levels also in the pronephric duct (Fig. 3B and D).
After the discovery of the family of POU genes an initial attempt was made to group the family members into four subclasses based on differences in their POU domains (28). The discovery of additional POU genes led to the expansion of this scheme to six different subclasses (7,9). Within any of these classes, genes show high similarity within the POU domain. In addition, a limited degree of homology among class members is also found outside of the DNA binding region, especially among vertebrate genes. While the majority of known POU domain genes can be easily fitted into one of the six subclasses, the sequence comparison in Figure 4 shows that some genes cannot be easily assigned to a particular class. These include the zebrafish pou-2 gene (12,13), the Xenopus oct-25, oct-60, oct-79 and oct-91 genes (29-31), the rat sprm-1 gene (32), and the nematode ceh-18 gene (33). Similarly, the partial sequences of the planarian dtpou-2 and djpou-2 POU domains deviate considerably in their homeodomains from other POU genes (34,35) and might thus have to be grouped into a separate class, at least until the evolutionary relationships of invertebrate POU genes are better understood.
We thank Robert Riggleman, Kathryn Helde and David Grunwald for the cDNA libraries and Andreas Vogel for comments on the manuscript. T.G. is a START fellow of the Swiss National Science Foundation. G.H. was in part supported by a short-term fellowship of the Sandoz-Stiftung zur Förderung der medizinisch-biologischen Wissenschaften. This work was supported by the Swiss National Science Foundation and the cantons of Basel-Stadt and Basel-Land.
*To whom correspondence should be addressed. Tel: +41 61 267 20 72/64; Fax: +41 61 267 20 78; Email: Gerster@ubaclu.unibas.ch
REFERENCES