Nucleic Acids Research, 2003, Vol. 31, No. 22 6561-6569
© 2003 Oxford University Press
Deletion in a (T)8 microsatellite abrogates expression regulation by 3'-UTR
Laboratory of Cancer Genetics and 1 Laboratory of Gene Transfer and Therapy of the Institute for Cancer Research and Treatment, University of Torino Medical School, SP 142, Km 3.95, 10060 Candiolo, Turin, Italy and 2 Department of Clinical and Biological Sciences of the University of Torino, Az. Osp. S. Luigi, Regione Gonzole 10, 10043 Orbassano, Turin, Italy
*To whom correspondence should be addressed. Tel: +39 11 9933343; Fax: +39 11 9933524; Email: mariaflavia.direnzo{at}ircc.it
Received August 4, 2003; Revised September 11, 2003; Accepted September 24, 2003
ABSTRACT
A high level of genetic instability might cause mutations to accumulate in tumours. Microsatellite instability (MSI), due to defects of the DNA mismatch repair system, affects in particular repeat sequences (microsatellites) scattered throughout the genome. By scanning transcriptome databases, we found that microsatellites in the human genome are less numerous in coding DNA than in the 3'-untranslated region (UTR), known to mediate control of gene expression. By mutation analysis, we identified a 1 bp deletion in a (T)8 microsatellite embedded in the 1801 nucleotide long 3'-UTR of the CEACAM1 gene, thought to be involved in tumour onset and progression. By lentiviral vector-mediated gene transfer, we showed that the wild-type but not the mutated CEACAM1 3'-UTR greatly decreased transgene expression at both mRNA and protein level. Messenger RNA abundance was fully regulated by the most 3' region of the CEACAM1 3'-UTR. This region includes the (T)8 microsatellite but no known classified regulatory element. These data show that the CEACAM1 3'-UTR contains non-canonical elements contributing to mRNA regulation, among which a short repeat sequence could play a critical regulatory function. This suggests that, in cancer cells, a single mutation in a short 3'-UTR microsatellite might strongly affect gene expression.
INTRODUCTION
Microsatellites are repeat nucleotide sequences distributed throughout the genome. In cancer, microsatellites might be affected by genetic instability arising from inactivation of DNA mismatch repair genes, leading to an increased rate of mutation (reviewed in 1). This repair gene defect gives rise to instability at nucleotide level, because naturally occurring replication errors cannot be repaired effectively. Tracts of repeat sequences are particularly vulnerable to mutations and this genetic defect of cancer cells is known as microsatellite instability (MSI) as its detection is easier at these sequences (reviewed in 2).
The accurate replication of microsatellites located in coding DNA is obviously critical to gene function. MSI is the hallmark of tumours arising in Hereditary non-Polyposis Colorectal Cancer syndrome (3), one of the commonest hereditary cancer syndromes, and it is also observed in 10–15% of the sporadic colorectal (4), gastric (5) and endometrial (6) cancers; thus, coding microsatellite mutations have been thoroughly studied. This was done using candidate-gene approaches (reviewed in 7,8), i.e. studying sequences relevant to critical cellular pathways, and genome-wide scanning, taking advantage of the huge amount of sequence information contained in public databases. Several mutations were found in genomic DNA of cancer cells showing MSI (MSI+ cancers).
As mutations in expressed genes are more likely to be functional, we developed a procedure for the systematic identification of mutant repeat-containing expressed sequences [Amplification of Repeat-containing Transcribed Sequences, ARTS (9)]. This method allowed identification of a series of mutated mRNAs in MSI+ cancer cells. Beside mutations in transcript coding sequences, we found several base deletions and insertions within microsatellites in transcript 3'-untranslated regions (UTRs) [(9) and M.Olivero, unpublished results]. This prompted us to understand the importance of mutation at 3'-UTR microsatellites.
Microsatellites in non-coding DNA are commonly polymorphic at genomic level, indicating both a high rate of spontaneous mutations and a low grade of functional significance. However, among non-coding DNA regions, both 3'- and 5'-UTRs, flanking the coding DNA sequence (CDS), are regulatory and their function could be affected by mutation. In a genomic scanning of normal individual DNAs for mononucleotide repeats of 15–32 bp in length, it was found that these microsatellites are more monomorphic in 3'- and 5'-UTRs of coding sequences compared with those dispersed in other non-coding DNA regions (10). This suggested that their conservation was due to selective pressure related to possible functional roles. It has also been reported that in cancers showing MSI, repeats of >15 bp length are mutated with a frequency proportional to length. However, in the same study (10), mutations in long repeats located near classified 3'-UTR regulatory elements (11) were not found. Therefore, it was possible to conclude that in MSI+ cancers, mutations at 3'-UTR microsatellites are less likely to be passenger or bystander alterations, i.e. consequences of the generalized instability. Mutations at transcript 3'-UTRs could be important, but their functional consequences have never, as yet, been explored.
The mean 3'-UTR length in human transcripts is >500 nucleotides, nearly four times longer than the mean 5'-UTR (11). In addition, while 5'-UTR length is comparable in all species, 3'-UTR length varies considerably, having undergone a strong evolutionary expansion from fungi to humans (reviewed in 12). The extended 3'-UTR length provides significant potential for transcript-specific regulation originating at this region. A number of mutations at 5'-UTRs have been associated with human disease (reviewed in 13), including cancer. Several regulatory elements have been identified in gene 3'-UTRs (reviewed in 12) and are considered as potential hotspots for pathology (reviewed in 14). However, the importance of 3'-UTR mutations is less understood, although a few mutations have been identified (reviewed in 15). None of the mutations already described involves repeat sequences.
In this paper, we report a genome-wide in silico scan of mononucleotide repeats in the human transcriptome, which shows that microsatellites are more frequent in the 3'-UTR than in coding sequences of transcripts. We analyzed the consequences of a 1 bp deletion in a single (T)8 microsatellite of the 1801 nucleotide long 3'-UTR of CEACAM1, formerly known as CD66a or biliary glycoprotein 1, an adhesion molecule deregulated in several cancer cells (reviewed in 16). Although this 3'-UTR does not contain any canonical regulatory sequence, we found that it regulates mRNA abundance and that the (T)8 microsatellite is a key component of this regulation.
MATERIALS AND METHODS
Identification of CDS and 3'-UTRs containing mononucleotide repeats
The analysis was performed on the human reference sequence file (hs.fna, hs.gbff files) (17) downloaded on 6 May 2003 from ftp.ncbi.nih.gov. Using a PERL script, 18 807 CDSs and 17 797 3'-UTRs were extracted. The numbers of CDSs and 3'-UTRs containing at least one mononucleotide repeat ranging from 6 to 15 bases were also determined by PERL scripts. The 3'-UTR poly(A) sequences were not taken into account.
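The repeat search described above can be sketched as follows. This is a minimal illustration in Python, not the authors' PERL scripts, which are not reproduced in the text. The function names are hypothetical, and treating the poly(A) sequence as the run of A residues at the 3' end of the UTR is a simplifying assumption.

```python
import re

# Matches one maximal run of a single repeated base (A, C, G or T).
RUN = re.compile(r"(A+|C+|G+|T+)")

def mononucleotide_repeats(seq, min_len=6, max_len=15):
    """Return (base, length, start) for each maximal single-base run
    whose length lies between min_len and max_len, inclusive."""
    hits = []
    for m in RUN.finditer(seq.upper()):
        length = m.end() - m.start()
        if min_len <= length <= max_len:
            hits.append((m.group()[0], length, m.start()))
    return hits

def strip_poly_a_tail(utr):
    """Assumption: the poly(A) sequence is the trailing run of A residues."""
    return utr.upper().rstrip("A")

def count_with_repeat(seqs, min_len=6, max_len=15):
    """Number of sequences that contain at least one qualifying repeat."""
    return sum(1 for s in seqs if mononucleotide_repeats(s, min_len, max_len))
```

Applying `count_with_repeat` separately to the extracted CDS and 3'-UTR sets, with `min_len` raised step by step, would give the kind of length-dependent comparison the scan is designed to produce.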
Identification of genes down regulated in colorectal cancer cells
Using the tool available at http://www.ncbi.nlm.nih.gov/SAGE/sagexpsetup.cgi, normal colon (NC1 and NC2) tissue SAGE tags were selected, together with those of other normal tissues (brain, cerebellum, breast and prostate). 782 tags found at least eight times over-expressed in normal colon tissues versus other tissues were defined as colon specific. Colon specific tags were associated with their unigene loci using a PERL script. Redundancies and unreliable tags were removed, yielding 576 tags. Tags were then associated with their expression level in tumour colon tissues and HCT116 colon cell line. Ninety-one transcripts were selected, as they showed a relatively homogeneous expression in both normal and tumour replicates (average = 5 tags, log2 NC1/NC2 < |0.6| and log2 TU98/TU102 < |0.6|) and were also at least 2-fold down-regulated in tumour and in HCT116 cell lines. Using the hs.fna and the LL_tmpl downloaded on 6 May 2003 from ftp.ncbi.nih.gov, we assigned CDSs and 3'-UTRs to 68 of these transcripts (the full list is given as Table S1 in the Supplementary Material).
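The selection criteria above can be expressed as a small filter. The sketch below is illustrative only: the pseudocount used to avoid division by zero, the reading of the average threshold as "at least 5 tags in normal colon", and the function and parameter names are all assumptions not stated in the text.

```python
import math

def log2_ratio(a, b, pseudo=0.5):
    # The pseudocount is an assumption; it keeps the ratio defined for absent tags.
    return math.log2((a + pseudo) / (b + pseudo))

def is_candidate(nc1, nc2, tu98, tu102, hct116,
                 min_mean=5, max_abs_log2=0.6, min_fold=2.0):
    """Apply the homogeneity and down-regulation criteria to one tag.

    nc1, nc2    : tag counts in the two normal colon libraries
    tu98, tu102 : tag counts in the two tumour libraries
    hct116      : tag count in the HCT116 cell line library
    """
    normal_mean = (nc1 + nc2) / 2
    tumour_mean = (tu98 + tu102) / 2
    # Replicates must agree within the log2 window in both tissue types.
    homogeneous = (abs(log2_ratio(nc1, nc2)) < max_abs_log2 and
                   abs(log2_ratio(tu98, tu102)) < max_abs_log2)
    # Assumed reading of the average threshold: at least min_mean tags in normal colon.
    expressed = normal_mean >= min_mean
    # At least min_fold lower in tumour tissue and in the cell line.
    down_tumour = normal_mean >= min_fold * tumour_mean
    down_cell_line = normal_mean >= min_fold * hct116
    return homogeneous and expressed and down_tumour and down_cell_line
```

The counts are assumed to be normalized to a common library size before the filter is applied.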
Cell lines
The following colorectal carcinoma-derived cell lines were used: the MSI negative SW480, SW620, HT29 and the MSI positive DLD1/HCT15, HCT116, SW48 and LoVo cell lines. The human 293T and HeLa cell lines were used to produce and to test Lentiviral vectors, respectively. Cell lines were purchased from the American Type Culture Collection (ATCC, Manassas, VA).
PCR and direct sequencing
Genomic fragments of 11 genes (listed in Table S2 of Supplementary Material) were amplified and sequenced using specific primers. Primer sequences and PCR conditions are available from the authors. To perform mutation analysis, we developed a PERL program that allowed the identification of sequence-specific intronic primers in genomic data banks by Virtual Chromosome Walking.
Real-time PCR
PCR reactions were carried out using an ABI Prism 7700 Sequence Detection System (Perkin-Elmer Applied Biosystems, Oak Brook, IL). Quantitative real-time PCR with TaqMan assay was used to measure integrated transgene copies with the following primers and TaqMan probes, based on lentiviral vector and MET gene sequences. To measure transgene copies, the forward primer was located nearby Primer Binding Site Region (PBS) (5'-TGAAAGCGAAAGGGAAACCA-3') and the reverse primer was located near the
region (5'-CCGTGCGCGCTTCAG-3') of the Lentivirus backbone; the probe was (5' VIC) (5'-AGCTCTCTCGACGCAGGACTCGG-3'). Genomic MET gene copies were measured using the forward primer 5'-TTGCCAGAGACATGTATGATAAAGAATACT-3' and the reverse primer 5'-TTTCCAAAGCCATCCACTTCA-3' and the probe (5' FAM) 5'-TGTACACAACAAAACAGGTGCAAAGCTGC-3'.
cDNA synthesis and end-point and real-time RT–PCR
Cytoplasmic RNA was isolated from cultured cells using the Concert™ Cytoplasmic RNA Reagent (Invitrogen, Carlsbad, CA). cDNA was synthesized from 5 µg of mRNA in a reaction buffer containing 1x M-MLV RT reaction buffer (Promega, Madison, WI), 2.5 µg random hexanucleotides (Invitrogen), 0.25 mM dNTPs (Invitrogen), 5 µl of M-MLV RT (H–) enzyme (Promega) in DEPC-treated H2O; the reaction was incubated for 1 h at 42°C and inactivated for 15 min at 70°C.
To evaluate the expression of 11 genes (listed in Table S2 of Supplementary Material) in the panel of colorectal carcinoma cell lines, end-point RT–PCR was performed, using specific primers and conditions that are available from the authors. Northern blot analysis was carried out on 30 µg of mRNA with the indicated probes (Fig. S1).
Real-time RT–PCR with TaqMan assay was used to measure transgene expression in the ABI PRISM 7700 Sequence Detection System (Perkin-Elmer Applied Biosystems, Foster City, CA). To detect GFP expression the following primers were used: forward primer 5'-CTACGGCGTGCAGTGCTTC-3' and reverse primer 5'-AGATGGTGCGCTCCTGGAC-3', in the presence of SYBR® Green PCR Core Reagents kit (Perkin-Elmer Applied Biosystems). GAPDH expression was measured using the Perkin-Elmer GAPDH kit.
Plasmids
To generate Lentiviral vectors by transient transfection, the three-plasmid expression system was used, as previously described (18). The three plasmids were: the packaging plasmid, pCMVR8.74 designed to provide the HIV proteins needed to produce the virus particle; the envelope-coding plasmid, pMD.G, for pseudotyping the virion with VSV-G, and the self-inactivating (SIN) transfer vector plasmid (pRRLsin.PPT.hPGK.EGFP.WPRE). The transfer vector plasmid contains the enhanced GFP marker gene driven by the human phosphoglycerate kinase promoter (hPGK). In this plasmid, the different 3'-UTR fragments were subcloned.
The CEACAM1 3'-UTR fragment of 1458 bp was amplified using the following primers: forward primer 5'-ACGCGTCGACCAACCTGGACTTGTTTTAAACTTGC-3' and reverse primer 5'-TTCCGCGGCCGCTATGGCCGACGGTCGACAGACATGATCTTAGCCAGGAA-3'. The amplified fragment was first subcloned into the pCR2.1-TOPO vector (Invitrogen, Carlsbad, CA) to generate the plasmid p3'UTRCEACAM1 (5412 bp) and then into the pRRL.sin.PPT.hGFP.pre transfer vector to generate pRRL.sin.PPT.hGFP.3'UTRCEACAM1.pre. The mutant CEACAM1 3'-UTR transfer vector was obtained using a PCR-based technique as described (19) and was subcloned as above. The most 3' and 5' fragments of the CEACAM1 3'-UTR were obtained by cleavage with EcoNI and subcloning as above. The unrelated 3'-UTR transgene was obtained by cleavage of a MET cDNA 1384 bp fragment with AccI and subcloning as above.
Lentiviral vector production
Vector stocks were produced by calcium phosphate transient transfection, co-transfecting the three plasmids in 293T human embryonic kidney cells. The calcium phosphate–DNA precipitate was allowed to stay on the cells for 14–16 h, after which the medium was replaced, collected 24 h later and filtered through 0.22 µm pore nitrocellulose filters (20).
Determination of viral titre and transduction efficiency
In order to determine the viral particle concentration of the supernatants from 293T cells, p24 antigen was analyzed by HIV-1 p24 Core profile ELISA (Abbott Diagnostics or NEN™ Life Science Products) following the manufacturers' instructions. Vector serial dilutions were performed and 10^6 cells were transduced in the presence of 8 µg/ml polybrene. Seventy-two hours after transduction, cells were analyzed by FACS (FACS Calibur, Becton Dickinson Immunocytometry Systems); CellQuest (Becton Dickinson) or WindMDI (Microsoft) software was used for data analysis. Relative mean fluorescence intensity (M.F.I.) was calculated using FACS analysis parameters. Southern blot analysis was carried out as reported elsewhere (18).
RESULTS
Search for mononucleotide repeat sequences in the human transcriptome
Using ARTS (9), we have previously studied base insertion and deletion at mononucleotide repeats in the human transcriptome with a genome-wide approach. We found several base deletions and insertions at (A/T/C/G)n>6 microsatellites in transcript 3'-UTRs [(9) and M.Olivero, unpublished results].
To assess the frequency and importance of mononucleotide repeats in transcript 3'-UTRs, we carried out a systematic search of transcripts containing mononucleotide repeats of 6–15 bases. The frequencies of mononucleotide repeats over 15 bases long in 3'-UTRs have already been calculated in cDNA sequences contained in GenBank (10). It was reported that a total of 35 poly(A) and poly(T) sequences, between 15 and 32 bp, were located no more than 2 kb from coding sequences either upstream or downstream.
We found that the frequencies of CDSs and 3'-UTRs containing at least one (A)6, (T)6, (G)5–8 and (C)5–8 are similar (Fig. 1). Frequencies became increasingly different for longer repeats. As an example, CDSs containing either (A)>8 or (T)>8 are more than one log less frequent than 3'-UTRs containing repeats of the same length.
Sequence analysis of mononucleotide repeats in human transcripts
We hypothesized that a mutation at 3'-UTR mononucleotide repeats can contribute to expression regulation in cancer. As a model, we studied colorectal cancer, as MSI is the mark of hereditary non-polyposis colorectal cancer and is present in 10–15% of the sporadic counterpart (4). Therefore, from the transcriptome databases generated by means of serial analysis of gene expression (SAGE), we extracted a list of genes whose expression was detectable in normal colorectal tissues and changed in colorectal cancer samples and cell lines. SAGE data are particularly useful as one can use expression data of normal colonic tissues as the referral pool and those of carcinoma cells as experimental samples. In addition, SAGE analysis of colorectal cancer cells showing MSI is available.
We compared the list of mononucleotide repeat-containing transcripts (see above) with the list of colorectal cancer cDNAs identified by tags sequenced in SAGE analysis. The result of this in silico analysis was a list of mononucleotide repeat-containing transcripts that are differentially expressed in colorectal carcinoma cells versus normal colon tissues.
First, we considered transcripts down-modulated in colorectal carcinoma cells as 3'-UTR sequences are mainly described as de-stabilizing mRNA. We took into account transcripts that are >2 times under-represented in cancers versus normal tissues, were also consistently down-modulated in MSI+ cancer cells and corresponded to known genes. Second, by comparing the latter list of down-modulated transcripts with that of mRNAs containing mononucleotide repeats, we identified 68 transcripts (the full list is given in Table S1 in the Supplementary Material).
We confirmed the low or absent expression of 11 genes (the list is given in Table S2 in the Supplementary Material) in MSI+ (HCT116, HCT15/DLD1, SW48 and LoVo) and MSI– (SW620/SW480) colorectal cancer cell lines using end-point RT–PCR (data not shown). Expression of one of these genes (CEACAM1) in the above cell lines was also studied by northern blot analysis; data are shown in Figure S1 in the Supplementary Material. These results suggested that loss of these 11 genes, expressed in normal colorectal tissues, could be relevant to colorectal tumourigenesis. Subsequently, we carried out mutation analysis of the coding microsatellites of these 11 transcripts in the above panel of cell lines; this analysis was performed to rule out the possibility that mutations in coding microsatellites, in particular frameshift mutations, could be responsible for transcript aberrancy leading to degradation. By means of PCR and direct sequencing, we analyzed the 46 repeats found in the CDSs of the 11 genes (the list is given in Table S2 in the Supplementary Material), which include mono-, di- and tri-nucleotide repeats, in the above listed MSI+ and MSI– colorectal cell lines. We did not find mutations in the coding microsatellites of the 11 transcripts.
To assess if 3'-UTRs of transcripts containing mononucleotide repeats could regulate gene expression, we analyzed the 3'-UTR sequences of the 11 transcripts. We took advantage of the annotated 3'-UTR databases, which list either mRNAs containing AU-rich elements (ARE) (21,22) or the mRNAs whose putative regulatory motifs are highlighted (11). AREs are the best-studied signals in the 3'-UTR; they target a variety of labile mRNAs for degradation and for translation blockade (reviewed in 23). AREs are repeat-containing elements. By means of in silico analysis, we found that only the villin 2 (ezrin) transcript 3'-UTR contains a cluster V-type ARE sequence. We did not find mutation of this ARE sequence in MSI+ cells (data not shown). The other above-mentioned deregulated genes do not give rise to canonical ARE-containing transcripts.
As the ARE database contains only genes with adenylate- and uridylate-rich elements organized in specific base stretches, we subsequently evaluated if the 3'-UTRs of the 11 deregulated genes contain any of the common motifs evidenced in 3'-UTR databases (11). We found that none of these genes contain any known putative regulatory elements. We then examined mononucleotide repeats of the 11 transcripts 3'-UTRs, as listed in Table S2 in the Supplementary Material and found no mutations in most of them. We found that the (T)8 sequence of the 1801 bp long 3'-UTR of CEACAM1 gene (Fig. 2A) showed a heterozygous 1 bp deletion in the MSI+ LoVo colorectal cancer cell line.
Regulation of protein expression by the non-mutated and 1 bp deleted 3'-UTR from CEACAM1 gene
To assess if mutation in the (T)8 microsatellite of CEACAM1 transcript 3'-UTR could influence gene expression, we added 1458 bp (Fig. 2B) of this 1801 bp long 3'-UTR sequence to a reporter gene cDNA. The aims were to evaluate if this transcript 3'-UTR contributes to gene expression regulation and if deletion at its microsatellite could influence the possible regulation.
We studied reporter gene expression using transgene transfer by means of Lentiviral vectors (LVs). A sequence complementary to 1458 bp of CEACAM1 wild-type (wt) 3'-UTR, including the (T)8 microsatellite, was added as 3'-UTR to the green fluorescence protein (GFP) cDNA in third generation LVs. Using site-specific mutagenesis, we also obtained vectors carrying the GFP cDNAs followed by the same but 1 bp deleted (mut) sequence. As a control construct, 1384 unrelated (unrel) base pairs were added to GFP cDNA at 3'-UTR to space out other LV regulatory sequences, like Woodchuck PRE (24).
Due to a deletion in the LTR promoter region, integrated LVs are transcribed from the internal promoter that is present in the expression cassette (25). For this we used the human phosphoglycerate kinase 1 (hPGK) gene promoter; this gene is expressed ubiquitously in human tissues. hPGK promoter-driven GFP expression was analyzed by fluorescent activated flow cytometry. The latter measures both the percentage of GFP expressing cells and the mean fluorescence intensity (MFI) of positive cells.
Both LoVo cells, showing MSI, and MSI negative HT29 colorectal cancer cells were transduced. Cells were left in monolayer culture and analyzed for GFP expression 3 and 21 days after transduction. Superimposable results were obtained in the two cell lines and at the two time points of analysis; this was expected as LVs allow stable and long-term transgene expression. To demonstrate the suitability of the two cell lines for transduction, experiments with LVs carrying only the GFP reporter gene were carried out. Transgene copies were integrated at multiple sites in all cells at the highest vector input, as demonstrated by the percentage of GFP positive cells (data not shown). By decreasing vector transduction units, the percentage of GFP positive cells decreased (data not shown).
To study regulation driven by CEACAM1 3'-UTR, cells were transduced with serial dilutions of vector preparations carrying the GFP cDNA followed by the three 3'-UTRs described above. Transduction efficiencies of vector preparation were quantitated using end-point titration on HeLa cells (data not shown). In each experiment, vectors showing similar titers were used. To compare GFP expression modulation by the different 3'-UTRs, vector preparations were diluted to obtain identical transducing unit concentrations. At each transducing unit concentration assayed, the wt 3'-UTR led to GFP expression in a lower percentage of cells than control transgene including the unrel 3'-UTR sequence (Fig. 3). On the other hand, the mut 3'-UTR failed to reduce GFP expression comparably (Fig. 3). At maximal frequency of transduction, achieved at vector dilutions resulting in 100% positive cells showing a plateau level of GFP expression, MFI of cells expressing the transgenes with either the mut or the unrel 3'-UTR were comparable (data not shown). On the other hand, the MFI of cells transduced with the wt 3'-UTR transgene was one log lower.
Regulation of mRNA abundance by the non-mutated and 1 bp deleted 3'-UTR from CEACAM1 gene
GFP expression does not stringently reflect mRNA regulation, as the protein is quite stable (approx. half-life 50 h) and accumulates in cells after transduction. Therefore, to better characterize expression modulation driven by transgenes carrying the different sequences at 3'-UTR, 3 and 21 days after transduction, both the integrated transgene copies and mRNA levels were measured using quantitative RT–PCR with TaqMan assay.
By serially diluting vectors, we obtained a linear relationship between vector concentration and the number of integrated transgene copies (data not shown). In fact, at low vector concentrations, LVs allow integration of a number of transgene copies linearly related to the number of vector particles used (26). Transgenes were randomly integrated at different sites, as demonstrated by Southern blot analysis of cell genomic DNA digested with a restriction enzyme showing a unique site in the transgene sequence (data not shown).
In cells transduced with LVs, the amount of mRNA transcribed from each copy of integrated vector DNA was linear (Fig. 4A). This was expected as it has been shown that every transduced LV genome is competent for transcription (27).
The amounts of transgene mRNAs carrying different 3'-UTRs were compared: equal numbers of integrated transgene copies gave different amounts of detectable mRNA (Fig. 4B). The level of mRNA with the wt 3'-UTR was lower than that of mRNA with the unrel 3'-UTR: it was 4-fold lower for each integrated transgene copy. Mutation at the 3'-UTR microsatellite completely reverted the regulation of mRNA exerted by the wt 3'-UTR. Differences in mRNA abundance were more striking than those in GFP expression measured by flow cytometry (see above), as predicted due to GFP stability and accumulation.
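A per-copy comparison of this kind can be sketched as a ratio of slopes. The code below is a hypothetical illustration, not the authors' analysis: it assumes that relative mRNA amounts and integrated copy numbers are available as paired measurements, and it fits a least-squares line through the origin, which is itself an assumption (no integrated copies, no transcript).

```python
def slope_through_origin(copies, mrna):
    """Least-squares slope of mRNA amount versus integrated copy number,
    constrained to pass through zero."""
    num = sum(c * m for c, m in zip(copies, mrna))
    den = sum(c * c for c in copies)
    return num / den

def fold_reduction(copies_ref, mrna_ref, copies_test, mrna_test):
    """Ratio of mRNA produced per integrated copy: reference over test construct."""
    ref = slope_through_origin(copies_ref, mrna_ref)
    test = slope_through_origin(copies_test, mrna_test)
    return ref / test
```

With the unrel construct as reference and the wt construct as test, a returned value of 4 would correspond to the 4-fold per-copy difference described above.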
To localize the elements contributing to CEACAM1 mRNA regulation, we examined separately the most 5' and the most 3' fragments of both the wild-type and mutated CEACAM1 3'-UTRs (Fig. 2B). The most 3' fragment of 598 bases fully carried out the regulatory function of the wt 3'-UTR (Fig. 4C). The mutated fragment (597 bases, i.e. carrying the 1 bp deletion), which therefore includes a (T)7 microsatellite, did not comparably reduce the amount of GFP mRNA, confirming that this mutation impairs the regulatory function of the CEACAM1 3'-UTR.
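The relationship between the wild-type and mutated fragments can be illustrated with a short helper that removes one base from a homopolymer tract. This is a sketch only: the function name is hypothetical, and the flanking bases in the usage example are placeholders, not the CEACAM1 sequence.

```python
import re

def delete_one_from_run(seq, base="T", run_len=8):
    """Remove one base from the first run of `base` that is exactly
    run_len long; return seq unchanged if no such run exists."""
    # Lookbehind and lookahead ensure the run is not part of a longer tract.
    pattern = re.compile(r"(?<!%s)%s{%d}(?!%s)" % (base, base, run_len, base))
    m = pattern.search(seq)
    if m is None:
        return seq
    return seq[:m.start()] + seq[m.start() + 1:]

# Placeholder flanks around a (T)8 tract, for illustration only.
wt = "GCA" + "T" * 8 + "CAG"
mut = delete_one_from_run(wt)  # contains (T)7 and is one base shorter
```

Applied to a 598-base fragment containing a single (T)8 tract, the same operation would yield a 597-base fragment with a (T)7 tract.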
The MFOLD computer program was used to predict the secondary structure of the wt and mut CEACAM1 3'-UTRs. The best-folded structure showed a stability of –544 kcal. The stability of the mut version differed slightly (–542.4 kcal). Differences were found in the stem–loop formed by the 35 bases which include the (T)8 microsatellite (1059–1094, Fig. 2C). The presence of the T deletion caused the stem to become two bases shorter and the loop two bases longer; in addition, a three-base bulge appears three bases downstream from the loop. It is worth noting that the structures of both the wt and mut CEACAM1 3'-UTR were predicted to be conserved in the most 3' 598 bases that maintained the full regulatory activity (data not shown).
Regulation of CEACAM1 mRNA abundance was confirmed using northern blot and RT–PCR analyses of CEACAM1 transcripts in colorectal cancer cell lines (Fig. S1 in the Supplementary Material).
DISCUSSION
Data reported in this work show that, unexpectedly, a short repeat sequence carries out an important regulatory function in a 1801 bp long CEACAM1 3'-UTR, which does not contain any of the regulatory elements already classified. This suggests that gene expression changes in cancer cells might be due to point mutation in 3'-UTR regulatory sequences. Mutations could be particularly frequent in cancer cells where genetic instability at nucleotide level affects repeat sequences (MSI). It is noteworthy that MSI is not only the mark of hereditary non-polyposis colorectal cancer (3), one of the commonest hereditary cancer syndromes, but is also present in common sporadic cancers. It is detected in
15% of the sporadic colorectal cancers (4), and is present at variable frequency in cancer of the uterus (6), urinary tract (28) and stomach (5). To understand the importance of microsatellite mutation in 3'-UTR, we hypothesized that mutation could influence gene expression. In fact, several elements have been identified in gene 3'-UTRs leading to gene expression regulation (reviewed in 12). Some of them are made of repeat sequences. These include AU-rich elements found in many short-lived mRNAs (21), which bind proteins mediating stabilization; the signals that regulate mRNA localization which hinder compartment-specific degradation (reviewed in 29) and the signals that regulate end-to-end mRNA interaction leading to circularization (reviewed in 12), through which 3'-UTR-binding proteins join 5'-UTR binding factors involved in regulating translation. In all cases, either the amount of mRNA or the level of the encoded protein varies.
First, we performed in silico analyses, taking advantage of the huge amount of information available in public databases. Mononucleotide containing transcripts were extracted from messenger RNA databases. To select potentially regulated transcripts in MSI+ colorectal cancers, we analyzed expression databases (SAGE). We focused on those genes that are frequently down-modulated in colorectal cancer versus normal tissues and are also affected in colorectal cancer cells showing MSI (30). We were mainly interested in gene expression decrease, as most of the already known 3'-UTR regulatory sequences affect mRNA stability. We focused attention on genes containing mononucleotide repeats that are under-expressed in MSI+ colorectal cancers. We first ruled out the possibility that their reduced expression was due to mutations in coding microsatellites. In fact, it is known that instability at repeat sequences frequently leads to frameshift mutation and that the latter might result in aberrant transcripts degraded by the surveillance system (reviewed in 31). Considering that important microsatellites should be located in regulatory domains of 3'-UTR, we also explored 3'-UTR databases before performing mutation analysis. None of the selected transcripts, i.e. transcripts that contain microsatellites and are down-modulated in colorectal cancer, showed classified regulatory sequences (11) in their 3'-UTRs. Therefore we hypothesized that microsatellites in still unknown elements could contribute to expression regulation. We selected transcripts containing mononucleotide repeats in their 3'-UTRs. Mutation analysis identified a 1 bp deletion in a (T)8 microsatellite of the CEACAM1 gene 3'-UTR in MSI+ cancer cells, but there were no mutations in the microsatellites of the other ten 3'-UTR transcripts. Although limited, these data give an insight into the total mutational load of mismatch repair deficient cells. 
It is generally thought that the mutator phenotype of MSI+ cancer cells generates massive genomic variation, demonstrated by observing the appearance (gains) and disappearance (losses) of bands, corresponding to PCR products, obtained by randomly amplifying sequences from MSI+ tumours and matched normal tissues. Our results imply that either random mutation is less frequent than predicted or that most of these mutations are not selected during tumour progression. Altogether data might suggest that mutation in microsatellites of expressed sequences does not contribute to expression down-modulation in MSI+ colorectal cancer cells.
Here we report that the CEACAM1 gene 3'-UTR can regulate this gene expression and that the apparently anonymous (T)8 microsatellite, embedded in this 1801 bp long 3'-UTR sequence, participates in regulation. CEACAM1 is a member of the human carcinoembryonic antigen (CEA) gene family, composed of 29 genes expressed in the epithelium and endothelium of various tissues and further subdivided into two families (reviewed in 32,33). The CEACAM subgroup members belong to the immunoglobulin superfamily of adhesion molecules. The expression pattern, relationship to the immunoglobulin superfamily and presence of signal transduction motifs in the cytoplasmic domain suggest potential and diverse functions for CEACAM1 as an adhesion molecule and as a regulator of signal transduction. CEACAM1 is also an important regulator of insulin action and mediates Neisseria infection. As far as human cancer is concerned, a variety of functions that may be important in cell growth, some apparently contradictory to each other, have been ascribed to CEACAM1. On one hand, in several cases, CEACAM1 expression is lower in tumour tissue compared to normal tissue. These findings, and the observation that CEACAM1 is a negative regulator of tumour cell growth in some cancer models, suggested that it might be a tumour suppressor gene (34,35). On the other hand, more recent reports suggest that CEACAM1 is a malignancy-promoting molecule and a prognostic marker of poor outcome in different human cancers (reviewed in 16) including colorectal cancer (36,37). CEACAM1 could contribute to tumour progression (38,39) by different mechanisms. CEACAM1 is a highly glycosylated membrane bound protein and is the main carrier of the selectin-binding carbohydrate group sialyl Lewis X (40). The CEACAM1–selectin interaction is thought to be involved in the metastatic cascade. CEACAM1 is also a potent pro-angiogenic factor and a major effector of VEGF (41), known as a tumour pro-angiogenic factor.
In addition, it has been shown that CEACAM1 binds both SHP-1 and 2 phosphatases (42,43). The latter is expressed in epithelial cells and is, unexpectedly, a mediator of integrin, growth factor and cytokine activation of SRC, RAS and ERK signalling (reviewed in 44) and thus might contribute to tumour cell proliferation and invasiveness. It is known that CEACAM1 gene expression in normal and cancer tissues is regulated at the transcriptional level. By means of in vitro experiments, we report here that the full 3'-UTR of CEACAM1 transcript strongly down-modulated mRNA abundance in transduced colorectal cancer cells. The finding that mutation within the CEACAM1 3'-UTR microsatellite reverts down-modulation is in line with its possible role as a molecule contributing to tumour progression.
The data presented here demonstrate that previously unclassified 3'-UTR sequences strongly modulate mRNA abundance. This result was obtained using LVs, which allow transfer of a reporter cDNA and its random integration in the entire population of cultured cells, as LVs are also able to transduce non-dividing cells. Random and multiple integrations allowed us to rule out an effect of integration sites on expression variation. In addition, we show that there was a constant linear relationship between the number of integrated copies and the amount of mRNA, which is a particular advantage of transgene expression achieved by LV gene transfer. Transgene expression therefore depends only on its intrinsic regulation, which in our experiments depends only on the 3'-UTR, as reporter gene expression was always driven by the same internal promoter. Using a housekeeping gene promoter, we obtained regulation at a low level of transgene expression, which prevented transgene-mediated toxicity (45). By means of quantitative RT-PCR with the TaqMan assay, we could establish precisely the relative amount of mRNA accumulated in cells. This allowed us to dispense with assays based on transgene expression, which have low sensitivity and are feasible only above a threshold level (45).
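The normalisation of mRNA amount to the number of integrated copies can be sketched as follows. This is a minimal illustration only: it assumes the comparative Ct (2^-ddCt) calculation with 100% amplification efficiency, which the text does not specify, and the function names and Ct values are hypothetical rather than taken from this study.

```python
def relative_mrna(ct_target, ct_reference, ct_target_cal, ct_reference_cal):
    """Relative mRNA amount versus a calibrator sample.

    Assumes the comparative Ct method and 100% PCR efficiency
    (an assumption of this sketch, not stated in the text).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_cal - ct_reference_cal
    return 2.0 ** -(delta_sample - delta_calibrator)


def mrna_per_copy(relative_amount, integrated_copies):
    """Normalise a relative mRNA amount to the number of integrated vector copies."""
    if integrated_copies <= 0:
        raise ValueError("integrated_copies must be positive")
    return relative_amount / integrated_copies


# Hypothetical Ct values, for illustration only.
rel = relative_mrna(ct_target=27.0, ct_reference=18.0,
                    ct_target_cal=24.0, ct_reference_cal=18.0)
print(rel)                    # 0.125
print(mrna_per_copy(rel, 2))  # 0.0625
```

Dividing by copy number is only meaningful if mRNA amount scales linearly with the number of integrated copies, as reported above.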
The 1801 bp long CEACAM1 3'-UTR does not contain any of the previously described sequences that regulate mRNA abundance, such as those reported in the ARE database (22), in the 3'-UTR databases (11) and in the many studies of RNA-binding proteins that potentially regulate mRNA abundance.
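A screen of this kind can be illustrated with a short sketch. It searches only for the AUUUA core pentamer of AU-rich elements, used here as a simplified stand-in for the more elaborate patterns in the cited databases; the function name and the test sequences are hypothetical and are not the CEACAM1 3'-UTR.

```python
import re


def find_motif(sequence, motif="AUUUA"):
    """Return 0-based start positions of all (possibly overlapping) motif matches.

    DNA input is accepted: T is converted to U before searching.
    """
    rna = sequence.upper().replace("T", "U")
    # A zero-width lookahead lets overlapping occurrences be reported.
    pattern = re.compile("(?=" + re.escape(motif) + ")")
    return [match.start() for match in pattern.finditer(rna)]


# Made-up sequences, for illustration only.
print(find_motif("GCAUUUAUUUACCGGUUUUUUUUCC"))  # [2, 6]
print(find_motif("GGCCUUUUUUUUGG"))             # []
```

An empty result for a given pattern says only that this particular motif is absent; it does not exclude other, unclassified regulatory elements.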
We could attribute a regulatory function to a short (T)8 microsatellite. In general, simple uridine-rich sequences have been reported to bind regulatory proteins in only a few cases. In Drosophila, the uridine-rich polypyrimidine tract of the tra gene mRNA, which is not base-paired, binds the Sex-lethal protein, which plays a key role in sex determination (46). A poly-U stretch different from classical AU-rich elements mediates regulation of insulin-like growth factor binding protein expression (47). Surprisingly, we found that a single uridine-rich sequence carries out regulatory functions. It has been stressed that multiple AU- or U-rich elements in a 3'-UTR are required for full regulation of mRNA (reviewed in 21,48). However, the importance of each of these canonical elements has been emphasized by experimental mutagenesis of single nucleotides.
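To make the notion of a short mononucleotide repeat and its 1 bp deletion concrete, the sketch below locates runs of a single base and removes one nucleotide from a run. The function names, the length threshold and the toy sequence are assumptions made for illustration; they do not reproduce the database scan or the mutation analysis used in this work.

```python
import re


def mononucleotide_runs(sequence, base="T", min_length=8):
    """Return (start, length) for every run of `base` at least `min_length` long."""
    pattern = re.compile(re.escape(base) + "{%d,}" % min_length)
    return [(m.start(), len(m.group())) for m in pattern.finditer(sequence.upper())]


def delete_one_base(sequence, position):
    """Return the sequence with the nucleotide at 0-based `position` removed."""
    return sequence[:position] + sequence[position + 1:]


# Toy sequence containing a (T)8 run; not the CEACAM1 3'-UTR.
wild_type = "ACGTTTTTTTTGCA"
print(mononucleotide_runs(wild_type))              # [(3, 8)]

mutant = delete_one_base(wild_type, 3)             # 1 bp deletion inside the run
print(mononucleotide_runs(mutant))                 # []
print(mononucleotide_runs(mutant, min_length=7))   # [(3, 7)]
```

After the deletion the run is reported as (T)7, which is the kind of change a microsatellite-unstable cell would be expected to introduce.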
It is known that 3'-UTRs are highly diverse in sequence. Their lengths indicate that several regulatory elements should be present, although only a few have been classified so far. In this work, we highlight the regulatory functions of a 3'-UTR that does not contain any of the already described elements but includes an apparently anonymous (T)8 microsatellite that can be targeted by genetic instability in cancer cells. Because common mRNA motifs have already been thoroughly studied, this sequence is of interest for the development of RNA-based therapy aimed at targeting single or related genes. The recent discoveries of the multiple physiological roles of small RNAs (reviewed in 49) further emphasise the importance of identifying functionally active motifs that differ from the well-known canonical ones. Those located in 3'-UTRs are particularly attractive, as they can be targeted to regulate expression.
| SUPPLEMENTARY MATERIAL |
|---|
Supplementary Material is available at NAR Online.
| ACKNOWLEDGEMENTS |
|---|
We thank Enzo De Sio, Lucia Sergi Sergi and Raffaella Albano for technical help, and Elaine Wright for reading the English. This work was supported by the Italian Ministry of Research and Education (MIUR) Cofin and FIRB project funding to M.F.D. and R.C. and the Italian Association for Cancer Research (AIRC) funding to M.F.D.
| REFERENCES |
|---|
1. Lengauer,C., Kinzler,K.W. and Vogelstein,B. (1997) Genetic instability in colorectal cancers. Nature, 386, 623-627.
2. Jiricny,J. and Nystrom-Lahti,M. (2000) Mismatch repair defects in cancer. Curr. Opin. Genet. Dev., 10, 157-161.
3. Aaltonen,L.A., Peltomaki,P., Leach,F.S., Sistonen,P., Pylkkanen,L., Mecklin,J.P., Jarvinen,H., Powell,S.M., Jen,J., Hamilton,S.R. et al. (1993) Clues to the pathogenesis of familial colorectal cancer. Science, 260, 812-816.
4. Peltomaki,P. (2001) Deficient DNA mismatch repair: a common etiologic factor for colon cancer. Hum. Mol. Genet., 10, 735-740.
5. Chung,Y.J., Park,S.W., Song,J.M., Lee,K.Y., Seo,E.J., Choi,S.W. and Rhyu,M.G. (1997) Evidence of genetic progression in human gastric carcinomas with microsatellite instability. Oncogene, 15, 1719-1726.
6. Risinger,J.I., Berchuck,A., Kohler,M.F., Watson,P., Lynch,H.T. and Boyd,J. (1993) Genetic instability of microsatellites in endometrial carcinoma. Cancer Res., 53, 5100-5103.
7. Duval,A. and Hamelin,R. (2002) Mutations at coding repeat sequences in mismatch repair-deficient human cancers: toward a new concept of target genes for instability. Cancer Res., 62, 2447-2454.
8. Vilkki,S., Launonen,V., Karhu,A., Sistonen,P., Vastrik,I. and Aaltonen,L.A. (2002) Screening for microsatellite instability target genes in colorectal cancers. J. Med. Genet., 39, 785-789.
9. Olivero,M., Ruggiero,T., Coltella,N., Maffe,A., Calogero,R., Medico,E. and Di Renzo,M.F. (2003) Amplification of repeat-containing transcribed sequences (ARTS): a transcriptome fingerprinting strategy to detect functionally relevant microsatellite mutations in cancer. Nucleic Acids Res., 31, e33.
10. Suraweera,N., Iacopetta,B., Duval,A., Compoint,A., Tubacher,E. and Hamelin,R. (2001) Conservation of mononucleotide repeats within 3' and 5' untranslated regions and their instability in MSI-H colorectal cancer. Oncogene, 20, 7472-7477.
11. Pesole,G., Liuni,S., Grillo,G., Licciulli,F., Mignone,F., Gissi,C. and Saccone,C. (2002) UTRdb and UTRsite: specialized databases of sequences and functional elements of 5' and 3' untranslated regions of eukaryotic mRNAs. Update 2002. Nucleic Acids Res., 30, 335-340.
12. Mazumder,B., Seshadri,V. and Fox,P.L. (2003) Translational control by the 3'-UTR: the ends specify the means. Trends Biochem. Sci., 28, 91-98.
13. Cazzola,M. and Skoda,R.C. (2000) Translational pathophysiology: a novel molecular mechanism of human disease. Blood, 95, 3280-3288.
14. Conne,B., Stutz,A. and Vassalli,J.D. (2000) The 3' untranslated region of messenger RNA: a molecular hotspot for pathology? Nature Med., 6, 637-641.
15. Mendell,J.T. and Dietz,H.C. (2001) When the message goes awry: disease-producing mutations that influence mRNA content and performance. Cell, 107, 411-414.
16. Plunkett,T.A. and Ellis,P.A. (2002) CEACAM1: a marker with a difference or more of the same? J. Clin. Oncol., 20, 4273-4275.
17. Pruitt,K.D., Tatusova,T. and Maglott,D.R. (2003) NCBI Reference Sequence project: update and current status. Nucleic Acids Res., 31, 34-37.
18. Follenzi,A., Ailles,L.E., Bakovic,S., Geuna,M. and Naldini,L. (2000) Gene transfer by lentiviral vectors is limited by nuclear translocation and rescued by HIV-1 pol sequences. Nature Genet., 25, 217-222.
19. Bardelli,A., Longati,P., Gramaglia,D., Basilico,C., Tamagnone,L., Giordano,S., Ballinari,D., Michieli,P. and Comoglio,P.M. (1998) Uncoupling signal transducers from oncogenic MET mutants abrogates cell transformation and inhibits invasive growth. Proc. Natl Acad. Sci. USA, 95, 14379-14383.
20. Follenzi,A. and Naldini,L. (2002) Generation of HIV-1 derived lentiviral vectors. Methods Enzymol., 346, 454-465.
21. Bakheet,T., Frevel,M., Williams,B.R., Greer,W. and Khabar,K.S. (2001) ARED: human AU-rich element-containing mRNA database reveals an unexpectedly diverse functional repertoire of encoded proteins. Nucleic Acids Res., 29, 246-254.
22. Bakheet,T., Williams,B.R. and Khabar,K.S. (2003) ARED 2.0: an update of AU-rich element mRNA database. Nucleic Acids Res., 31, 421-423.
23. Wilusz,C.J., Wormington,M. and Peltz,S.W. (2001) The cap-to-tail guide to mRNA turnover. Nature Rev. Mol. Cell Biol., 2, 237-246.
24. Zufferey,R., Donello,J.E., Trono,D. and Hope,T.J. (1999) Woodchuck hepatitis virus posttranscriptional regulatory element enhances expression of transgenes delivered by retroviral vectors. J. Virol., 73, 2886-2892.
25. Zufferey,R., Dull,T., Mandel,R.J., Bukovsky,A., Quiroz,D., Naldini,L. and Trono,D. (1998) Self-inactivating lentivirus vector for safe and efficient in vivo gene delivery. J. Virol., 72, 9873-9880.
26. De Palma,M. and Naldini,L. (2002) Transduction of a gene expression cassette using advanced generation lentiviral vectors. Methods Enzymol., 346, 514-529.
27. Jordan,A., Defechereux,P. and Verdin,E. (2001) The site of HIV-1 integration in the human genome determines basal transcriptional activity and response to Tat transactivation. EMBO J., 20, 1726-1738.
28. Hartmann,A., Zanardo,L., Bocker-Edmonston,T., Blaszyk,H., Dietmaier,W., Stoehr,R., Cheville,J.C., Junker,K., Wieland,W., Knuechel,R. et al. (2002) Frequent microsatellite instability in sporadic tumors of the upper urinary tract. Cancer Res., 62, 6796-6802.
29. Jansen,R.P. (2001) mRNA localization: message on the move. Nat. Rev. Mol. Cell Biol., 2, 247-256.
30. Velculescu,V.E., Zhang,L., Vogelstein,B. and Kinzler,K.W. (1995) Serial analysis of gene expression. Science, 270, 484-487.
31. Hilleren,P. and Parker,R. (1999) Mechanisms of mRNA surveillance in eukaryotes. Annu. Rev. Genet., 33, 229-260.
32. Obrink,B. (1997) CEA adhesion molecules: multifunctional proteins with signal-regulatory properties. Curr. Opin. Cell Biol., 9, 616-626.
33. Beauchemin,N., Draber,P., Dveksler,G., Gold,P., Gray-Owen,S., Grunert,F., Hammarstrom,S., Holmes,K.V., Karlsson,A., Kuroki,M. et al. (1999) Redefined nomenclature for members of the carcinoembryonic antigen family. Exp. Cell Res., 252, 243-249.
34. Luo,W., Wood,C.G., Earley,K., Hung,M.C. and Lin,S.H. (1997) Suppression of tumorigenicity of breast cancer cells by an epithelial cell adhesion molecule (C-CAM1): the adhesion and growth suppression are mediated by different domains. Oncogene, 14, 1697-1704.
35. Izzi,L., Turbide,C., Houde,C., Kunath,T. and Beauchemin,N. (1999) cis-Determinants in the cytoplasmic domain of CEACAM1 responsible for its tumor inhibitory function. Oncogene, 18, 5563-5572.
36. Nakagoe,T., Sawai,T., Tsuji,T., Jibiki,M., Nanashima,A., Yamaguchi,H., Kurosaki,N., Yasutake,T. and Ayabe,H. (2001) Circulating sialyl Lewis(x), sialyl Lewis(a) and sialyl Tn antigens in colorectal cancer patients: multivariate analysis of predictive factors for serum antigen levels. J. Gastroenterol., 36, 166-172.
37. Grabowski,P., Mann,B., Mansmann,U., Lovin,N., Foss,H.D., Berger,G., Scherubl,H., Riecken,E.O., Buhr,H.J. and Hanski,C. (2000) Expression of SIALYL-Le(x) antigen defined by MAb AM-3 is an independent prognostic marker in colorectal carcinoma patients. Int. J. Cancer, 88, 281-286.
38. Laack,E., Nikbakht,H., Peters,A., Kugler,C., Jasiewicz,Y., Edler,L., Brummer,J., Schumacher,U. and Hossfeld,D.K. (2002) Expression of CEACAM1 in adenocarcinoma of the lung: a factor of independent prognostic significance. J. Clin. Oncol., 20, 4279-4284.
39. Thies,A., Moll,I., Berger,J., Wagener,C., Brummer,J., Schulze,H.J., Brunner,G. and Schumacher,U. (2002) CEACAM1 expression in cutaneous malignant melanoma predicts the development of metastatic disease. J. Clin. Oncol., 20, 2530-2536.
40. Stocks,S.C. and Kerr,M.A. (1993) Neutrophil NCA-160 (CD66) is the major protein carrier of selectin binding carbohydrate groups LewisX and sialyl lewisX. Biochem. Biophys. Res. Commun., 195, 478-483.
41. Ergun,S., Kilik,N., Ziegeler,G., Hansen,A., Nollau,P., Gotze,J., Wurmbach,J.H., Horst,A., Weil,J., Fernando,M. et al. (2000) CEA-related cell adhesion molecule 1: a potent angiogenic factor and a major effector of vascular endothelial growth factor. Mol. Cell, 5, 311-320.
42. Huber,M., Izzi,L., Grondin,P., Houde,C., Kunath,T., Veillette,A. and Beauchemin,N. (1999) The carboxyl-terminal region of biliary glycoprotein controls its tyrosine phosphorylation and association with protein-tyrosine phosphatases SHP-1 and SHP-2 in epithelial cells. J. Biol. Chem., 274, 335-344.
43. Boulton,I.C. and Gray-Owen,S.D. (2002) Neisserial binding to CEACAM1 arrests the activation and proliferation of CD4+ T lymphocytes. Nat. Immun., 3, 229-236.
44. Neel,B.G., Gu,H. and Pao,L. (2003) The Shping news: SH2 domain-containing tyrosine phosphatases in cell signaling. Trends Biochem. Sci., 28, 284-293.
45. Lizee,G., Aerts,J.L., Gonzales,M.I., Chinnasamy,N., Morgan,R.A. and Topalian,S.L. (2003) Real-time quantitative reverse transcriptase-polymerase chain reaction as a method for determining lentiviral vector titers and measuring transgene expression. Hum. Gene Ther., 14, 497-507.
46. Handa,N., Nureki,O., Kurimoto,K., Kim,I., Sakamoto,H., Shimura,Y., Muto,Y. and Yokoyama,S. (1999) Structural basis for recognition of the tra mRNA precursor by the Sex-lethal protein. Nature, 398, 579-585.
47. Erondu,N.E., Nwankwo,J., Zhong,Y., Boes,M., Dake,B. and Bar,R.S. (1999) Transcriptional and posttranscriptional regulation of insulin-like growth factor binding protein-3 by cyclic adenosine 3',5'-monophosphate: messenger RNA stabilization is accompanied by decreased binding of a 42-kDa protein to a uridine-rich domain in the 3'-untranslated region. Mol. Endocrinol., 13, 495-504.
48. Chen,C.Y. and Shyu,A.B. (1995) AU-rich elements: characterization and importance in mRNA degradation. Trends Biochem. Sci., 20, 465-470.
49. Cerutti,H. (2003) RNA interference: traveling in the cell and gaining functions? Trends Genet., 19, 39-46.
50. Mathews,D.H., Sabina,J., Zuker,M. and Turner,D.H. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol., 288, 911-940.