Published online 24 November 2005
Article |
A thermodynamic model of transcriptome formation
Faculty of Bioresource Sciences, Akita Prefectural University, Shimoshinjyo Nakano, Akita 010-0195, Japan
*Tel: +81 18 872 1603; Fax: +81 18 872 1677; Email: konishi{at}akita-pu.ac.jp
Received October 2, 2005. Revised November 2, 2005. Accepted November 2, 2005.
| ABSTRACT |
|---|
The genome supplies information on both the quality and quantity of the transcriptome. However, as it remains unknown how a cell determines transcript levels from the genome sequences, despite comprehensive knowledge of the cellular components involved, the quantity information held by the genome cannot as yet be derived from nucleotide sequences. The model presented here explains on a thermodynamic basis how the components decode the genome to form and maintain the transcriptome. The model describes the level of a transcript as a pseudo-equilibrium between velocities of synthesis and degradation, both of which are controlled by sequence-specific interactions between protein factors and nucleic acids. Each of the transcript levels can be described by a single equation expressing a function of the activity concentrations of the protein factors. Quantitative information in the genome can thus be transformed into constants determined from the nucleotide sequences. Using this model, the transcriptome can be traced back to the protein factors and the state of chromosome packaging. The total description of transcript levels allows the model to be verified through comparison of derived hypotheses with comprehensive measurements of the transcriptome. The hypotheses thus derived in the present study are well supported by experimental microarray data, confirming the appropriateness of the model.
| INTRODUCTION |
|---|
Organization of the large volumes of experimental data acquired to date requires an appropriate model to serve as a framework for analysis. The ability of such a model to integrate the data is critical and will affect the accuracy and potential utility of data comparisons. Objectivity is required in the model, particularly when the results are to be shared among researchers, as in transcriptome analyses. Many transcriptome studies have employed versatile theoretical models for each step of data analysis, including normalization, noise treatment and data interpretation that includes validation of expression changes (1–5). Unfortunately, in exchange for such versatility, the objectivity of the models is reduced. Some models even rely on arbitrary calculations to eliminate the linear responses of data (5). In general, poor objectivity or linearity prevents meaningful comparisons between multiple experiments, giving rise to inconsistencies among sets of analytical results.
This report presents an objective model that explains the determination of transcript levels based on the relationship between the genome and the cellular components. Objectivity is achieved by representing the biochemical processes in a cell in terms of thermodynamics, adopting a similar approach to other bottom-up research on a part of transcriptional control (6–11). The model sees the level of a transcript as a balance between velocities of synthesis and degradation, both of which are controlled by interactions between nucleotide sequences and protein factors, which are known to affect the levels of transcripts (12–14). The objectivity of the model allows it to be verified as an appropriate framework for analysis through comparison of derivable hypotheses with experimental results.
| MODEL OVERVIEW |
|---|
The basis of the proposed model is the existence of a pseudo-equilibrium between the synthesis and degradation of each transcript. A cell forms a closed system for mRNA; transcripts neither leave nor enter the cell. This means that the concentration of a transcript accumulated in a cell is determined by the velocity of synthesis and the velocity of degradation within the cell. It has been shown that each transcript has a unique half-life, i.e. the velocity of degradation is linear with respect to the concentration of the transcript (12). Consequently, the pseudo-equilibrium can be applied (6), and the state of the equilibrium determines the level of the transcript. Although the half-life and velocity of synthesis can change frequently within each cell, giving rise to transient departures from this pseudo-equilibrium estimation, these deviations are not expected to be large and will rapidly attenuate in accordance with the half-life of the mRNA. This pseudo-equilibrium approximately describes the balance between the velocity of transcription (vs) and the velocity of degradation (vd) for a particular gene g, as follows.
$v_s = v_d \quad (\text{gene } g)$ | (1) |
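The pseudo-equilibrium argument above can be illustrated numerically. The following is a minimal sketch, assuming a constant synthesis velocity and first-order degradation with rate constant ln 2 divided by the half-life; the function names, the Euler integration and the numerical values are illustrative assumptions, not part of the original model description.

```python
import math


def steady_state_level(v_s, half_life):
    """Level at which constant synthesis balances first-order degradation."""
    k_d = math.log(2) / half_life  # first-order rate constant from the half-life
    return v_s / k_d


def simulate_level(v_s, half_life, initial, duration, dt=0.01):
    """Forward-Euler integration of d[mRNA]/dt = v_s - k_d * [mRNA]."""
    k_d = math.log(2) / half_life
    level = initial
    for _ in range(int(round(duration / dt))):
        level += (v_s - k_d * level) * dt
    return level


if __name__ == "__main__":
    # Hypothetical values: arbitrary concentration and time units.
    target = steady_state_level(v_s=3.0, half_life=2.0)
    # Start at twice the balanced level and let one half-life pass.
    perturbed = simulate_level(3.0, 2.0, initial=2.0 * target, duration=2.0)
    print("balanced level:", target)
    print("remaining excess after one half-life:", perturbed - target)
```

In this sketch a departure from the balanced level shrinks by about half during each half-life, which is the sense in which transient deviations are expected to attenuate.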
Regulation of mRNA synthesis
The rate-limiting step for mRNA synthesis is likely to occur at the onset of RNA elongation. For elongation to begin, the RNA polymerase II bound at the promoter must be hyperphosphorylated (15). Each phosphorylation step is energy-dependent, and should thus be restricted. This step cannot occur in parallel in the cell, and is unique to each gene. The velocity at this step can be described by considering a rapid pre-equilibrium between the binding and dissociation of RNA polymerase II with the promoter:
![]() | (2) |
At a certain frequency, the bound polymerase obtains the potential energy required to overcome the energy barrier, and then initiates transcription. Consequently, vs can be described as a mathematical expectation determined by the concentration of the promoter–polymerase complex ([complex]) and the frequency (height of the barrier in Figure 1), as given by
$v_s = k_s\,[\mathrm{complex}]$ | (3) |
The concentration of the promoter–polymerase complex can be expressed using an equilibrium constant (Kp), as follows.
![]() | (4) |
![]() | (5) |
![]() | (6) |
This pre-equilibrium condition thus tends to favor the dissociated state. This assumption is introduced so that the state of pre-equilibrium determines the velocity in Equation 3; only under this condition can the pre-equilibrium define the frequency or length of time of complex formation, during which the polymerase is susceptible to synthesis initiation (Figure 1).
The equilibrium constant Kp is determined by regulators, which are protein factors that bind around the promoter in a sequence-specific manner. Each of the regulators contacts the polymerase, affecting the equilibrium constant with a certain Gibbs free energy (ΔG). The equilibrium constant is determined by the composite of the free energy as follows:
![]() | (7) |
Here, R is the gas constant and T is the absolute temperature. The Gibbs free energy can be further described in terms of regulators, which are in equilibrium between binding and dissociation with particular cis elements. Using the activity concentration of free regulators ([regulator]), the Gibbs free energy can thus be rewritten as
![]() | (8) |
The frequency with which a bound polymerase enters the elongation state is affected by the stimulation of the mediator complex by enhancer-binding regulators (Figure 1). The complex phosphorylates the polymerase, changing the charge of the enzyme (15), which has been prevented from elongation by Coulombic force (13,14). The reduction of Coulombic force lowers the Arrhenius activation energy (Ep) for the initiation of elongation. According to the Arrhenius equation, the parameter ks in Equation 3 can be represented by the composite of the activation energies related to enhancer-binding protein factors, as given by
![]() | (9) |
![]() | (10) |
![]() | (11) |
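The way the two energy terms act on the synthesis velocity can be sketched numerically. The example below assumes the standard thermodynamic relations K = exp(−ΔG/RT) and k = A·exp(−E/RT), treats ΔG and Ep as plain sums of per-factor contributions, and approximates the complex concentration as Kp[promoter][polymerase], which holds only when the dissociated state is favored. The function names, the pre-exponential factor and all numerical values are hypothetical and are not taken from the equations of this article.

```python
import math

R = 8.314  # gas constant, J mol^-1 K^-1


def equilibrium_constant(delta_g_terms, temperature=310.0):
    """K_p from the summed free-energy contributions of the regulators."""
    return math.exp(-sum(delta_g_terms) / (R * temperature))


def arrhenius_rate(prefactor, activation_terms, temperature=310.0):
    """k_s from the summed activation-energy contributions (Arrhenius form)."""
    return prefactor * math.exp(-sum(activation_terms) / (R * temperature))


def synthesis_velocity(promoter, polymerase, delta_g_terms,
                       prefactor, activation_terms, temperature=310.0):
    """v_s = k_s * [complex], with [complex] ~ K_p * [promoter] * [polymerase].

    The product form for [complex] is a dilute-limit assumption: it is valid
    only while most promoter and polymerase molecules remain unbound.
    """
    k_p = equilibrium_constant(delta_g_terms, temperature)
    complex_conc = k_p * promoter * polymerase
    return arrhenius_rate(prefactor, activation_terms, temperature) * complex_conc


if __name__ == "__main__":
    base = synthesis_velocity(1.0, 1.0, [-5000.0], 1.0, [40000.0])
    # A hypothetical extra regulator contributing -RT ln 2 to the free energy.
    extra = -R * 310.0 * math.log(2)
    boosted = synthesis_velocity(1.0, 1.0, [-5000.0, extra], 1.0, [40000.0])
    print("fold change from the extra regulator:", boosted / base)
```

Under these assumptions each additional energy contribution multiplies the velocity by a fixed factor, regardless of the other contributions already present.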
Regulation of mRNA degradation
The rate-limiting step for degradation is likely to occur in the shortening or removal of the poly(A) tail (Figure 1). This slow step is immediately followed by decapping and then rapid removal of nucleotides from both termini. This sequential process is considered to be a common mechanism of degradation for properly synthesized mRNAs. The slow step is catalyzed by motif-specific nucleases, i.e. poly(A)-specific exonucleases and site-specific endonucleases. Inhibitor proteins that bind the same motifs, competing with the RNases, are also known. Each transcript may have a motif that determines the transcript's stability (12,16).
The velocity of degradation is thus considered to be proportional to the concentration of the transcript of the gene ([mRNAg]), as given by
![]() | (12) |
![]() | (13) |
![]() | (14) |
Accumulated amount of each transcript
Equations 1 and 11–13 lead to the following relationship between the energy parameters and the concentration of the transcript:
![]() | (15) |
ΔG, Ep and Ed are determined by the activity concentrations of the protein factors bound to specific nucleotide sequences of DNA or RNA as described in Equations 8, 10 and 14, respectively.
| HYPOTHESES AND VERIFICATION |
|---|
Lognormality in data distribution
For verification of the proposed model, a number of hypotheses derived from the model were tested against the relevant characteristics of measured transcriptome data. The first feature of transcriptome data considered is the statistical distribution of concentrations of transcripts in a cell. The model predicts that the protein factors (Equations 8, 10 and 14) will be basically independent of other factors, while the distributions of these factors with respect to ΔG, Ep and Ed will have common characteristics owing to the commonality of the physical bases. The composites of the energies can thus be considered to be the sums of independent, identically distributed variables. According to the central limit theorem, the distribution of the composite energy term in Equation 15 will be normal, leading to a lognormal distribution of [mRNAg]. This hypothesis can be verified by comparison with comprehensive transcriptome data. Expression microarray data for any organism on any analytical platform have been reported to follow a three-parameter lognormal distribution (17), with an additional third parameter for compensation of signal background. This three-parameter distribution has been observed repeatedly in hundreds of experimental datasets on different platforms, providing strong support for this model-derived hypothesis.
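The central limit argument can be checked with a small simulation. The sketch below assumes that the logarithm of each transcript level equals the negative of a composite energy (in units of RT) and that this composite is a sum of independent, identically distributed contributions; the uniform contribution distribution, the number of factors and the helper names are illustrative assumptions only.

```python
import math
import random


def simulate_log_levels(n_genes=5000, n_factors=40, seed=0):
    """Log transcript levels as minus a sum of i.i.d. energy contributions."""
    rng = random.Random(seed)
    log_levels = []
    for _ in range(n_genes):
        # Hypothetical per-factor energies in RT units, uniform on [-0.5, 0.5].
        energy = sum(rng.uniform(-0.5, 0.5) for _ in range(n_factors))
        log_levels.append(-energy)
    return log_levels


def _central_moment(values, order):
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Sample skewness (third standardized moment)."""
    return _central_moment(values, 3) / _central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Sample excess kurtosis (zero for a normal distribution)."""
    return _central_moment(values, 4) / _central_moment(values, 2) ** 2 - 3.0


if __name__ == "__main__":
    logs = simulate_log_levels()
    levels = [math.exp(x) for x in logs]
    print("skewness of log levels:", skewness(logs))
    print("excess kurtosis of log levels:", excess_kurtosis(logs))
    print("skewness of raw levels:", skewness(levels))
```

In this sketch the log levels are close to symmetric with near-zero excess kurtosis, whereas the raw levels are strongly right-skewed, as expected for a lognormal quantity. The background parameter of the three-parameter distribution is not modeled here.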
Stability in scale parameter
The model also predicts a stable characteristic for the scale parameter of the log[mRNAg] distribution. According to Equation 15, the scale parameter is defined by the distributions of energetic parameters. These distributions would be stable and common because all can be considered to represent the sums of variables. Changes in the parameters therefore indicate a total shift in protein–nucleic acid interactions, and such shifts will require changes in the conditions of the cell, such as salt content or pH. However, the conditions of the cell will remain stable owing to the homeostatic character of cells in general. The stability of the scale parameter has been observed experimentally (17), supporting this prediction. The stability and mode of distribution are checked routinely in experiments as part of the microarray normalization process.
Multiplicative effects on [mRNA]
Another hypothesis derivable from the proposed model is the multiplicative change in transcript concentration by each protein factor. According to Equations 8, 10, 14 and 15, additive changes in the activity concentrations of the factors will cause additive changes in energy, which will in turn cause multiplicative changes in [mRNA]. Generally, a stimulus applied to a cell induces a change in the activity concentrations of certain protein factors, which in turn affects the concentrations of certain transcripts. These effects can be measured by conducting paired microarray experiments in which the changes are measured as ratios (17). If the hypothesis is correct, the ratios resulting from the application of a pair of simultaneous stimuli will coincide with the products of the ratios provoked by each individual stimulus. Such combinations of measurements have been reported in a series of experiments on the effects of environmental changes in yeast (18). In these experiments, a linear relationship was obtained for numerous combinations of stimuli following the products of the individual ratios (Figure 2A). It should be noted that the slight overestimation of the product can be attributed to the effect of medium replacement (18), which is counted twice in the product calculations. This correlation could not have been obtained by chance or systematic error, since the replacement of any of the ratios in a combination with another in a different time phase (i.e. a different stage of response) almost eliminates the relationship (Figure 2B). Thus, the multiplicative effect of factors predicted by the present model is supported by experimental findings.
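The step from additive energies to multiplicative ratios can be made concrete with a short calculation. The sketch below assumes that a transcript level is proportional to the exponential of minus a summed energy expressed in units of RT, and that each stimulus adds its own energy shift; the function names and numbers are hypothetical.

```python
import math


def transcript_level(energy_terms):
    """Relative level for a composite energy given as terms in RT units."""
    return math.exp(-sum(energy_terms))


def fold_change(baseline_terms, shifts):
    """Ratio of the level after adding energy shifts to the baseline level."""
    return transcript_level(baseline_terms + shifts) / transcript_level(baseline_terms)


if __name__ == "__main__":
    baseline = [1.2, -0.4, 0.7]   # hypothetical baseline energy terms
    shift_a, shift_b = -0.9, 0.3  # hypothetical shifts caused by stimuli A and B
    ratio_a = fold_change(baseline, [shift_a])
    ratio_b = fold_change(baseline, [shift_b])
    ratio_ab = fold_change(baseline, [shift_a, shift_b])
    print("product of individual ratios:", ratio_a * ratio_b)
    print("ratio under both stimuli:", ratio_ab)
```

On a logarithmic scale the same statement reads as a sum: the log ratio under both stimuli equals the sum of the individual log ratios, which is why the comparison with measured data is made on log ratios.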
| DISCUSSION |
|---|
The proposed model describes transcriptome formation in a cell on the basis of thermodynamic expressions. The model is expected to provide both the fidelity of a bottom-up approach and the objectivity required for evaluating its appropriateness. The observed characteristics of transcriptome data support the model. Although the verifications presented above are somewhat indirect, the objectivity of the proposed model should allow many other types of verification using transcriptome analyses.
The model can be used to identify the quantitative aspects of genome information giving the transcriptome. The genome is presented by the model in terms of two series of constants (equilibrium and activity constants) as a decoded form of the quantitative information. The constants in a genome locus can be determined through in vitro kinetic experiments. With accumulation of such measured data, it will become possible to predict the values by simulations in silico.
The nucleotide sequences do not encode all of the information necessary to reproduce the chromatin structure or factor activity concentrations, giving rise to the observed variety among transcriptomes. The structure is controlled by chemical modifications of histones (19), which are directed by covalent modification of the DNA. Tight restrictions on the changes in the nucleosome effectively preserve the modifications of histones at the locus when the genome is replicated (20). Alterations of DNA may instead be introduced during the development of multicellular organisms (21). Accordingly, such modifications seem to play an important role in cell differentiation, causing static changes in the transcriptome. Thus, the value of ag is expected to be relatively stable.
This model provides objectivity for transcriptome data analyses. For example, the synergy of additive effects, as commonly observed in combinations of stimuli or the artificial induction of factors (9,10), can be assessed more objectively by the proposed model as the product of effects (Figure 2A). This result shows that the protein factors that changed the expression levels within the short time-course experiment acted practically independently of each other. The independence supports the assumption that the pre-equilibrium condition of many protein factors tends to favor the dissociated state; any two factors that have neighboring or overlapping binding sites rarely interfere with each other, since the chance that the factors encounter each other at the site is rather small. Of course, this observation, obtained from a transcriptome analysis of yeast, does not rule out the existence of dependences between factors, including interference or stabilization. For example, some constitutive factors may tend to favor the bound state; such factors can bind to certain sites and affect the binding of other factors for a long period, producing a tendency in the expression levels of the corresponding genes. However, a constitutive factor would influence only a limited number of genes and factors, since it can affect only molecules that are reachable in its bound state. Additionally, if dependences were common among the effects of factors, they would contradict the independence condition required by the central limit theorem, conflicting with the observed distribution of transcriptome data.
The transcriptome is represented in the model as a series of functions of protein factor activity concentrations (Equations 8, 10, 14 and 15). Protein factors can be expected to be dispersed uniformly in the corresponding cellular compartments owing to the rapid diffusion typical of large soluble molecules (22). Consequently, the activity concentration of a protein factor is likely to be common to all genes. Equation 15 can thus be generalized to any gene, providing a series of simultaneous equations covering all genes in a genome (each equation forms a row in the spreadsheet shown in Figure 3). The low concentrations and substantial variations of specific activities owing to chemical modifications render the activity concentrations difficult to measure at present. However, if the activity and equilibrium constants k and K and the local activity ag are available, the activity concentrations can be derived from transcriptome data by solving the set of simultaneous equations. The set of activity concentrations can then be used to trace the changes in the transcriptome back to changes in each factor. The effect of changes in the factors can also be estimated by substituting activity concentration values, allowing the proper set of stimuli required to obtain the desired transcriptome to be predicted by simulation.
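Solving the simultaneous equations for the factor activities can be sketched as a small least-squares problem. The example assumes that the log level of each gene is a gene-specific offset plus a weighted sum of the factor activity concentrations, i.e. that energies are linear in the activity concentrations; the weights, offsets and function names are hypothetical stand-ins for the constants k, K and ag, whose actual form is given by the equations of the model.

```python
def solve_linear(matrix, rhs):
    """Gaussian elimination with partial pivoting for a square system."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: factors cannot be separated")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    solution = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * solution[j] for j in range(i + 1, n))
        solution[i] = (aug[i][n] - known) / aug[i][i]
    return solution


def estimate_activities(weights, offsets, log_levels):
    """Least-squares factor activities from per-gene log transcript levels.

    weights[g][i] is the assumed sensitivity of gene g to factor i and
    offsets[g] the assumed gene-specific constant (one row per gene).
    """
    residual = [y - c for y, c in zip(log_levels, offsets)]
    n_factors = len(weights[0])
    # Normal equations: (W^T W) x = W^T (y - c)
    wtw = [[sum(row[i] * row[j] for row in weights) for j in range(n_factors)]
           for i in range(n_factors)]
    wty = [sum(row[i] * r for row, r in zip(weights, residual))
           for i in range(n_factors)]
    return solve_linear(wtw, wty)


if __name__ == "__main__":
    weights = [[1.0, 0.0], [0.5, -1.0], [0.0, 2.0], [1.5, 0.5]]  # 4 genes, 2 factors
    offsets = [0.2, -0.1, 0.4, 0.0]
    true_activities = [0.5, 2.0]
    log_levels = [c + sum(w * x for w, x in zip(row, true_activities))
                  for row, c in zip(weights, offsets)]
    print(estimate_activities(weights, offsets, log_levels))
```

With more genes than factors the system is overdetermined, so a least-squares solution is a natural choice here; substituting altered activities back into the same equations gives the predicted transcriptome for a simulated stimulus.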
| METHODS |
|---|
Estimation of simultaneous stimuli effect
Data for heat shock, hypo-osmotic shock and the simultaneous application of both stimuli (18) were calculated by a parametric method (17). The combined effect of the two stimuli was estimated from the individual effects (measured in experiments 7547 and 2555) by multiplying the obtained ratios. The logarithms of the estimated ratios were plotted against the log ratios for experimental data (experiment 4787; Figure 2A). For comparison, the data from experiment 4787 were exchanged with those from experiment 4786, representing data for a different time point in an identical experiment (Figure 2B). (Calculated data as well as parameters and raw data are provided in the Supplementary Data sheet.)
| SUPPLEMENTARY DATA |
|---|
Supplementary Data are available at NAR Online.
| ACKNOWLEDGEMENTS |
|---|
Funding to pay the Open Access publication charges for this article was provided by a subsidy for scientific research from Akita Prefectural University.
Conflict of interest statement. None declared.
| REFERENCES |
|---|
1. Eisen, M., Spellman, P., Brown, P., Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. Proc. Natl Acad. Sci. USA, 95, 14863–14868.
2. Friedman, N., Linial, M., Nachman, I., Pe'er, D. (2000) Using Bayesian Network to analyze expression data. J. Comput. Biol., 7, 601–620.
3. Strogatz, S. (2001) Exploring complex networks. Nature, 410, 268–276.
4. Lee, T., Rinaldi, N., Robert, F., Odom, D., Bar-Joseph, Z., Gerber, G., Hannett, N., Harbison, C., Thompson, C., Simon, I., et al. (2002) Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 298, 799–804.
5. Quackenbush, J. (2003) Microarray data normalization and transformation. Nature Genet., 32, 496–501.
6. Hargrove, J.L., Hulsey, M.G., Beale, E.G. (1991) The kinetics of mammalian gene expression. Bioessays, 13, 667–674.
7. Herschlag, D. and Johnson, F. (1993) Synergism in transcriptional activation: a kinetic view. Genes Dev., 7, 173–179.
8. Chi, T., Lieberman, P., Ellwood, K., Carey, M. (1995) A general mechanism for transcriptional synergy by eukaryotic activators. Nature, 377, 254–257.
9. Carey, M. (1998) The enhanceosome and transcriptional synergy. Cell, 92, 5–8.
10. Wang, J., Ellwood, K., Lehman, A., Carey, M., She, Z. (1999) A mathematical model for synergistic eukaryotic gene activation. J. Mol. Biol., 286, 315–325.
11. Tsujikawa, L., Tsodikov, O., deHaseth, P. (2002) Interaction of RNA polymerase with forked DNA: evidence for two kinetically significant intermediates on the pathway to the final complex. Proc. Natl Acad. Sci. USA, 99, 3493–3498.
12. Ross, J. (1996) Control of messenger RNA stability in higher eukaryotes. Trends Genet., 12, 171–175.
13. Myers, L.C. and Kornberg, R.D. (2000) Mediator of transcriptional regulation. Ann. Rev. Biochem., 69, 729–749.
14. Naar, A.M., Lemon, B.D., Tjian, R. (2001) Transcriptional coactivator complexes. Ann. Rev. Biochem., 70, 475–501.
15. Palancade, B. and Bensaude, O. (2003) Investigating RNA polymerase II carboxyl-terminal domain (CTD) phosphorylation. Eur. J. Biochem., 270, 3859–3870.
16. Doyle, G., Betz, N., Leeds, P., Fleisig, A., Prokipcak, R., Ross, J. (1998) The c-myc coding region determinant-binding protein: a member of a family of KH domain RNA-binding proteins. Nucleic Acids Res., 26, 5036–5044.
17. Konishi, T. (2004) Three-parameter lognormal distribution ubiquitously found in cDNA microarray data and its application to parametric data treatment. BMC Bioinformatics, 5, 5.
18. Gasch, A., Spellman, P., Kao, C., Carmel-Harel, O., Eisen, M., Storz, G., Botstein, D., Brown, P. (2000) Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell, 11, 4241–4257.
19. Strahl, B. and Allis, C. (2000) The language of covalent histone modifications. Nature, 403, 41–45.
20. Becker, P. and Horz, W. (2002) ATP-dependent nucleosome remodeling. Annu. Rev. Biochem., 71, 247–273.
21. Bird, A. (2002) DNA methylation patterns and epigenetic memory. Genes Dev., 16, 6–21.
22. Verkman, A. (2002) Solute and macromolecule diffusion in cellular aqueous compartments. Trends Biochem. Sci., 27, 27–33.