Difference between revisions of "Team:OUC-China/polycistron"

Revision as of 22:31, 17 October 2018

Team OUC-China: Main

polycistron

For our miniToe polycistron system, we build a coupled transcription-translation model that considers several events in prokaryotes, in order to gain a deeper understanding of polycistronic expression. We then simplify this model into a more flexible one that predicts how the miniToe structure changes the relative expression levels within a polycistron.

1.Current model for polycistron expression system

Before modeling our system, we first give a short review of existing models of polycistron expression. The common model assumes that the mRNA of cistrons at different positions has the same abundance, so that cistrons with the same translation rate produce equal amounts of protein. In reality, natural polycistrons use many strategies to regulate protein abundance, such as overlapping genes or a hairpin at the 3’ end. In synthetic polycistrons, effects such as transcriptional polarity and translational coupling play important roles. Many of these mechanisms control the protein level by controlling the mRNA abundance, so a more precise model of the polycistron is needed.

2.The coupled transcription-translation model for monocistron

In this part we present a coupled transcription-translation model for polycistrons in prokaryotes. The model is based on the work of Andre S Ribeiro, who presented a coupled transcription-translation model for a monocistron. We have extended that model so that it can be applied to a polycistron.

2.1 The original model for monocistron

The original model built by Andre S Ribeiro is a sequence-level stochastic delay differential equation model, and it can be divided into two main parts: the transcriptional part and the translational part. The transcriptional part can be described by the following events:
(1) Initiation and promoter complex formation:

\mathrm{Pro} + \mathrm{RNAp} \to \mathrm{RNAp} \cdot \mathrm{Pro} \quad (\tau_{OC})

(2) Promoter clearance:

\mathrm{RNAp} \cdot \mathrm{Pro} + U_{[1, \Delta_{RNAp} + 1]} \to O_{1} + \mathrm{Pro}

(3) Elongation:

A_{n} + U_{n + \Delta_{RNAp} + 1} \to O_{n + 1} + U_{n - \Delta_{RNAp}} + U^{R}_{n - \Delta_{RNAp}}

(4) Activation:

O_{n} \to A_{n}

(5) Pausing:

\begin{array}{l} O_{n} \overset{k_{p}}{\to} O_{n_{p}} \\ O_{n} \overset{1 / \tau_{p}}{\leftarrow} O_{n_{p}} \end{array}

(6) Pause release due to collision:

O_{n_{p}} + A_{n - 2 \Delta_{RNAp} - 1} \overset{0.8 k_{m}}{\to} O_{n} + A_{n - 2 \Delta_{RNAp} - 1}

(7) Pause induced by collision:

O_{n_{p}} + A_{n - 2 \Delta_{RNAp} - 1} \overset{0.2 k_{m}}{\to} O_{n} + A_{(n - 2 \Delta_{RNAp} - 1)_{p}}

(8) Arrest:

\begin{array}{l} O_{n} \overset{k_{ar}}{\to} O_{n_{ar}} \\ O_{n} \overset{1 / \tau_{ar}}{\leftarrow} O_{n_{ar}} \end{array}

(9) Editing:

\begin{array}{l} O_{n} \overset{k_{ec}}{\to} O_{n_{correcting}} \\ O_{n} \overset{1 / \tau_{c}}{\leftarrow} O_{n_{correcting}} \end{array}

(10) Premature termination:

O_{n} \overset{k_{pre}}{\to} \mathrm{RNAp} + U_{[n - \Delta_{RNAp}, n + \Delta_{RNAp}]}

(11) Pyrophosphorolysis:

O_{n} + U_{n - \Delta_{RNAp} - 1} + U^{R}_{n - \Delta_{RNAp} - 1} \overset{k_{pyro}}{\to} O_{n - 1} + U_{n + \Delta_{RNAp} - 1}

(12) Completion:

A_{n_{last}} \overset{k_{f}}{\to} \mathrm{RNAp} + U_{[n_{last}, n_{last} - \Delta_{RNAp}]}

(13) mRNA degradation:

A_{n_{last}} \overset{k_{dr}}{\to} \varnothing

In the 13 reaction equations above, Pro stands for the promoter region and RNAp for the RNA polymerase, while RNAp·Pro stands for a promoter that is occupied by an RNA polymerase. A_{n}, O_{n} and U_{n} denote the nth nucleotide in the activated, occupied and unoccupied state, respectively. U_{[start, end]} denotes the nucleotides whose indices range from start to end. O_{n_{p}}, O_{n_{ar}} and O_{n_{correcting}} represent an RNAp that is paused, arrested or error-correcting at position n. Because of steric constraints, each RNAp occupies about (2 \Delta_{RNAp} + 1) nucleotides. U^{R}_{n} denotes transcribed ribonucleotides that are free.
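As an illustration of how such an event list turns into a simulation, the following is a minimal sketch under strong simplifying assumptions: a single RNAp, only elongation and pausing (events 3 to 5), activation merged into the stepping rate, and no collisions, arrest, editing or premature termination. The function name and the rate values in the usage line are placeholders chosen for illustration; they are not taken from Ribeiro's work.

```python
import random


def simulate_single_rnap(n_last, k_m, k_p, tau_p, seed=0):
    """Gillespie-style walk of one RNAp from nucleotide 1 to n_last.

    k_m   : stepping rate per nucleotide (1/s), placeholder value
    k_p   : rate of entering a pause (1/s)
    tau_p : mean pause duration (s), so the release rate is 1 / tau_p
    Returns (total elapsed time in s, number of pauses).
    """
    rng = random.Random(seed)
    t, n, paused, n_pauses = 0.0, 1, False, 0
    while n < n_last:
        if paused:
            # O_{n_p} -> O_n with rate 1 / tau_p
            t += rng.expovariate(1.0 / tau_p)
            paused = False
        else:
            # Two competing events: step forward (k_m) or pause (k_p)
            total = k_m + k_p
            t += rng.expovariate(total)
            if rng.random() < k_p / total:
                paused = True
                n_pauses += 1
            else:
                n += 1
    return t, n_pauses


if __name__ == "__main__":
    elapsed, pauses = simulate_single_rnap(n_last=3000, k_m=40.0, k_p=0.5, tau_p=3.0)
    print(f"finished after {elapsed:.1f} s with {pauses} pauses")
```

The full model additionally tracks the footprint of every RNAp and ribosome on the same template, which is what tools such as SGNSim handle.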

The translational part can be described by the following events:
(1) Initiation:

\mathrm{Rib} + U^{R}_{[1, n + \Delta_{Rib} + 1]} \overset{k_{trans\_init}}{\to} O_{1}^{R} + \mathrm{Rib}^{R}

(2) Stepwise translocation:

A_{n - 3}^{R} + U^{R}_{[n + \Delta_{Rib} - 3, n + \Delta_{Rib} - 1]} \overset{k_{tm}}{\to} O_{n - 2}^{R}

O_{n - 2}^{R} \overset{k_{tm}}{\to} O_{n - 1}^{R}

O_{n - 1}^{R} \overset{k_{tm}}{\to} O_{n}^{R} + U^{R}_{[n + \Delta_{Rib} - 2, n + \Delta_{Rib}]}

(3) Activation:

O_{n}^{R} \overset{k_{trans [A, B, C]}}{\to} A_{n}^{R}

(4) Back-translocation:

O_{n - 1}^{R} + U^{R}_{[n + \Delta_{Rib} - 2, n - \Delta_{Rib}]} \overset{k_{bt}}{\to} A_{n - 3}^{R} + U^{R}_{[n + \Delta_{Rib} - 3, n + \Delta_{Rib} - 1]}

(5) Drop-off:

O_{n}^{R} \overset{k_{drop}}{\to} \mathrm{Rib} + U^{R}_{[n - \Delta_{Rib}, n + \Delta_{Rib}]}

(6) Trans-translation:

R \overset{k_{u}}{\to} [\mathrm{Rib}^{R}] \, \mathrm{Rib}

(7) Elongation completion:

A^{R}_{n_{last}} \overset{k_{u}}{\to} [\mathrm{Rib}^{R}] \, \mathrm{Rib} + U_{[n_{last}, n_{last} - \Delta_{Rib}]}^{R} + P_{prem}

(8) Folding and activation:

P_{prem} \overset{k_{fold}}{\to} P

(9) Protein degradation:

P \overset{k_{dec}}{\to} \varnothing

In the 9 reaction equations above, Rib stands for a free ribosome, while \mathrm{Rib}^{R} represents a ribosome that is bound to the RNA chain. \Delta_{Rib} represents the footprint of the ribosome: every ribosome occupies about (2 \Delta_{Rib} + 1) nucleotides. U^{R}_{n}, O^{R}_{n} and A^{R}_{n} are the ribonucleotide equivalents of U_{n}, O_{n} and A_{n} in the transcriptional part and have analogous meanings.
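The footprint bookkeeping used in both parts can be sketched in a few lines. This is only an illustration of the (2Δ + 1) rule stated above; the helper names are hypothetical and the Δ values in the usage lines are placeholders.

```python
def footprint(n, delta):
    """Indices covered by a molecule centred at position n: 2 * delta + 1 nucleotides."""
    return range(n - delta, n + delta + 1)


def overlaps(n_a, n_b, delta):
    """True if two molecules with the same half-footprint delta would overlap.

    Centres that are exactly 2 * delta + 1 apart are adjacent but do not overlap,
    which matches the A_{n - 2*delta - 1} neighbour used in the collision reactions.
    """
    return abs(n_a - n_b) <= 2 * delta


if __name__ == "__main__":
    print(list(footprint(10, 2)))   # the five positions around nucleotide 10
    print(overlaps(10, 14, 2))      # footprints share position 12
    print(overlaps(10, 15, 2))      # adjacent, no shared position
```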

2.2 The model we improve for the polycistron

We now have the coupled transcription-translation model for a monocistron.
To extend it to a bi-cistron, which is the simplest polycistron, we simply add another translational part to the original model. Our new model therefore has one transcriptional part and two translational parts, one for each CDS of the bi-cistron. The most important task is then to establish the relationship between the two translational parts.
The first thing we need to reconsider is the translation initiation rate of the second CDS, because this parameter is influenced by translational coupling.
The translation rate of the second CDS, k_{2}, can be calculated by the following formula from statistical thermodynamics:

k_{2} \propto r_{reinitiation}^{(2)} + e^{- \beta \cdot \Delta G_{total}^{(2)}}

The formula has two terms that together describe translational coupling. The first term, r_{reinitiation}^{(2)}, describes a ribosome that terminates translation of the upstream CDS, dissociates, and re-initiates translation of the downstream CDS; this is called ribosome re-initiation. The second term, e^{- \beta \cdot \Delta G_{total}^{(2)}}, describes de novo ribosome initiation: ribosomes elongating along the upstream CDS unfold the mRNA structure, which increases the expression of the downstream CDS. The two kinds of initiation are shown in Fig.2-1.

Fig.2-1 two kinds of initiation

The first term of the formula can be calculated as follows:

r_{reinitiation}^{(2)} = k_{p} \cdot k_{reinitiation} (d_{1, 2}) \cdot e^{- \beta \cdot \Delta G_{total}^{(1)}}

where k_{reinitiation} (d_{1, 2}) captures the dependence on the intergenic distance and k_{p} is the proportionality constant between the ribosome assembly rate and the translation initiation rate.
k_{reinitiation} (d) has been shown to follow:

k_{reinitiation} (d) = \begin{cases} 0.0072 \pm 0.0048, & 0 \leq d \leq 25 \\ 0.0220, & d = - 4 \\ 0.0072 + 0.0004 \cdot (d + 10), & - 25 \leq d \leq - 10 \end{cases}

where d = x_{start} - x_{stop} - 3, x_{start} is the position of the first nucleotide of the jth CDS’s start codon, and x_{stop} is the position of the first nucleotide of the upstream CDS’s stop codon. The same source gives k_{p} = 10.
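The piecewise dependence on the intergenic distance can be written as a small lookup function. This is a sketch with hypothetical function names, assuming that the third branch covers -25 ≤ d ≤ -10, that only the central value 0.0072 of the first branch is used, and that distances outside the three listed cases are simply not covered by the formula.

```python
def intergenic_distance(x_start, x_stop):
    """d = x_start - x_stop - 3, with both positions taken at the first codon nucleotide."""
    return x_start - x_stop - 3


def k_reinitiation(d):
    """Distance-dependent re-initiation coefficient from the piecewise formula."""
    if 0 <= d <= 25:
        return 0.0072                      # quoted as 0.0072 +/- 0.0048
    if d == -4:
        return 0.0220                      # start and stop codons overlap by 4 nt
    if -25 <= d <= -10:
        return 0.0072 + 0.0004 * (d + 10)
    raise ValueError(f"no re-initiation coefficient is given for d = {d}")


if __name__ == "__main__":
    # Stop codon at 101-103 and start codon beginning at 104: adjacent genes, d = 0
    d = intergenic_distance(x_start=104, x_stop=101)
    print(d, k_reinitiation(d))
```

Raising an error for uncovered distances is a design choice of this sketch; it avoids silently extrapolating beyond the ranges quoted in the text.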
The second term of the formula depends on \Delta G_{total}^{(2)}, which can be calculated as follows:

\Delta G_{total}^{(2)} = \Delta G_{mRNA - rRNA} + \Delta G_{spacing} + \Delta G_{start} + \Delta G_{standby} - \Delta G_{noncoupling} - \Delta G_{coupling} F_{coupling}

where \Delta G_{coupling} is the folding free energy of all inhibitory RNA structures that block the standby site or overlap with the SD sequence, the spacer region or the downstream footprint region of the ribosome. \Delta G_{noncoupling} is the free energy of all other RNA structures, i.e. those that are not inhibitory. F_{coupling} can be calculated by the following formula:

F_{coupling} = \frac{1}{1 + C \cdot e^{\beta \cdot \Delta G_{total}^{(1)}}}

where C is the ribosome-assisted unfolding coefficient; C = 0.81 in our study.
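The behaviour of the coupling factor is easy to check numerically. The sketch below uses C = 0.81 from the text; the value β = 0.45 mol/kcal is an assumption borrowed from the RBS Calculator literature and is not stated on this page, and the function name is hypothetical.

```python
import math

BETA = 0.45      # mol/kcal; assumed value, not given in the text
C_UNFOLD = 0.81  # ribosome-assisted unfolding coefficient from the text


def f_coupling(dg_total_1, beta=BETA, c=C_UNFOLD):
    """F_coupling = 1 / (1 + C * exp(beta * dG_total^(1))), with dG in kcal/mol."""
    return 1.0 / (1.0 + c * math.exp(beta * dg_total_1))


if __name__ == "__main__":
    for dg in (-20.0, -5.0, 0.0, 5.0):
        print(f"dG_total(1) = {dg:6.1f} kcal/mol -> F_coupling = {f_coupling(dg):.4f}")
```

A strongly translated upstream CDS (very negative \Delta G_{total}^{(1)}) pushes F_{coupling} towards 1, so the inhibitory structures are almost fully unfolded; a weakly translated one pushes it towards 0.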
\Delta G_{total}^{(1)} is the total binding free energy between the ribosome and the 5’UTR, according to the equation:

\Delta G_{total}^{(1)} = \Delta G_{mRNA - rRNA} + \Delta G_{spacing} + \Delta G_{start} + \Delta G_{standby} - \Delta G_{mRNA}

\Delta G_{mRNA - rRNA} is the folding free energy of the mRNA-rRNA complex, which is negative.
\Delta G_{spacing} is the free energy penalty for a non-optimal physical distance between the SD sequence and the start codon, which is positive.
\Delta G_{start} is the free energy of the tRNA^{fMet}-start codon complex, which is negative.
\Delta G_{standby} is the free energy associated with the standby site, which is negative.
\Delta G_{mRNA} is the folding free energy of the 5’UTR, which is negative.
All these energies can be calculated with the NUPACK suite using the mfold 3.0 RNA energy parameters. The five energy terms are illustrated in Fig.2-2.

Fig.2-2 the five parts of the total binding free energy
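Putting the energy terms together, the relative translation rate of the second CDS can be evaluated as in the sketch below. All energy values in the usage example are made-up placeholders in kcal/mol (in practice they would come from NUPACK), β = 0.45 mol/kcal is an assumed value that is not stated on this page, and the function names are hypothetical.

```python
import math

BETA = 0.45  # mol/kcal; assumed value, not given in the text
K_P = 10.0   # proportionality constant k_p from the text


def dg_total_1(dg_mrna_rrna, dg_spacing, dg_start, dg_standby, dg_mrna):
    """Total binding free energy for the first CDS."""
    return dg_mrna_rrna + dg_spacing + dg_start + dg_standby - dg_mrna


def dg_total_2(dg_mrna_rrna, dg_spacing, dg_start, dg_standby,
               dg_noncoupling, dg_coupling, f_coupling):
    """Total binding free energy for the second CDS, including the coupling term."""
    return (dg_mrna_rrna + dg_spacing + dg_start + dg_standby
            - dg_noncoupling - dg_coupling * f_coupling)


def r_reinitiation(k_reinit, dg1, k_p=K_P, beta=BETA):
    """Re-initiation term: k_p * k_reinitiation(d) * exp(-beta * dG_total^(1))."""
    return k_p * k_reinit * math.exp(-beta * dg1)


def k2_relative(r_reinit, dg2, beta=BETA):
    """Quantity that k_2 is proportional to (not an absolute rate)."""
    return r_reinit + math.exp(-beta * dg2)


if __name__ == "__main__":
    dg1 = dg_total_1(-8.0, 1.5, -1.2, -0.5, -6.0)                # placeholder energies
    dg2 = dg_total_2(-8.0, 1.5, -1.2, -0.5, -2.0, -4.0, 0.5)     # placeholder energies
    print(k2_relative(r_reinitiation(0.0220, dg1), dg2))
```

Because the formula is a proportionality, only ratios of this quantity between designs are meaningful.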

The second thing we need to reconsider is the premature termination rate of the second CDS, because this parameter is influenced by transcriptional polarity. Even though there is only one transcriptional part, the premature termination rate of the second CDS is higher than that of the first CDS, because the Rho factor binds to the RNA in the intergenic region and causes Rho-dependent termination. Almost 80% of premature termination is caused by Rho-dependent termination. We plan to use queueing theory to build a model of this process, but this work is not finished yet.

2.3 Explore the mRNA abundance using the model

We then constructed in the model a polycistron that contains two LacZ genes, and corrected the translation initiation rate and the premature termination rate of the second CDS. The other parameters can be found in Andre S Ribeiro’s work. We carried out the simulation in StochPy and SGNSim and obtained the mRNA distributions at t=100s and t=600s shown in Fig.2-3 and Fig.2-4.

Fig.2-3

Fig.2-4

The mRNA distribution at 100s represents the initial stage of the mRNA distribution, while the mRNA distribution at 600s represents the final state of the mRNA.

3.A flexible model for polycistron

The coupled transcription-translation model gives us one key point: the mRNA of cistrons at different positions has different abundance. This may be caused by premature termination or by other effects, and it results in different protein levels. Different protein levels are also caused by different translation times.
The coupled transcription-translation model is too complex and hard to operate. Here we propose a framework that explains the more realistic behavior of the polycistron while still keeping a simple form.

Fig.3-1 the organization of operon

Consider a polycistron as shown in Fig.3-1: it contains a promoter, a 5’UTR and two CDSs separated by an intergenic region, followed by a terminator at the end.
The reactions can be described by the following four main steps:
(1) The transcription of the two CDS regions:

\begin{array}{l} \overset{k_{1}}{\to} \mathrm{mRNA}_{1} \\ \overset{k_{2}}{\to} \mathrm{mRNA}_{2} \end{array}

Here we divide the polycistron into two parts with different transcription parameters k_{1}, k_{2} to account for the different mRNA abundance caused by premature termination. Both parameters are fully sequence-dependent, as discussed before.

(2) The degradation of mRNA:

\begin{array}{l} \mathrm{mRNA}_{1} \overset{k_{d1}}{\to} \varnothing \\ \mathrm{mRNA}_{2} \overset{k_{d2}}{\to} \varnothing \end{array}

The degradation of mRNA is also divided into two parts with different degradation parameters k_{d1}, k_{d2} to account for the different translation times of the two mRNAs. Each k_{di} can be split into two parts:

k_{di} = k_{d} - k_{recoup}

where k_{recoup} is the compensation term for the translation-time difference and k_{d} is the common degradation rate of mRNA.
(3) The translation of protein:

\begin{array}{l} \mathrm{mRNA}_{1} \overset{k_{p1}}{\to} \mathrm{mRNA}_{1} + \mathrm{Protein}_{1} \\ \mathrm{mRNA}_{2} \overset{k_{p2}}{\to} \mathrm{mRNA}_{2} + \mathrm{Protein}_{2} \end{array}

The two parameters that describe translation, k_{p1} and k_{p2}, should also differ because of translational coupling. The thermodynamic model built in Section 2.2 gives the translation rate of the second CDS (written k_{2} there, k_{p2} here):

k_{p2} \propto r_{reinitiation}^{(2)} + e^{- \beta \cdot \Delta G_{total}^{(2)}}

The re-initiation term r_{reinitiation}^{(2)}, the free energy \Delta G_{total}^{(2)} and the coupling factor F_{coupling} are calculated exactly as described in Section 2.2. As noted there, the premature termination rate of the second CDS is also raised by transcriptional polarity; in the flexible model this enters through the transcription parameter k_{2} of step (1), and the queueing-theory model for it is not finished yet.
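The transcription, degradation and translation steps of the flexible model can be sketched as a pair of ordinary differential equations per cistron. This is a minimal sketch with hypothetical function names: the protein decay rate k_deg is an assumption added here so that a steady state exists (it is not one of the steps listed above), and all parameter values in the usage example are placeholders.

```python
def steady_state(k, k_d, k_recoup, k_p, k_deg):
    """Steady-state (mRNA, protein) levels for one cistron.

    d[mRNA]/dt    = k - k_di * [mRNA],  with k_di = k_d - k_recoup
    d[Protein]/dt = k_p * [mRNA] - k_deg * [Protein]   (k_deg is an assumption)
    """
    k_di = k_d - k_recoup
    if k_di <= 0 or k_deg <= 0:
        raise ValueError("effective degradation rates must be positive")
    mrna = k / k_di
    protein = k_p * mrna / k_deg
    return mrna, protein


def integrate(k, k_d, k_recoup, k_p, k_deg, t_end=600.0, dt=0.01):
    """Forward-Euler integration of the same two equations, starting from zero."""
    k_di = k_d - k_recoup
    m = p = 0.0
    for _ in range(int(t_end / dt)):
        dm = k - k_di * m
        dp = k_p * m - k_deg * p
        m += dt * dm
        p += dt * dp
    return m, p


if __name__ == "__main__":
    # Placeholder parameters: the second cistron is transcribed and translated less.
    cistron_1 = dict(k=0.5, k_d=0.1, k_recoup=0.02, k_p=2.0, k_deg=0.05)
    cistron_2 = dict(k=0.3, k_d=0.1, k_recoup=0.00, k_p=1.2, k_deg=0.05)
    for name, params in (("cistron 1", cistron_1), ("cistron 2", cistron_2)):
        print(name, steady_state(**params), integrate(**params))
```

With this form, the relative protein level of the two cistrons at steady state depends only on the ratios of k_{i}, k_{di} and k_{pi}, which is what makes the simplified model easy to fit.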
