Difference between revisions of "Team:Tec-Monterrey/Model"

 
(42 intermediate revisions by 5 users not shown)
Line 10: Line 10:
 
<style>
 
<style>
 
     .masthead {
 
     .masthead {
       background: url(https://static.igem.org/mediawiki/2018/d/d8/T--Tec-Monterrey--Header_Modelling.jpeg);
+
       background: url(https://static.igem.org/mediawiki/2018/5/53/T--Tec-Monterrey--banner_modeling.jpg);
 
       background-size: cover;
 
       background-size: cover;
 
       background-repeat: no-repeat;
 
       background-repeat: no-repeat;
Line 106: Line 106:
 
         <div class="contenido">
 
         <div class="contenido">
 
             <div class="body-title">Introduction</div>
 
             <div class="body-title">Introduction</div>
The description of our system had a progressive development parallel to the realization of the whole project. Initially, a simple system of linear coupled differential equations was proposed to describe the whole system, but as the understanding of all the aspects the system involves improved, the model was continuously refined to encompass a more suitable description that became both accurate and practical.
+
<br>
 +
The description of the E. coding system had a progressive development parallel to the realization of the whole project. Initially, a simple system of linear coupled differential equations was proposed to describe the whole system, but as the understanding of all the aspects the system involves improved, the model was continuously refined to encompass a more suitable description that became both accurate and practical.
 
       </div>
 
       </div>
 
   </section>
 
   </section>
Line 112: Line 113:
 
         <div class="contenido">
 
         <div class="contenido">
 
           <div class="body-title">Biological System</div>
 
           <div class="body-title">Biological System</div>
 +
<br>
 
Biological systems modeling is the creation of a kinetics diagram that connects all the important species of the system through relationships that represent bio-chemical reactions.
 
Biological systems modeling is the creation of a kinetics diagram that connects all the important species of the system through relationships that represent bio-chemical reactions.
 
In the case of the E. coding system, the methodology consisted in:
 
In the case of the E. coding system, the methodology consisted in:
Line 122: Line 124:
  
 
               <div class="body-subtitle">Identification</div>
 
               <div class="body-subtitle">Identification</div>
In the bateria, five precursor species were identified as the ones relevant for the system. These precursors are precisely $Cas1$ protein, $Cas2$ protein, the msr-msd ($UM$ from ‘unprocessed message’), the retrotranscriptase ($RT$) protein, and IPTG ($I$ from 'inductor'). The first four species are produced by IPTG regulated transcription, which has some advantages, mainly the fact that the bacteria is not able to metabolize it, so its concentration does not change.
+
In the bateria, five precursor species were identified as the most relevant for the system. These precursors are precisely $Cas1$ protein, $Cas2$ protein, the msr-msd ($UM$ from ‘unprocessed message’), the reverse transcriptase ($RT$) protein, and IPTG ($I$ from 'inductor'). The first four species are produced by IPTG regulated transcription, which has some advantages, mainly the fact that the bacteria is not able to metabolize it, so its concentration does not change.
 
               <div class="body-subtitle">Protein expression</div>
 
               <div class="body-subtitle">Protein expression</div>
 
<b>Production</b>
 
<b>Production</b>
<br><br>Research on IPTG induction in our plasmid led us to the Hill equation, an expression that is simple, practical and commonly used in regulated protein production modeling.  
+
<br><br>Research on IPTG induction in the plasmid led to the Hill equation, an expression that is simple, practical and commonly used in regulated protein production modeling.  
 
<br><br>It is well known that the induction stimulates the transcription of RNA which then is translated by the ribosomes into proteins. So, the Hill equation is valid when it is assumed that, as soon as the RNA is produced, it is immediately processed and translated into the corresponding proteins by the bacteria.This means that all these four species are able to be modeled by the Hill equation.
 
<br><br>It is well known that the induction stimulates the transcription of RNA which then is translated by the ribosomes into proteins. So, the Hill equation is valid when it is assumed that, as soon as the RNA is produced, it is immediately processed and translated into the corresponding proteins by the bacteria.This means that all these four species are able to be modeled by the Hill equation.
 
<br><br>An important thing to consider is the native production of these proteins by the bacteria. Although it is expected that this basal production is insignificant when compared to the IPTG regulated production, the use of this basal production increases the accuracy in which the insertions are related to the inductor concentrations.
 
<br><br>An important thing to consider is the native production of these proteins by the bacteria. Although it is expected that this basal production is insignificant when compared to the IPTG regulated production, the use of this basal production increases the accuracy in which the insertions are related to the inductor concentrations.
Line 138: Line 140:
  
 
<br><br><b>$RT-UM$ complex</b>
 
<br><br><b>$RT-UM$ complex</b>
<br><br>The other fundamental complex in the E-coding system is the one that carries out the message processing function, denoted $MP$. This is essential because, in order for the message to be inserted in the bacteria’s genome, a retrotranscription of the msd in the msr-msr is necessary by means of the $RT$ to yield the RNA-ssDNA or msDNA, which contains the target sequence to be inserted. This is analogous to a ready-to-insert message, so for compactness in notation, we denote it as $M$.
+
<br><br>The other fundamental complex in the E-coding system is the one that carries out the message processing function, denoted $MP$. This is essential because, in order for the message to be inserted in the bacteria’s genome, a retrotranscription of the msd in the msr-msd is necessary by means of the $RT$ to yield the RNA-ssDNA or msDNA, which contains the target sequence to be inserted. This is analogous to a ready-to-insert message, so for compactness in notation, we denote it as $M$.
  
 
               <div class="body-subtitle">Complexes interaction</div>
 
               <div class="body-subtitle">Complexes interaction</div>
Line 145: Line 147:
 
The capability of message insertions by the bacteria is greatly affected by the concentration of the $SM$ available inside the bacteria. An efficiency factor $P_{SM}$ is taken under consideration, which can be interpreted as the probability that a single $SM$ is inserted, or the affinity to effectuate the insertion inside the bacteria's genome.
 
The capability of message insertions by the bacteria is greatly affected by the concentration of the $SM$ available inside the bacteria. An efficiency factor $P_{SM}$ is taken under consideration, which can be interpreted as the probability that a single $SM$ is inserted, or the affinity to effectuate the insertion inside the bacteria's genome.
 
<br><br>It is known that the insertions inside the bacteria's genome is not frequent enough to make valid the continuum approximation. Nevertheless, its differential expression is valid to model its expected value. The insertions' expected value is denoted $Ins$ for compactness in the expressions.
 
<br><br>It is known that the insertions inside the bacteria's genome is not frequent enough to make valid the continuum approximation. Nevertheless, its differential expression is valid to model its expected value. The insertions' expected value is denoted $Ins$ for compactness in the expressions.
<br><br>Although mainly the project was based on inserting inside the bacteria’s genome, another proposition is the insertion in another plasmid, designed specifically to store E. coding messages. So, this insertion factor would help model the discrepancies between different insertion sites.
+
<br><br>Although mainly the project was based on insertions inside the bacteria’s genome, another proposition is the insertion in another plasmid, designed specifically to store E. coding messages. So, this insertion factor would help model the discrepancies between different insertion sites.
 
       </div>     
 
       </div>     
  
Line 156: Line 158:
 
                 <div class="body-subtitle">Kinetic diagram</div>
 
                 <div class="body-subtitle">Kinetic diagram</div>
 
Using all of the above-specified facts regarding E-coding, the following diagram was created to represent all the interactions between all the relevant species inside a bacteria.
 
Using all of the above-specified facts regarding E-coding, the following diagram was created to represent all the interactions between all the relevant species inside a bacteria.
<br> DIAGRAMA
+
<br>  
 +
 
 +
 
 +
<div class="imagen-centrada-completa">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/c/c9/T--Tec-Monterrey--MODEL_DIAGRAM.jpeg">
 +
          <div class="leyenda">Kinetic Diagram: E-Coding</div>
 +
</div>
 +
 
 
                 <div class="body-subtitle">Equations</div>
 
                 <div class="body-subtitle">Equations</div>
 
With the kinetic diagram established, by law of mass action, and the hill equation, the set of differential equations for E-coding is obtained as shown below:
 
With the kinetic diagram established, by law of mass action, and the hill equation, the set of differential equations for E-coding is obtained as shown below:
Line 201: Line 210:
 
   <tbody>
 
   <tbody>
 
     <tr>
 
     <tr>
       <td>Parameter</td>
+
       <td><b>Parameter</b></td>
       <td>Interpretation</td>
+
       <td><b>Interpretation</b></td>
 
     </tr>     
 
     </tr>     
 
     <tr>
 
     <tr>
Line 308: Line 317:
 
           <div class="contenido">
 
           <div class="contenido">
 
                 <div class="body-title">Modified Quasi-Steady State Approximation</div>
 
                 <div class="body-title">Modified Quasi-Steady State Approximation</div>
<br><br>As it can be seen from the above, the final differential equations system turns out to have many free parameters. A practical, but a still accurate approach to describe the system is to use a modified quasi-steady state approximation.  
+
<br><br>As it can be seen from above, the final differential equations system remains with several free parameters. A practical but still very accurate approach to describe the system is to use a modified quasi-steady state approximation. An improvement to our mathematical model would be to use a more sophisticated method of approximation, mainly because along the intermediate species, between the $I$ concentration and the $Ins$ description, $MP$ may not be stable. However, the amount of experimental data that could be gathered with the lab equipment and techniques that were available to us was not enough for the use of a more complex model.
<br><br>Using a pure quasi-steady state approximation may not be the best approximation, mainly because along the intermediate species, between the $I$ concentration and the $Ins$ description, $MP$ may not be stable.
+
<br><br>Conceptually, it is expected that $MP$ will need some time from the $RT-UM$ binding to the $M$ formation, because it involves a retrotranscription, therefore, its concentration is not stable. A simple way to grasp this idea is to imagine that $MP$ concentration grows with a sudden increase in protein production, falling sometime after the increment is processed and liberated as $M$.
<br><br>Conceptually, it is expected that $MP$ will need some time from the $RT-UM$ binding to the $M$ formation, because it involves a retrotranscription, therefore, its concentration is not stable. A simple way to grasp this idea is to imagine $MP$ concentration grows with a sudden increase in protein production, falling sometime after the previous increment is processed and liberated as $M$.
+
<br><br>
 
+
      Under this assumption, Equations 3 and 4 are equal to zero. From Equation 3, $k_{MP}[RT][UM]$ is equal to $\frac{r_{RT}[I]}{C_{RT}+[I]} + b_{RT} - \delta_{RT}[RT] + \delta_{MP}[MP] + k_M[MP]$.
                <div class="body-subtitle">Equations</div>
+
From Equation 4, $k_{MP}[RT][UM]$ is equal to $\frac{r_{UM}[I]}{C_{UM}+[I]} + b_{UM} - \delta_{UM}[UM]$. Substituting $k_{MP}[RT][UM]$ into Equation 5, the expressions result in $\frac{d[MP]}{dt}= \frac{r_{RT}[I]}{C_{RT} + [I]} + b_{RT} - \delta_{RT}[RT]$
                <div class="body-subtitle">Parameters</div>
+
$\frac{d[MP]}{dt}= \frac{r_{UM}[I]}{C_{UM}+[I]} + b_{UM} - \delta_{UM}[UM] - \delta_{MP}[MP] - k_M[MP]$, which is the rate of $RT$ binding to $UM$.<br>
          </div>
+
      </section>
+
 
<br>
 
<br>
 +
      Taking Equation 8 as equal to zero, solving for $P_{SM}[SM]$ and substituting in Equation 10: $\frac{d[Ins]}{dt}= k_{SM}[M][X] - \delta_{SM}[SM]$
 +
From Equation 6 as equal to zero and solving for $k_{SM}[M][X]= k_M[MP]$ and substituting in Equation 10: $\frac{d[Ins]}{dt}= k_M[MP] - \delta_{SM}[SM]$.<br>
 
<br>
 
<br>
        <div class="header-subtitle">
+
      Considering $[SM]=0$, then $\frac{d[Ins]}{dt}= k_M[MP]$.
               Protein Production characterization
+
<br><br>
 +
      The interpretation of this expression is that the rate of insertion is delayed with respect to the $RT$ bound to $UM$, under the assumptions that the Cas Complex inserts the target DNA sequence in an insignificant amount of time compared to the time that the retrotranscription of the protospacer takes, so no effective amount of Complex is bound to protospacer at any given time.
 +
<br><br>
 +
                
  
</div>     
+
<div class="body-subtitle">Equations</div>
 +
 
 +
\begin{equation}
 +
\frac{d[MP]}{dt}= \frac{r_{RT}[I]}{C_{RT}+[I]}+b_{RT}-\delta_{RT}[RT]
 +
\end{equation}
 +
 
 +
\begin{equation}
 +
\frac{d[Ins]}{dt}= k_M[MP]
 +
\end{equation}
 +
 
 +
 
 +
<div class="body-subtitle">Comparisons between complete model and QSSA model</div>
 +
 
 +
<div class="imagen-centrada-completa">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/2/2e/T--Tec-Monterrey--MODEL_NoAproxPoisson.svg">
 +
         
 +
</div>
 +
 
 +
<div class="imagen-centrada-completa">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/8/87/T--Tec-Monterrey--MODEL_QSSAPoisson.svg">
 +
         
 +
</div>
 +
 
 +
<div class="imagen-centrada-completa">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/3/3f/T--Tec-Monterrey--ModelComplete2d.svg">
 +
      
 +
</div>
 +
 
 +
 
 +
<div class="imagen-centrada-completa">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/6/69/T--Tec-Monterrey--Modelxd.svg">
 +
   
 +
</div>
 +
 
 +
 
 +
 
 +
      </section>
 +
 
 +
 
 +
 
 +
<br>
 +
   
 
   </section>
 
   </section>
      <section id="qssa" class="seccion-responsiva">
+
 
          <div class="contenido">
+
         
+
<br><br> In the search of the parameters that better describes the E-coding system, experiments about protein production were run. These experiments recorded the population and the protein content on various bacteria populations, with different known IPTG constant concentrations (0.1mM, 0.5mM and 1 mM) inside the medium.
+
The experimental data obtained described the regulated protein production of retrotranscriptase (RT) and Cas1 and Cas2 proteins. The Cas proteins measurements were not able to differentiate between Cas1 and Cas2, but contained the information about their total production. The parametrization of the production of total Cas1&2 was considered appropriate. The RT protein production
+
This data was then adjusted on MATLAB by a non-linear square errors minimization to the differential equations that would describe only the protein production:$M$.
+
          </div>
+
  
 
           <section id="protprod" class="seccion-responsiva">
 
           <section id="protprod" class="seccion-responsiva">
Line 337: Line 384:
  
 
\begin{equation}
 
\begin{equation}
\frac{d[Cas1]}{dt}=\frac{r_{1}[I]}{C_{1}+[I]}+b_1-\delta_1[Cas1]
+
\frac{d[CasWT]}{dt}=\frac{r_{WT}[I]}{C_{WT}+[I]}+b_{WT}-\delta_{WT}[CasWT]
\end{equation}
+
\begin{equation}
+
\frac{d[Cas2]}{dt}=\frac{r_{2}[I]}{C_{2}+[I]}+b_2-\delta_{2}[Cas2]
+
 
\end{equation}
 
\end{equation}
 
\begin{equation}
 
\begin{equation}
 
\frac{d[RT]}{dt}=\frac{r_{RT}[I]}{C_{RT}+[I]}+b_{RT}-\delta_{RT}[RT]
 
\frac{d[RT]}{dt}=\frac{r_{RT}[I]}{C_{RT}+[I]}+b_{RT}-\delta_{RT}[RT]
 
\end{equation}
 
\end{equation}
 +
 +
                <div class="body-subtitle">Experimentation</div>
 +
      In order to obtain values for the expressions describing protein production a quantification of each of the proteins was done at particular induction conditions, inductor concentration and induction time. The quantification was done by Biuret method. These experiments recorded the population and the protein content on various bacteria populations, with different known IPTG constant concentrations (0.1mM, 0.5mM and 1 mM) in the medium. These samples were analyzed in a UV-Vis spectrophotometer in order to obtain their absorbances and relate them to the correspondent amount of protein.<br>
 +
<br>
 +
The experimental data obtained described the regulated protein production of reverse transcriptase ($RT$), $Cas1$ and $Cas2$ proteins. The Cas proteins measurements were not able to differentiate between each other, but contained the information about their total production. The parametrization of the production of total $Cas1$ and $Cas2$ was considered appropriate. The $RT$ protein production.<br>
 +
<br>
 +
This data was then adjusted on MATLAB by a non-linear square errors minimization to the differential equations that would describe only the protein production:
 +
  
 
                 <div class="body-subtitle">Parameters</div>
 
                 <div class="body-subtitle">Parameters</div>
          </div>
+
      The results of parametrization were very inconclusive, mainly due to a lot of local minimums appearing, as well as divergence in the parameter search. Additionally, these minimums often displayed that IPTG did not affect the protein production, something opposite to what was expected.
      </section>
+
The gathered experimental data varied a little with different IPTG concentrations, but the results of the production differential equations with the best-fit parameters were found  to not vary significantly with different IPTG concentration.
 +
<br><br>
 +
Production of Cas Wild Type proteins are displayed in the following Figures:
 +
<div class="imagen-centrada-reducida">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/4/4e/T--Tec-Monterrey--ModelWT.png">
 +
</div>
 +
 
 +
<div class="imagen-centrada-reducida">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/1/1b/T--Tec-Monterrey--ModelWT0.png">
 +
</div>
 +
 
 +
<div class="imagen-centrada-reducida">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/8/8b/T--Tec-Monterrey--ModelWT01.png">
 +
</div>
 +
 
 +
<div class="imagen-centrada-reducida">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/4/47/T--Tec-Monterrey--ModelWT05.png">
 +
</div>
 +
 
 +
<div class="imagen-centrada-reducida">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/a/ab/T--Tec-Monterrey--ModelWT10.png">
 +
</div>
 +
 
 
<br>
 
<br>
<br>      
+
Production of Reverse Transcriptase are displayed in the following Figures:
        <div class="header-subtitle">
+
<div class="imagen-centrada-reducida">
            Predictions
+
          <img class="foto" src="https://static.igem.org/mediawiki/2018/7/7d/T--Tec-Monterrey--ModelRT.png">
           </div>
+
</div>
 +
 
 +
<div class="imagen-centrada-reducida">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/c/c3/T--Tec-Monterrey--ModelRT0.png">
 +
</div>
 +
 
 +
<div class="imagen-centrada-reducida">
 +
           <img class="foto" src="https://static.igem.org/mediawiki/2018/f/f6/T--Tec-Monterrey--ModelRT01.png">
 +
</div>
 +
 
 +
<div class="imagen-centrada-reducida">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/c/cf/T--Tec-Monterrey--ModelRT05.png">
 +
</div>
 +
 
 +
<div class="imagen-centrada-reducida">
 +
          <img class="foto" src="https://static.igem.org/mediawiki/2018/1/1b/T--Tec-Monterrey--ModelRT10.png">
 +
</div>
 +
 
 +
Even though the graphs seem to be linear regressions, they are actually square errors minimizations for variations in the parameters of the differential equations that describe the different proteins productions.
 +
 
 +
The codes that are responsible for this parametrizations are available in the GitHub page at the end of this site.
 +
 
 +
<br><br> The parameters were adjusted to the differential equations model throught the lsqnonlin method using MatLab. This parametrization helped in obtaining an approximation of levels of protein production and the consequente amount of insertion inside the bacteria's genome. The obtained numerical values for the parameters are as follows:
 +
        <table>
 +
          <tbody>
 +
            <tr>
 +
              <td><b>Parameter</b></td>
 +
              <td><b>Value</b></td>
 +
            </tr>   
 +
    <tr>
 +
      <td>$r_{WT}$</td>
 +
      <td>$7.473778 \cdot 10^{-2}$ mgmL$^-1$s$^-1$</td>
 +
    </tr>   
 +
    <tr>
 +
      <td>$C_{WT}$</td>
 +
      <td>$5.630415 \cdot 10^{-2}$ mM</td>
 +
    </tr>     
 +
    <tr>
 +
      <td>$b_{WT}$</td>
 +
      <td>$5.976100 \cdot 10^{-2}$ mgmL$^-1$s$^-1$</td>
 +
    </tr>
 +
    <tr>
 +
      <td>$\delta_{WT}$</td>
 +
      <td>$2.220446 \cdot 10^{-14}$ mgmL$^-1$s$^-1$</td>
 +
    </tr>   
 +
    <tr>
 +
      <td>$r_{RT}$</td>
 +
      <td>$7.883950 \cdot 10^{-4}$ mgmL$^-1$s$^-1$</td>
 +
    </tr>   
 +
    <tr>
 +
      <td>$C_{RT}$</td>
 +
      <td>$2.220487 \cdot 10^{-14}$ mM</td>
 +
    </tr>   
 +
    <tr>
 +
      <td>$b_{RT}$</td>
 +
      <td>$1.140019  \cdot 10^{-1}$ mgmL$^-1$s$^-1$</td>
 +
    </tr> 
 +
    <tr>
 +
      <td>$\delta_{RT}$</td>
 +
      <td>$2.222364 \cdot 10^{-14}$ mgmL$^-1$s$^-1$</td>
 +
    </tr>     
 +
          </tbody>
 +
        </table>
 +
 
 +
   
 +
    </section>   
 +
     
 
           <section id="gauss" class="seccion-responsiva">
 
           <section id="gauss" class="seccion-responsiva">
 
           <div class="contenido">
 
           <div class="contenido">
                 <div class="body-title">Sudden bulk induction</div>
+
                 <div class="body-title">GitHub</div>
                 <div class="body-subtitle">Expression</div>
+
                 <div class="body-subtitle"> <a href="https://github.com/juanfernandezg/iGEMTec-Monterrey2018.git">Check the Matlab code of the mathematical model here!</a></div>
                <div class="body-subtitle">Insertions</div>
+
                <div class="body-subtitle">Initial conditions expansion</div>
+
 
           </div>
 
           </div>
 
       </section>           
 
       </section>           
       <section id="heaviside" class="seccion-responsiva">
+
        
          <div class="contenido">
+
                <div class="body-title">Industry induction</div>
+
                <div class="body-subtitle">Expression</div>
+
                <div class="body-subtitle">Insertions</div>
+
                <div class="body-subtitle">Initial conditions expansion</div>
+
          </div>
+
      </section>
+
 
<br>
 
<br>
 
<br>
 
<br>

Latest revision as of 03:54, 18 October 2018

<

Introduction

The description of the E. coding system had a progressive development parallel to the realization of the whole project. Initially, a simple system of linear coupled differential equations was proposed to describe the whole system, but as the understanding of all the aspects the system involves improved, the model was continuously refined to encompass a more suitable description that became both accurate and practical.
Biological System

Biological systems modeling is the creation of a kinetics diagram that connects all the important species of the system through relationships that represent bio-chemical reactions. In the case of the E. coding system, the methodology consisted in:
  • The identification of each individual component
  • Consulting literature on the possible interactions
  • Creating a kinetic diagram
  • Establishing the whole differential equations system via biochemical reactions and the eventual message insertion inside our bacteria
Identification
In the bateria, five precursor species were identified as the most relevant for the system. These precursors are precisely $Cas1$ protein, $Cas2$ protein, the msr-msd ($UM$ from ‘unprocessed message’), the reverse transcriptase ($RT$) protein, and IPTG ($I$ from 'inductor'). The first four species are produced by IPTG regulated transcription, which has some advantages, mainly the fact that the bacteria is not able to metabolize it, so its concentration does not change.
Protein expression
Production

Research on IPTG induction in the plasmid led to the Hill equation, an expression that is simple, practical and commonly used in regulated protein production modeling.

It is well known that the induction stimulates the transcription of RNA which then is translated by the ribosomes into proteins. So, the Hill equation is valid when it is assumed that, as soon as the RNA is produced, it is immediately processed and translated into the corresponding proteins by the bacteria.This means that all these four species are able to be modeled by the Hill equation.

An important thing to consider is the native production of these proteins by the bacteria. Although it is expected that this basal production is insignificant when compared to the IPTG regulated production, the use of this basal production increases the accuracy in which the insertions are related to the inductor concentrations.

Degradation

It is also known that the proteins are prone to degradation, which implies that the protein production can be ‘shielded’ by this degradation. Therefore, knowing how the degradation operates, turns out to be essential to the description of E. coding. It is also important to take into consideration that this degradation is proportional to the protein quantity; so the protein count, in controlled optimal conditions, will tend to an equilibrium at every IPTG concentration.
Complex formation
Cas complex

The key complex in the whole system is the Cas complex, responsible for the eventual message insertion in the bacteria's genome. Aiming for a simple and compact notation, this complex will hereby be denoted by $X$.

The formation of $X$ is identified in previous research as the binding of two dimers of $Cas1$, and one dimer of $Cas2$, forming a hexamer of $Cas1$ and $Cas2$. To prevent introducing complexity to the model, $X$ formation is taken as the independent binding of $4$ $Cas1$ proteins with $2$ $Cas2$ proteins.

Although it may happen that $X$ is stable, a backward biochemical reaction is considered as possible in order to improve the precision of the model.

$RT-UM$ complex

The other fundamental complex in the E-coding system is the one that carries out the message processing function, denoted $MP$. This is essential because, in order for the message to be inserted in the bacteria’s genome, a retrotranscription of the msd in the msr-msd is necessary by means of the $RT$ to yield the RNA-ssDNA or msDNA, which contains the target sequence to be inserted. This is analogous to a ready-to-insert message, so for compactness in notation, we denote it as $M$.
Complexes interaction
It is known that both $X$ and $M$ interact because of the fact that the Cas proteins complex $X$ will cut the target message $M$ from the msDNA, forming a ‘Storage Machinery' ($SM$). And this final $SM$ is the complex responsible of the final insertion into the bacteria’s genome.
Insertion
The capability of message insertions by the bacteria is greatly affected by the concentration of the $SM$ available inside the bacteria. An efficiency factor $P_{SM}$ is taken under consideration, which can be interpreted as the probability that a single $SM$ is inserted, or the affinity to effectuate the insertion inside the bacteria's genome.

It is known that the insertions inside the bacteria's genome is not frequent enough to make valid the continuum approximation. Nevertheless, its differential expression is valid to model its expected value. The insertions' expected value is denoted $Ins$ for compactness in the expressions.

Although mainly the project was based on insertions inside the bacteria’s genome, another proposition is the insertion in another plasmid, designed specifically to store E. coding messages. So, this insertion factor would help model the discrepancies between different insertion sites.
Differential Equations System
Kinetic diagram
Using all of the above-specified facts regarding E-coding, the following diagram was created to represent all the interactions between all the relevant species inside a bacteria.
Kinetic Diagram: E-Coding
Equations
With the kinetic diagram established, by law of mass action, and the hill equation, the set of differential equations for E-coding is obtained as shown below:
\begin{equation} \frac{d[Cas1]}{dt}=\frac{r_{1}[I]}{C_{1}+[I]}+b_1-\delta_1[Cas1] - 4k_X[Cas1]^4[Cas2]^2 + 4k_{-X}[X] \end{equation} \begin{equation} \frac{d[Cas2]}{dt}=\frac{r_{2}[I]}{C_{2}+[I]}+b_2-\delta_{2}[Cas2] - 2k_X[Cas1]^4[Cas2]^2 + 2k_{-X}[X] \end{equation} \begin{equation} \frac{d[RT]}{dt}=\frac{r_{RT}[I]}{C_{RT}+[I]}+b_{RT}-\delta_{RT}[RT] - k_{MP}[RT][UM] + \delta_{MP}[MP] + k_M[MP] \end{equation} \begin{equation} \frac{d[UM]}{dt}=\frac{r_{UM}[I]}{C_{UM}+[I]}+b_{UM}-\delta_{UM}[UM] - k_{MP}[RT][UM] \end{equation} \begin{equation} \frac{d[MP]}{dt}= k_{MP}[RT][UM] - \delta_{MP}[MP] - k_M[MP] \end{equation} \begin{equation} \frac{d[M]}{dt}= k_M[MP] - k_{SM}[M][X] \end{equation} \begin{equation} \frac{d[X]}{dt}= k_X[Cas1]^4[Cas2]^2 - k_{-X}[X] - k_{SM}[M][X] + \delta_{SM}[SM] + P_{SM}[SM] \end{equation} \begin{equation} \frac{d[SM]}{dt}= k_{SM}[M][X] - \delta_{SM}[SM] - P_{SM}[SM] \end{equation} \begin{equation} \frac{d[I]}{dt} = f'(t) \end{equation} \begin{equation} \frac{d[Ins]}{dt}= P_{SM}[SM] \end{equation} Equation $(9)$ represents the inductor concentration time derivative function. It is also important to state because through this equation, in the MATLAB written codes, it is possible to simulate the insertions results.
Parameters
The Differential equations system makes use of the following parameters that have several physical interpretations: m
Parameter Interpretation
$r_1$ Regulated production rate for Cas1 proteins
$C_1$ Characteristic concentration of inductor for Cas1 proteins' regulated production
$b_1$ Native production rate for Cas1 proteins
$\delta_1$ Degradation of Cas1 proteins
$r_2$ Regulated production rate for Cas2 proteins
$C_2$ Characteristic concentration of inductor for Cas2 proteins' regulated production
$b_2$ Native production rate for Cas2 proteins
$\delta_2$ Degradation of Cas2 proteins
$r_{RT}$ Regulated production rate for RT proteins
$C_{RT}$ Characteristic concentration of inductor for RT proteins' regulated production
$b_{RT}$ Native production rate for RT proteins
$\delta_{RT}$ Degradation of RT proteins
$r_{UM}$ Regulated production rate for UM proteins
$C_{UM}$ Characteristic concentration of inductor for UM proteins' regulated production
$b_{UM}$ Native production rate for UM proteins
$\delta_{UM}$ Degradation of UM proteins
$k_{MP}$ Affinity for RT to bind with msr-msd
$\delta_{MP}$ Degradation constant for the Message Processing complex
$k_{M}$ Message Processing velocity
$k_{X}$ CAS Complex formation constant
$k_{-X}$ CAS Complex dissociation constant
$k_{SM}$ Affinity for CAS Complex to bind to the message
$\delta_{SM}$ Degradation constant for the Storage Machinery complex
$P_{SM}$ Probability of insertion of the Storage Machinery complex
Modified Quasi-Steady State Approximation


As it can be seen from above, the final differential equations system remains with several free parameters. A practical but still very accurate approach to describe the system is to use a modified quasi-steady state approximation. An improvement to our mathematical model would be to use a more sophisticated method of approximation, mainly because along the intermediate species, between the $I$ concentration and the $Ins$ description, $MP$ may not be stable. However, the amount of experimental data that could be gathered with the lab equipment and techniques that were available to us was not enough for the use of a more complex model.

Conceptually, it is expected that $MP$ will need some time from the $RT-UM$ binding to the $M$ formation, because it involves a retrotranscription, therefore, its concentration is not stable. A simple way to grasp this idea is to imagine that $MP$ concentration grows with a sudden increase in protein production, falling sometime after the increment is processed and liberated as $M$.

Under this assumption, Equations 3 and 4 are equal to zero. From Equation 3, $k_{MP}[RT][UM]$ is equal to $\frac{r_{RT}[I]}{C_{RT}+[I]} + b_{RT} - \delta_{RT}[RT] + \delta_{MP}[MP] + k_M[MP]$. From Equation 4, $k_{MP}[RT][UM]$ is equal to $\frac{r_{UM}[I]}{C_{UM}+[I]} + b_{UM} - \delta_{UM}[UM]$. Substituting $k_{MP}[RT][UM]$ into Equation 5, the expressions result in $\frac{d[MP]}{dt}= \frac{r_{RT}[I]}{C_{RT} + [I]} + b_{RT} - \delta_{RT}[RT]$ $\frac{d[MP]}{dt}= \frac{r_{UM}[I]}{C_{UM}+[I]} + b_{UM} - \delta_{UM}[UM] - \delta_{MP}[MP] - k_M[MP]$, which is the rate of $RT$ binding to $UM$.

Taking Equation 8 as equal to zero, solving for $P_{SM}[SM]$ and substituting in Equation 10: $\frac{d[Ins]}{dt}= k_{SM}[M][X] - \delta_{SM}[SM]$ From Equation 6 as equal to zero and solving for $k_{SM}[M][X]= k_M[MP]$ and substituting in Equation 10: $\frac{d[Ins]}{dt}= k_M[MP] - \delta_{SM}[SM]$.

Considering $[SM]=0$, then $\frac{d[Ins]}{dt}= k_M[MP]$.

The interpretation of this expression is that the rate of insertion is delayed with respect to the $RT$ bound to $UM$, under the assumptions that the Cas Complex inserts the target DNA sequence in an insignificant amount of time compared to the time that the retrotranscription of the protospacer takes, so no effective amount of Complex is bound to protospacer at any given time.

Equations
\begin{equation} \frac{d[MP]}{dt}= \frac{r_{RT}[I]}{C_{RT}+[I]}+b_{RT}-\delta_{RT}[RT] \end{equation} \begin{equation} \frac{d[Ins]}{dt}= k_M[MP] \end{equation}
Comparisons between complete model and QSSA model

Protein Production Model
Equations
\begin{equation} \frac{d[CasWT]}{dt}=\frac{r_{WT}[I]}{C_{WT}+[I]}+b_{WT}-\delta_{WT}[CasWT] \end{equation} \begin{equation} \frac{d[RT]}{dt}=\frac{r_{RT}[I]}{C_{RT}+[I]}+b_{RT}-\delta_{RT}[RT] \end{equation}
Experimentation
In order to obtain values for the expressions describing protein production a quantification of each of the proteins was done at particular induction conditions, inductor concentration and induction time. The quantification was done by Biuret method. These experiments recorded the population and the protein content on various bacteria populations, with different known IPTG constant concentrations (0.1mM, 0.5mM and 1 mM) in the medium. These samples were analyzed in a UV-Vis spectrophotometer in order to obtain their absorbances and relate them to the correspondent amount of protein.

The experimental data obtained described the regulated protein production of reverse transcriptase ($RT$), $Cas1$ and $Cas2$ proteins. The Cas proteins measurements were not able to differentiate between each other, but contained the information about their total production. The parametrization of the production of total $Cas1$ and $Cas2$ was considered appropriate. The $RT$ protein production.

This data was then adjusted on MATLAB by a non-linear square errors minimization to the differential equations that would describe only the protein production:
Parameters
The results of parametrization were very inconclusive, mainly due to a lot of local minimums appearing, as well as divergence in the parameter search. Additionally, these minimums often displayed that IPTG did not affect the protein production, something opposite to what was expected. The gathered experimental data varied a little with different IPTG concentrations, but the results of the production differential equations with the best-fit parameters were found to not vary significantly with different IPTG concentration.

Production of Cas Wild Type proteins are displayed in the following Figures:

Production of Reverse Transcriptase are displayed in the following Figures:
Even though the graphs seem to be linear regressions, they are actually square errors minimizations for variations in the parameters of the differential equations that describe the different proteins productions. The codes that are responsible for this parametrizations are available in the GitHub page at the end of this site.

The parameters were adjusted to the differential equations model throught the lsqnonlin method using MatLab. This parametrization helped in obtaining an approximation of levels of protein production and the consequente amount of insertion inside the bacteria's genome. The obtained numerical values for the parameters are as follows:
Parameter Value
$r_{WT}$ $7.473778 \cdot 10^{-2}$ mgmL$^-1$s$^-1$
$C_{WT}$ $5.630415 \cdot 10^{-2}$ mM
$b_{WT}$ $5.976100 \cdot 10^{-2}$ mgmL$^-1$s$^-1$
$\delta_{WT}$ $2.220446 \cdot 10^{-14}$ mgmL$^-1$s$^-1$
$r_{RT}$ $7.883950 \cdot 10^{-4}$ mgmL$^-1$s$^-1$
$C_{RT}$ $2.220487 \cdot 10^{-14}$ mM
$b_{RT}$ $1.140019 \cdot 10^{-1}$ mgmL$^-1$s$^-1$
$\delta_{RT}$ $2.222364 \cdot 10^{-14}$ mgmL$^-1$s$^-1$


Contact Us

Sponsors