Introduction
The description of our system had a progressive development parallel to the realization of the whole project. Initially, a simple system of linear coupled differential equations was proposed to describe the whole system, but as the understanding of all the aspects the system involves improved, the model was continuously refined to encompass a more suitable description that became both accurate and practical.
Biological System
Biological systems modeling is the creation of a kinetics diagram that connects all the important species of the system through relationships that represent bio-chemical reactions. In the case of the E. coding system, the methodology consisted in:
- The identification of each individual component
- Consulting literature on the possible interactions
- Creating a kinetic diagram
- Establishing the whole differential equations system via biochemical reactions and the eventual message insertion inside our bacteria
Identification
In the bateria, five precursor species were identified as the ones relevant for the system. These precursors are precisely $Cas1$ protein, $Cas2$ protein, the msr-msd ($UM$ from ‘unprocessed message’), the retrotranscriptase ($RT$) protein, and IPTG ($I$ from 'inductor'). The first four species are produced by IPTG regulated transcription, which has some advantages, mainly the fact that the bacteria is not able to metabolize it, so its concentration does not change.
Protein expression
Production
Research on IPTG induction in our plasmid led us to the Hill equation, an expression that is simple, practical and commonly used in regulated protein production modeling.
It is well known that the induction stimulates the transcription of RNA which then is translated by the ribosomes into proteins. So, the Hill equation is valid when it is assumed that, as soon as the RNA is produced, it is immediately processed and translated into the corresponding proteins by the bacteria.This means that all these four species are able to be modeled by the Hill equation.
An important thing to consider is the native production of these proteins by the bacteria. Although it is expected that this basal production is insignificant when compared to the IPTG regulated production, the use of this basal production increases the accuracy in which the insertions are related to the inductor concentrations.
Degradation
It is also known that the proteins are prone to degradation, which implies that the protein production can be ‘shielded’ by this degradation. Therefore, knowing how the degradation operates, turns out to be essential to the description of E. coding. It is also important to take into consideration that this degradation is proportional to the protein quantity; so the protein count, in controlled optimal conditions, will tend to an equilibrium at every IPTG concentration.
Complex formation
Cas complex
The key complex in the whole system is the Cas complex, responsible for the eventual message insertion in the bacteria's genome. Aiming for a simple and compact notation, this complex will hereby be denoted by $X$.
The formation of $X$ is identified in previous research as the binding of two dimers of $Cas1$, and one dimer of $Cas2$, forming a hexamer of $Cas1$ and $Cas2$. To prevent introducing complexity to the model, $X$ formation is taken as the independent binding of $4$ $Cas1$ proteins with $2$ $Cas2$ proteins.
Although it may happen that $X$ is stable, a backward biochemical reaction is considered as possible in order to improve the precision of the model.
$RT-UM$ complex
The other fundamental complex in the E-coding system is the one that carries out the message processing function, denoted $MP$. This is essential because, in order for the message to be inserted in the bacteria’s genome, a retrotranscription of the msd in the msr-msd is necessary by means of the $RT$ to yield the RNA-ssDNA or msDNA, which contains the target sequence to be inserted. This is analogous to a ready-to-insert message, so for compactness in notation, we denote it as $M$.
Complexes interaction
It is known that both $X$ and $M$ interact because of the fact that the Cas proteins complex $X$ will cut the target message $M$ from the msDNA, forming a ‘Storage Machinery' ($SM$). And this final $SM$ is the complex responsible of the final insertion into the bacteria’s genome.
Insertion
The capability of message insertions by the bacteria is greatly affected by the concentration of the $SM$ available inside the bacteria. An efficiency factor $P_{SM}$ is taken under consideration, which can be interpreted as the probability that a single $SM$ is inserted, or the affinity to effectuate the insertion inside the bacteria's genome.
It is known that the insertions inside the bacteria's genome is not frequent enough to make valid the continuum approximation. Nevertheless, its differential expression is valid to model its expected value. The insertions' expected value is denoted $Ins$ for compactness in the expressions.
Although mainly the project was based on inserting inside the bacteria’s genome, another proposition is the insertion in another plasmid, designed specifically to store E. coding messages. So, this insertion factor would help model the discrepancies between different insertion sites.
Differential Equations System
Kinetic diagram
Using all of the above-specified facts regarding E-coding, the following diagram was created to represent all the interactions between all the relevant species inside a bacteria.
Kinetic Diagram: E-Coding
Equations
With the kinetic diagram established, by law of mass action, and the hill equation, the set of differential equations for E-coding is obtained as shown below:
\begin{equation} \frac{d[Cas1]}{dt}=\frac{r_{1}[I]}{C_{1}+[I]}+b_1-\delta_1[Cas1] - 4k_X[Cas1]^4[Cas2]^2 + 4k_{-X}[X] \end{equation} \begin{equation} \frac{d[Cas2]}{dt}=\frac{r_{2}[I]}{C_{2}+[I]}+b_2-\delta_{2}[Cas2] - 2k_X[Cas1]^4[Cas2]^2 + 2k_{-X}[X] \end{equation} \begin{equation} \frac{d[RT]}{dt}=\frac{r_{RT}[I]}{C_{RT}+[I]}+b_{RT}-\delta_{RT}[RT] - k_{MP}[RT][UM] + \delta_{MP}[MP] + k_M[MP] \end{equation} \begin{equation} \frac{d[UM]}{dt}=\frac{r_{UM}[I]}{C_{UM}+[I]}+b_{UM}-\delta_{UM}[UM] - k_{MP}[RT][UM] \end{equation} \begin{equation} \frac{d[MP]}{dt}= k_{MP}[RT][UM] - \delta_{MP}[MP] - k_M[MP] \end{equation} \begin{equation} \frac{d[M]}{dt}= k_M[MP] - k_{SM}[M][X] \end{equation} \begin{equation} \frac{d[X]}{dt}= k_X[Cas1]^4[Cas2]^2 - k_{-X}[X] - k_{SM}[M][X] + \delta_{SM}[SM] + P_{SM}[SM] \end{equation} \begin{equation} \frac{d[SM]}{dt}= k_{SM}[M][X] - \delta_{SM}[SM] - P_{SM}[SM] \end{equation} \begin{equation} \frac{d[I]}{dt} = f'(t) \end{equation} \begin{equation} \frac{d[Ins]}{dt}= P_{SM}[SM] \end{equation} Equation $(9)$ represents the inductor concentration time derivative function. It is also important to state because through this equation, in the MATLAB written codes, it is possible to simulate the insertions results.
Parameters
The Differential equations system makes use of the following parameters that have several physical interpretations:
Parameter | Interpretation |
$r_1$ | Regulated production rate for Cas1 proteins |
$C_1$ | Characteristic concentration of inductor for Cas1 proteins' regulated production |
$b_1$ | Native production rate for Cas1 proteins |
$\delta_1$ | Degradation of Cas1 proteins |
$r_2$ | Regulated production rate for Cas2 proteins |
$C_2$ | Characteristic concentration of inductor for Cas2 proteins' regulated production |
$b_2$ | Native production rate for Cas2 proteins |
$\delta_2$ | Degradation of Cas2 proteins |
$r_{RT}$ | Regulated production rate for RT proteins |
$C_{RT}$ | Characteristic concentration of inductor for RT proteins' regulated production |
$b_{RT}$ | Native production rate for RT proteins |
$\delta_{RT}$ | Degradation of RT proteins |
$r_{UM}$ | Regulated production rate for UM proteins |
$C_{UM}$ | Characteristic concentration of inductor for UM proteins' regulated production |
$b_{UM}$ | Native production rate for UM proteins |
$\delta_{UM}$ | Degradation of UM proteins |
$k_{MP}$ | Affinity for RT to bind with msr-msd |
$\delta_{MP}$ | Degradation constant for the Message Processing complex |
$k_{M}$ | Message Processing velocity |
$k_{X}$ | CAS Complex formation constant |
$k_{-X}$ | CAS Complex dissociation constant |
$k_{SM}$ | Affinity for CAS Complex to bind to the message |
$\delta_{SM}$ | Degradation constant for the Storage Machinery complex |
$P_{SM}$ | Probability of insertion of the Storage Machinery complex |
Modified Quasi-Steady State Approximation
As it can be seen from above, the final differential equations system remains with several free parameters. A practical but still very accurate approach to describe the system is to use a modified quasi-steady state approximation. An improvement to our mathematical model would be to use a more sophisticated method of approximation, mainly because along the intermediate species, between the $I$ concentration and the $Ins$ description, $MP$ may not be stable. However, the amount of experimental data that could be gathered with the lab equipment and techniques that were available to us was not enough for the use of a more complex model.
Conceptually, it is expected that $MP$ will need some time from the $RT-UM$ binding to the $M$ formation, because it involves a retrotranscription, therefore, its concentration is not stable. A simple way to grasp this idea is to imagine that $MP$ concentration grows with a sudden increase in protein production, falling sometime after the increment is processed and liberated as $M$.
Under this assumption, Equations 3 and 4 are equal to zero. From Equation 3, $k_{MP}[RT][UM]$ is equal to $\frac{r_{RT}[I]}{C_{RT}+[I]} + b_{RT} - \delta_{RT}[RT] + \delta_{MP}[MP] + k_M[MP]$. From Equation 4, $k_{MP}[RT][UM]$ is equal to $\frac{r_{UM}[I]}{C_{UM}+[I]} + b_{UM} - \delta_{UM}[UM]$. Substituting $k_{MP}[RT][UM]$ into Equation 5, the expressions result in $\frac{d[MP]}{dt}= \frac{r_{RT}[I]}{C_{RT} + [I]} + b_{RT} - \delta_{RT}[RT]$ $\frac{d[MP]}{dt}= \frac{r_{UM}[I]}{C_{UM}+[I]} + b_{UM} - \delta_{UM}[UM] - \delta_{MP}[MP] - k_M[MP]$, which is the rate of $RT$ binding to $UM$.
Taking Equation 8 as equal to zero, solving for $P_{SM}[SM]$ and substituting in Equation 10: $\frac{d[Ins]}{dt}= k_{SM}[M][X] - \delta_{SM}[SM]$ From Equation 6 as equal to zero and solving for $k_{SM}[M][X]= k_M[MP]$ and substituting in Equation 10: $\frac{d[Ins]}{dt}= k_M[MP] - \delta_{SM}[SM]$.
Considering $[SM]=0$, then $\frac{d[Ins]}{dt}= k_M[MP]$.
The interpretation of this expression is that the rate of insertion is delayed with respect to the $RT$ bound to $UM$, under the assumptions that the Cas Complex inserts the target DNA sequence in an insignificant amount of time compared to the time that the retrotranscription of the protospacer takes, so no effective amount of Complex is bound to protospacer at any given time.
Equations
\begin{equation}
\frac{d[MP]}{dt}= \frac{r_{UM}[I]}{C_{UM}+[I]}+b_{UM}-\delta_{UM}[UM] - \delta_{MP}[MP] - k_M[MP]
\end{equation}
\begin{equation}
\frac{d[MP]}{dt}= \frac{r_{RT}[I]}{C_{RT}+[I]}+b_{RT}-\delta_{RT}[RT]
\end{equation}
\begin{equation}
\frac{d[Ins]}{dt}= k_M[MP]
\end{equation}
Protein Production Model
Equations
\begin{equation}
\frac{d[CasWT]}{dt}=\frac{r_{WT}[I]}{C_{WT}+[I]}+b_{WT}-\delta_{WT}[CasWT]
\end{equation}
\begin{equation}
\frac{d[RT]}{dt}=\frac{r_{RT}[I]}{C_{RT}+[I]}+b_{RT}-\delta_{RT}[RT]
\end{equation}
Experimentation
In order to obtain values for the expressions describing protein production a quantification of each of the proteins was done at particular induction conditions, inductor concentration and induction time. The quantification was done by Biuret method. These experiments recorded the population and the protein content on various bacteria populations, with different known IPTG constant concentrations (0.1mM, 0.5mM and 1 mM) in the medium. These samples were analyzed in a UV-Vis spectrophotometer in order to obtain their absorbances and relate them to the correspondent amount of protein.The experimental data obtained described the regulated protein production of reverse transcriptase ($RT$), $Cas1$ and $Cas2$ proteins. The Cas proteins measurements were not able to differentiate between each of them, but contained the information about their total production. The parametrization of the production of total $Cas1$ and $Cas2$ was considered appropriate. The $RT$ protein production.
This data was then adjusted on MATLAB by a non-linear square errors minimization to the differential equations that would describe only the protein production:
Parameters
The results of parametrization were very inconclusive, mainly due to a lot of local minimums appearing, as well as divergence in the parameter search. Additionally, these minimums often displayed that IPTG did not affect the protein production, something opposite to what was expected.
The gathered experimental data varied a little with different IPTG concentrations, but the results of the production differential equations with the best-fit parameters we found ended up not varying that significatelly with different IPTG concentration.
The parameters were adjusted to the differential equations model throught the lsqnonlin method in matlab. This parametrization helped us in obtaining an approximation of the protein production and the subsequent insertion inside the bacteria's genome. The obtained numerical values for the parameters obtained are the following:
Parameter | Value |
$r_{WT}$ | $7.473778 \cdot 10^{-2}$ mgmL$^-1$s$^-1$ |
$C_{WT}$ | $5.630415 \cdot 10^{-2}$ mM |
$b_{WT}$ | $5.976100 \cdot 10^{-2}$ mgmL$^-1$s$^-1$ |
$\delta_{WT}$ | $2.220446 \cdot 10^{-14}$ mgmL$^-1$s$^-1$ |
$r_{RT}$ | $7.883950 \cdot 10^{-4}$ mgmL$^-1$s$^-1$ |
$C_{RT}$ | $2.220487 \cdot 10^{-14}$ mM |
$b_{RT}$ | $1.140019 \cdot 10^{-1}$ mgmL$^-1$s$^-1$ |
$\delta_{RT}$ | $2.222364 \cdot 10^{-14}$ mgmL$^-1$s$^-1$ |
Graph of protein production: E-Coding
Sudden bulk induction
Expression
Insertions
Initial conditions expansion
Industry induction
Expression
Insertions
Initial conditions expansion