Enzyme Kinetics and Least Squares Regression
The efficiency of our RESCUE system is likely to depend on multiple factors, such as mismatch distance and spacer length, as with ADAR-dCas13b constructs (Cox et al., 2016), as well as the relative concentrations of the substrates and enzymes. Regression models can be used to identify these factors and evaluate their significance from experimental results. As such, an early build of an enzyme kinetics regression model of dCas13b-APOBEC editing efficiency may help us gain further insight into the RESCUE system.
Goals
- 1. Simulate the RESCUE system under different relative concentrations of substrates and enzymes to determine the concentrations that might yield maximum efficiency.
- 2. Determine the change in binding and catalytic efficiency when spacer length and mismatch distance are varied, which will help in designing gRNA for more efficient base editing.
Our assumptions of the model are as follows:
- 1. Association between dCas13b and gRNA is reversible and precedes enzyme-gRNA complex association with substrate mRNA. This is because dCas13b requires the gRNA to bind to the correct target sequence.
- 2. Once the Enzyme-gRNA-Substrate mRNA ternary complex (ERS) is formed, the reaction proceeds in a single direction to produce the edited product.
First, we build upon a kinetics model for our fusion protein product.
Where:
S - Unedited Substrate mRNA
R - Guide RNA
E - Enzyme for C to U editing
ER - Enzyme-gRNA complex
ERS - Enzyme-gRNA-Substrate complex
P - Edited Product mRNA
F - Green Fluorescent protein produced
Given the set of kinetic equations defined above, the rate of change of each protein/RNA species can be described by the following differential equations:
$$\delta_{t}S = G_{s} - D_{s}\cdot S - k_{2}\cdot ER\cdot S + k_{-2}\cdot ERS \quad \textrm{(Equation 1)}$$
$$\delta_{t}ER = k_{1}\cdot E\cdot R - k_{-1}\cdot ER - k_{2}\cdot ER\cdot S + k_{-2}\cdot ERS \quad \textrm{(Equation 2)}$$
$$\delta_{t}ERS = k_{2}\cdot ER\cdot S - k_{3}\cdot ERS - k_{-2}\cdot ERS \quad \textrm{(Equation 3)} $$
$$\delta_{t}P = k_{3}\cdot ERS - D_{s}\cdot P \quad \textrm{(Equation 4)} $$
$$\delta_{t}F = k_{4}\cdot P - D_{f}\cdot F \quad \textrm{(Equation 5)} $$
where Equation 4 assumes that the reverse step, ER + P to ERS, is energetically unfavourable and hence unlikely to occur.
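To illustrate how Equations 1 to 5 can be turned into a time course, the following is a minimal sketch using forward-Euler integration in plain Python. It rests on several assumptions that are not part of the model description: free enzyme `E` and free gRNA `R` are held constant (the equations give no rate law for them), the binding term is taken as \(k_{2}\cdot ER\cdot S\) in both Equation 1 and Equation 2 so that it matches Equation 3, and every coefficient value in `params` is hypothetical. The names `rescue_rhs`, `integrate` and `params` are illustrative only.

```python
def rescue_rhs(state, p):
    """Right-hand side of Equations 1-5 for state = [S, ER, ERS, P, F].

    Assumption: free enzyme E and free gRNA R are constants stored in p,
    because the equations give no rate law for them.
    """
    S, ER, ERS, P, F = state
    bind = p["k2"] * ER * S      # ER + S -> ERS
    unbind = p["k_2"] * ERS      # ERS -> ER + S, rate constant k_{-2}
    dS = p["Gs"] - p["Ds"] * S - bind + unbind
    dER = p["k1"] * p["E"] * p["R"] - p["k_1"] * ER - bind + unbind
    dERS = bind - p["k3"] * ERS - unbind
    dP = p["k3"] * ERS - p["Ds"] * P
    dF = p["k4"] * P - p["Df"] * F
    return [dS, dER, dERS, dP, dF]


def integrate(state, p, t_end, dt=0.01):
    """Forward-Euler integration from t = 0 to t_end; returns the final state."""
    state = list(state)
    for _ in range(int(round(t_end / dt))):
        deriv = rescue_rhs(state, p)
        state = [x + dt * dx for x, dx in zip(state, deriv)]
    return state


# Hypothetical coefficients in arbitrary units, chosen only to exercise the code.
params = {"Gs": 2.0, "Ds": 0.1, "Df": 0.05, "E": 1.0, "R": 1.0,
          "k1": 0.5, "k_1": 0.2, "k2": 0.05, "k_2": 0.1, "k3": 0.3, "k4": 2.5}

S, ER, ERS, P, F = integrate([0.0] * 5, params, t_end=50.0)
print(f"t=50: S={S:.2f} ER={ER:.2f} ERS={ERS:.2f} P={P:.2f} F={F:.2f}")
```

Forward Euler is used here only because it needs no external library; a stiff or adaptive solver would be a more robust choice once real coefficients are available.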
The differential equations, when plotted, yield a complex, non-linear kinetic curve comparable to the standard Monod kinetics curve. Unlike Monod kinetics, however, logarithmic conversion of the axes may not always result in a linear plot. Non-linear least-squares regression (LSR) is thus an alternative strategy for obtaining a best-fit curve for enzymatic assays, and it can generate a set of possible values for the coefficients in the differential equations.
The sum of squared differences is obtained by squaring the y-axis difference between each reading and its corresponding point on the current curve, then summing over all readings. Mathematically, the sum of squared differences (abbreviated as D) is given by Equation 6 below.
$$D = \sum_{i=1}^{n} (y_{i}-p_{i})^2 \quad \textrm{(Equation 6)} $$
where \(y_{i}\) is the observed data and \(p_{i}\) is the predicted data.
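As a small illustration of Equation 6, the sketch below computes D for two short lists. The function name `sum_squared_differences` and the sample numbers are made up for the example and are not taken from our data.

```python
def sum_squared_differences(observed, predicted):
    """Equation 6: D = sum over i of (y_i - p_i)^2."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    return sum((y - p) ** 2 for y, p in zip(observed, predicted))


# Made-up readings: the differences are -0.5, 0.0 and 1.0.
observed = [1.0, 2.0, 4.0]
predicted = [1.5, 2.0, 3.0]
print(sum_squared_differences(observed, predicted))  # prints 1.25
```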
Subsequently, each coefficient is increased by a small interval of 0.005 A.U., and also decreased by the same interval, to check for a lower D value. The change giving the lowest D value is selected and the coefficients are updated accordingly. The simulation continues to run until no further decrease in D can be found (that is, it has settled into a local minimum). To perform the simulation, we coded the entire regression script in Python, which is shown in the document below.
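The search loop described above can be sketched as follows. This is one possible reading of the procedure, not a copy of our script: at every iteration all coefficients are tried at plus and minus one step, the single best trial is kept, and the loop stops once no trial lowers D. The names `step_search` and `line_loss`, and the straight-line test model, are hypothetical and stand in for the kinetic model.

```python
def step_search(loss, coeffs, step=0.005, max_iter=100_000):
    """Fixed-step search: move to the best +/- step neighbour until D stops falling."""
    coeffs = list(coeffs)
    best = loss(coeffs)
    for _ in range(max_iter):
        candidates = []
        for i in range(len(coeffs)):
            for delta in (step, -step):
                trial = coeffs.copy()
                trial[i] += delta
                candidates.append((loss(trial), trial))
        trial_loss, trial = min(candidates, key=lambda c: c[0])
        if trial_loss >= best:
            break                      # no neighbour lowers D: local minimum
        best, coeffs = trial_loss, trial
    return coeffs, best


# Stand-in model: a straight line y = a*x + b with synthetic data from a = 2, b = 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
observed = [2.0 * x + 1.0 for x in xs]


def line_loss(coeffs):
    a, b = coeffs
    return sum((y - (a * x + b)) ** 2 for x, y in zip(xs, observed))


fitted, d_value = step_search(line_loss, [1.0, 0.0])
print(fitted, d_value)
```

Because the step size is fixed, the result can only be as precise as the step allows, and the search stops at the first local minimum it reaches.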
As a proof of concept, we ran a simulation of a positive control experiment likely to be performed during EGFP reporter construct testing, in which cells are transfected with functional EGFP instead of the mutant EGFP. This thought experiment assumes that the enzyme (E) is absent and that the substrate (S) directly produces the functional fluorescent protein product (F). As such, the model is simplified to only 2 equations, with only 4 coefficients to tune.
$$ S \rightarrow F$$
$$ \delta_{t}S = G_{s} - D_{s}\cdot S $$
$$ \delta_{t}F = k_{4}\cdot S - D_{f}\cdot F $$
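As a quick check on this simplified model (a worked derivation added for illustration, not an experimental result), setting both derivatives to zero gives the plateau values, written here with the assumed symbols \(S^{*}\) and \(F^{*}\):

$$ S^{*} = \frac{G_{s}}{D_{s}}, \qquad F^{*} = \frac{k_{4}\cdot S^{*}}{D_{f}} = \frac{k_{4}\cdot G_{s}}{D_{s}\cdot D_{f}} $$

For example, \(G_{s} = 2.00\), \(D_{s} = 0.10\), \(k_{4} = 2.50\) and \(D_{f} = 0.05\) give \(S^{*} = 20\) and \(F^{*} = 1000\). The plateau fixes only the combination \(k_{4}\cdot G_{s} / (D_{s}\cdot D_{f})\), so separating the four coefficients has to rely on the shape of the rise towards the plateau.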
We started by defining an initial set of coefficients for the simulation. A time-dependent graph was plotted of the fluorescence obtained using the defined “actual” coefficients (experimental fluo). Guess coefficients (1st guess fluo) were randomly assigned, and derived coefficients (theoretical fluo) were calculated using the Python script. The results are shown in Table 1 and Figure 3 (left). To test our script in a more realistic situation, we added noise (assumed to follow a Gaussian distribution with a standard deviation of 0.5) to the simulated data. The results are shown in Table 2 and Figure 3 (right). In general, the derived parameters are a reasonable fit to our initial input values.
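The way such a synthetic data set can be produced is sketched below: the two-equation positive control model is integrated with forward Euler, and Gaussian noise with a standard deviation of 0.5 is then added to the fluorescence values. This is a minimal sketch under stated assumptions, not our script: the function names `simulate_fluorescence` and `add_gaussian_noise`, the zero initial concentrations, the sampling times, the time step and the random seed are all chosen for illustration.

```python
import random


def simulate_fluorescence(Gs, Ds, k4, Df, times, dt=0.01, S0=0.0, F0=0.0):
    """Integrate dS/dt = Gs - Ds*S and dF/dt = k4*S - Df*F; return F at each time.

    `times` must be increasing and start at or after 0.
    """
    S, F, t_now = S0, F0, 0.0
    fluo = []
    for t in times:
        for _ in range(int(round((t - t_now) / dt))):
            dS = Gs - Ds * S
            dF = k4 * S - Df * F
            S += dt * dS
            F += dt * dF
        t_now = t
        fluo.append(F)
    return fluo


def add_gaussian_noise(values, sd=0.5, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sd to each value."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sd) for v in values]


times = list(range(0, 201, 10))                      # assumed sampling times
clean = simulate_fluorescence(2.0, 0.1, 2.5, 0.05, times)
noisy = add_gaussian_noise(clean, sd=0.5, seed=1)
for t, c, n in zip(times[:3], clean[:3], noisy[:3]):
    print(t, round(c, 3), round(n, 3))
```

The `noisy` list then plays the role of the observed data \(y_{i}\) in Equation 6, while `simulate_fluorescence` evaluated at trial coefficients supplies the predicted data \(p_{i}\).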
Table 1: Without Gaussian Noise | \(G_{s}\) | \(D_{s}\) | \(k_{4}\) | \(D_{f}\) |
---|---|---|---|---|
Actual (Theoretical) | 20.00 | 0.10 | 15.00 | 0.25 |
Guess (1st Guess) | 10.00 | 0.05 | 12.50 | 0.25 |
Derived (Experimental) | 15.04 | 0.15 | 16.80 | 0.15 |

Table 2: With Gaussian Noise | \(G_{s}\) | \(D_{s}\) | \(k_{4}\) | \(D_{f}\) |
---|---|---|---|---|
Actual (Theoretical) | 2.00 | 0.10 | 2.50 | 0.05 |
Guess (1st Guess) | 0.50 | 0.15 | 0.75 | 0.15 |
Derived (Experimental) | 1.45 | 0.05 | 1.55 | 0.05 |
This experiment also lets us reduce the number of coefficients required for subsequent tests, provided that time-dependent results can be obtained from laboratory experiments. Running the positive control gives an immediate estimate of \(G_{s}\), \(D_{s}\), \(k_{4}\) and \(D_{f}\), thereby reducing by 4 the number of coefficients that must be fitted in the final fusion protein experiment.
In general, we have shown that the LSR method may be used to derive an approximate set of coefficients for dCas13b-APOBEC mediated editing of EGFP mRNA, and therefore for fluorescence. More experimental data will be needed to better optimize the model. In addition, our preliminary investigation suggests that coefficients with a smaller contribution to the overall system, such as \(D_{s}\) and \(D_{f}\), cannot be determined with good certainty. We hope to further improve the LSR by adding algorithms such as basin hopping for better optimization.
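Basin hopping is mentioned here only as a possible future improvement. The following is a minimal sketch of the general idea, not part of our script: a fixed-step local search is restarted from randomly perturbed coefficients, and each new local minimum is accepted or rejected with a Metropolis criterion. The names `local_descent`, `basin_hopping` and `double_well`, the parameters `hop_size` and `temperature`, and the one-dimensional test function are all hypothetical.

```python
import math
import random


def local_descent(loss, x, step=0.005, max_iter=10_000):
    """Fixed-step local search: take the best +/- step move while the loss falls."""
    x = list(x)
    best = loss(x)
    for _ in range(max_iter):
        trials = []
        for i in range(len(x)):
            for delta in (step, -step):
                trial = x.copy()
                trial[i] += delta
                trials.append((loss(trial), trial))
        trial_loss, trial = min(trials, key=lambda c: c[0])
        if trial_loss >= best:
            break
        best, x = trial_loss, trial
    return x, best


def basin_hopping(loss, x0, n_hops=50, hop_size=2.0, temperature=1.0, seed=0):
    """Random jump, local search, Metropolis accept; returns the best minimum seen."""
    rng = random.Random(seed)
    current, current_loss = local_descent(loss, x0)
    best, best_loss = current, current_loss
    for _ in range(n_hops):
        jumped = [v + rng.uniform(-hop_size, hop_size) for v in current]
        cand, cand_loss = local_descent(loss, jumped)
        # Always accept a lower minimum; accept a higher one with Boltzmann probability.
        if cand_loss < current_loss or \
                rng.random() < math.exp((current_loss - cand_loss) / temperature):
            current, current_loss = cand, cand_loss
        if cand_loss < best_loss:
            best, best_loss = cand, cand_loss
    return best, best_loss


def double_well(x):
    """Test function with a shallow minimum near x = 1 and a deeper one near x = -1."""
    return (x[0] ** 2 - 1.0) ** 2 + 0.3 * x[0]


print(local_descent(double_well, [1.0]))   # stays in the shallow well
print(basin_hopping(double_well, [1.0]))   # can escape to the deeper well
```

A library implementation such as `scipy.optimize.basinhopping` would normally be preferred over a hand-written loop; the sketch only shows why random hops can help a fixed-step search escape a local minimum.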