Difference between revisions of "Team:Cornell/Model"

(Changed text/javascript to application/javascript in Modeling)
Line 101: Line 101:
 
                         </div>
 
                         </div>
 
                         <div class="dropdown">
 
                         <div class="dropdown">
                             <button class="dropbtn">MODELING</button>
+
                             <button class="dropbtn active-page">MODELING</button>
 
                             <div class="dropdown-content modeling-dropdown-content">
 
                             <div class="dropdown-content modeling-dropdown-content">
 
                                 <a href="https://2018.igem.org/Team:Cornell/Model">MODELING</a>
 
                                 <a href="https://2018.igem.org/Team:Cornell/Model">MODELING</a>
Line 134: Line 134:
 
                         </div>
 
                         </div>
 
                         <div class="dropdown">
 
                         <div class="dropdown">
                             <button class="dropbtn active-page" onclick="location.href='https://2018.igem.org/Team:Cornell'">HOME</button>
+
                             <button class="dropbtn" onclick="location.href='https://2018.igem.org/Team:Cornell'">HOME</button>
 
                         </div>
 
                         </div>
 
                     </div>
 
                     </div>

Revision as of 15:47, 16 October 2018

Team:Cornell/Modeling - 2018.igem.org

Modeling
Overview

At the core of our project is a novel genetic circuit. Our first objective, then, is to accurately model this circuit to provide insights on the characteristics of the system and offer suggestions to our Wet Lab team with regards to the specific components that would best yield desired results in actual lab trials. Extending beyond the scope of our own project, we wanted to create a set of models that can be used by future teams to evaluate the dynamics of their own circuits. To that end, we needed to identify general characteristics of this type of AND gate genetic circuit that would be applicable to any circuit of this type.

Our first approach was a deterministic system of ordinary differential equations (ODEs) used to model the kinetics of the circuit. The resulting system can be found below, in figure 1. While developing systems of ODEs to model reaction kinetics is quite common practice for its relative ease of construction and interpretation, this class of models does have considerable drawbacks. Most notably, systems of ODEs are generally highly sensitive to parameter values. For example (provide example of very simple system of like two ODEs that is very sensitive to one parameter). It can be difficult or nearly impossible to obtain the necessary parameter values for a system of ODEs modeling reactions within a genetic circuit, which leaves such a model vulnerable to large errors.

To avoid producing such results, we took three key steps. First, we utilized derivative-free optimization to estimate parameters in our model that did not have sufficient documentation in literature or data from our lab. Second, we generalized our system of ODEs and performed a mathematical analysis of the system. Using degree theory, we were able to prove multiple properties regarding the dynamics of the system that allowed us to construct proofs regarding properties of our system without having to worry about specific parameter values. Stochastic models more closely simulate the reality of kinetics inside a cell, particularly in cases of low molecule counts (such as gene copy number) and when the quasi-steady-state assumptions typically used in deterministic models are not valid. In these situations, the system is far more sensitive to noise, and comparing the difference between a stochastic and deterministic model of the system can help estimate the reliability of the circuit when implemented.

Creating the Model

Deterministic Model

It is rather natural to convert a set of kinetic reaction equations to a system of ordinary differential equations (ODEs). To create our deterministic model, then, we began by identifying the reactions relevant to our system. Our primary focus was the reactions involved in and surrounding our “AND” gate genetic circuit.

Figure 1: Reaction Network for our Genetic Circuit.

Assuming an excess of ribosomes and, given that the activator and repressor mRNA are translated by ribosomes immediately as they are produced, a steady state production of activator and repressor mRNA, we can construct our system of ODEs as the following:

$$\dot{x_1} = U_1 - \frac{P_{Tot,lon}*(x_1/k_1)}{1+x_1/k_1+x_3/k_3}-\delta x_1$$ $$\dot{x_2} = H(x_1) - \frac{P_{Tot,CLP}*(x_2/k_2)}{1+x_2/k_2+x_4/k_4}-\delta x_2 - \frac{1}{2}\hat{H}(x_2,x_4)$$ $$\dot{x_3} = U_3 - \frac{P_{Tot,lon}*(x_3/k_3)}{1+x_1/k_1+x_3/k_3}-\delta x_3$$ $$\dot{x_4} = H(x_3) - \frac{P_{Tot,CLP}*(x_4/k_4)}{1+x_2/k_2+x_4/k_4}-\delta x_4 - \frac{1}{2}\hat{H}(x_2,x_4)$$

Where x_1 := concentration of activator, x_2: = concentration of hrpS, x_3 := concentration of repressor, and x_4 := concentration of hrpR. Each k_i represents the dissociation-association constants between protein i and the protease it has been tagged for, and delta represents the dilution rate. U_i represents the production rate of the activator and repressor, which are solely dependent on the RNA thermometer controlling the mRNA production of those proteins because we have assumed an excess of ribosomes. The equations governing the output of U_1 and U_2 can be written as:

$$U_i(t) = R_{max,mRNA}*g(f(t)) $$

Where f(t)is some function mapping a time t in a trial to a temperature T, and then g(T) maps from that temperature T to a fraction that scales R_{max,mRNA}, the maximum rate of mRNA production, accordingly. It is important to note that f(t) is something that we have control over. That is, f(t) determines our temperature oscillation, which we will want to work with later on.

H(x) represents the Hill kinetic equations dependent on protein x_i, specified in the ODEs. Depending on the situation, this equation can either be of the form (R_{max}*k_i)/(k_i + x_i) if the equation is dependent on a repressor, or (R_{max}*x_i)(k_i + x_i) if the equation is dependent on an activator, where R_{max} represents some maximum rate of production. Ĥ(x) represents the Hill kinetics equations related to the dimerization of hrpR and hrpS, which can be written as a product of two Hill kinetics equations for each protein, such as (R_{max}*x_2*x_4)((k_2+x_2)*(k_4+x_4)). Since we measured GFP fluorescence, the actual output of GFP from the model could be interpreted on a relative scale, such that we could assume the rate of GFP production is proportional to Ĥ(x).

The system of equations above provides the structure of the circuit itself, and the four primary proteins that are interacting, but in order to incorporate elements such as the RNA thermometers and temperature oscillations, we had to carefully design our simulations.

To implement the deterministic model, we used MATLAB and the ODE23s solver. The differential equations used are the same as those above, with a few added tricks. We assume that the speed at which an RNA hairpin forms and dissolves is significantly faster than the other reactions in the circuit, so we treat the RNA thermometer as instantaneous in our simulations. To model this RNA Thermometer, we create a function, T(t), within our script containing the system of ODEs in MATLAB such that T : t → {0,1}, where t is the current time within the simulation. We can then easily set any period for our temperature oscillation and T will translate this temperature to a binary value that can then be used in the system of ODEs to control the output of the activator and repressor.

Simulation Results

IFor general results, we ran a simulation with the parameters we found and derived. A full description of these values and our derivations can be found here. We used a period of one hour for this trial, as an educated guess of a reasonable timescale. A sweep of periods is also performed. The results can be found below in Figure //X//. Note again that the GFP production is relative to the combined hill kinetics equation of hrpR and hrpS, and does not represent an actual quantity of GFP, though it has been scaled with a reasonable R_{max} parameter that puts GFP output within the same order of magnitude of the observed hrpR/S output. We can make a few observations based solely on this simulation.

Figure 2: A basic trial using the calculated parameters.
Figure 3: A frequency sweep of our dynamical system.

First, we can note that hrpS production and degradation is seemingly rapid, and oscillates between two states. The rapid change in concentration is likely due to the simplification of the system. To obtain general results, we have ignored some lag time between processes, such as the RNA thermometer referenced above, that would likely smooth out the concentration curve for hrpS. The simulation also ignores heat transfer, which would likely slow down all processes. Regardless, we are still able to ascertain that hrpS should oscillate in a fashion that resembles a square wave in reality. On the other hand, hrpR clearly does not oscillate properly given the parameters in this simulation. The amplitude of the oscillations increase over time, and start off quite small. Still, we observe a decent output of GFP as desired. The fact that GFP is being produced suggests that the chosen frequency is within some acceptable range, though we do not know yet if this frequency is optimal. Figure 5 shows the results from a frequency sweep, where we ran the simulation with a period of oscillation ranging from one minute to two and a half hours.

The result from this sweep suggest that, although the acceptable band of frequencies is rather large, between one and two hours, we do get the desired high pass and low pass behavior. GFP output for high frequencies, anything with a period less than 30 minutes, is very low according to our model. Similarly, when the period gets too large, implying a very low frequency, we can observe a drop-off in GFP output, therefore producing the desired band-pass filter characteristics.

To further investigate what is happening with hrpR production, we performed a sensitivity analysis on the system.

Sensitivity Analysis

Perhaps the most important component of analyzing a system of differential equations is testing the sensitivity of the output to some perturbation in parameter values. Since we have a large system, performing a direct analysis on the system would be quite intensive, and would likely provide little insight besides a jumble of symbols. Instead, we perform a computational analysis by varying parameter values over a reasonable range of values and observe the resulting variance in output.

Figure 4: Varying the production of activator and repressor.

When calculating the production rate of the activator and repressor, we assumed that ribosomes translate mRNA at a rate of 20 amino acids per second. This value can vary between 5 and 25 amino acids per second, so we varied the production rate of the activator and repressor accordingly. The results of varying these parameter values can be observed in figure 4, as the maximum difference between GFP output is on the order of 10^-5, which is minimal in this system. The results suggest that the system is not particularly system to these parameters, and we have likely not missed any behavior of the system due to some error in the calculation of these values.

Figure 5: Varying the dissociation coefficients of the activator and repressor.

Recall the strange behavior observed in the first simulation with regard to hrpR production. Looking at the results in Figure 5, it is clear that the system is quite sensitive to the dissociation constants of TetR with hrpR and Sigma-F with hrpS. This makes sense intuitively, since a high K value implies the reverse reaction is favored. So K_r being set to 5.8, as shown in our calculations here implies that the reverse reaction between TetR and hrpR is strongly favored over the forward reaction. Therefore, the presence of TetR in the system has minimal impact on the production of hrpR. This is depicted in Figure 4, where it can be observed that hrpR is not oscillating with an amplitude much smaller than that of hrpS. TetR has been shown to be an effective repressor in multiple studies, so it is likely that the parameter value we obtained is inaccurate. The fact that one parameter, in this case K_r can have such a drastic effect on the output of the model only reaffirms the necessity to perform a careful proof-based analysis of the system.

Figure 6: Varying the concentration of protease species.
Stochastic Model

In most stochastic models, every single reaction in the system is simulated. For example, while a deterministic model would model cooperative interactions with a Hill coefficient, a stochastic model would model binding and unbinding of every possible intermediate and possible number of attached ligands, with corresponding rate constants in each case. For our system, there is not enough data available for each of these reactions to accurately simulate it; this is compounded by the fact that while only equilibrium constants are needed for deterministic models, both the forwards and backwards rates are needed to do a stochastic model.

However, the kinetics in our system happen on vastly different time scales--dimerization can be assumed to happen much faster than something like transcription. This allows us to make the same quasi-steady-state assumptions for those reactions, greatly simplifying the model and increasing the speed of the simulation (Rao & Arkin, 2003). Additionally, we assume that binding of inhibitors and activators is fast, which allows us to use steady state inhibition kinetics. Finally, we make the implicit assumption that reaction propensities are equal to the reaction rates, which may not be true in general. With that in mind, the entire stochastic system is as follows:

Figure 7: Reaction Network for Stochastic Model.

where the bracket ([]) notation denotes counts rather than concentration.

In practice, we found that the stochastic model is quite sensitive to initial choices of parameters. While the deterministic model happily calculates concentrations on the order of 10-13 μM, this corresponds to less than one molecule of the species in the cell, which is modeled as 0. Thus, it appears that averaging only occurs at a cellular level--at any given time, each individual cell either expresses or does not express the downstream product regulated by our circuit. Noise is moderate and results in fluctuations when the deterministic model predicts steady state, but the general trend of oscillations remains.

Consider the variation of the concentration of protease species in Figure 6. While the system is clearly sensitive to P_{clp}, the system still behaves in a similar fashion to the base trial, depicted in Figure 2. This is good, because it means we can treat this part of the system modularly. That is, future teams could plug in different proteases and the overall behavior of the system will remain the same, but the optimal frequency band for GFP production will vary.

Proofs on our Dynamical System

As demonstrated in the sections above, the dynamical system model of biochemical reaction networks, such as our genetic circuit, are often high-dimensional, nonlinear, and contain many unknown parameters. Recently, work has been done by Craciun et. al and Mcbride et. al to determine the behavior of various biochemical reaction networks regardless of parameter values (Mcbride, Craciun). Both groups utilized degree theory to reason about their respective systems under consideration. The idea of degree theory is to give a count of the number of solutions of nonlinear equations but to count solutions in a special way so that the count is stable to changes in the equations (Craciun). To demonstrate this idea, consider the function f_t(x) = x^2 + t. When t > 0 there exists no solutions to f_t(x) = 0. When t < 0, there exists two solutions to f_t(x) = 0, x =±√(-t) . In the case that t=0, there exists exactly one solution, x=0. Clearly, the count of solutions of this nonlinear equation is dependent on the value of parameter t. The goal of degree theory, then, is to find a way to ascertain a count of solutions for such an equation that does not change depending on the value of t. The actual derivation of such a count is outside the scope of this section, but a collection of lectures on the topic can be found here.

Theorem regarding behavior of our dynamical system.

In the modeling paper we wrote, which can be found here, we prove that the dynamical system given above has a number of equilibria, or solutions, that are unaffected by the types of protease and the dimerization of hrpR and hrpS. We leverage the concept of Degree Theory and some proofs by McBride to prove our own theorem regarding our dynamical system. To summarize our work in this section of the paper, using theorems from McBride et. al and our own theorem regarding our dynamical system, we have shown the following claims hold:

  1. The number of equilibria of the system is independent of the type and number of protease.
  2. The number of equilibria of the system is independent of the dimerization of hrpR/S.
  3. Our system is modular with respect to the proteins used in the "AND" gate that dimerize and the type of proteases used to tune the system.

These facts give us and future teams a considerable amount of freedom with regard to designing a biological "AND" gate. That is, one can mix and match protease tags and types and rates of dimerization without risking a large shift in the behavior of the system. We hope future teams can use this framework to further optimize genetic circuits that can be represented by the same, or a similar, dynamical system.

Stochastic Model

In most stochastic models, every single reaction in the system is simulated. For example, while a deterministic model would model cooperative interactions with a Hill coefficient, a stochastic model would model binding and unbinding of every possible intermediate and possible number of attached ligands, with corresponding rate constants in each case. For our system, there is not enough data available for each of these reactions to accurately simulate it; this is compounded by the fact that while only equilibrium constants are needed for deterministic models, both the forwards and backwards rates are needed to do a stochastic model.

However, the kinetics in our system happen on vastly different time scales--dimerization can be assumed to happen much faster than something like transcription. This allows us to make the same quasi-steady-state assumptions for those reactions, greatly simplifying the model and increasing the speed of the simulation (Rao & Arkin, 2003).

In practice, we found that the stochastic model is quite sensitive to initial choices of parameters. While the deterministic model happily calculates concentrations on the order of 10-13 μM, this corresponds to less than one molecule of the species in the cell, which is modeled as 0. Thus, it appears that averaging only occurs at a cellular level--at any given time, each individual cell either expresses or does not express the downstream product regulated by our circuit. Noise is moderate and results in fluctuations when the deterministic model predicts steady state, but the general trend of oscillations remains.

Parameter Estimation

A large number of our parameters are unknown, and we are unable to track the concentrations of every species while testing. Thus, the problem of finding the parameters that produce the observed data is quite underdetermined. However, we can still make progress by applying some priors to the parameter values, such as the expected order of magnitude of each parameter. Then, we can frame this as a nonconvex optimization problem--minimizing the difference between the observed data and the calculated data from the simulations given the parameters, with additional regularization on biologically implausible solutions. This also meshes well with the nature of our experimental data--while we may not have high quality data, we can still incorporate data from multiple trials to help cut through the noise and find the most probable result. The actual implementation was done using the BlackBoxOptim.jl package for Julia, which uses a variety of derivative-free optimization methods that allows for optimization of any arbitrary objective function. The objective function minimized here is the L2 difference between the observed and predicted reporter concentrations in the experimental data. In order to estimate the variance in our predicted parameters and to mitigate overfitting, we performed leave-one-out cross-validation on our collected data. We assume that GFP had folding time of 6 minutes with a very low variance, so the fluorescence measured at a given time would be reflective of the total GFP concentration at a time point 6 minutes before.

Unfortunately, we were unable to obtain good results. This is primarily due to the limited data we had available--we only had access to normalized fluorescence data that gave a relative concentration of hrpR and hrpS, but not that of $\sigma F$ or tetR. On the other hand, the model results are of absolute concentrations, and we did not have a way to convert between the two. This extra free parameter ended up making it too difficult to solve. It's possible that this could have been added as a parameter and optimized as well, but we did not have time to do it.

Implementation


The stochastic model and parameter estimation were performed in Julia, using the Julia Differential Equation library and the package BlackBoxOptim.jl. Here is an example of how to define the stochastic model: each possible elementary reaction (such as production, degradation, or bimolecular interaction) is defined as a jump with a stoichiometric change and a propensity, which describes the probability that that reaction occurs. For simplicity, here we assume that propensities take the same form as the rate in mass-action kinetics, and that the species vector `u` and other parameters are already in the units of molecule counts.

The deterministic model and all related sensitivity analysis was performed in MATLAB using the ode23s differential equation solver. To test the system, the program is broken down into two scripts, a main file and a function file containing the ODEs. The function file specifies the differential equations and the parameter values, including the binary temperature function described in the Deterministic Model Section. To test the system, the function file was designed to accept parameter value input, such that a value could be varied in the main file and this value would override the one provided in the function file. Using this program, we were able to loop through period values in the main file and plot the graph depicted in Figure 3. The sensitivity analysis was performed in a similar manner, looping through a range of values in the main file for an isolated parameter and plotting the output, as shown in Figures 4,5, and 6.

$$rate8(u,p,t) = (1-(mod(t, 3600) > 1800)) * u[7] * p["H_tet"]$$ $$function repressor_prod!(integrator)$$ $$integrator.u[5] += 1$$ $$end$$ $$rate9(u,p,t) = p["P_lon"] u[5] / (p["K_lon"] * (1 + u[4]/p["K_lon"] +u[5]/p["K_lon"]))$$ $$rate5_delt(u,p,t) = p["delt"] * u[5]$$ $$function repressor_deg!(integrator)$$ $$integrator.u[5] -= 1$$ $$end$$ $$repressor_prod_jump = VariableRateJump(rate8, repressor_prod!)$$ $$repressor_deg_jump = ConstantRateJump(rate9, repressor_deg!)$$ $$repressor_delt_jump = ConstantRateJump(rate5_delt, repressor_deg!)$$

Conclusions

In creating deterministic and stochastic models of our genetic circuit, we were able to come to a few conclusions. Most notably, our models reinforced the desired behavior of the circuit. Although the deterministic model suggested that the band of frequencies the system accepts is rather large, on the order of an hour, we should see low pass and high pass behavior. The stochastic model, which more accurately captures the behavior of a biochemical reaction network, produced near deterministic behavior, thereby supporting the results and conclusions drawn from the deterministic model.As shown in our own modeling process, parameter values for these systems can be quite difficult to obtain, and can often lead to behavior that is unrepresentative of the reality of the system. This reaffirmed the importance of constructing and proving claims regarding our dynamical system. These claims and proofs created a framework for ensuring that the system is modular with respect to protein degradation tags and the dimerization of the proteins used in the "AND" gate, which could help future teams trying to create similar circuits.

Interactive Interface


To help users visualize protein production and degradation in the genetic circuits our wet-lab members engineered, we developed an interactive web tool as part of the iGEM website that graphs target protein quantities in our bacteria over time. In creating the web-tool, we utilized D3.js, a JavaScript library used to produce data visualizations. Parsing data from csv files containing values based on our mathematical model of protein production and degradation, the interactive graph displays in real-time the theoretical amount of a target protein in bacteria. The interactive graph further allows users to track changes in protein quantity over ranging frequencies of input signals, which can be determined by the user using a slider. The webkit overall provides a convenient tool for data visualization over varying frequencies. Our software can be accessed here.

References


  1. Rao, C. V., & Arkin, A. P. (2003). Stochastic chemical kinetics and the quasi-steady-state assumption: Application to the Gillespie algorithm. The Journal of Chemical Physics, 118(11), 4999–5010. https://doi.org/10.1063/1.1545446
  2. Tsitouras Ch., “Runge–Kutta pairs of order 5(4) satisfying only the first column simplifying assumption”, Computers & Mathematics with Applications, 62 (2): 770-775, dx.doi.org/10.1016/j.camwa.2011.06.002
  3. E. Hairer, S.P. Norsett, G. Wanner, (1993) Solving Ordinary Differential Equations I. Nonstiff Problems. 2nd Edition. Springer Series in Computational Mathematics, Springer-Verlag.
  4. Di Marzo G. RODAS5(4) – Méthodes de Rosenbrock d’ordre 5(4) adaptées aux problemes différentiels-algébriques. MSc mathematics thesis, Faculty of Science, University of Geneva, Switzerland.
  5. Rackauckas, C. & Nie, Q., (2017). DifferentialEquations.jl – A Performant and Feature-Rich Ecosystem for Solving Differential Equations in Julia. Journal of Open Research Software. 5(1), p.15. DOI: http://doi.org/10.5334/jors.151
  6. StochasticDiffEq.jl. 2018, October 12. v5.8.0. https://github.com/JuliaDiffEq/StochasticDiffEq.jl.
  7. Gillespie, Daniel T. (1976). A General Method for Numerically Simulating the Stochastic Time Evolution of Coupled Chemical Reactions. Journal of Computational Physics. 22 (4): 403–434. doi:10.1016/0021-9991(76)90041-3.
  8. Salis H., Kaznessis Y., Accurate hybrid stochastic simulation of a system of coupled chemical or biochemical reactions, Journal of Chemical Physics, 122 (5), DOI:10.1063/1.1835951
  9. McBride, C. & Del Vecchio, D. (2017, July 1). Analyzing and exploiting the effects of protease sharing in genetic circuits **this work was supported by AFOSR grant number FA9550-14-1-0060 and NSF expeditions in computing award number 1521925. IFACPapersOnLine. 20th IFAC World Congress, 50 (1), 10924–10931. doi:10. 1016 / j.ifacol. 2017.08.2459
  10. Craciun, G., Helton, J. W., & Williams, R. J. (2007, November 9). Homotopy methods for counting reaction network equilibria. arXiv:0711.1552 [math]. arXiv: 0711 . 1552. Retrieved October 14, 2018, from http://arxiv.org/abs/0711.1552