Introduction to Control Theory¶
Control theory is the intersection of mathematics and engineering that is used to describe the dynamic processes in engineered systems. The methods of control theory have broad applicability, from aerospace design to large-scale electrical infrastructure to the regulation of genes. A common example of gene regulation that can be described with control theory is the galactose regulon in S. cerevisiae; galactose, a common sugar, is detected by protein Gal3p, which activates transcription of itself and Gal1p, another galactose-induced TF that activates transcription of itself and Gal3p. This dual-positive feedback loop (a.k.a. the controller in the control system) enables fast, strong transcription of enzymes capable of galactose metabolism in the presence of galactose, and relatively low expression of these enzymes in its absence [1]. These interactions can be mathematically tabulated with a system of ordinary differential equations, as they were in Venturelli et al [2]. Systems of differential equations may then be represented with control system equations.
Control systems normally consist of the following parts: a sensor, an actuator, a process, a plant and one or more controllers. Let’s describe these parts in the context of an example, like a thermostat controlling the temperature of a room. The room whose temperature (the control variable) is to be controlled is called a process. A thermometer, which measures the temperature of the room would be a sensor. The actuator is the device that influences the control variable, which would be a furnace in our example. The plant is the actuator and the process together, or the furnace and the room together. Finally, the controller uses information from the sensor to compute the control signal; in our example, the controller would be a thermostat taking information from the thermometer to calculate how much more the temperature of the room needed to change. In general, a control system takes an input from the sensor concerning the process, calculates the control variable, informs the actuator how to act and delivers an output.
Synthetic biology systems, like those in most engineering disciplines, can be described with control systems. There are numerous well-described control mechanisms in synthetic biology, including a series of well-characterized synthetic exogenous mammalian promoters described by the BostonU 2016 iGEM team [3]. Light-inducible promoters like LOV2 and PhiReX are essentially tunable controllers, mapping the energy of the light (the input) on to the expression of the genes they are driving (the output). The primary goal of LEO is to describe these light-inducible promoters both experimentally and mathematically.
The scope of LEO deals with LOV2 and PhiReX only in the context of "open loop control," a control system without feedback (e.g. a washing machine with predetermined cycles, or in our case, a preset increase in gene expression). As such, robust characterization and models of these systems is paramount to their success in more complicated control systems involving feedback; if these systems are poorly defined, then they are useless for controlling gene expression beyond "on" and "off.” Characterization and modeling go hand-in-hand, each informing the other. Characterization determines model parameters, which help design experiments to improve characterization, which in turn improves the models. The rest of this page will deal with the modeling portion side of this cycle; applying data to models, and applying models to data. Control systems are often described by their transfer functions, which are maps of differential equations in the complex frequency (or "Laplace") domain. Transfer functions take an input, light in this case, and map it onto an output. In the case of LOV2 and PhiReX the direct output is mRNA, where mRNA is proportional to the gene it transcribes, so that we are able to treat gene expression (in the case of our work, the Red Fluorescent Protein mRuby) as the output of the system. The transfer function for a given system can be experimentally determined by observing system output under a known input.
Note: All code used for modeling and data analysis by BostonU can be found in the form of IPython notebooks here.
LOV2¶
First Order Approximation¶
The simplest transfer function is first order, and of the form $G(s)=\frac{Y(s)}{U(s)}=\frac{k}{s+a}$. A first order system can be described completely by its step response, or the system response when an input that moves from 0 to some constant instantaneously for an indefinite amount of time. In our case, a step input would be the turning on of either blue or red light, depending on the light-inducible promoter, and the measured output would be measured fluorescence. A first-order system has two parameters: k, or the gain of the system (e.g. how much greater than the input the response is) and $a = \frac{1}{\tau}$, where $\tau$ is the time constant. $a$ describes how quickly or slowly a system will react in response to a stimulus. It is also the location of the system poles. A step experiment is also the easiest light program to set up, so by running this preliminary experiment in the eVOLVER we can acquire the initial parameters for our model.
Interpreting this data requires us to make some assumptions. First, because $y(t) = K(1-\frac{\alpha}{K}e^{-\beta t})$ for first order transfer functions, then we must assume that the data is going to plateau at a certain point $K$. Choosing $max(y(t)) = K$ forces the assumption that the decreasing portion of this response is an experimental anomaly.
The time constant $\tau$, an integral parameter to the first order transfer function, can be determined by assuming that $\tau$ is the time when $y(tau) = 0.63*K$ for first order systems. Solve for $\tau$:
$y(\tau) = K-\alpha(e^{\beta \tau}) = 0.63*K$
$\Rightarrow -\frac{K(0.63-1)}{\alpha} = e^{\beta \tau}$
$\Rightarrow \log(\frac{K(1-0.63)}{\alpha}) = \beta \tau$
$\Rightarrow \tau = \frac{1}{\beta}\log(\frac{K(1-0.63)}{\alpha})$
$\alpha$ and $\beta$ can be determined by taking the log regression of the step data. This can be accomplished easily and accurately with the scipy.optimize
library:
import numpy as np
from scipy import optimize
# blue light step mRuby values for LOV2, which in our lab is strain "yKL814."
# NOTE: this data also contains the "off" values, which will be used later and were taken in the same experiment.
yKL814_rep1 = np.array([114, 4143, 7597, 8402, 9246, 8999, 8318, 7357,\
5657, 5675, 3986, 1964, 2031, 1281, 965, 767])
yKL814_rep2 = np.array([95.3, 3027, 5426, 6467, 6550, 6387, 5746, 4601,\
3577, 3609, 2547, 18.4, 1359, 907, 659, 508])
σ = np.std([yKL814_rep1, yKL814_rep2], axis=0) # variance of data points
y_t = np.mean([yKL814_rep1, yKL814_rep2], axis=0) # mean of data points
t = np.array(range(len(y_t))) # one data point every hour
K = max(y_t[:8])
# get the coefficients of the log regression; p0 is an arbitrary starting point.
popt, pcov = optimize.curve_fit(lambda x,α,β: K-α*np.exp(β*x), t[:8], y_t[:8], p0=(K, 0.25))
α, β = popt
τ = 1/β*np.log(K*(1-0.63)/α) # time constant, from derivation above
Using this data set, we know $\tau \approx 1.34$. Several values are defined in terms of $\tau$. $a$, the location of the transfer function's pole, is given by $a = \frac{1}{\tau}$. $\tau_r = 2.2\tau$ is the Rise Time, or the time that it takes for the output to go from 10% to 90% of its final value. The Settling Time $\tau_s=4\tau$ is the time it takes for the function to reach 2% of its maximum value. Finally, the gain of the Transfer Function is $k = \frac{K}{\tau}$ where $K$ is the steady state value of the system. From $\tau \approx 1.34$, $a = \frac{1}{\tau} \approx 0.746$ and $k = \frac{K}{\tau} = 5894$. Recalling that for a first order system $G(s) = k\frac{a}{s+a},$ then $G(s) = 5894\frac{0.746}{s+0.746}$.
(Aside: For the purposes of our models, we calculated $k$ but did not set $G(s) = k\frac{a}{s+a}$, instead treating $k$ as the value of our step function e.g. $u(t) = ks(t)$, where $s(t)$ is the Heaviside step function. This follows for all models here, and was done to account for the variability in luminosity between shaker and eVOLVER experiments, and to account for variability between FACs readings, which are in arbitrary units.)
Plotting the step response $Y(s) = G(s)U(s)$ alongside the original data, it is clear that our model diverges from the actual behavior of LOV2 after 5 hours, though it remains within one standard deviation until the 7 hour point. A better "test" for our model would be to continue induction for more than 7 hours; in this experiment, however, the lights in the eVOLVER were turned off at hour 7 and the following 6 hours recorded the "off"-dynamics of the LOV2 controlled system.. Below is the data plotted with our model responding to an input $u(t) = s(t)-s(t-7)$
Predicting PWM with a First Order Model¶
Light pulsing programs are simply tuned, especially in the eVOLVER. It would be informative to find how expression varies across different light pulsing programs. A common method for generating complex signals is Pulse Width Modulation, or "PWM." PWM is where a signal is made out of identical step functions, turned on and off with a constant period. PWM is defined by period and duty cycle, where the duty cycle is the percentage of the period that the signal is "on" for. Below is our model, applied to PWM light programs with four different duty cycles.
Plotting the mean value of each of these programs, this model clearly predicts that expression will vary linearly with changes in duty cycle:
However, after running PWM experiments in the eVOLVER with LOV2, it was clear that, while the model was qualitatively describing the behavior of LOV2, it was not sufficiently accurate:
Here and in Figure 3, the model occaisonally predicts values outside of 1 standard deviation of the data. In Figure 3, it "overshoots" in late times when the light is on, and reaches 0 too quickly when the light is off. The decrease in fluorescence during the late data informed our group that there might be second order dynamics to the LOV2 transfer function, i.e. oscillations before reaching a steady state.
Second Order Approximation¶
Second order transfer functions are of the general form $G(s) = \frac{\omega_n^2}{s^2+2\zeta\omega_n+\omega_n^2}$, where $\zeta$, the damping constant, and $\omega_n$, the natural frequency, are parameters. Their fundamental properties are defined slightly differently from first order transfer functions; first, they do not immediately reach the total gain like first order functions, instead oscillating around and overshooting that gain value until the system reaches a steady state. The ratio of this first peak to the steady state value is defined as the "percent overshoot," which can be found as the function $\%OS = 100\exp(\frac{\zeta\pi}{\sqrt{1-\zeta^2}})$. The time at which the first peak occurs is the peak time, where $T_p = \frac{\pi}{\omega_n\sqrt{1-\zeta^2}}$.
In order to make a model of LOV2 using a second order transfer function, we have to make several assumptions:
- The actual transfer function for LOV2 only has two complex conjugate poles, or alternatively its higher-order poles are so negative as to be negligible.
- LOV2 does not reach steady-state during the observed period from $t=0$ to $t=7$.
- The LOV2 data from the eVOLVER step experiment shows all of LOV2's transient qualities: it is oscillating while reaching a steady state.
Using these assumptions, we can look at the data to determine values for $\max y(t) = y(T_p)$ and $T_p$. The first peak occurs at $t = 4$, and the value of that peak is $y(Tp=4) = 7898$. The latter of these can be used to calculate the percent overshoot, since $\%OS = \frac{\max y(t) - y(\infty)}{y(\infty)}$. Knowing $\%OS$ and $T_p$ allows us to calculate $\zeta$ and $\omega_n$, and in doing so find $G(s)$. Below, we rearrange the definitions of $\%OS$ and $T_p$ to find:
$OS(\zeta) \Rightarrow \zeta = -\frac{\log(OS)}{\sqrt{\log(OS)^2+\pi^2}}$
$T_p(\omega_n, \zeta) \Rightarrow \omega_n = \frac{\pi}{T_p\sqrt{1-\zeta^2}}$
However, $\%OS$ is a function of the steady state value of the system, $y(\infty)$, which is unknown for this data set. This unknown parameter means that there are innumerable models that will fit what we do know, but each with a different steady state value. Based on our assumptions, we know the model is reaching a steady state value between $0$ and $y(T_p)$. As such, we can try all of these second order models, and see which one best fits our data.
Expectation Maximization¶
Defining a "best fit" for non-linear systems is a nontrivial issue. We decided that the model which best fits our data will be the model that maximizes the likelihood of seeing that data given the parameters of our model. Stated more explicitly, the best fit model is defined by the parameter $y(\infty)$ which is true for $\underset{y(\infty)}{\arg\max} E[\mathbf X; y(\infty)]$.
In order to determine this, two major assumptions were made:
- All data points in $\mathbf X$ and independent and identically distributed.
- Each data point $x \in \mathbf X$ is drawn from a normal distribution $\mathbf X \sim \mathcal{N(\mu,\sigma^2)}$ with a probability density function $\rho(x\ ; \mu,\sigma^2)$.
- The standard deviation of $\mathbf X$ is the sample standard deviation, $\sigma = \overline\sigma$.
With these assumptions in mind, it is possible to find the likelihood of some data given a particular mean, $\mathcal{L}(\mathbf X\ ; \mu_k)$ by knowing that for a normal distribution:
$\mathcal{L(x_1,\ldots,x_n\ ; \mu_k, \sigma^2)} = \rho(x_1,\ldots,x_n\ ;\mu_k, \sigma^2) = \prod_{i=1}^n\rho(x_i\ ;\mu_k, \sigma^2)$
$\Rightarrow \prod_{i=1}^n\rho(x_i\ ;\mu_k, \sigma^2) = (\frac{1}{2\pi\sigma^2})^{\frac{n}{2}}\exp(-\frac{\sum_{i=1}^n(x_i-\mu_k)^2}{2\sigma^2})$
Here, we define each $\mu_k = y(t)$, where $y(t) = \mathcal{L}^{-1} G(s)U(s)$, the output of our model. The likelihood that the model fits our entire data set is then:
$likelihood\ of\ model = \prod_{k=i}^N \mathcal{L}(\mathbf X\ ; \mu_k, \sigma^2)$
Because the natural logarithm is monotonically increasing, finding the $\log$-likelihood of the data given the model will produce the same best fit as the likelihood itself, and allows us to rewrite the equation above as:
$\log-likelihood\ of\ model = \sum_{k=i}^N \log\mathcal{L}(\mathbf X\ ; \mu_k, \sigma^2)$
Where $\log\mathcal{L}(\mathbf X\ ; \mu_k, \sigma^2) = \frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^n(x_i-\mu_k)^2$
Now, by iterating through all possible values of the steady-state $y(\infty)$, we can make a second-order model for each and evaluate the likelihood of our data given that model. The best fit model will be the model with the maximum expected value of the likelihood. The code used for determining this is below:
def log_L(X, **kwargs):
"""
the natural log of the likelihood of data points X, assuming that X comes from a normal distribution.
optional arguments: σ or s as the standard deviation of the normal distribution, µ or u as the mean of the normal
distribution.
"""
if 'σ' in kwargs:
σ = kwargs['σ']
elif 's' in kwargs:
σ = kwargs['s']
else:
σ = np.std(X)
if 'µ' in kwargs:
µ = kwargs['µ']
elif 'u' in kwargs:
µ = kwargs['u']
else:
µ = np.mean(X)
N = len(X)
return N/2*np.log(2*np.pi*σ**2)-1/(2*σ**2)*np.sum((x - µ)**2 for x in X)
second_order = {'model': None, 'likelihood': 0, 'intensity': 0}
for L in range(1,int(np.max(y_t))): # iterate through most possible steady states ('light intensities')
Tp = 4 # peak time
OS = (y_t[Tp]-L)/L # overshoot given steady-state is L
ζ = -np.log(OS)/np.sqrt(np.log(OS)**2+np.pi**2) # damping value given steady-state is L
ωn = np.pi/(Tp*np.sqrt(1-ζ**2)) # natural frequency given steady-state is L
model = signal.lti([ωn**2],[1, 2*ζ*ωn, ωn**2])
u = L*np.ones_like(t)
u[8:] = 0 # u(t) = L*(s(t)-s(t-8))
T, y_model, xout = signal.lsim(model, u, t)
likelihood = np.sum(log_L(y_t, u=µ) for µ in y_model)
if likelihood >= second_order['likelihood']:
second_order['model'] = model
second_order['likelihood'] = likelihood
second_order['intensity'] = L
For our eVOLVER step response data, this function takes approximately 40 seconds to evaluate, and predicts that the best fit for a second-order transfer function will be $G(s) = \frac{1.04915778}{s^2+1.31500191s+1.04915778} \approx \frac{1.049}{s^2+1.315s+1.049} $, with $\omega_n \approx 1.024$ and $\zeta \approx 1.284$. This model accurately predicts the eVOLVER step experiment behavior, as below:
Predicting PWM Values with a Second Order Model¶
As with the first order model, it would still be useful to predict expression as across varying light pulsing programs. The second order model was run with the same parameters as the first order, across PWM light programs with four different duty cycles.
Plotting the mean value of each of these programs, the second order model predicts that expression will vary linearly with changes in duty cycle, similarly to the first order model, but with a smaller rate of change.
Comparing both the first order model and the second order model to the eVOLVER PWM experiment data, it is clear that the second order model has superior predictive powers.
PhiReX by Analogy¶
We modeled PhiReX concurrently with LOV2, and so the methodology is identical. This section summarizes the results of modeling PhiReX as a control system. We found LOV2 to be a more reliable system; as such the modeling for PhiReX does not fit as well, despite identical modeling methodology.
First Order Approximation¶
The first order model, like LOV2's first order model, was built off of PhiReX's step response data. However, the PhiReX eVOLVER step experiment did not express above the negative control; this data is PhiReX's response to a continuous red light in an incubating shaker.
This first order model was found to be $G(s) = 2311\frac{0.4756}{s+0.4756}$
Second Order Approximation¶
The second order model for PhiReX was constructed identically to LOV2's second order model, using PhiReX's step response data. It follows the step response curve much more closely than the first order approximation.
The second order model was found to be $G(s) = \frac{1.35633789}{s^2 + 1.71986932s + 1.35633789} \approx \frac{1.356}{s^2 + 1.720s + 1.356}$, where $\omega_n = 1.1646192038602146 \approx 1.165$ and $\zeta = 1.476765379017767 \approx 1.477$.
Applying Models to a PWM Light Program¶
Applying these models to the PhiReX PWM was not nearly as fruitful as the same for LOV2. Here, the models predict much higher expression than is shown. This, however, is consistent with our experiences with PhiReX; even the same clonal colony grown into two different containers often shows wildly different levels of expression. In the PhiReX PWM eVOLVER experiment, PhiReX only showed 2-4X fold-change in expression over the negative control. This is not significant levels of expression, especially compared to same experiment done in LOV2, which saw 50-100X fold-change expression over the negative control. It is likely, though not proven, that these control system models of PhiReX predict expression rates if the PhiReX colonies are working as well as they did in that particular shaker experiment, which saw 8X fold-change.
To compare, below is the same experiment without the models overlaid.
Discussion¶
While the LOV2 and PhiReX models do not perfectly predict the data, an important parameter in their development is the number of data points at each timepoint. Currently, these models were developed off of data where $n=2$. Running these experiments again would produce more data, which would produce better models.
Similarly, running a 16-hour step response instead of a 7-hour step response in the eVOLVER would be a simple experiment that would help determine the veracity of the assumptions made about LOV2 when modeling it as a second order transfer function.
Finally, it is possible that there are poles in both systems which cannot be considered negligible, in which case LOV2 and PhiReX could be approximated as third-order systems.
References¶
[1] Endalur Gopinarayanan, V., & Nair, N. U. _A semi-synthetic regulon enables rapid growth of yeast on xylose._ Nature Communications, 9(1), 1233. (2018).
[2] Venturelli, O. S., El-samad, H. & Murray, R. M. _Synergistic dual positive feedback loops established by molecular sequestration generate robust bimodal response._ Proc. Natl Acad. Sci. 109, E3324–E3333 (2012).