


Computer Experiments


Nomenclature


(Listed in order of appearance in the article)

| Variable | Meaning |
| --- | --- |
| $\textbf{P}$ | Vector of all the parameters $p_i$ |
| $Err(\textbf{P})$ | The error function describing the distance between the system output and the expected output |


Assumptions


  1. $Err(\textbf{P})$ is smooth in the parameter space, so that we will not miss important points due to sudden jumps in $Err(\textbf{P})$.
  2. Elements of $\textbf{P}$ are independent of each other. In reality, one specific part determines a whole set of corresponding parameters; however, we assume all the parameters are independent to make our algorithm more general. In the future, if a database of parts' dynamic properties is established, the algorithm can be further improved.

Computer Experiment

We chose many parts to build the NFBL system and wanted to know whether they were the best choices. Because of the time and financial cost, it is not feasible to test every part we could obtain and pick the best one. We therefore decided to run computer experiments to help us find better parts for this system.

Considering that in the NFBL the output product is slightly different from the chemical compound produced by the output node to activate the regulator node, we rewrote the ODEs as follows:

$$ \frac{dI}{dt}=C_{Inducer \cdot R}\cdot \frac{Inducer^{n_{Inducer}}}{k_{Inducer}+Inducer^{n_{Inducer}}} \cdot \frac{1}{k_{R}+R^{n_R}}-d_I \cdot I$$

$$ \frac{dO}{dt}=C_{I} \cdot \frac{I^{n_I}}{k_I+I^{n_I}}-d_O \cdot O$$

$$ \frac{dR}{dt}=C_O \cdot \frac{O^{n_O}}{k_O+O^{n_O}}-d_R \cdot R$$

$$ \frac{dOutput}{dt} = C_{I} \cdot \frac{I^{n_I}}{k_I+I^{n_I}}-d_{Output} \cdot Output$$
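To make the model concrete, below is a minimal simulation sketch of these four ODEs in Python with SciPy. The dictionary keys (`C_IndR`, `n_I`, `k_I`, `d_I`, and so on) are our shorthand for the symbols in the equations, and the numeric values are placeholders chosen only so the sketch runs, not measured constants.

```python
import numpy as np
from scipy.integrate import solve_ivp

def nfbl_rhs(t, y, p, inducer):
    """Right-hand sides of the four NFBL ODEs for (I, O, R, Output)."""
    I, O, R, Output = y
    act_I = inducer ** p["n_Ind"] / (p["k_Ind"] + inducer ** p["n_Ind"])
    act_O = I ** p["n_I"] / (p["k_I"] + I ** p["n_I"])
    act_R = O ** p["n_O"] / (p["k_O"] + O ** p["n_O"])
    dI = p["C_IndR"] * act_I / (p["k_R"] + R ** p["n_R"]) - p["d_I"] * I
    dO = p["C_I"] * act_O - p["d_O"] * O
    dR = p["C_O"] * act_R - p["d_R"] * R
    dOut = p["C_I"] * act_O - p["d_Out"] * Output  # same production term as O
    return [dI, dO, dR, dOut]

# Placeholder parameter values, for illustration only.
params = {"C_IndR": 10.0, "n_Ind": 2.0, "k_Ind": 5.0, "k_R": 5.0, "n_R": 2.0,
          "d_I": 0.1, "C_I": 10.0, "n_I": 2.0, "k_I": 5.0, "d_O": 0.1,
          "C_O": 10.0, "n_O": 2.0, "k_O": 5.0, "d_R": 0.1, "d_Out": 0.5}
t_eval = np.linspace(0.0, 100.0, 200)
sol = solve_ivp(nfbl_rhs, (0.0, 100.0), [0.0, 0.0, 0.0, 0.0],
                args=(params, 10.0), t_eval=t_eval)
output = sol.y[3]  # Output(t): the trajectory behind Output = f(Inducer)
```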

This system can be represented by a single equation:

$$ Output = f(Inducer) $$

However, it is impossible to obtain an analytic solution for $Output=f(Inducer)$. We therefore decided to use a variational method to find the best form of $f(Inducer)$. To do so, we needed to rewrite $Output=f(Inducer)$ as $g(\textbf{P})$:

$$ Output = g(p_1, p_2, p_3, \ldots , p_n) $$

$$ \textbf{P}=(p_1, p_2, p_3, \ldots , p_n) $$

In our NFBL system, $n=16$. Comparing $g(\textbf{P})$ with the expected output, we can measure the distance between them using an error function:

$$ Err(\textbf{P})=\sum_{i=1}^N([g(\textbf{P})]_i-[ExpectOutputSequence]_i)^2 $$

where $[\,\cdot\,]_i$ denotes the value at the $i$-th of the $N$ sampled time points.
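As a sketch of how this error is evaluated in practice (reusing the hypothetical `nfbl_rhs` and the imports from the simulation above; `expected` stands for the desired output sampled at the same time points):

```python
def err(p, expected, t_eval, inducer=10.0):
    """Err(P): sum of squared deviations between the simulated Output
    sampled at N time points and the expected output sequence."""
    sol = solve_ivp(nfbl_rhs, (t_eval[0], t_eval[-1]), [0.0, 0.0, 0.0, 0.0],
                    args=(p, inducer), t_eval=t_eval)
    g = sol.y[3]                                  # [g(P)]_i
    return float(np.sum((g - expected) ** 2))
```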

In a biological system, $\textbf{P}$ cannot take arbitrary values. For example, common Hill coefficients lie in the range of 0 to 3. Thus, we assigned a range to every element of $\textbf{P}$. These intervals were determined according to experimental data from our lab and results published by others.

$$ p_{C_i} \in (0,100)$$ $p_{C_i}$: $C_i$ in the above equations

$$ p_{n_i} \in (0,3)$$ $p_{n_i}$: $n_i$ in the above equations

$$ p_{k_i} \in (0,50)$$ $p_{k_i}$: $k_i$ in the above equations

$$ p_{d_i} \in (0,1) $$ $p_{d_i}$: $d_i$ in the above equations
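Expressed in code, these intervals become a bounds table over $\textbf{P}$; the sketch below uses the same illustrative parameter names as the simulation above:

```python
# (low, high) interval for every element of P, per the ranges above.
bounds = {name: (0.0, 100.0) for name in ("C_IndR", "C_I", "C_O")}             # C_i
bounds.update({name: (0.0, 3.0) for name in ("n_Ind", "n_R", "n_I", "n_O")})   # n_i
bounds.update({name: (0.0, 50.0) for name in ("k_Ind", "k_R", "k_I", "k_O")})  # k_i
bounds.update({name: (0.0, 1.0) for name in ("d_I", "d_O", "d_R", "d_Out")})   # d_i
```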

These ranges define a hypercube in the parameter space. In principle, we would need to traverse the whole parameter space, which would take far too long to solve the problem. We therefore used the two assumptions listed above to simplify it.


At first, we assumed that $\frac{\partial^2 Err}{\partial p_i \partial p_j}=0$, meaning there is no coupling between parameters. More specifically: if we hold all $p_i$ ($i \neq j$) constant and vary $p_j$, we obtain an optimal value $p_j^0$ for $p_j$; if we then change the fixed $p_i$ ($i \neq j$) to a different setting $(p_i)'$, the optimal value of $p_j$ should still be $p_j^0$. Under this hypothesis, we could vary each parameter separately and assemble the optimal solution $\textbf{P}^0$ directly, as in the sweep sketched below. However, we found that this hypothesis does not hold.
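A minimal sketch of such a one-parameter sweep (hypothetical helper; `err_fn` is any callable that evaluates $Err$, such as the `err` sketch above): hold every other parameter fixed, scan one coordinate over a grid inside its interval, and keep the value with the smallest error.

```python
import numpy as np

def best_along_axis(params, name, bounds, err_fn, n_grid=50):
    """Sweep params[name] over its interval with all other parameters
    fixed; return the grid value with the smallest Err and that Err."""
    lo, hi = bounds[name]
    grid = np.linspace(lo, hi, n_grid + 1)[1:]  # open at lo, e.g. skip 0
    trial = dict(params)
    errs = []
    for v in grid:
        trial[name] = v
        errs.append(err_fn(trial))
    j = int(np.argmin(errs))
    return float(grid[j]), errs[j]
```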

[Figure: $Err$ as a function of $p_{C_3}$ with the other parameters fixed (Condition 1)]

[Figure: $Err$ as a function of $p_{C_3}$ with the other parameters fixed (Condition 2)]

It is obvious that the hypothesis fails: the two conditions give different optima for the same parameter. Nevertheless, this method can still be used to find better solutions. For example, under Condition 1 the smallest $Err$ for $p_{C_3}$ appears at $p_{C_3} \approx 0.2$.

We then improved our algorithm. We still vary one parameter at a time and find its corresponding optimum. However, among all the optima obtained by varying the different parameters, we select only the best one and take it as the solution for this step. This guarantees that the solution found at each step is strictly better than the previous one. By repeating this procedure until the solution converges to a constant value, we eventually obtain the best optimal solution we can find.

The $n$-th step:

$$ \frac{\partial{Err(\textbf{P}^n)}}{\partial p_i}\bigg|_{p_i=p_i^n} =0 \quad \text{for each } i $$

$$ Err_{min}=\min_i \{Err(\textbf{P}^n(p_i=p_i^n)) \} $$

$$ \Rightarrow \; j \text{ such that } Err(\textbf{P}^n(p_j=p_j^n))=Err_{min} $$

$$ \textbf{P}^{n+1}=\textbf{P}^n(p_j=p_j^n) $$

We repeat this step until $\textbf{P}^{n+1}=\textbf{P}^n$. Then $\textbf{P}^{n+1}$ is the optimal solution.
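A sketch of the full iteration, built on the hypothetical `best_along_axis` above: at each step, sweep every coordinate separately, accept only the single sweep that lowers $Err$ the most, and stop once no sweep improves on the current value.

```python
def optimise(params, bounds, err_fn, n_grid=50, max_steps=1000):
    """Coordinate-wise descent: P^{n+1} differs from P^n in the one
    coordinate whose sweep gives the largest drop in Err."""
    params = dict(params)
    best_err = err_fn(params)
    for _ in range(max_steps):
        sweeps = [(name, *best_along_axis(params, name, bounds, err_fn, n_grid))
                  for name in params]
        name, value, e = min(sweeps, key=lambda s: s[2])
        if e >= best_err:            # P^{n+1} = P^n: converged
            break
        params[name], best_err = value, e
    return params, best_err
```

Because every accepted step strictly lowers $Err$, the loop always terminates; like any coordinate search, it returns a coordinate-wise optimum, which need not be the global one.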

Optimum Solution Search Method

We used MATLAB to run the algorithm. First, we ran a test program with five parameters to optimize, dividing each parameter's range into 50 grid points. The program finished in 8 minutes and reached the final optimum.
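For reference, a toy problem in the same spirit as that test, reusing the hypothetical `optimise` sketch above: five parameters, 50 grid points each, and a known minimum that the search should recover.

```python
# Five-parameter toy problem with a known optimum at `target`.
toy_bounds = {f"p{i}": (0.0, 1.0) for i in range(5)}
target = {"p0": 0.3, "p1": 0.7, "p2": 0.5, "p3": 0.1, "p4": 0.9}

def toy_err(p):
    return sum((p[k] - target[k]) ** 2 for k in p)

start = {k: 0.5 for k in toy_bounds}
p_opt, e_opt = optimise(start, toy_bounds, toy_err, n_grid=50)
print(p_opt, e_opt)   # p_opt should land on target, e_opt near zero
```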

We then optimized our system. Starting from parameters obtained from our experiments and from publications, the optimum we found suggested increasing the strength of the output node's promoter, choosing a stronger inducer, and choosing an output signal that decays faster. The optimization took 90 minutes, which is short enough to be practical for users.

Finally, we packaged the program as a piece of software. Please refer to the Software section for further information.

