# Model

Our goal is to minimize the average degradation rate of E. coli, i.e. to reach the highest average expression level of our target gene. To this end, we simulate the behavior of individual E. coli with a segregated population competition model, implemented as a cellular automaton. Several modulating sub-models are incorporated into the cell reproduction step, so that the population of each type of E. coli is obtained at every time step.

Specifically, we first classify the E. coli by plasmid number, under the assumption that individuals with the same plasmid number behave alike. We then adjust the individual growth rate using the ribosome binding rate obtained from a mechanism model and the growth-promoting effect obtained from a 3-dimensional random walk model. Meanwhile, queueing theory is used to explore the relationship between intracellular plasmid number and target gene expression level. In addition, a ribosome fall-off mechanism based on energy estimation is incorporated to account for the subsequent translation of glucose dehydrogenase, which serves as a growth promoter of E. coli.

Repeating the reproduction process over a reasonable period of time yields the simulated population of E. coli with each plasmid number, and therefore the composition of the population as time goes on.

## Main model

The whole population of E. coli is separated into 100 different groups, with $N(i)$ E. coli in the $i^{th}$ group, all of which currently possess $i$ plasmids. Each group is regarded as a distinct population, under the assumption that the individuals within it are alike, i.e. their gene expression levels and growth rates are essentially the same under similar circumstances. The time step is set to one minute. At each time step, a fraction of each group goes through plasmid replication and cell division. This fraction is determined by the growth rate $\mu(i,S)$, which depends on the type of E. coli (its plasmid number $i$) and on the culture concentration $S$.

After each time step $dt$, the population change of the $i^{th}$ group, $\frac{dN(i)}{dt}$, consists of two parts: a gain from E. coli with $i$ plasmids produced by parent cells in other groups, and a loss from parent cells in the $i^{th}$ group that produce E. coli with $j (\neq i)$ plasmids.

Assuming that plasmids are distributed randomly between the two son cells, we define the distribution rate $\delta_{i,j}$ as the probability that a parent E. coli in the $i^{th}$ group produces a son E. coli in the $j^{th}$ group. Multiplying this probability by the fraction of the population that replicates gives the change in population of each group: $$\frac{dN(i)}{dt} = \sum_{j=0,\, j \neq i}^{N_{p,max}}\delta_{j,i}\, \mu(j,S) N(j) - (1-2(\frac{1}{2})^i) \mu(i,S) N(i)$$

Meanwhile, the substrate consumption can also be described by an ODE: $$\frac{dS}{dt} = -\frac{1}{Y_s} \sum_{i=0}^{N_{p,max}} \frac{dN(i)}{dt}$$
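The two ODEs above can be sketched as an explicit Euler update with a one-minute step. This is only a minimal illustration, not our full program: the function names (`delta`, `mu`, `euler_step`) and all parameter values are hypothetical, the growth-promoting factor is omitted from `mu`, and the plasmid replication step is left out. The gain term is read as sons with $i$ plasmids produced by parents in group $j$.

```python
from math import comb

def delta(parent, son):
    """Distribution rate: sons with `son` plasmids per division of a `parent`-plasmid cell."""
    if son > parent:
        return 0.0
    return 2 * comb(parent, son) / 2 ** parent

def mu(i, S, mu_max=0.02, K=50.0, n=1.0, Ks=1.0):
    """Hypothetical growth rate per minute (assumed values, promoting effect omitted)."""
    return mu_max * K / (K + i ** n) * S / (S + Ks)

def euler_step(N, S, dt=1.0, Ys=1e9):
    """Advance the group populations N[0..n_max] and the substrate S by one time step."""
    n_max = len(N) - 1
    # Number of cells of each group that divide per unit time.
    rates = [mu(i, S) * N[i] for i in range(n_max + 1)]
    dN = []
    for i in range(n_max + 1):
        gain = sum(delta(j, i) * rates[j] for j in range(n_max + 1) if j != i)
        loss = (1 - delta(i, i)) * rates[i]
        dN.append(gain - loss)
    new_N = [N[i] + dt * dN[i] for i in range(n_max + 1)]
    # Substrate consumed in proportion to the newly formed cells (Ys is an assumed value).
    new_S = max(S - dt * sum(dN) / Ys, 0.0)
    return new_N, new_S

# Example: 1000 cells with 10 plasmids, simulated for one hour.
N = [0.0] * 11
N[10] = 1000.0
S = 10.0
for _ in range(60):
    N, S = euler_step(N, S)
print(round(sum(N), 1), [round(x, 1) for x in N])
```

Summing the update over all groups gives a total increase of $\sum_i \mu(i,S) N(i)$ per step, i.e. every dividing cell adds exactly one cell to the population, which is a convenient sanity check on the gain and loss terms.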

A program simulating this process is run at each time step, so that the population of each group is known at all times. We selected representative groups (E. coli with 1, 20, 40, 60, 80, 100, 120, 140, 160, 180 and 200 plasmids) and plotted their populations in the curves below.

## Probability of distribution

The loss of plasmids during reproduction is mainly due to the uneven distribution of plasmids between the two son E. coli after plasmid replication. It is therefore important to determine the probability of each possible outcome. More specifically, the probability that an $i$-plasmid parent produces a $j$-plasmid son has to be calculated.

Plasmid self-replication has to take place first. An empirical formula for the replication number, $$CopyNum(i) = \frac{V_m (i-1)}{K_m+i-1} - (i-1)\mu(i,S)$$ describes the average plasmid copy number of E. coli with $i$ plasmids.

Then, treating the plasmid distribution as completely random, the allocation of the $i$ plasmids between the two son E. coli can be viewed as $i$ independent Bernoulli trials, from which the probability that an $i$-plasmid parent produces a $j$-plasmid son is $$\delta_{i,j}=2C_i^j (\frac{1}{2})^j(\frac{1}{2})^{i-j}=\frac{C_i^j}{2^{i-1}}$$ If we label each row by the plasmid number of the parent and each column by the plasmid number of its offspring, a distribution probability table can be listed. However, since the table is too large to be readable, we display it more intuitively as a 3-dimensional plot, in which the x-axis is the plasmid number of the parent E. coli, the y-axis is the number of plasmids possessed by one of its sons, and the height of each point is the probability of that outcome.
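As a small worked example of the formula for $\delta_{i,j}$, the table can be built row by row. This is a sketch only: the function name `distribution_table` is hypothetical, and a small maximum plasmid number is used so that a row can be read directly.

```python
from math import comb

def distribution_table(n_max):
    """table[i][j] = 2 * C(i, j) / 2**i for j <= i, and 0 otherwise."""
    table = [[0.0] * (n_max + 1) for _ in range(n_max + 1)]
    for i in range(n_max + 1):          # plasmid number of the parent
        for j in range(i + 1):          # plasmid number of one son
            table[i][j] = 2 * comb(i, j) / 2 ** i
    return table

table = distribution_table(4)
print(table[4])  # row of a 4-plasmid parent
```

For a 4-plasmid parent the row is 0.125, 0.5, 0.75, 0.5, 0.125. Each row sums to 2 because of the factor 2 in the formula: either of the two sons can be the one that receives $j$ plasmids.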

## E.coli growth rate

The growth rate of each E. coli is bounded by a maximum growth rate $\mu_{max}$ and is influenced by the number of plasmids $i$ it contains, the length of the incorporated target gene, the culture concentration $S$, and the growth-promoting effect $\gamma$ brought by the growth promoter (glucose dehydrogenase). Referring to several well-established models, we express the individual growth rate under a given culture condition as $\mu(i,S) = \mu_{max} \frac{K}{K + i^n} \frac{S}{S+K_s}\gamma(i,S)$.
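The formula for $\mu(i,S)$ can be evaluated directly, as in the sketch below. All numerical values ($\mu_{max}$, $K$, $n$, $K_s$) are placeholders chosen for illustration, not our fitted parameters, and $\gamma$ is passed in as a plain number rather than computed from the random walk model.

```python
def growth_rate(i, S, gamma=1.0, mu_max=0.02, K=50.0, n=2.0, Ks=1.0):
    """Individual growth rate mu(i, S); every default value here is an assumption."""
    plasmid_burden = K / (K + i ** n)   # decreases as the plasmid number grows
    substrate_limit = S / (S + Ks)      # Monod-type saturation in the culture concentration
    return mu_max * plasmid_burden * substrate_limit * gamma

# Growth rate versus plasmid number at a fixed culture concentration.
for i in (0, 10, 50, 100):
    print(i, growth_rate(i, S=10.0))

# Growth rate versus culture concentration at a fixed plasmid number.
for S in (0.1, 1.0, 10.0, 100.0):
    print(S, growth_rate(20, S))
```

With these placeholder values the rate falls monotonically with the plasmid number and saturates as the culture concentration increases, which is the qualitative behavior the formula is designed to capture.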

With this theoretical formula, the change of the individual growth rate with respect to several factors can be simulated:

I. E. coli growth rate vs. plasmid number

II. E. coli growth rate vs. culture concentration

## Growth promoting effect

Knowing the mechanism by which the promoter increases the individual growth rate, our model aims to quantify how much a given amount of glucose dehydrogenase increases the growth rate of E. coli. Because the whole promoting process is too complex to simulate, our model focuses only on the steps influenced by the promoter, which acts as an enzyme in cellular respiration. Both the enzyme and the substrate move randomly within the E. coli, which led us to a 3-dimensional random walk model. The initial substrate concentration is fixed throughout the experiments, while the concentration of the enzyme (i.e. the growth promoter glucose dehydrogenase) varies between situations. The process is simulated by a program, and the average combination ratio is calculated.
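One possible way to set up such a random walk is sketched below. It is an illustration under strong assumptions, not our actual program: the cell is a periodic cubic lattice, every particle either stays or moves one site per step, a substrate counts as combined the first time it shares a site with an enzyme, and the lattice size, particle numbers and step count are arbitrary. The name `combination_ratio` is hypothetical.

```python
import random

# Six lattice directions plus "stay", so that two walkers can always meet.
MOVES = [(0, 0, 0), (1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def move(pos, size, rng):
    """Move one particle by a random lattice step with periodic boundaries."""
    dx, dy, dz = rng.choice(MOVES)
    return ((pos[0] + dx) % size, (pos[1] + dy) % size, (pos[2] + dz) % size)

def combination_ratio(n_enzyme, n_substrate=50, size=10, n_steps=200, seed=0):
    """Fraction of substrate particles that met an enzyme within n_steps."""
    rng = random.Random(seed)

    def random_site():
        return tuple(rng.randrange(size) for _ in range(3))

    enzymes = [random_site() for _ in range(n_enzyme)]
    free = [random_site() for _ in range(n_substrate)]
    combined = 0
    for _ in range(n_steps):
        enzymes = [move(p, size, rng) for p in enzymes]
        free = [move(p, size, rng) for p in free]
        occupied = set(enzymes)
        still_free = [p for p in free if p not in occupied]
        combined += len(free) - len(still_free)
        free = still_free
    return combined / n_substrate

for n_enzyme in (1, 5, 20, 50):
    print(n_enzyme, combination_ratio(n_enzyme))
```

Averaging the ratio over several seeds for each enzyme amount would give a smoother estimate of the average combination ratio.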

The average combination ratio is then used as the adjustment multiplier of the individual growth rate of E. coli, since each combination represents the occurrence of an enzymatic reaction. Treating the substrate-enzyme combination ratio as a quantitative measure of the growth-promoting effect, we incorporate it as a dimensionless factor in the individual growth rate: the greater its value, the faster E. coli grows. The promoting effect changes with the content of intracellular glucose dehydrogenase as follows.

## Gene expression level

Following the steps above, the composition of the E. coli population at each time is known. Our goal now is to find the relationship between the individual plasmid number and the recombinant gene expression level.

We borrowed queueing theory from operations research, treating the nutrients as customers and the plasmids in each E. coli as service stations. Given the average nutrient intake capacity of E. coli and the processing rate of a plasmid, the amount of nutrients taken in by an E. coli and processed by each plasmid is known. Because the intracellular plasmid number differs between groups, the processing ability of individual E. coli varies: a greater number of plasmids implies a greater reaction velocity.

However, more plasmids also lead to a smaller amount of nutrients allocated to each plasmid on average. Since the nutrient intake ability of E. coli is fixed, more plasmids in a cell means fewer nutrients sent to each of them, which in turn decreases the target gene expression level of each plasmid.

The program follows the theoretical deduction. The time step is set to one second, the nutrient intake ability of E. coli to at most 10 units, and the average processing ability of a plasmid to 2 units at a time, under the assumption that the absorbed amount is directly proportional to the gene expression level. Individual E. coli with 0 to 200 plasmids are considered. After simulating 3600 seconds, we count the total amount of nutrients utilized by each kind of E. coli, which reflects the total amount of target protein produced. The expression level of the gene carried by each plasmid can then be calculated within each group of E. coli, and its variation with the plasmid number is plotted as a curve.
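The queueing picture can be sketched as a single cell with a nutrient backlog served by its plasmids. The intake limit (10 units), the per-plasmid capacity (2 units) and the 3600 one-second steps come from the description above. The uniform random intake, the unbounded backlog and the name `simulate_cell` are assumptions made for this illustration.

```python
import random

def simulate_cell(n_plasmids, seconds=3600, max_intake=10, per_plasmid=2, seed=0):
    """Total nutrient units processed by one cell with n_plasmids service stations."""
    rng = random.Random(seed)
    backlog = 0      # nutrients waiting in the queue
    processed = 0    # nutrients served so far, taken as proportional to expression
    for _ in range(seconds):
        backlog += rng.randint(0, max_intake)            # assumed uniform intake of 0..10 units
        served = min(backlog, per_plasmid * n_plasmids)  # capacity of all plasmids together
        backlog -= served
        processed += served
    return processed

for n in (1, 2, 5, 20, 200):
    total = simulate_cell(n)
    print(n, total, total / n)   # total expression and expression per plasmid
```

Under these assumptions the total amount processed stops growing once the combined plasmid capacity reaches the intake limit (here at 5 plasmids), so the amount processed per plasmid keeps falling as further plasmids are added.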

## Ribosome fall-off rate

One point remains to be mentioned: the expression probability of the target gene can differ from that of the gene responsible for promoting cell growth (i.e. glucose dehydrogenase). The target gene is expressed successfully as long as the ribosome combines with the promoter, whereas the subsequent TGATG sequence allows a fraction of the ribosomes to fall off, which decreases the expression level of glucose dehydrogenase.

The probability of fall-off depends on the length of the target gene, which is situated before the TGATG sequence, whereas the growth-promoting gene, which follows the TGATG sequence, is relatively fixed. Our task is to determine how the fall-off rate changes with the length of the target gene.

The strength of the ribosome-sequence combination can be measured by their binding energy during translation, which we assume to be proportional to the length of the target gene. Referring to [], the fraction of time that a ribosome continues the translation process and remains combined with the subsequent sequence is $F = \frac{1}{1+C \exp(G_{bind})}$, where $C$ is a parameter representing the ribosome-assisted unfolding coefficient and $G_{bind}$ is the binding energy between the given target gene and the ribosome. The curve describing the remaining rate of ribosomes with respect to the length of the target gene is shown.
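A short numerical sketch of this relation is given below. It assumes $G_{bind} = k \cdot L$ for a target gene of length $L$. The proportionality constant `k`, its sign, the value of `C` and the function name `remaining_fraction` are all placeholders chosen for illustration.

```python
from math import exp

def remaining_fraction(length, C=1.0, k=0.001):
    """F = 1 / (1 + C * exp(G_bind)), with G_bind = k * length (assumed proportionality)."""
    g_bind = k * length
    return 1.0 / (1.0 + C * exp(g_bind))

for length in (0, 300, 900, 1500, 3000):
    F = remaining_fraction(length)
    print(length, round(F, 4), round(1 - F, 4))   # remaining rate and fall-off rate
```

With a positive `k` the remaining rate decreases as the target gene becomes longer; a negative `k` would reverse the trend, so the sign has to be fixed by the energy estimation.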