Abstract
To improve the efficiency of limonene production, we built a model to guide the design of our genetic machine. We used flux balance analysis to simulate the system, taking as inputs the stoichiometric matrix of the pathway and the \(V_{max}\) of each reaction (calculated from \(k_{cat}\) and \(E_t\)). Inspired by machine learning algorithms, we then developed an algorithm based on gradient descent to search for the optimal \(E_t\). The results were close to those in published articles we read, so we decided to design our experiments based on the model. While building the model, we also developed a software tool that may be helpful to others who need to optimize a pathway.
Flux Balance Analysis
Flux balance analysis is a method for calculating the flow of metabolites through a metabolic network. It assumes that at steady state the concentration of each metabolite remains unchanged, so the reaction rates must follow a flux distribution in which the production and consumption of every metabolite balance.
Our first step was to convert the pathway into mathematical form, a stoichiometric matrix \(S\) whose rows are metabolites and whose columns are reactions.
$$S= \left[ \begin{matrix}
 & v1 & v2 & v3 & v4 & v5 & v6 & v7 & v8 & v9 & b1 & b2 \\
\text{Acetyl-CoA} & -1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
\text{Acetoacetyl-CoA} & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{HMG-CoA} & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Mevalonate} & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Mevalonate-5-phosphate} & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 & 0 \\
\text{Mevalonate-diphosphate} & 0 & 0 & 0 & 0 & 1 & -1 & 0 & 0 & 0 & 0 & 0 \\
\text{IPP} & 0 & 0 & 0 & 0 & 0 & 1 & 1 & -1 & -1 & 0 & 0 \\
\text{DMAPP} & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 1 & 0 & 0 & 0 \\
\text{NPP} & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & -1 \\
\end{matrix} \right]\tag{001} $$
We then use flux balance analysis to maximize \( f=c^T v\) subject to constraints, where \(f\) is our objective function, \(c\) is a vector of zeros with a one at the position of the last reaction, \(b2\), and \(v\) is the vector of fluxes through all of the reactions.
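Written out, the optimization can be read as a standard linear program. The form below is a sketch under two assumptions that are not stated explicitly in the text: each reaction \(j\) is treated as irreversible (lower bound 0), and its flux is capped by its \(V_{max,j}\).

$$\max_{v} \; f = c^T v \quad \text{subject to} \quad S v = 0, \qquad 0 \le v_j \le V_{max,j} \ \text{ for every reaction } j$$

The equality \(S v = 0\) is the steady-state condition: for every metabolite (row of \(S\)), total production equals total consumption.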
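We use the Michaelis-Menten equation to calculate \( V_{max}\) of each reaction from its turnover number \(k_{cat}\) and total enzyme concentration \(E_t\):
$$V_{max} = k_{cat} E_t$$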
We took the \(k_{cat}\) (turnover number) values from the BRENDA enzyme database (brenda-enzymes) and assumed that all initial \(E_t\) values are the same:
Enzyme | Substrate | Turnover number \(k_{cat}\) [1/s] | \(K_M\) value [mM] |
---|---|---|---|
ERG10 | acetyl-CoA | 2.1 | 0.33 |
ERG13 | acetoacetyl-CoA, acetyl-CoA | 4.6 | acetoacetyl-CoA:0.0014, acetyl-CoA:0.05 |
HMG1 | hydroxymethylglutaryl-CoA | 0.023 | 0.045 |
ERG12 | mevalonate | 2.36 | 0.012 |
ERG8 | phosphomevalonate | 3.4 | 0.0042 |
ERG19 | (R,S)-5-diphosphomevalonate | 5.9 | 0.0091 |
NDPS1 | isopentenyl diphosphate | 0.14 | 0.047 |
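The setup so far can be sketched in Python with `scipy.optimize.linprog`. This is a minimal illustration, not the code of our software tool, and it rests on several assumptions: columns v1 to v6 and v9 are taken to correspond to the seven enzymes of the table in pathway order, v7 and v8 are taken to be the IPP/DMAPP interconversion (no \(k_{cat}\) listed, so they get a loose cap), all reactions are treated as irreversible, and `E_T`, `BIG`, and the function name `fba` are hypothetical choices made for the example.

```python
import numpy as np
from scipy.optimize import linprog

# k_cat values [1/s] copied from the table above.
K_CAT = {"ERG10": 2.1, "ERG13": 4.6, "HMG1": 0.023, "ERG12": 2.36,
         "ERG8": 3.4, "ERG19": 5.9, "NDPS1": 0.14}

E_T = 1.0      # ASSUMPTION: same enzyme level for every enzyme, arbitrary units
BIG = 1000.0   # ASSUMPTION: loose cap for reactions without a k_cat in the table

# V_max = k_cat * E_t for each enzyme.
v_max = {name: k_cat * E_T for name, k_cat in K_CAT.items()}

# Stoichiometric matrix S; columns are v1..v9, b1, b2.
S = np.array([
    [-1, -1,  0,  0,  0,  0,  0,  0,  0, 1,  0],  # Acetyl-CoA
    [ 1, -1,  0,  0,  0,  0,  0,  0,  0, 0,  0],  # Acetoacetyl-CoA
    [ 0,  1, -1,  0,  0,  0,  0,  0,  0, 0,  0],  # HMG-CoA
    [ 0,  0,  1, -1,  0,  0,  0,  0,  0, 0,  0],  # Mevalonate
    [ 0,  0,  0,  1, -1,  0,  0,  0,  0, 0,  0],  # Mevalonate-5-phosphate
    [ 0,  0,  0,  0,  1, -1,  0,  0,  0, 0,  0],  # Mevalonate-diphosphate
    [ 0,  0,  0,  0,  0,  1,  1, -1, -1, 0,  0],  # IPP
    [ 0,  0,  0,  0,  0,  0, -1,  1,  0, 0,  0],  # DMAPP
    [ 0,  0,  0,  0,  0,  0,  0,  0,  1, 0, -1],  # NPP
], dtype=float)

# Upper flux bounds per column (ASSUMED enzyme-to-column mapping, see lead-in).
UPPER = [v_max["ERG10"], v_max["ERG13"], v_max["HMG1"], v_max["ERG12"],
         v_max["ERG8"], v_max["ERG19"], BIG, BIG, v_max["NDPS1"], BIG, BIG]


def fba(S, upper):
    """Maximize the flux of the last reaction subject to S v = 0, 0 <= v <= upper."""
    n_reactions = S.shape[1]
    c = np.zeros(n_reactions)
    c[-1] = 1.0  # objective picks out b2
    # linprog minimizes, so pass -c to maximize c^T v.
    result = linprog(-c, A_eq=S, b_eq=np.zeros(S.shape[0]),
                     bounds=[(0.0, ub) for ub in upper], method="highs")
    if not result.success:
        raise RuntimeError(result.message)
    return result.x, -result.fun


v_opt, f_opt = fba(S, UPPER)
print("optimal b2 flux:", f_opt)
print("limiting enzyme:", min(v_max, key=v_max.get))
```

For this matrix the steady-state condition forces the same flux through v1 to v6, v9, and b2, so with equal \(E_t\) the optimum equals the smallest \(V_{max}\) in the chain.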
Gradient Descent Method
Inspired by machine learning methods, we developed an algorithm based on gradient descent and combined it with flux balance analysis. At each iteration we calculated the gradient of \(f\) with respect to \(E_t\) and searched for the best step length along the gradient to improve \(f\). We repeated this 10 times to obtain the results.
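The loop described above might look like the following pure-Python sketch. It is an illustration under stated assumptions rather than our actual implementation. Since \(f\) is maximized, the step is taken uphill along the gradient. To stay self-contained, the sketch replaces the linear program with a shortcut that holds for the matrix \(S\) given earlier: steady state forces one common flux through the chain, so the optimal \(f\) is the smallest \(V_{max} = k_{cat} E_t\). The function names, the finite-difference step `h`, the candidate step lengths, and the starting value \(E_t = 1\) are all hypothetical.

```python
# k_cat values [1/s] copied from the table above.
K_CAT = {"ERG10": 2.1, "ERG13": 4.6, "HMG1": 0.023, "ERG12": 2.36,
         "ERG8": 3.4, "ERG19": 5.9, "NDPS1": 0.14}


def flux(e_t):
    """Optimal product flux: the smallest V_max = k_cat * E_t in the chain.

    ASSUMPTION: stands in for solving the full FBA linear program.
    """
    return min(K_CAT[name] * e_t[name] for name in K_CAT)


def gradient(e_t, h=1e-6):
    """Forward-difference estimate of df/dE_t for each enzyme."""
    base = flux(e_t)
    grad = {}
    for name in K_CAT:
        bumped = dict(e_t)      # copy so the input is not modified
        bumped[name] += h
        grad[name] = (flux(bumped) - base) / h
    return grad


def best_step(e_t, grad, steps):
    """Try each candidate step length; keep the first one with the highest flux."""
    best_e, best_f = e_t, flux(e_t)
    for alpha in steps:
        candidate = {name: e_t[name] + alpha * grad[name] for name in e_t}
        f = flux(candidate)
        if f > best_f:
            best_e, best_f = candidate, f
    return best_e, best_f


def optimize(e_t, n_iter=10, steps=(1.0, 10.0, 100.0, 1000.0)):
    """Repeat gradient + step-length search n_iter times; return E_t and flux history."""
    history = [flux(e_t)]
    for _ in range(n_iter):
        e_t, f = best_step(e_t, gradient(e_t), steps)
        history.append(f)
    return e_t, history


e_start = {name: 1.0 for name in K_CAT}   # ASSUMPTION: equal initial E_t
first_gradient = gradient(e_start)
e_opt, history = optimize(e_start)
print("largest initial gradient:", max(first_gradient, key=first_gradient.get))
print("flux per iteration:", history)
```

Because the flux in this shortcut is a minimum over \(V_{max}\) values, the gradient is nonzero only for the enzyme that currently limits the pathway, so each iteration raises one enzyme until another becomes limiting.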
Results
The figure shows the result we got after running our model. The ordinate shows the multiple by which the model predicts the product generation rate to increase, and the sequence below the abscissa gives the priority of the enzymes (highest on the left).
From the figure we can see that the priority of the enzymes is: HMG1 -> NDPS1 -> ERG10 -> ERG12 -> ... Because NDPS1 is the enzyme that will be introduced into Y. lipolytica, and ERG10 shares its substrate (acetyl-CoA) with ERG13, we finally decided to overexpress HMG1 and ERG12.