Difference between revisions of "Team:Tongji-Software/Model"

m
Line 299: Line 299:
  
 
<img src ="https://static.igem.org/mediawiki/2018/4/4f/T--Tongji-Software--model5.png" height ="70%"></br>
 
<img src ="https://static.igem.org/mediawiki/2018/4/4f/T--Tongji-Software--model5.png" height ="70%"></br>
<span class="top">Fig.5. Every reaction r in the pathway has the score log f(r),S refers to total score.</span></br>
+
<figure>
 +
<figcaption class="top">Fig.5. Every reaction r in the pathway has the score log f(r),S refers to total score.</figcaption></br>
 +
</figure>
 
<span class="detail">S is used to evaluate the pathway. The higher the score is, the better the pathway is.</span></br></br>
 
<span class="detail">S is used to evaluate the pathway. The higher the score is, the better the pathway is.</span></br></br>
  

Revision as of 06:20, 6 October 2018

Model
Overview
In this section we describe in detail the mathematics and algorithms behind the functions of Alpha Ant. The section has three parts: Pathway Search Algorithm, Pathway Ranking Criteria and Additional Functions.

Pathway Search Algorithm: Depth-first search
Finding biosynthesis pathways from a starting material to a product is a typical search problem. We abstracted the original biosynthesis pathway search problem and turned it into a directed graph search problem.
First we built a directed graph that models the transformation of metabolites: its vertices represent metabolites and its edges represent chemical transformations via reactions.

Fig.1. Search algorithm: DFS (Depth-first Search)
Then we used the depth-first search (DFS) algorithm to traverse and search this graph. We need not only to find all solutions that satisfy the constraints, but also to record the search path. The DFS algorithm starts at the root node and explores as far as possible along each branch before backtracking. The search remembers previously visited nodes and does not repeat them, which avoids infinite loops. As a result, all solutions that satisfy the constraints are returned.
The procedure of DFS is described as follows:
We chose this algorithm because it solves the pathway search problem efficiently. A DFS traversal runs in O(V + E) time when the graph is stored as adjacency lists (O(V^2) with an adjacency matrix) [1], where E is the number of edges and V is the number of vertices, because every vertex and edge is visited only once. Moreover, the core of DFS is flexible, so we can combine it with other evaluation algorithms to solve more complex problems.
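The search described above can be sketched in a few lines of Python. This is a minimal illustration, not the Alpha Ant implementation: the function name `find_pathways`, the `max_steps` constraint and the toy graph are assumptions made for the example, and visited metabolites are tracked per branch so that alternative routes through the same intermediate are still reported.

```python
from typing import Dict, List


def find_pathways(graph: Dict[str, List[str]], source: str, target: str,
                  max_steps: int) -> List[List[str]]:
    """Enumerate loop-free paths from source to target with at most max_steps reactions."""
    pathways: List[List[str]] = []
    path = [source]       # metabolites on the current branch, in order
    on_path = {source}    # the same metabolites as a set, for fast loop checks

    def dfs(node: str) -> None:
        if node == target:
            pathways.append(list(path))   # record a copy of the search path
            return
        if len(path) - 1 >= max_steps:    # hypothetical length constraint
            return
        for nxt in graph.get(node, []):
            if nxt in on_path:            # already on this branch: skip to avoid loops
                continue
            path.append(nxt)
            on_path.add(nxt)
            dfs(nxt)
            on_path.remove(nxt)           # backtrack
            path.pop()

    dfs(source)
    return pathways


# Toy metabolite graph (assumed data): keys are metabolites, values are products.
toy_graph = {
    "A": ["B", "C"],
    "B": ["D", "A"],
    "C": ["D"],
    "D": ["E"],
}
print(find_pathways(toy_graph, "A", "E", max_steps=3))
```

Because a metabolite is released again on backtracking, enumerating every pathway can take longer than a single O(V + E) traversal; the step limit keeps the enumeration bounded.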
Pathway Ranking Criteria
We adopt three criteria to evaluate the pathways: thermodynamic feasibility and precursor competition, toxicity of metabolites, and atom conservation. After grading the pathways under each criterion, we normalize the scores and let users assign their own weight to each ranking criterion.
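The normalize-then-weight step can be illustrated as follows. The text does not say which normalization is used, so this sketch assumes min-max scaling to [0, 1] and assumes every criterion is oriented so that a larger value is better; the names `min_max_normalize`, `combine_scores` and the example numbers are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(values: List[float]) -> List[float]:
    """Scale values to [0, 1]; assumed normalization, not confirmed by the source."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def combine_scores(raw: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Weighted sum of the normalized criterion scores, one total per pathway."""
    n_pathways = len(next(iter(raw.values())))
    total = [0.0] * n_pathways
    for criterion, values in raw.items():
        for i, v in enumerate(min_max_normalize(values)):
            total[i] += weights[criterion] * v
    return total


# Raw scores of three pathways under each criterion (made-up numbers).
raw_scores = {
    "thermodynamics": [-4.0, -2.0, -3.0],
    "low_toxicity": [0.0, 1.0, 2.0],
    "atom_conservation": [2.0, 6.0, 4.0],
}
user_weights = {"thermodynamics": 0.5, "low_toxicity": 0.25, "atom_conservation": 0.25}
print(combine_scores(raw_scores, user_weights))
```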
The criterion of thermodynamic feasibility and precursor competition:
We use a statistical-mechanical model to represent the competition for a metabolic precursor with endogenous reactions. Following Hiroyuki Kuwahara et al. [5], we compute the probability of each reaction from Δ_r G'° (the standard reaction Gibbs free energy) through the Boltzmann distribution. Here is the mathematical description.

Fig.3. We consider C as the metabolic precursor. Let R_N be the set of native reactions that can transform C in a given host organism, and let R_r be the reaction in the pathway to be evaluated.
The Boltzmann factor of reaction r that can transform C :

We define the normalized Boltzmann factor for r as f(r) :
Δ_r G'°: the standard reaction Gibbs energy
R_N: the set of native reactions that can transform C in a given host organism
R: gas constant
T: absolute temperature

Fig.4. If r∈R_N , then f(r) is simply based on the Boltzmann distribution of the native reaction system transforming compound C. If r∉R_N, then f(r) is based on the Boltzmann distribution of the reaction system that contains all native reactions transforming C and foreign reaction r.
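Written out with the symbols defined above, and assuming the standard Boltzmann form, the two quantities can be sketched as follows. The symbol w(r) for the unnormalized Boltzmann factor is introduced here only for illustration. For r ∈ R_N the set R_N ∪ {r} equals R_N, which matches the two cases in Fig.4.

```latex
w(r) = \exp\!\left(-\frac{\Delta_r G'^{\circ}(r)}{RT}\right)

f(r) = \frac{w(r)}{\sum_{r' \in R_N \cup \{r\}} w(r')}
```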


Every reaction r in a pathway receives the score log f(r).
The score of the pathway is as follows:

Fig.5. Every reaction r in the pathway has the score log f(r); S refers to the total score.

S is used to evaluate the pathway. The higher the score is, the better the pathway is.
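As a rough numerical sketch of this scoring, the code below assumes that f(r) is the Boltzmann factor of r divided by the sum over all competing reactions, and that S is the sum of log f(r) over the pathway. Both are assumptions consistent with the captions of Fig.4 and Fig.5. The function names, the temperature of 298.15 K and the example energies (in kJ/mol) are illustrative only.

```python
import math
from typing import List, Tuple

R = 8.314462618e-3   # gas constant in kJ/(mol*K)
T = 298.15           # assumed absolute temperature in K


def boltzmann_factor(dg: float) -> float:
    """Unnormalized Boltzmann factor for a standard reaction Gibbs energy dg (kJ/mol)."""
    return math.exp(-dg / (R * T))


def normalized_factor(dg_r: float, native_dgs: List[float], is_native: bool) -> float:
    """f(r): native_dgs lists the energies of all native reactions transforming C.

    If r is native, its own energy is assumed to be already in native_dgs;
    otherwise r is added to the competing reaction system.
    """
    competing = list(native_dgs) if is_native else list(native_dgs) + [dg_r]
    return boltzmann_factor(dg_r) / sum(boltzmann_factor(dg) for dg in competing)


def pathway_score(steps: List[Tuple[float, List[float], bool]]) -> float:
    """S: sum of log f(r) over every reaction r in the pathway (assumed form)."""
    return sum(math.log(normalized_factor(dg, native, is_native))
               for dg, native, is_native in steps)


# Each step: (energy of r, energies of native reactions on the same precursor, r is native?)
example_pathway = [
    (-10.0, [], False),              # foreign reaction with no native competitor: f = 1
    (-5.0, [-5.0], False),           # foreign reaction tied with one native reaction: f = 0.5
    (-20.0, [-20.0, -20.0], True),   # native reaction tied with another native one: f = 0.5
]
print(pathway_score(example_pathway))
```

Since every f(r) is at most 1, S is never positive, and a pathway whose steps face little competition scores close to 0.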

Atom conservation
MetaCyc provides a one-to-one correspondence between the atom indices of the reactant and those of the product. We mapped these data onto the main reactant pairs in KEGG, then cleaned them and removed redundant entries. Within a pathway, the atom mapping is applied recursively at each reaction step: only the atom indices that derive from the source compound are kept, and all other positions are set to -1. Finally, we count how many atoms in the target compound come from the source compound.
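The recursive bookkeeping can be sketched as below. The data layout is an assumption made for the example: each reaction step is a list in which position j holds the index of the reactant atom that becomes product atom j, or -1 if that product atom comes from another substrate. The function names are hypothetical.

```python
from typing import List


def propagate(labels: List[int], mapping: List[int]) -> List[int]:
    """Carry source-atom labels through one reaction step.

    labels[i] is the source-compound atom index sitting at reactant position i,
    or -1 if that atom does not derive from the source compound.
    """
    return [labels[src] if src != -1 else -1 for src in mapping]


def conserved_atoms(n_source_atoms: int, step_mappings: List[List[int]]) -> int:
    """Count how many atoms of the final product derive from the source compound."""
    labels = list(range(n_source_atoms))   # every source atom is labeled with its own index
    for mapping in step_mappings:
        labels = propagate(labels, mapping)
    return sum(1 for label in labels if label != -1)


# Two-step toy pathway (made-up mappings): a 4-atom source, a 5-atom intermediate,
# and a 4-atom target.
steps = [
    [0, 1, 2, -1, -1],   # intermediate keeps source atoms 0, 1, 2 and gains two new atoms
    [4, 2, 1, 0],        # target takes intermediate atoms 4, 2, 1, 0
]
print(conserved_atoms(4, steps))
```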

Additional function algorithm
Microorganism recommendation
After searching the pathways, we first select the top n pathways ranked by the sum of Gibbs free energy, then use the scoring method above to calculate each pathway's score in a given organism. Next we rank the organisms by their average score; the organism with the highest average is the best. The number of selected pathways n can be defined by the user; the default value is 50.

Fig.6. First, all possible pathways are searched using DFS. Then the scoring method above is used to calculate each pathway's score in a given organism. Next we rank the organisms by average score; the highest is the best. The number of selected pathways n can be defined by the user.
The average score of every organism:

Fig.7. The average score of every organism. A_i refers to the score of route i in organism A; n refers to the number of pathways.
The organism with highest score will be the best.
Max{Ave(A), Ave(B), Ave(C), Ave(D)…}
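The averaging and selection step amounts to Ave(A) = (A_1 + ... + A_n) / n followed by a maximum over organisms. A minimal sketch, with a hypothetical function name and made-up scores:

```python
from typing import Dict, List, Tuple


def recommend_organism(scores: Dict[str, List[float]]) -> Tuple[str, Dict[str, float]]:
    """Return the organism with the highest average pathway score, plus all averages."""
    averages = {org: sum(vals) / len(vals) for org, vals in scores.items()}
    best = max(averages, key=averages.get)
    return best, averages


# Scores of the same n = 2 selected pathways in three candidate organisms (assumed values).
pathway_scores = {
    "A": [-1.0, -3.0],
    "B": [-0.5, -1.5],
    "C": [-4.0, -2.0],
}
best, averages = recommend_organism(pathway_scores)
print(best, averages)
```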
Flux balance analysis (FBA)
Flux balance analysis is a mathematical approach for analyzing the flow of metabolites through a metabolic network. It requires very little information about enzyme kinetic parameters and metabolite concentrations, in contrast to the traditional approach of metabolic modeling with coupled ordinary differential equations. [3] FBA achieves this by making two assumptions: steady state and optimality.
Assumption 1: The modeled system has entered a steady state, where the metabolite concentrations no longer change, i.e. in each metabolite node the producing and consuming fluxes cancel each other out.
Assumption 2: The organism has been optimized through evolution for some biological goal, such as optimal growth or conservation of resources.
We use the COBRApy package to implement FBA. [4] The following steps illustrate flux balance analysis.
First we construct a new model based on a model of E. coli core metabolism. This genome-scale metabolic network contains the core metabolic reactions of E. coli. When we need to introduce novel reactions into E. coli, we add them in Systems Biology Markup Language (SBML), an XML-based standard format for distributing models; COBRA models are supported through the SBML FBC extension, version 2.

Fig.8. First we construct a new model based on a model of E. coli core metabolism. This genome-scale metabolic network contains the core metabolism reactions in E. coli.
Then we present metabolic reactions as a stoichiometric matrix (S) of size m × n. Every row of this matrix represents one unique compound (for a system with m compounds) and every column represents one reaction (n reactions). The entries in each column are the stoichiometric coefficients of the metabolites participating in a reaction. There is a negative coefficient for every metabolite consumed and a positive coefficient for every metabolite that is produced. A stoichiometric coefficient of zero is used for every metabolite that does not participate in a particular reaction.

Fig.9. We present metabolic reactions as a stoichiometric matrix (S) of size m × n. Every row of this matrix represents one unique compound (for a system with m compounds) and every column represents one reaction (n reactions).
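To make the layout of S concrete, the sketch below builds the matrix for a four-reaction toy network in plain Python. The helper name and the toy network are assumptions for illustration; in practice COBRApy derives this matrix from the model.

```python
from typing import Dict, List, Tuple


def build_stoichiometric_matrix(
    reactions: Dict[str, Dict[str, float]]
) -> Tuple[List[str], List[str], List[List[float]]]:
    """Return (metabolite ids, reaction ids, S) with rows = metabolites, columns = reactions."""
    metabolites = sorted({met for coeffs in reactions.values() for met in coeffs})
    reaction_ids = list(reactions)
    row = {met: i for i, met in enumerate(metabolites)}
    S = [[0.0] * len(reaction_ids) for _ in metabolites]   # zero = metabolite not involved
    for j, rid in enumerate(reaction_ids):
        for met, coeff in reactions[rid].items():
            S[row[met]][j] = coeff   # negative = consumed, positive = produced
    return metabolites, reaction_ids, S


# Toy network (assumed): uptake of A, A -> B, B -> 2 C, export of C.
toy_reactions = {
    "uptake": {"A": 1},
    "r1": {"A": -1, "B": 1},
    "r2": {"B": -1, "C": 2},
    "export": {"C": -1},
}
metabolites, reaction_ids, S = build_stoichiometric_matrix(toy_reactions)
for met, row_values in zip(metabolites, S):
    print(met, row_values)
```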
Constraints are represented in two ways, as equations that present steady-state mass balance and as inequalities that impose bounds on the system.
The concentrations of all metabolites are represented by the vector x, with length m. The fluxes through all of the reactions in the network are represented by the vector v, with length n. A steady-state mass balance constraint is imposed according to assumption 1:
dx/dt = 0
S·v = 0
Every reaction is given upper and lower bounds, which define its maximum and minimum allowable fluxes. In our software, v^upper is set to 1000 mmol/gDW/hour and v^lower is set to 0 or -1000 mmol/gDW/hour for irreversible and reversible reactions, respectively.
v^lower ≤ v_i ≤ v^upper
The next step is to define the objective function. It can be any linear combination of fluxes, where c is a vector of weights indicating how much each reaction (such as the biomass reaction when simulating maximum growth) contributes to the objective function. This function is defined by users.
Z = c^T v
Finally, we use linear programming to identify a flux distribution that maximizes or minimizes the objective function within the space of allowable fluxes defined by the mass balance equations and the reaction bounds.
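The constraints and the objective can be checked by hand on a toy network. The sketch below does not solve the linear program (that is the solver's job). It only tests whether a candidate flux vector satisfies S·v = 0 and the bounds, and evaluates Z = c^T v for it. The matrix, bounds, weights and function names are assumptions for illustration.

```python
from typing import List, Tuple


def is_feasible(S: List[List[float]], v: List[float],
                bounds: List[Tuple[float, float]], tol: float = 1e-9) -> bool:
    """True if v satisfies the steady-state balance S.v = 0 and every flux bound."""
    steady = all(abs(sum(s_ij * v_j for s_ij, v_j in zip(row, v))) <= tol for row in S)
    bounded = all(lo - tol <= v_j <= hi + tol for v_j, (lo, hi) in zip(v, bounds))
    return steady and bounded


def objective(c: List[float], v: List[float]) -> float:
    """Z = c^T v, a linear combination of fluxes."""
    return sum(c_j * v_j for c_j, v_j in zip(c, v))


# Toy network (assumed): columns are uptake of A, A -> B, B -> 2 C, export of C.
S = [
    [1, -1, 0, 0],    # A
    [0, 1, -1, 0],    # B
    [0, 0, 2, -1],    # C
]
bounds = [(0, 10), (0, 1000), (0, 1000), (0, 1000)]   # uptake limited to 10, all irreversible
c = [0, 0, 0, 1]                                       # objective: flux through export

for v in ([10, 10, 10, 20], [10, 5, 5, 10], [20, 20, 20, 40]):
    print(v, is_feasible(S, v, bounds), objective(c, v))
```

In this toy case the uptake bound of 10 caps the export flux at 20, so the first candidate is the flux distribution a linear programming solver would be expected to return when maximizing Z.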
Similarity Comparison of Compound