Difference between revisions of "Team:Tongji-Software/Model"

 
(162 intermediate revisions by 4 users not shown)
Line 1: Line 1:
 
<html>
 
<html>
</script>
+
<head>
 +
    <script type="text/x-mathjax-config">
 +
      MathJax.Hub.Config({
 +
        extensions: ["tex2jax.js"],
 +
        jax: ["input/TeX", "output/HTML-CSS"],
 +
        tex2jax: {
 +
          inlineMath: [ ['$','$'], ["\\(","\\)"] ],
 +
          displayMath: [ ['$$','$$'], ["\\[","\\]"] ],
 +
          processEscapes: true
 +
        },
 +
        "HTML-CSS": { fonts: ["TeX"] }
 +
      });
 +
    </script>
 +
    <script type="text/javascript" src="https://2018.igem.org/common/MathJax-2.5-latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
 +
<head/>
  
  
Line 21: Line 35:
 
margin: 0;
 
margin: 0;
 
color:#685454;
 
color:#685454;
font-family: "Haas Grot Text R Web", "Helvetica Neue", Helvetica, Arial, sans-serif;
+
font-family: Helvetica, Arial, sans-serif;
 
/* font-family: "Times New Roman", Cambria,  serif;*/
 
/* font-family: "Times New Roman", Cambria,  serif;*/
 
background-color:#fff;
 
background-color:#fff;
Line 30: Line 44:
 
   float: left;
 
   float: left;
 
   top: 0px;
 
   top: 0px;
   background-color: #685454;
+
   background-color: #592e2e;
 
   margin-top:0px;  
 
   margin-top:0px;  
 
   width: 10%;
 
   width: 10%;
Line 68: Line 82:
 
filter: brightness(200%);
 
filter: brightness(200%);
 
transform: scale(1.2);
 
transform: scale(1.2);
 +
}
 +
 +
.line {
 +
  position: fixed;
 +
  float: left;
 +
  top: 0px;
 +
  left:10%;
 +
  background-color: #78ccfc;
 +
  width: 1%;
 +
  height: 100%;
 +
  padding-right:-20px;
 +
  padding-top:10px;
 
}
 
}
  
Line 75: Line 101:
 
   top: 0px;
 
   top: 0px;
 
   left:10%;
 
   left:10%;
   background-color: #ffb6bc;
+
   background-color: #78ccfc;
   width: 14%;
+
   width: 13%;
 
   height: 100%;
 
   height: 100%;
 
   padding-right:-20px;
 
   padding-right:-20px;
Line 115: Line 141:
 
   transition:all 0.3s;
 
   transition:all 0.3s;
 
   transform: scale(1.15);
 
   transform: scale(1.15);
 +
}
 +
 +
.left li:before {
 +
  content: '';
 +
  height: 100%;
 +
  left: 0;
 +
  position: absolute;
 +
  top: 0;
 +
  -webkit-transition: width 0.2s ease-in;
 +
  transition: width 0.2s ease-in;
 +
  width: 3px;
 +
  z-index: -1;
 +
}
 +
 +
.left li:hover:before {
 +
  -webkit-transition: width 0.2s ease-in;
 +
  transition: width 0.2s ease-in;
 +
  width: 100%;
 +
}
 +
 +
.left li.open:hover before {
 +
  -webkit-transition: width 0.2s ease-in;
 +
  transition: width 0.2s ease-in;
 +
  width: 100%;
 
}
 
}
  
Line 129: Line 179:
 
top:0px;
 
top:0px;
 
z-index: -1;
 
z-index: -1;
opacity: 0.3;
 
 
}
 
}
  
Line 136: Line 185:
 
   position:absolute;
 
   position:absolute;
 
   float: left;
 
   float: left;
   top:25%;
+
   top:40%;
 
   left:28%;
 
   left:28%;
   font-size: 48px;
+
   font-size: 55px;
   color:#fc8190 ;
+
   color:#78ccfc ;
 
   z-index: 2;
 
   z-index: 2;
 
   font-weight: bolder;
 
   font-weight: bolder;
Line 148: Line 197:
 
   float: left;
 
   float: left;
 
   top:110%;
 
   top:110%;
   left:28%;
+
   left:20%;
 
   width:71%;
 
   width:71%;
 
   font-size: 18px;
 
   font-size: 18px;
 
   z-index: 2;
 
   z-index: 2;
  color:#685454;
 
 
   margin-left:50px;
 
   margin-left:50px;
 
  line-height: 35px;
 
  line-height: 35px;
Line 160: Line 208:
 
.content .h1{
 
.content .h1{
 
   margin-top: 20px;
 
   margin-top: 20px;
   font-size: 30px;
+
   font-size: 33px;
 
   font-weight: bold;
 
   font-weight: bold;
   color: #fc8190;
+
   color: #78ccfc;
 
}
 
}
  
 
.content .top{
 
.content .top{
  padding-left: 50px;
+
   display:block;
   display: table-cell;
+
 
   vertical-align:middle;
 
   vertical-align:middle;
 +
  text-align:center;
 
   font-weight:bold;
 
   font-weight:bold;
 
   font-size: 14px;
 
   font-size: 14px;
   color:#685454;
+
   padding-bottom: 10px;
   padding-bottom: 20px;
+
   font-family:Georgia;
 
}
 
}
 +
 +
.content .quote{
 +
  display: block;
 +
  text-align:right;
 +
  font-size: 18px;
 +
}
 +
 +
.content .middle{
 +
  display:block;
 +
  vertical-align:middle;
 +
  text-align:center;
 +
  font-size: 20px;
 +
  padding-bottom: 10px;
 +
}
 +
  
 
.content .detail{
 
.content .detail{
Line 179: Line 242:
 
   text-indent:2em;
 
   text-indent:2em;
 
   font-size: 18px;
 
   font-size: 18px;
   padding-top: 20px;
+
   padding-top: 10px;
   padding-bottom: 20px;
+
   padding-bottom: 10px;
  font-family:"Khmer";
+
 
}
 
}
  
.content img{
+
 
  padding-left: 50px;
+
.content img{
 
   display: table-cell;
 
   display: table-cell;
 
   vertical-align:middle;
 
   vertical-align:middle;
 +
  margin:0 auto;
 +
  text-align:center;
 
}
 
}
  
 +
.pic img{
 +
  padding-left: 250px;
 +
  display: table-cell;
 +
  vertical-align:middle;
 +
}
 +
#contact{
 +
  border-radius: 15px;
 +
  width: 90%;
 +
  height:auto;
 +
  background:rgba(255,255,255,0.4);
 +
/* border: 2.5px solid rgba(0,0,0,0.7);*/
 +
  z-index: 5;
 +
  text-align: center;
 +
  display: block;
 +
  margin-bottom: 100px;
 +
}
 +
 +
#contact h3{
 +
  font-size: 22px;
 +
  margin-top:35px;
 +
  margin-bottom:10px;
 +
  color : #5b3d3d;
 +
}
 +
 +
#contact span{
 +
  margin-top: 15px;
 +
  font-size: 19px;
 +
  color : #826767;
 +
}
 +
 +
#contact img{
 +
  margin-top: 10px;
 +
  transition: all 0.4s;
 +
  margin-bottom: 30px;
 +
}
 +
#contact img:hover{
 +
  transform: scale(1.15);
 +
  transition: all 0.4s;
 +
}
 +
#btn{
 +
display: none;
 +
position: fixed;
 +
left: 90%;
 +
bottom: 40px;
 +
height:80px;
 +
width: 80px;
 +
background: url(https://static.igem.org/mediawiki/2018/d/d1/T--Tongji-Software--top.png) no-repeat left top;
 +
background-size:100% auto;
 +
}
  
 
</style>
 
</style>
 +
<link rel="stylesheet" href="https://2018.igem.org/wiki/index.php?title=Template:Tongji-Software/test/css/style.css&action=raw&ctype=text/css">
  
  
Line 197: Line 311:
 
<div class="icon">
 
<div class="icon">
 
     <ul>
 
     <ul>
     <a href="https://2018.igem.org/Team:Tongji-Software/Project" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/f/fa/T--Tongji-Software--logo.svg" width="55%"  ></li></a>
+
     <a href="https://2018.igem.org/Team:Tongji-Software" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/f/fa/T--Tongji-Software--logo.svg" width="55%"  ></li></a>
  
 
     <a href="https://2018.igem.org/Team:Tongji-Software/Project" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/e/ed/T--Tongji-Software--Pro.svg" width="35%" ></li></a>
 
     <a href="https://2018.igem.org/Team:Tongji-Software/Project" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/e/ed/T--Tongji-Software--Pro.svg" width="35%" ></li></a>
Line 205: Line 319:
 
     <a href="https://2018.igem.org/Team:Tongji-Software/Attributions" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/8/86/T--Tongji-Software--Att.svg" width="35%" ></li></a>
 
     <a href="https://2018.igem.org/Team:Tongji-Software/Attributions" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/8/86/T--Tongji-Software--Att.svg" width="35%" ></li></a>
  
     <a href="https://2018.igem.org/Team:Tongji-Software/Modeling" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/7/77/T--Tongji-Software--Mod.svg" width="35%" ></li></a>
+
     <a href="https://2018.igem.org/Team:Tongji-Software/Model" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/7/77/T--Tongji-Software--Mod.svg" width="35%" ></li></a>
  
     <a href="https://2018.igem.org/Team:Tongji-Software/Collaboration" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/d/d3/T--Tongji-Software--Col.svg" width="35%" ></li></a>
+
     <a href="https://2018.igem.org/Team:Tongji-Software/Collaborations" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/d/d3/T--Tongji-Software--Col.svg" width="35%" ></li></a>
  
 
     <a href="https://2018.igem.org/Team:Tongji-Software/Requirements" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/5/5f/T--Tongji-Software--Req.svg" width="35%" ></li></a>
 
     <a href="https://2018.igem.org/Team:Tongji-Software/Requirements" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/5/5f/T--Tongji-Software--Req.svg" width="35%" ></li></a>
  
     <a href="https://2018.igem.org/Team:Tongji-Software/Human_Practice" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/c/cc/T--Tongji-Software--Hum.svg" width="35%" ></li></a>
+
     <a href="https://2018.igem.org/Team:Tongji-Software/Human_Practices" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/c/cc/T--Tongji-Software--Hum.svg" width="35%" ></li></a>
  
 
     <a href="https://2018.igem.org/Team:Tongji-Software/Medal" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/8/8b/T--Tongji-Software--Med.svg" width="35%" ></li></a>
 
     <a href="https://2018.igem.org/Team:Tongji-Software/Medal" original-title="Project"><li><img src="https://static.igem.org/mediawiki/2018/8/8b/T--Tongji-Software--Med.svg" width="35%" ></li></a>
 
   </ul>
 
   </ul>
 
   </div>
 
   </div>
 +
<div class="line">
 +
</div>
  
 +
<div><a href="javascript:;" id="btn" title="Return to Top"></a></div>
  
<div class="left">
+
<div id = "wrapper">
 +
<div class="left"  id = "sidebar-wrapper">
 
     <ul>
 
     <ul>
 
       <li><a href="#Overview">Overview</a></li>
 
       <li><a href="#Overview">Overview</a></li>
Line 229: Line 347:
 
    
 
    
 
<div class="picture">
 
<div class="picture">
     <img src="https://static.igem.org/mediawiki/2018/2/24/T--Tongji-Software--Attribution-background.jpeg"  width="100%" >
+
     <img src="https://static.igem.org/mediawiki/2018/8/80/T--Tongji-Software--model-background.jpeg"  width="100%" >
 
   </div>
 
   </div>
 
<div class="background">
 
<div class="background">
Line 238: Line 356:
 
   </div>
 
   </div>
  
<div class="content" >
+
<div class="content" id="con">
  
  
Line 246: Line 364:
 
<span class="detail"><b>We will introduce how we use knowledge of mathematics and algorithms to implement function of Alpha Ant in details in this section. It consists of three parts, Pathway Search Algorithm, Pathway Ranking Methods and Additional functions.</b></span></br>
 
<span class="detail"><b>We will introduce how we use knowledge of mathematics and algorithms to implement function of Alpha Ant in details in this section. It consists of three parts, Pathway Search Algorithm, Pathway Ranking Methods and Additional functions.</b></span></br>
 
</br>
 
</br>
<span class="h1" id="DFS">Pathway Search Algorithm: Depth-first search</span></br>
+
 
 +
<span class="h1" id="DFS">Search Algorithm: DFS</span></br>
 
<span class="detail">Seeking for biosynthesis pathways from the starting material to the product is a typical search problem. We abstracted the original biosynthesis pathway search problem and turned it into a directed graph search problem.</span></br>
 
<span class="detail">Seeking for biosynthesis pathways from the starting material to the product is a typical search problem. We abstracted the original biosynthesis pathway search problem and turned it into a directed graph search problem.</span></br>
 
<span class="detail">First we built a directed graph that models the transformation of metabolites where its vertices represent metabolites and its edges represent chemical transformations via reactions. </span></br>
 
<span class="detail">First we built a directed graph that models the transformation of metabolites where its vertices represent metabolites and its edges represent chemical transformations via reactions. </span></br>
  
<img src="https://static.igem.org/mediawiki/2018/d/d9/T--Tongji-Software--model1.png" width="100%"></br>
+
<img src="https://static.igem.org/mediawiki/2018/archive/d/d9/20181007080105%21T--Tongji-Software--model1.png" width="70%"></br>
 
<span class="top">Fig.1. Search algorithm:DFS(Depth-first Search)</span></br>
 
<span class="top">Fig.1. Search algorithm:DFS(Depth-first Search)</span></br>
  
 
<span class="detail">Then we used the Depth-first search algorithm to traverse and search this graph data structures. Not only do we need to get all of the solutions that satisfy the constraints, but also need to record the search path. The DFS algorithm starts at the root node and explores as far as possible along each branch before backtracking. The search remembers previously visited nodes and will not repeat them therefore avoiding infinite loop. As a result, all solutions that satisfy the constraints will be returned. </span></br>
 
<span class="detail">Then we used the Depth-first search algorithm to traverse and search this graph data structures. Not only do we need to get all of the solutions that satisfy the constraints, but also need to record the search path. The DFS algorithm starts at the root node and explores as far as possible along each branch before backtracking. The search remembers previously visited nodes and will not repeat them therefore avoiding infinite loop. As a result, all solutions that satisfy the constraints will be returned. </span></br>
 
<span class="detail">The procedure of DFS is described as follows:</span></br>
 
<span class="detail">The procedure of DFS is described as follows:</span></br>
 +
 +
<img src="https://static.igem.org/mediawiki/2018/archive/0/01/20181009030951%21T--Tongji-Software--model12.png" width="90%"></br>
 +
<span class="top">Fig.2. The procedure of DFS</span></br>
  
 
<span class="detail">The reason we choose this algorithm is that it can solve the pathway search problem efficiently. Based on adjacency matrix, DFS algorithm can solve the problem within the time complexity of O (E + V).[1] ‘E’ means the number of edges and ‘V’ means the number of vertex, and we are able to solve the problem by traversing all the edges and vertex just only once. Moreover,  the core algorithm of DFS is flexible, so that we can combine it with other evaluation algorithms to solve more complex problems.</span></br>
 
<span class="detail">The reason we choose this algorithm is that it can solve the pathway search problem efficiently. Based on adjacency matrix, DFS algorithm can solve the problem within the time complexity of O (E + V).[1] ‘E’ means the number of edges and ‘V’ means the number of vertex, and we are able to solve the problem by traversing all the edges and vertex just only once. Moreover,  the core algorithm of DFS is flexible, so that we can combine it with other evaluation algorithms to solve more complex problems.</span></br>
  
 
+
</br>
 
<span class="h1" id="Ranking">Pathway Ranking Criteria</span></br>
 
<span class="h1" id="Ranking">Pathway Ranking Criteria</span></br>
 
<span class="detail">We adopt three criteria to evaluate the efficacy of the pathways which are thermodynamic feasibility& precursor competition, toxicity of metabolites and atom conservation. After grading pathway using different criteria, we normalize the scores and give users the right to define different weights of different ranking criteria.</span></br>
 
<span class="detail">We adopt three criteria to evaluate the efficacy of the pathways which are thermodynamic feasibility& precursor competition, toxicity of metabolites and atom conservation. After grading pathway using different criteria, we normalize the scores and give users the right to define different weights of different ranking criteria.</span></br>
  
 
<b>The criterion of thermodynamic feasibility& precursor competition:</b></br>
 
<b>The criterion of thermodynamic feasibility& precursor competition:</b></br>
<span class="detail">We use a statistical mechanical model to present the competition for a metabolic precursor with endogenous reactions. We compute the probability of each reaction with ∆_r G^('°) ( the standard reaction Gibbs free energy) through the Boltzmann distribution according to study of Hiroyuki Kuwahara et.al[5]. Here is the mathematical description.</span></br>
+
<span class="detail">We use a statistical mechanical model to present the competition for a metabolic precursor with endogenous reactions. We compute the probability of each reaction with ∆_r G^('°) ( the standard reaction Gibbs free energy) through the Boltzmann distribution according to study of Hiroyuki Kuwahara et.al[2]. Here is the mathematical description.</span></br>
  
<img src="https://static.igem.org/mediawiki/2018/f/f0/T--Tongji-Software--model2.png" width ="85%"></br>
+
<img src="https://static.igem.org/mediawiki/2018/archive/f/f0/20181007075919%21T--Tongji-Software--model2.png" width ="50%"></br>
 
<span class="top">Fig.3. We consider C as the metabolic precursor. Let RN be a set of native reactions that can transform C in a given host organism and Rr is the reaction in the pathway to be evaluated.</span></br>
 
<span class="top">Fig.3. We consider C as the metabolic precursor. Let RN be a set of native reactions that can transform C in a given host organism and Rr is the reaction in the pathway to be evaluated.</span></br>
  
 
<span class="detail">The Boltzmann factor of reaction r that can transform C :</span></br>
 
<span class="detail">The Boltzmann factor of reaction r that can transform C :</span></br>
<img src="https://static.igem.org/mediawiki/2018/0/0a/T--Tongji-Software--model3.png" width ="70%" height="50%"></br>
+
<span class="middle">\[{{\rm{e}}^{ - {\Delta _r}{G^{' \circ /RT}}}}\]</span>
 
+
<span class="quote">(Equation 1.1)</span></br>
 
<span class="detail">We define the normalized Boltzmann factor for r as f(r) :</span></br>
 
<span class="detail">We define the normalized Boltzmann factor for r as f(r) :</span></br>
<b>∆_r G^('°): the standard reaction Gibbs energy</br>
+
<span class="top">∆_r G^('°): the standard reaction Gibbs energy</br>
<b>RN: a set of native reactions that can transform C in a given host organism</br>
+
RN: a set of native reactions that can transform C in a given host organism</br>
<b>R : gas constant </br>
+
R : gas constant</br>
<b>T : absolute temperature</br>
+
T : absolute temperature</span></br>
  
 +
<img src="https://static.igem.org/mediawiki/2018/archive/8/8b/20181011161557%21T--Tongji-Software--model4.png" width="70%">
  
<img src="https://static.igem.org/mediawiki/2018/8/8b/T--Tongji-Software--model4.png" width ="80%"></br>
+
<span class="quote">(Equation 1.2)</span>
<span class="top">Fig.4. If r∈R_N , then f(r) is simply based on the Boltzmann distribution of the native reaction system transforming compound C. If r∉R_N, then f(r) is based on the Boltzmann distribution of the reaction system that contains all native reactions transforming C and foreign reaction r.</span></br>
+
<img src ="https://static.igem.org/mediawiki/2018/7/77/T--Tongji-Software--model7.gif" width="80"></br>
+
  
<span class="detail">For each pathway, every reaction r in the pathway has the score log f(r)</span></br>
+
<span class="top">If r∈R_N , then f(r) is simply based on the Boltzmann distribution of the native reaction system transforming compound C. If r∉R_N, then f(r) is based on the Boltzmann distribution of the reaction system that contains all native reactions transforming C and foreign reaction r.</span></br></br>
<span class="detail">The score of the pathway is as follows:</span></br>
+
  
<img src ="https://static.igem.org/mediawiki/2018/4/4f/T--Tongji-Software--model5.png" height ="70%"></br>
+
<img src ="https://static.igem.org/mediawiki/2018/7/77/T--Tongji-Software--model7.gif" width="50%"></br>
<span class="top">Fig.5. Every reaction r in the pathway has the score log f(r),S refers to total score.</span></br>
+
<span class="detail">S is used to evaluate the pathway. The higher the score is, the better the pathway is.</span></br>
+
  
<b>Atom conservation</b></br>
+
<span class="top">Fig.4. Gibbs standard energy data distribution.</span></br>
<span class="detail">There is a one-to-one correspondence between atom index from reactant and atom index from product in <b>MetaCyc</b>. We integrated the data form into the main pairs in <b>KEGG</b>. We cleaned and removed the redundant data. In one pathway, each step of the reaction is recursive, leaving only the atomic number derived from the source compound, and the rest of the positions are -1. Finally, you can calculate how many atoms in the target are from the source compound.</span></br>
+
<span class="detail">For each pathway, every reaction r in the pathway has the score log f(r),The score of the pathway is as follows:</span></br>
  
  
 +
<span class="middle">\[S = \log f(r1) + \log f(r2) + ... + \log f(rn)\]</span>
 +
<span class="quote">(Equation 1.3)</span></br>
 +
 +
<span class ="detail">Every reaction r in the pathway has the score log f(r),S refers to total score.S is used to evaluate the pathway. The higher the score is, the better the pathway is.</span></br></br>
 +
 +
<span class="middle">\[{S_{total}} = \left( {\begin{array}{*{20}{c}}{{S_{\;th}}}&{{S_t}}&{{S_f}}\end{array}} \right)\left( {\begin{array}{*{20}{c}}{{w_{th}}}\\{{w_t}}\\{{w_f}}\end{array}} \right)\]</span></br>
 +
 +
<span class="quote">(Equation 1.4)</span></br>
 +
<span class="detail">The equation above show how the total score is calculated. Corresponding to each ranking criteria, we give them a certain weight.Of course, users can adjust the weight as well. Consequently, the result list is arranged in descending order of total score.</span></br>
 +
 +
<b>Standardization of score</b></br>
 +
<span class="detail">Graphs below show the distribution of our data.</span></br>
 +
 +
<img src="https://static.igem.org/mediawiki/2018/archive/8/8e/20181016161355%21T--Tongji-Software--model28.png" width="60%"></br>
 +
<img src="https://static.igem.org/mediawiki/2018/archive/2/2c/20181016161447%21T--Tongji-Software--model29.png" width="60%"></br>
 +
 +
<span class="detail">In order to elimate the influence of exceptional data, we use pauta criterion to calculate the upper and lower limit according to average value and variance.</span></br>
 +
 +
<span class="middle">\[\begin{array}{l}\mu {\rm{ = }}\frac{1}{n}\sum\limits_{i = 1}^n {arra{y_i}} \\\sigma  = \frac{1}{n}\sum\limits_{i = 1}^n {{{(arra{y_i} - \mu )}^2}} \\lower = \mu  - 3\sigma \\upper = \mu  + 3\sigma \end{array}\]</span></br>
 +
<span class="quote">(Equation 1.5)</span></br>
 +
 +
 +
 +
<span class="middle">\[\begin{array}{l}arra{y_i} = \left\{ \begin{array}{l}lower,{\rm{    }}arra{y_i} < lower\\arra{y_i},{\rm{      }}lower \le arra{y_i} \le upper\\upper,{\rm{    }}arra{y_i} > upper\end{array} \right.\\{\rm{array  =  }}\left( {array - \mu } \right)/\sigma \end{array}\]</span></br>
 +
 +
<span class="quote">(Equation 1.6)</span></br>
 +
 +
<span class="detail">The distribution of reaction frequency is skewed, cause most reactions can only endogenously construct in a few organisms, which leads to the unbalance of score values. So before turning the frequency data into the standard normal distribution, we use ln(x) to process it to get a more reliable result.We adjust the score array by the limits and turn it into the standard normal distribution. So that all the scores of different criteria are on the same scale, and we can calculate the total score through weighted summation method.</span></br>
 +
 +
</br>
 
<span class="h1" id="Additional">Additional function algorithm</span></br>
 
<span class="h1" id="Additional">Additional function algorithm</span></br>
 
<b>Microorganism recommendation</b></br>
 
<b>Microorganism recommendation</b></br>
<span class="detail">After searching the pathways, we first select some pathways and then use previous scoring method to calculate each route’s score of certain organism. Next we rank the average score of all species and the highest is the best. The number of the selected pathways n can be defined by users. The default value is 50.</span></br>
+
<span class="detail">After searching the pathways, we first select n(n is defined by users) pathways ranking by sum of free Gibbs energy  and then use previous scoring method to calculate each route’s score of certain organism. Next we rank the average score of all species and the highest is the best. The number of the selected pathways n can be defined by users. The default value is 50.</span></br>
  
<img src ="https://static.igem.org/mediawiki/2018/1/1a/T--Tongji-Software--model6.png" width="90%">
+
<img src ="https://static.igem.org/mediawiki/2018/archive/1/1a/20181007081059%21T--Tongji-Software--model6.png" width="75%"></br>
 +
<span class="top">Fig.5. At first, search all possible pathway by using DFS.then use previous scoring method to calculate each route’s score of certain organism. Next we rank the average score of all species and the highest is the best. The number of the selected pathways n can be defined by users.</span></br>
  
 +
<span class="detail">The average score of every organism:</span></br>
  
  
  
 +
<span class="middle">\[Ave(A) = \frac{{\sum\limits_{i = 1}^n {Score{A_i}} }}{n}\]</span>
 +
<span class="quote">(Equation 2.1)</span></br>
 +
 +
 +
<span class="detail">The organism with highest score will be the best.</span></br>
 +
<span class="middle">\[{Max\left\{ {Ave(A),Ave(B),Ave(C),Ave(D)...} \right\}\begin{array}{*{20}{c}}{}&{}\end{array}}\]</span>
 +
<span class="quote">(Equation 2.2)</span></br>
 +
 +
<b>Flux balance analysis(FBA)</b>
 +
 +
<span class="detail">Flux balance analysis is a mathematical approach for analyzing the flow of metabolites through a metabolic network. It required very little information in terms of the enzyme kinetic parameters and concentration of metabolites in the system in contrast to the traditionally followed approach of metabolic modeling using coupled ordinary differential equations.[3] FBA achieves this by making two assumptions, steady state and optimality.</span></br>
 +
 +
<span class="detail">Assumption 1: The modeled system has entered a steady state, where the metabolite concentrations no longer change, i.e. in each metabolite node the producing and consuming fluxes cancel each other out.</span></br>
 +
 +
<span class="detail">Assumption 2: The organism has been optimized through evolution for some biological goal, such as optimal growth or conservation of resources.</span></br>
 +
 +
<span class="detail">We use the COBRApy package to implement the function of FBA. [4]The following are illustrations of flux balance analysis.</span></br>
 +
 +
<span class="detail">First we construct a new model based on a model of E. coli core metabolism. This genome-scale metabolic network contains the core metabolism reactions in E. coli. When we need to construct novel reactions into E.coli, we can add the reactions in Systems Biology Markup Language(SBML) which is an XML-based standard format for distributing models supporting for COBRA models through the FBC extension version 2.</span></br>
 +
<figure>
 +
<img src ="https://static.igem.org/mediawiki/2018/archive/e/e0/20181007081731%21T--Tongji-Software--model10.png" width="80%"></br>
 +
<figcaption class="top">Fig.6. First we construct a new model based on a model of E. coli core metabolism. This genome-scale metabolic network contains the core metabolism reactions in E. coli. </figcaption>
 +
</figure>
 +
<span class="detail">Then we present metabolic reactions as a stoichiometric matrix (S) of size m × n. Every row of this matrix represents one unique compound (for a system with m compounds) and every column represents one reaction (n reactions). The entries in each column are the stoichiometric coefficients of the metabolites participating in a reaction. There is a negative coefficient for every metabolite consumed and a positive coefficient for every metabolite that is produced. A stoichiometric coefficient of zero is used for every metabolite that does not participate in a particular reaction.</span></br>
 +
 +
<img src ="https://static.igem.org/mediawiki/2018/archive/b/b6/20181007081929%21T--Tongji-Software--model11.png" width ="80%"></br>
 +
<span class="top">Fig.7.we present metabolic reactions as a stoichiometric matrix (S) of size m × n. Every row of this matrix represents one unique compound (for a system with m compounds) and every column represents one reaction (n reactions). </span></br>
 +
 +
<span class="detail">Constraints are represented in two ways, as equations that present steady-state mass balance and as inequalities that impose bounds on the system.</span></br>
 +
<span class="detail">The concentrations of all metabolites are represented by the vector x, with length m. The flux through all of the reactions in a network is represented by the vector v, which has a length of n. A steady-state mass balance constraint was imposed according to assumption 1.</span></br>
 +
 +
 +
<span class="middle">\[\begin{array}{l}dX/dt = 0\\
 +
{\rm{ }}S \times v = 0\end{array}\]</span>
 +
<span class="quote">(Equation 3.1)</span></br>
 +
 +
<span class="detail">Every reaction will be given upper and lower bounds, which define the maximum and minimum allowable fluxes of the reactions. In our software, v_^Upper was set to 1000 mmol/gDW/hour and v_^Lower was set to 0 or -1000 mmol/gDW/hour for irreversible and reversible reactions, respectively.</span></br>
 +
 +
 +
<span class="middle">\[{\rm{ }}{v^{Lower}} \le {v_i} \le {v^{Upper}}\]</span>
 +
<span class="quote">(Equation 3.2)</span></br>
 +
 +
<span class="detail">The next step is to define the objective function. It can be any linear combination of fluxes, where c is a vector of weights indicating how much each reaction (such as the biomass reaction when simulating maximum growth) contributes to the objective function. This function is defined by users.</span></br>
 +
 +
 +
<span middle="middle">\[Z = {c^T}v\]</span>
 +
<span class="quote">(Equation 3.3)</span></br>
 +
 +
<span class="detail">Last we use linear programming to identify a flux distribution that maximizes or minimizes the objective function within the space of allowable fluxes defined by the constraints imposed by the mass balance equations and reaction bounds.</span></br>
 +
 +
<b>Similarity Comparison of Compound</b></br>
 +
<span class="detail">We use Extended-Connectivity Fingerprints (ECFPs) to present the structure of compound for molecular similarity searching. ECFPs represent molecular structures by means of circular atom neighborhoods.[5]</span></br>
 +
 +
 +
<b>Representation:</b></br>
 +
<span class="detail">The representation of ECFPs is by means of varying-length lists of integer identifiers. Each identifier represents a particular substructure, more precisely, a circular atom neighborhood, which is present in the molecule. This identifier captures some local information about the corresponding atom in such a way that various atom properties (e.g., atomic number, connection count, etc.) are packed into a single integer value using a hash function. [5]The list of integer identifiers is sorted in ascending order. Then the identifier list is converted to the fixed length bit string.[6]</span></br>
 +
 +
<img src ="https://static.igem.org/mediawiki/2018/c/c3/T--Tongji-Software--model18.png" width="80%"></br>
 +
<span class="top">Fig.8. ECFP generation process.</span></br>
 +
 +
<img src ="https://static.igem.org/mediawiki/2018/6/69/T--Tongji-Software--model19.png" width="80%"></br>
 +
<span class="top">Fig.9. Generation of the fixed-length bit string ("folding")</span></br>
 +
 +
<span class="quote">More information could be found in <a href="https://docs.chemaxon.com/display/docs/Extended+Connectivity+Fingerprint+ECFP">here</a>.</span></br>
 +
 +
<span class="detail">We turn the fixed-length binary of a compound into an array. We use dice coefficient to evaluate the similarity of two compounds, which is a function to measure the set similarity.</span></br>
 +
 +
<span class="detail">Length: This parameter specifies the length of the bit string representation. The default length is 1024.</span></br>
 +
<b>a</b>: Array of compound a</br>
 +
<b>b</b>: Array of compound b</br>
 +
 +
 +
 +
<span class="middle">\[{\rm{ }}Dice(a,b) = \frac{{2 \times \sum\limits_{i = 1}^{length(a)} {commo{n_i}(a,b)} }}{{length(a) + length(b)}}\]</span>
 +
<span class="quote">(Equation 4.1)</span>
 +
 +
<span class="middle">\[commo{n_i}(a,b) = \left\{ \begin{array}{l}1,{\rm{ }}\begin{array}{*{20}{c}}{}\end{array}{{\rm{a}}_i} = {b_i}\\0{\rm{,  }}\begin{array}{*{20}{c}}{}\end{array}{{\rm{a}}_i} \ne {b_i}\end{array} \right.\]</span>
 +
<span class="quote">(Equation 4.2)</span>
 +
 +
 +
<b>Atom conservation</b></br>
 +
<span class="detail">There is a one-to-one correspondence between atom index from reactant and atom index from product in <b>MetaCyc</b>. We integrated the data form into the main pairs in <b>KEGG</b>. We cleaned and removed the redundant data. In one pathway, each step of the reaction is recursive, leaving only the atomic number derived from the source compound, and the rest of the positions are -1. Finally, you can calculate how many atoms in the target are from the source compound.</span></br>
 +
 +
<span class="detail">In order to calculate the final atom conservation rate in a pathway, we need to figure out a data format to describe the atom transferring. In our data format, we create an array for each reactant and product. Every atom in the product has a specific position which is labeled as a sequential array. Each number in the array of reactant means the position of the target compound that atom will transfer to.</span></br>
 +
 +
<img src="https://static.igem.org/mediawiki/2018/2/2b/T--Tongji-Software--model20.png" width ="80%"></br>
 +
<span class="top">Fig.10. Some concepts we define in a certain pathway.</span></br>
 +
 +
 +
<span class="middle">\[T(i,source,target) = \left\{ \begin{array}{l} - 1{\rm{ ,    }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{  }}{{\rm{i}}^{th}}{\rm{ atom}}\begin{array}{*{20}{c}}{}\end{array}{\rm{of}}\begin{array}{*{20}{c}}{}\end{array}{\rm{source}}\begin{array}{*{20}{c}}{}\end{array}{\rm{doesn't}}\begin{array}{*{20}{c}}{}\end{array}{\rm{transfer}}\begin{array}{*{20}{c}}{}\end{array}{\rm{to}}\begin{array}{*{20}{c}}{}\end{array}{\rm{target}}\\x{\rm{  ,  }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{  }}{{\rm{i}}^{th}}{\rm{ atom}}\begin{array}{*{20}{c}}{}\end{array}{\rm{of}}\begin{array}{*{20}{c}}{}\end{array}{\rm{source}}\begin{array}{*{20}{c}}{}\end{array}{\rm{transfers}}\begin{array}{*{20}{c}}{}\end{array}{\rm{to}}\begin{array}{*{20}{c}}{}\end{array}{\rm{the}}\begin{array}{*{20}{c}}{}\end{array}{{\rm{x}}^{th}}\begin{array}{*{20}{c}}{}\end{array}{\rm{atom}}\begin{array}{*{20}{c}}{}\end{array}{\rm{of}}\begin{array}{*{20}{c}}{}\end{array}{\rm{target}}\end{array} \right.\]</span>
 +
<span class="quote">(Equation 5.1)</span></br>
 +
 +
 +
<span class="middle">\[\begin{array}{l}{{\rm{R}}_j}(i) = T(i,{{\rm{R}}_{\rm{j}}}{\rm{,}}{{\rm{P}}_{\rm{j}}}{\rm{)  }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{ i =  1,2, }}...{\rm{ , A(}}{{\rm{R}}_{\rm{j}}}{\rm{)    }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{    j = 1,2, }}...{\rm{ , n}}\\{\rm{                                  }}{{\rm{P}}_j}(i) = T(i,{{\rm{P}}_j}{\rm{,}}{{\rm{R}}_{\rm{j}}}{\rm{)  }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{  i =  1,2, }}...{\rm{ , A(}}{{\rm{P}}_{\rm{j}}}{\rm{)    }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{    j = 1,2, }}...{\rm{ , n}}\end{array}\]</span>
 +
<span class="quote">(Equation 5.2)</span></br>
 +
<span class="detail">where R<sub>j</sub> denotes the n<sup>th</sup> reactant, R<sub>j</sub>(i) denotes the i<sup>th</sup> number of the R<sub>j</sub> array, n denotes the amount of reactants in this pathway, A(R<sub>j</sub>) denotes the amount of atoms in R<sub>j</sub>. Other symbols with P have the same meanings about product.</span></br>
 +
 +
 +
<span class="middle">\[{{\rm{R}}_{{\rm{j + 1}}}}(i) = \left\{ \begin{array}{l} - 1,\begin{array}{*{20}{c}}{}&{}&{}&{}&{}&{}&{}&{}&{}\end{array}{{\rm{P}}_{\rm{j}}}(i) =  - 1\\T(i,{\rm{reactant,product),}}\begin{array}{*{20}{c}}{}&{}\end{array}{{\rm{P}}_{\rm{j}}}(i) \ne  - 1\end{array} \right.\]</span>
 +
<span class="quote">(Equation 5.3)</span></br>
 +
 +
<span class="detail">Since the j<sup>th</sup> product is the (j+1)<sup>th</sup> reactant in the same pathway, the R<sup>j+1</sup> array should be adjusted according to the Pn array. Only the atoms from nth reactant can transfer to the (j+1)<sup>th</sup> reactant. In this way we can find out how many atoms are conserved through our data format.</span></br>
 +
 +
 +
<span class="middle">\[\begin{array}{l}{\rm{C(}}{{\rm{R}}_{\rm{j}}}(i)) = \left\{ \begin{array}{l}0,{\rm{  }}{{\rm{R}}_{\rm{j}}}(i) =  - 1\\1,{\rm{  }}{{\rm{R}}_{\rm{j}}}(i) \ne  - 1{\rm{        }}\end{array} \right.\begin{array}{*{20}{c}}{}&{}\end{array}i = 1,2,...,{\rm{ A(}}{{\rm{R}}_{\rm{j}}}{\rm{)}}\\{\rm{                                        C(}}{{\rm{P}}_{\rm{j}}}(i)) = \left\{ \begin{array}{l}0,{\rm{  }}{{\rm{P}}_{\rm{j}}}(i) =  - 1\\1,{\rm{  }}{{\rm{P}}_{\rm{j}}}(i) \ne  - 1{\rm{        }}\end{array} \right.\begin{array}{*{20}{c}}{}&{}\end{array}i = 1,2,...,{\rm{ A(}}{{\rm{P}}_{\rm{j}}}{\rm{)}}\end{array}\]</span>
 +
<span class="quote">(Equation 5.4)</span></br>
 +
 +
<span class="middle">\[{\rm{ }}Atom\begin{array}{*{20}{c}}{}\end{array}conservation\begin{array}{*{20}{c}}{}\end{array}rate{\rm{  =  }}\frac{{\sum\limits_{{\rm{i = 1}}}^{{\rm{A(}}{{\rm{P}}_{\rm{n}}}{\rm{)}}} {{\rm{C(}}{{\rm{P}}_{\rm{n}}}(i))} }}{{{\rm{A}}({R_1})}} \times 100\% \]</span></br>
 +
<span class="quote">(Equation 5.5)</span></br>
 +
 +
<span class="detail">At last, we can calculate the atom conservation rate through the first reactant array and the final product array.</span></br>
 +
 +
<b>References:</b></br>
 +
[1] Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50(5): 742-754.</br>
 +
[2] Hiroyuki Kuwahara, Meshari Alazmi, Xuefeng Cui and Xin Gao. MRE: a web tool to suggest foreign enzymes for the biosynthesis pathway design with competing endogenous reactions in mind. Nucleic Acids Research, 2016, Vol. 44, Web Server issue W217–W225.</br>
 +
[3] Orth, J.D.; Thiele, I.; Palsson, B.Ø. (2010). "What is flux balance analysis?". Nature Biotechnology. 28: 245–248. doi:10.1038/nbt.1614. PMC 3108565 Freely accessible. PMID 20212490</br>
 +
[4] Schellenberger J, Que R, Fleming RMT, Thiele I, Orth JD, Feist AM, Zielinski DC, Bordbar A, Lewis NE, Rahmanian S et al., Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0  Nature Protocol, 2011,6(9):1290-307.</br>
 +
[5] Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50(5): 742-754.</br>
 +
[6] Morgan, H. L. The Generation of a Unique Machine Description for Chemical Structures - A Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5: 107-112.</br>
 +
 +
<div id="contact">
 +
<h3>CONTACT</h3>
 +
<span>annecao@tongji.edu.cn</span><hr/>
 +
<h3>ADDRESS</h3>
 +
<span>1239# Siping Road</br>Tongji University,Shanghai China</span><hr/>
 +
<h3>GET IN TOUCH</h3>
 +
<a href="https://www.facebook.com/tongjiIGEMer/?ref=bookmarks"><img src="https://static.igem.org/mediawiki/2018/8/8c/T--Tongji-Software--facebook.png" width="5%"></a>
 
</div>
 
</div>
 +
 +
</div>
 +
</div>
 +
 +
<script src="https://2018.igem.org/wiki/index.php?title=Template:Tongji-Software/test/js/jquery-1.8.3.min.js&action=raw&ctype=text/javascript"></script>
 +
<script src="https://2018.igem.org/wiki/index.php?title=Template:Tongji-Software/test/js/bootstrap.min.js&action=raw&ctype=text/javascript"></script>
 +
<script type="text/javascript">
 +
$(document).ready(function () {
 +
  var trigger = $('.icon'),
 +
    slide = $('.left'),
 +
isClosed = true;
 +
 +
trigger.mouseover(function () {
 +
  cross_out();     
 +
});
 +
slide.mouseleave(function () {
 +
  cross_in();     
 +
});
 +
 +
function cross_out() {
 +
  if (isClosed == true) {
 +
trigger.removeClass('is-open');
 +
trigger.addClass('is-closed');
 +
 +
$('#wrapper').toggleClass('toggled');
 +
isClosed = false;
 +
  }
 +
}
 +
function cross_in() {
 +
  if (isClosed == false) {
 +
trigger.removeClass('is-closed');
 +
trigger.addClass('is-open');
 +
 +
$('#wrapper').toggleClass('toggled');
 +
isClosed = true;
 +
  }
 +
  }
 +
});
 +
</script>
 +
 +
<script>
 +
window.onload = function(){
 +
           
 +
                var mybtn = document.getElementById("btn");
 +
                var Time1 = null;
 +
                var isTop = true;
 +
 +
                mybtn.onclick = function(){
 +
                 
 +
                    Time1 = setInterval(function(){
 +
                        var osTop = document.body.scrollTop||document.documentElement.scrollTop;
 +
                       
 +
                        var speed = Math.ceil(osTop/2);
 +
                       
 +
                        document.body.scrollTop = document.documentElement.scrollTop = osTop - speed;
 +
                        if(osTop <= 0){
 +
                           
 +
                            clearInterval(Time1);
 +
                        }
 +
                        isTop = true ;
 +
                    },100)
 +
                }
 +
               
 +
               
 +
                window.onscroll = function(){
 +
                   
 +
                    var osTop = document.body.scrollTop||document.documentElement.scrollTop;
 +
                   
 +
                    var clientHeight = document.documentElement.clientHeight;
 +
                   
 +
                    if (osTop>clientHeight) {
 +
                        mybtn.style.display = "block"
 +
                    }
 +
                    else{
 +
                        mybtn.style.display = "none"
 +
                    }
 +
                    if (!isTop) {
 +
                        clearInterval(Time1);
 +
                    }
 +
                    isTop = false;
 +
                }
 +
            }
 +
 +
</script>
 +
 +
 
</body>
 
</body>
 
</html>
 
</html>

Latest revision as of 16:24, 17 October 2018

Model
Overview
We will introduce how we use knowledge of mathematics and algorithms to implement function of Alpha Ant in details in this section. It consists of three parts, Pathway Search Algorithm, Pathway Ranking Methods and Additional functions.

Search Algorithm: DFS
Seeking for biosynthesis pathways from the starting material to the product is a typical search problem. We abstracted the original biosynthesis pathway search problem and turned it into a directed graph search problem.
First we built a directed graph that models the transformation of metabolites where its vertices represent metabolites and its edges represent chemical transformations via reactions.

Fig.1. Search algorithm:DFS(Depth-first Search)
Then we used the Depth-first search algorithm to traverse and search this graph data structures. Not only do we need to get all of the solutions that satisfy the constraints, but also need to record the search path. The DFS algorithm starts at the root node and explores as far as possible along each branch before backtracking. The search remembers previously visited nodes and will not repeat them therefore avoiding infinite loop. As a result, all solutions that satisfy the constraints will be returned.
The procedure of DFS is described as follows:

Fig.2. The procedure of DFS
The reason we choose this algorithm is that it can solve the pathway search problem efficiently. Based on adjacency matrix, DFS algorithm can solve the problem within the time complexity of O (E + V).[1] ‘E’ means the number of edges and ‘V’ means the number of vertex, and we are able to solve the problem by traversing all the edges and vertex just only once. Moreover, the core algorithm of DFS is flexible, so that we can combine it with other evaluation algorithms to solve more complex problems.

Pathway Ranking Criteria
We adopt three criteria to evaluate the efficacy of the pathways which are thermodynamic feasibility& precursor competition, toxicity of metabolites and atom conservation. After grading pathway using different criteria, we normalize the scores and give users the right to define different weights of different ranking criteria.
The criterion of thermodynamic feasibility& precursor competition:
We use a statistical mechanical model to present the competition for a metabolic precursor with endogenous reactions. We compute the probability of each reaction with ∆_r G^('°) ( the standard reaction Gibbs free energy) through the Boltzmann distribution according to study of Hiroyuki Kuwahara et.al[2]. Here is the mathematical description.

Fig.3. We consider C as the metabolic precursor. Let RN be a set of native reactions that can transform C in a given host organism and Rr is the reaction in the pathway to be evaluated.
The Boltzmann factor of reaction r that can transform C :
\[{{\rm{e}}^{ - {\Delta _r}{G^{' \circ /RT}}}}\] (Equation 1.1)
We define the normalized Boltzmann factor for r as f(r) :
∆_r G^('°): the standard reaction Gibbs energy
RN: a set of native reactions that can transform C in a given host organism
R : gas constant
T : absolute temperature

(Equation 1.2) If r∈R_N , then f(r) is simply based on the Boltzmann distribution of the native reaction system transforming compound C. If r∉R_N, then f(r) is based on the Boltzmann distribution of the reaction system that contains all native reactions transforming C and foreign reaction r.


Fig.4. Gibbs standard energy data distribution.
For each pathway, every reaction r in the pathway has the score log f(r),The score of the pathway is as follows:
\[S = \log f(r1) + \log f(r2) + ... + \log f(rn)\] (Equation 1.3)
Every reaction r in the pathway has the score log f(r),S refers to total score.S is used to evaluate the pathway. The higher the score is, the better the pathway is.

\[{S_{total}} = \left( {\begin{array}{*{20}{c}}{{S_{\;th}}}&{{S_t}}&{{S_f}}\end{array}} \right)\left( {\begin{array}{*{20}{c}}{{w_{th}}}\\{{w_t}}\\{{w_f}}\end{array}} \right)\]
(Equation 1.4)
The equation above show how the total score is calculated. Corresponding to each ranking criteria, we give them a certain weight.Of course, users can adjust the weight as well. Consequently, the result list is arranged in descending order of total score.
Standardization of score
Graphs below show the distribution of our data.


In order to elimate the influence of exceptional data, we use pauta criterion to calculate the upper and lower limit according to average value and variance.
\[\begin{array}{l}\mu {\rm{ = }}\frac{1}{n}\sum\limits_{i = 1}^n {arra{y_i}} \\\sigma = \frac{1}{n}\sum\limits_{i = 1}^n {{{(arra{y_i} - \mu )}^2}} \\lower = \mu - 3\sigma \\upper = \mu + 3\sigma \end{array}\]
(Equation 1.5)
\[\begin{array}{l}arra{y_i} = \left\{ \begin{array}{l}lower,{\rm{ }}arra{y_i} < lower\\arra{y_i},{\rm{ }}lower \le arra{y_i} \le upper\\upper,{\rm{ }}arra{y_i} > upper\end{array} \right.\\{\rm{array = }}\left( {array - \mu } \right)/\sigma \end{array}\]
(Equation 1.6)
The distribution of reaction frequency is skewed, cause most reactions can only endogenously construct in a few organisms, which leads to the unbalance of score values. So before turning the frequency data into the standard normal distribution, we use ln(x) to process it to get a more reliable result.We adjust the score array by the limits and turn it into the standard normal distribution. So that all the scores of different criteria are on the same scale, and we can calculate the total score through weighted summation method.

Additional function algorithm
Microorganism recommendation
After searching the pathways, we first select n(n is defined by users) pathways ranking by sum of free Gibbs energy and then use previous scoring method to calculate each route’s score of certain organism. Next we rank the average score of all species and the highest is the best. The number of the selected pathways n can be defined by users. The default value is 50.

Fig.5. At first, search all possible pathway by using DFS.then use previous scoring method to calculate each route’s score of certain organism. Next we rank the average score of all species and the highest is the best. The number of the selected pathways n can be defined by users.
The average score of every organism:
\[Ave(A) = \frac{{\sum\limits_{i = 1}^n {Score{A_i}} }}{n}\] (Equation 2.1)
The organism with highest score will be the best.
\[{Max\left\{ {Ave(A),Ave(B),Ave(C),Ave(D)...} \right\}\begin{array}{*{20}{c}}{}&{}\end{array}}\] (Equation 2.2)
Flux balance analysis(FBA) Flux balance analysis is a mathematical approach for analyzing the flow of metabolites through a metabolic network. It required very little information in terms of the enzyme kinetic parameters and concentration of metabolites in the system in contrast to the traditionally followed approach of metabolic modeling using coupled ordinary differential equations.[3] FBA achieves this by making two assumptions, steady state and optimality.
Assumption 1: The modeled system has entered a steady state, where the metabolite concentrations no longer change, i.e. in each metabolite node the producing and consuming fluxes cancel each other out.
Assumption 2: The organism has been optimized through evolution for some biological goal, such as optimal growth or conservation of resources.
We use the COBRApy package to implement the function of FBA. [4]The following are illustrations of flux balance analysis.
First we construct a new model based on a model of E. coli core metabolism. This genome-scale metabolic network contains the core metabolism reactions in E. coli. When we need to construct novel reactions into E.coli, we can add the reactions in Systems Biology Markup Language(SBML) which is an XML-based standard format for distributing models supporting for COBRA models through the FBC extension version 2.

Fig.6. First we construct a new model based on a model of E. coli core metabolism. This genome-scale metabolic network contains the core metabolism reactions in E. coli.
Then we present metabolic reactions as a stoichiometric matrix (S) of size m × n. Every row of this matrix represents one unique compound (for a system with m compounds) and every column represents one reaction (n reactions). The entries in each column are the stoichiometric coefficients of the metabolites participating in a reaction. There is a negative coefficient for every metabolite consumed and a positive coefficient for every metabolite that is produced. A stoichiometric coefficient of zero is used for every metabolite that does not participate in a particular reaction.

Fig.7.we present metabolic reactions as a stoichiometric matrix (S) of size m × n. Every row of this matrix represents one unique compound (for a system with m compounds) and every column represents one reaction (n reactions).
Constraints are represented in two ways, as equations that present steady-state mass balance and as inequalities that impose bounds on the system.
The concentrations of all metabolites are represented by the vector x, with length m. The flux through all of the reactions in a network is represented by the vector v, which has a length of n. A steady-state mass balance constraint was imposed according to assumption 1.
\[\begin{array}{l}dX/dt = 0\\ {\rm{ }}S \times v = 0\end{array}\] (Equation 3.1)
Every reaction will be given upper and lower bounds, which define the maximum and minimum allowable fluxes of the reactions. In our software, v_^Upper was set to 1000 mmol/gDW/hour and v_^Lower was set to 0 or -1000 mmol/gDW/hour for irreversible and reversible reactions, respectively.
\[{\rm{ }}{v^{Lower}} \le {v_i} \le {v^{Upper}}\] (Equation 3.2)
The next step is to define the objective function. It can be any linear combination of fluxes, where c is a vector of weights indicating how much each reaction (such as the biomass reaction when simulating maximum growth) contributes to the objective function. This function is defined by users.
\[Z = {c^T}v\] (Equation 3.3)
Last we use linear programming to identify a flux distribution that maximizes or minimizes the objective function within the space of allowable fluxes defined by the constraints imposed by the mass balance equations and reaction bounds.
Similarity Comparison of Compound
We use Extended-Connectivity Fingerprints (ECFPs) to present the structure of compound for molecular similarity searching. ECFPs represent molecular structures by means of circular atom neighborhoods.[5]
Representation:
The representation of ECFPs is by means of varying-length lists of integer identifiers. Each identifier represents a particular substructure, more precisely, a circular atom neighborhood, which is present in the molecule. This identifier captures some local information about the corresponding atom in such a way that various atom properties (e.g., atomic number, connection count, etc.) are packed into a single integer value using a hash function. [5]The list of integer identifiers is sorted in ascending order. Then the identifier list is converted to the fixed length bit string.[6]

Fig.8. ECFP generation process.

Fig.9. Generation of the fixed-length bit string ("folding")
More information could be found in here.
We turn the fixed-length binary of a compound into an array. We use dice coefficient to evaluate the similarity of two compounds, which is a function to measure the set similarity.
Length: This parameter specifies the length of the bit string representation. The default length is 1024.
a: Array of compound a
b: Array of compound b
\[{\rm{ }}Dice(a,b) = \frac{{2 \times \sum\limits_{i = 1}^{length(a)} {commo{n_i}(a,b)} }}{{length(a) + length(b)}}\] (Equation 4.1) \[commo{n_i}(a,b) = \left\{ \begin{array}{l}1,{\rm{ }}\begin{array}{*{20}{c}}{}\end{array}{{\rm{a}}_i} = {b_i}\\0{\rm{, }}\begin{array}{*{20}{c}}{}\end{array}{{\rm{a}}_i} \ne {b_i}\end{array} \right.\] (Equation 4.2) Atom conservation
There is a one-to-one correspondence between atom index from reactant and atom index from product in MetaCyc. We integrated the data form into the main pairs in KEGG. We cleaned and removed the redundant data. In one pathway, each step of the reaction is recursive, leaving only the atomic number derived from the source compound, and the rest of the positions are -1. Finally, you can calculate how many atoms in the target are from the source compound.
In order to calculate the final atom conservation rate in a pathway, we need to figure out a data format to describe the atom transferring. In our data format, we create an array for each reactant and product. Every atom in the product has a specific position which is labeled as a sequential array. Each number in the array of reactant means the position of the target compound that atom will transfer to.

Fig.10. Some concepts we define in a certain pathway.
\[T(i,source,target) = \left\{ \begin{array}{l} - 1{\rm{ , }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{ }}{{\rm{i}}^{th}}{\rm{ atom}}\begin{array}{*{20}{c}}{}\end{array}{\rm{of}}\begin{array}{*{20}{c}}{}\end{array}{\rm{source}}\begin{array}{*{20}{c}}{}\end{array}{\rm{doesn't}}\begin{array}{*{20}{c}}{}\end{array}{\rm{transfer}}\begin{array}{*{20}{c}}{}\end{array}{\rm{to}}\begin{array}{*{20}{c}}{}\end{array}{\rm{target}}\\x{\rm{ , }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{ }}{{\rm{i}}^{th}}{\rm{ atom}}\begin{array}{*{20}{c}}{}\end{array}{\rm{of}}\begin{array}{*{20}{c}}{}\end{array}{\rm{source}}\begin{array}{*{20}{c}}{}\end{array}{\rm{transfers}}\begin{array}{*{20}{c}}{}\end{array}{\rm{to}}\begin{array}{*{20}{c}}{}\end{array}{\rm{the}}\begin{array}{*{20}{c}}{}\end{array}{{\rm{x}}^{th}}\begin{array}{*{20}{c}}{}\end{array}{\rm{atom}}\begin{array}{*{20}{c}}{}\end{array}{\rm{of}}\begin{array}{*{20}{c}}{}\end{array}{\rm{target}}\end{array} \right.\] (Equation 5.1)
\[\begin{array}{l}{{\rm{R}}_j}(i) = T(i,{{\rm{R}}_{\rm{j}}}{\rm{,}}{{\rm{P}}_{\rm{j}}}{\rm{) }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{ i = 1,2, }}...{\rm{ , A(}}{{\rm{R}}_{\rm{j}}}{\rm{) }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{ j = 1,2, }}...{\rm{ , n}}\\{\rm{ }}{{\rm{P}}_j}(i) = T(i,{{\rm{P}}_j}{\rm{,}}{{\rm{R}}_{\rm{j}}}{\rm{) }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{ i = 1,2, }}...{\rm{ , A(}}{{\rm{P}}_{\rm{j}}}{\rm{) }}\begin{array}{*{20}{c}}{}&{}\end{array}{\rm{ j = 1,2, }}...{\rm{ , n}}\end{array}\] (Equation 5.2)
where Rj denotes the nth reactant, Rj(i) denotes the ith number of the Rj array, n denotes the amount of reactants in this pathway, A(Rj) denotes the amount of atoms in Rj. Other symbols with P have the same meanings about product.
\[{{\rm{R}}_{{\rm{j + 1}}}}(i) = \left\{ \begin{array}{l} - 1,\begin{array}{*{20}{c}}{}&{}&{}&{}&{}&{}&{}&{}&{}\end{array}{{\rm{P}}_{\rm{j}}}(i) = - 1\\T(i,{\rm{reactant,product),}}\begin{array}{*{20}{c}}{}&{}\end{array}{{\rm{P}}_{\rm{j}}}(i) \ne - 1\end{array} \right.\] (Equation 5.3)
Since the jth product is the (j+1)th reactant in the same pathway, the Rj+1 array should be adjusted according to the Pn array. Only the atoms from nth reactant can transfer to the (j+1)th reactant. In this way we can find out how many atoms are conserved through our data format.
\[\begin{array}{l}{\rm{C(}}{{\rm{R}}_{\rm{j}}}(i)) = \left\{ \begin{array}{l}0,{\rm{ }}{{\rm{R}}_{\rm{j}}}(i) = - 1\\1,{\rm{ }}{{\rm{R}}_{\rm{j}}}(i) \ne - 1{\rm{ }}\end{array} \right.\begin{array}{*{20}{c}}{}&{}\end{array}i = 1,2,...,{\rm{ A(}}{{\rm{R}}_{\rm{j}}}{\rm{)}}\\{\rm{ C(}}{{\rm{P}}_{\rm{j}}}(i)) = \left\{ \begin{array}{l}0,{\rm{ }}{{\rm{P}}_{\rm{j}}}(i) = - 1\\1,{\rm{ }}{{\rm{P}}_{\rm{j}}}(i) \ne - 1{\rm{ }}\end{array} \right.\begin{array}{*{20}{c}}{}&{}\end{array}i = 1,2,...,{\rm{ A(}}{{\rm{P}}_{\rm{j}}}{\rm{)}}\end{array}\] (Equation 5.4)
\[{\rm{ }}Atom\begin{array}{*{20}{c}}{}\end{array}conservation\begin{array}{*{20}{c}}{}\end{array}rate{\rm{ = }}\frac{{\sum\limits_{{\rm{i = 1}}}^{{\rm{A(}}{{\rm{P}}_{\rm{n}}}{\rm{)}}} {{\rm{C(}}{{\rm{P}}_{\rm{n}}}(i))} }}{{{\rm{A}}({R_1})}} \times 100\% \]
(Equation 5.5)
At last, we can calculate the atom conservation rate through the first reactant array and the final product array.
References:
[1] Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50(5): 742-754.
[2] Hiroyuki Kuwahara, Meshari Alazmi, Xuefeng Cui and Xin Gao. MRE: a web tool to suggest foreign enzymes for the biosynthesis pathway design with competing endogenous reactions in mind. Nucleic Acids Research, 2016, Vol. 44, Web Server issue W217–W225.
[3] Orth, J.D.; Thiele, I.; Palsson, B.Ø. (2010). "What is flux balance analysis?". Nature Biotechnology. 28: 245–248. doi:10.1038/nbt.1614. PMC 3108565 Freely accessible. PMID 20212490
[4] Schellenberger J, Que R, Fleming RMT, Thiele I, Orth JD, Feist AM, Zielinski DC, Bordbar A, Lewis NE, Rahmanian S et al., Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0 Nature Protocol, 2011,6(9):1290-307.
[5] Rogers, D.; Hahn, M. Extended-Connectivity Fingerprints. J. Chem. Inf. Model. 2010, 50(5): 742-754.
[6] Morgan, H. L. The Generation of a Unique Machine Description for Chemical Structures - A Technique Developed at Chemical Abstracts Service. J. Chem. Doc. 1965, 5: 107-112.

CONTACT

annecao@tongji.edu.cn

ADDRESS

1239# Siping Road
Tongji University,Shanghai China

GET IN TOUCH