Difference between revisions of "Team:Tongji-Software/Project"


Revision as of 12:07, 14 October 2018

PROJECT
What is Alpha Ant?
Already got a brilliant idea in metabolic engineering?
Too much information to search?
Still,
You need Alpha Ant as a guide
Alpha Ant is a computational tool for pathway design and reconstruction. By taking metabolic burden into account and offering several additional functions, it serves as an efficient and powerful guide to pathway design.
Why Alpha Ant?
Background
Pathway engineering has proven indispensable in synthetic biology for its utility in designing microbes that generate value-added products, which is also the ultimate goal of our project. The core idea is to design and reconstruct pathways for a given purpose, which includes introducing heterologous metabolic reactions into a host organism, optimizing genetic processes within cells, modeling for yield prediction, flux balance analysis and so on.
However, it is quite a challenge to reach high yield and productivity while balancing the metabolic burden in a given organism. For example, it can require sorting through thousands of possible reactions and enzymes. It also requires evaluating and simulating pathways with in silico analysis. And of course, wet-lab experiments are necessary for pathway validation.
Inspiration
After investigating metabolic pathway engineering, we realized that a great deal of work goes into any given project. Commonly, we have to do a lot of research before starting the actual experiments, such as database searches, paper reading and so on. That is why we came up with the idea of helping synthetic biologists with this preliminary work.
Fig.1. Process of traditional pathway design.
Specifically, we collect metabolic data from several databases, including metabolic reactions, reaction main pairs, enzymes, genes, compounds and so on. With these data, we provide all the related information about metabolic pathways. For the ranking criteria, we choose some reliable existing ones and design novel ones to make our results more convincing. For the pathway search algorithm, we choose DFS (depth-first search) because of its good performance in both speed and quality.
Origin of Name: Alpha Ant
Alpha Ant stands for an efficient and convenient tool for pathway engineering. “Alpha” means ‘origin’. To our knowledge, Alpha Ant is the first pathway design software equipped with such a comprehensive set of ranking criteria. The word has also impressed people recently through the project “AlphaGo”, which gave “Alpha” an association with intelligence. “Ant” refers to the software’s capacity to find the most efficient metabolic pathway linking two molecules, which resembles an ant colony’s ability to quickly organize itself and find the most efficient path to a food source once scouts have discovered it. Ants are great signal detectors and pathfinders.
Fig.2. The meaning and origin of the name Alpha Ant.
Data processing
We acquire metabolic reactions and gene information from KEGG. Standard Gibbs energies are from MetaCyc and eQuilibrator. We obtain compound information from KEGG, ChEBI and KnowledgeBase. Enzyme data are from BRENDA and KEGG. The small-molecule drug information is from DRUGBANK. In addition, we use MayaChemTools [1] to calculate physicochemical properties of compounds. Because we use so many databases, we ran into some problems during data processing. The most challenging task was linking all this information together, because each database has its own IDs and data format. We did our best to integrate all this information, and we hope our software will be useful to synthetic biologists.
Fig.3. Databases we use to integrate data.
Algorithm
Finding a suitable metabolic pathway is a typical search problem, so we turn the biosynthesis problem into a directed graph search problem. We need not only to find all solutions that satisfy the constraints, but also to record the search path.
Fig.4. We turn a biosynthesis problem into a directed graph search problem.
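As a rough illustration of how reaction records can become a directed graph, the sketch below stores each compound as a node and each substrate-to-product pair as a labelled edge. The record layout, the compound and reaction IDs, and the function name `build_graph` are all assumptions made for this example; they are not taken from the Alpha Ant source code.

```python
from collections import defaultdict

# Hypothetical records: (reaction ID, substrate ID, product ID).
# The IDs are placeholders, not real database entries.
REACTANT_PAIRS = [
    ("R1", "A", "B"),
    ("R2", "B", "C"),
    ("R3", "A", "C"),
    ("R4", "C", "D"),
]


def build_graph(pairs):
    """Return {substrate: [(product, reaction_id), ...]} as a directed graph."""
    graph = defaultdict(list)
    for reaction_id, substrate, product in pairs:
        # One directed edge per substrate -> product pair, labelled by reaction.
        graph[substrate].append((product, reaction_id))
    return dict(graph)


graph = build_graph(REACTANT_PAIRS)
```

Keeping the reaction ID on each edge means that a path found later can be reported as a sequence of reactions, not only as a sequence of compounds.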
We use the DFS algorithm here. In theoretical computer science, DFS is typically used to traverse an entire graph, and takes time Θ(|V| + |E|) [2], linear in the size of the graph. The core idea of DFS is simple and elegant, which makes it convenient to add appropriate pruning on top of the original algorithm. More details can be found in Model.
Fig.5. DFS (depth-first search): order in which the nodes are visited.
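A minimal sketch of how DFS can return every pathway while recording the search path is shown below. It is not the Alpha Ant implementation: the toy graph, the function name `find_pathways` and the `max_steps` limit (one simple form of pruning) are assumptions chosen for illustration.

```python
def find_pathways(graph, source, target, max_steps):
    """Enumerate all simple paths from source to target using at most max_steps edges."""
    results = []

    def dfs(node, path):
        if node == target:
            results.append(list(path))  # record a copy of the current search path
            return
        if len(path) - 1 >= max_steps:
            return  # pruning: the path already uses the allowed number of steps
        for nxt in graph.get(node, []):
            if nxt in path:
                continue  # skip compounds already on the path to avoid cycles
            path.append(nxt)
            dfs(nxt, path)
            path.pop()  # backtrack

    dfs(source, [source])
    return results


# Hypothetical compound graph: keys are substrates, values are reachable products.
TOY_GRAPH = {"A": ["B", "C"], "B": ["C"], "C": ["D"], "D": ["A"]}
pathways = find_pathways(TOY_GRAPH, "A", "D", max_steps=3)
```

With this toy graph, `pathways` holds the two routes A-B-C-D and A-C-D. Lowering `max_steps` to 2 leaves only A-C-D, which shows how a depth limit prunes the search.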
Ranking criteria
In total, we have three ranking criteria, which are thermodynamic feasibility & competition of heterologous reactions, atom mapping and toxicity of compound. Users are free to set the weight of each ranking criterion. Many of you may think that pathway length should be one of the ranking criteria. However, the shortest pathway can be the most unrealistic one, so we decided not to use it.
1. Thermodynamic feasibility & competition of heterologous reactions
As we all know, the thermodynamic feasibility of a reaction determines how probable that reaction is. In many cases, the smaller the standard Gibbs energy change, the more probable the reaction. Competition of heterologous reactions matters in the same way: enzymes, ribosomes and source compounds are all things that may trigger such competition. We compute the probability of each reaction from △rG through the Boltzmann distribution. Hiroyuki Kuwahara et al. [3] derived a mathematical description of this weighting scheme, and our software uses their formula to compute a score for each reaction.
(Equation 1)
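Equation 1 is not reproduced in this text, so the sketch below only illustrates generic Boltzmann weighting and is not necessarily the exact formula of [3]. It assumes that the reactions competing for the same substrate are already known, that energies are given in kJ/mol, and that the temperature is 298.15 K. Each reaction i receives the weight exp(-△rG_i / RT) divided by the sum of the same factor over all competing reactions. The function name and the reaction labels are hypothetical.

```python
import math

R_KJ_PER_MOL_K = 8.314462618e-3  # gas constant in kJ/(mol*K)


def boltzmann_weights(delta_g_kj_per_mol, temperature_k=298.15):
    """Map Gibbs energy changes of competing reactions to normalized weights."""
    rt = R_KJ_PER_MOL_K * temperature_k
    # Shift by the minimum energy for numerical stability; ratios are unchanged.
    g_min = min(delta_g_kj_per_mol.values())
    factors = {
        rid: math.exp(-(g - g_min) / rt)
        for rid, g in delta_g_kj_per_mol.items()
    }
    total = sum(factors.values())
    return {rid: factor / total for rid, factor in factors.items()}


# Hypothetical energies for one heterologous and two native competing reactions.
weights = boltzmann_weights(
    {"heterologous": -20.0, "native_1": -10.0, "native_2": 0.0}
)
```

The weights sum to 1, and the reaction with the most negative energy change receives the largest share, which matches the statement that a smaller standard Gibbs energy makes a reaction more probable.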
2. Frequency of reaction in all organisms
The idea came from our visit to the Key Synthetic Biology Laboratory of the Chinese Academy of Sciences. We were inspired by Prof. Yang, who has devoted his life to synthetic biology. After a long conversation with him, we realized that living creatures show great wisdom: they know how to make good use of energy from nature and have developed their own core metabolic systems. So the more frequently a reaction occurs across organisms, the better and more efficient it is likely to be. Core pathways are undoubtedly used more frequently in organisms. Consequently, we count the frequency of each reaction across all organisms. During that process, the data distribution confirmed that this ranking criterion is reasonable and reliable. Details can be found in Model.
3. Toxicity of compound
We use data from KnowledgeBase to assess the potential toxic effects of chemical compounds on a given organism. These effects are then taken into account, according to the given weight, when we calculate the total score.
Additional functions
To improve is to change, to be perfect is to change often. At the very beginning, we only developed the most basic search function. After communicating with some experimenters at Tongji University, we came to understand their needs, and we tried our best to meet them. So we added those two functions, which are microbiological recommendation and multi-microbial system.
1. Microbiological recommendation
Don’t know which expression system to use? We offer a microbiological recommendation function for experimenters. To this end, we developed a model to score each microorganism (details can be found in the Model section). After ranking the scores, we present the top five organisms for users to choose from. Related information about the organism and pathway is also available on request.
2. Atom conservation
Given a chemical reaction, an atom mapping rule defines which atom of a substrate compound is transferred to which atom of a product compound [4]. This is helpful for many applications in systems biology, in particular in metabolic pathway engineering. Routes that reduce the loss of atoms from the start compound to the target compound are likely to be good candidates for pathway design. Here, we present users with the atom conservation rate of different reactions.
3. FBA
Flux balance analysis (FBA) is a mathematical method for simulating metabolism in genome-scale reconstructions of metabolic networks. It can evaluate the metabolic flux distribution and is one of the most widely used modeling approaches for metabolic systems. Compared with traditional modeling methods, FBA requires less input data to construct the model. Simulations performed using FBA are computationally inexpensive and can calculate steady-state metabolic fluxes for large models (over 2000 reactions) in a few seconds on modern personal computers. Users can select one pathway from the search results. Since E. coli is the most frequently used host organism, we analyze the selected pathway and construct a new model based on the classic E. coli core model (from biomodel.com). After simulating this model with the COBRA Toolbox, our software provides quantitative predictions of cellular behavior, such as metabolic flux patterns, which give insight into the metabolic pathways [5].
4. SMILES comparison
The original idea for this function came from our visit to WuXi AppTec. Experts at WuXi AppTec told us that their company sometimes obtains or designs a novel compound that does not exist in current databases, and that they then want to find a possible way to synthesize it. It occurred to us that we could compare the similarity between compounds and use the most similar known compound as a starting point for designing a new synthetic pathway, which can be very useful in small-molecule drug discovery and synthesis. First, we convert the user’s input SMILES into a molecular fingerprint using the RDKit toolbox. Then we compute a similarity score between the input compound and each compound in the database by comparing their fingerprints. Finally, the output is a ranked list with similarity scores. The best thing is that we can search not only for novel compounds but also for compounds that already exist in the database. So if you have a compound with structure information and you don’t know what it is, you can find its compound ID and name by using SMILES comparison.
Validation
We validate Alpha Ant’s outcome against published pathways that were engineered into different organisms. The results show that Alpha Ant performs its intended function well: it identifies several pathways that are known to be productive. The results are as follows.
Case study 1:
Pathway for the production of flavonoids from glucose:
Flavonoids comprise a large family of secondary plant metabolic intermediates that exhibit a wide variety of antioxidant and human health-related properties. However, their widespread use and availability are currently limited by inefficiencies in both their chemical synthesis and extraction from natural plant sources. As a result, significant strides have been made in recent years in improving the microbial production of flavonoids. A four-step pathway is known to be productive for the conversion of L-tyrosine to naringenin (C00509), the main flavonoid precursor. We searched for pathways from L-tyrosine to naringenin in Alpha Ant. The results show that this productive pathway ranks highly in our output. However, we did not find this pathway in the top 10 results of Gil, which was developed by Korea_U_Seoul.

Results in Gil:

Case study 2:
Artemisinin (C20309) is a sesquiterpene lactone endoperoxide extracted from Artemisia annua L. that is highly effective against multidrug-resistant Plasmodium spp. The semi-synthesis of artemisinin or any derivative from microbially sourced artemisinic acid, its immediate precursor, could be a cost-effective, environmentally friendly, high-quality and reliable source of artemisinin. Jay D. Keasling and colleagues designed and constructed a productive engineered artemisinic acid biosynthetic pathway in S. cerevisiae strain EPY224. The biochemical pathway leading from farnesyl pyrophosphate (FPP) to artemisinic acid was introduced into S. cerevisiae from A. annua. We searched for the corresponding pathways in Alpha Ant. The published pathway is the highest-ranking pathway, which demonstrates that Alpha Ant performs its intended function well.
This pathway couldn’t be found in Gil.
Case study 3:
Production of 1,2-PD(C02912):
The individual enantiomers (R-1,2-PD and S-1,2-PD) have potential uses as chiral synthons for the production of pharmaceuticals and novel polymers; however, their use is limited by their high cost. We applied Alpha Ant to search for pathways for the biological production of 1,2-PD from glucose. The top four pathways all contain the core part that converts Glycerone phosphate to (R)-Propane-1,2-diol, which is published in the literature.
No pathway could be found in Gil. The top pathways in MRE do not contain the core part that converts Glycerone phosphate to (R)-Propane-1,2-diol.
Case study 4:
Production of 1,3-propanediol(C02457):
The monomeric form of 1,3-propanediol (1,3-PD) has gained use in large-volume production of polyester fibers and polyurethanes in recent years. To develop an improved and more environmentally favorable process for 1,3-PD production, many researchers have explored methods for producing 1,3-PD by microbial fermentation. Alpha Ant is able to identify the reported efficient pathway from glycerol to 1,3-PD.
Improvement
In 2015, Team:Korea_U_Seoul developed a software tool called Gil, which is also a pathway-finding tool. They did an excellent job in the iGEM competition. For us, it is a great honor to improve on their project.
Gil has four ranking criteria, which are the ATP, NADPH, NADH and CO2 of different pathways, to suit users’ various needs, and it also analyzes thermodynamic feasibility. The idea is great. They focused on the production of ATP, NADPH, NADH and CO2: the more NADPH, NADH and ATP a pathway produces, the more efficient the pathway is.
We have to admit that the production of energy-related compounds is important. However, Gil pays attention to yield rather than describing and evaluating the metabolic burden on the organism. Our project does both: we use flux balance analysis to maximize yield and use three different criteria to evaluate the feasibility and quality of a newly designed pathway.
There are more things we can do to improve the project:
1. We can optimize the search algorithm;
We use a depth-first search algorithm. DFS is a classic method, and it fits our need to find all possible pathways. In addition, DFS is faster than the other algorithms on our huge network. Detailed information can be found in Model.
2. We can add more criteria and give them different weights;
We select thermodynamic feasibility, heterologous competition, compound toxicity, atom conservation and frequency of reactions in various organisms as our ranking criteria. We provide recommended weights, and users can of course set different weights, which will result in a different output list. Detailed information can be found in Project.
3. We could recommend chassis cells for users;
After investigating users’ different needs, we would like to make things more convenient for them. Different chassis cells have very diverse characteristics. The core idea is to recommend microorganisms as chassis cells and offer related information about them. We build a model to score and rank them. See Model.
4. For novel or newly designed compounds, we could find a regular compound with a similar structure and explore possible synthesis pathways.
We add SMILES comparison to our software as one of the additional functions. For a novel compound, it can function as a novel pathway explorer (recommending the most closely matched compounds together with related reaction and enzyme information); for a compound in the database, it can function as a database search (because the similarity of the best-matched compound is 100%).
5. FBA (flux balance analysis)
FBA is another additional function of our software. For now, we can only do FBA in E. coli, because E. coli is the most frequently used chassis cell and we know much more about it than about other microorganisms.
6. One-direction search
We want to provide more possibilities to everyone concerned. You only need to input either the source compound or the target compound. As long as you enter one of them, you will get many possible outcomes, some of which may inspire you to think about the problem in new ways.
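The SMILES comparison in item 4 above ranks database compounds by fingerprint similarity. The sketch below shows one common way to do this, the Tanimoto coefficient, on fingerprints represented as sets of on-bit indices. The text does not say which similarity measure Alpha Ant uses, so Tanimoto is an assumption here, and the fingerprints, compound IDs and function names are hypothetical. In the software itself the fingerprints are produced from SMILES with RDKit.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto coefficient of two fingerprints given as sets of on-bit indices."""
    if not bits_a and not bits_b:
        return 1.0  # two empty fingerprints are treated as identical
    shared = len(bits_a & bits_b)
    return shared / (len(bits_a) + len(bits_b) - shared)


def rank_by_similarity(query_bits, database):
    """Return (compound_id, score) pairs sorted from most to least similar."""
    scores = [(cid, tanimoto(query_bits, bits)) for cid, bits in database.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints; real ones would be computed from SMILES strings.
DATABASE = {
    "cpd_1": {1, 4, 7, 9},
    "cpd_2": {1, 4, 8},
    "cpd_3": {2, 3},
}
ranking = rank_by_similarity({1, 4, 7, 9}, DATABASE)
```

Here the query is identical to `cpd_1`, so that entry scores 1.0 and comes first. This corresponds to the database-search case in which the best match has 100% similarity.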
Reference:
[1] Manish Sud. MayaChemTools: An Open Source Package for Computational Drug Discovery. Journal of Chemical Information and Modeling, 2016, 56 (12), 2292-2297.
[2] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7. Section 22.3: Depth-first search, pp. 540–549.
[3] Hiroyuki Kuwahara, Meshari Alazmi, Xuefeng Cui and Xin Gao. MRE: a web tool to suggest foreign enzymes for the biosynthesis pathway design with competing endogenous reactions in mind. Nucleic Acids Research, 2016, Vol. 44, Web Server issue, W217–W225.
[4] Mario Latendresse, Jeremiah P. Malerich, Mike Travers, and Peter D. Karp. Accurate Atom-Mapping Computation for Biochemical Reactions. Journal of Chemical Information and Modeling, 2012, 52 (11), 2970-2982.
[5] Schellenberger J, Que R, Fleming RMT, Thiele I, Orth JD, Feist AM, Zielinski DC, Bordbar A, Lewis NE, Rahmanian S et al. Quantitative prediction of cellular metabolism with constraint-based models: the COBRA Toolbox v2.0. Nature Protocols, 2011, 6(9):1290-1307.