Difference between revisions of "Template:Groningen/Flux Based Analysis"

Line 29: Line 29:
 
   width: 300px;
 
   width: 300px;
 
   height: 300px;
 
   height: 300px;
  float: right;
 
 
}
 
}
  

Revision as of 11:30, 16 October 2018

Flux Based Analysis

Introduction

The increase in availability of genome scale databases of microorganisms has given rise to the possibility of creating genome scale reaction networks that include corresponding metabolites and relevant genes[1]. One method to study reaction fluxes in such a wide network is called flux based analysis or FBA. As part of our project we used FBA on a genome scale model of S. Cerevisiae in order to achieve the following two goals:

Goal 1: study the relation between cell growth and styrene production
Goal 2: visualize interventions that could lead to a higher styrene production

An overview of the different methods and software packages that were used to achieve these goals can be seen in figure 1.

Overview

Theory

Let us consider a small network consisting of the five reactions (r1 to r5), with mock metabolites A, B ,C and D.

The relative amount of a particular metabolite taken part in a reaction is represented by its stoichiometric coefficient, so for instance metabolite B has a stoichiometric coefficient of 2 in reaction r2. The whole network of reactions that represents the metabolism of can be represented in a stoichiometric matrix S of m rows and r columns, where m are the amount of metabolites and r the amount of reactions in the network. The matrix will then contain on the i,jth entry the stoichiometric coefficient of metabolite i in reaction j. The stoichiometric coefficient will be negative when the metabolite is a substrate (on the left side of the reaction) and positive when a metabolite is a product (on the right side of the reaction). So our example network can thus be represented in a matrix of S of 3 rows and 5 columns as can be seen in figure 3. To illustrate let’s consider position 1,1 in the matrix, since the first column represents reaction r1 and the first row the metabolite A, we put in the stoichiometric coefficient of A in r1 which is 1.

S=

r1r2r3r4r5
A1-3
B2-2-1
C1-1
D1

The flux or reaction rate through each reaction can be represented in a vector v of length r, where r are the amount of reactions in the network and every position in the vector represents the flux through a reaction. If the network is at steady state the following holds:

Where S is the stoichiometric matrix and v is the reaction rate vector. Since what we want to know is the flux through each reaction, given a stoichiometric matrix S that represents the whole metabolic network of our cell we have to solve the above equation for v. There usually is no single v that satisfy the above equation, but rather a whole solution space. In order to narrow down this solution space, we maximize a biologically plausible objective such as biomass growth and select only those flux values that together can optimize this objective.

The model

The model we used to represent the whole metabolic network of our cell is based on YeastGem7.6:[2] “the current consensus genome-scale metabolic model of Saccharomyces cerevisiae” This model contains three main types of information: 2241 metabolites, 3520 reactions and 1019 genes. The relation between the metabolites and reactions is represented by a stoichiometric matrix, as described above. The interactions between reaction and genes are described by Boolean relationships, meaning that genes can turn reactions on or off. For the first goal we used a version of this model that is further constrained based on the GECKO method: ecYeast7[3].

The GECKO method enhances a model by adding enzymes as part of its reactions in order to ensure that fluxes are limited based on the abundance of the enzymes abundance and their turnover numbers. Since the maximum flux through an enzyme catalysed reaction is equal to the turnover number times its abundance the following holds (for enzymes which catalyse only a single reaction)

Where v is the flux through a reaction Kcat is the enzyme’s turnover number and [E] its concentration. More concretely this means that we have to add the enzymes as metabolites to our model and for every reaction include the enzyme as a substrate with 1/Kcat as its stoichiometric coefficient. (The situation will be slightly more complex for enzymes that calculate multiple reactions, or reaction that can be catalysed by multiple enzymes).

Now we still to include the enzyme abundance in our model. Since we don’t have the proteomic data to limit each enzyme directly based on its abundance we make use of the single pool assumption. We include a pseudo metabolite “protein pool” representing the total amount of enzymes in the cell and then for each enzyme a reaction in which it is drawn from the protein pool with its molecular weight as a coefficient. Using this method the following reactions were added to the model:

TCA: trans-cinnamic acid, PAL2: Phenylalanine ammonia-lyase 2, FDC1: Ferulic acid decarboxylase 1, ppool: pseudo metabolite representing the total amount of enzymes

Table 1. Enzymes

EnzymeKcat (/s)Weight(kDa)uniprotGene
Ferulic acid decarboxylase 14.6456.164Q03034FDC1 (YDR539W)
Phenylalanine ammonia-lyase 23.2577.86P45724PAL2 (YNL294C)

Goal 1

The first thing we wanted to look at using our model was the relation between styrene production and cell growth. This is a crucial thing to know, because if producing styrene completely stops cell growth we will have to consider using an inducible system; we would have to design our yeast cell so that it first grows and then activate some switch that activates styrene production and stops biomass growth. On the other hand, if styrene production doesn’t hinder cell growth too much we can just have our cells produce styrene production while they are acquiring biomass.

In order to do flux based analyses we made use of the COBRA toolbox, a package for MATLAB that implements the COnstraint- Based Reconstruction and Analysis (COBRA) method[6] The COBRA toolbox contains various tools usable for modelling and analysing genome-scale networks.
First we want to see the maximum growth rate and the maximum output flux of styrene. We therefore run a flux based analysis on the model optimizing for biomass growth and another analysis optimizing for styrene production:

Table 2. FBA

Input flux of glucose (goal = biomass)1
Max biomass growth0.097
Input flux of glucose (goal = styrene)0.359
Max output flux of styrene0.00213

This shows that without interventions we can produce about one mole of styrene per 167 moles of glucose meaning that 0.35% of the mass of glucose can be turned into styrene. This means that we if we would grow the yeast cells in a medium of 20 g glucose/L, and all glucose is taken up by the cell, the final concentration of styrene will reach at its highest a concentration of 70 mg/L. This is well within toxicity levels[7] (Mckenna et al. found that growth was only minimally disrupted with styrene levels of 200 mg/L). However if we increase styrene yield with over expressions/knockouts we will have to look at minimising toxicity, for instance with a system that continuously takes styrene out of the medium.

Before we will look at increasing the yield we will look at how styrene production influences cell growth. In order to do this we will run Flux Based Analysis, optimizing for biomass while iteratively constraining the lower bound of the styrene sink reaction from 0 flux to its max flux of 0.00213. The result of this is plotted In figure 3.

As can be seen in the figure the cells can still grow almost completely optimally when the flux to styrene is at 65% of the max theoretical flux. This we can use as an argument to not create an inducible system, but rather let the cells produce styrene while they are growing, as styrene production up to a reasonable level barely hinders cell growth. Furthermore it shows that we must slightly careful with the expression levels of our enzymes as if there is too much flux going to producing styrene cell growth is hindered.

OptForce

Next we will look at how we can increase styrene production of our cells. For this we use an algorithm called OptForce[8], which has been employed successfully to find interventions leading to overproduction of a target metabolite. OptForce compares the flux patterns observed in an initial wildtype strain and a mutant strain which overproduces the chemical at the target yield. In order to compare the two strains it makes use of Flux Variance Analysis (FVA). As noted before the steady state equation yields a solution space, meaning that there is a whole range of possible fluxes for which the system is at steady state. With FVA the reactions in the network are iteratively minimized and maximised subject to the constraints. This yield a lower and upper bound for every reaction: a flux range for which the reaction can still satisfy all constraints.

The key of OptForce is identifying so called MUST sets. The flux ranges for every reaction is first calculated for both the wildtype strain which is constrained to grow at a certain level and the mutant strain which is constrained to produce a target chemical at a certain level. Next it will compare the flux ranges of both strains for every reaction. If the flux ranges has some overlap (figure 4a), the target production is possible without having to intervene in this reaction. However if the flux range for a reaction of the wildtype strain are completely to the left (Figure 4b) or to the right (Figure 4c) of the mutant strain the we have directly or indirectly change the flux of this reaction in order to be able to achieve the overproduction target. Reaction fluxes that must increase are referred to as MUSTU sets while reaction fluxes that must decrease are referred to as MUSTL sets. Furthermore we also look at sums of two reaction fluxes and compare the resulting flux range between the wildtype and mutant type. Now non-overlapping flux ranges imply that one of the two reaction flux must increase or decrease. These second order sets are referred to as respectively MUSTUU and MUSTLL sets.

Figure 4. MUSTsets (wildtype flux range for a reaction in green, mutant-type flux range in orange)

Finally after identifying the MUST sets the OptForce algorithm looks at how these can be imparted on the wildtype strain with the minimum amount of interventions, or in other words it finds the minimum amount of upregulations or knockouts which will make the wildtype produce the target chemical at the same rate as the mutant strain. The set of interventions is referred to as the FORCE set.

Goal 2

First OptForce was run using the yeastgem7.6 as model. As it is suggested to exclude reactions in a linear pathway leading up to the target chemical we will use try to find interventions leading to an increased phenylalanine production. The wildtype had its growth constrained to 0.07 and the mutant type had phenylalanine production constrained to 0.5, both close to their maximum values. In order to solve the optimization problem IBM ILOG CPLEX Optimizer (v 12.8.0) was used.

Next we wanted to visualize the found MUST sets; the candidates for downregulation/knockouts or upregulations. In order to do this we made use of Paint4Net (included in the Cobra Toolbox). Paint4Net is able to automatically generate a graph layout of (part of) the metabolism of a COBRA model, including data such as flux rate and direction, and the possibility to show detailed information about reactions and metabolites.[9]

As the model consists of over 3500 reactions visualizing the whole map at once will be rather messy, so instead we choose to visualize the cells metabolism per subsystem. Here we will show the map of: Phenylalanine, tyrosine and tryptophan biosynthesis (KEGG pathway sce00400) and Phenylalanine metabolism (KEGG pathway sce00360). This part seems most relevant to our analysis and allows us to compare interventions found by the model with well-established interventions from literature. (Do note that our method allows one to visualize any subsystem)

Once the Paint4net map was created the MUST sets had to be visualised. For this a simple MATLAB script was written that goes over all nodes and edges in the network and makes them red if they correspond do a reaction in a MUSTL or MUSTLL set (are a candidate for downregulation/knockout) , green if they belong to a MUSTU or MUSTUU set (are a candidate for upregulation) and blue if they belong to no MUST set (should be leaved unchanged). Furthermore to make a clearer visualization the script changes the labels of the reaction nodes to display the corresponding genes and the metabolite of interest is made orange. The algorithm (visualization.m) is available on our Github. The result of applying this to the map of the subsystem described above can be seen in figure 5.

Next we wanted to verify our model by comparing the computed interventions with literature. McKenna et al. showed that Styrene yield can be increased by deletion of ARO10.[7] The reaction catalysed by ARO10 (phenylpyruvate decarboxylase) is indeed marked in the graph as a possible candidate for downregulation (included in a MUSTL set) . Looking at the graph we notice that our model shows the genes ARO1, ARO2 and ARO7, responsible for the shikimate pathway in S. Cerevisiae as possibly candidates for upregulation. This is in accordance with earlier research showing that Styrene production in Escherichia Coli can be increased by overexpressing the two upstream shikimate pathway genes, aroF and pheA. [10]

To summarize we first have shown that cell growth and styrene production up to a certain level are not mutually exclusive ,using an genome scale model with extra enzyme based constraints based on the GECKO method, leading us to design cells that will do both simultaneously. Furthermore we used to OptForce algorithm to find candidate interventions for increased styrene yield. Finally we wrote a script that can visualize these interventions on a graph drawn by the Paint4Net method, allowing us to gain more insight in how we can increase styrene production in our cells.

References

[1] Smallboe K, Simeonidis E. Flux balance analysis: A geometric perspective. 2009. doi:10.1016/j.jtbi.2009.01.027

[2] Aung HW, Henry SA, Walker LP. Revising the Representation of Fatty Acid, Glycerolipid, and Glycerophospholipid Metabolism in the Consensus Model of Yeast Metabolism. Ind Biotechnol (New Rochelle N Y). 2013;9(4):215-228. doi:10.1089/ind.2013.0013

[3] Sánchez BJ, Zhang C, Nilsson A, Kerkhoven EJ, Nielsen J, Lahtvee P. Improving the phenotype predictions of a yeast genome-scale metabolic model by incorporating enzymatic constraints. 2017:1-16. doi:10.15252/msb.20167411

[4] Bateman A, Martin MJ, O’Donovan C, et al. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2017;45(D1):D158-D169. doi:10.1093/nar/gkw1099

[5] Cochrane FC, Davin LB, Lewis NG. The Arabidopsis phenylalanine ammonia lyase gene family: kinetic characterization of the four PAL isoforms. Phytochemistry. 2004;65(11):1557-1564. doi:10.1016/j.phytochem.2004.05.006

[6] Heirendt L, Arreckx S, Pfau T, et al. Creation and analysis of biochemical constraint-based models: the COBRA Toolbox v3.0. October 2017. http://arxiv.org/abs/1710.04038. Accessed October 6, 2018.

[7] Mckenna R, Thompson B, Pugh S, Nielsen DR. Rational and Combinatorial Approaches to Engineering Styrene Production by Saccharomyces Cerevisiae.; 2011. doi:10.1186/2046-1682-4-13

[8] Ranganathan S, Suthers PF, Maranas CD. OptForce: An Optimization Procedure for Identifying All Genetic Manipulations Leading to Targeted Overproductions. Price ND, ed. PLoS Comput Biol. 2010;6(4):e1000744. doi:10.1371/journal.pcbi.1000744

[9] Kostromins A, Stalidzans E. Paint4Net: COBRA Toolbox extension for visualization of stoichiometric models of metabolism. Biosystems. 2012;109(2):233-239. doi:10.1016/J.BIOSYSTEMS.2012.03.002

[10] Liu C, Men X, Chen H, et al. A systematic optimization of styrene biosynthesis in Escherichia coli BL21(DE3). Biotechnol Biofuels. 2018;11. doi:10.1186/s13068-018-1017-z