Difference between revisions of "Team:Jiangnan/Model"

{{Jiangnan}}
 
 
<html>
 
<html>
<link rel="stylesheet" href="https://2018.igem.org/Team:Jiangnan/CSSmaterializecss?action=raw&ctype=text/css">
<style type="text/css">
        #home_logo, #sideMenu { display:none; }
        #HQ_page p{text-align:inherit;font-size:inherit;}
html{width:100%;height:100%;background:white; list-style:none;}
        #top_menu_under{height:auto;}
#globalWrapper,#HQ_page,#bodyContent,#mw-content-text{height:100%;}
#sideMenu, #top_title, .patrollink  {display:none;}
#content { margin-left:0px;margin-top:-5px; padding:0px; width:100%;height:100%;}
body {background-color:white;height:100%;}
#bodyContent h1, #bodyContent h2, #bodyContent h3, #bodyContent h4, #bodyContent h5 { margin-bottom: 0px; }
        ul, li{list-style:none;}
        #description p{text-align:center;}
        .mw-content-ltr ul{margin:0px}
h4 {
font-family: Arial,sans-serif;
font-weight: 100;
text-align: center;
}

p{
font-size: 1.2em;
}

.JTrow{
margin-top: 5%;
margin-bottom:5%;
margin-left: 10%;
margin-right: 10%;
text-align: center;
}

.JTp{
    display: block;
    margin-block-start: 0.2em;
    margin-block-end: 0.2em;
    margin-inline-start: 0px;
    margin-inline-end: 0px;
}
</style>
        <style type="text/css">
.Jnav{position: fixed;top: 17px;opacity: 1;background-color: rgba(255, 255, 255, 0.2); width: 100%;z-index:999;}
.Jnav a{text-decoration: none!important;color:#039be5;}
.Jnavtitle{float: right;width: 10%;text-align: center;padding: 1em 0;}
.Jnavdrag{position: relative;width: 100%;}
.Jnavdrag>ul{position: absolute;top: 0;width: 100%;border-radius: 5px;background-color: white;transition: all .4s ease-in-out;opacity: 0;}
.JredL{color: #f50057;}
.Jgreen{color: #004d40;}
body{
margin: 0;
}
</style>
        <script type="text/javascript">
function Jnavshow(obj){
var ul = obj.getElementsByTagName("ul");
ul = ul[0];
ul.style["opacity"] = 1;
}
function Jnavhide(obj){
var ul = obj.getElementsByTagName("ul");
ul = ul[0];
ul.style["opacity"] = 0;
}
</script>
<div class="Jnav">
<div class="Jnavtitle">
<a href="https://2018.igem.org/Team:Jiangnan/Safety">Safety</a>
</div>
<div class="Jnavtitle">
<a href="https://2018.igem.org/Team:Jiangnan/Hardware">Hardware</a>
</div>
<div class="Jnavtitle" onmouseover="Jnavshow(this)" onmouseleave="Jnavhide(this)">
<a href="https://2018.igem.org/Team:Jiangnan/Team">Team</a>
<div class="Jnavdrag">
<ul>
<li><a href="https://2018.igem.org/Team:Jiangnan/Team">Team Members</a></li>
<li class="divider"></li>
<li><a href="https://2018.igem.org/Team:Jiangnan/Attributions">Attribution</a></li>
<li class="divider"></li>
<li><a href="https://2018.igem.org/Team:Jiangnan/Collaborations">Collaboration</a></li>
</ul>
</div>
</div>
<div class="Jnavtitle" onmouseover="Jnavshow(this)" onmouseleave="Jnavhide(this)">
<a href="https://2018.igem.org/Team:Jiangnan/Human_Practices">Human Practice</a>
<div class="Jnavdrag">
<ul>
<li><a href="https://2018.igem.org/Team:Jiangnan/Human_Practices">Overview</a></li>
<li class="divider"></li>
<li><a href="https://2018.igem.org/Team:Jiangnan/Silver">Silver</a></li>
<li class="divider"></li>
<li><a href="https://2018.igem.org/Team:Jiangnan/Entrepreneurship">Gold</a></li>
<li class="divider"></li>
<li><a href="https://2018.igem.org/Team:Jiangnan/Public_Engagement">Public Engagement</a></li>
<li class="divider"></li>
<li><a href="https://2018.igem.org/Team:Jiangnan/Entrepreneurship">Entrepreneurship</a></li>
</ul>
</div>
</div>
<div class="Jnavtitle">
<a href="https://2018.igem.org/Team:Jiangnan/Model">Model</a>
</div>
<div class="Jnavtitle" onmouseover="Jnavshow(this)" onmouseleave="Jnavhide(this)">
<a href="https://2018.igem.org/Team:Jiangnan/Notebook">Notebook</a>
<div class="Jnavdrag">
<ul>
<li><a href="https://2018.igem.org/Team:Jiangnan/Notebook">Lab Book</a></li>
<li class="divider"></li>
<li><a href="https://2018.igem.org/Team:Jiangnan/Protocol">Protocol</a></li>
</ul>
</div>
</div>
<div class="Jnavtitle" onmouseover="Jnavshow(this)" onmouseleave="Jnavhide(this)">
<a href="https://2018.igem.org/Team:Jiangnan">Project</a>
<div class="Jnavdrag">
<ul>
<li><a href="https://2018.igem.org/Team:Jiangnan/Background">Background</a></li>
<li class="divider"></li>
<li><a href="https://2018.igem.org/Team:Jiangnan/Design">Design</a></li>
<li class="divider"></li>
<li><a href="https://2018.igem.org/Team:Jiangnan/Demonstrate">Demonstration</a></li>
<li class="divider"></li>
<li><a href="https://2018.igem.org/Team:Jiangnan/Results">Result</a></li>
<li class="divider"></li>
<li><a href="https://2018.igem.org/Team:Jiangnan/Parts">Part</a></li>
</ul>
</div>
</div>
<div class="navlogo" style="float: left;width: 20%;text-align: center;">
<a href="https://2018.igem.org/Team:Jiangnan"><img src="https://static.igem.org/mediawiki/2018/d/d7/T--Jiangnan--igemJN_logo.png" style="width: 3em;"></a>
</div>
</div>
  
  
  
<div class="clear"></div>
 
  
  
<div class="column full_size">
<div style="width:100%;background-color: #f0ebea">
<h1> Modeling</h1>
<div style="position: relative;">

<img src="https://static.igem.org/mediawiki/2018/c/c4/T--Jiangnan--model_top.png" width="100%">
<div style="position: absolute;left: 2em; bottom: 4em;">

<h4 style="color: white;">Determination of<br>
plasma device parameters</h4>
</div>
</div>
<div class="row">
<div class="col s10 offset-s1">
<p><b>An</b> orthogonal L<sub>18</sub>(3<sup>7</sup>) test was designed to explore the effect of different parameter combinations on plasma-activated medium (PAM). Eighteen trials were conducted, covering seven factors at three levels each (Tables 1 and 2): treatment time (T), well size (A, the liquid surface area), helium flow rate (F), number of cells (C), output voltage (U), distance from the tail of the plasma jet to the surface of the medium (D1) and thickness of the medium (D2). The frequency was fixed at 8.8 kHz.</p>
</div>
</div>
<div class="row">
<div class="col s10 offset-s1">
<img src="https://static.igem.org/mediawiki/2018/c/cd/T--Jiangnan--model_table1.png" width="100%">
</div>
<div class="col s6 offset-s1" style="margin-top: 70px;">
<img src="https://static.igem.org/mediawiki/2018/2/23/T--Jiangnan--model_table2.png" width="100%">
</div>
</div>
<div class="row">
<div class="col s10 offset-s1">
<h5 class="JredL">Linear model construction</h5>
<p>The linear model was constructed in R as Equation (1):</p>
<img src="https://static.igem.org/mediawiki/2018/6/67/T--Jiangnan--model_formula1.png">
<p>The dependent variable Y is the measured virus amplification after plasma treatment in each trial of the orthogonal design, and the seven factors are the independent variables of the equation.<br>
The full model containing all seven parameters was fitted by multiple linear regression. Parameters were then removed stepwise, each removal followed by an assessment of model feasibility, in order to identify independent parameters free of collinearity. </p>
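<p>The team fitted the model in R and does not show the script. The Python/NumPy sketch below is for illustration only. It fits the full seven-factor model by ordinary least squares and then performs backward elimination, dropping one factor at a time for as long as the AIC decreases. The page says only that parameters were removed stepwise, so using AIC as the removal criterion (the quantity R's <code>step()</code> works with, via n·log(RSS/n) + 2·edf) is an assumption. The data are simulated, and the factor labels are reused from the design above for readability.</p>
<pre>
```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on X plus an intercept; returns (coefficients, RSS)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return beta, float(resid @ resid)

def aic(rss, n, n_coef):
    """AIC of a Gaussian linear model up to a constant: n*log(RSS/n) + 2*n_coef."""
    return n * np.log(rss / n) + 2 * n_coef

def backward_eliminate(X, y, names):
    """Repeatedly drop the factor whose removal lowers the AIC most; stop when none does."""
    n = len(y)
    keep = list(range(X.shape[1]))
    best = aic(fit_ols(X[:, keep], y)[1], n, len(keep) + 1)
    while keep:
        candidates = []
        for j in keep:
            trial = [c for c in keep if c != j]
            rss = fit_ols(X[:, trial], y)[1]
            candidates.append((aic(rss, n, len(trial) + 1), j))
        score, drop = min(candidates)
        if score >= best:
            break  # no single removal improves the AIC any further
        best = score
        keep.remove(drop)
    return [names[c] for c in keep]

# Hypothetical data: 18 trials, 7 three-level factors; only T and A affect Y
rng = np.random.default_rng(0)
names = ["T", "A", "F", "C", "U", "D1", "D2"]
X_demo = rng.integers(1, 4, size=(18, 7)).astype(float)
y_demo = 2.0 * X_demo[:, 0] - 1.5 * X_demo[:, 1] + rng.normal(0.0, 0.1, size=18)
kept = backward_eliminate(X_demo, y_demo, names)
print(kept)
```
</pre>
<p>The factors that truly drive Y (T and A in this simulation) survive the elimination. Whether a noise factor is also kept depends on the random draw, which is why the team's additional feasibility and collinearity checks matter.</p>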
<br>
<h5 class="JredL">Optimal parameter configuration of PAM identified for triple-negative breast cancer cells</h5>
<p>Trial T9 was selected as the optimal experimental configuration. It corresponds to a treatment time (T) of 3 min, a liquid surface area (A) of 4.5 cm<sup>2</sup>, a medium thickness (D2) of 0.2 cm, a cell number (C) of 1.5×10<sup>5</sup> cells/mL, an output voltage (U) of 1.1 kV, a distance from the tail of the plasma jet to the surface of the medium (D1) of 1 cm and a helium flow rate (F) of 1.5 SLM.</p>
<br>
<h5 class="JredL">Linear model assessment<br>Outlier test </h5>
<p>Outliers were detected with two criteria: Student's t-test on the studentized residuals (outlier test) and Cook's distance (influence analysis).<br>
<span class="JredL">t-test</span><br>
The Bonferroni-corrected p value from the t-test of the studentized residuals of the fitted TN linear model was used to identify outlying trials. The studentized residuals <sup>[3]</sup> were computed using Equation (2).
</p>
<img src="https://static.igem.org/mediawiki/2018/5/59/T--Jiangnan--model_formula2.png">
<p>where SRESID<sub>i</sub>, e<sub>i</sub>, S<sub>yx</sub>, n and X<sub>i</sub> denote the studentized residual, the residual, the standard error of the estimate, the sample size and the i<sup>th</sup> value of the variable, respectively.<br>
<span class="JredL">Influence analysis  </span><br>
Cook's D <sup>[3]</sup>, defined in Equation (3), was used to identify trials with a strong influence on the results.
</p>
<img src="https://static.igem.org/mediawiki/2018/0/07/T--Jiangnan--model_formula3.png">
<br>
<img src="https://static.igem.org/mediawiki/2018/1/1d/T--Jiangnan--model_formula4.png">
<p>where h<sub>i</sub>, D<sub>i</sub>, n and k denote the leverage, Cook's D, the sample size and the number of variables in the model, respectively. A trial whose Cook's D exceeds 4/(n − k − 1) is regarded as strongly influential.</p>
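<p>The sketch below shows both checks in Python with NumPy and SciPy. It is not the team's R code. It computes the leverages h<sub>i</sub> from the hat matrix and the internally studentized residuals in their multiple-regression form, r<sub>i</sub> = e<sub>i</sub> / (s·√(1 − h<sub>i</sub>)). It then converts these to externally studentized residuals, t<sub>i</sub> = r<sub>i</sub>·√((n − p − 1)/(n − p − r<sub>i</sub><sup>2</sup>)), which follow a t distribution with n − p − 1 degrees of freedom (p = k + 1 coefficients). Their two-sided p values are Bonferroni-corrected by multiplying by n. Cook's D is computed as r<sub>i</sub><sup>2</sup>h<sub>i</sub> / (p(1 − h<sub>i</sub>)) and compared with the 4/(n − k − 1) cutoff above. This mirrors what R's <code>car::outlierTest</code> reports, but it is an assumption that the team used that function. The data are simulated, with one planted outlier.</p>
<pre>
```python
import numpy as np
from scipy import stats

def regression_diagnostics(X, y):
    """Outlier and influence statistics for an OLS fit of y on X plus an intercept."""
    n, k = X.shape
    p = k + 1  # number of coefficients, including the intercept
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    e = y - design @ beta  # residuals e_i
    hat = design @ np.linalg.pinv(design.T @ design) @ design.T
    h = np.diag(hat)  # leverages h_i
    s2 = (e @ e) / (n - p)  # squared standard error of the estimate
    r = e / np.sqrt(s2 * (1.0 - h))  # internally studentized residuals
    t = r * np.sqrt((n - p - 1) / (n - p - r**2))  # externally studentized residuals
    # Two-sided t-test with n - p - 1 df, Bonferroni-corrected over all n trials
    p_bonf = np.minimum(1.0, n * 2.0 * stats.t.sf(np.abs(t), df=n - p - 1))
    cooks = r**2 * h / (p * (1.0 - h))  # Cook's D
    influential = cooks > 4.0 / (n - k - 1)
    return t, p_bonf, cooks, influential

# Hypothetical data: 18 trials, 4 factors, one planted outlier at index 5
rng = np.random.default_rng(1)
X_demo = rng.integers(1, 4, size=(18, 4)).astype(float)
y_demo = X_demo @ np.array([1.0, -0.5, 0.8, 0.3]) + rng.normal(0.0, 0.1, size=18)
y_demo[5] += 5.0
t_stud, p_bonf, cooks, influential = regression_diagnostics(X_demo, y_demo)
print(int(np.argmax(np.abs(t_stud))), p_bonf[5])
```
</pre>
<p>The planted trial has by far the largest |t| and is the only one with a Bonferroni-corrected p value below 0.05. Whether it also crosses the Cook's D cutoff depends on its leverage, which is why the two criteria are reported separately.</p>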
<br>
<h5 class="JredL">Normality test</h5>
<p>To determine whether the trials are well modeled by a normal distribution, a quantile-quantile plot (QQ plot) <sup>[4]</sup> of the standardized data against the normal distribution was drawn. The trials were considered to follow a normal distribution if they fell close to the 45-degree line in the plot.</p>
<p>In the normal QQ plot, the trials fall close to the 45-degree line relating the standardized residuals to the theoretical quantiles. In the plot of residuals against fitted values, the square roots of the standardized residuals are scattered almost randomly across all fitted values. Thus, the optimal TN model satisfies the normality and homoscedasticity assumptions of the fitted linear model.</p>
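<p>The QQ-plot check can be summarized numerically with a sketch like the following. It uses Python, NumPy and the standard-library <code>statistics.NormalDist</code>, and is not the team's code. It standardizes the residuals and pairs the sorted values with normal quantiles at the plotting positions (i − 0.5)/n, the form R's <code>ppoints()</code> uses for n &gt; 10. It then reports the correlation between the two axes. Values close to 1 mean the points lie close to a straight line, which for standardized data is close to the 45-degree line. Condensing the plot into a single correlation is our addition, not a criterion stated on this page.</p>
<pre>
```python
import numpy as np
from statistics import NormalDist

def qq_points(residuals):
    """Return (theoretical normal quantiles, sorted standardized residuals)."""
    res = np.asarray(residuals, dtype=float)
    z = np.sort((res - res.mean()) / res.std(ddof=1))
    n = len(z)
    # Plotting positions (i - 0.5) / n, the form R's ppoints() uses for n > 10
    probs = (np.arange(1, n + 1) - 0.5) / n
    theo = np.array([NormalDist().inv_cdf(float(q)) for q in probs])
    return theo, z

def qq_correlation(residuals):
    """Correlation between the two QQ-plot axes; close to 1 means a near-straight line."""
    theo, z = qq_points(residuals)
    return float(np.corrcoef(theo, z)[0, 1])

# Hypothetical residuals: one normal sample and one strongly skewed sample
rng = np.random.default_rng(2)
corr_normal = qq_correlation(rng.normal(size=200))
corr_skewed = qq_correlation(rng.exponential(size=200) ** 3)
print(round(corr_normal, 3), round(corr_skewed, 3))
```
</pre>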
<br>
<h5 class="JredL">Multicollinearity </h5>
<p>To test whether the variables in the model are linearly associated with one another rather than independent, the variance inflation factor (VIF) <sup>[3]</sup>, defined in Equation (5), was used to assess the multicollinearity of the model. </p>
<img src="https://static.igem.org/mediawiki/2018/9/94/T--Jiangnan--model_formula5.png">
<p>where s<sup>2</sup> denotes the coefficient of determination (R<sup>2</sup>) obtained by regressing the given variable on all the other variables in the model.</p>
<p>Multicollinearity is the situation in which one variable can be linearly predicted from the others with a substantial degree of accuracy. The VIF increases with collinearity, and a VIF &gt; 4 is conventionally taken to indicate a multicollinearity problem <sup>[5]</sup>. The VIF values of T, A, D2 and C are all close to 1 (1.000559, 1.000559, 1.000000 and 1.000000, respectively), indicating no multicollinearity among these four deterministic parameters.</p>
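<p>The Python/NumPy sketch below computes VIF<sub>j</sub> = 1/(1 − R<sub>j</sub><sup>2</sup>) by regressing each column on the remaining columns plus an intercept. It is illustrative only; the team worked in R. The example uses a hypothetical balanced 3 × 3 factorial. Its columns are exactly uncorrelated and therefore give VIF = 1, the situation an orthogonal design is meant to produce. A nearly duplicated column is then added to show how the VIF grows.</p>
<pre>
```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Hypothetical balanced 3 x 3 factorial: the two columns are exactly uncorrelated
design = np.column_stack([np.repeat([1.0, 2.0, 3.0], 3), np.tile([1.0, 2.0, 3.0], 3)])
vif_orthogonal = vif(design)  # both values equal 1 up to rounding
# Adding a near-copy of the first column makes the VIFs blow up
rng = np.random.default_rng(3)
collinear = np.column_stack([design, design[:, 0] + rng.normal(0.0, 0.01, size=9)])
vif_collinear = vif(collinear)
print(vif_orthogonal, vif_collinear)
```
</pre>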
<p>The linear model encompassing the four deterministic parameters is: </p>
<img src="https://static.igem.org/mediawiki/2018/d/d7/T--Jiangnan--model_formula6.png">
<br>
<br>
<br>
</div>
</div>
<div class="row">
<div class="col s10 offset-s1">
<h4 class="Jgreen" style="text-align: right;">High titer</h4>
<br>
<br>
<p>We collected a panel of genes involved in virus multiplication through text mining and retrieved further associated genes by computing correlations in public datasets from the GEO database. We then constructed the corresponding network with GeneMANIA, which combines a fast heuristic algorithm and a label propagation algorithm. </p>
<p>Specifically, the heuristic algorithm, which is based on linear regression, computes a single composite functional association network from multiple data sources. The label propagation algorithm then predicts gene function from this composite network.</p>
<br>
<h5 class="Jgreen">Fast heuristic algorithm</h5>
<p>Each network data source is represented as a weighted interaction network in which every pair of genes is assigned an association weight. The weight is either zero (no interaction) or a positive value reflecting the strength of the interaction. For a gene expression dataset, the association between two genes can be taken as the Pearson correlation coefficient of their expression levels across the conditions of an experiment. This coefficient ranges from −1 to 1 and is higher the more strongly the two genes are co-expressed; negative values are set to zero so that all weights are non-negative.<br>
Both binary and continuous data can be used to build functional association networks. For binary data, all zeros are replaced with log(1 − β) and all ones with −log(β), where β is the proportion of samples in which the given feature is 1. This emphasizes similarities between genes that share 'uncommon' features.<br>
For both types of data, similarity matrices were constructed using the Pearson correlation coefficient to measure pairwise similarity. </p>
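<p>Below is a minimal Python/NumPy sketch of the binary-feature transformation and the Pearson similarity network described above. It is not GeneMANIA's code. We assume the binary matrix has genes as rows and features as columns, so β is computed per feature column. Columns in which every gene has the same value (β = 0 or 1) are dropped because log(0) is undefined. Zeroing negative correlations and self-similarities is our choice, made so that the weights are non-negative as the label propagation step requires. The 4 × 4 matrix is hypothetical.</p>
<pre>
```python
import numpy as np

def transform_binary(B):
    """Genes x features 0/1 matrix: 1 becomes -log(beta), 0 becomes log(1 - beta)."""
    B = np.asarray(B, dtype=float)
    beta = B.mean(axis=0)  # proportion of genes having each feature
    informative = (beta > 0) & (beta < 1)  # drop all-0 / all-1 columns (log(0))
    B, beta = B[:, informative], beta[informative]
    return np.where(B == 1, -np.log(beta), np.log(1 - beta))

def similarity_network(F):
    """Pearson correlation between gene rows, with negatives and self-edges zeroed."""
    W = np.corrcoef(F)
    W[W < 0] = 0.0
    np.fill_diagonal(W, 0.0)
    return W

# Hypothetical gene-by-feature annotation matrix (4 genes, 4 features)
B_demo = np.array([[1, 0, 1, 1],
                   [1, 0, 1, 0],
                   [0, 1, 1, 1],
                   [0, 1, 0, 0]])
T_demo = transform_binary(B_demo)
W_demo = similarity_network(T_demo)
print(np.round(W_demo, 3))
```
</pre>
<p>In the third column, β = 0.75, so a shared 1 contributes only −log(0.75) ≈ 0.29. A feature held by half the genes contributes −log(0.5) ≈ 0.69, which is how rarer features receive more weight.</p>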
<br>
<h5 class="Jgreen">Label propagation algorithm </h5>
<p>A variant of the Gaussian field label propagation algorithm was used to predict gene function from the composite network. <br>
Like most function prediction algorithms, label propagation assigns each node in the network a score called the 'discriminant value'. This score reflects the computed degree of association between the node and the seed list that defines the given function, and thresholding it yields predictions of that gene function. <br>
Each functional association network derived from the data sources is assigned a positive weight reflecting its usefulness in predicting the function of interest. Once these weights have been calculated, the function-specific association network is constructed as the weighted average of the individual association networks.<br>
Denote the vector of discriminant values by f, the bias vector by y and the matrix derived from the association network by W. An association network over n genes is represented by a symmetric matrix W whose non-zero entries indicate the associations in the network: the (i,j)<sup>th</sup> element of W, W<sub>ij</sub>, is the association between genes i and j, with W<sub>ij</sub> = 0 indicating no edge between them. To ensure that all associations are non-negative, negative associations are set to zero. For each binary classification task there are l labeled genes and u unlabeled genes (n = l + u). The labels define the bias vector y, where y<sub>i</sub> ∈ {+1, k, −1} indicates that gene i is positive, unlabeled or negative, respectively. In the label propagation algorithm:</p>
<img src="https://static.igem.org/mediawiki/2018/6/66/T--Jiangnan--model_highformula1.png">
<p>where n+ and n- are the numbers of positive and negative genes, respectively. The discriminant values were computed by solving the following objective function:</p>
<img src="https://static.igem.org/mediawiki/2018/9/99/T--Jiangnan--model_highformula2.png">
<p>This objective keeps the discriminant values of positive and negative genes close to their label bias (first term in the summation). It also keeps the discriminant values of associated genes (genes with positive W<sub>ij</sub>) from differing too much from each other (second term in the summation). The objective can be written in matrix notation as:</p>
<img src="https://static.igem.org/mediawiki/2018/b/b6/T--Jiangnan--model_highformula3.png">
<p>where L = D − W is the graph Laplacian matrix and D = diag(d<sub>i</sub>) is a diagonal matrix with D<sub>ii</sub> = d<sub>i</sub> and D<sub>ij</sub> = 0 if i <img src="https://static.igem.org/mediawiki/2018/0/04/T--Jiangnan--model_alpha3.png" height="20"> j, where d<sub>i</sub> = <img src="https://static.igem.org/mediawiki/2018/b/b9/T--Jiangnan--model_alpha4.png" height="20">. Since W is symmetric with non-negative entries, L is symmetric and positive semi-definite, so the problem is a convex quadratic optimization with a global minimum. Its solution is obtained by solving the sparse linear system (I + L)f = y.</p>
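<p>The Python/NumPy sketch below is not GeneMANIA's implementation. It builds a composite network as a weighted average of individual networks with given weights. It sets the bias y to +1 for positive genes, −1 for negative genes and k = (n+ − n−)/n for unlabeled genes, then solves (I + L)f = y with L = D − W. That system is the stationarity condition of (f − y)<sup>T</sup>(f − y) + f<sup>T</sup>Lf. The expression for k is the one given in the published GeneMANIA description and is assumed to match the equation image above. How the network weights are learned is not shown, so they are passed in as hypothetical inputs. A dense solver is used for brevity; a genome-scale network would call for a sparse one.</p>
<pre>
```python
import numpy as np

def combine_networks(networks, weights):
    """Weighted average of association networks (weights given, not learned here)."""
    weights = np.asarray(weights, dtype=float)
    return sum(w * W for w, W in zip(weights, networks)) / weights.sum()

def label_propagation(W, positives, negatives):
    """Solve (I + L) f = y for the discriminant values f, with L = D - W."""
    n = W.shape[0]
    k = (len(positives) - len(negatives)) / n  # bias for unlabeled genes
    y = np.full(n, k)
    y[list(positives)] = 1.0
    y[list(negatives)] = -1.0
    L = np.diag(W.sum(axis=1)) - W  # graph Laplacian
    return np.linalg.solve(np.eye(n) + L, y)

# Hypothetical 6-gene chain 0-1-2-3-4-5 with gene 0 positive and gene 5 negative
chain = np.zeros((6, 6))
for i in range(5):
    chain[i, i + 1] = chain[i + 1, i] = 1.0
combined_demo = combine_networks([chain, chain], [1.0, 3.0])
f_demo = label_propagation(combined_demo, positives=[0], negatives=[5])
print(np.round(f_demo, 3))
```
</pre>
<p>On this chain the discriminant values fall steadily from 11/18 at the positive gene to −11/18 at the negative gene. Genes closer to the seed therefore score higher, which is the behaviour the smoothness term is designed to produce.</p>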
</div>
<div class="col s12" style="text-align: center;">
<img src="https://static.igem.org/mediawiki/2018/8/85/T--Jiangnan--model_high.png" width="50%">
</div>
</div>
<div class="row">
<div class="col s10 offset-s1">
<h4 style="color: #ffab00">Suspension</h4>
<br>
<br>
<p>First, we sequenced the transcriptomes of suspension and adherent cell lines of BHK-21 and CHO-K1 and aligned their reads to the reference transcripts. The reference for BHK-21 is MesAur1.0 () and that for CHO-K1 is CriGri_1.0 (ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/223/135/GCF_000223135.1_CriGri_1.0/GCF_000223135.1_CriGri_1.0_rna.fna.gz).</p>
<p>Second, in the transcriptome analysis, we removed low-quality genes using R, quantified expression as FPKM (fragments per kilobase of transcript per million mapped fragments) and discarded genes with FPKM &lt; 0.5. Thresholds of FDR (false discovery rate) ≤ 0.05 and FC (fold change) ≥ 1 were then applied to the remaining genes to identify DEGs (differentially expressed genes), giving BHK_DEGs (4916 genes) and CHO_DEGs (3597 genes). </p>
<img src="https://static.igem.org/mediawiki/2018/a/ad/T--Jiangnan--model_susformula1.png">
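<p>Below is a Python/NumPy sketch of the expression filter and the DEG thresholds. It is illustrative only: the team's R pipeline and its differential-expression test are not given. The sketch makes several assumptions:</p>
<ul>
<li>FPKM is computed as fragments × 10<sup>9</sup> / (transcript length in bp × total mapped fragments), with the total approximated by the sum of the counts.</li>
<li>A gene is kept if its FPKM reaches 0.5 in at least one condition.</li>
<li>The p values from the unspecified test are Benjamini-Hochberg adjusted, the same procedure as R's <code>p.adjust(method = "BH")</code>.</li>
<li>The "FC ≥ 1" threshold is applied to |log2 fold change| computed with a pseudocount of 1.</li>
</ul>
<p>Gene names and values are hypothetical.</p>
<pre>
```python
import numpy as np

def fpkm(counts, lengths_bp):
    """FPKM = fragments * 1e9 / (length in bp * total mapped fragments)."""
    counts = np.asarray(counts, dtype=float)
    lengths_bp = np.asarray(lengths_bp, dtype=float)
    total_fragments = counts.sum()  # assumption: all mapped fragments are counted
    return counts * 1e9 / (lengths_bp * total_fragments)

def bh_fdr(pvals):
    """Benjamini-Hochberg adjusted p values (q values)."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    scaled = p[order] * m / np.arange(1, m + 1)
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotone q values
    q = np.empty(m)
    q[order] = np.minimum(scaled, 1.0)
    return q

def select_degs(genes, fpkm_sus, fpkm_adh, pvals,
                min_fpkm=0.5, max_fdr=0.05, min_abs_log2fc=1.0):
    """Return {gene: +1 or -1} for DEGs up or down in suspension vs. adherent cells."""
    genes = np.asarray(genes)
    fpkm_sus = np.asarray(fpkm_sus, dtype=float)
    fpkm_adh = np.asarray(fpkm_adh, dtype=float)
    pvals = np.asarray(pvals, dtype=float)
    expressed = np.maximum(fpkm_sus, fpkm_adh) >= min_fpkm  # FPKM filter
    genes, fpkm_sus, fpkm_adh, pvals = (
        genes[expressed], fpkm_sus[expressed], fpkm_adh[expressed], pvals[expressed])
    q = bh_fdr(pvals)
    log2fc = np.log2((fpkm_sus + 1.0) / (fpkm_adh + 1.0))  # pseudocount of 1
    hit = (q <= max_fdr) & (np.abs(log2fc) >= min_abs_log2fc)
    return {str(g): int(np.sign(fc)) for g, fc in zip(genes[hit], log2fc[hit])}

demo_genes = ["g1", "g2", "g3", "g4", "g5"]
demo_sus = [10.0, 1.0, 0.1, 5.0, 20.0]
demo_adh = [2.0, 8.0, 0.2, 4.5, 2.0]
demo_p = [0.001, 0.001, 0.001, 0.001, 0.5]
degs_demo = select_degs(demo_genes, demo_sus, demo_adh, demo_p)
print(degs_demo)  # {'g1': 1, 'g2': -1}
```
</pre>
<p>In the toy data, g3 is removed by the FPKM filter, g4 by the fold-change threshold and g5 by the FDR threshold.</p>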
<p>We analysed the genes differentially expressed between suspended and adherent cells to explore which genes may be responsible for the suspension phenotype. This gave one set of DEGs from BHK and one from CHO, named BHK_sus_muts and CHO_sus_muts. </p>
<p>Next, we obtained the sus_muts gene set by analyzing SNPs and mutations and retaining genes with potentially causal genetic changes. Intersecting each of the two DEG sets with the sus_muts set gave BHK_keys and CHO_keys, and the intersection of these two sets contained 27 genes. </p>
<p>Retaining only the genes with a consistent direction of regulation in suspended vs. adherent cells left 18 target genes. Their network was constructed with the same algorithm as above (Figure XX), and the top gene was selected for genetic modulation in the experiments.</p>
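<p>The set operations in this step can be sketched in a few lines of Python. The gene names are hypothetical and this is not the team's code. We assume the BHK and CHO genes have already been mapped to a shared gene identifier. We also assume each DEG set records the direction of change in suspended relative to adherent cells (+1 up, −1 down). The final filter keeps genes changed in the same direction in both lines, which is our reading of "consistent regulatory directions".</p>
<pre>
```python
def key_genes(bhk_degs, cho_degs, sus_muts):
    """bhk_degs / cho_degs map gene -> direction (+1 up, -1 down in suspension)."""
    bhk_keys = set(bhk_degs) & sus_muts
    cho_keys = set(cho_degs) & sus_muts
    shared = bhk_keys & cho_keys  # genes that are key genes in both cell lines
    consistent = {g for g in shared if bhk_degs[g] == cho_degs[g]}
    return shared, consistent

shared_demo, consistent_demo = key_genes(
    {"A": 1, "B": -1, "C": 1, "D": 1},
    {"A": 1, "B": 1, "C": 1, "E": -1},
    {"A", "B", "D", "E"},
)
print(sorted(shared_demo), sorted(consistent_demo))  # ['A', 'B'] ['A']
```
</pre>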
</div>
<div class="col s12" style="text-align: center;">
<img src="https://static.igem.org/mediawiki/2018/8/80/T--Jiangnan--model_susp.png" width="50%">
</div>

</div>
 
</div>
<div class="clear"></div>
 
 
</div>
 
</html>
 
</html>

Revision as of 03:38, 17 October 2018
