(Prototype team page) |
Wangzhihao (Talk | contribs) |
||
(5 intermediate revisions by the same user not shown) | |||
Line 1: | Line 1: | ||
{{Jiangnan_China}} | {{Jiangnan_China}} | ||
− | <html> | + | <html lang="en"> |
+ | <body style="background-color: #ccc"> | ||
+ | <nav class="site-header py-2" style="position: fixed;width: 100%;z-index: 999"> | ||
+ | <div class="container d-flex flex-column flex-md-row justify-content-between" style="max-width: 1300px;"> | ||
+ | <a class="navbar-brand" href="#"> | ||
+ | <img src="https://static.igem.org/mediawiki/2018/8/84/T--jiangnan_china--home--icon-logo.png" width="36px" height="36px"> | ||
+ | </a> | ||
+ | <a class="nav-link py-2 d-none d-md-inline-block" href="https://2018.igem.org/Team:Jiangnan_China"><i class="fa fa-home"></i> Home</a> | ||
+ | <div class="dropdown"> | ||
+ | <a class="nav-link dropdown-toggle" href="#" id="navbarDropdown" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> | ||
+ | <i class="fa fa-group"></i> Team | ||
+ | </a> | ||
+ | <div class="dropdown-menu" aria-labelledby="navbarDropdown"> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/TeamMembers">Team Members</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Sponsors">Sponsors</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Attributions">Attributions</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Collaborations">Collaborations</a> | ||
+ | </div> | ||
+ | </div> | ||
+ | <div class="dropdown"> | ||
+ | <a class="nav-link dropdown-toggle" href="#" id="projectDropdown" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> | ||
+ | <i class="fa fa-project"></i> Project | ||
+ | </a> | ||
+ | <div class="dropdown-menu" aria-labelledby="projectDropdown"> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Description">Description</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Design">Design</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Experiments">Experiments</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Notebook">Notebook</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Protocols">Protocols</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Results">Results</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Demonstrate">Demonstrate</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Model">Model</a> | ||
+ | </div> | ||
+ | </div> | ||
+ | <div class="dropdown"> | ||
+ | <a class="nav-link dropdown-toggle" href="#" id="projectDropdown" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> | ||
+ | <i class="fa fa-bell"></i> Safety | ||
+ | </a> | ||
+ | <div class="dropdown-menu" aria-labelledby="projectDropdown"> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Safety">Training</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Safety#Protection">Protection</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Safety#Material">Material</a> | ||
+ | </div> | ||
+ | </div> | ||
+ | <div class="dropdown"> | ||
+ | <a class="nav-link dropdown-toggle" href="#" id="partsDropdown" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> | ||
+ | <i class="fa fa-composer"></i> Human Practices | ||
+ | </a> | ||
+ | <div class="dropdown-menu" aria-labelledby="partsDropdown"> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Human_Practices">Silver_Human Practices</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Human_Practices#gold">Gold_Human Practices</a> | ||
+ | </div> | ||
+ | </div> | ||
+ | <div class="dropdown"> | ||
+ | <a class="nav-link dropdown-toggle" href="#" id="partsDropdown" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> | ||
+ | <i class="fa fa-trophy"></i> Awards | ||
+ | </a> | ||
+ | <div class="dropdown-menu" aria-labelledby="partsDropdown"> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Awards">Bronzes</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Awards#Silver">Silver</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Awards#Gold">Gold</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Awards#Prizes">Prizes</a> | ||
+ | </div> | ||
+ | </div> | ||
+ | <div class="dropdown"> | ||
+ | <a class="nav-link dropdown-toggle" href="#" id="partsDropdown" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> | ||
+ | <i class="fa fa-part"></i> Parts | ||
+ | </a> | ||
+ | <div class="dropdown-menu" aria-labelledby="partsDropdown"> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Parts">Basic Parts</a> | ||
+ | <a class="dropdown-item" href="https://2018.igem.org/Team:Jiangnan_China/Parts#Composite">Composite parts</a> | ||
+ | </div> | ||
+ | </div> | ||
+ | <a class="nav-link py-2 d-none d-md-inline-block" href="https://2018.igem.org/Team:Jiangnan_China/InterLab"><i class="fa fa-interlab" aria-hidden="true"></i> InterLab</a> | ||
− | <div | + | </div> |
− | + | </nav> | |
− | + | ||
− | + | ||
− | + | ||
+ | <main class="content-wrap"> | ||
+ | <img src="https://static.igem.org/mediawiki/2018/f/f6/T--jiangnan_china--model--1.jpg" width="100%"> | ||
+ | <div class="dcpt3" style="font-size:20px;line-height:1.5;font-family: 'spr';"> | ||
+ | |||
+ | <br> | ||
+ | Directed evolution has long been a common way to obtain strains optimized for a specific function, but it is not easy to identify the specific functional components responsible, nor to explore their functional mechanisms. Therefore, based on the data and experience of this project, we established a model for screening specific functional components, which can also provide a reference for a preliminary exploration of functional mechanisms. The goal of this project is to build a <span class="font-italic">Lactococcus lactis</span> strain with acid and freeze resistance, in which the acid-resistant components were obtained using this model. Below we introduce our model, using our acid-resistant component <span class="font-italic">msmK</span> as an example: | ||
+ | </div> | ||
− | <div class=" | + | <div id="dcpt4" style="font-size:20px;line-height:1.5;font-family: 'spr';"> |
+ | <br> | ||
+ | Before modeling, you need a strain with the corresponding capability or function. In our project, we obtained the acid-tolerant <span class="font-italic">L.lactis</span> strain through the following steps: | ||
+ | <br> | ||
+ | <div align="center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2018/9/9d/T--jiangnan_china--model--2.png" width="70%" > | ||
+ | </div> | ||
+ | <br> | ||
+ | We first mutagenized our parent strain <span class="font-italic">L.lactis</span> NZ9000, and three key mutant strains, <span class="font-italic">L.lactis</span> WH101, WH102, and WH103, were selected from 35,000 mutants by high-throughput screening. | ||
+ | <br> | ||
+ | We then performed acid stress analysis on these three mutant strains. | ||
+ | <br> | ||
+ | <br> | ||
+ | <div align="center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2018/a/a0/T--jiangnan_china--model--3.png" width="70%" > | ||
+ | </div> | ||
+ | <strong >Figure 1</strong> Survival of the four strains <span class="font-italic">L.lactis</span> NZ9000, <span class="font-italic">L.lactis</span> WH101, <span class="font-italic">L.lactis</span> WH102, and <span class="font-italic">L.lactis</span> WH103. Left: colony distribution of the parent strain and the acid-tolerant strains under pH 4.0 stress at a gradient of 10<sup>-3</sup>. Right: survival rate of the parent strain and the acid-tolerant strains (pH 4.0). | ||
+ | <br> | ||
+ | <br> | ||
+ | <div align="center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2018/0/09/T--jiangnan_china--model--4.png" width="100%" > | ||
+ | </div> | ||
+ | <strong >Figure 2</strong> The growth curve of 4 strains, <span class="font-italic">L.lactis</span> NZ9000, <span class="font-italic">L.lactis</span> WH101, <span class="font-italic">L.lactis</span> WH102, <span class="font-italic">L.lactis</span> WH103. A: pH 7.0, B: pH 5.0, C: pH 4.5. | ||
+ | <br><br> | ||
+ | Based on Figure 1 and Figure 2, we selected <span class="font-italic">L.lactis</span> WH101, whose survival rate after 5 h at pH 4.0 is 16,000-fold higher than that of the parent strain, the highest among the survival rates reported under the same conditions. | ||
+ | <br> | ||
+ | We then applied our model to find the key acid-resistance component. | ||
+ | </div> | ||
− | |||
− | |||
− | <p> | + | <div class="dcpt3" style="font-size:20px;line-height:1.5;font-family: 'spr';margin-top: -1px"> |
+ | <div align="left" style="font-family: 'spr';font-size:40px;border-bottom:2px solid #584b4f;"><strong>1. Differential gene expression pattern cluster analysis</strong></div> | ||
+ | <br> | ||
+ | <div align="center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2018/c/cf/T--jiangnan_china--model--5.png" width="40%" > | ||
+ | </div> | ||
+ | <div align="center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2018/1/14/T--jiangnan_china--model--6.png" width="40%" > | ||
+ | </div> | ||
+ | <strong >Figure 3</strong> Heat map. (1) <span class="font-italic">L. lactis</span> WH101 at pH 4.0 compared with that at pH 7.0; (2) <span class="font-italic">L. lactis</span> WH101 compared with <span class="font-italic">L. lactis</span> NZ9000 at pH 7.0; (3) <span class="font-italic">L. lactis</span> NZ9000 at pH 4.0 compared with that at pH 7.0; (4) <span class="font-italic">L. lactis</span> WH101 compared with <span class="font-italic">L. lactis</span> NZ9000 at pH 4.0. | ||
+ | <br> | ||
+ | <br> | ||
+ | <strong>Click here for a cleaner heat map:</strong><p style="text-align: center;"><a href="https://static.igem.org/mediawiki/2018/6/68/T--jiangnan_china--model--heatmap.pdf" class="btn btn-info">heat map PDF</a></p> | ||
+ | <br> | ||
+ | We performed a heat map analysis of gene expression in the mutant strain <span class="font-italic">L.lactis</span> WH101 and the parent strain <span class="font-italic">L.lactis</span> NZ9000 across four comparisons. The heat map shows the degree of up- or down-regulation of each gene, and 266 differentially expressed genes were selected across the four comparisons according to the P value threshold we set. Further analysis shows that 61 differentially expressed genes are common to the four comparisons (Figure 3). We preliminarily concluded that the acid-resistance mechanism of mutant <span class="font-italic">L.lactis</span> WH101 is related to these 61 common differential genes. | ||
+ | </div> | ||
− | </div> | + | <div id="dcpt4" style="font-size:20px;line-height:1.5;font-family: 'spr';"> |
− | <div class=" | + | <div align="left" style="font-family: 'spr';font-size:40px;border-bottom:2px solid #584b4f;"><strong>2. PCA (Principal Component Analysis)</strong></div> |
+ | <br><br> | ||
+ | We then performed PCA on the data of the 61 common differential genes.<br> | ||
+ | PCA is one of the most widely used dimensionality-reduction (data compression) techniques in statistics. In PCA, the data are transformed from the original coordinate system into a new one determined by the data themselves. The new axes are chosen along the directions of largest variance, because the directions in which the data vary most carry the most information. The first new axis is the direction of largest variance in the original data; the second new axis is the direction orthogonal to the first that has the largest remaining variance. This process is repeated as many times as there are feature dimensions in the original data.<br> | ||
+ | The code below converts the data into a feature space that retains only the first N principal components:<br> | ||
+ | <br> | ||
+ | <font color="blue"> | ||
+ | import numpy as np<br><br> | ||
+ | def loadDataSet(filename, delim='\t'):<br> | ||
+ |     # Parse each line of a delimited text file into a row of floats<br> | ||
+ |     with open(filename) as fr:<br> | ||
+ |         stringArr = [line.strip().split(delim) for line in fr.readlines()]<br> | ||
+ |     datArr = [list(map(float, line)) for line in stringArr]<br> | ||
+ |     return np.array(datArr)<br><br> | ||
+ | def pca(dataMat, topNfeat=4096):<br> | ||
+ |     meanVals = np.mean(dataMat, axis=0)  # per-feature mean<br> | ||
+ |     meanRemoved = dataMat - meanVals  # center the data<br> | ||
+ |     covMat = np.cov(meanRemoved, rowvar=0)  # feature covariance matrix<br> | ||
+ |     eigVals, eigVects = np.linalg.eig(covMat)<br> | ||
+ |     eigValInd = np.argsort(eigVals)  # ascending eigenvalue order<br> | ||
+ |     eigValInd = eigValInd[:-(topNfeat+1):-1]  # indices of the topNfeat largest<br> | ||
+ |     redEigVects = eigVects[:, eigValInd]  # reduced eigenvector basis<br> | ||
+ |     lowDDataMat = meanRemoved @ redEigVects  # project onto the reduced basis<br> | ||
+ |     reconMat = (lowDDataMat @ redEigVects.T) + meanVals  # map back to the original space<br> | ||
+ |     return lowDDataMat, reconMat<br> | ||
+ | </font> | ||
+ | <br> | ||
+ | Click here for more details about PCA: <a href="https://en.wikipedia.org/wiki/Principal_component_analysis">https://en.wikipedia.org/wiki/Principal_component_analysis</a> | ||
+ | <br> | ||
+ | <br> | ||
+ | <div align="center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2018/4/49/T--jiangnan_china--model--model7.png" width="80%" > | ||
+ | </div> | ||
+ | <strong >Figure 4</strong> The proportion of each gene along the two principal components. Only the top 10 of the 61 common differential genes are shown. | ||
+ | <br><br> | ||
+ | PCA ranks the genes after analysis; only the top 10 are shown here because the full set of 61 is too large to display. The five highest-ranked genes, each with a proportion greater than 0.2, were verified experimentally. The experimental results show that LLNZ_RS02280 (<span class="font-italic">msmK</span>) has the best acid-resistance capability. <font color="#5B9BD5">This is almost identical to the result of direct experimental validation of the 61 common differential genes.</font><br> | ||
+ | With these two principal components, we improved the accuracy of the result to 0.98. Although LLNZ_RS02280 (<span class="font-italic">msmK</span>) does not rank first in the model, it is among the top five, which shows that the model is still a valuable reference. | ||
+ | </div> | ||
− | <div class=" | + | <div class="dcpt3" style="font-size:20px;line-height:1.5;font-family: 'spr';margin-top: -1px"> |
− | < | + | <div align="left" style="font-family: 'spr';font-size:40px;border-bottom:2px solid #584b4f;"><strong>3. GO analysis and Pathway analysis</strong></div> |
− | < | + | <br> |
− | + | <div align="center"> | |
− | <br><br> | + | <img src="https://static.igem.org/mediawiki/2018/4/4b/T--jiangnan_china--model--model8.png" width="80%" > |
− | + | </div> | |
− | </ | + | <div style="text-align:center;"><strong >Figure 5</strong> GO analysis.</div> |
+ | <br><br> | ||
+ | <div align="center"> | ||
+ | <img src="https://static.igem.org/mediawiki/2018/8/8f/T--jiangnan_china--model--model9.png" width="80%" > | ||
+ | </div> | ||
+ | <div style="text-align:center;"><strong >Figure 6</strong> Pathway analysis.</div> | ||
+ | <br><br> | ||
+ | Obtaining the acid-resistant component alone is not enough for scientific research; we also need to explore the mechanism of acid resistance. This model attempts to analyze the mechanism through GO analysis and pathway analysis of the 266 differentially expressed genes. GO analysis shows that the differentially expressed genes are mainly involved in catalytic activity and binding (molecular function) and in metabolic and cellular processes (biological process). Pathway analysis shows that they are mainly involved in amino acid biosynthesis, metabolic pathways, fatty acid metabolism, and carbon metabolism. | ||
+ | <br><br> | ||
+ | This gives a rough indication of the direction of the mechanism and helps guide further analysis. For the acid-resistance mechanism, it may not be more informative than the extensive existing literature, but it is still a good reference. When analyzing a new component for which few references exist, these two analyses become much more valuable. | ||
+ | <br><br> | ||
+ | Overall, our model takes a statistical approach, with the PCA dimensionality-reduction algorithm at its core. The model's results are broadly consistent with the experimental identification of LLNZ_RS02280, and the accuracy is high. Therefore, the model can be used to identify the key genes of a target mutant and to explore possible mechanisms. However, a small amount of information is lost during PCA dimensionality reduction, so the accuracy is not 100%, and the results still need to be verified experimentally for the several highest-ranked genes. | ||
+ | <br><br> | ||
+ | </div> | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | + | ||
− | |||
− | |||
− | |||
− | |||
− | |||
− | |||
+ | </main> | ||
− | < | + | <footer class="py-4"> |
− | <div class=" | + | <div class="container" style="text-align: center;"> |
− | < | + | <a href="http://www.jiangnan.edu.cn/" title="江南大学"><img src="https://static.igem.org/mediawiki/2018/1/1d/T--jiangnan_china--jiangnanlogo.png"></a> |
− | + | <a href="http://biotech.jiangnan.edu.cn/" title="江南大学生物工程学院"><img src="https://static.igem.org/mediawiki/2018/a/a6/T--jiangnan_china--shenggonglogo.png"></a> | |
− | + | <p>Copyright © jiangnan_China 2018</p> | |
− | + | </div> | |
− | < | + | </footer> |
− | + | <a id="gotop" class="animated infinite bounce slow" style="position: fixed; right: 85px; bottom: 148px;display: none"> | |
− | + | <img src="https://static.igem.org/mediawiki/2018/f/f8/T--jiangnan_china--logo.png" width="64px" height="64px"> | |
− | + | </a> | |
− | < | + | <!-- Optional JavaScript --> |
− | </ | + | <!-- jQuery first, then Popper.js, then Bootstrap JS --> |
− | < | + | |
− | < | + | |
+ | |||
+ | <script type="text/javascript"> | ||
+ | // Triggered after the page loads | ||
+ | window.onload = function(){ | ||
+ | var btn = document.getElementById('gotop'); | ||
+ | var timer = null; | ||
+ | var isTop = true; | ||
+ | // Get the height of the visible page area | ||
+ | var clientHeight = document.documentElement.clientHeight - 500; | ||
+ | |||
+ | // Triggered when the page is scrolled | ||
+ | window.onscroll = function() { | ||
+ | // Show the back-to-top button | ||
+ | var osTop = document.documentElement.scrollTop || document.body.scrollTop; | ||
+ | if (osTop >= clientHeight) { | ||
+ | btn.style.display = "block"; | ||
+ | } else { | ||
+ | btn.style.display = "none"; | ||
+ | }; | ||
+ | // If the user scrolls while returning to the top, stop the timer | ||
+ | if (!isTop) { | ||
+ | clearInterval(timer); | ||
+ | }; | ||
+ | isTop = false; | ||
+ | }; | ||
+ | |||
+ | btn.onclick = function() { | ||
+ | // Set up the timer | ||
+ | timer = setInterval(function(){ | ||
+ | // Get the current scroll distance from the top | ||
+ | var osTop = document.documentElement.scrollTop || document.body.scrollTop; | ||
+ | var ispeed = Math.floor(-osTop / 3); | ||
+ | document.documentElement.scrollTop = document.body.scrollTop = osTop+ispeed; | ||
+ | // Clear the timer once the top is reached | ||
+ | if (osTop == 0) { | ||
+ | clearInterval(timer); | ||
+ | }; | ||
+ | isTop = true; | ||
+ | },30); | ||
+ | }; | ||
+ | }; | ||
+ | </script> | ||
+ | </body> | ||
</html> | </html> |
Latest revision as of 11:47, 16 October 2018
Directed evolution has long been a common way to obtain strains optimized for a specific function, but it is not easy to identify the specific functional components responsible, nor to explore their functional mechanisms. Therefore, based on the data and experience of this project, we established a model for screening specific functional components, which can also provide a reference for a preliminary exploration of functional mechanisms. The goal of this project is to build a Lactococcus lactis strain with acid and freeze resistance, in which the acid-resistant components were obtained using this model. Below we introduce our model, using our acid-resistant component msmK as an example:
Before modeling, you need a strain with the corresponding capability or function. In our project, we obtained the acid-tolerant L. lactis strain through the following steps:
We first mutagenized our parent strain L. lactis NZ9000, and three key mutant strains, L. lactis WH101, WH102, and WH103, were selected from 35,000 mutants by high-throughput screening.
We then performed acid stress analysis on these three mutant strains.
Based on Figure 1 and Figure 2, we selected L. lactis WH101, whose survival rate after 5 h at pH 4.0 is 16,000-fold higher than that of the parent strain, the highest among the survival rates reported under the same conditions.
We then applied our model to find the key acid-resistance component.
1. Differential gene expression pattern cluster analysis
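A cleaner version of the heat map is available as a PDF: https://static.igem.org/mediawiki/2018/6/68/T--jiangnan_china--model--heatmap.pdf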
We performed a heat map analysis of gene expression in the mutant strain L. lactis WH101 and the parent strain L. lactis NZ9000 across four comparisons. The heat map shows the degree of up- or down-regulation of each gene, and 266 differentially expressed genes were selected across the four comparisons according to the P value threshold we set. Further analysis shows that 61 differentially expressed genes are common to the four comparisons (Figure 3). We preliminarily concluded that the acid-resistance mechanism of mutant L. lactis WH101 is related to these 61 common differential genes.
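The selection step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the gene IDs, the P values, the comparison labels, and the 0.05 threshold are hypothetical toy values (the page does not state the threshold used), and the helper names `significant_genes`, `selected_genes`, and `common_genes` are ours.

```python
def significant_genes(p_values, threshold=0.05):
    """Return the gene IDs whose P value falls below the threshold."""
    return {gene for gene, p in p_values.items() if p < threshold}


def selected_genes(comparisons, threshold=0.05):
    """Genes that are significant in at least one comparison."""
    sets = [significant_genes(p, threshold) for p in comparisons.values()]
    return set.union(*sets) if sets else set()


def common_genes(comparisons, threshold=0.05):
    """Genes that are significant in every comparison."""
    sets = [significant_genes(p, threshold) for p in comparisons.values()]
    return set.intersection(*sets) if sets else set()


# Hypothetical toy P values for the four comparisons of Figure 3.
comparisons = {
    "WH101_pH4_vs_pH7": {"geneA": 0.001, "geneB": 0.20, "geneC": 0.01},
    "WH101_vs_NZ9000_pH7": {"geneA": 0.03, "geneB": 0.04, "geneC": 0.50},
    "NZ9000_pH4_vs_pH7": {"geneA": 0.002, "geneB": 0.01, "geneC": 0.02},
    "WH101_vs_NZ9000_pH4": {"geneA": 0.04, "geneB": 0.60, "geneC": 0.03},
}

selected = selected_genes(comparisons)
common = common_genes(comparisons)
print(sorted(selected), sorted(common))
```

In this toy input only `geneA` passes the threshold in all four comparisons, so it is the only "common" gene, while all three genes are selected in at least one comparison.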
2. PCA (Principal Component Analysis)
We then performed PCA on the data of the 61 common differential genes.
PCA is one of the most widely used dimensionality-reduction (data compression) techniques in statistics. In PCA, the data are transformed from the original coordinate system into a new one determined by the data themselves. The new axes are chosen along the directions of largest variance, because the directions in which the data vary most carry the most information. The first new axis is the direction of largest variance in the original data; the second new axis is the direction orthogonal to the first that has the largest remaining variance. This process is repeated as many times as there are feature dimensions in the original data.
The code below converts the data into a feature space that retains only the first N principal components:
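import numpy as np

def loadDataSet(filename, delim='\t'):
    # Parse each line of a delimited text file into a row of floats
    with open(filename) as fr:
        stringArr = [line.strip().split(delim) for line in fr.readlines()]
    datArr = [list(map(float, line)) for line in stringArr]
    return np.array(datArr)

def pca(dataMat, topNfeat=4096):
    meanVals = np.mean(dataMat, axis=0)  # per-feature mean
    meanRemoved = dataMat - meanVals  # center the data
    covMat = np.cov(meanRemoved, rowvar=0)  # feature covariance matrix
    eigVals, eigVects = np.linalg.eig(covMat)
    eigValInd = np.argsort(eigVals)  # ascending eigenvalue order
    eigValInd = eigValInd[:-(topNfeat+1):-1]  # indices of the topNfeat largest
    redEigVects = eigVects[:, eigValInd]  # reduced eigenvector basis
    lowDDataMat = meanRemoved @ redEigVects  # project onto the reduced basis
    reconMat = (lowDDataMat @ redEigVects.T) + meanVals  # map back to the original space
    return lowDDataMat, reconMat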
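To make the idea of "retaining only the first N principal components" concrete, here is a minimal self-contained sketch on synthetic data. It is not the team's pipeline: the function name `pca_eigh` and the toy matrix are hypothetical, and it uses `numpy.linalg.eigh` (suited to symmetric covariance matrices) rather than `eig`. It also reports the fraction of variance each retained component explains.

```python
import numpy as np


def pca_eigh(data, n_components):
    """Project data (rows = samples, columns = features) onto its top components."""
    centered = data - data.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    # eigh returns real eigenvalues in ascending order for symmetric matrices
    eig_vals, eig_vecs = np.linalg.eigh(cov)
    order = np.argsort(eig_vals)[::-1][:n_components]
    components = eig_vecs[:, order]
    projected = centered @ components
    explained = eig_vals[order] / eig_vals.sum()
    return projected, components, explained


# Toy data: the third feature is the sum of the first two, so the data
# really occupy only two dimensions.
rng = np.random.default_rng(0)
base = rng.normal(size=(50, 2))
toy = np.column_stack([base[:, 0], base[:, 1], base[:, 0] + base[:, 1]])

projected, components, explained = pca_eigh(toy, 2)
reconstructed = projected @ components.T + toy.mean(axis=0)
print(projected.shape, explained)
```

Because the toy data are exactly two-dimensional, two components explain essentially all of the variance and the reconstruction matches the input; on real expression data some information is lost when components are dropped.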
Click here for more details about PCA: https://en.wikipedia.org/wiki/Principal_component_analysis
PCA ranks the genes after analysis; only the top 10 are shown here because the full set of 61 is too large to display. The five highest-ranked genes, each with a proportion greater than 0.2, were verified experimentally. The experimental results show that LLNZ_RS02280 (msmK) has the best acid-resistance capability. This is almost identical to the result of direct experimental validation of the 61 common differential genes.
With these two principal components, we improved the accuracy of the result to 0.98. Although LLNZ_RS02280 (msmK) does not rank first in the model, it is among the top five, which shows that the model is still a valuable reference.
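The page does not spell out how the per-gene "proportion" under the two principal components is computed. One plausible reading, shown here purely as an assumption, treats each gene as a feature and scores it by its squared loadings on the top two components, weighted by each component's share of the retained variance. The function name `rank_features_by_contribution`, the gene labels, and the data are all hypothetical.

```python
import numpy as np


def rank_features_by_contribution(data, feature_names, n_components=2):
    """Rank features (columns) by their weighted squared loadings on the top components."""
    centered = data - data.mean(axis=0)
    eig_vals, eig_vecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(eig_vals)[::-1][:n_components]
    # Share of the retained variance carried by each kept component
    weights = eig_vals[order] / eig_vals[order].sum()
    # Each eigenvector has unit length, so the scores sum to 1
    scores = (eig_vecs[:, order] ** 2) @ weights
    ranking = np.argsort(scores)[::-1]
    return [(feature_names[i], float(scores[i])) for i in ranking]


# Hypothetical toy expression matrix: 40 samples, 3 genes with very
# different variances.
rng = np.random.default_rng(1)
toy = rng.normal(size=(40, 3)) * np.array([5.0, 1.0, 0.1])
ranking = rank_features_by_contribution(toy, ["gene_x", "gene_y", "gene_z"])
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

A ranking of this kind only suggests which genes to test first; as the text notes, the top candidates still need experimental verification.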
3. GO analysis and Pathway analysis
Figure 5 GO analysis.
Figure 6 Pathway analysis.
Obtaining the acid-resistant component alone is not enough for scientific research; we also need to explore the mechanism of acid resistance. This model attempts to analyze the mechanism through GO analysis and pathway analysis of the 266 differentially expressed genes. GO analysis shows that the differentially expressed genes are mainly involved in catalytic activity and binding (molecular function) and in metabolic and cellular processes (biological process). Pathway analysis shows that they are mainly involved in amino acid biosynthesis, metabolic pathways, fatty acid metabolism, and carbon metabolism.
This gives a rough indication of the direction of the mechanism and helps guide further analysis. For the acid-resistance mechanism, it may not be more informative than the extensive existing literature, but it is still a good reference. When analyzing a new component for which few references exist, these two analyses become much more valuable.
Overall, our model takes a statistical approach, with the PCA dimensionality-reduction algorithm at its core. The model's results are broadly consistent with the experimental identification of LLNZ_RS02280, and the accuracy is high. Therefore, the model can be used to identify the key genes of a target mutant and to explore possible mechanisms. However, a small amount of information is lost during PCA dimensionality reduction, so the accuracy is not 100%, and the results still need to be verified experimentally for the several highest-ranked genes.