Team:Jiangnan China/Model

Directed evolution has long been a common method for obtaining strains with specific optimized functions, but it does not readily yield the specific functional components responsible, nor does it reveal how those components work. Therefore, based on the data and experience of this project, we have established a model for screening specific functional components, which can also serve as a reference for the preliminary exploration of functional mechanisms. The goal of this project is to build a Lactococcus lactis strain with acid and freeze resistance, and the acid-resistant component was obtained with this model. Below we introduce the model using our acid-resistant component msmK as an example:

Before applying the model, you need a strain that already has the corresponding capability or function. In our project, we obtained the acid-tolerant L. lactis strain through the following steps:

We first mutagenized our parent strain L. lactis NZ9000 and, by high-throughput screening, selected 3 key mutant strains from 35,000 mutants, namely L. lactis WH101, WH102, and WH103.
We then performed an acid stress analysis on these three mutant strains.

Figure 1 The survival rate of 4 strains: L. lactis NZ9000, L. lactis WH101, L. lactis WH102, and L. lactis WH103. Left: colony distribution of the parent strain and the acid-tolerant strains under pH 4.0 stress at a gradient of 10^-3. Right: survival rate of the parent strain and the acid-tolerant strains (pH 4.0).

    Figure 2 The growth curves of 4 strains: L. lactis NZ9000, L. lactis WH101, L. lactis WH102, and L. lactis WH103. A: pH 7.0, B: pH 5.0, C: pH 4.5.

    Based on Figure 1 and Figure 2, we selected L. lactis WH101, whose survival rate after 5 h at pH 4.0 is 16,000-fold higher than that of the parent strain. This is the highest among the survival rates reported under the same conditions.
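
    As a rough illustration of how such a fold change can be derived from plate counts, the sketch below computes a survival rate (viable cells after stress divided by viable cells before stress) for two strains and takes their ratio. The function names, the plated volume, and all colony counts are hypothetical placeholders, not our measured data.

```python
def cfu_per_ml(colonies, dilution, plated_volume_ml=0.1):
    """Estimate viable cells per mL from a plate count at a given dilution (e.g. 1e-3)."""
    return colonies / (dilution * plated_volume_ml)

def survival_rate(cfu_after, cfu_before):
    """Fraction of cells still viable after the acid challenge."""
    return cfu_after / cfu_before

# Hypothetical plate counts, for illustration only.
parent_rate = survival_rate(cfu_per_ml(2, 1e-1), cfu_per_ml(200, 1e-5))
mutant_rate = survival_rate(cfu_per_ml(150, 1e-3), cfu_per_ml(150, 1e-5))

# Fold change of the mutant's survival rate relative to the parent strain.
fold = mutant_rate / parent_rate
print(f"parent: {parent_rate:.2e}, mutant: {mutant_rate:.2e}, fold: {fold:.0f}")
```
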
    We then applied our model to find the key acid-resistance component.

1. Differential gene expression pattern cluster analysis

Figure 3 Heat map. (1) L. lactis WH101 at pH 4.0 compared with that at pH 7.0; (2) L. lactis WH101 compared with L. lactis NZ9000 at pH 7.0; (3) L. lactis NZ9000 at pH 4.0 compared with that at pH 7.0; (4) L. lactis WH101 compared with L. lactis NZ9000 at pH 4.0.

We conducted a heat map analysis of gene expression in the mutant strain L. lactis WH101 and the parent strain L. lactis NZ9000 for four comparisons. The heat map shows the degree of up- or down-regulation of each gene, and 266 differentially expressed genes were selected across the four comparisons according to the P value threshold we set. Further analysis shows that 61 differential genes are common to the comparisons (Figure 3). We preliminarily concluded that the acid-resistance mechanism of mutant L. lactis WH101 is related to these 61 common differential genes.
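
This selection step can be sketched as follows: keep the genes that pass a P value cut-off in each comparison, then intersect the four sets. The sketch assumes that "common" means present in all four comparisons; the cut-off of 0.05, the gene identifiers, and the P values are placeholders for illustration, not the values used in our analysis.

```python
from functools import reduce

P_CUTOFF = 0.05  # assumed threshold; the actual value is not stated in the text

# Hypothetical P values per gene for the four comparisons (1)-(4).
comparisons = {
    "WH101_pH4_vs_WH101_pH7":   {"geneA": 0.001, "geneB": 0.20, "geneC": 0.01,  "geneD": 0.03},
    "WH101_pH7_vs_NZ9000_pH7":  {"geneA": 0.02,  "geneB": 0.04, "geneC": 0.03,  "geneD": 0.50},
    "NZ9000_pH4_vs_NZ9000_pH7": {"geneA": 0.004, "geneB": 0.60, "geneC": 0.02,  "geneD": 0.01},
    "WH101_pH4_vs_NZ9000_pH4":  {"geneA": 0.01,  "geneB": 0.03, "geneC": 0.045, "geneD": 0.70},
}

def significant_genes(p_values, cutoff=P_CUTOFF):
    """Return the set of genes whose P value is below the cut-off."""
    return {gene for gene, p in p_values.items() if p < cutoff}

per_case = {name: significant_genes(p) for name, p in comparisons.items()}

all_deg = set().union(*per_case.values())                  # significant in at least one comparison
common_deg = reduce(set.intersection, per_case.values())   # significant in every comparison

print(len(all_deg), "differential genes in total;", sorted(common_deg), "are common")
```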

2. PCA (Principal Component Analysis)

    We then performed PCA on the data of the 61 common differential genes.
    PCA is one of the most widely used dimensionality reduction techniques in statistics. In PCA, the data are transformed from the original coordinate system to a new coordinate system that is determined by the data themselves. The direction with the largest variance is taken as the first coordinate axis, because the directions of greatest variance carry the most information about the data. The second new axis is the direction that is orthogonal to the first and has the largest remaining variance. This process is repeated until the number of new axes equals the number of feature dimensions of the original data; keeping only the first N axes then gives a lower-dimensional representation.
    The code that transforms the data into a feature space retaining only the first N principal components is as follows:

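import numpy as np

def loadDataSet(filename, delim='\t'):
    # Read a delimited text file of numbers into a 2-D float array (one sample per row)
    with open(filename) as fr:
        stringArr = [line.strip().split(delim) for line in fr if line.strip()]
    return np.array([[float(x) for x in row] for row in stringArr])

def pca(dataMat, topNfeat=4096):
    dataMat = np.asarray(dataMat, dtype=float)
    meanVals = dataMat.mean(axis=0)                        # per-feature mean
    meanRemoved = dataMat - meanVals                       # centre the data
    covMat = np.cov(meanRemoved, rowvar=False)             # covariance between features
    eigVals, eigVects = np.linalg.eigh(covMat)             # eigh, since covMat is symmetric
    eigValInd = np.argsort(eigVals)[:-(topNfeat + 1):-1]   # largest eigenvalues first
    redEigVects = eigVects[:, eigValInd]                   # keep the top-N eigenvectors
    lowDDataMat = meanRemoved @ redEigVects                # project into the reduced space
    reconMat = lowDDataMat @ redEigVects.T + meanVals      # map back to the original space
    return lowDDataMat, reconMat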

    For more details about PCA, see https://en.wikipedia.org/wiki/Principal_component_analysis

    Figure 4 The proportion of each gene in the dimensions of the two principal components. Only the top 10 of the 61 common differential genes are shown.

    PCA ranks the genes after analysis; only the top 10 are shown here because 61 entries are too many to display. The first five genes, whose proportion is greater than 0.2, were verified experimentally. The experimental results show that LLNZ_RS02280 (msmK) has the best acid-resistance capability. This is almost identical to the result of directly validating the 61 common differential genes by experiment.
    With these two principal components, the accuracy of the result reached 0.98. Although LLNZ_RS02280 (msmK) is not ranked first by the model, it is within the top five, which shows that the model is still a useful reference.
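
    One possible way to produce this kind of ranking is sketched below. It assumes that the "proportion" of a gene is the magnitude of its loading on the first two principal components, and that a figure such as 0.98 corresponds to the cumulative variance explained by those two components. Both readings are assumptions made for this sketch, and the expression matrix is randomly generated rather than real data.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_genes = 12, 6
gene_ids = [f"gene_{i}" for i in range(n_genes)]  # placeholder identifiers

# Synthetic expression matrix: rows are samples, columns are genes.
# Two hidden factors drive most of the variation, plus a little noise.
factors = rng.normal(size=(n_samples, 2))
mixing = rng.normal(size=(2, n_genes))
expr = factors @ mixing + 0.05 * rng.normal(size=(n_samples, n_genes))

centered = expr - expr.mean(axis=0)
eig_vals, eig_vecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eig_vals)[::-1]           # largest variance first
eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]

explained = eig_vals / eig_vals.sum()
cumulative_two = explained[:2].sum()         # variance kept by PC1 + PC2

# Score each gene by the length of its loading vector on PC1 and PC2.
scores = np.sqrt((eig_vecs[:, :2] ** 2).sum(axis=1))
ranking = sorted(zip(gene_ids, scores), key=lambda pair: pair[1], reverse=True)

print(f"variance explained by two components: {cumulative_two:.3f}")
for gene, score in ranking[:5]:
    print(gene, round(float(score), 3))
```

    The top-ranked genes from such a list are candidates only; as in our workflow, they still need experimental verification.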

3. GO analysis and Pathway analysis

Figure 5 GO analysis.

Figure 6 Pathway analysis.

    Obtaining the acid-resistant component is not enough for scientific research; we also need to explore the mechanism of acid resistance. This model approaches the mechanism through GO analysis and pathway analysis, here applied to the 266 differentially expressed genes. GO analysis shows that the differentially expressed genes are mainly involved in catalytic activity and binding (molecular function) and in metabolic process and cellular process (biological process). Pathway analysis shows that the differentially expressed genes are mainly involved in amino acid biosynthesis, metabolic pathways, fatty acid metabolism, and carbon metabolism.
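
    As a minimal sketch of the counting behind such a summary, the snippet below tallies how many differentially expressed genes fall under each GO term. The gene-to-term annotations are hypothetical, and a full analysis would normally also test each term for statistical enrichment.

```python
from collections import Counter

# Hypothetical annotations: gene -> GO terms (placeholders, not project data).
go_annotations = {
    "geneA": ["catalytic activity", "metabolic process"],
    "geneB": ["binding", "cellular process"],
    "geneC": ["catalytic activity", "cellular process"],
    "geneD": ["catalytic activity"],
}
deg = ["geneA", "geneC", "geneD"]  # hypothetical differentially expressed genes

# Count how many of the differential genes carry each GO term.
term_counts = Counter(term for gene in deg for term in go_annotations.get(gene, []))

for term, count in term_counts.most_common():
    print(f"{term}: {count} of {len(deg)} genes")
```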

    This gives a rough indication of the direction of the mechanism and helps guide further analysis. For the acid resistance mechanism, it may not be more informative than the large body of existing literature, but it is still a good reference. When analyzing a new component with fewer references, these two analyses are considerably more valuable.

    In general, our model takes a statistical approach, with PCA dimensionality reduction at its core. The results of the model are basically consistent with the experimental result, LLNZ_RS02280, and the accuracy is very high. Therefore, the model can be used to identify the key genes of a target mutant and to explore its possible mechanism. However, a small amount of information is lost during PCA dimensionality reduction, so the accuracy is not 100%. The results therefore still need to be verified experimentally on the several genes with the highest proportions.
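
    The information loss mentioned above can be illustrated with a small sketch: project centred data onto the first k principal components, map it back, and measure the fraction of variance that is not recovered. The data are synthetic and the function name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(20, 5)) @ rng.normal(size=(5, 5))  # synthetic, correlated features

def reconstruction_loss(x, n_components):
    """Fraction of total variance lost when only n_components are kept."""
    centered = x - x.mean(axis=0)
    _, eig_vecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    top = eig_vecs[:, ::-1][:, :n_components]   # eigh sorts ascending, so reverse the columns
    recon = centered @ top @ top.T              # project, then map back
    return float(((centered - recon) ** 2).sum() / (centered ** 2).sum())

# Loss for 1 to 5 retained components; it shrinks as more components are kept.
losses = [reconstruction_loss(data, k) for k in range(1, 6)]
print([round(v, 4) for v in losses])
```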
