Team:NCTU Formosa/Dry Lab/Microbiota Prediciton

Microbiota Prediction

     Artificial intelligence and machine learning allow us to pursue the seemingly insurmountable goal of predicting how entire microbiotas fluctuate in response to bio-stimulators. While traditional ecologists may find it too difficult to consider every unique microbial relationship in an ecosystem, machine learning programs use numerical analysis not only to determine these associations quickly but also to predict the overall population shifts caused by stimuli. For our modelling we chose Weka, software with strong classification capabilities, to establish accurate connections between all the genera of bacteria in our soil.

Considering General Factors

     To begin modelling the relationship between bio-stimulators and microbiota, we first determined the most important factors that affect bacterial growth in soil. Three conditions immediately came to mind: temperature, pH and salinity, whose effects are modelled by the respective equations below:

Ratkowsky Equation:
$$R_{temp}(T)=a\cdot[(T-T_{min})\cdot(1-e^{b\cdot(T-T_{max})})]^2$$
Cardinal pH Equation:
$$R_{pH}(pH)=\frac{c\cdot(pH-pH_{min})\cdot(pH-pH_{max})}{d\cdot((pH-pH_{min})\cdot(pH-pH_{max})-e\cdot(pH-pH_{opt})^2)}$$
Salinity Equation:
$$R_{sal}(sal)=(f\cdot sal^2)+(g\cdot sal)+h$$
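
     A minimal Python sketch of the three equations above may make them easier to experiment with. The function names and argument order are illustrative assumptions, and the coefficients a through h must be supplied by the user, since no fitted values are given here. In the pH equation, e is read as a fitted coefficient (it continues the sequence a, b, c, d), not as Euler's number.

```python
import math


def ratkowsky_rate(T, a, b, T_min, T_max):
    """R_temp(T) = a * [(T - T_min) * (1 - exp(b * (T - T_max)))]^2"""
    return a * ((T - T_min) * (1.0 - math.exp(b * (T - T_max)))) ** 2


def cardinal_ph_rate(pH, c, d, e, pH_min, pH_opt, pH_max):
    """R_pH(pH) as written above; c, d and e are fitted coefficients."""
    span = (pH - pH_min) * (pH - pH_max)
    return (c * span) / (d * (span - e * (pH - pH_opt) ** 2))


def salinity_rate(sal, f, g, h):
    """R_sal(sal) = f * sal^2 + g * sal + h"""
    return f * sal ** 2 + g * sal + h


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to exercise the functions.
    print(ratkowsky_rate(25.0, a=0.01, b=0.2, T_min=5.0, T_max=45.0))
    print(cardinal_ph_rate(6.5, c=1.0, d=1.0, e=1.0,
                           pH_min=4.0, pH_opt=7.0, pH_max=10.0))
    print(salinity_rate(2.0, f=-0.1, g=0.3, h=1.0))
```

     Two properties follow directly from the formulas: the temperature term is zero at both T_min and T_max, and the pH term equals c/d at pH_opt.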

     These factors heavily influence bacterial fluctuation in any environment and are especially important in determining soil microbiota, according to soil expert Professor Young of National Chung Hsing University. Professor Young also suggested we consider the relationship between soil bacteria and nitrogen, phosphorus and potassium, because fertilizers containing these vital macronutrients are regularly applied to farm soil. To take these elements into account, we collected literature discussing their impact on bacterial levels and found the following functions:

     These equations model the direct relationship between levels of the elements and levels of bacteria in soil – specifically, the levels of bacteria that metabolize said elements.

     Combining these general equations gives a rough estimate of how our microbiota will change as these factors fluctuate. The universal factors (temperature, pH and salinity) help model general fluctuations in the microbiota, while the more specific factors (nitrogen, phosphorus and potassium) help predict how the amount of nutrient-metabolizing bacteria changes in response to fertilizers. But how do we deduce the change in the level of bacteria that are unaffected by these nutrients? We turn to our NGS analysis to find the missing link.

     From our NGS report we calculate the Spearman correlation coefficient for each pair of genera in our soil. This coefficient takes a value between -1 and +1 and describes how consistently the abundances of the two genera rise and fall together: values closer to -1 represent stronger negative correlation and values closer to +1 represent stronger positive correlation. The correlation values for the 20 most abundant bacterial genera in our soil samples are shown in the following heat map.
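
     The Spearman coefficient is the Pearson correlation of the ranked values, which the following sketch computes using only the Python standard library. The abundance numbers are hypothetical placeholders and are not taken from our NGS report.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical relative abundances of two genera across five samples.
    genus_a = [0.12, 0.15, 0.11, 0.20, 0.18]
    genus_b = [0.30, 0.33, 0.28, 0.41, 0.35]
    print(spearman(genus_a, genus_b))
```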

Figure 1: Top-20 heat map of June
Weka

     Once we have our six general equations and our correlation values, we are ready to use Weka to construct a prediction model. Our Weka workflow is split into two parts: regression analysis, which filters out the non-correlated pairs of bacteria, and cross validation, which determines the weighting each bacterial relationship has under different conditions.

Regression Analysis

     We first take advantage of the machine learning software’s classification ability, using the built-in regression analysis module to determine which pairs of bacteria are strongly correlated. To do this we define coefficient values below -0.7 as truly negatively correlated and coefficient values above +0.7 as truly positively correlated; pairs with a value in between are ignored. Weka then separates the truly correlated pairs from the rest; these are the bacteria that will change as an indirect effect of bio-stimulator application. We start with one genus of bacteria and assess the correlation coefficient it has with every other genus in the soil. Any pairs with significant correlation are collected into a fold belonging to that genus. Once all pairs are assessed, the resulting fold should contain all the bacteria that are correlated with our starting genus.
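
     The filtering step described above can be sketched in a few lines of Python. This is an illustration of the 0.7 threshold rule only, not the Weka module itself; the function name, the dictionary layout and the example values are assumptions.

```python
THRESHOLD = 0.7  # cut-off stated in the text


def build_folds(correlations, threshold=THRESHOLD):
    """Group strongly correlated partners under each genus.

    correlations maps a (genus_a, genus_b) tuple to its Spearman value.
    Pairs with a value strictly below -threshold or strictly above
    +threshold are kept; everything in between is ignored.
    """
    folds = {}
    for (a, b), rho in correlations.items():
        if abs(rho) > threshold:
            folds.setdefault(a, []).append((b, rho))
            folds.setdefault(b, []).append((a, rho))
    return folds


if __name__ == "__main__":
    # Hypothetical correlation values between placeholder genera.
    example = {
        ("genus_1", "genus_2"): 0.82,
        ("genus_1", "genus_3"): -0.75,
        ("genus_2", "genus_3"): 0.30,
    }
    for genus, partners in build_folds(example).items():
        print(genus, partners)
```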

     For every pair in a given fold, Weka plots that pair’s data on a graph and fits a regression curve to describe their relationship.
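
     The text does not say which family of curve is fitted, so the sketch below assumes the simplest case, an ordinary least-squares straight line, purely for illustration. The function name and the sample data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


if __name__ == "__main__":
    # Hypothetical abundances of one genus (x) against a correlated genus (y).
    x = [0.10, 0.12, 0.15, 0.18, 0.20]
    y = [0.31, 0.33, 0.38, 0.42, 0.45]
    slope, intercept = fit_line(x, y)
    print(f"y = {slope:.3f} * x + {intercept:.3f}")
```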

Cross Validation

     The resulting curve is the theoretical relationship between the two bacteria; however, soil conditions vary widely between samples and may alter that relationship. To account for this, Weka assigns a weight to each regression curve by performing cross validation, in this case with three folds. The steps are as follows:

     Three folds of three different genera of bacteria are compared in pairs to determine the accuracy of each pair’s correlational relationship.

     If they exhibit a relationship in line with Weka’s initial assessment, nothing changes and the pair keeps its assigned weight.

     If they show unexpected associations, they are said to exhibit paradox. Paradox alerts Weka to the discrepancy between prediction and reality, causing it to adjust the weighting accordingly.

     Through this cross validation Weka calibrates the weighting of each pair and, after analyzing all folds, can predict how the genera of an entire microbiota relate to one another. The result can be expressed in a pie chart describing the predicted microbial ratios.
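
     For readers unfamiliar with the technique, the sketch below shows generic three-fold cross validation of a fitted line: each fold of samples is held out in turn, the line is fitted on the rest, and the error on the held-out samples is recorded. It is not a reproduction of our Weka configuration. In particular, the straight-line model and the rule that turns the held-out error into a weight, 1 / (1 + error), are assumptions made only for this example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def k_fold_indices(n, k=3):
    """Split sample indices 0..n-1 into k interleaved folds."""
    return [list(range(start, n, k)) for start in range(k)]


def cross_validated_error(x, y, k=3):
    """Mean squared error on held-out samples across the k folds."""
    squared_errors = []
    for held_out in k_fold_indices(len(x), k):
        held = set(held_out)
        train_x = [x[i] for i in range(len(x)) if i not in held]
        train_y = [y[i] for i in range(len(y)) if i not in held]
        slope, intercept = fit_line(train_x, train_y)
        for i in held_out:
            squared_errors.append((y[i] - (slope * x[i] + intercept)) ** 2)
    return sum(squared_errors) / len(squared_errors)


def pair_weight(x, y, k=3):
    """Hypothetical weighting rule: lower held-out error, higher weight."""
    return 1.0 / (1.0 + cross_validated_error(x, y, k))


if __name__ == "__main__":
    # Hypothetical paired measurements for two correlated genera.
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    y = [3.1, 4.9, 7.2, 8.8, 11.1, 13.0]
    print(pair_weight(x, y))
```

     A pair whose relationship holds up on held-out samples keeps a weight near 1, while a pair that behaves unexpectedly (the paradox case above) receives a lower weight.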

Artificial Intelligence

     Once our initial model is complete we can begin to make rough predictions about microbiota changes based on a volume of bio-stimulator. The basic rules we established regarding different soil conditions point us in the right direction in terms of bacterial shifts, but to achieve truly precise control over soil we must improve our prediction accuracy through artificial intelligence. This step feeds measured data back into our system; more data allows for more calibration and more cross validation, adapting our predictions to the specific nature of our soil sample and improving the accuracy of subsequent predictions.

Model Learning

     We began by generating our model using NGS data from April through June. We entered a volume of bio-stimulator as well as NGS data from before and after application, thus generating an initial prediction model. The accuracy of our model with only one month of data was approximately 21%, while including a second month’s data increased accuracy to 51%; at this point, if we were to predict results for June, we would get about 51% of the total microbiota correct. We then applied bio-stimulator to our soil again and waited for new data. Calibrating the model with the June data raised its prediction accuracy by more than 25 percentage points, giving a microbiota prediction model with 78% accuracy.

Conclusion

     Increasingly accurate prediction of microbial shifts due to bio-stimulators is a vital element of our smart farming system. Our goal is to regulate soil microbiota precisely, and we need accurate models to do so. Machine learning and artificial intelligence can provide just that. A general model built from established relationships between key environmental factors and bacterial growth is supported by correlation values calculated from NGS data, allowing rough initial predictions of microbial shifts. Raw data obtained after subsequent applications of bio-stimulators is fed back into our model, calibrating the weighting of each bacterial correlation to improve accuracy with each cycle. With increasingly precise regulation we aim to steer soil microbiota toward a desired effect. Visit our real farm demonstration to find out how we use artificial intelligence to increase curcumin concentration in turmeric while maintaining soil health.
