Artificial intelligence and machine learning let us pursue the seemingly insurmountable goal of predicting how an entire microbiota fluctuates in response to bio-stimulators. While traditional ecologists may find it impractical to consider every microbial relationship in an ecosystem, machine learning programs use numerical analysis not only to determine these associations quickly but also to use them to predict the overall population shifts caused by stimuli. For our modelling we chose Weka, a machine learning software package with strong classification capabilities, to establish the connections between the genera of bacteria in our soil.
Revision as of 03:09, 18 October 2018
![](https://static.igem.org/mediawiki/2018/1/17/T--NCTU_Formosa--Microbiota_Prediciton.png)
To begin modelling the relationship between bio-stimulators and microbiota, we first determined the most important factors that affect bacterial growth in soil. Three conditions immediately came to mind: temperature, pH and salinity, whose effects are modelled by the respective equations below:
$$\text{Ratkowsky equation:}\quad R_{temp}(T)=a\cdot[(T-T_{min})\cdot(1-e^{b\cdot(T-T_{max})})]^2$$
$$\text{Cardinal pH equation:}\quad R_{pH}(pH)=\frac{c\cdot(pH-pH_{min})\cdot(pH-pH_{max})}{d\cdot((pH-pH_{min})\cdot(pH-pH_{max})-e\cdot(pH-pH_{opt})^2)}$$
$$\text{Salinity equation:}\quad R_{sal}(sal)=f\cdot sal^2+g\cdot sal+h$$
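As a minimal sketch, the three equations can be evaluated directly once their constants are known. The function names and every numeric value below are illustrative assumptions, not fitted parameters from our data; the constants a through h and the cardinal values would have to come from literature or from fitting.

```python
import math


def ratkowsky(T, a, b, T_min, T_max):
    """R_temp(T) = a * [(T - T_min) * (1 - exp(b * (T - T_max)))]^2"""
    return a * ((T - T_min) * (1.0 - math.exp(b * (T - T_max)))) ** 2


def cardinal_ph(pH, c, d, e, pH_min, pH_max, pH_opt):
    """R_pH(pH) as written above; `e` is the model constant, not Euler's number."""
    span = (pH - pH_min) * (pH - pH_max)
    return (c * span) / (d * (span - e * (pH - pH_opt) ** 2))


def salinity(sal, f, g, h):
    """R_sal(sal) = f * sal^2 + g * sal + h"""
    return f * sal ** 2 + g * sal + h


# Hypothetical constants, chosen only to make the example run.
growth_temp = ratkowsky(25.0, a=1e-4, b=0.1, T_min=5.0, T_max=40.0)
growth_ph = cardinal_ph(6.5, c=1.0, d=1.0, e=1.0, pH_min=4.0, pH_max=9.0, pH_opt=7.0)
growth_sal = salinity(0.5, f=-0.2, g=0.1, h=1.0)
```

By construction the temperature term vanishes at both T_min and T_max, and with c = d the pH term equals 1 at pH_opt, which makes these boundary values a convenient sanity check when fitting.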
These factors heavily influence bacterial fluctuation in any environment and are especially important in determining soil microbiota, according to soil expert Professor Young of National Chung Hsing University. Professor Young also suggested we consider how nitrogen, phosphorus and potassium affect soil bacteria, because fertilizers containing these vital macronutrients are regularly applied to farm soil. To take these elements into account, we collected literature discussing their impact on bacterial levels and found the following functions:
These equations model the direct relationship between the levels of the elements and the levels of bacteria in soil, specifically the bacteria that metabolize those elements.
Combining these general equations gives a rough estimate of how our microbiota will change as these factors fluctuate. The universal factors (temperature, pH and salinity) help model general fluctuations in the microbiota, while the more specific factors (nitrogen, phosphorus and potassium) help predict how the amount of nutrient-metabolizing bacteria changes in response to fertilizers. But how do we deduce the change in the level of bacteria that are unaffected by these nutrients? We turn to our NGS analysis to find the missing link.
From our NGS report we calculate the Spearman correlation coefficient for each pair of genera in our soil. This coefficient takes a value between -1 and +1 and describes the degree of correlation within each pair: values closer to -1 represent stronger negative correlation and values closer to +1 represent stronger positive correlation. Correlation values between the 20 most abundant bacterial genera in our soil samples are shown in the following heat map.
![](https://static.igem.org/mediawiki/2018/9/9a/T--NCTU_Formosa--June_heatmap.png)
Figure 1: Top-20 heat map of June
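To illustrate how such a coefficient is obtained, here is a minimal standard-library sketch of the Spearman correlation: each abundance series is converted to ranks (ties share their average rank) and the Pearson correlation of the ranks is taken. The genus labels and abundance values are hypothetical placeholders, not our NGS data.

```python
def ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) for a constant series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman coefficient = Pearson correlation of the ranked series."""
    return pearson(ranks(x), ranks(y))


def correlation_matrix(abundance):
    """Map every ordered pair of genera to its Spearman coefficient."""
    genera = sorted(abundance)
    return {(g1, g2): spearman(abundance[g1], abundance[g2])
            for g1 in genera for g2 in genera}


# Hypothetical relative abundances across five samples.
abundance = {
    "genus_A": [0.10, 0.12, 0.15, 0.20, 0.22],
    "genus_B": [0.30, 0.28, 0.25, 0.21, 0.18],
    "genus_C": [0.05, 0.09, 0.04, 0.08, 0.06],
}
matrix = correlation_matrix(abundance)
```

A heat map like Figure 1 is then a colour-coded rendering of this matrix restricted to the most abundant genera.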
Once we have our six general equations and our correlation values, we are ready to begin using Weka to construct a prediction model. Our use of Weka is split into two parts: regression analysis to filter out the non-correlated pairs of bacteria, and cross validation to determine the weighting each bacterial relationship has under different conditions.
We first take advantage of the machine learning software's classification ability, using the built-in regression analysis module to determine which pairs of genera are strongly correlated. To do this we define coefficient values below -0.7 as truly negatively correlated and coefficient values above +0.7 as truly positively correlated; pairs with a value in between are ignored. Weka then separates the truly correlated pairs from the rest; these are the bacteria that will change as an indirect effect of bio-stimulator application. We start with one genus of bacteria and assess its correlation coefficient with every other genus in the soil. Any pairs with significant correlation are collected into a fold belonging to that genus. Once all pairs are assessed, the resulting fold should contain all the genera that are correlated with our starting genus.
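The filtering step described above can be sketched in a few lines. This is an illustration of the ±0.7 cutoff and the per-genus grouping only, not Weka's own implementation; the function name, genus labels and coefficients are hypothetical.

```python
THRESHOLD = 0.7  # cutoff from the text: |coefficient| above 0.7 counts as truly correlated


def correlated_groups(corr, threshold=THRESHOLD):
    """Group each genus with the genera it is truly correlated with.

    `corr` maps a pair of genus names to their correlation coefficient.
    Pairs whose coefficient lies between -threshold and +threshold are ignored.
    """
    groups = {}
    for (g1, g2), rho in corr.items():
        if g1 == g2:
            continue  # a genus is trivially correlated with itself
        if abs(rho) > threshold:
            groups.setdefault(g1, set()).add(g2)
            groups.setdefault(g2, set()).add(g1)
    return groups


# Hypothetical coefficients for three genera.
corr = {
    ("genus_A", "genus_B"): 0.91,
    ("genus_A", "genus_C"): -0.82,
    ("genus_B", "genus_C"): 0.15,
}
groups = correlated_groups(corr)
```

Here `groups["genus_A"]` plays the role of the fold belonging to genus_A: it holds every genus whose coefficient with genus_A passed the cutoff, whether positive or negative.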
For every pair in a given fold, Weka plots that pair's data on a graph and fits a regression curve that describes their relationship.
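The form of the regression curve is not specified here, so the sketch below assumes the simplest case, an ordinary least-squares straight line relating the abundance of one genus to another. The function names and data are hypothetical, and Weka's regression schemes may fit a different model.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Fraction of the variance in y explained by the fitted line."""
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical abundances of two correlated genera across four samples.
genus_a = [1.0, 2.0, 3.0, 4.0]
genus_b = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(genus_a, genus_b)
fit_quality = r_squared(genus_a, genus_b, slope, intercept)
```

A negative slope corresponds to a negatively correlated pair, and the R² value gives a first indication of how well a single curve describes the pair before any cross validation.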
The resulting curve is the theoretical relationship between the two bacteria; however, soil conditions vary widely between samples and may alter that relationship. To account for this, Weka assigns a weight to each regression curve by performing cross validation, in this case with three folds. The steps are as follows:
Three folds of three different genera of bacteria are compared in pairs to determine the accuracy of each pair’s correlational relationship.
If they exhibit a relationship in line with Weka’s initial assessment, nothing changes and the pair keeps its assigned weight.
If they show unexpected associations, they are said to exhibit paradox. Paradox alerts Weka to the discrepancy between prediction and reality, causing it to adjust the weighting accordingly.
Through this cross validation Weka calibrates the weighting of each pair and, after analyzing all folds, can predict how an entire microbiota is related. The result can be expressed in a pie chart describing the predicted microbial ratios.
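For readers unfamiliar with the technique, the sketch below shows standard three-fold cross validation of a straight-line fit: the samples are split into three folds, the line is fitted on two folds and its error is measured on the held-out fold. This is the textbook procedure only. It does not reproduce the weighting and paradox adjustment described above, whose exact rules are not given, and all names and data are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def k_fold_rmse(x, y, k=3):
    """Return the held-out root-mean-square error for each of k folds.

    Sample i is assigned to fold i % k, so every sample is held out exactly once.
    """
    n = len(x)
    errors = []
    for fold in range(k):
        test_idx = [i for i in range(n) if i % k == fold]
        train_idx = [i for i in range(n) if i % k != fold]
        slope, intercept = fit_line([x[i] for i in train_idx],
                                    [y[i] for i in train_idx])
        squared = [(y[i] - (slope * x[i] + intercept)) ** 2 for i in test_idx]
        errors.append((sum(squared) / len(squared)) ** 0.5)
    return errors


# Hypothetical abundances of a correlated pair across six samples.
genus_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
genus_b = [3.1, 4.9, 7.2, 8.8, 11.1, 12.9]
fold_errors = k_fold_rmse(genus_a, genus_b, k=3)
```

A pair whose held-out error stays small in every fold behaves consistently across samples; one plausible use of these errors, under the assumptions above, is to give such pairs more weight than pairs whose error varies strongly between folds.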
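Once our initial model is complete, we can begin to make rough predictions about microbiota changes based on a volume of bio-stimulator. The basic rules we established regarding different soil conditions point us in the right direction in terms of bacterial shifts, but to achieve truly precise control over soil we must improve our prediction accuracy through artificial intelligence. Artificial intelligence feeds actual data back into our system; more data allows for more calibration and more cross validations, adapting our predictions to the specific nature of our soil sample and improving the accuracy of subsequent predictions.
We began by generating our model using NGS data from April through June. We entered a volume of bio-stimulator as well as NGS data from before and after application, generating an initial prediction model. With only one month of data the accuracy of our model was approximately 21%, while including a second month's data increased accuracy to 51%. At this point, if we were to predict results for June, we would get about 51% of the total microbiota correct. We again applied bio-stimulator to our soil and waited for our data. Using the June data to calibrate our model increased its prediction accuracy by more than another 25 percentage points, resulting in a microbiota prediction model with 78% accuracy.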
Increasingly accurate prediction of microbial shifts due to bio-stimulators is a vital element of our smart farming system. Our goal is to regulate soil microbiota precisely, and we need accurate models to do so. Machine learning and artificial intelligence can provide just that. A general model built from established relationships between key environmental factors and bacterial growth is supported by correlation values calculated from NGS data, allowing rough initial predictions of microbial shifts. Raw data obtained after subsequent applications of bio-stimulators is reintroduced into our models through a feedback system, calibrating the weighting of each bacterial correlation to improve accuracy with each cycle. With increasingly precise regulation we can manipulate soil microbiota to produce a desired effect. Visit our real farm demonstration to find out how we use artificial intelligence to increase curcumin concentration in turmeric while maintaining soil health.