Difference between revisions of "Team:NCTU Formosa/Dry Lab/NGS Data Analysis"

Latest revision as of 18:33, 7 December 2018


Correlation Model

To train our model more effectively, we use next-generation sequencing (NGS) of the 16S rRNA gene to analyze the soil microbiota. We spray biostimulators into the soil to influence the entire microbiota and perform 16S NGS at regular intervals. By analyzing multiple NGS datasets, we can determine how the bacteria relate to one another and use these relationships to guide the next application of biostimulators. Training the model through this iterative process lets us refine our system again and again, so that it can predict changes in the soil microbiota accurately.

What is NGS 16S?

Below, we explain NGS and 16S rRNA separately.

Next Generation Sequencing

NGS (next-generation sequencing) refers to techniques that sequence very large numbers of DNA fragments in parallel, so that whole genomes can be sequenced in a short time. Three well-known platforms are Solexa from Illumina, SOLiD from ABI, and 454 from Roche. Their chemistries differ: Solexa uses sequencing by synthesis with reversible terminators, SOLiD uses sequencing by ligation, and 454 uses pyrosequencing. Taking Solexa as an example, sequencing proceeds in the following steps:

(1) Shear the original DNA into fragments of about 200-500 base pairs using ultrasound (sonication), then attach adapters to both ends of each fragment.

(2) Load the fragments onto a flow cell whose surface carries oligonucleotides complementary to the adapters. The adapters hybridize to these surface sequences, which anchors the DNA fragments to the flow cell.

(3) Amplify each anchored fragment in place by bridge amplification, forming clusters of identical copies.

(4) Sequence by synthesis, a method related to Sanger sequencing: in each cycle, DNA polymerase adds one of the four nucleotides (dNTPs), each carrying a base-specific, removable fluorescent label and a reversible terminator. The fluorescence is imaged to identify the incorporated base, and the label and terminator are then removed so that the next cycle can begin. After this cycle has been repeated many times, software processes the large number of resulting DNA sequences.

FastQC

FastQC is used to check the quality of NGS data. Checking data quality before analysis is essential, because the analysis should continue only when the data quality is high enough.
After we input the NGS sequence data into FastQC, the program analyzes it automatically and reports quality scores for the sequences, so that we can confirm the data are suitable for computational analysis.
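The quality scores that FastQC summarizes are Phred scores stored in the FASTQ file, one ASCII character per base. The sketch below is not FastQC's code; it only illustrates, assuming the common Phred+33 encoding used by current Illumina pipelines, how a quality character becomes a score Q and an error probability P, using the Phred definition Q = -10 log10(P). The quality string in the example is made up.

```python
def phred_scores(quality, offset=33):
    """Convert an ASCII quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Invert Q = -10 * log10(P) to get the probability that a base call is wrong."""
    return 10 ** (-q / 10)


def mean_quality(quality, offset=33):
    """Average Phred score of one read."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    qual = "IIIIHHGG##"                 # hypothetical quality line of one read
    print(phred_scores(qual))           # [40, 40, 40, 40, 39, 39, 38, 38, 2, 2]
    print(error_probability(30))        # Q30 means a 1-in-1000 chance of a wrong call
    print(round(mean_quality(qual), 1)) # 31.8
```

A read like this one, with high scores that collapse at the 3' end, is the kind of pattern a per-base quality plot makes visible before analysis continues.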

16S rRNA

16S rRNA is an important component of the small (30S) ribosomal subunit of prokaryotes. Its gene contains several conserved regions and nine hypervariable regions (V1 to V9). Because the hypervariable regions are genus- or species-specific, they are considered among the most suitable markers for bacterial phylogeny and taxonomic identification. 16S NGS uses the sequences of the V4 and V5 hypervariable regions to identify the bacterial groups present.

Sampling

We divide the agricultural land into four large blocks, A, B, C, and D. Each block contains three strips, 1, 2, and 3, and each strip is divided into three sections, T, M, and D. This gives thirty-six samples in total (A1T, A2M, ...). We collect 50 µL of soil per sample at a depth of 10 to 15 cm near the roots of each test plant. The samples are then sent to a sequencing company for NGS analysis.
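As a quick check of the sample count (4 blocks × 3 strips × 3 sections = 36), the sketch below enumerates every label. The label format `<block><strip><section>` follows the examples in the text (A1T, A2M); the function name and the ordering of sections are illustrative assumptions.

```python
from itertools import product

BLOCKS = ["A", "B", "C", "D"]   # four large blocks
STRIPS = ["1", "2", "3"]        # three strips per block
SECTIONS = ["T", "M", "D"]      # three sections per strip


def sample_ids():
    """Return every sample label as <block><strip><section>, e.g. 'A1T' (hypothetical helper)."""
    return [b + s + p for b, s, p in product(BLOCKS, STRIPS, SECTIONS)]


ids = sample_ids()
print(len(ids))   # 36
print(ids[:4])    # ['A1T', 'A1M', 'A1D', 'A2T']
```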

The results give the relative abundance of each bacterium in each sample and are reported as an OTU table (Fig. 1).

Marker Gene Amplicon Analysis

Our microbiome data are generated from the 16S ribosomal RNA (rRNA) gene. The PCR primers were designed to amplify the V4 region of the bacterial 16S ribosomal DNA. After 16S rRNA sequencing, we used QIIME to generate an operational taxonomic unit (OTU) table. We then used bioinformatics tools and statistical methods to analyze the microbial diversity of the soil samples, and machine learning to predict how the soil microbiota changes when biostimulators are added.

Operational Taxonomic Unit Table (OTU Table)

Figure 1 is an example of an OTU table. It lists the type and amount of each bacterium (OTU1, OTU2, …, OTU7) in each soil sample (A1, A2, A3, B1, B2, and B3). We generate seven tables, one for each taxonomic level: Kingdom, Phylum, Class, Order, Family, Genus, and Species.

Figure 1: Schematic diagram of an operational taxonomic unit (OTU) table

Data Analysis Process

Figure 2: The process of correlation analysis

The OTU tables produced by the open-source QIIME pipeline contain unclassified entries. We therefore rearrange the data to facilitate analysis using the following steps:

(1) Delete the unclassified entries.
(2) Calculate the ratios of the remaining entries.
(3) Select the most abundant bacteria in the soil samples and observe their distribution using stacked bar charts (Fig. 3, Fig. 4, Fig. 5).
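Steps (1) to (3) can be sketched as follows. This is not the QIIME pipeline itself: it assumes a hypothetical OTU table stored as a dictionary mapping each taxon name to its read counts per sample, and it assumes unclassified entries can be recognized by a marker such as "unclassified" in the name. Ranking taxa by their mean proportion across samples is also an assumption, since the text does not say how abundance is ranked. The function additionally reports how much of the classified reads the selected taxa cover and the largest share of any taxon left out.

```python
def drop_unclassified(table, markers=("unclassified", "unassigned")):
    """Step 1: remove entries whose name contains an (assumed) unclassified marker."""
    return {name: counts for name, counts in table.items()
            if not any(m in name.lower() for m in markers)}


def to_ratios(table):
    """Step 2: convert counts to per-sample proportions of the remaining reads."""
    n_samples = len(next(iter(table.values())))
    totals = [sum(counts[j] for counts in table.values()) for j in range(n_samples)]
    return {name: [c / totals[j] if totals[j] else 0.0 for j, c in enumerate(counts)]
            for name, counts in table.items()}


def top_n(ratios, n=20):
    """Step 3: keep the n taxa with the highest mean proportion across samples.

    Returns (kept names, share of classified reads they cover, largest share left out).
    """
    mean = {name: sum(r) / len(r) for name, r in ratios.items()}
    ranked = sorted(mean, key=mean.get, reverse=True)
    kept = ranked[:n]
    coverage = sum(mean[name] for name in kept)
    max_rest = max((mean[name] for name in ranked[n:]), default=0.0)
    return kept, coverage, max_rest


if __name__ == "__main__":
    # Hypothetical counts for two samples; not our sequencing data.
    otu = {"Nitrospira": [30, 50], "Kaistobacter": [40, 20], "Bacillus": [10, 10],
           "Unclassified": [100, 100], "Devosia": [20, 20]}
    kept, coverage, max_rest = top_n(to_ratios(drop_unclassified(otu)), n=2)
    print(kept)                                    # ['Nitrospira', 'Kaistobacter']
    print(round(coverage, 2), round(max_rest, 2))  # 0.7 0.2
```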

After making the stacked bar charts, we organized the data for analysis (Table 2). We observed which bacteria increased or declined in the soil and then grouped the bacteria by function. We also interpret how our activities on the farm affect the microbiota. For example, Sphingomonas, Alcanivorax, and Devosia, which are pollution-indicator bacteria, decreased continuously from May to July. We speculate that they declined because no herbicides, pesticides, or other pollutants were applied during these three months: as the soil recovered on its own, pollutant levels fell, and the pollution-indicator bacteria fell with them. Through this series of analyses, we hope to demonstrate the basic hypothesis of our project, namely that we can precisely regulate the soil microbiota using biostimulators. The details of the analysis are given in our demonstration.

Figure 3: Stacked bar chart of top-20 bacteria ratio in different samples (May)

Figure 4: Stacked bar chart of top-20 bacteria ratio in different samples (June)

Figure 5: Stacked bar chart of top-20 bacteria ratio in different samples (July)

Spearman's Rank Correlation

We evaluated the strength of co-occurrence between bacteria in the soil samples using Spearman's rank correlation coefficient, which ranges from -1 to 1. It is calculated as follows:

$$\rho_s=1-\frac{6\sum_{i=1}^{n} d_i^2}{n(n^2-1)}$$

Table 1: Variables and parameters in the Spearman correlation equation.
Symbol	Unit	Explanation
$\rho_s$	-	Spearman's rank correlation coefficient
$d_i$	-	Difference between the two ranks of the $i$-th observation
$n$	-	Number of observations (samples)
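To make the formula concrete, the sketch below ranks two series and evaluates $\rho_s$ in two ways: with the $d_i^2$ formula, which is exact only when there are no tied values, and as the Pearson correlation of the ranks, which is the general definition and also handles ties (tied values share their average rank). The function names are illustrative; in practice a library routine such as SciPy's `scipy.stats.spearmanr` would normally be used.

```python
def average_ranks(values):
    """Rank values 1..n; tied values receive the average of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1            # sorted positions i..j share ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_formula(x, y):
    """rho_s = 1 - 6 * sum(d_i^2) / (n (n^2 - 1)); exact only without ties."""
    n = len(x)
    rx, ry = average_ranks(x), average_ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def spearman_pearson(x, y):
    """General form: Pearson correlation of the ranks (fails if a series is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]
    y = [2, 1, 4, 3, 5]   # d = [-1, 1, -1, 1, 0], so sum(d^2) = 4
    print(round(spearman_formula(x, y), 3), round(spearman_pearson(x, y), 3))  # 0.8 0.8
```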

We used correlation heat maps to visualize the strength of the correlations. Figures 3, 4, and 5 show the 20 most abundant bacteria in the soil samples in different months. We selected the top 20 because they accounted for over 95 percent of the known bacteria at the phylum and class levels. At the genus level, the share of the original top 20 fell to about 75% of the known bacteria in some months, but each remaining bacterium still accounted for less than 1%. Based on this analysis, the effect of the bacteria ranked below 20th appeared negligible. A computer program then visualizes the results as a heat map. The heat map for the 20 most abundant bacteria in our soil is shown below:

Figure 6: Correlation heat map of top-20 bacteria in June

For example, the figure above shows the correlation heat map for June. We can use the June heat map to select candidate bacteria for predicting the July microbiota, because Weka uses the correlations among the bacteria in June to simulate the microbiota in July. The heat map also makes the Spearman correlation easy to read: when the proportions of two bacteria increase together, the cell showing their correlation is red; conversely, when one bacterium increases while the other decreases, the cell is blue. We treated a correlation as meaningful only when its coefficient was greater than 0.7 in absolute value, so we selected every pair of bacteria whose correlation coefficient was greater than 0.7 or less than -0.7 as correlated pairs. Our system's predictions will inevitably have exceptions, because the Spearman coefficient reflects only the bacterial ratios in the soil samples we collected that month and cannot capture every process in nature, which leads to discrepancies. However, because pairs with low correlation coefficients are excluded, such discrepancies should not add error to our Weka training.
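The pair-selection rule above can be sketched as follows. This is not the Weka workflow; it only computes the pairwise Spearman matrix that a heat map would color and applies the |ρ| > 0.7 cutoff stated in the text. The taxon names and abundance values in the example are hypothetical, and each taxon is assumed to be stored as a list of its relative abundances across one month's samples.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values 1..n; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var = sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry)
    return cov / var ** 0.5


def correlation_matrix(profiles):
    """Pairwise rho for every pair of taxa: the values a heat map would color."""
    names = sorted(profiles)
    return names, [[spearman(profiles[a], profiles[b]) for b in names] for a in names]


def correlated_pairs(profiles, threshold=0.7):
    """Keep pairs whose rho is greater than +threshold or less than -threshold."""
    pairs = []
    for a, b in combinations(sorted(profiles), 2):
        rho = spearman(profiles[a], profiles[b])
        if abs(rho) > threshold:
            pairs.append((a, b, round(rho, 3)))
    return pairs


if __name__ == "__main__":
    profiles = {                      # hypothetical abundances in five samples
        "Taxon_A": [1, 2, 3, 4, 5],
        "Taxon_B": [2, 1, 4, 3, 5],
        "Taxon_C": [5, 4, 3, 2, 1],
        "Taxon_D": [3, 1, 5, 2, 4],
    }
    print(correlated_pairs(profiles))
    # [('Taxon_A', 'Taxon_B', 0.8), ('Taxon_A', 'Taxon_C', -1.0),
    #  ('Taxon_B', 'Taxon_C', -0.8), ('Taxon_B', 'Taxon_D', 0.8)]
```

Taxon_A and Taxon_D (ρ = 0.3) and Taxon_C and Taxon_D (ρ = -0.3) fall inside the cutoff and are dropped, which is how weakly correlated pairs stay out of the training input.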

Table 2: Relative abundance of the top-20 bacterial genera in June and July.
Taxon (Genus)	June	July
Kaistobacter	10.87%	11.57%
Nitrospira	8.84%	18.03%
Steroidobacter	7.05%	3.20%
Rhodoplanes	6.50%	10.36%
Candidatus Solibacter	5.63%	3.81%
Candidatus Koribacter	4.36%	4.53%
Pseudomonas	3.56%	5.14%
Perlucidibaca	3.03%	1.06%
Flavisolibacter	2.71%	3.01%
Hyphomicrobium	2.37%	1.00%
Bdellovibrio	2.03%	2.11%
Bacillus	1.96%	0.84%
Streptomyces	1.87%	0.41%
Rhodanobacter	1.84%	0.11%
Flavobacterium	1.55%	12.12%
Opitutus	1.47%	1.02%
Devosia	1.46%	0.72%
Mycobacterium	1.44%	0.18%
Phenylobacterium	1.37%	0.87%
Bradyrhizobium	1.06%	0.42%
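Using the values in Table 2, the shift of each genus between June and July can be computed directly. The snippet below copies the table into a dictionary and ranks genera by their change in percentage points; the variable and function names are illustrative only.

```python
# Relative abundance (%) copied from Table 2: genus -> (June, July).
TABLE_2 = {
    "Kaistobacter": (10.87, 11.57), "Nitrospira": (8.84, 18.03),
    "Steroidobacter": (7.05, 3.20), "Rhodoplanes": (6.50, 10.36),
    "Candidatus Solibacter": (5.63, 3.81), "Candidatus Koribacter": (4.36, 4.53),
    "Pseudomonas": (3.56, 5.14), "Perlucidibaca": (3.03, 1.06),
    "Flavisolibacter": (2.71, 3.01), "Hyphomicrobium": (2.37, 1.00),
    "Bdellovibrio": (2.03, 2.11), "Bacillus": (1.96, 0.84),
    "Streptomyces": (1.87, 0.41), "Rhodanobacter": (1.84, 0.11),
    "Flavobacterium": (1.55, 12.12), "Opitutus": (1.47, 1.02),
    "Devosia": (1.46, 0.72), "Mycobacterium": (1.44, 0.18),
    "Phenylobacterium": (1.37, 0.87), "Bradyrhizobium": (1.06, 0.42),
}


def changes(table):
    """Percentage-point change from June to July for each genus, rounded to 2 decimals."""
    return {genus: round(july - june, 2) for genus, (june, july) in table.items()}


delta = changes(TABLE_2)
rising = sorted((g for g in delta if delta[g] > 0), key=delta.get, reverse=True)
falling = sorted((g for g in delta if delta[g] < 0), key=delta.get)

print(rising[:3])    # ['Flavobacterium', 'Nitrospira', 'Rhodoplanes']
print(falling[:3])   # ['Steroidobacter', 'Perlucidibaca', 'Candidatus Solibacter']
```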

Alpha-Diversity Analysis

Using biostimulators to manipulate soil factors requires careful attention to the microbiota. Certain stimulators may cause specific bacterial genera to become overly dominant, damaging soil integrity. To monitor the balance of the microbial ecosystem, we investigate the evenness of the soil microbiota.

Evenness: Shannon Index

We measure the microbial diversity within each sample by alpha diversity (α-diversity). In our study, α-diversity refers to richness and the Shannon diversity index. Richness is the number of OTUs, and the evenness of the bacterial community is measured by the Shannon diversity index, defined as:

$$H'=-\sum_{i=1}^{S}p_i\ln p_i$$

Table 3: Variables and parameters in the Shannon index equation.
Symbol	Unit	Explanation
$H'$	-	Shannon index
$S$	-	Total number of genera in the sample
$p_i$	-	Proportion of bacteria belonging to the $i$-th genus in the sample

For a given number of genera, a higher Shannon index indicates greater evenness. Taking the exponential of the index gives an estimate of the effective number of equally abundant genera. For example, a soil sample with a Shannon index of 2.85 gives $e^{2.85}\approx 17$, meaning the sample is about as diverse as one consisting of 17 equally abundant genera. The Shannon index can therefore serve as an observational tool for determining whether biostimulators decrease the overall evenness, and thus the health and stability, of the soil.
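The Shannon index and its exponential can be computed from a sample's genus counts as sketched below. The counts in the example are hypothetical; genera with zero counts are skipped, following the usual convention that $0 \ln 0 = 0$.

```python
import math


def shannon_index(counts):
    """H' = -sum(p_i * ln p_i) over genera with non-zero counts."""
    total = sum(counts)
    props = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in props)


def effective_number(h):
    """exp(H'): number of equally abundant genera that would give the same H'."""
    return math.exp(h)


if __name__ == "__main__":
    even = [25, 25, 25, 25]      # hypothetical: four genera, perfectly even
    skewed = [70, 10, 10, 10]    # hypothetical: same genera, one dominant
    print(round(shannon_index(even), 3))           # 1.386 (= ln 4)
    print(shannon_index(skewed) < shannon_index(even))  # True
    print(round(effective_number(2.85), 1))        # 17.3
```

Because both samples contain the same four genera, the lower value for the skewed sample reflects only its loss of evenness, which is the kind of change this index is meant to flag after biostimulator application.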

Triplicate Analysis

Figure 7: Box plot of the Shannon index from the triplicate analysis


Template

@@ Line 26: / Line 26: @@
        width: 100%;
        font-family: Levenim MT;
+      scroll-behavior: smooth;
      }
@@ Line 39: / Line 40: @@
      }
+    .title_title{
+      width: 40%;
+      margin-left: 30%;
+      margin-bottom: 2vw;
+      margin-top: 8vw;
+    }
      .title{
@@ Line 65: / Line 72: @@
        font-size: 4.5vmin;
        width: 70%;
-       margin: 50px;
+       margin-top: 4vw;
+      margin-bottom: 3vw;
        margin-left: 18%;
      }
@@ Line 114: / Line 122: @@
      caption{
-       font-size: 3vmin;
+       font-size: 2.5vmin;
        font-weight: bold;
        margin-bottom: 10px;
@@ Line 122: / Line 130: @@
      thead{
-       background: #1182be;
+       background: #142968;
        color: #fefefe;
-       font-size: 3.5vmin;
+       font-size: 3vmin;
      }
      tbody{
-       font-size: 3vmin;
+       font-size: 2.5vmin;
      }
@@ Line 188: / Line 196: @@
      .Spearman{
-       width: 35%;
+       width: 40%;
        position: relative;
-       left: 29%;
+       left: 26%;
        margin: 30px;
        margin-top: 0;
-       margin-bottom: 20px;
+       margin-bottom: 10px;
      }
@@ Line 206: / Line 214: @@
      .productivity{
-       width: 6.5%;
+       width: 7.5%;
-       top: 31.4vw;
+       top: 34.15vw;
-       left: 51.7%;
+       left: 49.37%;
      }
      .NGS{
-       width: 5.6%;
+       width: 7%;
-       top: 10.5vw;
+       top: 12.7vw;
-       left: 58.45%;
+       left: 55.9%;
      }
      .weka{
        width: 8%;
-       top: 20vw;
+       top: 22.85vw;
-       left: 57.3%;
+       left: 55.8%;
      }
      .growth{
-       width: 7%;
+       width: 7.5%;
-       top: 20.2vw;
+       top: 22.8vw;
-       left: 76.3%;
+       left: 74.7%;
      }
      .scoring{
-       width: 5.5%;
+       width: 7%;
-       top: 31vw;
+       top: 32.2vw;
-       left: 83%;
+       left: 80.8%;
      }
+    .spearman_table{
+      width: 30%;
+      margin-left: 35%;
+    }
+	  .barplot{
+	    width: 50%;
+	    margin-right: 27%;
+      margin-bottom: 1vw;
+      margin-top: 2vw;
+    }
    </style>
@@ Line 271: / Line 292: @@
    <div class="wrapper">
      <div class="banner">
-       <img class="cover" src="https://static.igem.org/mediawiki/2018/5/50/T--NCTU_Formosa--Model_inner_cover.png">
+       <img class="cover" src="https://static.igem.org/mediawiki/2018/f/f6/T--NCTU_Formosa--Drylab2.png">
-       <a href="https://2018.igem.org/Team:NCTU_Formosa/Dry_Lab/Model_4"><img src="https://static.igem.org/mediawiki/2018/a/ae/T--NCTU_Formosa--Productivity_icon.png" class="cover_icon productivity"></a>
+       <a href="https://2018.igem.org/Team:NCTU_Formosa/Dry_Lab/Productivity_Model"><img src="https://static.igem.org/mediawiki/2018/4/4d/T--NCTU_Formosa--Productivity5.png" class="cover_icon productivity"></a>
        <a href="https://2018.igem.org/Team:NCTU_Formosa/Dry_Lab/NGS_Data_Analysis"><img src="https://static.igem.org/mediawiki/2018/a/a9/T--NCTU_Formosa--NGS_icon.png" class="cover_icon NGS"></a>
-       <a href="https://2018.igem.org/Team:NCTU_Formosa/Dry_Lab/Weka"><img src="https://static.igem.org/mediawiki/2018/f/f3/T--NCTU_Formosa--weka_icon.png" class="cover_icon weka"></a>
+       <a href="https://2018.igem.org/Team:NCTU_Formosa/Dry_Lab/Microbiota_Prediciton"><img src="https://static.igem.org/mediawiki/2018/f/f3/T--NCTU_Formosa--weka_icon.png" class="cover_icon weka"></a>
-       <a href="https://2018.igem.org/Team:NCTU_Formosa/Dry_Lab/Model_1"><img src="https://static.igem.org/mediawiki/2018/5/58/T--NCTU_Formosa--growth_icon.png" class="cover_icon growth"></a>
+       <a href="https://2018.igem.org/Team:NCTU_Formosa/Dry_Lab/Growth_Model"><img src="https://static.igem.org/mediawiki/2018/5/58/T--NCTU_Formosa--growth_icon.png" class="cover_icon growth"></a>
        <a href="https://2018.igem.org/Team:NCTU_Formosa/Dry_Lab/Peptide_Prediction"><img src="https://static.igem.org/mediawiki/2018/8/84/T--NCTU_Formosa--scoring_card_icon.png" class="cover_icon scoring"></a>
      </div>
      <div class="sec1">
-       <div class="title"><p>Overview</p></div>
+      <img src="https://static.igem.org/mediawiki/2018/7/77/T--NCTU_Formosa--NGS_Data_Analysis.png" class="title_title">
+      <div class="text" style="width: 75%;">
+        <p>
+          &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To train our model more effectively, we use NGS(Next Generation Sequencing) 16S to analyze the microbiota in the soil. We spray biostimulators into the soil to affect the entire microbiota and use NGS 16S regularly. By analyzing multiple NGS data, we can determine the nature of the relationships about bacteria and take advantage of these characteristics to add biostimulators again. Through this process training our model, we adjust to our system again and again, allowing our system predicts the changes of microbiota in soil accurately.
+        </p>
+      </div>
+       <div class="title_1"><p>What is NGS 16S?</p></div>
        <div class="text">
          <p>
-           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Use of biostimulators impacts the entire microbiota. The final resulting bacterial distribution depends on interactions between bacteria and can be illustrated through correlation. From our initial NGS data we can determine the nature of the relationships between bacteria and use these properties to accurately predict how soil microbiota changes with addition of biostimulators using machine learning.
+           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Next, we will interpret NGS and 16S rRNA separately.
          </p>
        </div>
-       <img src="https://static.igem.org/mediawiki/2018/f/f5/T--NCTU_Formosa--Model_Analysis.png" class="correlation">
+       <div class="title_3" style="margin-left: 13%; margin-bottom: 1.5vw;"><p>Next Generation Sequencing</p></div>
-       <div class="explanation">
+      <div class="text">
-        <svg class="icon" aria-hidden="true" data-prefix="fas" data-icon="arrow-circle-up" class="svg-inline--fa fa-arrow-circle-up fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M8 256C8 119 119 8 256 8s248 111 248 248-111 248-248 248S8 393 8 256zm143.6 28.9l72.4-75.5V392c0 13.3 10.7 24 24 24h16c13.3 0 24-10.7 24-24V209.4l72.4 75.5c9.3 9.7 24.8 9.9 34.3.4l10.9-11c9.4-9.4 9.4-24.6 0-33.9L273 107.7c-9.4-9.4-24.6-9.4-33.9 0L106.3 240.4c-9.4 9.4-9.4 24.6 0 33.9l10.9 11c9.6 9.5 25.1 9.3 34.4-.4z"></path></svg>
+        <p>
-        Figure 1: The process of correlation analysis
+          &nbsp;&nbsp;&nbsp;NGS(Next Generation Sequencing) is a kind of technique to sequence a number of genomes in very short time. There are three main platforms on the market  currently: Solexa from Illumina, SOLiD from ABI, and 454 from Roche. The procedures of these three sequencers are different, but they are all based on chain termination.
+          Take Solixa for example, following these steps below:<br><br>
+          (1)	Use ultrasound to break original DNA sequences into fragments of about 200-500 base pairs, and then attach the adapters to both ends of the fragments.<br><br>
+          (2)	Place the DNA fragments on a flowcell with complementary adapter sequences on the surface. The adapters will adhere to each other to allow the DNA fragments stay at the flow cell.<br><br>
+          (3)	Amplify DNA fragments by bridge amplification.<br><br>
+          (4)	The sequencing uses the method like the Sanger sequencing, adding different bases (dNTPs) and synthetic reagents that have been calibrated for specific removable fluorescent molecules. Repeatedly the process of removing and detecting fluorescence. Last, the computer software will analysis large numbers of DNA sequences quickly.
+        </p>
+      </div>
+       <div class="title_3" style="margin-left: 13%; margin-bottom: 1.5vw;"><p>FASTQC</p></div>
+      <div class="text" style="margin-top: 1.5vw;">
+        <p>
+          &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;FASTQC is mainly used to filter the NGS data. It is very important to check the data quality before analyzing the data. Only when the data quality is high enough, the next step can be continued.<br>
+          &nbsp;&nbsp;&nbsp;&nbsp;After we input NGS sequence data into FASTQC, the program will analyze automatically and score each sequence to ensure that the quality of the gene sequences is suitable for computer calculation.
+        </p>
+      </div>
+      <div class="title_3" style="margin-left: 13%; margin-bottom: 1.5vw;"><p>16S rRNA</p></div>
+      <div class="text" style="margin-top: 1.5vw;">
+        <p>
+          &nbsp;&nbsp;&nbsp;16S rRNA is an important component of the ribosomal small subunit of prokaryote. The sequence contains several conserved regions and 9 hypervariable regions (V1 to V9). The hypervariable regions have genus or species specificity, considered to be the most suitable indicator for phylogeny of bacteria and identification of classification.  NGS 16S uses the sequence of V4 and V5 in the hypervariable regions to detect the bacterial clusters.
+        </p>
+      </div>
+      <div class="title_3" style="margin-left: 13%; margin-bottom: 1.5vw;"><p>Sampling</p></div>
+      <div class="text" style="margin-top: 1.5vw;">
+        <p>
+          &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We divide the agricultural land into four large blocks of A, B, C, and D. In each block, there are three strips of 1, 2, and 3, each of which is divided into T, M, and D. Thus, we have thirty six sample in total( A1T, A2M, ....). We get 50 micro liters per sample from 10 to 15 centimeters depth near the root of each testing plant. Then, the samples are sent to the company for NGS analysis.<br><br>
+          &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The result will present the each bacteria ratio in each samples and report it in an OTU table.(Fig. 1)<br>
+        </p>
+      </div>
+      <div class="title_1"><p>Marker Gene Amplicon Analysis</p></div>
+      <div class="text" style="margin-top: 1.5vw;">
+        <p>
+          &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Microbiome data are generated from 16S ribosomal RNA(rRNA) gene. The PCR primers were designed to amplify the V4 region of the bacterial 16S ribosomal DNA. After profiling 16S rRNA sequencing, we used QIIME to generate operational taxonomic units (OTUs) table. Then we used bioinformatics tools and statistics methods to analyze microbial diversity in soil samples. We also used machine learning to predict how soil microbiota changes with addition of bio-stimulators.
+        </p>
        </div>
 <!----------------------------------------------------------------------------->
-       <div class="title_1"><p>Original OTU Table</p></div>
+       <div class="title_1"><p>Operational Taxonomic Units Table (OTUs Table)</p></div>
        <div class="text">
          <p>
-           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;16s NGS uses the 16 small subunit of bacterial ribosomes to differentiate bacteria of different genera. Through this technology we obtain the bacterial distribution of our soil, summarized in the operational taxonomical unit table (OTU Table) below:
+           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Figure 1 is an example of OTUs table. Each column represents the type and amount of bacteria (OTU1, OTU2, …, OTU7) in each soil sample (A1, A2, A3, B1, B2, and B3). We generate seven tables for each level: Phylum, Class, Order, Family, Genus, and Species.
          </p>
        </div>
@@ Line 302: / Line 365: @@
        <div class="explanation">
          <svg class="icon" aria-hidden="true" data-prefix="fas" data-icon="arrow-circle-up" class="svg-inline--fa fa-arrow-circle-up fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M8 256C8 119 119 8 256 8s248 111 248 248-111 248-248 248S8 393 8 256zm143.6 28.9l72.4-75.5V392c0 13.3 10.7 24 24 24h16c13.3 0 24-10.7 24-24V209.4l72.4 75.5c9.3 9.7 24.8 9.9 34.3.4l10.9-11c9.4-9.4 9.4-24.6 0-33.9L273 107.7c-9.4-9.4-24.6-9.4-33.9 0L106.3 240.4c-9.4 9.4-9.4 24.6 0 33.9l10.9 11c9.6 9.5 25.1 9.3 34.4-.4z"></path></svg>
-         Figure 2: Schematic diagram of Operation Taxnomy Unit table
+         Figure 1: Schematic Diagram of Operation Taxnomy Unit table
+      </div>
+      <div class="title_1"><p>Data Analysis Process</p></div>
+      <img src="https://static.igem.org/mediawiki/2018/f/f5/T--NCTU_Formosa--Model_Analysis.png" class="correlation">
+      <div class="explanation">
+        <svg class="icon" aria-hidden="true" data-prefix="fas" data-icon="arrow-circle-up" class="svg-inline--fa fa-arrow-circle-up fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M8 256C8 119 119 8 256 8s248 111 248 248-111 248-248 248S8 393 8 256zm143.6 28.9l72.4-75.5V392c0 13.3 10.7 24 24 24h16c13.3 0 24-10.7 24-24V209.4l72.4 75.5c9.3 9.7 24.8 9.9 34.3.4l10.9-11c9.4-9.4 9.4-24.6 0-33.9L273 107.7c-9.4-9.4-24.6-9.4-33.9 0L106.3 240.4c-9.4 9.4-9.4 24.6 0 33.9l10.9 11c9.6 9.5 25.1 9.3 34.4-.4z"></path></svg>
+        Figure 2: The process of correlation analysis
        </div>
-      <div class="title_1"><p>Ratio Analysis</p></div>
        <div class="text">
          <p>
-           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;NGS data includes several unclassified entries consisting of incomplete genomic segments that don’t represent functional bacteria. We first delete these entries, then calculate the ratios of the remaining entries to produce the following bar chart:
+           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The OTUs tables will consist of unclassified names using the open source pipeline of QIIME. Thus, we have to rearrange the data to facilitate analysis according to the following steps:<br><br>
+          (1) Delete unclassified genomic segments.<br>
+          (2) Calculate the ratios of the remaining entries.<br>
+          (3) Select the most abundant bacteria within soil samples to observe their distribution using the following bar charts (Fig. 3, Fig. 4, Fig. 5).
          </p>
        </div>
-       <img src="https://static.igem.org/mediawiki/2018/3/38/T--NCTU_Formosa--barplot.png" class="barplot">
+      <div class="text">
+        <p>
+          &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;After making the Stacked bar, we organized the data to facilitate the analysis (Table 2). We observed growths and declines of bacteria in the soil, and then summarized the bacteria according to their functions. Moreover, we will explain what we do in our farm affect microbiota. For example: Sphingomonas, Alcanivorax, Devosia which are polluting indicator bacteria, are decreased continuously from May to July. We speculate that the reason of their decline is that we have not applied herbicides, pesticides or other pollutants in these three months. When the soil repaired by itself, the pollutants are falling, and the polluting indicator bacteria are also falling. After the series of analysis, we hope to prove the basic hypothesis of our project--we can precisely regulate microbiota in the soil by using bio-stimulator. We will put our details of analysis in our demonstration.
+        </p>
+      </div>
+       <img src="https://static.igem.org/mediawiki/2018/e/e4/T--NCTU_Formosa--may_bar.jpg" class="barplot">
+      <div class="explanation">
+        <svg class="icon" aria-hidden="true" data-prefix="fas" data-icon="arrow-circle-up" class="svg-inline--fa fa-arrow-circle-up fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M8 256C8 119 119 8 256 8s248 111 248 248-111 248-248 248S8 393 8 256zm143.6 28.9l72.4-75.5V392c0 13.3 10.7 24 24 24h16c13.3 0 24-10.7 24-24V209.4l72.4 75.5c9.3 9.7 24.8 9.9 34.3.4l10.9-11c9.4-9.4 9.4-24.6 0-33.9L273 107.7c-9.4-9.4-24.6-9.4-33.9 0L106.3 240.4c-9.4 9.4-9.4 24.6 0 33.9l10.9 11c9.6 9.5 25.1 9.3 34.4-.4z"></path></svg>
+        Figure 3: Stacked bar chart of top-20 bacteria ratio in different samples (May)
+      </div>
+      <img src="https://static.igem.org/mediawiki/2018/4/4b/T--NCTU_Formosa--june_bar.jpg" class="barplot">
+      <div class="explanation">
+        <svg class="icon" aria-hidden="true" data-prefix="fas" data-icon="arrow-circle-up" class="svg-inline--fa fa-arrow-circle-up fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M8 256C8 119 119 8 256 8s248 111 248 248-111 248-248 248S8 393 8 256zm143.6 28.9l72.4-75.5V392c0 13.3 10.7 24 24 24h16c13.3 0 24-10.7 24-24V209.4l72.4 75.5c9.3 9.7 24.8 9.9 34.3.4l10.9-11c9.4-9.4 9.4-24.6 0-33.9L273 107.7c-9.4-9.4-24.6-9.4-33.9 0L106.3 240.4c-9.4 9.4-9.4 24.6 0 33.9l10.9 11c9.6 9.5 25.1 9.3 34.4-.4z"></path></svg>
+        Figure 4: Stacked bar chart of top-20 bacteria ratio in different samples (June)
+      </div>
+      <img src="https://static.igem.org/mediawiki/2018/0/05/T--NCTU_Formosa--Barplot_july.png" class="barplot">
        <div class="explanation">
          <svg class="icon" aria-hidden="true" data-prefix="fas" data-icon="arrow-circle-up" class="svg-inline--fa fa-arrow-circle-up fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M8 256C8 119 119 8 256 8s248 111 248 248-111 248-248 248S8 393 8 256zm143.6 28.9l72.4-75.5V392c0 13.3 10.7 24 24 24h16c13.3 0 24-10.7 24-24V209.4l72.4 75.5c9.3 9.7 24.8 9.9 34.3.4l10.9-11c9.4-9.4 9.4-24.6 0-33.9L273 107.7c-9.4-9.4-24.6-9.4-33.9 0L106.3 240.4c-9.4 9.4-9.4 24.6 0 33.9l10.9 11c9.6 9.5 25.1 9.3 34.4-.4z"></path></svg>
-         Figure 3: Stacked bar plot of top-20 bacteria ratio in different samples
+         Figure 5: Stacked bar chart of top-20 bacteria ratio in different samples (July)
        </div>
@@ Line 320: / Line 406: @@
        <div class="text">
          <p>
-           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;To predict the final microbial distribution of soil as a result of adding biostimulators, we need to understand the interbacterial relationships that exist within the microbiome. Said relationships can be summarized by calculating a Spearman correlation coefficient using the following formula:
+           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The strength of co-occurrence of bacteria within soil samples was evaluated by the Spearman’s rank correlation coefficients. It ranges from -1 to 1. The formula of Spearman correlation coefficient is as follows:
          </p>
        </div>
@@ Line 360: / Line 446: @@
        <div class="text">
          <p>
-           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;The Spearman correlation coefficient is a value ranging from -1 to 1… [ ]<br>
+           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;We used heat maps of correlation to visualize the correlation strength. Figure 3, 4, and 5 show the top 20 abundant bacteria within soil sample in different months. The reason we selected the top 20 abundant bacteria is that we found out the top 20 abundant bacteria accounted for over 95 percent of the amouts of known bacteria. It is true when we analyze frome phylum level to class level. However, analyzing the proportion of those bacteria at genus level, we found that although the proportion of the original top 20 abundant bacteria decrease to about 75% of known bacteria in some months, the proportion of each other bacterium was still lower than 1%. According to the analysis, it seemed that the effect of the bacteria ranked below 20 could be ignored. A computer program can then visualize the results in a heat map. A map of the 20 most abundant bacteria of our soil is shown below:
-          A computer program can then visualize the results in a heat map. A map of the 20 most abundant bacteria of our soil is shown below:
          </p>
        </div>
-       <img src="https://static.igem.org/mediawiki/2018/0/0b/T--NCTU_Formosa--Spearman%27s_Correlation.png" class="Spearman">
+       <img src="https://static.igem.org/mediawiki/2018/9/9a/T--NCTU_Formosa--June_heatmap.png" class="Spearman">
        <div class="explanation">
          <svg class="icon" aria-hidden="true" data-prefix="fas" data-icon="arrow-circle-up" class="svg-inline--fa fa-arrow-circle-up fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M8 256C8 119 119 8 256 8s248 111 248 248-111 248-248 248S8 393 8 256zm143.6 28.9l72.4-75.5V392c0 13.3 10.7 24 24 24h16c13.3 0 24-10.7 24-24V209.4l72.4 75.5c9.3 9.7 24.8 9.9 34.3.4l10.9-11c9.4-9.4 9.4-24.6 0-33.9L273 107.7c-9.4-9.4-24.6-9.4-33.9 0L106.3 240.4c-9.4 9.4-9.4 24.6 0 33.9l10.9 11c9.6 9.5 25.1 9.3 34.4-.4z"></path></svg>
-         Figure 4: Correlation table of top-20 bacteria
+         Figure 6: Correlation heat map of top-20 bacteria in June
        </div>
+      <div class="text">
+        <p>
+          &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;For example, the figure above shows the heat map of correlation of June. We could select candidate bacteria by the heat map of June to do prediction of microbiota of July because Weka utilized the correlation formula of the bacteria in June to simulate the microbiota of July. To make it more easily to understand spearman, we could observe the heat map above: while the proportions of two kinds of bacteria increased simultaneously, the block in the table showing the correlation between the two bacteria is red. Conversely, the block would turn to blue if one bacterium increased while the other decreased. Since only when data’s correlation coefficient larger than 0.7, the data is meaningful in statistics. We then selected the combination of every two bacteria whose correlation coefficient is greater than 0.7 or less than -0.7 as correlative samples. Absolutely there are exceptions to the prediction of our system since the spearman coefficient could only show the ratio of the bacteria in the soil sample we collected of the month while show every single phenomenon in the nature and then lead to cause difference. However, the difference with low correlation coefficient could not increase error tour Weka training.
+        </p>
+      </div>
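The pair-selection rule described above can be sketched in Python. This is a minimal, hypothetical sketch rather than the team's actual pipeline: it assumes each genus has a list of relative abundances measured across several soil samples within one month, computes Spearman's coefficient as the Pearson correlation of average ranks (so tied values are handled), and keeps every pair with |rho| > 0.7. The function names and the toy abundance values are illustrative assumptions, not measured data.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        # A constant series has no defined correlation; NaN never passes the filter.
        return float("nan")
    return cov / (sx * sy)


def correlated_pairs(abundances, threshold=0.7):
    """Return (genus_a, genus_b, rho) for every pair with |rho| > threshold."""
    selected = []
    for a, b in combinations(sorted(abundances), 2):
        rho = spearman(abundances[a], abundances[b])
        if abs(rho) > threshold:
            selected.append((a, b, rho))
    return selected


# Hypothetical relative abundances (%) across five samples; not measured data.
samples = {
    "GenusA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GenusB": [2.1, 2.9, 4.2, 5.0, 6.3],  # rises together with GenusA
    "GenusC": [9.0, 7.5, 6.0, 4.1, 3.3],  # falls as GenusA rises
    "GenusD": [3.0, 1.0, 4.0, 1.5, 2.0],  # only weakly related to the others
}

for a, b, rho in correlated_pairs(samples):
    print(a, b, round(rho, 2))
```

In this toy data, GenusD is filtered out because its coefficients with the other genera stay near zero, mirroring how weakly correlated pairs are left out of training.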
+      <div class="table" style="width: 40%; margin-left: 30%;">
+        <table style="width: 100%;">
+        <caption>
+          <p>
+            <svg class="icon" aria-hidden="true" data-prefix="fas" data-icon="arrow-circle-down" class="svg-inline--fa fa-arrow-circle-down fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M504 256c0 137-111 248-248 248S8 393 8 256 119 8 256 8s248 111 248 248zm-143.6-28.9L288 302.6V120c0-13.3-10.7-24-24-24h-16c-13.3 0-24 10.7-24 24v182.6l-72.4-75.5c-9.3-9.7-24.8-9.9-34.3-.4l-10.9 11c-9.4 9.4-9.4 24.6 0 33.9L239 404.3c9.4 9.4 24.6 9.4 33.9 0l132.7-132.7c9.4-9.4 9.4-24.6 0-33.9l-10.9-11c-9.5-9.5-25-9.3-34.3.4z"></path></svg>
+            Table 2: Relative abundance of the top-20 bacterial genera in June and July.
+          </p>
+        </caption>
+        <thead>
+          <tr>
+            <th><p>Taxon (Genus)</p></th>
+            <th><p>June</p></th>
+            <th><p>July</p></th>
+          </tr>
+        </thead>
+        <tbody>
+          <tr>
+            <td><p><i>Kaistobacter</i></p></td>
+            <td><p>10.87%</p></td>
+            <td><p>11.57%</p></td>
+          </tr>
+          <tr>
+            <td><p><i>Nitrospira</i></p></td>
+            <td><p>8.84%</p></td>
+            <td><p>18.03%</p></td>
+          </tr>
+          <tr>
+            <td><p><i>Steroidobacter</i></p></td>
+            <td><p>7.05%</p></td>
+            <td><p>3.20%</p></td>
+          </tr><tr>
+            <td><p><i>Rhodoplanes</i></p></td>
+            <td><p>6.50%</p></td>
+            <td><p>10.36%</p></td>
+          </tr><tr>
+            <td><p><i>Candidatus Solibacter</i></p></td>
+            <td><p>5.63%</p></td>
+            <td><p>3.81%</p></td>
+          </tr><tr>
+            <td><p><i>Candidatus Koribacter</i></p></td>
+            <td><p>4.36%</p></td>
+            <td><p>4.53%</p></td>
+          </tr><tr>
+            <td><p><i>Pseudomonas</i></p></td>
+            <td><p>3.56%</p></td>
+            <td><p>5.14%</p></td>
+          </tr><tr>
+            <td><p><i>Perlucidibaca</i></p></td>
+            <td><p>3.03%</p></td>
+            <td><p>1.06%</p></td>
+          </tr><tr>
+            <td><p><i>Flavisolibacter</i></p></td>
+            <td><p>2.71%</p></td>
+            <td><p>3.01%</p></td>
+          </tr><tr>
+            <td><p><i>Hyphomicrobium</i></p></td>
+            <td><p>2.37%</p></td>
+            <td><p>1.00%</p></td>
+          </tr><tr>
+            <td><p><i>Bdellovibrio</i></p></td>
+            <td><p>2.03%</p></td>
+            <td><p>2.11%</p></td>
+          </tr><tr>
+            <td><p><i>Bacillus</i></p></td>
+            <td><p>1.96%</p></td>
+            <td><p>0.84%</p></td>
+          </tr><tr>
+            <td><p><i>Streptomyces</i></p></td>
+            <td><p>1.87%</p></td>
+            <td><p>0.41%</p></td>
+          </tr><tr>
+            <td><p><i>Rhodanobacter</i></p></td>
+            <td><p>1.84%</p></td>
+            <td><p>0.11%</p></td>
+          </tr><tr>
+            <td><p><i>Flavobacterium</i></p></td>
+            <td><p>1.55%</p></td>
+            <td><p>12.12%</p></td>
+          </tr><tr>
+            <td><p><i>Opitutus</i></p></td>
+            <td><p>1.47%</p></td>
+            <td><p>1.02%</p></td>
+          </tr><tr>
+            <td><p><i>Devosia</i></p></td>
+            <td><p>1.46%</p></td>
+            <td><p>0.72%</p></td>
+          </tr><tr>
+            <td><p><i>Mycobacterium</i></p></td>
+            <td><p>1.44%</p></td>
+            <td><p>0.18%</p></td>
+          </tr><tr>
+            <td><p><i>Phenylobacterium</i></p></td>
+            <td><p>1.37%</p></td>
+            <td><p>0.87%</p></td>
+          </tr><tr>
+            <td><p><i>Bradyrhizobium</i></p></td>
+            <td><p>1.06%</p></td>
+            <td><p>0.42%</p></td>
+          </tr>
+        </tbody>
+      </table>
        </div>
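To read Table 2 as a change between months, the June and July percentages can be compared directly. The short Python sketch below copies five rows from the table (the dictionary layout and the helper name point_changes are illustrative choices, not part of the team's workflow) and computes the percentage-point change for each genus, sorted from the largest increase to the largest decrease.

```python
# Relative abundances (%) copied from Table 2 for a subset of genera: (June, July).
table2 = {
    "Kaistobacter": (10.87, 11.57),
    "Nitrospira": (8.84, 18.03),
    "Steroidobacter": (7.05, 3.20),
    "Flavobacterium": (1.55, 12.12),
    "Rhodanobacter": (1.84, 0.11),
}


def point_changes(table):
    """Return (genus, July minus June in percentage points), largest gain first."""
    changes = [(genus, round(july - june, 2)) for genus, (june, july) in table.items()]
    return sorted(changes, key=lambda item: item[1], reverse=True)


for genus, delta in point_changes(table2):
    print(f"{genus:15s} {delta:+.2f}")
```

The same function works unchanged on all 20 rows; on this subset it shows Flavobacterium and Nitrospira gaining the most between June and July, while Steroidobacter drops the most.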
-       <div class="title_1"><p>α Diversity Analysis</p></div>
+       <div class="title_1"><p>Alpha-Diversity Analysis</p></div>
        <div class="text">
          <p>
-           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Use of biostimulators to manipulate soil factors requires careful consideration of the microbiota. Certain stimulators may cause specific genera of bacteria to become overly dominant, damaging soil integrity. As a method of monitoring the balance of the microbial ecosystem, we investigate the evenness of the soil.
+           &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Use of bio-stimulators to manipulate soil factors requires careful consideration of the microbiota. Certain stimulators may cause specific genera of bacteria to become overly dominant, damaging soil integrity. To monitor the balance of the microbial ecosystem, we therefore investigate the evenness of the soil microbiota.
          </p>
        </div>
-       <div class="title_2"><p>Eveness--Shannon index</p></div>
+       <div class="title_2"><p>Evenness--Shannon Index</p></div>
-       <div class="text"><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Evenness is defined as how close, in numbers, each genera of bacteria is in soil. Maintaining evenness ensures no type of bacteria grow to be too dominant, occupying niches of other potentially important bacteria.</p></div>
+       <div class="text"><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Within-sample microbial diversity is measured by alpha-diversity (α-diversity). In our study, α-diversity refers to richness and the Shannon diversity index. Richness is the number of OTUs (operational taxonomic units) observed in a sample, while the evenness of the bacterial community is measured by the Shannon diversity index, as shown below:</p></div>
-      <div class="text"><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;One way to measure evenness is by calculating the Shannon Index of a sample, as shown below:</p></div>
        <div class="equation">$$H'=-\sum_{i=1}^{S}p_{i}\ln p_{i}$$</div>
        <div class="table">
@@ Line 386: / Line 577: @@
            <p class="explanation">
              <svg class="icon" aria-hidden="true" data-prefix="fas" data-icon="arrow-circle-down" class="svg-inline--fa fa-arrow-circle-down fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M504 256c0 137-111 248-248 248S8 393 8 256 119 8 256 8s248 111 248 248zm-143.6-28.9L288 302.6V120c0-13.3-10.7-24-24-24h-16c-13.3 0-24 10.7-24 24v182.6l-72.4-75.5c-9.3-9.7-24.8-9.9-34.3-.4l-10.9 11c-9.4 9.4-9.4 24.6 0 33.9L239 404.3c9.4 9.4 24.6 9.4 33.9 0l132.7-132.7c9.4-9.4 9.4-24.6 0-33.9l-10.9-11c-9.5-9.5-25-9.3-34.3.4z"></path></svg>
-             Table 2: Variable and Parameter in Shannon index equation.
+             Table 3: Variables and parameters in the Shannon index equation.
            </p>
          </caption>
@@ Line 415: / Line 606: @@
        </table>
        </div>
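As an illustration of the Shannon index equation, the Python sketch below computes richness (the number of OTUs with a nonzero count) and H' from a list of OTU read counts, taking each p_i as that OTU's share of the total and using the natural logarithm. The function names and the example counts are hypothetical. OTUs with zero counts are skipped, since p ln p tends to 0 as p tends to 0 and they add nothing to the sum.

```python
from math import log


def richness(counts):
    """Number of OTUs observed (nonzero count) in the sample."""
    return sum(1 for c in counts if c > 0)


def shannon_index(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over the observed OTUs."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample must contain at least one read")
    h = 0.0
    for c in counts:
        if c > 0:  # zero-count OTUs contribute nothing (p ln p -> 0)
            p = c / total
            h -= p * log(p)
    return h


# Hypothetical OTU read counts for two samples with the same richness.
even_sample = [25, 25, 25, 25]
skewed_sample = [85, 5, 5, 5]

print(richness(even_sample), round(shannon_index(even_sample), 3))  # 4 1.386
print(richness(skewed_sample), round(shannon_index(skewed_sample), 3))
```

Both samples contain four OTUs, so richness alone cannot tell them apart; the Shannon index is lower for the skewed sample, where one OTU dominates, which is the situation that monitoring bio-stimulator effects is meant to catch.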
-       <div class="text"><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A Shannon Index of higher value indicates greater evenness. The estimated degree of evenness can be derived from the exponential of the value. For example, a sample with Shannon Index value 2.85 will have approximately: $$e^{2.85}=17$$ 17 genera of bacteria that are equal in numbers. The Shannon Index of a sample of soil can be used as an observational tool to make sure biostimulants don’t decrease the overall evenness, and thus health and stability, of the soil.</p></div>
+       <div class="text"><p>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;A higher Shannon index indicates greater evenness. The exponential of the index gives an intuitive estimate of this evenness: it is the number of equally abundant genera that would produce the same index. For example, a soil sample with a Shannon index of 2.85 gives $$e^{2.85}\approx 17$$ which means the sample is roughly equivalent to 17 bacterial genera present in equal numbers. The Shannon index can therefore serve as an observational tool to determine whether bio-stimulators decrease the overall evenness, and thus the health and stability, of the soil.</p></div>
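The worked example in the paragraph above can be checked numerically. This minimal Python sketch (the helper name effective_genera is hypothetical) relies on the fact that when k genera are equally abundant, every p_i equals 1/k and the Shannon index reduces to ln k, so exponentiating H' recovers the equivalent number of equally abundant genera.

```python
from math import exp, log


def effective_genera(h):
    """Number of equally abundant genera that would give Shannon index h."""
    return exp(h)


print(round(effective_genera(2.85)))  # 17, matching the example above

# Check the interpretation: k equally abundant genera give H' = ln k.
k = 17
print(round(log(k), 3))  # 2.833
```

Because e^2.85 is about 17.3 rather than exactly 17, the equivalent number of genera is an estimate, which is why the text treats it as approximate.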
        <div class="title_3"><p>Triplicate Analysis</p></div>
        <img src="https://static.igem.org/mediawiki/2018/4/44/T--NCTU_Formosa--triplicate_shannon.png" class="eveness">
        <div class="explanation">
          <svg class="icon" aria-hidden="true" data-prefix="fas" data-icon="arrow-circle-up" class="svg-inline--fa fa-arrow-circle-up fa-w-16" role="img" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"><path fill="currentColor" d="M8 256C8 119 119 8 256 8s248 111 248 248-111 248-248 248S8 393 8 256zm143.6 28.9l72.4-75.5V392c0 13.3 10.7 24 24 24h16c13.3 0 24-10.7 24-24V209.4l72.4 75.5c9.3 9.7 24.8 9.9 34.3.4l10.9-11c9.4-9.4 9.4-24.6 0-33.9L273 107.7c-9.4-9.4-24.6-9.4-33.9 0L106.3 240.4c-9.4 9.4-9.4 24.6 0 33.9l10.9 11c9.6 9.5 25.1 9.3 34.4-.4z"></path></svg>
-         Figure 5: The box plot of shannon index triplicate analysis
+         Figure 7: Box plot of the Shannon index in the triplicate analysis
        </div>
+      <div class="title_1"><p>References</p></div>
+      <div class="text">
+        <p>
Kumar, A. and L. C. Rai (2017). "Soil Organic Carbon and Availability of Soil Phosphorus Regulate Abundance of Culturable Phosphate Solubilizing Bacteria in Paddy Fields of the Indo-Gangetic Plain." Pedosphere.<br><br>
Wang, P., et al. (2015). "Long-term rice cultivation stabilizes soil organic carbon and promotes soil microbial activity in a salt marsh derived soil chronosequence." Scientific Reports 5: 15704.<br><br>
+        </p>
+      </div>
+    </div>
    </div>
 </body>
-</html>
+</html>
+{{NCTU_Formosa/Top_button}}
 {{NCTU_Formosa/Footer}}