Difference between revisions of "Team:NCTU Formosa/Dry Lab/NGS Data Analysis"


Revision as of 18:47, 17 October 2018

Correlation Model

     The use of bio-stimulators affects the entire soil microbiota. The resulting bacterial distribution depends on interactions between bacteria, which can be characterized through correlation. From our initial NGS data we determine the nature of the relationships between bacteria and use these properties, together with machine learning, to predict how the soil microbiota changes when bio-stimulators are added.

Marker Gene Amplicon Analysis

     Microbiome data are generated from the 16S ribosomal RNA (rRNA) gene. The PCR primers were designed to amplify the V4 region of the bacterial 16S ribosomal DNA. After 16S rRNA sequencing, we used QIIME to generate an operational taxonomic unit (OTU) table. We then used bioinformatics tools and statistical methods to analyze microbial diversity in the soil samples. We also used machine learning to predict how the soil microbiota changes when bio-stimulators are added.

Operational Taxonomic Units Table (OTUs Table)

     Figure 1 is an example of an OTU table. Each column lists the types and amounts of bacteria (OTU1, OTU2, …, OTU7) found in one soil sample (A1, A2, A3, B1, B2, and B3). We generate a separate table for each taxonomic level: Phylum, Class, Order, Family, Genus, and Species.

Figure 1: Schematic diagram of an operational taxonomic unit (OTU) table

Data Analysis Process

Figure 2: The process of correlation analysis

     The OTU tables produced by the open-source QIIME pipeline contain unclassified entries. We therefore rearrange the data to facilitate analysis, according to the following steps:
     Step 1. Delete unclassified genomic segments.
     Step 2. Calculate the ratio (relative abundance) of each remaining entry.
     Step 3. Select the most abundant bacteria within the soil samples and observe their distribution using stacked bar charts (Fig. 3).
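
     The three steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the exact pipeline: the taxon names and counts are made up, `preprocess` is a hypothetical helper, unclassified entries are recognized by name, and "most abundant" is taken to mean the highest mean ratio across samples.

```python
from typing import Dict, List

# Hypothetical OTU counts: taxon name -> counts in samples A1, A2, A3.
otu_counts: Dict[str, List[int]] = {
    "Bacillus": [30, 10, 20],
    "Pseudomonas": [50, 40, 30],
    "Unclassified": [20, 50, 50],
    "Streptomyces": [20, 50, 0],
}


def preprocess(counts: Dict[str, List[int]], top_n: int = 20,
               unclassified_label: str = "unclassified") -> Dict[str, List[float]]:
    """Return the ratios of the top_n most abundant classified taxa."""
    # Step 1: drop entries whose name marks them as unclassified (assumed naming).
    kept = {name: vals for name, vals in counts.items()
            if unclassified_label not in name.lower()}
    if not kept:
        return {}

    # Step 2: convert counts to ratios within each sample (column).
    n_samples = len(next(iter(kept.values())))
    totals = [sum(vals[j] for vals in kept.values()) for j in range(n_samples)]
    ratios = {name: [v / t if t else 0.0 for v, t in zip(vals, totals)]
              for name, vals in kept.items()}

    # Step 3: keep the top_n taxa, ranked by mean ratio across samples (assumption).
    ranked = sorted(ratios, key=lambda name: sum(ratios[name]) / n_samples,
                    reverse=True)
    return {name: ratios[name] for name in ranked[:top_n]}


top = preprocess(otu_counts, top_n=2)
for name, values in top.items():
    print(name, [round(v, 3) for v in values])
```

     Ratios are computed after the unclassified entries are removed, so each sample's ratios sum to 1 over the classified taxa only.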

Figure 3: Stacked bar plot of the ratios of the top 20 bacteria in different samples

Spearman's Rank Correlation

     The strength of co-occurrence between bacteria within the soil samples was evaluated with Spearman’s rank correlation coefficient, which ranges from -1 to 1. The formula for the coefficient is as follows:

$$\rho_s=1-\frac{6\sum d_i^2}{n(n^2-1)}$$

Table 1: Variables and parameters in Spearman's correlation equation.

Symbol | Unit | Explanation
$\rho_s$ | - | Spearman's correlation coefficient
$d_i$ | - | The difference between the two ranks of the ith observation
$n$ | - | The sample size
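
     As a worked illustration, the formula can be evaluated directly in a few lines of Python. This is a sketch only: `ranks` and `spearman_rho` are hypothetical helper names, the input values are made up, and the simplified formula is exact only when neither series contains tied values.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[int]:
    """Rank values from 1 (smallest) to n. Assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's coefficient: rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    # d_i is the difference between the two ranks of observation i.
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Made-up ratios of two bacteria across five samples.
a = [1, 2, 3, 4, 5]
b = [2, 1, 4, 3, 5]
print(spearman_rho(a, b))                     # sum(d_i^2) = 4, so rho = 1 - 24/120
print(spearman_rho(a, [-v for v in a]))       # perfectly reversed ranks
```

     Values near 1 mean the two bacteria tend to rise and fall together across samples, and values near -1 mean one tends to decrease as the other increases.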

     We then use a heat map to visualize the correlation strengths. Figure 4 shows the correlations among the 20 most abundant bacteria in our soil samples.

Figure 4: Correlation table of top-20 bacteria
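
     The matrix behind such a heat map could be assembled roughly as follows. This is a hedged sketch, not the program used for Figure 4: the abundance values are invented and the function names are hypothetical. Because abundance data often contain ties (for example, several zeros), the sketch assigns average ranks to tied values and then takes the Pearson correlation of the ranks, which is the usual tie-aware form of Spearman's coefficient.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / sqrt(var_a * var_b)


def correlation_matrix(abundance: Dict[str, List[float]]) -> Tuple[List[str], List[List[float]]]:
    """Pairwise Spearman correlations between all taxa."""
    names = list(abundance)
    ranked = {name: average_ranks(abundance[name]) for name in names}
    matrix = [[pearson(ranked[a], ranked[b]) for b in names] for a in names]
    return names, matrix


# Invented ratios of three bacteria across six samples.
abundance = {
    "Bacillus": [0.30, 0.10, 0.40, 0.25, 0.05, 0.20],
    "Pseudomonas": [0.50, 0.40, 0.60, 0.45, 0.20, 0.35],
    "Streptomyces": [0.00, 0.50, 0.00, 0.10, 0.60, 0.30],
}
names, matrix = correlation_matrix(abundance)
for name, row in zip(names, matrix):
    print(name, [round(v, 2) for v in row])
```

     Each cell of the resulting square matrix corresponds to one cell of the heat map; the matrix is symmetric, with ones on the diagonal.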

Alpha-Diversity Analysis

     Using bio-stimulators to manipulate soil factors requires careful consideration of the microbiota. Certain stimulators may cause specific genera of bacteria to become overly dominant, damaging soil integrity. To monitor the balance of the microbial ecosystem, we investigate the evenness of the soil bacterial community.

Evenness: Shannon index

     Microbial diversity is measured by alpha-diversity (α-diversity). In our study, α-diversity refers to richness and the Shannon diversity index. Richness is the number of OTUs, and the evenness of the bacterial community is measured by the Shannon diversity index, as shown below:

$$H'=-\sum_{i=1}^{S}p_i\ln p_i$$

Table 2: Variables and parameters in the Shannon index equation.

Symbol | Unit | Explanation
$H'$ | - | Shannon index
$S$ | - | The total number of genera in the sample
$p_i$ | - | The proportion of bacteria in the sample that belong to the ith genus

     A higher Shannon index indicates greater evenness. The degree of evenness can be estimated by taking the exponential of the index. For example, a soil sample with a Shannon index of 2.85 gives $e^{2.85}\approx 17$, which means the sample is about as diverse as a community of 17 equally abundant genera. The Shannon index can therefore be used as an observational tool to determine whether bio-stimulators decrease the overall evenness, and thus the health and stability, of the soil.
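
     The index and its exponential can be checked with a short Python sketch. The counts below are invented and the function names are hypothetical; the code only illustrates the formula above and is not the analysis script used for the figures.

```python
from math import exp, log
from typing import Sequence


def shannon_index(counts: Sequence[float]) -> float:
    """H' = -sum(p_i * ln(p_i)) over genera with a non-zero count."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must contain at least one positive value")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * log(p) for p in proportions)


def effective_number(h: float) -> float:
    """Number of equally abundant genera that would give the same index."""
    return exp(h)


# A perfectly even community of 17 genera: H' = ln(17), about 2.83.
even_community = [10] * 17
h_even = shannon_index(even_community)
print(round(h_even, 3), round(effective_number(h_even), 1))

# One dominant genus plus ten rare ones: the index is much lower.
skewed_community = [90] + [1] * 10
h_skewed = shannon_index(skewed_community)
print(round(h_skewed, 3), round(effective_number(h_skewed), 1))
```

     Although the skewed community contains 11 genera, its effective number falls well below 11 because a single genus dominates, which is the kind of shift the index is meant to detect.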

Triplicate Analysis

Figure 5: Box plot of the Shannon index triplicate analysis