Revision as of 17:55, 15 October 2018

Bioinformatics

After a successful sequencing run, you are left with raw data containing millions and millions (and millions) of lines of base sequences, all of which need to be processed and interpreted. This is where the interdisciplinary field of bioinformatics comes in. A vast range of software tools is available, tailored both to different kinds of analysis and to the different sequencing methods.

Most of the tools we used were available through the free website Usegalaxy.org, which also let us run the processing on its servers. Because we also used nanopore sequencing, we needed tools tailored to MinION data; these were available from the nanopore community hub and could be run from a terminal window.

Experiment

We decided to create our bioinformatics pipeline from scratch. Generally, a basic transcriptomics pipeline consists of three steps: alignment to a reference genome, gene counting and differential gene expression analysis. However, the nanopore data first needed a couple of processing steps, namely demultiplexing and adapter trimming.

Demultiplexing and adapter trimming

Because the sequencing run uses pooled samples containing both the barcoded cultured-group and control-group samples, the resulting data must be demultiplexed, i.e. separated into files containing the reads from each group. Because each group's barcode is itself a base sequence attached to the reads, it also had to be removed, or "trimmed", from the data, leaving only the mRNA-derived sequences. This was done with a free nanopore community tool called Porechop.

Figure 1: Running demultiplexing and barcode trimming from the terminal. The program first sorts the reads by barcode and then searches for barcode sequences to trim off.
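To make the two steps concrete, here is a minimal Python sketch of demultiplexing and barcode trimming. It is not how Porechop works: it assumes every read starts with an exact, error-free copy of its barcode, whereas real nanopore reads contain sequencing errors and may carry barcodes at either end, which dedicated tools such as Porechop have to handle. The group names and barcode sequences are hypothetical.

```python
from collections import defaultdict

# Hypothetical barcodes for the two sample groups (real barcode kits use
# longer, carefully designed sequences).
BARCODES = {
    "cultured": "ACGTACGT",
    "control": "TGCATGCA",
}


def demultiplex(reads, barcodes):
    """Bin reads by their leading barcode and trim the barcode off.

    Assumes each read starts with an exact copy of its barcode. Returns a
    dict mapping group name -> list of trimmed reads; reads that match no
    barcode are kept untrimmed under "unclassified".
    """
    groups = defaultdict(list)
    for read in reads:
        for group, barcode in barcodes.items():
            if read.startswith(barcode):
                # Demultiplex (choose the group) and trim (drop the barcode).
                groups[group].append(read[len(barcode):])
                break
        else:  # no barcode matched this read
            groups["unclassified"].append(read)
    return dict(groups)


if __name__ == "__main__":
    reads = ["ACGTACGTTTGGCCA", "TGCATGCAGGGAAAT", "CCCCCCCC"]
    print(demultiplex(reads, BARCODES))
    # {'cultured': ['TTGGCCA'], 'control': ['GGGAAAT'], 'unclassified': ['CCCCCCCC']}
```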

Genome alignment

The reads need to be aligned to the reference genome of the sequenced species before any downstream analysis. This tells us where in the genome each read lies and which gene it corresponds to. Genome alignment was done with another community tool, minimap2.
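The idea of placing a read in a genome can be illustrated with a toy seed-and-vote sketch in Python. minimap2 itself is far more sophisticated (it indexes minimizers, chains seed hits and performs base-level alignment, including spliced alignment); this example only uses exact k-mer matches on the forward strand of a short, single-sequence reference. The function names and the value of k are our own choices for illustration.

```python
from collections import Counter, defaultdict


def build_kmer_index(reference, k):
    """Map every k-mer in the reference to the positions where it occurs."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[reference[pos:pos + k]].append(pos)
    return index


def place_read(read, index, k):
    """Estimate where a read starts in the reference (forward strand only).

    Every k-mer shared with the reference votes for an offset
    (reference position minus position in the read); the most common
    offset is returned. Returns None if no k-mer is shared.
    """
    votes = Counter()
    for i in range(len(read) - k + 1):
        for ref_pos in index.get(read[i:i + k], []):
            votes[ref_pos - i] += 1
    if not votes:
        return None
    offset, _n_votes = votes.most_common(1)[0]
    return offset


if __name__ == "__main__":
    reference = "GATTACACCGGTTAACGTCAGT"
    index = build_kmer_index(reference, k=4)
    print(place_read("CCGGTTAAC", index, k=4))  # 7
    print(place_read("CCGGATAAC", index, k=4))  # 7, despite one mismatch
    print(place_read("TTTTTTTT", index, k=4))   # None
```

Because the position is decided by a vote, a sequencing error only removes the votes of the k-mers that cover it, so the second read is still placed correctly.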

Gene counting

Gene counting means counting how many reads were aligned to each gene in the previous step. Since each read derives from an mRNA molecule, this count is a measure of how strongly the gene is expressed; comparing the counts between the cultured and control groups then shows whether the gene is up- or down-regulated. Many tools are available for gene counting; we chose featureCounts, run through Galaxy.
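A minimal Python sketch of what gene counting does, assuming alignments have already been reduced to (chromosome, start, end) intervals and using a hypothetical three-gene annotation. It ignores strand and counts a read only when it overlaps exactly one gene, which is similar in spirit to featureCounts' default of leaving reads that overlap several genes unassigned. It is not featureCounts' implementation, which reads SAM/BAM and annotation files and offers many more options.

```python
# Hypothetical annotation: (gene_id, chromosome, start, end),
# 0-based coordinates with the end position excluded.
GENES = [
    ("geneA", "chr1", 0, 100),
    ("geneB", "chr1", 150, 300),
    ("geneC", "chr1", 280, 400),
]


def count_reads(alignments, genes):
    """Return a dict gene_id -> number of reads assigned to that gene.

    `alignments` is a list of (chromosome, start, end) tuples, one per
    aligned read, in the same coordinate convention as `genes`.
    """
    counts = {gene_id: 0 for gene_id, _, _, _ in genes}
    for chrom, start, end in alignments:
        # Genes on the same chromosome whose interval overlaps the read.
        hits = [gene_id for gene_id, g_chrom, g_start, g_end in genes
                if g_chrom == chrom and start < g_end and g_start < end]
        if len(hits) == 1:
            counts[hits[0]] += 1
        # Reads with no hit (intergenic) or several hits (ambiguous) are skipped.
    return counts


if __name__ == "__main__":
    alignments = [("chr1", 10, 60), ("chr1", 20, 90), ("chr1", 160, 250),
                  ("chr1", 290, 350), ("chr1", 500, 600)]
    print(count_reads(alignments, GENES))
    # {'geneA': 2, 'geneB': 1, 'geneC': 0}
```

The read at 290-350 overlaps both geneB and geneC and is therefore not counted, and the read at 500-600 lies outside every gene.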

Figure 2: Results of a differential gene expression analysis using DESeq2 on test files. Each gene is shown with its gene ID, its base mean (the mean of its normalized counts across samples) and several statistical results.
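DESeq2 does much more than we can show here, but its first step, correcting for differences in sequencing depth between samples, can be sketched in a few lines of Python. The sketch computes median-of-ratios size factors in the way described for DESeq2 and then a naive log2 fold change of the normalized group means. It does not estimate dispersions, fit DESeq2's negative binomial model, compute p-values or shrink fold changes, so its fold changes will generally differ from DESeq2's output. The counts in the example are made up.

```python
import math
from statistics import mean, median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    `counts` maps gene_id -> list of raw counts (same sample order for
    every gene). Genes with a zero in any sample are skipped, since their
    geometric mean is zero. Ratios are taken in log space, as in DESeq2.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if min(row) == 0:
            continue
        log_geo_mean = mean(math.log(c) for c in row)
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def log2_fold_changes(counts, group_a, group_b):
    """Naive log2 fold change of group_b relative to group_a.

    group_a and group_b are lists of sample indices. Genes with a zero
    normalized mean in either group are skipped.
    """
    sf = size_factors(counts)
    result = {}
    for gene_id, row in counts.items():
        normalized = [c / s for c, s in zip(row, sf)]
        mean_a = mean(normalized[j] for j in group_a)
        mean_b = mean(normalized[j] for j in group_b)
        if mean_a > 0 and mean_b > 0:
            result[gene_id] = math.log2(mean_b / mean_a)
    return result


if __name__ == "__main__":
    counts = {"g1": [100, 100, 100, 100],
              "g2": [50, 50, 50, 50],
              "g3": [10, 10, 40, 40]}
    print(size_factors(counts))                          # [1.0, 1.0, 1.0, 1.0]
    print(log2_fold_changes(counts, [0, 1], [2, 3]))     # {'g1': 0.0, 'g2': 0.0, 'g3': 2.0}
```

Taking the median over many genes means that a few strongly changed genes, like g3, do not distort the normalization of the rest.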

After the differential gene expression analysis, the data was filtered twice: first on the adjusted p-value, keeping only statistically significant genes, and then on fold change, keeping the genes with the largest changes in expression. This left a few candidate genes that could easily be identified by their gene ID in databases such as NCBI.
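The two filtering steps can be sketched in Python as follows. The adjusted p-value cutoff of 0.05 and the number of genes kept are illustrative choices, not necessarily the values used in our analysis, and the example results are made up. Genes whose adjusted p-value is missing (DESeq2 reports NA for genes it excludes before multiple-testing correction) are dropped.

```python
from typing import NamedTuple, Optional


class DEResult(NamedTuple):
    gene_id: str
    log2_fold_change: float
    padj: Optional[float]  # None where DESeq2 would report NA


def select_candidates(results, padj_cutoff=0.05, top_n=5):
    """Filter on adjusted p-value, then keep the largest fold changes."""
    # Filter 1: statistical significance.
    significant = [r for r in results
                   if r.padj is not None and r.padj < padj_cutoff]
    # Filter 2: effect size, largest absolute log2 fold change first.
    significant.sort(key=lambda r: abs(r.log2_fold_change), reverse=True)
    return significant[:top_n]


if __name__ == "__main__":
    results = [DEResult("geneA", 0.4, 0.01),
               DEResult("geneB", -3.1, 0.001),
               DEResult("geneC", 5.0, 0.2),
               DEResult("geneD", 2.2, None),
               DEResult("geneE", 1.5, 0.04)]
    print([r.gene_id for r in select_candidates(results, top_n=2)])
    # ['geneB', 'geneE']
```

Ranking by the absolute fold change keeps both strongly up- and down-regulated genes; sorting by the signed value instead would keep only the upregulated ones.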

Result

The transcriptomics pipeline was tested and validated on publicly available read files. These consisted of two datasets of E. coli, each in triplicate, cultured in regular LB medium and in a sugar solution, respectively.

Figure 3: Results of the differential gene expression analysis using DESeq2 on test files. Each gene is shown with its gene ID, its base mean (the mean of its normalized counts across samples) and several statistical results.

Figure 4: Results of the differential gene expression analysis after filtering for statistical significance and fold change.

Searching for the candidate genes in the NCBI database showed that the most highly expressed gene in the sugar-cultured E. coli is involved in a glucose-specific sugar system, indicating that the pipeline was indeed working.

Figure 5: A highly expressed gene identified by the pipeline, matching a glucose-specific gene.
