Modeling Barcoding

1. Introduction

Sequencing the entire human genome of a single athlete with the MinION, the next generation sequencing device developed by Oxford Nanopore Technologies, is relatively expensive and time consuming. If athletes were sequenced one after another, testing all 918 medal winners of the 2016 Olympics in Rio de Janeiro would have taken 1836 days (2 days per athlete) and required 35802 flowcells (39 flowcells per athlete). Our targeted sequencing approach already reduces sequencing time and costs, as it requires only one flowcell per athlete. However, this is not the limit we aim for, as speed can always be increased. Prof. Hagan Bayley of Oxford University, one of the founders of Oxford Nanopore Technologies Ltd., Oxford, UK, said that multiplexing with additional barcoding would solve both problems: time and costs. Therefore, we implemented the possibility of barcoding in our detection method. To do so, we developed an iGEM barcoding tool by improving upon an existing DNA translation tool developed by iGEM Aberdeen Scotland, 2014.
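The totals for sequential whole-genome sequencing follow directly from the per-athlete figures. A minimal Python check, using only the numbers quoted in the text (the variable names are illustrative):

```python
# Figures quoted in the text for whole-genome sequencing, one athlete after another.
MEDAL_WINNERS = 918
DAYS_PER_ATHLETE = 2
FLOWCELLS_PER_ATHLETE = 39

total_days = MEDAL_WINNERS * DAYS_PER_ATHLETE
total_flowcells = MEDAL_WINNERS * FLOWCELLS_PER_ATHLETE
# Targeted approach: one flowcell per athlete.
targeted_flowcells = MEDAL_WINNERS * 1

print(total_days, total_flowcells, targeted_flowcells)
```

At 2 days per athlete the total is 1836 days, and the targeted approach brings the flowcell count down from 35802 to 918 before any multiplexing is applied.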

2. Multiplexing by Barcoding

Multiplexing is the simultaneous sequencing of multiple samples from different athletes using one sequencing device. To assign each output sequence to the corresponding athlete, the sequence has to carry an athlete specific label. This label must not disturb or interrupt the sequencing. Therefore, we decided to label the DNA sample with an athlete specific identification sequence: the DNA barcode (Bayliss et al., 2017).
The barcode is a DNA sequence that is placed between the mosaic element for the transposase and the Oxford Nanopore adapter sequence. Every athlete gets his or her own specific barcode. This barcoded sequencing adapter is picked up by the Tn5-dxCas9 fusion protein and integrated into the target DNA of a specific athlete. When the DNA is sequenced by nanopore sequencing, the barcode is readable in the sequencing output and is used to assign the DNA directly to the corresponding athlete (figure 1).

Step 1: Barcoded adapter is loaded on the Tn5
Step 2: Fusion protein scans for target DNA
Step 3: Target DNA is found by dxCas9
Step 4: Barcoded adapter is integrated in target DNA
Step 5: Sequencing tag is ligated
Step 6: Barcoded target DNA is sequenced
Figure 1. Multiplexing of samples in Targeted Next Generation Sequencing by using barcodes.
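Assigning reads back to athletes after sequencing can be sketched as follows. This is a minimal illustration, not the team's pipeline: it assumes the read contains the mosaic element, barcode and adapter in the order and orientation shown in the Webtool section, and it uses exact string matching, whereas real nanopore reads contain errors and would need approximate matching. The function names, barcodes and reads are hypothetical.

```python
# Flanking sequences as written on the top strand in the Webtool section.
MOSAIC_ELEMENT = "CTGTCTCTTATACACATCT"
ADAPTER_START = "CGTTATGCATTGACTTGCTTCA"


def extract_barcode(read):
    """Return the sequence between the mosaic element and the adapter, or None."""
    start = read.find(MOSAIC_ELEMENT)
    if start == -1:
        return None
    start += len(MOSAIC_ELEMENT)
    end = read.find(ADAPTER_START, start)
    if end == -1:
        return None
    return read[start:end]


def demultiplex(reads, barcode_to_athlete):
    """Group reads by athlete; reads without a known barcode go under None."""
    assigned = {}
    for read in reads:
        athlete = barcode_to_athlete.get(extract_barcode(read))
        assigned.setdefault(athlete, []).append(read)
    return assigned


# Hypothetical 17 nucleotide barcodes and reads, for illustration only.
barcodes = {"ACGTACGTACGTACGTA": "athlete_1", "TTGCATTGCATTGCATT": "athlete_2"}
reads = [
    "GGA" + MOSAIC_ELEMENT + "ACGTACGTACGTACGTA" + ADAPTER_START + "TTT",
    MOSAIC_ELEMENT + "TTGCATTGCATTGCATT" + ADAPTER_START,
    "ACGTACGTTTGACC",  # no barcode present
]
result = demultiplex(reads, barcodes)
```

Reads that carry no recognisable barcode are collected under the key `None` instead of being dropped, so they can be inspected separately.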

DNA consists of adenine, thymine, cytosine and guanine nucleotides. Every athlete needs a personal barcode embedded in the DNA. If every single person on Earth were to be tested for gene doping, the minimal barcode length would be 17 nucleotides: combining the 4 nucleotides over 17 positions generates 4^17 (about 1.7 × 10^10) unique barcodes, more than twice the world population, whereas 16 nucleotides give only 4^16 (about 4.3 × 10^9). This 17 nucleotide anonymous barcode is sufficient to mark the samples of multiple athletes and multiplex them in one sequencing run.
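The minimal length can be checked with a few lines of Python. This is a sketch under one stated assumption: a world population of roughly 7.6 billion, which is not a figure given in the text. The function name is illustrative.

```python
def min_barcode_length(n_individuals, alphabet_size=4):
    """Smallest n such that alphabet_size ** n >= n_individuals."""
    length = 0
    capacity = 1  # number of distinct sequences of the current length
    while capacity < n_individuals:
        length += 1
        capacity *= alphabet_size
    return length


# Assumption: roughly 7.6 billion people (approximate 2018 figure).
WORLD_POPULATION = 7_600_000_000

n = min_barcode_length(WORLD_POPULATION)
print(n, 4 ** n, 4 ** (n - 1))
```

With this population, 16 nucleotides (4294967296 barcodes) fall short and 17 nucleotides (17179869184 barcodes) are enough.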

A barcode built from personal identification details, for example name, date of birth and sport, is linked more directly to the athlete. Such a barcode has to be longer, because 4 nucleotides cannot cover the whole alphabet, numbers and other characters one to one. Therefore, we assigned a set of three nucleotides, which allows 4^3 = 64 combinations, to each letter, number or other character (such as a dot (.), comma (,), semicolon (;) etc.). Converting the characters of the barcode into DNA allows labelling of the targeted DNA to be sequenced. Converting the DNA strand back into the athlete details after sequencing allows multiplexing of many athletes' samples.

An athlete specific barcode is translated into DNA with the DNA translation tool developed by iGEM Aberdeen Scotland, 2014, which we extended to include more characters. Additionally, the start and stop codons are removed from the DNA translation tool since we do not require protein production. Instead, the mosaic element and sequencing adapter flank the barcode sequence.
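The character-to-triplet translation can be sketched as below. The mapping is hypothetical: the actual table of the tool is not given here, so this sketch simply assigns triplets in alphabetical order (AAA, AAC, AAG, ...) to letters, digits and the special characters listed in the Webtool section. Upper-casing the input is also an assumption, made so that all characters fit within the 64 available triplets.

```python
from itertools import product

# Characters accepted by the sketch: letters, digits and the special
# characters listed for the webtool. Upper-casing the input is an assumption.
CHARACTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 .!?-_@#$%^&*/:,~="

# Hypothetical table: the i-th character gets the i-th triplet in
# alphabetical order. This is NOT the table used by the actual tool.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]
CHAR_TO_TRIPLET = dict(zip(CHARACTERS, TRIPLETS))
TRIPLET_TO_CHAR = {t: c for c, t in CHAR_TO_TRIPLET.items()}


def encode(text):
    """Translate text into DNA, three nucleotides per character."""
    return "".join(CHAR_TO_TRIPLET[ch] for ch in text.upper())


def decode(dna):
    """Translate DNA back into text by reading it three nucleotides at a time."""
    return "".join(TRIPLET_TO_CHAR[dna[i:i + 3]] for i in range(0, len(dna), 3))


label = "J. DOE, 1990-01-01, ROWING"  # hypothetical athlete details
barcode = encode(label)
```

Under this table `encode("AB")` gives `AAAAAC`, and `decode(encode(label))` returns the original label. No start or stop codon is added, and unsupported characters raise a `KeyError`.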

3. Webtool


The webtool takes the name, date of birth and sport of the athlete as input.

The maximum size of the barcode is 50 characters. You can include alphabetic letters, numbers and the characters [space] [.] [!] [?] [-] [_] [@] [#] [$] [%] [^] [&] [*] [/] [:] [,] [~] and [=].
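The input rules can be expressed as a small validation routine. This is a sketch, not the webtool's code: the function name is hypothetical, and reading "alphabetic letters" as the ASCII letters A-Z in either case is an assumption.

```python
import string

MAX_LENGTH = 50
SPECIAL = " .!?-_@#$%^&*/:,~="
# Assumption: "alphabetic letters" means the ASCII letters A-Z in either case.
ALLOWED = set(string.ascii_letters + string.digits + SPECIAL)


def validate_barcode_text(text):
    """Return a list of problems; an empty list means the text is accepted."""
    problems = []
    if len(text) > MAX_LENGTH:
        problems.append(f"too long: {len(text)} > {MAX_LENGTH} characters")
    bad = sorted(set(text) - ALLOWED)
    if bad:
        problems.append("unsupported characters: " + " ".join(bad))
    return problems
```

At three nucleotides per character, the 50 character limit corresponds to a barcode of at most 150 nucleotides.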

The tool returns the barcode sequence in both the 5' to 3' and the 3' to 5' direction. The barcode is placed between the mosaic element and the sequencing adapter sequences:

Mosaic element Barcode Sequencing Adapter
P’CTGTCTCTTATACACATCT  CGTTATGCATTGACTTGCTTCA
 GACAGAGAATATGTGTAGA AGCAATACGTAACTGAACGAAGTACATTAAAAAAAAAAGGTTAAACACCCAAGCAGACGCC
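Placing a barcode between the two flanking sequences and deriving the complementary strand can be sketched as follows. The function names are hypothetical, and the sketch only covers the top-strand sequences shown above: the leading P’ is omitted, and the longer adapter tail on the bottom strand is not reproduced.

```python
# Top-strand flanking sequences as shown in the layout above.
MOSAIC_ELEMENT = "CTGTCTCTTATACACATCT"
ADAPTER_START = "CGTTATGCATTGACTTGCTTCA"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, kept in the same left-to-right order (3' to 5')."""
    return seq.translate(COMPLEMENT)


def assemble(barcode):
    """Return the 5' to 3' strand and its complement written 3' to 5'."""
    top = MOSAIC_ELEMENT + barcode + ADAPTER_START
    return top, complement(top)


# Hypothetical 17 nucleotide barcode, for illustration only.
top, bottom = assemble("ACGTACGTACGTACGTA")
print(top)
print(bottom)
```

The complement of the mosaic element computed this way is `GACAGAGAATATGTGTAGA`, which matches the start of the bottom strand shown above.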

4. Applications

Implementing the barcode system in our Targeted Next Generation Sequencing platform allows multiplexing of athletes' samples, reducing sequencing costs and sequencing time. The anonymous barcode, which is a recorded sequence rather than a representation of the athlete's personal identification, is preferred for reasons of athlete privacy. Additionally, the anonymous barcode only needs to be 17 nucleotides long to cover more than twice the world population.

Besides the area of gene doping detection, multiplexing in combination with targeted sequencing can be applied to virus detection, strain identification, food quality measurements and any other application that requires target specific sequencing.

References

  1. Bayliss, S. C., Hunt, V. L., Yokoyama, M., Thorpe, H. A., & Feil, E. J. (2017). The use of Oxford Nanopore native barcoding for complete genome assembly. GigaScience, 6(3), 1–6. http://doi.org/10.1093/gigascience/gix001