
Modeling Barcoding

1. Introduction

Sequencing the entire genome of an athlete is relatively expensive and time-consuming. If we were to sequence the genomes of all 918 medal winners of the 2016 Olympic Games in Rio de Janeiro, it would take a total of 1836 days (2 days per athlete) and 35802 flowcells (39 flowcells per athlete) (Jain et al., 2018). Our targeted sequencing approach already reduces sequencing time and costs, as it requires only one flowcell per athlete. However, this is not the limit we aim for, as throughput can be increased further. Prof. Hagan Bayley from Oxford University, one of the founders of Oxford Nanopore Technologies Ltd. (Oxford, UK), suggested that multiplexing with additional barcoding would address both problems: time and costs. Therefore, we implemented the possibility for barcoding in our detection method. To do so, we developed an iGEM barcoding tool by improving upon an existing DNA translation tool developed by iGEM Aberdeen Scotland, 2014.
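As a quick check of these figures, the totals follow directly from the per-athlete estimates quoted above (2 days and 39 flowcells per genome). The variable names below are illustrative only.

```python
# Per-athlete figures as quoted in the text (Jain et al., 2018).
MEDAL_WINNERS = 918
DAYS_PER_GENOME = 2
FLOWCELLS_PER_GENOME = 39

# Whole-genome sequencing of every medal winner, one after another.
total_days = MEDAL_WINNERS * DAYS_PER_GENOME
total_flowcells = MEDAL_WINNERS * FLOWCELLS_PER_GENOME

# Targeted sequencing without multiplexing: one flowcell per athlete.
targeted_flowcells = MEDAL_WINNERS * 1

print(total_days, total_flowcells, targeted_flowcells)
```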

2. Multiplexing by Barcoding

Multiplexing is the simultaneous sequencing of multiple samples from different athletes on one sequencing device. To assign the output sequences to the corresponding athlete, each output sequence must carry an athlete-specific label. This label must not disturb or interrupt the sequencing. Therefore, we decided to label the DNA sample with an athlete-specific sequence: the DNA barcode (Bayliss et al., 2017).
The barcode is a DNA sequence that is placed between the mosaic element for the transposase and the Oxford Nanopore adapter sequence. This barcoded sequencing adapter is picked up by the Tn5-dxCas9 fusion protein and integrated into the target DNA of a specific athlete. When the DNA is sequenced by nanopore sequencing, the barcode is readable in the sequencing output and is used to assign the DNA directly to the corresponding athlete (figure 1).


Step 1: Barcoded adapter is loaded on the Tn5
Step 2: Fusion protein scans for target DNA
Step 3: Target DNA is found by dxCas9
Step 4: Barcoded adapter is integrated in target DNA
Step 5: Sequencing tag is ligated
Step 6: Barcoded target DNA is sequenced
Figure 1. Multiplexing of samples in Targeted Next Generation Sequencing by using barcodes.
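
The read assignment in step 6 can be sketched as follows. This is a minimal illustration, not the pipeline we used: it assumes that each read contains the construct in the orientation mosaic element, barcode, adapter (the flanking sequences are those shown in the Webtool section), and it uses exact string matching, whereas real nanopore reads contain errors and would need tolerant matching. The function names, barcodes and athlete labels are hypothetical.

```python
# Flanking sequences as shown in the Webtool section (top strand).
MOSAIC_ELEMENT = "CTGTCTCTTATACACATCT"
ADAPTER = "TCGTTATGCATTGACTTGCTTCA"


def extract_barcode(read):
    """Return the sequence between mosaic element and adapter, or None."""
    start = read.find(MOSAIC_ELEMENT)
    if start == -1:
        return None
    start += len(MOSAIC_ELEMENT)
    end = read.find(ADAPTER, start)
    if end == -1:
        return None
    return read[start:end]


def demultiplex(reads, barcode_to_athlete):
    """Group reads per athlete; reads without a known barcode are 'unassigned'."""
    assigned = {}
    for read in reads:
        barcode = extract_barcode(read)
        athlete = barcode_to_athlete.get(barcode, "unassigned")
        assigned.setdefault(athlete, []).append(read)
    return assigned


# Hypothetical 17-nucleotide barcodes and placeholder athlete labels.
barcode_to_athlete = {
    "ACGTACGTACGTACGTA": "athlete_1",
    "TTGACCATGGCATCAGT": "athlete_2",
}
reads = [
    "GGATCC" + MOSAIC_ELEMENT + "ACGTACGTACGTACGTA" + ADAPTER + "AAGCTT",
    "GGATCC" + MOSAIC_ELEMENT + "TTGACCATGGCATCAGT" + ADAPTER + "AAGCTT",
    "GGATCCAAGCTT",  # no barcode construct present
]
assigned = demultiplex(reads, barcode_to_athlete)
for athlete, athlete_reads in assigned.items():
    print(athlete, len(athlete_reads))
```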

Every athlete needs a personal barcode embedded in the DNA. The minimal barcode length needed to generate a unique barcode for every person on Earth is 17 nucleotides: 4^17 (about 1.7 × 10^10) is more than twice the world population, whereas 4^16 (about 4.3 × 10^9) is too small. A 17-nucleotide barcode is therefore sufficient to mark the samples of multiple athletes and multiplex them in one sequencing run.
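
The minimal length can be checked with a few lines of Python. The population figure is an assumption (a rough 2018 estimate of 7.6 billion people), and the function name is illustrative.

```python
# Assumption: rough world population estimate for 2018.
WORLD_POPULATION = 7_600_000_000


def minimal_barcode_length(n_individuals, alphabet_size=4):
    """Smallest length L such that alphabet_size**L >= n_individuals."""
    length = 1
    while alphabet_size ** length < n_individuals:
        length += 1
    return length


length = minimal_barcode_length(WORLD_POPULATION)
n_barcodes = 4 ** length
print(length, n_barcodes, round(n_barcodes / WORLD_POPULATION, 2))
```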

However, labelling samples with personal identification details (for example name, date of birth and sport) rather than with an anonymous unique identifier requires a longer barcode. A single nucleotide can take only four values, so it cannot cover the whole alphabet, the digits and other characters. Therefore, we assigned a set of three nucleotides, which gives 4^3 = 64 combinations, to each letter, digit or other character (such as a dot (.), comma (,), semicolon (;) etc.). Converting the characters of the label into DNA allows the targeted DNA to be labelled before sequencing. Converting the DNA barcode back into the athlete details after sequencing allows the samples of many athletes to be multiplexed.
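
A minimal sketch of such a character-to-triplet conversion is given below. The actual triplet assignment of the tool is not reproduced here: the mapping in this sketch is hypothetical (characters are simply assigned to triplets in alphabetical order), the character set is the one listed in the Webtool section, and letters are assumed to be case-insensitive so that all 54 characters fit into the 64 available triplets.

```python
from itertools import product
import string

# 26 letters + 10 digits + 18 other characters = 54 characters.
CHARACTERS = string.ascii_uppercase + string.digits + " .!?-_@#$%^&*/:,~="

# All 4^3 = 64 triplets in alphabetical order: AAA, AAC, AAG, ...
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]

# Hypothetical assignment: the n-th character gets the n-th triplet.
CHAR_TO_TRIPLET = dict(zip(CHARACTERS, TRIPLETS))
TRIPLET_TO_CHAR = {t: c for c, t in CHAR_TO_TRIPLET.items()}


def encode(text):
    """Convert a label into DNA, three nucleotides per character."""
    return "".join(CHAR_TO_TRIPLET[ch] for ch in text.upper())


def decode(dna):
    """Convert a DNA barcode back into the (upper-case) label."""
    return "".join(TRIPLET_TO_CHAR[dna[i:i + 3]] for i in range(0, len(dna), 3))


label = "Jane Doe, 01-01-1990, rowing"  # placeholder athlete details
barcode = encode(label)
print(barcode)
print(decode(barcode))
```

Because every character occupies exactly three nucleotides, decoding only needs to split the barcode into consecutive triplets.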

To translate an athlete-specific barcode into DNA, we extended the DNA translation tool developed by iGEM Aberdeen Scotland, 2014, to include more characters. Additionally, we removed the start and stop codons from the translation tool, since we do not require protein production. Instead, the mosaic element and the sequencing adapter flank the barcode sequence.

3. Webtool


Name, date of birth and sport of the athlete:


The maximum size of the barcode is 50 characters. You can include alphabetic letters, numbers and the characters [space] [.] [!] [?] [-] [_] [@] [#] [$] [%] [^] [&] [*] [/] [:] [,] [~] and [=].
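
The input rules above could be checked as in the following sketch. It is not the code of the webtool itself; the function name and the error messages are hypothetical.

```python
import string

MAX_LENGTH = 50
ALLOWED = set(string.ascii_letters + string.digits + " .!?-_@#$%^&*/:,~=")


def validate_label(label):
    """Return a list of problems; an empty list means the label is accepted."""
    problems = []
    if len(label) > MAX_LENGTH:
        problems.append(
            f"label has {len(label)} characters, maximum is {MAX_LENGTH}"
        )
    unsupported = sorted(set(label) - ALLOWED)
    if unsupported:
        problems.append("unsupported characters: " + " ".join(unsupported))
    return problems


print(validate_label("Jane Doe, 01-01-1990, rowing"))
print(validate_label("Jane (Doe)"))
```

With three nucleotides per character, a label of the maximum size corresponds to a barcode of 150 nucleotides.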

Your 5' to 3' barcode sequence is:

Your 3' to 5' barcode sequence is:

The barcode is now inserted between the mosaic element and the adapter sequences:

Mosaic element Barcode Sequencing Adapter
P’CTGTCTCTTATACACATCT TCGTTATGCATTGACTTGCTTCA
  GACAGAGAATATGTGTAGA AGCAATACGTAACTGAACGAAGTACATTAAAAAAAAAAGGTTAAACACCCAAGCAGACGCC
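
Assembling the two strands of the construct can be sketched as follows. The flanking sequences are copied from the layout above; the sketch assumes that the upper strand is written 5' to 3' and the lower strand 3' to 5', and that the lower adapter strand extends beyond the upper one as shown. The function names and the example barcode are hypothetical.

```python
MOSAIC_ELEMENT_TOP = "CTGTCTCTTATACACATCT"
ADAPTER_TOP = "TCGTTATGCATTGACTTGCTTCA"
# Lower adapter strand as shown above, written left to right;
# it is longer than the upper adapter strand.
ADAPTER_BOTTOM = "AGCAATACGTAACTGAACGAAGTACATTAAAAAAAAAAGGTTAAACACCCAAGCAGACGCC"

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq):
    """Base-by-base complement, without reversing the sequence."""
    return seq.translate(COMPLEMENT)


def build_construct(barcode):
    """Return (upper strand, lower strand) with the barcode in the middle."""
    top = MOSAIC_ELEMENT_TOP + barcode + ADAPTER_TOP
    bottom = complement(MOSAIC_ELEMENT_TOP + barcode) + ADAPTER_BOTTOM
    return top, bottom


top, bottom = build_construct("ACGTACGTACGTACGTA")  # hypothetical barcode
print(top)
print(bottom)
```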

4. Applications

Implementing the barcode system in our Targeted Next Generation Sequencing platform allows multiplexing of samples, which reduces sequencing costs and sequencing time. The World Anti-Doping Agency (WADA) can choose between a 17-nucleotide anonymous barcode and a longer barcode with human-readable information. The 17-nucleotide barcode is anonymous and may be preferred to protect athletes' privacy. Additionally, it is short, yet its 4^17 combinations already cover more than twice the world population. However, the athlete-describing barcode may be preferred when samples are required to carry human-readable details.

Beyond gene doping detection, multiplexing in combination with targeted sequencing can be applied to virus detection, strain identification, food quality measurements and any other application that requires target-specific sequencing.

References

  1. Bayliss, S. C., Hunt, V. L., Yokoyama, M., Thorpe, H. A., & Feil, E. J. (2017). The use of Oxford Nanopore native barcoding for complete genome assembly. GigaScience, 6(3), 1–6. http://doi.org/10.1093/gigascience/gix001
  2. Jain, M. et al. (2018). Nanopore sequencing and assembly of a human genome with ultra-long reads. Nature Biotechnology, 36, 338–345.