1. Introduction

Synthetic biology is currently one of the most rapidly developing fields. Recent advancements, particularly in genetic engineering, allow us to tackle major societal challenges. Genetic engineering is now even applied to humans in the form of gene therapy, an experimental technique that uses genes to treat or prevent diseases, including severe genetic disorders. However, major concerns have been raised about the misuse of gene editing techniques, particularly for human enhancement. Gene doping, the misuse of gene therapy to enhance athletes’ performance, is one such example. Thus, to promote responsible use of synthetic biology and to help eliminate gene doping from sport, we developed a complete gene doping detection method: ADOPE, the Advanced Detection of Performance Enhancement.

Picture: timeline of the relevance of gene doping.

ADOPE is based on targeted Next Generation Sequencing (NGS), which reduces the amount of data generated compared with untargeted NGS and effectively identifies gene doping DNA. We accomplished this by creating an innovative fusion protein for the rapid library preparation required for sequencing. Our fusion protein consists of a cleavage-deficient, sequence-specific dxCas9 and a Tn5 transposase. The dxCas9 part, loaded with a single guide RNA (sgRNA), interacts with the specific target DNA sequence via complementary matching between the sgRNA and the target DNA, whereas the Tn5 part integrates the two small DNA molecules (adapters) required for Nanopore sequencing. Thus, the fusion protein performs the dxCas9-guided adapter integration required for targeted sequencing library preparation, allowing us to identify gene doping in blood samples with our method, ADOPE.

ADOPE

We designed ADOPE to detect gene doping in blood samples by targeting the most striking difference between natural DNA and gene doping DNA, namely the exon-exon junctions that only exist in doping DNA (Beiter et al, 2011). ADOPE consists of four main steps: sample preparation, prescreening, library preparation, and sequencing.

Figure 3. A flow diagram of our detection method ADOPE, with its four steps: sample preparation, prescreening, library preparation, and sequencing.

1. Sample preparation

Blood samples are commonly taken from athletes during regular doping tests. We extracted the DNA required for testing from the serum or the buffy coat (serum and white blood cells) of these samples (Ni et al. 2011). Additionally, we built an extensive gene doping kinetics model to predict the appropriate time window for gene doping testing and to establish the sensitivity requirements of ADOPE. The model takes the concentration of gene doping vectors and the injection frequency as input and outputs the amount of gene doping fragments in the blood over time.

Figure 4. Concentration of doping DNA in the blood over time after a single intramuscular injection of 141 billion viral vectors. The detection limit of 1000 copies per mL of blood is estimated based on the loss of DNA that occurs during sample preparation and targeted sequencing preparation.
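The input and output of such a kinetics model can be illustrated with a minimal sketch. It does not reproduce our model: it assumes a generic first-order release from the muscle and first-order clearance from the blood (a Bateman curve per injection), and it assumes that the full dose eventually reaches the blood. The rate constants `k_abs` and `k_elim`, the blood volume, and the function names are hypothetical; only the dose of 141 billion vectors and the detection limit of 1000 copies per mL come from the text.

```python
import math

DETECTION_LIMIT = 1000.0  # copies per mL of blood, as quoted in the text


def blood_concentration(t, dose, injection_times, k_abs, k_elim,
                        blood_volume_ml=5000.0):
    """Doping DNA copies per mL of blood at time t (days).

    Each injection contributes a Bateman curve: first-order release from
    the muscle (k_abs) and first-order clearance from blood (k_elim).
    The rate constants and the blood volume are assumed values.
    """
    if math.isclose(k_abs, k_elim):
        raise ValueError("this closed form needs k_abs != k_elim")
    total = 0.0
    for t0 in injection_times:
        dt = t - t0
        if dt < 0:
            continue  # this injection has not happened yet
        total += (dose / blood_volume_ml) * k_abs / (k_elim - k_abs) * (
            math.exp(-k_abs * dt) - math.exp(-k_elim * dt))
    return total


def detection_window(dose, injection_times, k_abs, k_elim,
                     horizon_days=100, step=0.1):
    """First and last sampled time (days) at or above the detection limit."""
    n_steps = int(round(horizon_days / step))
    above = [i * step for i in range(n_steps + 1)
             if blood_concentration(i * step, dose, injection_times,
                                    k_abs, k_elim) >= DETECTION_LIMIT]
    return (above[0], above[-1]) if above else None


if __name__ == "__main__":
    # 1.41e11 vectors is the single dose from Figure 4; the rates are made up.
    print(detection_window(1.41e11, [0.0], k_abs=0.5, k_elim=2.0))
```

Repeated injections are handled by listing several times in `injection_times`, since the contributions of individual injections simply add up in this linear sketch.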

2. Prescreening

We incorporated a prescreening step in our method on the advice of Dr. Oliver de Hon of the Dutch Doping Authority, who emphasised the importance of a high-throughput assay that could screen thousands of athletes simultaneously. Therefore, we developed a colorimetric assay based on the extent of gold nanoparticle aggregation (Baetsen-Young et al, 2018). When target doping DNA is absent, the nanoparticles aggregate completely, resulting in a purple color. When target doping DNA is present, it forms a secondary structure with a targeting DNA probe, which stabilizes the nanoparticles against aggregation, resulting in a red color.

Figure 5. Colorimetric gold nanoparticle assay. Positive samples result in a red color, whereas negative samples result in a purple color.

3. Targeted library preparation

Samples that test positive in prescreening proceed to our novel rapid targeted next generation sequencing. Targeted library preparation relies on our innovative fusion protein consisting of a Tn5 transposase and a dxCas9. Fusing these two proteins results in target-specific transposition. As a proof of concept, we showed that our fusion protein is target specific with an in vitro targeted integration assay, which we verified by visualising the amplified integration products with gel electrophoresis. Once we had established the functionality and optimal conditions, we implemented the fusion protein in the established rapid next generation sequencing library preparation protocol from Oxford Nanopore Technologies (ONT). We replaced the original transposase, which is responsible for random integration of the sequencing adapters, with our novel fusion protein. In our case, a specifically designed sgRNA guides the fusion protein to the exon-exon junction target site, so that only gene doping DNA is prepared for sequencing. Additionally, we developed an sgRNA model that identifies the optimal exon-exon target site by searching for the fewest sgRNAs required to cover all possible variants of the target site that arise from synonymous mutations in the EPO coding sequence. We prepared an sgRNA array of the resulting 12 sgRNAs for a single library preparation, utilizing the multiplexing capability of dxCas9 (Cong et al, 2013). As a result, we could test for all of these gene doping variants simultaneously, improving the efficiency of our method.

Figure 6. Targeted integration by our fusion protein. The sgRNA loaded dxCas9 binds to the target DNA via complementary matching. The transposase integrates sequencing adapters next to the target site.
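The idea behind the sgRNA model, choosing the junction whose synonymous variants need the fewest guides, can be sketched as follows. This is a simplified illustration under strong assumptions, not our actual model: it treats every distinct synonymous DNA variant of a short window around a junction as requiring its own exact-match sgRNA, and it ignores PAM requirements, mismatch tolerance, and junctions that fall inside a codon. The function names and the toy protein sequence are hypothetical; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Amino acid -> list of codons that encode it.
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def count_variants(peptide):
    """Number of distinct DNA sequences that encode the peptide."""
    n = 1
    for aa in peptide:
        n *= len(SYNONYMS[aa])
    return n


def enumerate_targets(peptide):
    """All synonymous DNA variants of the peptide, one target per sgRNA."""
    return sorted("".join(codons)
                  for codons in product(*(SYNONYMS[aa] for aa in peptide)))


def best_junction(protein, junctions, flank=3):
    """Pick the junction whose flanking window needs the fewest sgRNAs.

    junctions are codon indices at which a new exon starts (assumed to be
    in frame); flank is the number of codons taken on each side.
    """
    def window(j):
        return protein[max(0, j - flank): j + flank]
    return min(junctions, key=lambda j: count_variants(window(j)))


if __name__ == "__main__":
    toy_protein = "LLMWMWLL"  # hypothetical sequence, not EPO
    j = best_junction(toy_protein, [1, 4], flank=1)
    print(j, enumerate_targets(toy_protein[j - 1: j + 1]))
```

Windows rich in methionine and tryptophan need few guides because each has a single codon, whereas leucine, serine, and arginine have six codons each and quickly inflate the number of variants.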

Further, we implemented multiplexing with barcodes to improve the efficiency of the method, reduce cost, and increase throughput. We created a barcoding webtool to generate unique barcodes, which are integrated into the adapter sequences that are attached to the target sequence. This allows us to sequence samples from multiple athletes in the same run and trace each output sequence back to the corresponding barcode (Ref).
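A minimal sketch of barcode generation and sample assignment is given below. It does not reproduce our webtool: it assumes a simple greedy search for barcodes that differ from each other in at least `min_distance` positions, and it compares barcodes by Hamming distance, which ignores the insertions and deletions that occur in Nanopore reads. All function names, parameters, and athlete labels are hypothetical.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def generate_barcodes(n, length=8, min_distance=3):
    """Greedily collect n barcodes that are pairwise >= min_distance apart."""
    chosen = []
    for candidate in product("ACGT", repeat=length):
        seq = "".join(candidate)
        if all(hamming(seq, other) >= min_distance for other in chosen):
            chosen.append(seq)
            if len(chosen) == n:
                return chosen
    raise ValueError("not enough barcodes for this length and distance")


def assign_sample(observed, barcodes, max_errors=1):
    """Return the sample whose barcode is closest to the observed one.

    barcodes maps a sample label to its barcode. Returns None when even
    the closest barcode has more than max_errors mismatches.
    """
    best = min(barcodes, key=lambda label: hamming(observed, barcodes[label]))
    return best if hamming(observed, barcodes[best]) <= max_errors else None


if __name__ == "__main__":
    codes = generate_barcodes(4, length=6, min_distance=3)
    samples = {f"athlete_{i + 1}": code for i, code in enumerate(codes)}
    print(samples)
    print(assign_sample(codes[2], samples))
```

With a minimum distance of 3 between barcodes, a read barcode containing a single substitution is still closer to its true barcode than to any other, which is why `max_errors=1` is a safe default in this sketch.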

4. Targeted sequencing

After library preparation, we sequence the samples using a MinION, a portable real-time sequencing device from ONT. Only targeted sequences with adapters are translocated through ONT’s nanopores by the motor protein (ref). The remaining untagged DNA will not be sequenced. By making a simple enzyme substitution, we transformed ONT’s established next generation sequencing platform into a targeted next generation sequencing platform.

Step 1: The fusion protein scans the DNA and binds to a specific DNA sequence via complementary matching between the sgRNA and the target DNA.

Step 2: The fusion protein integrates the two small DNA molecules (adapters) required for Nanopore sequencing.

Step 3: A motor protein binds to the integrated adapter sequences.

Step 4: The motor protein guides the adapter-tagged DNA through the nanopore, so that only targeted DNA is sequenced.

Figure 7. Process flow diagram of our Targeted Next Generation Sequencing platform.

We processed the data obtained from ONT MinION sequencing runs with our tailor-made data analysis software tool. The software uses an algorithm that aligns all the sequences generated by a sequencing run against our pre-existing database of expected gene doping sequences. Based on the alignment score, each sequence is classified as gene doping DNA or non-gene doping DNA, which filters out false positives that might have been sequenced. The software strengthens the robustness and reliability of our method, allowing us to determine whether gene doping DNA was present in the athlete’s blood sample. Going a step further, our algorithm can expand the database as it detects new gene doping sequences, so that it evolves along with gene doping.
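The classification step can be illustrated with a minimal sketch, which is not our software tool. It assumes a plain Smith-Waterman local alignment score with a linear gap penalty and a hypothetical cut-off on the score normalised by the best achievable score. The scoring values, the `min_fraction` threshold, the function names, and the database entry are all assumptions made for illustration; the reference sequence is invented and is not a real EPO junction.

```python
def local_alignment_score(a, b, match=2, mismatch=-3, gap=-4):
    """Best Smith-Waterman local alignment score between a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def classify_read(read, database, min_fraction=0.8, match=2):
    """Label a read by its best normalised alignment score.

    database maps a reference name to its sequence. The score is divided
    by the score of a perfect match over the shorter sequence, so 1.0
    means the shorter sequence aligns without any error.
    """
    best_name, best_fraction = None, 0.0
    for name, reference in database.items():
        perfect = match * min(len(read), len(reference))
        fraction = local_alignment_score(read, reference, match=match) / perfect
        if fraction > best_fraction:
            best_name, best_fraction = name, fraction
    if best_fraction >= min_fraction:
        return "gene doping", best_name
    return "non-gene doping", None


if __name__ == "__main__":
    # Hypothetical database entry, not a real junction sequence.
    database = {"junction_example": "ACGTACGTTAGCCTAGGATC"}
    print(classify_read("GTTAGCCTAGG", database))
    print(classify_read("TTTTTTTTTTTTTTTTTTTT", database))
```

In practice the threshold would need to be tuned to the error rate of the sequencing platform, and a dedicated long-read aligner would replace the quadratic-time alignment used here.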

Impact

We believe that by establishing a novel detection method and integrating it into valid testing systems, we will discourage athletes from using high-risk gene doping technologies. Not only have the Dutch Doping Authority, the Delft Sports Engineering Institute, and the Dutch Trotting and Flat Racing Association (NDR) shown interest in our technology, but so have various other stakeholders unrelated to gene doping. Sanquin, the Dutch blood bank, and the Dutch Research Department for Food Safety (RIKILT) at Wageningen University also emphasized the potential and value of our targeted sequencing technology. These stakeholders’ interest highlights the possible application of our targeted sequencing method in other fields, such as cancer detection, non-invasive prenatal screening, food safety regulation, and even strain identification.

Figure 8. Future applications of our targeted next generation sequencing.