1. Introduction
Synthetic biology is currently one of the most rapidly developing fields. Recent advances, particularly in genetic engineering, allow us to tackle major societal challenges. Genetic engineering is now even applied to humans in the form of gene therapy, an experimental technique that uses genes to treat or prevent disease, including severe genetic disorders. However, gene editing techniques also raise major concerns about misuse, particularly for human enhancement. Gene doping, the misuse of gene therapy to enhance athletic performance, is one such example. To promote the responsible use of synthetic biology and to help eliminate gene doping from sport, we developed a complete gene doping detection method: ADOPE, the Advanced Detection of Performance Enhancement.
ADOPE is based on targeted Next Generation Sequencing (NGS), which reduces the amount of data generated by NGS and effectively identifies gene doping DNA. We accomplished this by creating an innovative fusion protein for the rapid library preparation required for sequencing. Our fusion protein consists of a cleavage-deficient, sequence-specific nuclease (dxCas9) and a Tn5 transposase. The dxCas9 part, loaded with a single guide RNA (sgRNA), binds the specific target DNA sequence through complementary base pairing between the sgRNA and the target DNA, whereas the Tn5 part integrates the two small DNA molecules (adapters) required for Nanopore sequencing. The fusion protein is thus capable of performing the dxCas9-guided adapter integration required for targeted sequencing library preparation, allowing us to identify gene doping in blood samples with our method ADOPE.
ADOPE
We designed ADOPE to detect gene doping in blood samples by targeting the most striking difference between natural DNA and gene doping DNA, namely the exon-exon junctions that exist only in doping DNA (Beiter et al., 2011). ADOPE consists of four main steps: sample preparation, prescreening, library preparation, and sequencing.
1. Sample preparation
Blood samples are commonly taken from athletes during regular doping tests. We used these blood samples to extract the DNA required for testing from serum or the buffy coat (serum and white blood cells) (Ni et al., 2011). Additionally, we built an extensive gene doping kinetics model to predict the appropriate time window for gene doping testing and to establish the sensitivity requirements of ADOPE. The model takes the concentration of gene doping vectors and the injection frequency as input and outputs the amount of gene doping fragments in the blood over time.
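To illustrate the input and output of such a kinetics model, here is a minimal sketch in Python. It is not our actual model: it assumes repeated bolus injections and simple first-order clearance, and the function names, the half-life, and the limit of detection are hypothetical parameters chosen for illustration only.

```python
import math


def fragments_over_time(dose, interval_h, n_injections, half_life_h,
                        t_end_h, step_h=1.0):
    """Return [(time_h, copies)] for repeated injections with first-order clearance.

    Assumption: each injection adds `dose` copies instantly and every copy
    decays with the same half-life. A real model would be more detailed.
    """
    k = math.log(2) / half_life_h  # elimination rate constant per hour
    injection_times = [i * interval_h for i in range(n_injections)]
    curve = []
    for step in range(int(t_end_h / step_h) + 1):
        t = step * step_h
        # Superimpose the decaying contribution of every injection given so far.
        copies = sum(dose * math.exp(-k * (t - ti))
                     for ti in injection_times if ti <= t)
        curve.append((t, copies))
    return curve


def detection_window(curve, limit_of_detection):
    """Return (first, last) time at which the level reaches the limit, or None."""
    above = [t for t, copies in curve if copies >= limit_of_detection]
    return (above[0], above[-1]) if above else None


if __name__ == "__main__":
    # Hypothetical numbers: one injection of 1000 copies, 12 h half-life.
    curve = fragments_over_time(1000.0, 24.0, 1, 12.0, 48.0)
    print(detection_window(curve, limit_of_detection=200.0))
```

In this toy setting, the time window for testing follows directly from the assumed limit of detection, which is how a kinetics model can translate into a sensitivity requirement for the detection method.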
2. Prescreening
We incorporated a prescreening step in our method based on the advice of Dr. Oliver de Hon of the Dutch Doping Authority. He emphasized the importance of a high-throughput assay that could screen thousands of athletes simultaneously. Therefore, we developed a colorimetric assay based on the extent of gold nanoparticle aggregation (Baetsen-Young et al., 2018). When target doping DNA is absent, the nanoparticles aggregate completely, resulting in a purple color. When target doping DNA is present, it forms a secondary structure with a targeting DNA probe, which stabilizes the nanoparticles against aggregation, resulting in a red color.
3. Targeted library preparation
Samples that test positive in the prescreening proceed to our novel rapid targeted next generation sequencing. Targeted library preparation relies on our innovative fusion protein consisting of a Tn5 transposase and a dxCas9. Fusing these two proteins results in target-specific transposition. As a proof of concept, we showed that our fusion protein is target specific using an in vitro targeted integration assay, which we verified by visualizing the amplified integration products with gel electrophoresis. Once we had established the functionality and optimal conditions, we implemented the fusion protein in the established rapid next generation sequencing library preparation protocol from Oxford Nanopore Technologies (ONT), replacing the original transposase, which integrates the sequencing adapters at random, with our novel fusion protein. In our case, a specifically designed sgRNA guides the fusion protein to the exon-exon junction target site, so that only gene doping DNA is prepared for sequencing. Additionally, we developed an sgRNA model that identifies the optimal exon-exon target site by searching for the smallest number of sgRNAs required to cover all possible variation of the target site due to synonymous mutations in the EPO coding sequence. We prepared an sgRNA array of the resulting 12 sgRNAs for a single library preparation, utilizing the multiplexing capability of dxCas9 (Cong et al., 2013). As a result, we could test for all of these gene doping variants simultaneously, improving the efficiency of our method.
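The search for a small set of sgRNAs that covers every synonymous variant of a target site can be viewed as a set cover problem. The sketch below is an illustration under stated assumptions, not our actual sgRNA model: it uses a toy peptide instead of an EPO exon-exon junction, a partial codon table, a greedy heuristic (which does not guarantee the true minimum), and a simple mismatch count as a stand-in for real sgRNA tolerance. All names are hypothetical.

```python
from itertools import product

# Partial codon table, limited to the amino acids of the toy example.
SYNONYMOUS_CODONS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "D": ["GAT", "GAC"],
    "F": ["TTT", "TTC"],
}


def synonymous_variants(peptide):
    """Enumerate every DNA sequence that encodes the given peptide."""
    codon_choices = [SYNONYMOUS_CODONS[aa] for aa in peptide]
    return ["".join(codons) for codons in product(*codon_choices)]


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def greedy_guide_set(variants, max_mismatches=1):
    """Greedily pick guides until every variant is within max_mismatches of one.

    Assumption: a guide 'covers' a variant if they differ at no more than
    max_mismatches positions, regardless of where the mismatches fall.
    """
    candidates = sorted(variants)
    uncovered = set(variants)
    guides = []
    while uncovered:
        # Choose the candidate that covers the most still-uncovered variants.
        best = max(candidates,
                   key=lambda g: sum(hamming(g, v) <= max_mismatches
                                     for v in uncovered))
        guides.append(best)
        uncovered = {v for v in uncovered if hamming(best, v) > max_mismatches}
    return guides


if __name__ == "__main__":
    variants = synonymous_variants("MKDF")
    print(len(variants), "variants,", len(greedy_guide_set(variants)), "guides")
```

With no mismatches tolerated, every variant needs its own guide, so the number of guides grows quickly with the number of degenerate codons in the target site. This is why the choice of target site matters for keeping the sgRNA array small.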
Furthermore, we implemented multiplexing with barcodes to improve the efficiency of the method, reduce cost, and increase throughput. We created a barcoding webtool to generate unique barcodes, which are integrated into the adapter sequences that are ligated to the target sequence. This allows us to sequence samples from multiple athletes in the same run and trace each output sequence back to the corresponding barcode (Ref).
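As an illustration of how unique barcodes can be generated and used to trace reads back to samples, consider the following minimal sketch. It does not reproduce our webtool: it assumes that barcodes only need a minimum pairwise Hamming distance, ignores insertions and deletions (which do occur in Nanopore reads), and uses hypothetical function and sample names.

```python
from itertools import product


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def generate_barcodes(n, length=6, min_distance=3):
    """Greedily collect n barcodes that differ pairwise in >= min_distance bases."""
    chosen = []
    for bases in product("ACGT", repeat=length):
        candidate = "".join(bases)
        if all(hamming(candidate, other) >= min_distance for other in chosen):
            chosen.append(candidate)
            if len(chosen) == n:
                return chosen
    raise ValueError("not enough barcodes for this length and distance")


def assign_sample(read_barcode, barcode_by_sample, max_errors=1):
    """Return the sample whose barcode is closest to the read, or None.

    With min_distance=3 and max_errors=1, a read with one substitution is
    still closer to its true barcode than to any other barcode.
    """
    sample, barcode = min(barcode_by_sample.items(),
                          key=lambda item: hamming(read_barcode, item[1]))
    return sample if hamming(read_barcode, barcode) <= max_errors else None


if __name__ == "__main__":
    barcodes = generate_barcodes(4)
    by_sample = {f"athlete_{i + 1}": bc for i, bc in enumerate(barcodes)}
    print(by_sample)
    print(assign_sample(barcodes[2], by_sample))
```

Requiring a minimum distance between barcodes trades the number of available barcodes against tolerance to sequencing errors, so the barcode length has to grow with the number of athletes pooled in one run.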
4. Targeted sequencing
After library preparation, we sequence the samples using a MinION, a portable real-time sequencing device from ONT. Only targeted sequences with adapters are translocated through ONT's nanopores via the nanopore motor protein (ref). The remaining untagged DNA will not be sequenced. By making a simple enzyme substitution, we transformed ONT's established next generation sequencing platform into a targeted next generation sequencing platform.
We processed the data obtained from ONT MinION sequencing runs with our tailor-made data analysis software tool. The software consists of an algorithm that aligns all sequences generated by our sequencing run against our pre-existing database of expected gene doping sequences. Based on the alignment score, the sequences are classified as gene doping DNA or non-gene doping DNA, eliminating any false positives that might have been sequenced. The software strengthens the robustness and reliability of our method, allowing us to determine whether gene doping DNA was present in the athlete's blood sample. To go a step further, our algorithm can expand the database as it detects new gene doping sequences, thereby evolving along with gene doping.
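The classification by alignment score can be sketched as follows. This is a simplified illustration, not our software tool: it assumes a Smith-Waterman style local alignment with arbitrary scoring values, a hypothetical score threshold, and a made-up reference sequence in place of a real gene doping database entry.

```python
MATCH, MISMATCH, GAP = 2, -1, -2  # assumed scoring scheme


def local_alignment_score(a, b):
    """Best Smith-Waterman local alignment score between sequences a and b."""
    previous = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        current = [0]
        for j in range(1, len(b) + 1):
            diagonal = previous[j - 1] + (MATCH if a[i - 1] == b[j - 1] else MISMATCH)
            score = max(0, diagonal, previous[j] + GAP, current[j - 1] + GAP)
            current.append(score)
            best = max(best, score)
        previous = current
    return best


def classify_read(read, database, threshold=0.8):
    """Label a read by its best score relative to the maximum achievable score.

    Returns (label, best_reference_name, best_fraction). The threshold is a
    hypothetical value that would need to be calibrated on real data.
    """
    best_name, best_fraction = None, 0.0
    for name, reference in database.items():
        max_score = MATCH * min(len(read), len(reference))
        fraction = local_alignment_score(read, reference) / max_score
        if fraction > best_fraction:
            best_name, best_fraction = name, fraction
    label = "gene doping" if best_fraction >= threshold else "non-gene doping"
    return label, best_name, best_fraction


if __name__ == "__main__":
    # Made-up reference sequence, for illustration only.
    database = {"example_junction": "ACGTTGCAAGGCTTACCGGA"}
    print(classify_read("ACGTTGCAAGCCTTACCGGA", database))
    print(classify_read("TTTTTTTTTTTTTTTTTTTT", database))
```

Normalizing the score by the maximum achievable score makes the threshold independent of read length, which matters because Nanopore reads vary widely in length and contain sequencing errors.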
Impact
We believe that by establishing a novel detection method and integrating it into valid testing systems, we will discourage athletes from using high-risk gene doping technologies. Not only have the Dutch Doping Authority, the Delft Sports Engineering Institute, and the Dutch Trotting and Flat Racing Association (NDR) shown interest in our technology, but so have various stakeholders outside the field of gene doping. Sanquin, the Dutch blood bank, and the Dutch Research Department for Food Safety (RIKILT) at Wageningen University also emphasized the potential and value of our targeted sequencing technology. The interest of these stakeholders highlights the possible application of our targeted sequencing method in other fields, such as cancer detection, non-invasive prenatal screening, food safety regulation, and even strain identification.