ADOPE

The aim of ADOPE is to develop a complete method for gene doping detection. The design of this method was an iterative process, with continual revisions based on data obtained from the wetlab, the drylab and interactions with stakeholders. The final method consists of four main steps: sample preparation, prescreening, targeted library preparation, and DNA sequencing and analysis.

Figure 1. Flow diagram of our detection method ADOPE, with its four steps: sample preparation, prescreening, library preparation and sequencing.

1. Sample Preparation

The target of our detection method is gene doping DNA, which is extracted from blood samples. Based on the advice and expertise of Aïcha Ait Soussan and Dr. Ellen van der Schoot of the Dutch blood bank Sanquin, we developed a specific extraction and verification protocol to handle the relatively low concentrations of gene doping DNA expected to be present in the blood (Lee et al, 2001; Ershova et al, 2017; Zinkova et al, 2017). These concentrations were predicted by modeling the processes that the administered gene doping DNA goes through in the body. The model also determined the detection window of our method. By applying the modeling results in the wetlab, we optimized the sample preparation for the detection of gene doping. The main steps of the sample preparation are shown in figure 2.

This resulted in an approach in which cell-free DNA was extracted from blood derivatives using the QIAamp DNA Blood Mini Kit (QIAGEN), a kit containing silica-membrane-based spin columns. After DNA quantification using Qubit, extractions were verified by nested PCR for the abundant natural albumin gene as an internal control. Lastly, samples were spiked with artificial gene doping DNA, based on the values obtained from the gene doping model, and successful extraction of the gene doping DNA was verified by PCR.

Figure 2. Sample preparation workflow. The final process involves a two step PCR to verify the presence of the internal standard albumin gene.

Cell-free DNA (cfDNA) is fragmented genomic DNA present in the blood, originating mainly from apoptotic cells and other DNA leakage processes described in our model (Devonshire et al, 2014; Zinkova et al, 2017). This model also predicted that cfDNA in plasma will be a suitable source for detecting gene doping DNA. The cfDNA contains fragmented genomic DNA and fragmented gene doping DNA. Gene doping DNA and endogenous DNA produce indistinguishable proteins, but they differ at the nucleic acid level. The most striking differences between the two are the promoter and the absence of introns. First, gene doping DNA commonly requires a constitutive promoter for higher levels of expression. Second, due to size limits of the gene doping vectors and the unknown regulation of transcription or translation, the introns will be taken out, resulting in exon-exon junctions that are unique to the gene doping DNA. Extraction of cfDNA from plasma is challenging due to low concentrations and high fragmentation, but possible.
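The exon-exon junction argument can be illustrated with a small sketch. It lists the k-mers of an intronless sequence that span a junction and keeps those that do not occur in the corresponding genomic sequence with introns. The toy exon and intron sequences, the function names and the choice of k = 8 are assumptions for illustration only; they are not EPO sequences and not part of our pipeline.

```python
def junction_kmers(exons, k=8):
    """Return all k-mers of the spliced (intronless) sequence that span an exon-exon junction."""
    spliced = "".join(exons)
    kmers = []
    boundary = 0
    for exon in exons[:-1]:
        boundary += len(exon)  # index of the first base of the next exon
        # A window spans the junction if it holds at least one base on each side.
        first_start = max(0, boundary - k + 1)
        last_start = min(boundary - 1, len(spliced) - k)
        for start in range(first_start, last_start + 1):
            kmers.append(spliced[start:start + k])
    return kmers


def doping_specific(kmers, genomic):
    """Keep only the k-mers that are absent from the genomic (intron-containing) sequence."""
    return [kmer for kmer in kmers if kmer not in genomic]


# Hypothetical toy gene: three exons separated by two introns.
exons = ["ATGGCCTTCA", "CCGATTGACA", "GGTTCAATGA"]
introns = ["GTAAGTCCCCTTTTAG", "GTGAGTAAAACCCCAG"]
genomic = exons[0] + introns[0] + exons[1] + introns[1] + exons[2]

candidates = junction_kmers(exons, k=8)
specific = doping_specific(candidates, genomic)
print(len(candidates), "junction-spanning k-mers,", len(specific), "absent from the genomic sequence")
```

Windows that overlap a junction by only one or two bases can still occur in the genomic sequence by chance, which is why the second filtering step is included.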

Since cfDNA is highly fragmented and quickly degraded in the blood, it is important to process the blood into plasma or serum as quickly as possible after blood withdrawal (Wong et al, 2013). Processing blood into plasma or serum is done by centrifugation, which separates the blood into three layers: the red blood cells, the buffy coat containing the white blood cells, and the plasma, as shown in figure 2 (Thurik, 2016). The difference between plasma and serum is that serum is the liquid of the blood left after clotting, whereas plasma still contains the clotting factors and needs to be kept in EDTA-lined tubes to prevent clotting. Because clotting damages the white blood cells, serum has a higher concentration of cfDNA than plasma, as shown in table 1 (Lee et al, 2001). The protocol for processing blood into plasma can be found here. Plasma and serum containing cfDNA can be stored frozen for long periods of time, up to years, without significantly lowering the concentration of the cfDNA (Ginkel et al, 2017).

**Table 1.** cfDNA concentrations in human serum and plasma. Values obtained from Zinkova *et al*, 2017.

| Blood derivative | Mean cfDNA concentration (ng/ml) | Range of cfDNA concentration in samples (ng/ml) |
| --- | --- | --- |
| Serum | 148.13 +/- 91.31 | 54.33 - 427.3 |
| Plasma | 7.48 +/- 5 | 3.16 - 26.05 |

Due to the extra safety measures and permits required for working with blood, we performed background cfDNA extraction on adult bovine serum as a proof of concept, instead of using whole blood and processing it into plasma. Extraction was done using the QIAamp DNA Blood Mini Kit (QIAGEN), as recommended by Aïcha Ait Soussan of the Dutch blood bank Sanquin. Sanquin are experts in the extraction of cfDNA for prenatal screening and provided us with their optimized protocol for extracting low amounts of cfDNA with this kit. Since the concentration of cfDNA in plasma is so low, the extraction columns were loaded with more sample than recommended in the standard protocol, building up the DNA on the column before eluting.
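As a rough illustration of why loading more sample helps, the amount of cfDNA reaching the column scales with the loaded volume times the concentration. The sketch below uses the mean values from Table 1 (human serum and plasma, not the bovine serum used in our proof of concept). The loaded volumes and the `recovery` parameter are hypothetical placeholders, not values from the protocol.

```python
# Mean cfDNA concentrations from Table 1, in ng/ml.
MEAN_CFDNA_NG_PER_ML = {"serum": 148.13, "plasma": 7.48}


def expected_cfdna_ng(volume_ml, concentration_ng_per_ml, recovery=1.0):
    """Expected cfDNA mass (ng) for a loaded volume; recovery is an assumed column yield (0-1)."""
    if volume_ml < 0 or not 0.0 <= recovery <= 1.0:
        raise ValueError("volume must be non-negative and recovery between 0 and 1")
    return volume_ml * concentration_ng_per_ml * recovery


# Hypothetical loading volumes (ml) to show the linear scaling.
for volume in (0.2, 1.0, 2.0):
    plasma_ng = expected_cfdna_ng(volume, MEAN_CFDNA_NG_PER_ML["plasma"])
    print(f"{volume:.1f} ml plasma -> about {plasma_ng:.2f} ng cfDNA at full recovery")

serum_to_plasma = MEAN_CFDNA_NG_PER_ML["serum"] / MEAN_CFDNA_NG_PER_ML["plasma"]
print(f"mean serum/plasma concentration ratio: about {serum_to_plasma:.1f}")
```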

Multiple extractions were performed, after which the extracted DNA was quantified and verified. Samples were also spiked with artificial gene doping DNA before extraction, using the values obtained from the Gene Doping Model.

Aim: Model the process of infection and degradation of gene doping DNA in blood and predict the time dependent concentration of cell-free gene-doping DNA.

Approach: In our Gene Doping Model, we considered the entire process of gene doping to fully understand its underlying mechanisms and its effect on the athlete. We started with the process of DNA delivery using viral vectors, including both intramuscular (IM) and intravenous (IV) injection methods, followed by the infection of kidney cells with the viral vectors, and finally the origination and degradation of cell-free gene doping DNA in the blood. We used the human erythropoietin (EPO) gene as our model gene, and have thus included the EPO-dependent production of red blood cells in our model through a series of partial differential equations. Figure 3 shows the simplified process in the body for gene doping with the EPO gene. The result was a time-dependent concentration of the gene doping DNA in the bloodstream, which was used in the wetlab experiments. Detailed information on this model can be found on our Modeling page.

Aim: Verify successful extraction of low-concentration cfDNA from blood serum.

Approach: To verify successful extraction of the low amounts of cfDNA, we combined two approaches. First we quantified the cfDNA using the Qubit High Sensitivity (HS) dsDNA kit. Second, we verified the correct DNA extraction by amplification of the bovine albumin gene, a standard for verification of genomic mammalian DNA extraction and recommended by Sanquin, using nested PCR.

Aim: Show successful extraction of predicted concentrations of gene doping DNA.

Approach: We used an artificial intronless human EPO gene to simulate gene doping. Prior to the extraction, we spiked the serum with this EPO gene doping DNA, using concentrations matching the values predicted by the gene doping model. Two different forms of artificial EPO doping DNA were spiked: the linear intronless EPO gene and the same fragment cloned in pSB1C3 (EPO biobrick, BBa_K2643004). The extraction of the EPO-spiked serum was verified by two PCRs.

2. Prescreening

During our interaction with the Dutch Doping Authorities, Dr. Oliver de Hon emphasised the importance of a high throughput assay to screen thousands of athletes simultaneously. He recommended a rapid, sensitive and cheap initial filter to reduce the total number of samples that will be processed by our final sequencing step. Therefore, we developed a prescreening method based on dextrin-capped gold nanoparticles (d-AuNPs), which change color upon aggregation (Baetsen-Young et al, 2018).

Figure 5. Prescreening method for gene doping. An ssDNAp is added to the extracted DNA; when it hybridizes with the complementary target strand and the secondary structure forms, the d-AuNPs are stabilized.

Gold nanoparticles (AuNPs) have unique highly specific spectral absorption properties (Sepúlveda et al, 2009), large surface to volume ratios, and an ability to interact with DNA. Due to their characteristics, AuNPs have emerged as robust colorimetric assays. Baetsen-Young et al, (2018) have developed a direct colorimetric detection of DNA by d-AuNPs. They were able to visualize the presence of the unamplified pathogen Pseudoperonospora cubensis in a concentration ranging from 29 fM to 2.9 aM DNA using a single DNA probe.

Upon increasing NaCl concentration, d-AuNPs destabilize and form aggregates, leading to a color change from red to purple. However, when a single stranded DNA probe (ssDNAp) hybridizes with a specific target double stranded DNA molecule (dsDNA), a secondary DNA structure is formed, as shown in Figure 5. d-AuNPs can interact with this secondary structure, which makes them more stable and prevents them from aggregating. As a result of the d-AuNP-dsDNA interaction, the solution with target DNA will have different refraction properties compared to a solution with non-target DNA, giving a visual indication of target DNA presence (Baetsen-Young et al, 2018). The prescreening was tested with the human EPO gene and a corresponding ssDNAp targeting exon-exon junctions of the EPO gene. We were able to verify the potential of d-AuNPs to screen for the presence of EPO gene doping.

Aim: Develop a cheap, sensitive, and high throughput screening method to identify possible gene-doping users.

Approach: We generated batches of dextrin-capped gold nanoparticles (d-AuNPs) and evaluated their aggregation behavior in a range of 0 - 1000 mM NaCl under three different testing conditions:

No single strand DNA probe (ssDNAp) or double strand DNA (dsDNA).
Only ssDNAp and no dsDNA.
Target gene doping DNA with a specific ssDNAp, inducing annealing between both.

We compared the extent of d-AuNPs stabilization under these three conditions to determine the required concentration of NaCl for testing. After showing the functionality of the prescreening as a potential method for gene doping, we determined three assay properties:

Sensitivity: We assayed different ratios of target/non target DNA concentrations keeping the total amount of DNA constant.
Robustness: We evaluated the influence of sample background DNA after sample preparation.
Versatility: We compared the assay at the same conditions with different targets (varying ssDNAp).
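The sensitivity test above varies the target/non-target ratio while keeping the total amount of DNA constant. A minimal sketch of how such a mixing series could be computed is given below. The total amount of 100 ng and the chosen target fractions are hypothetical placeholders, not the values used in our experiments.

```python
def spike_series(total_ng, target_fractions):
    """Return (fraction, target_ng, non_target_ng) rows that all sum to the same total DNA amount."""
    rows = []
    for fraction in target_fractions:
        if not 0.0 <= fraction <= 1.0:
            raise ValueError("target fraction must lie between 0 and 1")
        target_ng = total_ng * fraction
        rows.append((fraction, target_ng, total_ng - target_ng))
    return rows


# Hypothetical series: 100 ng total DNA per reaction, target fraction from 0 % to 100 %.
series = spike_series(100.0, [0.0, 0.01, 0.1, 0.5, 1.0])
for fraction, target_ng, non_target_ng in series:
    print(f"target {fraction:6.1%}: {target_ng:6.1f} ng target + {non_target_ng:6.1f} ng non-target")
```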

3. Fusion Protein - dxCas9-Tn5

The final step of our gene doping detection method is based on targeted sequencing using the Oxford Nanopore Technologies (ONT) sequencing platform. Our aim was to reduce the amount of data generated with next generation sequencing and to identify gene doping DNA effectively. We accomplished this by creating an innovative fusion protein for use in the rapid library preparation required for sequencing. Our fusion protein consists of a cleavage-deficient, sequence-specific nuclease (dxCas9) and a Tn5 transposase. The dxCas9 part, loaded with a single guide RNA (sgRNA), interacts with the specific target DNA sequence via complementary matching between the sgRNA and the target DNA, whereas the Tn5 part integrates two small DNA molecules (adapters) required for nanopore sequencing. Thus, the fusion protein is capable of performing dxCas9-guided adapter integration for targeted sequencing library preparation, allowing us to identify gene doping in blood samples with our method ADOPE.

Before establishing a targeted library preparation protocol, we produced the fusion protein and analysed its functionality (in vitro and in vivo). We evaluated three different types of chromatography (nickel affinity, heparin, and monoQ chromatography) to develop a new downstream process for the fusion protein. Once the fusion protein was successfully purified, we evaluated its in vitro functionality. We tested the ability of the fusion protein to load the adapter sequence and to bind to target DNA, facilitated by the sgRNA, with electrophoretic mobility shift assays (EMSA). Finally, we tested targeted integration of the adapters by PCR of the obtained cleaved fragments and subsequent gel electrophoresis. Additionally, we developed an in vivo screening platform to evaluate the effectiveness and efficiency of the fusion protein. The fusion protein was guided by a lacZ sgRNA and carried kanamycin resistance donor DNA, allowing us to perform a dual screen: blue/white colony screening and kanamycin resistance screening. A scheme of this work pipeline is depicted in figure 6.

Figure 6. Design, construction, purification and functionality tests of fusion protein dxCas9-Tn5.

The fusion protein consists of a catalytically inactive version of xCas9 (dxCas9, BBa_K2643001), fused to a hyperactive transposase (Tn5, BBa_K2643002) via a short linker (BBa_K2643003) of 18 amino acids and cloned into the pACYCDuet-1 vector.

dxCas9
We chose to work with the newly evolved Cas9 variant to guide the fusion protein to a specific DNA sequence. This variant is derived from the common Streptococcus pyogenes Cas9 (SpCas9). The benefit of the xCas9 variant is that it has the broadest known Protospacer Adjacent Motif (PAM) sequence including NG, GAA and GAT, and a greater DNA specificity compared to SpCas9, resulting in significantly lower off-target activity (Hu et al, 2018). The catalytically dead version of xCas9 (dxCas9) is thus an excellent choice for the component responsible for guiding the fusion to a specific DNA sequence. The variant was evolved in 2018 by phage assisted continuous evolution and has 6 point mutations (amino acids E480K, E543D, E1219V, A262T, S409I and M694I) (Hu et al, 2018).

Tn5
We chose to work with the hyperactive Tn5 transposase variant to integrate two small DNA molecules (adapters) compatible with Nanopore sequencing into double-stranded DNA. The hyperactive Tn5 is a modified variant of the wild type Tn5 isolated from E. coli, containing two mutations (E54K and L372P) responsible for increasing the activity of Tn5 (Picelli et al, 2018). This variant is an excellent choice for the component responsible for integrating the adapter sequences. The protein forms a dimer and is able to pick up specific DNA (a transposon) flanked by mosaic end (ME) sequences (CTGTCTCTTATACACATCT) and integrate it randomly into DNA. It can integrate a single DNA molecule, extending the target DNA by the length of the transposon (the mode used for the in vivo functionality test), or it can integrate two individual DNA molecules, inducing a double-stranded break in the target DNA.
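To make the role of the mosaic ends concrete, the sketch below builds an ME-flanked donor sequence and checks whether a given sequence is flanked correctly. It assumes the usual inverted-repeat arrangement, with the ME sequence quoted above at the 5' end of the top strand and its reverse complement at the 3' end. The cargo sequence and the function names are hypothetical.

```python
MOSAIC_END = "CTGTCTCTTATACACATCT"  # ME sequence as given in the text
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def flank_with_me(cargo):
    """Place the ME at the 5' end and its reverse complement at the 3' end (assumed orientation)."""
    return MOSAIC_END + cargo + reverse_complement(MOSAIC_END)


def is_me_flanked(seq):
    """Check that a top-strand sequence starts and ends with inverted mosaic ends."""
    return (
        len(seq) >= 2 * len(MOSAIC_END)
        and seq.startswith(MOSAIC_END)
        and seq.endswith(reverse_complement(MOSAIC_END))
    )


donor = flank_with_me("ATGAGCCATATTCAACGG")  # hypothetical cargo, not a real resistance cassette
print(donor)
print(is_me_flanked(donor))
```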

Linker
The Tn5 transposase (BBa_K2643002) and the dxCas9 (BBa_K2643001) were fused using a Glycine Helical peptide (GHP) linker. The flexible linker is composed of 18 amino acids (K L G G G A P A V G G G P K A A D K) (Cadinanos & Bradley, 2007). The GHP linker was selected as a high-potential candidate due to its successful use in linking transposases with other proteins, for example to fuse the piggyBac transposase with ERT2, a ligand-binding domain (Cadinanos & Bradley, 2007), or to fuse the Sleeping Beauty transposase with a zinc finger DNA-binding domain (Voigt et al, 2012).

pACYCDuet-1 vector
The pACYCDuet-1 (Novagen) is designed for coexpression of two genes. The plasmid contains two multiple cloning sites (MCS), each of which is preceded by a T7 promoter/lac operator and ribosome binding site (RBS). The P15A replicon is present to sustain plasmid replication and the chloramphenicol resistance gene allows for applying selective pressure to cells for maintaining the plasmid. Additionally, after the T7 promoter a HIS tag is implemented for purification purposes.

Aim: Design and construct the dxCas9-Tn5 fusion protein (BBa_K2643000) in pACYCDuet-1 (figure 7).

Figure 7. Schematic overview of our fusion construct. The coding sequence was delivered in three parts: dxCas9 (blue), Tn5 (green) and Linker (red). The pACYCDuet-1 vector is also annotated.

Approach:

We cloned the fusion protein (dxCas9-linker-Tn5) coding sequence downstream of the T7 promoter of the pACYCDuet-1 vector, in frame with the HIS purification tag. We used two separate restriction ligation cloning steps to assemble the construct. First, we PCR amplified the Tn5 coding sequence from Addgene plasmid #60240 containing pTXB1-Tn5 (Picelli et al. 2014), simultaneously integrating the linker with extended primers. The Linker-Tn5 PCR amplicon was cloned into the isolated pACYCDuet-1 plasmid with restriction ligation using KpnI and FseI sites. Next, we PCR amplified the dxCas9 coding sequence from Addgene plasmid #108383 containing dxCas9 (3.7) (Miller et al. 2018). We cloned the dxCas9 amplicon into the isolated pACYCDuet-1-Lin-Tn5 plasmid with restriction ligation using NotI and FseI sites, resulting in a plasmid with the dxCas9-Linker-Tn5 coding sequence, a T7 promoter and terminator, and a His-tag. As the protein production might cause an undesired burden on the cells, we used the low copy P15A replication origin to propagate the plasmid.

Aim: Express and purify the dxCas9-Tn5 fusion protein (BBa_K2643000), as well as the individual proteins that comprise the fusion: Tn5 transposase (BBa_K2643002) and dxCas9 (BBa_K2643001) for in vitro functionality evaluation.

Approach: We followed existing expression and purification protocols for both Tn5 (BBa_K2643002) (Hennig et al. 2017) and dxCas9 (BBa_K2643001) (Huai et al. 2017). However, we developed a new downstream process for our fusion protein (BBa_K2643000). We evaluated three different types of chromatography:

Nickel affinity chromatography: The fusion protein has a HIS-tag (used to purify both Tn5 and dxCas9 individually).
Heparin chromatography: The fusion protein is a DNA-binding protein (used to purify both Tn5 and dxCas9 individually).
MonoQ chromatography: It’s an anion exchange resin. To facilitate the binding of the fusion protein to the resin, the pH of the buffer was increased to pH8. At the higher pH the protein can interact with the resin, allowing us to utilize its charge as a purification property.

The final proposed process flow diagram for the production of the dxCas9-Linker-Tn5 fusion (BBa_K2643000) includes two chromatography steps: a capture step and a polishing step (figure 8).

Figure 8. Proposed process flow diagram of the production of the dxCas9-Linker-Tn5 fusion (BBa_K2643000).

Aim: Develop an in vivo platform for screening the efficiency of the dxCas9-Tn5 fusion protein (BBa_K2643000).

Approach & experimental design: To visualize in vivo whether a donor DNA sequence is recognized by the fusion construct (BBa_K2643000) and integrated close to a target sequence, we built a strain that harbors the dxCas9-Tn5 fusion protein (BBa_K2643000), as well as a sgRNA that can guide the complex to a target site of interest.

LacZ is a bacterial gene that encodes beta-galactosidase, an enzyme able to hydrolyze lactose and to cleave the analogue compound 5-bromo-4-chloro-3-indolyl-β-D-galactopyranoside (X-gal), resulting in the formation of a blue colored product; this reaction is commonly used to verify cloning attempts. Our in vivo screening methodology relies on lacZ disruption through targeted integration. To this end, the assay makes use of blue/white screening, which takes place on medium supplemented with X-gal, Isopropyl β-D-1-thiogalactopyranoside (IPTG) and relevant antibiotics. IPTG induces expression of beta-galactosidase, the lacZ gene product.

For this, the lacZ sgRNA expression cassette (BBa_K2643010) was inserted into the episomal pACYC vector already containing the fusion construct (BBa_K2643000). IPTG induction of protein expression then enables immediate loading of the fusion construct with the sgRNA. An additional requirement for activity is a donor DNA sequence flanked by Mosaic Ends (MEs), which can be recognized by the Tn5 transposase domain for cut and paste integration near the sgRNA target. This integration disrupts the lacZ gene and renders beta-galactosidase catalytically inactive, which can be visualized by blue/white screening with X-gal (figure 9).

Figure 9. Overview of the single-step genome editing mechanism in the in vivo assay. (A) The fusion construct is loaded with a sgRNA that targets lacZ, recognizes the ME-flanked kanamycin resistance cassette and (B) picks it up. (C) The sgRNA-loaded dxCas9 domain drives the loaded complex to the genomic lacZ locus, where the transposase domain (D) integrates the donor DNA. (E) The host strain then contains an integrated copy of the kanamycin resistance cassette, while the lacZ gene is disrupted.

In this setup, we required a strain that contains the T7 promoter expression system, as well as an intact genomic lacZ copy. Therefore, we chose to work with E. coli BL21(DE3) as the host strain for the in vivo assay. This host was transformed to harbor the plasmid containing both the fusion construct (BBa_K2643000) and the lacZ sgRNA cassette (BBa_K2643010). The resulting strain was used for electroporation of donor DNA.

In our laboratory experiments, the donor DNA sequence was a kanamycin resistance expression cassette, flanked with MEs. Integration of this linear fragment into the lacZ locus would thereby confer the ability to grow on medium containing kanamycin, while still making use of the blue/white screen (note that the selection medium contains kanamycin, chloramphenicol, IPTG and X-gal). In the presence of kanamycin, cells should only be able to grow if the supplied donor DNA sequence is integrated into the genome. For non-targeted integration, intact lacZ confers chromogenic catalytic functionality when growing on X-gal, resulting in a blue colony phenotype. For targeted integration, lacZ is disrupted, resulting in loss of chromogenic capability and thus in a white colony phenotype. The composite part used as template for the donor DNA purposely contains both Red Fluorescent Protein (RFP) and the kanamycin resistance cassette, in order to distinguish surviving colonies based on whether genomic integration occurred, or whether the template plasmid is simply propagated. The latter is a false positive, expected to confer a red color. In combination with blue product formation by intact lacZ, this is expected to result in a purple phenotype. Table 2 gives an overview of the expected possibilities.

**Table 2.** Overview of expected phenotypes in the *in vivo* assay.

| Genomic integration? | Template with RFP cotransformed? | Growth on LB+Cam+Kan | lacZ disrupted? | Colony phenotype* |
| --- | --- | --- | --- | --- |
| On target | No | Yes | Yes | White |
| On target | Yes | Yes | Yes | Red |
| Off target | No | Yes | No | Blue |
| Off target | Yes | Yes | No | Purple |
| None | No | No | No | N/A |
| None | Yes | Yes | No | Purple |
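The logic of Table 2 can be written as a small decision function, which makes it easy to check that every combination of integration outcome and template carry-over maps to one expected phenotype. The function and argument names are hypothetical; the rules are taken directly from the table.

```python
def expected_phenotype(integration, rfp_template):
    """Return the expected colony colour from Table 2, or None when no growth is expected.

    integration: "on_target", "off_target" or "none"
    rfp_template: True if the RFP-containing template plasmid was cotransformed
    """
    if integration not in ("on_target", "off_target", "none"):
        raise ValueError("unknown integration outcome")
    # Kanamycin resistance comes from genomic integration or from the propagated template.
    grows = integration != "none" or rfp_template
    if not grows:
        return None
    lacz_disrupted = integration == "on_target"
    if lacz_disrupted:
        return "red" if rfp_template else "white"  # no blue product formed
    return "purple" if rfp_template else "blue"    # blue product, plus red if RFP is present


for outcome in ("on_target", "off_target", "none"):
    for rfp in (False, True):
        print(outcome, rfp, expected_phenotype(outcome, rfp))
```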

We optimized diagnostic genomic PCR reactions to verify putative integration of the kanamycin fragment in or near the lacZ locus (figure 10). When designing and optimizing these genomic PCR reactions, we favoured reactions that always result in amplicons, regardless of integration. This primer design makes it possible to judge first whether the genomic PCR on a picked colony worked at all, and then whether targeted integration took place. When integration occurs, the amplicon size increases. The control used for these genomic PCRs was purified gDNA of the BL21(DE3) parent strain. Additionally, in case we encountered blue or white colonies, we planned whole genome sequencing of individual colonies to verify where and how many times the donor DNA had been integrated, and whether this activity was indeed concentrated near the original target sequence.

Figure 10. Overview of relative binding sites used in diagnostic genomic PCRs. Used primer combinations: IV007&IV019; IV020&IV008; IV007&IV022 and IV021&IV008.

Aim: Identify the in vitro conditions for the functionality of the dxCas9-Tn5 fusion protein.

Approach: We performed the functionality test in three main parts:

Loading of sgRNA and the desired target DNA to the individual dxCas9 was confirmed by trypsin resistance assay (Jiang et al, 2015) and electrophoretic mobility shift (binding) assay (EMSA) (Sternberg et al, 2015).
The ability of the Tn5 dimer to load the adapter DNA was demonstrated by EMSA (Whitfield et al, 2008).
The ability of the fusion protein to load its components was demonstrated by EMSA, which was closely followed by the integration assay (Whitfield et al, 2008).

Experimental design: Trypsin resistance assay identified whether dxCas9 was able to load the designed sgRNA. Loaded dxCas9 undergoes a significant conformational shift causing the protein to become significantly more trypsin (protease) resistant compared to an apo-dxCas9 (non sgRNA loaded) protein (Jiang et al, 2015). We visualized this resistance to trypsin by comparing the protein degradation ratio between loaded- and apo- dxCas9 on an SDS PAGE.

Electrophoretic mobility shift (binding) assay (EMSA) further identified the complex formation between fusion protein and EPO (BBa_K2643004). A mobility shift occurs when binding of DNA to protein retards the movement of the DNA through the polyacrylamide gel, creating two distinct bands (Sternberg et al, 2015). When the EPO coding sequence DNA formed a complex with loaded dxCas9, we observed a band shift in comparison to the unbound EPO on a 5% TBE native PAGE. With the same principle, we performed EMSA to verify the loading of DNA adapters on Tn5.

We performed EMSA to verify the loading of fusion protein with sgRNA, DNA adapters, and target EPO CDS. Next, we assessed the target specific integration using a sgRNA that was designed to guide the fusion protein to the position ~160bp of the EPO cDNA (BBa_K2643004). The Tn5 portion of the fusion protein integrates DNA adapters at this position. This integration split EPO into two fragments of ~250bp and ~480bp. We performed two negative controls of targeted integration: without sgRNA and with only Tn5 protein in parallel. We PCR amplified the integration products with primers that annealed to the adapter DNA and the beginning (or end) of the EPO cDNA and visualized the product on a 5% TBE native PAGE. Finally, we sequence verified the two integration products to determine the exact position of integration. Figure 11 depicts this experimental design.

Figure 11. Experimental design of fusion protein in vitro integration assay. The fusion protein loads DNA adapter. The sgRNA directs the whole fusion protein to the target site (~200th bp of the EPO cDNA) and Tn5 performs the integration. Integration of DNA splits EPO cDNA into two fragments of ~248bp and ~486bp. To visualize this fragment, PCR is performed with primer sets that will specifically amplify the integration fragment.

4. Targeted Sequencing with dxCas9-Tn5

We used Oxford Nanopore Technology as our DNA sequencing platform for two reasons: no amplification of DNA is required, and extremely long sequencing reads are possible (examples show an average of > 100 kb) (Jain et al, 2018). To prepare samples for sequencing, DNA molecules have to be ligated to DNA adapters containing a motor protein that will guide and control the diffusion of the molecule through each nanopore. Commercially available kits for library preparation make use of a transposase that randomly fragments DNA while adding these adapters (Jain, Olsen, Paten and Akeson, 2016). We modified this step in library preparation to use our fusion protein (dxCas9-Tn5) instead, to direct this adapter integration specifically to a DNA target, as depicted in figure 12.

Figure 12. Schematic of the targeted adapter integration for Nanopore sequencing. The fusion protein dxCas9-Tn5 (A) finds and binds to a specific DNA target (with use of a sgRNA), (B) adds sequencing adapters at this specific site that (C) are ligated to a motor protein attached adapter to undergo (D) Nanopore sequencing.

We developed a model to generate suitable sgRNA sequences to catch all possible variations of gene doping, and we also designed and tested DNA adapters compatible with our fusion protein and ONT sequencing. We tested the directed integration of these sequencing adapters on EPO coding sequence DNA (BBa_K2643004). Furthermore, we sequenced samples with ONT (MinION device) that simulate a gene doping scenario and developed software for data analysis to determine the use of gene doping.

The rapid sequencing kit from ONT consists of two enzymatic steps to prepare DNA for sequencing. Figure 13 represents the molecular principle behind the library preparation of Rapid sequencing from ONT. The first step consists of an enzymatic fragmentation with a transposase that integrates sequencing adapters (shown with a red box). These adapter sequences contain a motif that is later ligated to a second sequencing adapter that contains a motor protein (needed to assist DNA to diffuse through the nanopore).

Figure 13. Rapid Sequencing from ONT: molecular principle of library preparation (gDNA refers to genomic DNA) (Oxford Nanopore Technologies, 2018).

Our approach uses the same fragmentation principle, but instead of a random transposition, we target this event to specific DNA sequences with our fusion protein. We designed adapters to be compatible with the ONT fragmentation procedure (figure 13), with the addition of the specific Mosaic End (ME) sequence that Tn5 needs to load and later integrate. The sequences of these adapters are depicted in figure 14.

Figure 14. Adapter DNA used for the integration by Tn5 for targeted sequencing with ONT Rapid sequencing.

The 3’ end of the forward strand has an adenine overhang that will anneal to the second sequencing adapters that carry motor protein. We ordered two types of adapters, with the difference of a phosphate group on the 5’ end of the reverse strand. The two specified adapters were ordered as forward and reverse primers and tested for integration to intronless EPO cDNA, directed by sgRNA 001.

Aim: Generate a computational model that will output sgRNA sequences required by dxCas9 to target gene doping sequences.

Approach: The design of sgRNAs for the detection of gene doping will guide the integration of adapter sequences to only possible gene doping sequences. The sgRNAs are designed to target sequences near the junctions between exons (not present in natural genes). For this, we developed a model that screens through an entire genetic coding sequence (potential gene used for enhancement) and identifies all possible sgRNA molecules at specific exon-exon junctions.

The possibility of modifying the DNA sequence used for gene doping with synonymous mutations is an immediate threat to our detection method. For this reason, this model screens through all the possible synonymous modifications identified close to PAM sequences near exon-exon junctions, and will only generate sgRNAs for the smallest variation site, thus covering all possibilities with the least number of sgRNAs.

The detailed information about this module can be found on our modeling page.
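The core search step described above can be sketched as follows: scan an intronless coding sequence for protospacers that span an exon-exon junction and are directly followed by one of the PAMs recognized by xCas9 (NG, GAA or GAT). This is a simplified illustration under stated assumptions: it only looks at the forward strand, it ignores synonymous variants, and the function name, the toy sequence and the shortened protospacer length in the example are hypothetical. It is not the model itself.

```python
import re

# PAMs recognized by xCas9 as listed in the text, written as regular expressions.
PAM_PATTERNS = ("[ACGT]G", "GAA", "GAT")


def junction_protospacers(spliced, junction, length=20):
    """Find forward-strand protospacers that span `junction` and are followed by a PAM.

    spliced:  intronless coding sequence (uppercase)
    junction: index of the first base of the downstream exon
    Returns a list of (start, protospacer, pam) tuples.
    """
    hits = []
    # A protospacer spans the junction if it holds at least one base on each side.
    for start in range(max(0, junction - length + 1), junction):
        end = start + length
        if end > len(spliced):
            break
        downstream = spliced[end:end + 3]
        for pattern in PAM_PATTERNS:
            match = re.match(pattern, downstream)
            if match:
                hits.append((start, spliced[start:end], match.group(0)))
                break
    return hits


# Toy example with a shortened protospacer length so the result is easy to follow.
toy_hits = junction_protospacers("AAAAACCCCCGGTTT", junction=5, length=5)
print(toy_hits)
```

In this toy case only one window qualifies, because only the window ending right before `CGG` is followed by an NG motif.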

Aim: Develop a system to generate DNA barcodes within sequencing adapters for the identification of sample’s subject.

Approach: Our targeted sequencing approach already minimizes sequencing time and costs. During our interaction with stakeholders, Prof. Hagan Bayley from Oxford University (one of the founders of Oxford Nanopore Technologies) suggested multiplexing with additional barcoding to reduce time and costs further. Multiplexing is the simultaneous sequencing of multiple samples from different athletes using one sequencing device. To assign the output sequences to the corresponding athlete, each output sequence should be labeled with an athlete specific mark. We designed a system to embed a personal DNA barcode for each subject. A barcode length of 17 nucleotides would provide enough variation (4^17, roughly 1.7 x 10^10 combinations) to cover every single person on Earth. We developed a tool that generates a unique 17 nucleotide sequence that indicates specific features of an athlete. For more details on this module, please visit our improvement page.
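The capacity argument can be checked with a short calculation: 16 nucleotides give 4^16 (about 4.3 billion) combinations, which is fewer than the roughly 8 billion people on Earth, while 17 nucleotides give about 17.2 billion. The sketch below also shows one simple way to map an integer identifier to a 17 nucleotide barcode and back. This plain base-4 encoding is an assumption for illustration and differs from our tool, which encodes specific features of the athlete.

```python
BASES = "ACGT"
BARCODE_LENGTH = 17


def barcode_capacity(length):
    """Number of distinct DNA sequences of the given length."""
    return len(BASES) ** length


def encode_barcode(subject_id):
    """Encode a non-negative integer as a fixed-length base-4 DNA barcode (hypothetical scheme)."""
    if not 0 <= subject_id < barcode_capacity(BARCODE_LENGTH):
        raise ValueError("subject_id does not fit in the barcode")
    letters = []
    for _ in range(BARCODE_LENGTH):
        subject_id, remainder = divmod(subject_id, 4)
        letters.append(BASES[remainder])
    return "".join(reversed(letters))


def decode_barcode(barcode):
    """Inverse of encode_barcode."""
    value = 0
    for base in barcode:
        value = value * 4 + BASES.index(base)
    return value


print(barcode_capacity(16), barcode_capacity(17))
example = encode_barcode(123456789)
print(example, decode_barcode(example))
```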

Aim:

Sequence different DNA samples (plasmid and linear) prepared with ONT fragmentation kit on a MinION device.
Assess whether DNA that has not been processed with transposase is identified and sequenced with MinION sequencing.
Replace fragmentation step from Rapid Sequencing Kit with targeted integration of sequencing adapter by fusion protein dxCas9-Tn5.

Approach:

Sample with a pool of different DNA molecules: We generated different DNA molecules to mimic gene doping. These molecules included intronless EPO (insert in BBa_K2643004), EPO with unnatural introns (BBa_K2643005 & BBa_K2643006), and EPO in a plasmid (BBa_K2643004). We used a pool of these molecules at different molar ratios to perform library preparation with the rapid sequencing kit from ONT and sequenced them using a FLO-MIN106 flow cell. After sequencing, we base called and filtered the generated data and aligned it to a reference sequence, in order to distinguish the different groups of reads by their source.

DNA without adapters: During library preparation, once the DNA molecules were fragmented by the transposition reaction, we added a new known sequence (pACYCDuet-1). This molecule entered the ligation reaction with the second adapter (with the motor protein), but should not be ligated due to the absence of the first adapter. After the sequencing run, we aligned the sequence reads to pACYCDuet-1 as a reference to determine how unadapted sequences are read with Nanopore.

Replace fragmentation step from Rapid Sequencing Kit with targeted fusion protein dxCas9-Tn5 using sequencing adapters: We took the functionality of the fusion protein a step further to attempt targeted sequencing. This was done by replacing the rapid adapter step of library preparation with our fusion protein, and continuing with the pipeline of rapid sequencing. The sample used for this reaction contained both target and non-target DNA, and after data alignment, we aimed to determine whether the specific target is enriched during library preparation. In figure 15, we depict the difference between ONT rapid sequencing library preparation and our approach for targeted library preparation.

Figure 15. Pipeline of library preparation to sequence DNA. (Left) Rapid adapter sequence integration with random transposase from ONT. (Right) Targeted adapter integration with fusion protein dxCas9-Tn5.

Aim: Verify gene doping DNA sequences by classifying sequenced DNA as natural EPO genes, artificial EPO genes or other DNA.

Approach: We created a software tool to perform this task based on comparison of the input sequence to a custom database. Our database contains natural sequences with variations corresponding to misreads, and artificial sequences with codon mutations around exon-exon junctions. Natural genes do not have exon-exon junctions, which allows their distinction from artificial DNA.

Our tool calculates local and global alignment scores based on the Smith-Waterman and the Needleman-Wunsch algorithms respectively. Based on a combined alignment score, an input sequence is then classified using the k-nearest neighbour machine learning algorithm.

The database includes every new artificial EPO DNA that is analysed. In this way, evolved EPO gene doping sequences update the database resulting in ever-increasing accuracy.
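The classification idea can be sketched in a few lines: compute a local (Smith-Waterman) and a global (Needleman-Wunsch) alignment score of a read against a reference, and classify the resulting score vector with a k-nearest neighbour vote. The scoring scheme, the toy sequences and the tiny labelled set below are hypothetical, and how our tool combines the scores is not reproduced here; this is a minimal illustration of the two algorithms named above.

```python
from collections import Counter

MATCH, MISMATCH, GAP = 1, -1, -2  # hypothetical linear scoring scheme


def needleman_wunsch(a, b):
    """Global alignment score, computed row by row."""
    prev = [j * GAP for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * GAP]
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            curr.append(max(prev[j - 1] + sub, prev[j] + GAP, curr[j - 1] + GAP))
        prev = curr
    return prev[-1]


def smith_waterman(a, b):
    """Best local alignment score; cell scores are never allowed below zero."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            sub = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            score = max(0, prev[j - 1] + sub, prev[j] + GAP, curr[j - 1] + GAP)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def score_vector(read, reference):
    """Feature vector: (local score, global score) of a read against one reference."""
    return (smith_waterman(read, reference), needleman_wunsch(read, reference))


def knn_classify(query, labelled, k=3):
    """Majority label among the k labelled vectors closest to the query (squared Euclidean distance)."""
    nearest = sorted(
        labelled,
        key=lambda item: sum((q - x) ** 2 for q, x in zip(query, item[0])),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical labelled score vectors, e.g. scores against an intronless reference.
training = [
    ((10, 10), "artificial"), ((9, 10), "artificial"),
    ((1, 0), "natural"), ((0, 1), "natural"), ((2, 2), "natural"),
]
print(score_vector("TTACGTTT", "ACG"))
print(knn_classify((8, 9), training, k=3))
```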
