ADOPE

The aim of ADOPE is to develop a complete method for gene doping detection. The final method consists of four main steps: sample preparation, prescreening, targeted library preparation and DNA sequencing and analysis.

Figure 1. A flow diagram of our detection method ADOPE, with its four steps: sample preparation, prescreening, library preparation and sequencing.

1. Sample Preparation

In the sample preparation module we developed a workflow to extract the target DNA from blood. As a proof of concept, we verified the extraction of cell-free DNA (cfDNA) from bovine serum. Finally, we successfully extracted gene doping DNA that we had spiked into the serum, at a concentration guided by our gene doping model.

Design: We modeled the process of infection and degradation of gene doping DNA in blood to determine its concentration over time. We used this model to provide an approximate sensitivity requirement for DNA extraction.

Results and interpretation: Based on our model, we determined that microdosing would be the most effective way to dope athletes, as it avoids large EPO fluctuations and hence detection through the biological passport. Figure 2 displays the predicted number of doping DNA fragments in the blood over time as a result of an intramuscular (IM) injection every 20 days. Based on this model, we predicted that there will be 40,000 to 50,000 fragments of DNA per mL of blood. We based the sensitivity requirement for DNA extraction on this value.

Design: We extracted cfDNA from bovine serum and confirmed successful extraction with Qubit dsDNA HS assay and nested PCR.

Results and interpretation: We isolated cfDNA from the serum using the QIAamp DNA Blood Mini Kit. To verify the extraction, we quantified the extracted cfDNA using Qubit and obtained 0.394 ± 0.0149 ng/µL. To verify the extraction further, we performed nested PCR on the albumin gene. The albumin gene is often used to verify the presence of genomic mammalian DNA and was recommended by Sanquin. The nested albumin PCR consists of a first round of PCR using primers 091 and 092, amplifying an internal 229 bp fragment of the bovine albumin gene. We deliberately chose to amplify a short fragment, since cfDNA in blood is fragmented. The product of the first round of PCR was used as the template for the second round, in which internal primers 066 and 067 amplify a 150 bp internal fragment of the first PCR product. The products of the first and second rounds of PCR are shown in figure 3. The bands in figure 3, lanes 2-7, confirmed the success of the extraction.

Design: We spiked samples with artificial gene doping DNA and confirmed its extraction with PCR.

Results and interpretation: Guided by our gene doping model, we spiked samples with 2x10^8 fragments/mL of either linear or plasmid EPO cDNA. We performed the extraction and verified it by PCR. Two EPO internal primer combinations were used. For primer set 072 & 073 the expected band size was 297 bp, and for primer set 074 & 075 the expected band size was 140 bp. Figure 4 confirmed the successful extraction of gene doping EPO DNA. The difference in intensity between the samples spiked with linear DNA (samples 3 & 4) and those spiked with plasmid (samples 5 & 6) indicates that the QIAamp DNA Blood Mini Kit has a lower extraction efficiency for plasmids, since the same number of fragments was spiked.
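Fragment counts per mL (used in this module) and molar concentrations (used for the detection limits in the prescreening module) describe the same quantity in different units. The conversion can be sketched as follows; the function name is our own illustration and is not part of the project's software.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI definition)


def copies_per_ml_to_femtomolar(copies_per_ml: float) -> float:
    """Convert a DNA copy number per mL into a concentration in fM."""
    copies_per_litre = copies_per_ml * 1_000   # 1 L = 1000 mL
    molar = copies_per_litre / AVOGADRO        # mol/L
    return molar * 1e15                        # 1 M = 1e15 fM


# Spike-in concentration used in the extraction experiment
spike_fm = copies_per_ml_to_femtomolar(2e8)
print(f"{spike_fm:.0f} fM")  # prints "332 fM"
```

The same function can be applied to any fragment count from the model to compare it against a detection limit stated in fM.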

Conclusion

We showed successful extraction of artificial gene doping DNA, at a concentration guided by our model, using the QIAamp DNA Blood Mini Kit. Further optimization is advisable to maximize extraction efficiency.

2. Prescreening

In this module, we developed a high-throughput screening method to detect gene doping. The method is based on the color change induced by aggregation of dextrin-capped gold nanoparticles (d-AuNPs). We first analysed the stability of d-AuNPs upon introduction of NaCl, with and without a single-stranded DNA probe. Finally, we were able to visually differentiate between samples in which target DNA is present or absent.

Design: We generated batches of dextrin-capped gold nanoparticles and tested their aggregation upon NaCl introduction to determine the optimal NaCl concentration.

Results and interpretation: With increasing NaCl concentration, d-AuNPs destabilize and aggregate, changing the color of the solution from red to purple. We scanned the visible light spectrum from 450 nm to 650 nm for NaCl concentrations ranging from 0 mM to 333 mM. The important absorbance wavelengths are 520 nm (red color) and 620 nm (purple color). We calculated the ratio of the absorbance at 620 nm to that at 520 nm to quantify the extent of the color change from red to purple (figure 5). The 620/520 ratio increased with increasing NaCl concentration, indicating a color change from red to purple.

Figure 5. Influence of NaCl on d-AuNP aggregation. At a concentration of 133 mM NaCl, the d-AuNPs start to aggregate: the peak in the spectrum shifts from 520 nm towards 620 nm and the 620/520 absorbance ratio starts to increase, indicating a color change from red to purple. A) Spectra of d-AuNP solutions with NaCl concentrations of 0, 167 and 333 mM. B) 620/520 absorbance ratios of d-AuNP solutions with NaCl concentrations ranging from 0 to 333 mM. C) Visual outcome of the d-AuNP solutions with NaCl concentrations ranging from 0 to 333 mM.
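The 620/520 ratio used throughout this module can be computed directly from plate-reader spectra. The snippet below is a minimal sketch: the absorbance values are hypothetical placeholders, not our measured data, and the function name is illustrative.

```python
def absorbance_ratio(spectrum, numerator_nm=620, denominator_nm=520):
    """Return A(numerator) / A(denominator) for one spectrum.

    `spectrum` maps wavelength in nm to absorbance.
    """
    return spectrum[numerator_nm] / spectrum[denominator_nm]


# Hypothetical readings (NOT measured data): NaCl concentration in mM -> spectrum
spectra_by_nacl_mm = {
    0:   {520: 1.00, 620: 0.20},
    167: {520: 0.80, 620: 0.48},
    333: {520: 0.50, 620: 0.60},
}

ratios = {nacl: absorbance_ratio(s) for nacl, s in spectra_by_nacl_mm.items()}
for nacl in sorted(ratios):
    print(f"{nacl:>3} mM NaCl: A620/A520 = {ratios[nacl]:.2f}")
```

A rising ratio corresponds to a red-to-purple shift, so the same calculation can be used to compare samples with and without probe, or with on- and off-target probes.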

Design: We tested the effect of single stranded DNA probe (ssDNAp) on d-AuNPs aggregation.

Results and interpretation: ssDNAp are known to stabilize d-AuNPs against aggregation (Baetsen-Young et al., 2018). We incubated the d-AuNPs with and without an ssDNAp and compared the d-AuNP aggregation over a range of NaCl concentrations. The stabilization effect can be observed in figure 6: with ssDNAp added, the 620/520 ratio increases less with increasing NaCl concentration.

Figure 6. Influence of ssDNAp on d-AuNP aggregation. At a concentration of 133 mM NaCl, the d-AuNPs start to destabilize, whereas in the presence of ssDNAp the destabilization occurs at higher NaCl concentrations. A) 620/520 absorbance ratios of d-AuNP solutions with NaCl concentrations ranging from 0 to 333 mM, with and without ssDNAp. B) Visual outcome of the d-AuNP solutions with NaCl concentrations ranging from 0 to 333 mM without (top) and with (bottom) ssDNAp.

To further evaluate the influence of ssDNAp concentration on d-AuNP stabilization, we measured the stability at different NaCl concentrations with varying ssDNAp concentrations. Increasing the ssDNAp concentration stabilizes the d-AuNPs, reducing aggregation and thereby suppressing the color change of the solution from red to purple.

Figure 7. Influence of ssDNAp concentration on stabilization of d-AuNPs. At higher primer concentrations, the d-AuNPs are more stabilized. A) Spectrum of d-AuNPs solutions with NaCl concentration of 333mM incubated with a ssDNAp concentration ranging from 0 nM to 2.5nM. B) Visual outcome of the d-AuNPs solution with NaCl concentration of 333mM with increasing ssDNAp concentrations.

Design: We incubated DNA with targeting ssDNAp or non-targeting ssDNAp to show the visual difference between the two.

Results and interpretation: The ssDNAp was incubated with the corresponding target gene doping DNA to let the ssDNAp anneal and form the secondary structure. Subsequently, the d-AuNPs were added and their aggregation was analysed. The ssDNAp targeting EPO was compared with an ssDNAp that does not target EPO. The on-target ssDNAp stabilized the d-AuNPs, as shown by a lower 620/520 absorbance ratio compared to the off-target ssDNAp (figure 8).

Figure 8. A) 620/520 absorbance ratios of d-AuNP solutions with an NaCl concentration of 333 mM with on- and off-target ssDNAp. B) Visual outcome of the d-AuNP solutions with NaCl concentrations ranging from 0 to 333 mM with on- and off-target ssDNAp for EPO.

After showing that the d-AuNPs can be used to detect target DNA, we aimed to determine the sensitivity by varying the target EPO DNA concentration while keeping the total amount of DNA constant. We tested concentrations ranging from 1 pM to 1 nM, comparing on- and off-target primers, and determined that our lower limit of detection is 1 nM. To achieve results comparable to Baetsen-Young et al. 2018 (2.94 fM), further optimization is required.

Additionally, we evaluated the influence of background DNA present in the sample. We used 38 ng/µL background DNA, equal to the expected cfDNA extracted during sample preparation. We were able to show that the presence of background DNA does not affect the sensitivity of the prescreening method. Further sensitivity evaluation is required before implementing the prescreening method.

Conclusion

The functionality of the prescreen was proven: we were able to visually detect the presence of target EPO DNA. However, to implement this prescreening, protocol optimization is required to reach a detection limit of 9 fM or lower, based on data from our model. If the results of Baetsen-Young et al. 2018 can be reproduced, the detection limit can be lowered to 2.94 fM.

3. Fusion Protein - dxCas9-Tn5

To achieve targeted sequencing, we constructed a fusion protein by linking dxCas9 with Tn5, to be used in library preparation for a sequencing platform. We constructed a strain that expresses our fusion protein, successfully purified the protein and tested its functionality in vivo and in vitro. We were able to show targeted integration by our fusion protein in vitro and sequence verified the integration products.

Design: We constructed a fusion protein by linking dxCas9 cDNA to Tn5 cDNA with a linker coding for glycine helical peptide.

Results and interpretation: The sequences encoding Tn5 and dxCas9 (3.7) were obtained via Addgene, with plasmid numbers #60240 from Picelli et al. (2014) and #108383 from Liu et al. (2018), respectively. First, we added a linker coding for a glycine helical peptide to the Tn5 cDNA and cloned it into pACYCDuet. We screened colonies for the correct insert by colony PCR, followed by visualization of the amplified insert on a 0.8% TAE agarose gel (figure 9). Plasmid was isolated from colony 8 and sequence verified.

Figure 9. Colony PCR of Linker-Tn5 in pACYCDuet-1 (1508 bp). The ladder represents the size of DNA in bps.

Next, we introduced dxCas9 into pACYCDuet-1_Lin-Tn5. We screened colonies for the correct insert by colony PCR, followed by visualization of the amplified insert on a 0.8% TAE agarose gel (figures 10, 11 and 12). After plasmid isolation from positive colony 2, we sequence verified plasmid pACYCDuet-1_dxCas9-Lin-Tn5 (figure 13).

Figure 10. Colony PCR of dxCas9-Tn5 in pACYCDuet-1_dxCas9-Lin-Tn5, expected size 5924 bp. The ladder represents the size of DNA in bps.

Figure 11. Colony PCR of dxCas9 in pACYCDuet-1_dxCas9-Lin-Tn5, expected size 4172bp. The ladder represents the size of DNA in bps.

Figure 12. Colony PCR of Lin-Tn5 in pACYCDuet-1_dxCas9-Lin-Tn5, expected size 1452bp. The ladder represents the size of DNA in bps.

Figure 13. The alignment of the sequencing results of the plasmid dxCas9-Lin-Tn5. This image shows the alignment of the sequencing results of plasmids from colony 2 with the original insert design in SnapGene.

Design: We developed a new production process for the fusion protein BBa_K2643002. We followed existing dxCas9 expression protocols (Huai et al. 2017), but developed a new downstream process for the fusion. We evaluated three different types of chromatography:

Nickel affinity chromatography: The fusion protein has a HIS-tag (used to purify both Tn5 and dxCas9 individually).
Heparin chromatography: The fusion protein is a DNA-binding protein (used to purify both Tn5 and dxCas9 individually).
MonoQ chromatography: It’s an anion exchange resin, utilizing the charge of the fusion protein as a purification property.

Results and interpretation: Of the three chromatography methods, both heparin and monoQ were successful. As a result we developed a production process which includes heparin chromatography as a capture step, followed by monoQ as a final polishing step.

We used heparin chromatography to capture the fusion protein from the crude lysate. The fusion protein has a high affinity for the heparin resin, resulting in good initial separation from contaminating and degrading species. The results suggested that the fusion protein was cleaved at the linker, resulting in dxCas9 contamination (figure 14, lanes 2-10). The dxCas9 contaminant has a lower affinity for the heparin resin and eluted at lower salt concentrations (figure 14, lanes 2-10), whereas the fusion protein has a higher affinity and eluted at high salt concentration (figure 14, lanes 11-13). Only the last three elution fractions (figure 14, lanes 11-13) were pooled and further purified.

Figure 14. 8% SDS-PAGE of the fusion protein after heparin chromatography (expected size 214.7 kDa). Lane 1 molecular ladder (kDa), lane 2 elution 1, lane 3 elution 2, lane 4 elution 3, lane 5 elution 4, lane 6 elution 5, lane 7 elution 6, lane 8 elution 7, lane 9 elution 8, lane 10 elution 9, lane 11 elution 10, lane 12 elution 11, lane 13 elution 12.

We used MonoQ chromatography to polish the fusion protein and remove any remaining contaminants. The fusion protein eluted from the resin in a broad peak (figure 15, lanes 6-14); however, the fractions were relatively pure and the dxCas9 contamination was mostly eliminated.

Figure 15. 8% SDS-PAGE of the fusion protein after MonoQ chromatography (expected size 214.7 kDa). Lane 1 molecular ladder (kDa), lane 2 starting material, lane 3 flow through, lane 4 wash, lane 5 elution 1, lane 6 elution 2, lane 7 elution 3, lane 8 elution 4, lane 9 elution 5, lane 10 elution 6, lane 11 elution 7, lane 12 elution 8, lane 13 elution 9, lane 14 elution 10, lane 15 elution 11.

We also evaluated nickel affinity chromatography, but we had great difficulty binding the HIS tag to the nickel beads: the majority of the fusion protein was detected in the flow through. Detailed results can be found in our notebook. Additionally, the complete upstream and downstream process for Tn5 transposase (BBa_K2643002), dxCas9 (BBa_K2643001) and dxCas9-Tn5 fusion (BBa_K2643002) is documented in detail on the biobrick registry.

Design: To visualize in vivo whether a donor DNA sequence can be integrated close to a target sequence, we built a strain that harbors the dxCas9-Tn5 fusion protein (BBa_K2643000), as well as a lacZ gRNA expression cassette (BBa_K2643010). The cell is electroporated with a kanamycin resistance cassette flanked by MEs (BBa_K2643011), which can be recognized by the Tn5 transposase domain for cut and paste integration near the gRNA target. This disrupts the lacZ gene and renders beta-galactosidase catalytically inactive, which can be visualized by blue/white screening with X-gal. In parallel, we have electroporated an E. coli strain that only harbors the Tn5 transposase for comparison with untargeted integration.

Results and interpretation: Prior to attempting any genomic integration, we performed a broad range of control experiments. Table 1 gives an overview of the results obtained for these controls. The blue colony phenotype confirmed that the parental BL21DE3 strain contains an intact and functional lacZ. Furthermore, the parental strain was resistant to neither chloramphenicol nor kanamycin. Harboring the fusion construct and the gRNA that targets lacZ, without donor DNA, did not affect the functionality of lacZ: the colonies remained blue. This rules out inhibition of lacZ expression due to binding of the fusion complex. Furthermore, we included some controls carrying Red Fluorescent Protein (RFP), which is present by default on iGEM backbone vectors (pSB1K3). Combined with the blue phenotype, this resulted in a purple colony phenotype.

**Table 1:** Overview of observed phenotypes of the controls taken along in the *in vivo* assay.

| Strain | DNA | LB-medium supplements | Observed phenotype |
| --- | --- | --- | --- |
| DH5α | - | X-gal + IPTG | white |
| BL21DE3 | - | X-gal + IPTG | blue |
| BL21DE3 | - | Cam | No growth |
| BL21DE3 | - | Kan | No growth |
| BL21DE3 | pACYCdxCas9Tn5gRNA | Cam + X-gal + IPTG | blue |
| BL21DE3 | pACYCdxCas9Tn5gRNA + pSB1K3 | Cam + Kan + X-gal + IPTG | purple |
| BL21DE3 | pSB1K3 | Kan + X-gal + IPTG | purple |

When we transformed the donor DNA plasmid (BBa_K2643011) into E. coli DH5α and selected on medium supplemented with chloramphenicol and kanamycin, we observed the formation of red colonies. This indicates that the kanamycin cassette and the RFP cassette were both intact and functional. Sequence verification confirmed that both sequences were intact and oriented as designed. Untransformed DH5α was not able to grow on LB+Kan+Cam and served as the negative control.

The MEs of the donor DNA (BBa_K2643011) were also sequence verified and confirmed to be located as designed, but we were not able to show that they were recognized and that the donor DNA was integrated, in either a targeted or a non-targeted manner. Surviving colonies developed a purple phenotype, sometimes only after several days (figure 16) and after passing through a series of color changes. Based on the evaluated controls, it is very likely that the purple phenotype was due to the presence of the original RFP-containing template plasmid used to create the linear donor DNA sequence. The RFP cassette was kept on the plasmid precisely so that this deviant purple color would identify growing false positives. These colonies survived in the presence of kanamycin and chloramphenicol because the template plasmid is propagated and harbors intact and functional chloramphenicol and kanamycin resistance cassettes.

Figure 16. Effect of incubation time on chromogenic in vivo assays. These LB+Cam+Kan+X-gal+IPTG plates demonstrate growth of BL21DE3+pACYCdxCas9Tn5gRNA strains that are electroporated with donor DNA. (A) 1 day after electroporation with linear ME flanked kanamycin resistance cassette, blue colony formation was observed, whereas (B) a week later red phenotype has developed strongly. (C) 1 day after transformation with ME flanked kanamycin resistance cassette in a plasmid, blue colony formation was observed, whereas (D) a week later purple phenotype has developed strongly.

Evaluation of colonies by several genomic PCRs demonstrated that integration did not take place near the desired target site. The negative control used for these genomic PCRs was purified gDNA of the BL21DE3 parent strain. Even transformation of donor DNA into a competent strain harboring only Tn5 did not result in colonies, suggesting a problem related to the transposase function rather than to the fusion construct specifically.

The experimental results demonstrate that the used kanamycin cassette is functional, judging from the control results. The RFP remains unchanged and functional too, allowing for identification of false positives. Furthermore, we have shown that strain BL21DE3 contains a functional lacZ copy, based on chromogenic conversion of X-gal.

Although we have shown that the MEs are present and intact on the donor DNA sequence, we were not able to show that the MEs were recognized and that the donor DNA was integrated. We would have expected either targeted or non-targeted integration, but neither occurred. Even when we tried the setup with just Tn5 and the donor sequence, this yielded no integration.

Design: Two assays were performed to prove the in vitro functionality of the fusion protein:

Electrophoretic mobility shift assay (EMSA) verified the ability of fusion protein to load DNA adapter and sgRNA, and form complex with the target DNA (EPO).
Integration assay verified the target specific integration of fusion protein.

Results and interpretation: We successfully performed EMSA on the individual (non-fused) dxCas9 and Tn5 proteins (see the notebook). Next, we performed EMSA to verify that our fusion protein was able to load Cy5-labelled adapter DNA, sgRNA and Cy5-labelled target DNA. Figure 17 displays the result of the EMSA on 5% TBE native PAGE as imaged by the Typhoon Imaging System.

Figure 17. 5% TBE native PAGE imaged by Typhoon Imaging System. A: Lane 1: unlabelled DNA ladder (100-1000bp), lane 2: 2.5µM fusion protein, lane 3: 2.5µM adapter, lane 4: 2.5µM adapter, lane 5: 2.5µM adapter + 0.15µM fusion protein, lane 6: 2.5µM adapter + 0.5µM fusion protein, lane 7: 2.5µM adapter + 1.5µM fusion protein, lane 8: 2.5µM adapter + 2.5µM fusion protein, lane 9: 2.5µM adapter + 2.5µM fusion protein + heat, lane 10: 2.5µM adapter. B: Lane 1: unlabelled DNA ladder (100-1000bp), lane 2: 0nM Fusion w/ gRNA (1:1), lane 3: 27nM Fusion w/ gRNA (1:1), lane 4: 134nM Fusion w/ gRNA (1:1), lane 5: 402nM Fusion w/ gRNA (1:1), lane 6: 804nM Fusion w/ gRNA (1:1), lane 7: 0nM Fusion w/out gRNA, lane 8: 27nM Fusion w/out gRNA, lane 9: 134nM Fusion w/out gRNA, lane 10: 402nM Fusion w/out gRNA, lane 11: 804nM Fusion w/out gRNA, lane 12: 134nM Fusion w/ gRNA and EDTA, lane 13: 402nM Fusion w/ gRNA and EDTA, lane 14: 804nM Fusion w/ gRNA and EDTA, lane 15: 1.34µM Fusion w/ gRNA and EDTA.

We performed EMSA for the Tn5 and dxCas9 portions of the fusion protein separately. Figure 17A displays the EMSA of the adapter. Loading of the adapter onto the fusion protein retards DNA movement on PAGE, which was observed as higher bands in figure 17A, lanes 7 and 8. This indicates that a mobility shift occurs when at least 1.5µM of fusion protein is present. We performed three different negative controls in parallel:

No adapter DNA was added (figure 17A, lane 2).
No fusion protein was added (figure 17A, lane 3,4, and 10).
Fusion protein was denatured by heat (figure 17A, lane 9).

None of these negative control reactions resulted in the band shift, confirming the specificity of our result.

By the same principle, the higher bands observed in figure 17B, lanes 5, 6, 14 and 15, confirmed complex formation between the sgRNA-loaded fusion protein and the target DNA. As a negative control, the same reaction was carried out without sgRNA. In this case, the dxCas9 should not be able to find the target DNA and no complex should be formed. The absence of higher bands in figure 17B, lanes 7-11, confirmed this.

We performed the integration assay according to our protocol. To amplify the integration products, we performed PCR with primers that anneal to the end of the adapter DNA and to the beginning or end of the target DNA. Figure 18 displays the PCR products on a 5% TBE native gel.

We designed the sgRNA to target the second exon-exon junction of the EPO cDNA (position ~200bp of a total length of 623bp). When integration of the adapter DNA (length ~50bp) occurs, two fragments of ~250bp and ~470bp should be generated. Indeed, figure 18, lanes 2 and 3, show intense bands at ~250bp and ~470bp respectively, indicating successful targeted integration by our fusion protein. Moreover, the absence of these bands and the stronger presence of other unspecific bands when sgRNA was omitted (figure 18, lanes 4 and 5) or when Tn5 was used (figure 18, lanes 10 and 11) confirmed that targeted integration is specific to the sgRNA-guided fusion protein. The absence of integration products in the other negative controls (without fusion protein, adapter or target DNA) in figure 18, lanes 6-9, 12 and 13, confirmed this further.

Figure 18. 5% TBE native PAGE stained by EtBr and imaged by GelDoc system. Lane 1: DNA ladder (100-1000bp), lane 2: amplified product with fw EPO, lane 3: amplified product with rv EPO, lane 4: amplified product with fw EPO without sgRNA, lane 5: amplified product with rv EPO without sgRNA, lane 6: amplified product with fw EPO without fusion protein, lane 7: amplified product with rv EPO without fusion protein, lane 8: amplified product with fw EPO without adapter DNA, lane 9: amplified product with rv EPO without adapter DNA, lane 10: amplified product with fw EPO with Tn5, lane 11: amplified product with rv EPO with Tn5, lane 12: amplified product with fw EPO without target DNA, lane 13: amplified product with rv EPO without target DNA.
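The expected band sizes follow from simple arithmetic on the target length, the integration position and the adapter length. The sketch below assumes that the PCR primers anneal at the very ends of the target and of the adapter, so that each product spans the full adapter; the function name is illustrative only.

```python
def expected_product_sizes(target_len_bp, integration_pos_bp, adapter_len_bp):
    """Return (upstream, downstream) PCR product sizes after adapter integration.

    Assumes primers sit at the extreme ends of the target and the adapter.
    """
    upstream = integration_pos_bp + adapter_len_bp
    downstream = (target_len_bp - integration_pos_bp) + adapter_len_bp
    return upstream, downstream


# EPO cDNA of 623 bp, sgRNA target at ~200 bp, adapter of ~50 bp
print(expected_product_sizes(623, 200, 50))  # prints (250, 473)
```

These values agree with the ~250bp and ~470bp bands expected for the forward and reverse EPO primers. Because the integration position and adapter length are approximate, the sizes are estimates rather than exact predictions.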

To confirm that these two bands represented the expected integration products, we isolated and sequence verified them. Figure 19 shows the sequence alignment between the two fragments and the target EPO gene. From the sequence alignment we can confirm that integration occurred upstream of the sgRNA binding site. The overlapping alignment between the two products suggested that integration does not always occur at the exact same base pair position (~10bp variation), which might be the result of the flexible linker between dxCas9 and Tn5.

Figure 19. Sequence alignment of integration product (two red arrows) with the target EPO. Pink box represent the sgRNA binding site.

To examine the functionality of our fusion protein further, we studied integration activity with different sgRNAs and target DNA (notebook). Moreover, we repeated the reaction with longer incubation times. With increasing incubation time, the bands representing targeted integration faded while the bands representing random integration became stronger (notebook).

Conclusion

We successfully constructed, expressed and purified the fusion protein. We developed a method to test its functionality both in vivo and in vitro. We successfully demonstrated that our fusion protein was able to perform targeted integration in vitro. We sequence verified the integration products and confirmed that integration occurred in the region close to the sgRNA binding site.

4. Targeted Sequencing with dxCas9-Tn5

Once the in vitro functionality of the fusion protein was established, we used it as a tool to prepare samples for targeted sequencing using ONT MinION sequencing. We showed targeted integration of DNA adapters compatible with the Rapid Sequencing kit (Oxford Nanopore Technologies) and linked this activity to the library preparation pipeline for sequencing. Data analysis demonstrated the potential of the fusion protein for targeted sequencing. To complement our work, we also developed software tools to generate the sgRNA arrays and barcoded adapters needed for MinION sequencing, and a software tool for sequencing data analysis to identify gene doping DNA.

Design: We developed a tool that screens through a gene coding sequence, identifies possible Protospacer Adjacent Motif (PAM) sequences for dxCas9 and analyzes the number of possible sgRNA molecules required for specific sites. One final specific site is chosen based on the lowest possible variation due to synonymous mutations, which was 12 for EPO cds, followed by generation of this array.

Results and interpretation: We generated a heat map that indicates the regions of human EPO gene with the lowest possible variation due to synonymous mutations (figure 20). The exon-exon junction (target sites that are gene-doping DNA-specific) is indicated with arrows in figure 20.

Figure 20. Heat map representing the total variation of EPO cds possible based on synonymous mutations. X axis represents EPO cds gene divided into subzones of 9 bp each. Subzones with darkest blue represent up to 18 possibilities of DNA (3 codons with 6 possible combinations each), whereas lightest subzones represent least number of possible DNA combinations. Junctions between exons are represented (in total 4 junctions) with their position in bp in EPO cds, number of identified PAM sequences and minimal number of sgRNAs required to cover all possibilities.

The model helped us decide on the best junction to be targeted by our fusion protein (the junction with the fewest sgRNA possibilities). For EPO, it generated an array of 12 sgRNAs at junction three, with the PAM sequence at position 232 bp of the EPO CDS. For detailed information on the gRNA array model, please visit the model page.
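The variation counted per 9 bp subzone in figure 20 can be illustrated with a short script. This is a minimal sketch and not the team's tool: it uses the standard genetic code, sums the number of synonymous codons per codon within each subzone (giving at most 18 for three six-fold degenerate codons), and separately counts how many distinct synonymous DNA sequences a codon-aligned stretch can take. The input sequences are toy examples, not the EPO CDS, and PAM searching is left out.

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Number of synonymous codons for each amino acid (e.g. Leu, Ser, Arg -> 6)
SYNONYMS = {}
for aa in CODON_TABLE.values():
    SYNONYMS[aa] = SYNONYMS.get(aa, 0) + 1


def codons(cds):
    """Split a coding sequence into complete codons (trailing bases dropped)."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def subzone_variation(cds, codons_per_zone=3):
    """Sum of codon degeneracies for each subzone of `codons_per_zone` codons."""
    degeneracy = [SYNONYMS[CODON_TABLE[c]] for c in codons(cds)]
    return [sum(degeneracy[i:i + codons_per_zone])
            for i in range(0, len(degeneracy), codons_per_zone)]


def synonymous_variants(cds):
    """Number of distinct DNA sequences encoding the same peptide as `cds`."""
    return prod(SYNONYMS[CODON_TABLE[c]] for c in codons(cds))


# Toy input: Leu-Ser-Arg (highly degenerate) followed by Met-Trp-Met (unique)
print(subzone_variation("CTGTCCCGCATGTGGATG"))  # prints [18, 3]
print(synonymous_variants("GATAAA"))            # Asp-Lys: prints 4
```

Subzones with a low score need fewer sgRNA variants to cover every synonymous recoding, which is the criterion used above to pick a junction.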

Design: We developed a tool that generates DNA barcodes within the sequencing adapters so that reads can be assigned to the sample they came from. The resulting barcodes can be used for ONT MinION sequencing and make multiplexing possible.

Results and interpretation: To use this tool, please visit our improvement page.
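For readers who want a feel for what barcode generation involves, the following is a generic sketch and not our tool: it greedily picks barcodes that differ from each other in at least a minimum number of positions (Hamming distance), so that a few sequencing errors cannot turn one barcode into another. The length, distance and count are arbitrary example values.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def generate_barcodes(length, min_distance, count):
    """Greedily select up to `count` barcodes with pairwise distance >= min_distance."""
    chosen = []
    for candidate in map("".join, product("ACGT", repeat=length)):
        if all(hamming(candidate, b) >= min_distance for b in chosen):
            chosen.append(candidate)
            if len(chosen) == count:
                break
    return chosen


barcodes = generate_barcodes(length=6, min_distance=3, count=8)
for barcode in barcodes:
    print(barcode)
```

A practical design would add further filters, for example on GC content and homopolymer runs, which nanopore sequencing reads less reliably; those are omitted here to keep the sketch short.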

Sequencing by ONT MinION Rapid Adapter kit: random adapter integration

Design: To prove the potential of targeted sequencing, we used our fusion protein during library preparation of specific DNA samples and sequenced them with ONT MinION. We first ran sequencing tests to determine whether different types of DNA molecules can be identified during data analysis. Once this analysis was done, we tested the DNA product from an integration reaction with our fusion protein and sequencing adapters.

Results and interpretation: We prepared a solution containing a mix of DNA molecules ranging from plasmids to linear DNA of different sizes. This solution was used for library preparation, and after the transposase reaction was stopped, an extra DNA molecule was added to assess whether unadapted DNA is sequenced. The total number of aligned reads for each reference molecule, presented in figure 21, indicated that small linear DNA molecules (600 bp) are not identified during data analysis. This does not mean that these DNA molecules were not read, but that their alignment was not possible with our working pipeline, probably due to the size of the reference. These results also indicated that when DNA molecules are added without the first transposase reaction (the case of PCR 3, which is unadapted), far fewer reads are generated (< 3%) than for equimolar concentrations of molecules that were fragmented by the transposase reaction (PCR 1).

Figure 21. Number of reads aligned to reference molecules after a MinION sequencing run on flow cell type FLO-MIN106 with the transposase-based Rapid Sequencing kit. Each bar represents a different DNA molecule added to the sample solution, either plasmid or linear DNA of different lengths. DNA molecules used were EPO biobrick (Plasmid 1), EPO + Ampicillin biobrick (Plasmid 2), pACYCduet-1 (Plasmid 3), EPO + Kanamycin amplification (PCR 1), EPO cds amplification (PCR 2) and dxCas9 amplification (PCR 3).

Applying the fusion protein: targeted adapter integration

Design: We designed and tested DNA adapters, compatible with ONT sequencing, to be integrated by the Tn5 portion of our fusion protein. We targeted the integration of these adapters on EPO coding sequence in plasmid DNA.

Results and interpretation: The initial assay for adapter (~30bp) integration on plasmid DNA with the EPO coding sequence (BBa_K2643004) was not successful (notebook). Based on this result, we hypothesised that it might be necessary to work with linearized DNA, so we digested the EPO plasmid with EcoRI and repeated the adapter integration. Adapter integration with sgRNA (sg001) targeting EPO at around the 232nd bp should result in fragments of ~250 bp and ~430bp. We PCR amplified the integration products using EPO and adapter primers. The results in figure 22 (lanes 1, 2, 5 and 6) indicated correct integration of both adapters at a specific position in the EPO cds. The absence of these bands in the negative controls without sgRNA (lanes 3, 4, 7 and 8) confirmed this specificity further.

Figure 22. Sequencing adapter integration by fusion protein on EPO target. 5% TBE native PAGE stained by EtBr and imaged by GelDoc system. DNA substrate: (lanes 1-4) EPO coding sequence or (lanes 5-8) EPO plasmid linearized with EcoRI. Lane L: DNA ladder (100-1000bp), lanes 1, 5: amplified DNA with fw EPO, lanes 2, 6: amplified DNA with rv EPO, lanes 3, 7: amplified DNA with fw EPO without sgRNA, lanes 4, 8: amplified DNA with rv EPO without sgRNA.

ONT MinION sequencing with adapters integrated by fusion protein

We took our fusion protein one step further and attempted to use the integration product (figure 22) for ONT MinION sequencing. For this assay, we prepared an initial solution with equimolar concentrations of Tn5 DNA and EPO plasmid DNA and performed our integration assay, after which we continued with the Rapid Sequencing Kit pipeline (without the transposase fragmentation reaction). As a control, an equimolar concentration of dxCas9 DNA was prepared following the ONT library preparation protocol (including transposase fragmentation).

After the sequencing run, we followed the workflow defined in the Sequencing alignment protocol. The alignment results showed that most of the reads (>99 %) aligned to the dxCas9 DNA used as positive control. For the sample treated with our fusion protein, zero reads aligned to the Tn5 DNA reference (a non-target of our fusion protein) and 89 reads aligned to the target EPO plasmid DNA. These reads indicate that sequencing adapters were integrated preferentially into the linearized EPO plasmid DNA rather than the Tn5 DNA. A closer look at the aligned sequences yielded one read, presented in figure 23, that is a clear example of targeted adapter integration next to the gRNA flanking site.

Figure 23. (A) Alignment of a DNA sequence read with the target EPO cds in plasmid DNA. (B) Zoom into the sequence alignment to determine the sgRNA target. The fusion protein is depicted with arrows representing the dxCas9 binding position (red) and the addition of sequencing adapters (blue).
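Per-reference read counts such as those reported above (89 reads on the EPO plasmid, none on Tn5) can be tallied directly from an alignment file. The following is a minimal sketch, assuming the aligner writes standard SAM text; the section does not state which aligner or file format was actually used. The function name `count_primary_alignments`, the reference names and the example records are hypothetical.

```python
from collections import Counter


def count_primary_alignments(sam_lines):
    """Count primary mapped reads per reference from SAM-formatted lines."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])   # column 2: bitwise FLAG
        rname = fields[2]       # column 3: reference sequence name
        # 0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary alignment
        if flag & 0x4 or flag & 0x100 or flag & 0x800:
            continue
        counts[rname] += 1
    return counts


# Hypothetical records for illustration; not data from the sequencing run.
example = [
    "@SQ\tSN:EPO_plasmid\tLN:3000",
    "read1\t0\tEPO_plasmid\t230\t60\t4M\t*\t0\t0\tACGT\t*",
    "read2\t16\tEPO_plasmid\t228\t60\t4M\t*\t0\t0\tACGT\t*",
    "read2\t2048\tEPO_plasmid\t900\t60\t4M\t*\t0\t0\tACGT\t*",
    "read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",
    "read4\t0\tdxCas9\t10\t60\t4M\t*\t0\t0\tACGT\t*",
]
counts = count_primary_alignments(example)
print(dict(counts))
```

Counting only primary alignments avoids counting a read twice when it has supplementary alignments. As a rough scale check on the reported numbers, 89 reads out of a run of over 200 000 reads is at most about 0.045 % of all reads.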

These preliminary results show that adapter integration is directed by our fusion protein to the site complementary to the sgRNA. The efficiency of integration was not high, and some of the aligned reads showed extended regions, adding up to long reads of 2 - 11 kbp. More experiments are needed to obtain conclusive data on these rare events and, most importantly, to define better conditions for the integration assay. Several factors could explain the low efficiency of this integration:

During purification, the fusion protein could have been contaminated with DNA bound to one of its domains (both are DNA-binding proteins). This DNA could come from the cells used to produce the protein.
The fusion protein was purified with a heparin column, so other proteins with DNA-interacting domains could be present in the protein mixture.
The conditions of the integration assay could be suboptimal for the reaction kinetics of the fusion protein.
The buffers required for the transposase and dxCas9 binding reactions could be incompatible with each other or with the ONT sequencing buffers.
Our designed sequencing adapters are probably not fully compatible with the ONT sequencing platform. The exact sequence of the ONT adapters is proprietary and therefore not known to us.

We have, however, shown that our fusion protein is able to load sequencing adapters and integrate them at a specific site in the target DNA, allowing this DNA molecule to be sequenced with the ONT sequencing platform. Moreover, we were able to identify 89 reads aligned to the DNA target, and even to analyse the possible direction of integration by our fusion protein based on a single DNA read out of over 200 000 reads.

Design: We developed a software tool for the analysis of sequencing data, based on a machine learning algorithm, that expands the reference database used for alignment in an iterative process.

Results and interpretation: To use this tool, please visit our software tool page.

Conclusion

We demonstrated that our fusion protein and sequencing adapters could be used to replace the first reaction in library preparation for ONT MinION sequencing. More importantly, we showed specific targeted enrichment by our fusion protein for library preparation and ONT MinION sequencing. We also obtained preliminary results on the position of adapter integration relative to the sgRNA binding site.
