The core of our detection method is our novel targeted Next Generation Sequencing platform. Compared with whole genome sequencing, this technology reduces the amount of data generated, and with it the time and resources needed to analyze multiple samples for gene doping. We accomplished this by designing an innovative dxCas9-Tn5 fusion protein, shown in figure 1. The dxCas9 part guides the fusion to the target site through complementary matching between the sgRNA and the target site, whereas the Tn5 part integrates the adapter. The fusion can therefore perform target-specific integration of the adapters required for sequencing. This allowed us to replace Tn5, the transposase that integrates adapters at random positions during sequencing library preparation, with our fusion protein, turning the Oxford Nanopore Technologies MinION platform into a targeted Next Generation Sequencing platform.
Fusion protein in vitro functionality
We constructed, expressed and purified the fusion protein, and demonstrated its intended activity: targeted integration of specific DNA sequences. To do so, we analyzed the integration of small DNA sequences (DNA adapters), directed by a single guide RNA (sgRNA) to a specific site on a substrate gene. Integration at that site generates two DNA fragments whose lengths are known from the position targeted by the sgRNA.
The fusion was loaded with sgRNAs designed to target the substrate DNA, the erythropoietin (EPO) coding DNA sequence (cds), at position 179 bp (of a total length of 623 bp). If the integration of DNA adapters is site-specific, two main products should be obtained: ~250 bp and ~470 bp (one from the forward and one from the reverse strand), as shown schematically in figure 2. Figure 2.B shows this integration: lanes 2 and 3 each contain one of the two fragments described. These lanes can be compared to the negative control in lanes 4 and 5, which show integration in the absence of sgRNA, the component required for targeted integration.
Targeted Next Generation Sequencing
Not only did we demonstrate that our fusion protein is capable of targeted integration, but we also used it in its designed application: targeted sequencing with the Oxford Nanopore Technologies (ONT) platform. We used our fusion protein during library preparation to integrate sequencing adapters at the target site in the EPO cds, using the same sgRNA as in the assay described above. The library reaction mixture contained equimolar concentrations of two molecules:
- Target DNA: EPO linear plasmid DNA containing a complementary sgRNA target site.
- Background DNA: Tn5 cds DNA containing no sgRNA-compatible target sites.
After library preparation, we also added an equimolar concentration of randomly tagged reference DNA as an internal control to verify the sequencing run. We sequenced the DNA molecules with an ONT MinION device. The sequencing results showed multiple reads of the reference DNA (internal control), 89 unique reads aligned to the target DNA (EPO linear plasmid), and 0 unique reads aligned to the background DNA (Tn5 cds). This demonstrates both a successful sequencing run and the target specificity of our fusion protein. Close analysis of one of the aligned reads reveals the precise site where the sequencing adapters were added. We could thus identify the exact distance from the dxCas9 binding site to the adapter integration site, as shown in figure 3.
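The read tally described above can be illustrated with a minimal Python sketch. It is not the analysis pipeline we used: the function names, the read IDs and the reference names (`EPO_plasmid`, `Tn5_cds`, `reference_control`) are hypothetical, and the input is assumed to be a list of (read ID, reference name) pairs already extracted from an aligner's output.

```python
from collections import defaultdict


def count_unique_reads(alignments):
    """Count distinct read IDs aligned to each reference.

    `alignments` is an iterable of (read_id, reference_name) pairs.
    A read that aligns to the same reference more than once is
    counted only once for that reference.
    """
    reads_per_reference = defaultdict(set)
    for read_id, reference in alignments:
        reads_per_reference[reference].add(read_id)
    return {ref: len(ids) for ref, ids in reads_per_reference.items()}


def on_target_fraction(counts, target, background):
    """Fraction of target-plus-background reads that align to the target."""
    on_target = counts.get(target, 0)
    off_target = counts.get(background, 0)
    total = on_target + off_target
    return on_target / total if total else float("nan")


# Toy input with hypothetical read IDs and reference names.
example = [
    ("read1", "EPO_plasmid"),
    ("read1", "EPO_plasmid"),        # duplicate alignment of the same read
    ("read2", "EPO_plasmid"),
    ("read3", "reference_control"),  # internal control, ignored in the fraction
]
counts = count_unique_reads(example)
fraction = on_target_fraction(counts, "EPO_plasmid", "Tn5_cds")
```

With the counts reported above (89 target reads and 0 background reads), `on_target_fraction` would return 1.0. Reads from the internal control are deliberately left out of this fraction, since they only verify that the run itself worked.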
We showed that our fusion protein can direct DNA integration to a specific sequence by means of an sgRNA. Furthermore, these results demonstrate that this novel protein can be used to perform targeted next generation sequencing. We identified 89 sequencing events that align to our target DNA (EPO linear plasmid DNA). We were even able to identify the exact position of adapter integration by our fusion protein from a single DNA read out of over 200 000 reads (including reads from the internal control). This demonstrates that targeted sequencing with library preparation by our fusion protein significantly simplifies data analysis and could be used to detect gene doping DNA.