Team:Pittsburgh/Software

CrisPy


Background

One of the more challenging aspects of the CUTSCENE project was finding ways to measure the point mutation frequencies of our DNA. Our envisioned final readout is gel electrophoresis; however, we needed a reliable alternative method as a baseline for comparison.

While high-throughput sequencing methods are accurate, their cost was prohibitive for the large number of sequences we wanted to test. Instead, we turned to Sanger sequencing, an older but less costly method. This decision was supported by recent work from the Timothy Lu lab, which demonstrated that relative mutation frequencies can be quantified from Sanger chromatograms.

Figure 1: Sanger sequencing consists of three main steps: first, a primer is added to the DNA and DNA fragments of different lengths are generated, each terminating with a labeled nucleotide; next, the DNA fragments are separated by length through capillary gel electrophoresis; finally, a laser detects each of the labeled nucleotides as they pass by, generating a chromatogram like the one shown above [Sanger Sequencing Steps & Method, Sigma Aldrich].

Another question we needed to answer was whether the Cas9 protein could maintain its specificity when faced with many similar (but non-identical) DNA frames. In theory, we knew that multiple mismatches close to the PAM sequence would make Cas9 less likely to bind, but we were not sure of the exact rates or whether using the modified Cas9 base editor would make a significant difference. We found that the University of British Columbia’s 2017 iGEM team had previously worked on this problem, building on research from Dr. Howard M. Salis’s lab. That team took into account the position, mismatch type, and supercoiling of the DNA to determine how energetically favorable a match between the gRNA and DNA would be. Since our goal was simply to identify a range of likely off-target sequences, we used a scoring system that took into account only the position of each mismatch, then chose a threshold (a cutoff of 0.6) to screen for the highest scores. Scores were normalized to a range of 0 to 1, with 1 being a perfect match.

Figure 2: As illustrated by the 2017 University of British Columbia iGEM Team, sites farther from the PAM sequence are less important to target recognition.
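A position-only scoring scheme of the kind described above can be sketched in a few lines of Python. This is an illustrative sketch, not the CrisPy implementation: the linear weight profile, the function names (`position_weights`, `match_score`, `screen_sites`), and the example guide sequence are all assumptions made for this example. Only the 0 to 1 normalization and the 0.6 cutoff come from the text.

```python
def position_weights(length=20, pam_proximal_weight=1.0, pam_distal_weight=0.2):
    """Hypothetical linear weights for a protospacer written 5'->3'.

    Index 0 is the PAM-distal end and the last index is PAM-proximal,
    so mismatches near the PAM are penalized most. The weight values
    are assumptions chosen for illustration only.
    """
    if length == 1:
        return [pam_proximal_weight]
    step = (pam_proximal_weight - pam_distal_weight) / (length - 1)
    return [pam_distal_weight + i * step for i in range(length)]


def match_score(guide, site, weights=None):
    """Return a score in [0, 1]; 1.0 means the site matches the guide exactly."""
    if len(guide) != len(site):
        raise ValueError("guide and site must have the same length")
    if weights is None:
        weights = position_weights(len(guide))
    # Sum the weights of every mismatched position, then normalize.
    penalty = sum(
        w for g, s, w in zip(guide.upper(), site.upper(), weights) if g != s
    )
    return 1.0 - penalty / sum(weights)


def screen_sites(guide, sites, cutoff=0.6):
    """Keep candidate sites scoring at or above the cutoff, best first."""
    scored = [(site, match_score(guide, site)) for site in sites]
    hits = [(site, score) for site, score in scored if score >= cutoff]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


# Example usage with a made-up 20-nt guide.
guide = "GACGTTACCGGATTCAGTCA"
distal_mismatch = "T" + guide[1:]      # one mismatch far from the PAM
proximal_mismatch = guide[:19] + "C"   # one mismatch next to the PAM
print(match_score(guide, distal_mismatch) > match_score(guide, proximal_mismatch))
```

With any weight profile that increases toward the PAM, a single PAM-distal mismatch scores higher than a single PAM-proximal mismatch, which is the behavior the figure illustrates.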


Software Module


The purpose of our software tool (CrisPy) is to provide a fast and accurate way to characterize CRISPR on-target and off-target edits through Sanger sequencing. Earlier software tools have attempted to characterize Sanger traces numerically. Mark Crowe’s SeqDoc (Perl, 2005) provided functionality for normalizing and comparing two traces. More recently, Timothy Lu’s Sequalizer (MATLAB, 2017) used these comparisons to estimate nucleotide mutation frequencies. Our system builds on this previous work by incorporating modified versions of these algorithms into a concise, easy-to-use Python module. Moreover, the module adds new functionality by predicting and ranking likely off-target sequences (using Gibbs free energy) and finding their mutation frequencies, for both the given trace and its reverse complement.
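To make the idea of reading a mutation frequency off a pair of traces more concrete, the sketch below estimates the fraction of a mutant base at one position from peak heights in an edited trace, subtracting the background seen in a control trace. It is a deliberately simplified stand-in and not the SeqDoc, Sequalizer, or CrisPy algorithm; the function names, the dictionary input format, and the peak values are assumptions made for this example. A small helper also shows how a site might be located on either strand of a read.

```python
BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def base_fractions(peaks):
    """Normalize the four peak heights at one position so they sum to 1.

    `peaks` is a dict mapping each base to its peak height (assumed format).
    """
    total = sum(peaks[b] for b in BASES)
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return {b: peaks[b] / total for b in BASES}


def mutation_frequency(edited_peaks, control_peaks, mutant_base):
    """Estimate the mutant-base fraction in the edited sample.

    The control trace supplies the background signal for the mutant base,
    which is subtracted; the result is clipped at zero.
    """
    edited = base_fractions(edited_peaks)[mutant_base]
    background = base_fractions(control_peaks)[mutant_base]
    return max(0.0, edited - background)


def locate_site(read, site):
    """Find `site` in `read` on the forward strand, then the reverse strand.

    Returns ("+", index) or ("-", index), or None if the site is absent.
    """
    idx = read.find(site)
    if idx != -1:
        return ("+", idx)
    idx = read.find(reverse_complement(site))
    if idx != -1:
        return ("-", idx)
    return None


# Example usage with made-up peak heights for a G-to-A edit.
control = {"A": 50, "C": 0, "G": 950, "T": 0}
edited = {"A": 300, "C": 0, "G": 700, "T": 0}
print(mutation_frequency(edited, control, "A"))
```

In practice the peak heights would first need to be aligned and normalized between the two traces, which is the step that tools such as SeqDoc address.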

We used this tool to successfully collect all of our target mutation frequency data (see results). In the future, we plan to fully characterize the off-target mutation frequencies in our CUTSCENE system.

The workflow for the tool is provided below. It is our hope that synthetic biologists can use this tool to quickly test for off-target mutation frequencies that exist in their system.


Figure 3: Typical workflow for off-target characterization with the CrisPy module

GitHub





References


  • "aGrow Modeling", University of British Columbia 2017 iGEM team, https://2017.igem.org/Team:British_Columbia/Model
  • Farasat I, Salis HM (2016) A Biophysical Model of CRISPR/Cas9 Activity for Rational Design of Genome Editing and Gene Regulation. PLOS Computational Biology 12(1): e1004724. https://doi.org/10.1371/journal.pcbi.1004724
  • Farzadfard, Fahim, et al. “Single-Nucleotide-Resolution Computing and Memory in Living Cells.” BioRxiv, 2018, doi:10.1101/263657.
  • Crowe, M. L. (2005). BMC Bioinformatics, 6(1), 133