- Sequence summary: The E. coli genome is denoted sequence α, of length m; all plasmid sequences that must be orthogonal are spliced into a single sequence β, of length n.
- Genome orthogonal screening: A randomly generated 23-nt sequence λ (a 20-nt sgRNA spacer followed by
an NGG PAM) is compared with every window of 23 consecutive bases on sequence α. Each position is
classified as matched or mismatched (heterotopic), and the weights of the mismatched positions are
summed to give a total weighted score for that window. The same procedure is then repeated on the
reversed α. This yields two score sequences, each of length m - 22 (one score for each 23-nt window).
The distribution function of the total weighted score over the entire sample space was obtained in
advance by parameter estimation, and the orthogonality index threshold μ_1 was then chosen according
to the required degree of orthogonality; the threshold used in this experiment was μ_1 = 63 (90%).
Both score sequences are then scanned: if any score falls below μ_1, λ is considered not to satisfy
genomic orthogonality; otherwise, proceed to the next step. The position weights are empirical: the
10 positions farthest from the PAM (positions 1 to 10) are assigned weights 1 to 10, increasing toward
the PAM; each of the 10 positions in the seed region (positions 11 to 20) is assigned a weight of 10;
position 21 (the N of NGG) has weight 0; and positions 22 and 23 (the GG) each have weight 20.
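The scoring in the genome-screening step can be sketched in Python as below. This is a minimal illustration, not the authors' Fortran code: it assumes sequences are uppercase A/C/G/T strings, that a window's score is the sum of the weights at positions where it differs from λ, and that "reversed α" means the plain reversed string. The names `WEIGHTS`, `window_score`, `score_sequence`, `passes_genome_screen`, and `random_guide` are hypothetical.

```python
import random

SG_LEN = 23  # 20-nt spacer + NGG PAM

# Empirical weights for positions 1..23 (PAM occupies 21..23):
# 1..10 (PAM-distal) -> 1..10, 11..20 (seed) -> 10 each,
# 21 (N) -> 0, 22..23 (GG) -> 20 each.
WEIGHTS = list(range(1, 11)) + [10] * 10 + [0, 20, 20]


def window_score(guide: str, window: str) -> int:
    """Sum of weights where window differs from guide (higher = more orthogonal)."""
    return sum(w for w, a, b in zip(WEIGHTS, guide, window) if a != b)


def score_sequence(guide: str, seq: str) -> list[int]:
    """Score every 23-nt window of seq; returns len(seq) - 22 scores."""
    return [window_score(guide, seq[i:i + SG_LEN])
            for i in range(len(seq) - SG_LEN + 1)]


def passes_genome_screen(guide: str, alpha: str, mu1: int = 63) -> bool:
    """True if no window of alpha or of reversed alpha scores below mu1.

    Assumes len(alpha) >= 23 so that both score sequences are non-empty.
    """
    forward = score_sequence(guide, alpha)
    backward = score_sequence(guide, alpha[::-1])  # plain reversal, as in the text
    return min(forward + backward) >= mu1


def random_guide(rng: random.Random) -> str:
    """Random 21 bases (20-nt spacer + the N of the PAM) followed by GG."""
    return "".join(rng.choice("ACGT") for _ in range(21)) + "GG"
```

For example, `passes_genome_screen(random_guide(random.Random(0)), alpha)` screens one random candidate against a genome string `alpha`. Scoring every window this way costs about 23 comparisons per window per strand, which is fine for a sketch but slow in pure Python for repeated screening of a full genome.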

Fig 1
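The text does not say which parametric model was fitted to the score distribution. As one hedged illustration, if the bases of a window are assumed independent and uniform over A/C/G/T, each position mismatches λ with probability 3/4, and the exact distribution of the weighted score can be computed by convolving the positions one at a time. This uniform-base model is an assumption swapped in for the authors' parameter estimation, and `score_pmf` and `score_cdf` are hypothetical names.

```python
# Uniform-base model (an assumption, not the authors' fitted model):
# each window position mismatches λ with probability 3/4, independently.
WEIGHTS = list(range(1, 11)) + [10] * 10 + [0, 20, 20]
P_MISMATCH = 0.75


def score_pmf(weights: list[int], p: float = P_MISMATCH) -> list[float]:
    """pmf[s] = P(total weighted mismatch score == s)."""
    pmf = [1.0]  # with no positions scored, the score is 0
    for w in weights:
        nxt = [0.0] * (len(pmf) + w)
        for s, prob in enumerate(pmf):
            nxt[s] += prob * (1 - p)  # position matches: score unchanged
            nxt[s + w] += prob * p    # position mismatches: score + w
        pmf = nxt
    return pmf


def score_cdf(pmf: list[float]) -> list[float]:
    """cdf[s] = P(score <= s)."""
    cdf, running = [], 0.0
    for prob in pmf:
        running += prob
        cdf.append(running)
    return cdf
```

Under this model, `score_cdf(score_pmf(WEIGHTS))[62]` gives the probability that a random window scores below 63, and the mean score is 0.75 × 195 = 146.25.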

Fig 2

- Plasmid orthogonal screening: Repeat the algorithm of step 2 using the sequence that passed step 2
and sequence β, but discard the scores of windows that span the junction between two spliced plasmid
sequences, because such junction sequences do not exist in the bacterium. Screening uses the threshold
μ_2 = 43; record each sequence that passes.
- Repeat steps 1, 2, and 3 until the number of recorded sequences reaches 600.
- Verify the pairwise orthogonality of all the obtained sequences, and finally select the 20 sequences
that are most orthogonal to the genome and the plasmids. The matrix obtained from this mutual
verification is as follows:

Fig 3
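The plasmid screen, the repeat-until-600 loop, and the mutual-verification matrix can be sketched the same way. This is again a hypothetical Python illustration, not the authors' Fortran code: it assumes β is the plain concatenation of the plasmid sequences, that the "sequences that do not exist in the bacteria" are the 23-nt windows straddling a junction between two spliced plasmids, and that the plasmid screen also covers the reversed β as in step 2. Because the genome is a single piece with no junctions, the same `passes_screen` function also performs the genome screen with μ_1 = 63. All function names are hypothetical, and since the criterion for picking the final 20 sequences is not specified, only the matrix itself is computed.

```python
import random

SG_LEN = 23
# Same empirical weights as step 2 (positions 1..23, PAM at 21..23).
WEIGHTS = list(range(1, 11)) + [10] * 10 + [0, 20, 20]


def window_score(guide: str, window: str) -> int:
    """Sum of weights at positions where window differs from guide."""
    return sum(w for w, a, b in zip(WEIGHTS, guide, window) if a != b)


def window_starts(lengths: list[int]) -> list[int]:
    """Starts of 23-nt windows lying wholly inside one piece (junctions skipped)."""
    starts, offset = [], 0
    for n in lengths:
        starts.extend(range(offset, offset + n - SG_LEN + 1))
        offset += n
    return starts


def passes_screen(guide: str, pieces: list[str], mu: int) -> bool:
    """Screen the spliced pieces and their reversal against threshold mu.

    pieces=[alpha], mu=63 gives the genome screen;
    pieces=plasmids, mu=43 gives the plasmid screen.
    """
    lengths = [len(p) for p in pieces]
    joined = "".join(pieces)
    # Reversing the spliced sequence also reverses the order of the pieces.
    for seq, starts in ((joined, window_starts(lengths)),
                        (joined[::-1], window_starts(lengths[::-1]))):
        if any(window_score(guide, seq[i:i + SG_LEN]) < mu for i in starts):
            return False
    return True


def collect_guides(alpha: str, plasmids: list[str], target: int,
                   rng: random.Random, mu1: int = 63, mu2: int = 43) -> list[str]:
    """Draw random NGG candidates until `target` of them pass both screens."""
    found = []
    while len(found) < target:
        guide = "".join(rng.choice("ACGT") for _ in range(21)) + "GG"
        if passes_screen(guide, [alpha], mu1) and passes_screen(guide, plasmids, mu2):
            found.append(guide)
    return found


def mutual_matrix(guides: list[str]) -> list[list[int]]:
    """Pairwise weighted mismatch scores among the recorded candidates."""
    return [[window_score(a, b) for b in guides] for a in guides]
```

`collect_guides(alpha, plasmids, 600, random.Random(0))` mirrors the stopping rule of collecting 600 sequences; it loops until enough candidates pass, so it will not terminate if the thresholds cannot be met. Because a mismatch is symmetric, the mutual matrix is symmetric with zeros on its diagonal.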

All code was written in Fortran and can be found here.
