# Overview

Gene doping is the administration of exogenous genetic material for performance enhancement. Detecting gene doping requires identifying target sequences and detection windows. Our targeted sequencing method uses our novel dxCas9-Tn5 fusion protein and requires a minimal set of guide RNAs that align with gene doping sequences. As there are $$10^{104}$$ possible codon variations in our proof-of-concept target, the EPO gene, it is not feasible to target all possible sequences in practice. First, we therefore implemented a search function to identify areas with minimal variation within the sequence, which reduced the testing set to twelve sgRNAs. Second, we modeled the process of infection and degradation of gene doping DNA in blood. From this model, we provided the laboratory with the time-dependent concentration of the target DNA. Based on our wet lab sensitivity analyses, the model predicts that our detection method could effectively catch gene dopers, even with microdosing.

# 1. Approach

We identified a threat in detecting gene doping that lies in the possibility of modifying the genetic sequence of a gene without changing the protein sequence produced. This allows gene dopers to vary their DNA sequence in many possible combinations, which complicates the design of detection methods. To combat this, the exon-exon junctions with the smallest possible variation need to be identified, and sgRNA sequences generated to cover all possible combinations for one gene (figure 1).

Once the testing set of sgRNAs had been generated for our fusion protein, we determined the time window during which gene doping DNA can be detected in an athlete's blood sample. This was accomplished by modelling gene doping administration. We considered the entire process of gene doping in order to understand its underlying mechanisms and its effect on the athlete. Because our chosen model gene is the human erythropoietin (EPO) gene, based on the recommendation of Prof. Hidde Haisma from Groningen University, we included the EPO-dependent production of red blood cells in our model (figure 2).

A vector is a DNA molecule used as a vehicle to carry foreign genetic material into another cell. Once there, it can be expressed and replicated. The effect and detection of gene doping are highly dependent on the vectors that are used. In the early stages of our project we spoke with Prof. Hidde Haisma from Groningen University, a gene doping expert, who told us which vectors he would expect athletes to use. He suspects that the most common vectors are plasmids and adenoviruses, because of their relative safety compared to vectors that integrate the DNA into the genome of the cell. Integrating vectors, such as retroviruses, present the threat of insertional mutagenesis, which can lead to the development of cancer. Table 1 shows our analysis of possible gene doping vectors and some of their properties, based on the work of Ratko et al. 2003.

Table 1. Possible gene doping vectors and some of their properties, based on Ratko et al. 2003.

| Vector | Properties |
| --- | --- |
| Plasmid | Relatively safe; generally low immune response; low cost and easy large-quantity production; variable transgene insertion up to approximately 20 kb (Lodish et al. 2000); long storage (Munier et al. 2005, Kircheis et al. 2001, Li and Huang 2000); very low transfection efficiency (Bergen et al. 2008); no targeting; transient expression |
| Adenovirus | Transduces proliferating and nonproliferating cells; transduces many cell types; easy production; very high titers ($$10^{12}$$ pfu/mL); no targeting; transient expression; limited insert size: 4–5 kb; potential replication competence |
| Adeno-associated virus | No viral genes; no targeting; difficult production; not characterized well; potential insertional mutagenesis; limited insert size: 5 kb |
| Lentivirus | Transduces proliferating and nonproliferating cells; prolonged expression; relatively high titers ($$10^{6}$$–$$10^{7}$$ pfu/mL); integrating virus; clinical experience limited; difficult to manufacture and store; limited insert size: 8 kb |
| Retrovirus | Relatively high titers ($$10^{6}$$–$$10^{7}$$ pfu/mL); prolonged stable expression; larger insert size: 9–12 kb; inefficient transduction; integrating virus; insertional mutagenesis; no targeting; potential replication competence |

Each vector has its own benefits and drawbacks. Plasmid vectors, being non-viral DNA vectors, have several advantages over viral vectors. Virus production is expensive (Templeton et al. 2002, Nagasaki and Shinkai 2007), and the safety of viral transfection remains a concern after several deaths (McCormack et al. 2004, Hacein-Bey-Abina et al. 2008). However, plasmid vectors have a very low transfection efficiency (Murakami et al. 2011), especially compared to adenoviruses, which have been shown to reach a 95% transfection efficiency in hepatocytes (Huard et al. 1995, Sullivan et al. 1997). The transfection efficiency of plasmids can be improved through methods such as in vivo electroporation (Ataka et al. 2003), but this requires inserting electrode needles into the athlete, which is more invasive than a single injection of vector particles. Therefore, adenoviruses are seen as the most likely transfection method for gene doping at this stage, and even more so in the near future. Hence, most of the numerical values in our model are based on this type of vector.

# 2. Model Design

### 2.1 sgRNA Array Model

The four input variables of the model are the coding sequence of a gene, the protospacer adjacent motif (PAM) of the CRISPR-Cas protein, the length of the sgRNA, and the CRISPR-Cas-dependent identity between sgRNA seed and target (off target effect). An overview of the generated algorithm is shown in figure 3.

A step-by-step overview of the algorithm used for determination and creation of sgRNAs is given below:

1. Convert coding sequence into numerical sequence (A=1, T=2, C=3, G=4).

2. Translate genetic sequence into amino acid sequence.

3. Based on the amino acid sequence, determine the number of possible different codons that will code for the same amino acid.

4. Determine positions of PAM sequences close to the exon-exon junctions.
   1. Analyze the total number of sgRNAs necessary for each PAM sequence to cover all possibilities of synonymous mutations (product of possible codons).
   2. This number is determined by the identity necessary between sgRNA and target sequence.

5. Gather the number of all PAM positions in the gene and determine the total number of sgRNAs necessary, based on the codon variations.

6. Select the PAM sequence that has the minimal number of sgRNAs possible and generate sgRNAs.

7. Output the position of PAM sequence in the gene and all the sgRNAs’ sequence.
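The steps above can be illustrated with a minimal Python sketch. It is not the implementation used in the project: it assumes an NGG PAM on the forward strand only, a 20 nt sgRNA, and full identity between sgRNA and target, and it ignores the exon-exon junction filter and synonymous changes within the PAM itself. The function names `variants_for_window` and `best_pam_site` and the toy sequence are hypothetical. The variant count is an upper bound, because codons that only partly overlap the protospacer are counted in full.

```python
from math import prod

# Standard genetic code with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

# Number of synonymous codons per amino acid (step 3), e.g. L -> 6, M -> 1.
DEGENERACY = {aa: AMINO.count(aa) for aa in set(AMINO)}


def variants_for_window(cds, start, end):
    """Upper bound on the synonymous DNA variants of cds[start:end].

    Multiplies the codon degeneracy of every codon that overlaps the window.
    """
    first_codon = start // 3
    last_codon = (end - 1) // 3
    return prod(DEGENERACY[CODON_TABLE[cds[3 * i:3 * i + 3]]]
                for i in range(first_codon, last_codon + 1))


def best_pam_site(cds, sgrna_len=20):
    """Return (n_variants, pam_position, protospacer) for the NGG PAM whose
    upstream protospacer needs the fewest sgRNAs (steps 4 to 6)."""
    best = None
    for pam in range(sgrna_len, len(cds) - 2):
        if cds[pam + 1:pam + 3] != "GG":   # PAM is N at `pam`, then GG
            continue
        start = pam - sgrna_len
        n = variants_for_window(cds, start, pam)
        if best is None or n < best[0]:
            best = (n, pam, cds[start:pam])
    return best


# Toy coding sequence (hypothetical): a leucine-rich stretch with many
# synonymous variants, followed by Met/Trp codons that have none.
toy_cds = "CTG" * 7 + "ATGTGG" * 4
n_variants, pam_pos, protospacer = best_pam_site(toy_cds)
print(n_variants, pam_pos, protospacer)
```

On the toy sequence the search prefers the PAM whose protospacer lies entirely in the Met/Trp region, which mirrors the idea of picking regions with minimal codon variation.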

### 2.2 Gene Doping Model

First, we modeled the transit of the injected gene doping viral vectors from the injection site to the target cells. Both intramuscular (IM) and intravenous (IV) injection methods were considered.

Our model starts from the point of injection, which is either intramuscular or intravenous. The advantage of intravenous injection is that the viral vectors immediately enter the blood circulation. This means that more vectors reach the target kidney cells before they are degraded. However, intravenous injections require a qualified doctor to administer and can lead to vein damage such as phlebitis. Intramuscular injection does not have these issues. In addition, it can supply relatively large volumes of the gene doping DNA, as the muscles have a larger uptake capacity than the veins. The major disadvantage of intramuscular injection compared to intravenous injection is the added degradation of the vectors in the muscle, so that larger doses are required for the same effect as an IV dose. Finally, intramuscular injections can cause local swelling, drainage and severe pain at the site of injection. Nevertheless, we take both routes of administration into account.

Pharmacological compartment models are developed to understand the distribution of drugs administered to the human body by oral, intramuscular, or intravenous routes (Khanday et al. 2017). They are formulated based on diffusion processes using Fick's principle and the law of mass action. The same diffusion processes affect the administered adenoviral vectors. We start with pharmacological compartment models for both injection types, as displayed in Figure 10. Here it is assumed that the mixing of the vectors with blood is instantaneous, based on an article by Tarr et al. (1933). Prof. Beltman, Assistant Professor Biomedical Modelling at Leiden University, later agreed with this assumption.

There are multiple tissues producing EPO, including liver, brain and kidney tissue. The main cell population responsible for EPO production is the interstitial fibroblasts in the kidney, an average population of approximately 100 million cells, which produce more than 80% of the EPO in blood (Weidemann & Johnson 2009).

From the compartment models in figure 10, equations 1 and 2 can be derived for the intravenous administration.

$$\frac{d[c_{blood}]}{dt} = -(k_{blood}+kel_{blood})[c_{blood}]+k_{tissue}[c_{tissue}]\tag{1}$$

$$\frac{d[c_{tissue}]}{dt} = k_{blood}[c_{blood}]-(k_{bind\,uptake}+k_{tissue})[c_{tissue}] \tag{2}$$

Similarly, equations 3, 4, and 5 can be derived for the intramuscular administration. In intramuscular administration, the viral vectors must first diffuse out of the muscle and enter the bloodstream. While in the muscle, the muscle macrophages break down and eliminate the viral vectors.

$$\frac{d[c_{muscle}]}{dt} = -(k_{muscle}+kel_{muscle})[c_{muscle}]\tag{3}$$

$$\frac{d[c_{blood}]}{dt} = -(k_{blood}+kel_{blood})[c_{blood}]+k_{tissue}[c_{tissue}]+k_{muscle}[c_{muscle}] \tag{4}$$

$$\frac{d[c_{tissue}]}{dt} = k_{blood}[c_{blood}]-(k_{bind\,uptake}+k_{tissue})[c_{tissue}] \tag{5}$$

The initial values and constants used are shown in Tables 2 and 3, respectively.

Table 2. Overview of the initial values used in the human body model for an adenoviral vector.

| Initial value | Value | Meaning |
| --- | --- | --- |
| $$[c_{blood} (t=0)]$$ | 94 billion [#/mL] for IV single dose; 64 billion [#/mL] followed by smaller doses of 18 billion vectors every 20 days for IV microdosing; 0 [#/mL] for IM | Initial injection of vectors intravenously |
| $$[c_{tissue} (t=0)]$$ | 0 [#/mL] | Initial vector concentration in tissue |
| $$[c_{muscle} (t=0)]$$ | 141 billion [#/mL] for IM; 96 billion vectors followed by smaller doses of 27 billion vectors every 20 days for IM microdosing; 0 [#/mL] for IV | Initial injection of vectors intramuscularly |

Table 3. Overview of the constants used in the human body model for an adenoviral vector.

| Rate constant | Value (day⁻¹) | Meaning | Source |
| --- | --- | --- | --- |
| $$k_{tissue}$$ | 1440 | Vector displacement from tissue to blood | Estimate based on mean blood circulation time |
| $$k_{muscle}$$ | 1440 | Vector displacement from muscle to blood in IM injections | Estimate based on mean blood circulation time |
| $$k_{blood}$$ | 1440 | Vector displacement from blood to tissue | Estimate based on mean blood circulation time |
| $$kel_{blood}$$ | 720 | Elimination of viral vectors from the blood | Ganesan et al. 2011 |
| $$kel_{muscle}$$ | 720 | Elimination of viral vectors from the muscle | Ganesan et al. 2011 |
| $$k_{bind\,uptake}$$ | 8.64 | Endosomal uptake | Varga et al. 2005 |
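As an illustration of how equations 1 and 2 behave with the constants of Table 3, the following Python sketch integrates the intravenous model with a simple forward Euler scheme. It is a sketch under stated assumptions, not the solver used for the project: the function name `simulate_iv`, the step size, and the simulated time span are our own choices, and the initial value is the IV single dose from Table 2. Because the rate constants are large (up to 1440 per day), the step must be very small for an explicit scheme; a stiff ODE solver would be a better fit for long simulations.

```python
# Rate constants from Table 3, in 1/day.
K_BLOOD = 1440.0        # blood -> tissue
K_TISSUE = 1440.0       # tissue -> blood
KEL_BLOOD = 720.0       # elimination from blood
K_BIND_UPTAKE = 8.64    # endosomal uptake from tissue


def simulate_iv(c_blood0, t_end=0.02, dt=1e-5):
    """Integrate equations 1 and 2 with forward Euler.

    Returns (c_blood, c_tissue, taken_up) at t_end, where taken_up is the
    accumulated amount that left the tissue compartment through uptake.
    t_end and dt are in days and are illustrative choices.
    """
    c_blood, c_tissue, taken_up = c_blood0, 0.0, 0.0
    for _ in range(round(t_end / dt)):
        d_blood = -(K_BLOOD + KEL_BLOOD) * c_blood + K_TISSUE * c_tissue
        d_tissue = K_BLOOD * c_blood - (K_BIND_UPTAKE + K_TISSUE) * c_tissue
        taken_up += K_BIND_UPTAKE * c_tissue * dt
        c_blood += d_blood * dt
        c_tissue += d_tissue * dt
    return c_blood, c_tissue, taken_up


c0 = 94e9  # IV single dose from Table 2, in #/mL
c_blood_end, c_tissue_end, taken_up = simulate_iv(c0)
print(f"fraction taken up by cells: {taken_up / c0:.4f}")
```

With these constants, elimination from the blood dominates, so only a small fraction of the injected vectors ends up being taken up by cells. The intramuscular case adds equation 3 as a third state that feeds the blood compartment.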

Upon reaching the target kidney cells, the gene doping viral vectors infect the cells. The infection process and the production of gene doping EPO were modeled through a series of kinetic equations. Kidney cells have a mean lifespan of around 57 days. Upon their death, the infected cells release the gene doping DNA back into the bloodstream as cell-free DNA (cfDNA). This extends the detection window of the gene doping DNA.

#### Cellular uptake

After the uptake of the vectors in the tissue, the stage of cellular uptake ensues. We modelled this process in multiple steps, as indicated in Figure 11, according to a model first developed by Varga et al. 2005. Each step is described by one of a set of coupled differential equations, for which the constants are given in Table 4.

First, the complex is taken up by endocytosis, after which it is either degraded within the vesicle or escapes into the cell interior, as represented by equations 6 and 7.

$$\frac{d[vesicle]}{dt} = k_{bind\,uptake}[c_{tissue}]-(k_{escape}+k_{deg\,vesicle})[vesicle] \tag{6}$$

$$\frac{d[complex\,intracell]}{dt} = k_{escape}[vesicle]-k_{unpack}[complex\,intracell]-k_{bind\,vector}[complex\,intracell]\tag{7}$$

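Second, the vector complex either unpacks, releasing the plasmid, or binds directly to the nuclear targeting complex. The free plasmid is either degraded in the cytoplasm or binds to the nuclear targeting complex itself. Nuclear targeting can therefore take place in either dissociated or complexed form, as given by equations 8, 9 and 10.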

$$\frac{d[plasmid]}{dt} = k_{unpack}[complex\,intracell]-k_{bind \,plasmid}[plasmid]-k_{deg}[plasmid]\tag{8}$$

$$\frac{d[plasmid\,bound]}{dt} = k_{bind \,plasmid}[plasmid]-k_{NPC}[plasmid\,bound] \tag{9}$$

$$\frac{d[complex\,bound]}{dt} = k_{bind \,vector}[complex\,intracell]-k_{NPC}[complex\,bound] \tag{10}$$

Subsequently, transport into the nucleus is believed to take place by first binding to a nuclear pore complex (NPC). Finally, the nuclear targeting complex dissociates inside the nucleus. This is represented by equations 11 to 16.

$$\frac{d[complex\,boundNPC]}{dt} = k_{NPC}[complex\,bound]-k_{in}[complex\,boundNPC] \tag{11}$$

$$\frac{d[complex\,bound\,nucleus]}{dt} = k_{in}[complex\,boundNPC]-k_{dissociation}[complex\,bound\,nucleus] \tag{12}$$

$$\frac{d[complex\,nucleus]}{dt} = k_{dissociation}[complex\,bound\,nucleus] - k_{unpack2}[complex\,nucleus] \tag{13}$$

$$\frac{d[plasmid\,boundNPC]}{dt} = k_{NPC}[plasmid\,bound] -k_{in2}[plasmid\,boundNPC] \tag{14}$$

$$\frac{d[plasmid\,bound\,nucleus]}{dt} = k_{in2}[plasmid\,boundNPC]-k_{dissociation2}[plasmid\,bound\,nucleus] \tag{15}$$

$$\frac{d[plasmid\,nucleus]}{dt} = k_{dissociation2}[plasmid\,bound\,nucleus] + k_{unpack2}[complex\,nucleus] - k_{cell\,death}[plasmid\,nucleus] \tag{16}$$

#### Detection of cell free doping DNA

Apart from the effect of the gene doping EPO on the production of red blood cells, the purpose of the model is to determine the dynamics of the detectable cfDNA concentration in the blood. Cell free doping DNA is released from dying infected cells and circulates in the blood where it is assumed to degrade at the same rate as natural cfDNA.

$$\frac{d[Doping\,DNA]}{dt} =k_{cell\,death}[plasmid\,nucleus]-kel_{cfDNA}[Doping\,DNA] \tag{17}$$

Equation 17 provides us with a detection window, for which we assume that we can detect both the DNA left in the tissue and bloodstream after injection and the DNA released after cell death ($$k_{cell\,death}$$). Any other degradation terms and the effects of transient expression are incorporated in the constants used. Based on the above model, we obtained the development of the cfDNA concentration over time for both intramuscular and intravenous injections, as well as the estimated detection windows, for which we assumed a detection limit of 100 copies of DNA. The concentrations of DNA over time were used in the laboratory for our sample preparation to mimic real-life detection potential. The cell death rate constant is directly linked to the average lifetime of renal interstitial fibroblasts, which we estimated to be around 57 days based upon measurements in chicks by Weissmanshomer et al. 1975.
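To give a feel for how equation 17 translates into a detection window, the sketch below looks only at the late phase, after vector uptake has finished. It rests on two simplifying assumptions that are ours and not part of the model description: the nuclear plasmid pool then decays purely through cell death, and, because cfDNA clearance is much faster than cell death, the cfDNA level follows the plasmid pool in quasi-steady state. The starting plasmid level passed to the functions is hypothetical, and the detection limit is treated as being in the same units as the model concentration.

```python
import math

K_CELL_DEATH = 0.0167     # 1/day, death rate of renal interstitial fibroblasts
KEL_CFDNA = 100.0         # 1/day, clearance of cfDNA from the blood
DETECTION_LIMIT = 100.0   # assumed detection limit, in model units


def cfdna(t, plasmid_nucleus0):
    """Quasi-steady-state cfDNA level t days after uptake has finished.

    Assumes [plasmid nucleus] = plasmid_nucleus0 * exp(-k_cell_death * t),
    so that equation 17 gives [Doping DNA] ~ (k_cell_death / kel_cfDNA) * [plasmid nucleus].
    """
    plasmid = plasmid_nucleus0 * math.exp(-K_CELL_DEATH * t)
    return K_CELL_DEATH / KEL_CFDNA * plasmid


def detection_window(plasmid_nucleus0):
    """Days until the cfDNA level falls below the detection limit."""
    peak = cfdna(0.0, plasmid_nucleus0)
    if peak <= DETECTION_LIMIT:
        return 0.0
    return math.log(peak / DETECTION_LIMIT) / K_CELL_DEATH


# Hypothetical starting level of nuclear plasmid, for illustration only.
window_days = detection_window(1e9)
print(f"detection window: {window_days:.0f} days")
```

Under these assumptions the window grows only logarithmically with the amount of plasmid that reaches the nucleus, while it scales inversely with the cell death rate.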

According to Haller et al. 2018, the expected concentration of doping cell free DNA may be higher for athletes in endurance and intermittent sports. Haller et al. found a 22.7 fold increase in venous cfDNA concentrations in footballers after a professional football match. Given the high amount of training top level athletes endure, this finding leads us to believe that the detection window might be even longer than our model predicts.

#### The Protein Effect

Lastly, the DNA that has been taken up can be expressed as protein according to equation 18, after which the protein can be exported to the extracellular environment according to equation 19.

$$\frac{d[protein]}{dt} =k_{protein}[plasmid\,nucleus]-k_{deg\,protein}[protein]-k_{export}[protein] \tag{18}$$

$$\frac{d[protein\,extracellular]}{dt} =k_{export}[protein]-k_{deg\,protein\,extracellular}[protein\,extracellular]\tag{19}$$

Table 4. Overview of the constants used in the human body model for an adenoviral vector.

| Rate constant (Ad5) | Value (day⁻¹) | Meaning | Source |
| --- | --- | --- | --- |
| $$k_{bind\,uptake}$$ | 8.64 | Endosomal uptake | Varga et al. 2005 |
| $$k_{deg\,vesicle}$$ | 28.8 | Degradation of complex within uptake vesicle | Varga et al. 2005 |
| $$k_{escape}$$ | 23.0 | Complex movement from endosome to intracellular | Varga et al. 2005 |
| $$k_{bind\,vector}$$ | 144 | Binding of gene delivery vector to compound targeting for the nucleus | Varga et al. 2005 |
| $$k_{unpack}, k_{unpack2}$$ | 144 | Plasmid detaches from vector either in the cytoplasm (1) or in the nucleus (2) | Varga et al. 2005 |
| $$k_{deg}$$ | 7.2 | Degradation of unbound plasmid in the cytoplasm | Lechardeur et al. 1999 |
| $$k_{bind\,plasmid}$$ | 2.88 | Binding of plasmid to compound targeting for the nucleus | Varga et al. 2001 |
| $$k_{NPC}$$ | $$1.44 \times 10^{6}$$ | Binding of formed complexes to the nuclear pore complex | Vacik et al. 1999; Wilson et al. 1999; Chan et al. 1999; Dean et al. 1997 |
| $$k_{in}, k_{in2}$$ | 2.88 | Uptake into the nucleus through the nuclear pore complex | Varga et al. 2001 |
| $$k_{dissociation}, k_{dissociation2}$$ | $$1.44 \times 10^{6}$$ | Dissociation from the NPC targeting compound | Moroianu et al. 1996 |
| $$k_{protein}$$ | 14.4 | Protein production from plasmid | Schaffer et al. 1998 |
| $$k_{deg\,protein}$$ | 1.04 | Cytoplasmic degradation of the protein | Fuertinger et al. 2012 |
| $$k_{export}$$ | $$1.44 \times 10^{6}$$ | Export of protein to the extracellular environment | Estimate |
| $$k_{deg\,protein\,extracellular}$$ | 1.04 | Degradation of EPO protein in blood | Fuertinger et al. 2012 |
| $$k_{cell\,death}$$ | 0.0167 | Average death rate of renal interstitial fibroblasts | Estimate based on chicks; Weissmanshomer et al. 1975 |
| $$kel_{cfDNA}$$ | 100 | Clearance of cfDNA from the blood | Alegre et al. 2015 |

The EPO from the infected cells is released into the bloodstream. The EPO reaches the bone marrow where it stimulates red blood cell production through erythropoiesis. Red blood cells begin as stem cells and go through a series of cell differentiations before maturing into red blood cells (figure 12).

EPO promotes the proliferation of CFU-E cells: increases in EPO levels lead to faster proliferation rates. Increases in EPO levels also reduce the marrow transit time of marrow reticulocytes, releasing the cells into the blood in a shorter time frame. In the blood, the reticulocytes mature into red blood cells, which increase the oxygen carrying capacity of the blood. If the partial pressure of oxygen in the blood rises above normal due to an excess of red blood cells, the endogenous production of EPO in the kidneys decreases. A low enough concentration of EPO in the blood will trigger macrophages to phagocytose young red blood cells. This process is referred to as neocytolysis and allows the body to respond quickly to environmental changes, including changes in altitude (Rice et al. 2005).

With the doping DNA degradation and doping EPO formation determined, the effect of EPO on erythropoiesis, the process which produces red blood cells, is determined. We developed a model using an anemia EPO treatment model by Fuertinger et al. 2012 as reference.

Red blood cells begin as stem cells and progress through different cell stages as they age. As progenitor and precursor cells age, they proliferate or undergo apoptosis at a rate dependent on the cell stage they are in. The resulting growth or decay rates may either be constant or dependent on the concentration of EPO in the blood. Burst-Forming Unit-Erythroid (BFU-E) cells have a very small number of EPO receptors. Because EPO concentration has no effect on BFU-E proliferation, their proliferation rate is assumed to be constant. After leaving the stem cell stage, cells stay in the BFU-E stage for 7 days, after which they enter the Colony-Forming Unit-Erythroid (CFU-E) stage. In this stage, the cells divide at a higher rate than in the BFU-E stage. The CFU-E cells have a large number of EPO receptors and are strongly dependent on EPO for their survival. Their rate of apoptosis is inversely related to the concentration of EPO. Under normal conditions within the human body, a large portion of the CFU-E cells generated do not survive. As the concentration of EPO in the blood increases, the number of cells that survive increases.

After spending 6 days as CFU-E cells, the cells enter the erythroblast stage. Here, the number of EPO receptors declines. During this stage, there is no evidence that additional divisions occur when the production of EPO increases. For this reason, we assume that the proliferation of erythroblasts is constant (Lichtman et al. 2005).

The cells stay in the erythroblasts stage for 5 days until they stop dividing, extrude their nuclei and mitochondria, and become marrow reticulocytes. Marrow reticulocytes no longer proliferate and their mortality rate is inversely dependent on iron concentration in the plasma. Since we assume that athletes have a sufficient iron supply, a constant apoptosis rate for marrow reticulocytes is assumed. The time cells stay in the reticulocytes stage is between 0.75 and 3 days. An increase in EPO concentration shortens the marrow transit time of reticulocytes.

Once reticulocytes are released from the bone marrow and enter the blood, they mature into erythrocytes (red blood cells) within 1-3 days. Reticulocytes have a hemoglobin content of around 27.5 ± 2.8 pg per cell, and red blood cells have a hemoglobin content of around 26.4 ± 2.4 pg per cell (Fishbane et al. 1997). Because blood reticulocytes and red blood cells have a similar ability to carry oxygen, the term red blood cells refers here to both blood reticulocytes and mature red blood cells. The lifespan of red blood cells (RBCs) in healthy human adults is about 120 days, after which their components are recycled by macrophages (Jandl 1987). Over the course of this time, a small number of RBCs die due to random daily breakdown or internal or external bleeding. This is taken into account with a small apoptosis rate for RBCs. Adults have a red blood cell count ranging from about 20 to 30 trillion. Women have a red blood cell count of 3.5-5.5 trillion cells per liter, while men have a count of 4.3-5.9 trillion cells per liter (Dean 2005). The average red blood cell count is estimated to be 24.98 trillion by Lichtman et al. 2005. The entire process, from stem cell to red blood cell recycling by macrophages, takes 141 days.

The endogenous release of EPO is inversely related to the partial pressure of oxygen in the blood. The partial pressure of oxygen in the blood is assumed to be proportional to the number of circulating red blood cells. An increase in the red blood cell population in the blood will therefore decrease endogenous EPO production. If the concentration of EPO in the blood falls below a certain level (9.8 mU/ml in the case of this model), neocytolysis is triggered. Neocytolysis is the selective lysis of young red blood cells, which allows the body to decrease its red blood cell count at a higher rate and reach the desired partial pressure of oxygen in the blood.

The series of partial differential equations (PDEs) that describe red blood production was modeled with a simplified linear age population method which provided similar accuracy to more computationally intensive PDE solvers. The red blood cell production model is then combined with the compartment and infection model to determine the effects of EPO gene doping on red blood cell count.

The process of red blood cell formation, from stem cell to red blood cell can be modelled in time through a series of partial differential equations based on the age of the cells. Each of the cell stages (BFU-E, CFU-E, erythroblasts, marrow reticulocytes and red blood cells) is represented by a partial differential equation. The general form of the equation is shown in figure 13.

Cell age spans from 0 days, when stem cells become BFU-E cells, up to 141 days, when red blood cells are recycled by macrophages. Over this range, the cells can be broken up into populations of cells of a certain age. For every cell stage, the population density of cells at a given maturity and time can be modelled as a population mesh. A population mesh is a numerical estimator that breaks down the process of red blood cell production into populations at a given age. A mesh point is the population of cells of a specific age. When the process is modelled as a mesh, the following assumptions about the relation between a change in time and a change in maturity can be made.

Assumptions
1. For a population of cells u of age μ and at time t, a change in time Δt will result in an equal change in maturity Δμ, assuming that the maturation velocity $$vs(E(t))=1$$.
2. The maturation velocity is 1 for every stage except the marrow reticulocytes, where the time cells stay in this stage depends on the concentration of EPO.

The commitment of stem cells to becoming red blood cells is an irreversible event (Fuertinger et al. 2012). Cells cannot regress to a previous cell type or switch to a different differentiation pathway. For this reason, assumption 1 can be made: a change in time leads to an equal change in the age of the cell. As a result, the finer the population mesh, the smaller the time step. This increases the accuracy of the model, as the cells react faster to changes in EPO concentration over time. Once cells reach the age at which they differentiate into another cell stage, the growth and decay rates that govern them change.

When a time step occurs, the cells in each mesh point ($$u_n$$) experience growth or decay based on the stage they are in. The new population of cells is then shifted to the next age mesh point. The general form of the equation that governs the change of population density at each time step is below.

$$u_n = u_{n-1}\times e^{(\beta-\alpha(EPO(t))) \times \Delta t} \tag{20}$$

$$u_{n+1} = u_{n}\times e^{(\beta-\alpha(EPO(t))) \times \Delta t} \tag{21}$$

$$\Delta t = \mu_{n+1} - \mu_n = \Delta \mu \tag{22}$$

The dynamics of the mesh in response to a gene doping injection can be seen in figure 15.

To calculate the total population of each stage, the trapezoid rule was used to calculate the area under the population mesh and estimate the total population.
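The mesh update of equations 20 to 22 and the trapezoid rule can be sketched in a few lines of Python. This is an illustrative sketch rather than the project code: the function names, the time step of 0.5 days, and the treatment of the stem cell input as a fixed value entering at age 0 on every step are assumptions. The example uses the BFU-E stage, with its proliferation rate of 0.2 per day and its maximal maturity of 7 days.

```python
import math


def step_mesh(u, rates, dt, inflow):
    """Advance an age mesh by one time step (equations 20 to 22).

    Cells at mesh index n-1 grow or decay for dt with net rate
    rates[n-1] = beta - alpha, then shift to index n. `inflow` is the
    population entering the stage at age 0.
    """
    new_u = [inflow]
    for n in range(1, len(u)):
        new_u.append(u[n - 1] * math.exp(rates[n - 1] * dt))
    return new_u


def total_population(u, dt):
    """Trapezoid rule over the age mesh; the mesh spacing equals dt."""
    return sum(0.5 * (u[n] + u[n + 1]) * dt for n in range(len(u) - 1))


# BFU-E stage: ages 0 to 7 days, constant proliferation rate of 0.2 per day.
dt = 0.5                              # illustrative time step, in days
n_points = int(7 / dt) + 1            # mesh points at ages 0, 0.5, ..., 7
rates = [0.2] * n_points
mesh = [0.0] * n_points
for _ in range(20):                   # run long enough to fill the stage
    mesh = step_mesh(mesh, rates, dt, inflow=1e8)

bfu_e_total = total_population(mesh, dt)
print(f"oldest BFU-E mesh point: {mesh[-1]:.3e}, stage total: {bfu_e_total:.3e}")
```

For the EPO-dependent stages, the entries of `rates` would be recomputed at every step from the current EPO concentration before calling `step_mesh`.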

#### BFU-E Cells

On day 0, $$10^8$$ stem cells become BFU-E cells. Their proliferation rate is not dependent on EPO. As a result, the population of BFU-E cells grows at a constant exponential rate as the cells age.

$$BFU{-}E_n = BFU{-}E_{n-1}\times e^{\beta_{BFU-E} \times \Delta t} \tag{23}$$

$$BFU{-}E_{n+1} = BFU{-}E_{n}\times e^{\beta_{BFU-E} \times \Delta t} \tag{24}$$

Cells committed to become red blood cells continue to grow in this manner until they reach the age of 7 days. At this point, the BFU-E differentiate into CFU-E cells.

#### CFU-E Cells

CFU-E cells are strongly dependent on EPO for their survival.

$$CFU{-}E_n = CFU{-}E_{n-1}\times e^{(\beta_{CFU{-}E}-\alpha_{CFU{-}E}(EPO(t))) \times \Delta t} \tag{25}$$

$$CFU{-}E_{n+1} = CFU{-}E_{n}\times e^{(\beta_{CFU{-}E}-\alpha_{CFU{-}E}(EPO(t))) \times \Delta t} \tag{26}$$

The equation shows that the apoptosis rate of CFU-E cells is dependent on the concentration of EPO at a given time. As the concentration of EPO increases, the apoptosis rate of CFU-E cells decreases.

The logistic equation that governs the apoptosis rate of CFU-E is seen in the equation below.

$$\alpha_{CFU{-}E}(EPO(t))= \frac{(a_1 - b_1)}{1+e^{k_1 \times EPO(t) - c_1}}+b_1 \tag{27}$$

The CFU-E stage starts once cells reach the age of 7 days and continues until cells differentiate into Erythroblasts at an age of 13 days.
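To make the effect of equation 27 concrete, the short Python sketch below evaluates the CFU-E apoptosis rate for a few EPO concentrations, using the values of $$a_1$$, $$b_1$$, $$c_1$$, $$k_1$$ and $$\beta_{CFU-E}$$ listed in the parameter table of the red blood cell production model. The helper `cfu_e_growth_factor` is our own illustration: it applies equation 25 over the 6 days of the CFU-E stage under the simplifying assumption of a constant EPO concentration.

```python
import math

A1, B1 = 0.35, 0.07     # 1/day, upper and lower apoptosis rates
C1, K1 = 3.0, 0.14      # dimensionless, ml/mU
BETA_CFU_E = 0.57       # 1/day, CFU-E proliferation rate


def alpha_cfu_e(epo):
    """Apoptosis rate of CFU-E cells (equation 27); epo in mU/ml."""
    return (A1 - B1) / (1.0 + math.exp(K1 * epo - C1)) + B1


def cfu_e_growth_factor(epo, days=6.0):
    """Net growth over the CFU-E stage at a constant EPO concentration."""
    return math.exp((BETA_CFU_E - alpha_cfu_e(epo)) * days)


for epo in (0.0, 10.0, 20.0, 50.0):
    print(f"EPO = {epo:5.1f} mU/ml: alpha = {alpha_cfu_e(epo):.3f} 1/day, "
          f"growth factor = {cfu_e_growth_factor(epo):.2f}")
```

The rate falls from close to $$a_1$$ at low EPO towards $$b_1$$ at high EPO, so more CFU-E cells survive the stage as the EPO concentration rises.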

#### Erythroblasts

The proliferation of erythroblasts does not depend on the concentration of EPO. However, the population of cells entering the erythroblast stage at an age of 13 days does depend on the concentration of EPO, because of the preceding CFU-E stage.

$$Erythroblasts_n = Erythroblasts_{n-1}\times e^{\beta_{Erythroblasts} \times \Delta t} \tag{28}$$

$$Erythroblasts_{n+1} = Erythroblasts_{n}\times e^{\beta_{Erythroblasts} \times \Delta t} \tag{29}$$

Upon reaching the age of 18 days, erythroblast cells differentiate into marrow reticulocytes.

#### Marrow Reticulocytes

For the marrow reticulocytes, where the marrow transit time varies with EPO concentration, the same mesh method is used. Even with sufficient iron supply, a constant fraction of marrow reticulocytes is phagocytosed (Fuertinger et al. 2012). This is represented by a constant apoptosis rate independent of EPO concentration.

$$Reticulocytes_n = Reticulocytes_{n-1}\times e^{-\alpha_{Reticulocytes} \times \Delta t} \tag{30}$$

$$Reticulocytes_{n+1} = Reticulocytes_{n}\times e^{-\alpha_{Reticulocytes} \times \Delta t} \tag{31}$$

The amount of time it takes for marrow reticulocytes to leave the bone marrow and enter the blood decreases as the concentration of EPO increases. A shorter marrow transit time means that reticulocytes spend more time as blood reticulocytes. When the marrow transit time decreases, the oldest marrow reticulocytes are the first to be released. To account for the change in transit time, the stage boundary between reticulocytes and red blood cells is allowed to shift depending on the concentration of EPO.

The logistic equation that governs the marrow transit time can be seen below.

$$\mu_{Reticulocytes,max}(EPO(t)) = a_2 - \frac{b_2}{1+e^{k_2 \times EPO(t) - c_2}} \tag{32}$$

Reticulocytes will stay in the bone marrow until age 18.75 to 21 days after which they leave the bone marrow and circulate in the blood, where they mature into red blood cells.

#### Red Blood Cells and Blood Reticulocytes

Red blood cells are lost at a fixed rate due to random daily breakdown or internal or external bleeding. This is represented by a constant apoptosis rate $$\alpha_{RBCs, \, r}$$ that is independent of EPO concentration.

$$RBCs_n = RBCs_{n-1}\times e^{-\alpha_{RBCs}(EPO(t), \, \mu_{RBCs}) \times \Delta t} \tag{33}$$

$$RBCs_{n+1} = RBCs_{n}\times e^{-\alpha_{RBCs}(EPO(t),\, \mu_{RBCs}) \times \Delta t} \tag{34}$$

Between the ages of 35 and 42 days, young red blood cells will undergo neocytolysis if the concentration of EPO in the blood drops below 9.8 mU/ml. The neocytolysis rate increases as EPO levels drop, until the level of EPO reaches 3.3 mU/ml, at which point the rate is at its maximum.

$$\alpha_{RBCs}(EPO(t), \mu_{RBCs}) = \alpha_{RBCs, \, r} + \min\Big(\frac{c_E}{EPO(t)^{k_E}}, \, b_E \Big), \quad \text{for } EPO(t) < \tau_E, \; 35 \text{ days} \leq \mu_{RBCs} \leq 42 \text{ days} \tag{35}$$

$$\alpha_{RBCs}(EPO(t), \mu_{RBCs}) = \alpha_{RBCs, \, r}, \quad \text{otherwise} \tag{36}$$
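The piecewise mortality rate of equations 35 and 36 can be written as a small Python function. The intrinsic mortality rate, the EPO threshold of 9.8 mU/ml and the age window of 35 to 42 days come from the text and the parameter table. The values used here for $$c_E$$, $$k_E$$ and $$b_E$$ are not given in this section; they are placeholders chosen only so that the cap $$b_E$$ is reached near 3.3 mU/ml, and should be replaced by the values actually used in the model.

```python
ALPHA_RBC_R = 0.005            # 1/day, intrinsic RBC mortality rate
TAU_E = 9.8                    # mU/ml, EPO threshold for neocytolysis
AGE_MIN, AGE_MAX = 35.0, 42.0  # days, age window exposed to neocytolysis

# Placeholder values (assumptions, not taken from this section).
C_E, K_E, B_E = 3.5, 3.0, 0.1


def alpha_rbc(epo, age_days):
    """RBC mortality rate in 1/day (equations 35 and 36); epo in mU/ml."""
    in_window = AGE_MIN <= age_days <= AGE_MAX
    if epo < TAU_E and in_window:
        # Guard against division by zero: at zero EPO the rate is capped.
        neocytolysis = B_E if epo <= 0.0 else min(C_E / epo ** K_E, B_E)
        return ALPHA_RBC_R + neocytolysis
    return ALPHA_RBC_R


for epo, age in ((15.0, 38.0), (5.0, 38.0), (2.0, 38.0), (2.0, 60.0)):
    print(f"EPO = {epo:4.1f} mU/ml, age = {age:4.1f} d: "
          f"alpha = {alpha_rbc(epo, age):.3f} 1/day")
```

Only cells inside the age window see the additional neocytolysis term, and the term is bounded by $$b_E$$ however low the EPO concentration falls.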

#### Feedback loop via EPO

The natural release of EPO from the kidneys depends on the partial pressure of oxygen in the blood. If the partial pressure of oxygen in the blood decreases due to a lack of red blood cells, more EPO is released. This leads to an increase in red blood cell production, and thus to a higher population of circulating red blood cells and a higher partial pressure of oxygen. The partial pressure of oxygen and the number of circulating red blood cells are therefore assumed to be proportional.

The amount $$EPO^{end}_{in}(t)$$ of EPO released by the kidney can be estimated from the total population of red blood cells $$RBCs(t)$$, which consists of all circulating red blood cells. The equation that governs the natural release of EPO uses a scaled red blood cell count, with $$TBV$$ being the total blood volume, and is shown below:

$$EPO^{end}_{in}(t) = \bigg(\frac{(a_3 - b_3)}{1+e^{k_3 \times \tilde{M}(t) - c_3}}+b_3\bigg)\ \times \frac{1}{TBV} \tag{37}$$

$$\tilde{M}(t) = 10^{-8} \times \frac{RBCs(t)}{TBV} \tag{38}$$

The behavior of the endogenous EPO concentration $$EPO_{end}(t)$$ in plasma is modeled by the following ordinary differential equation:

$$\frac{dEPO_{end}(t)}{dt} = EPO^{end}_{in}(t) - k_{deg\,protein\,extracellular}EPO_{end}(t) \tag{39}$$

When gene doping has been administered, the concentration of gene doping EPO at a given time is given by Equation 19. It is assumed that the degradation of natural EPO and gene doping EPO occurs at the same rate. The overall concentration of EPO in plasma consists of the naturally produced erythropoietin and the administered gene doping EPO:

$$EPO(t) = EPO_{end}(t) + EPO_{doping}(t) \tag{40}$$
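The feedback described by Equations 37 to 40 can be sketched in Python as follows. This is an illustration under stated assumptions, not the model's actual solver: the red blood cell count is held constant during the integration, forward Euler is used for simplicity, and the values passed for `k3` and `c3` are hypothetical placeholders because these two constants are not listed in this section. The defaults for $$a_3$$, $$b_3$$, $$TBV$$ and the degradation rate are the values given in Table 4.

```python
import math


def epo_release(rbcs, k3, c3, a3=9.1, b3=0.2, tbv=5000.0):
    """Endogenous EPO release rate from the kidneys, Eq. 37-38."""
    m_scaled = 1e-8 * rbcs / tbv                      # Eq. 38
    sigmoid = (a3 - b3) / (1.0 + math.exp(k3 * m_scaled - c3)) + b3
    return sigmoid / tbv                              # Eq. 37


def simulate_endogenous_epo(rbcs, k3, c3, epo0=0.0, k_deg=1.04,
                            days=30.0, dt=0.01):
    """Forward-Euler integration of Eq. 39 with the RBC count held constant."""
    release = epo_release(rbcs, k3, c3)
    epo = epo0
    for _ in range(int(round(days / dt))):
        epo += dt * (release - k_deg * epo)
    return epo


def total_epo(epo_end, epo_doping):
    """Overall plasma EPO concentration, Eq. 40."""
    return epo_end + epo_doping


# Hypothetical placeholder constants, chosen only to exercise the functions.
K3_ASSUMED, C3_ASSUMED = 0.1, 5.0
epo_end_ss = simulate_endogenous_epo(2.5e13, K3_ASSUMED, C3_ASSUMED)
```

With a constant release rate, Equation 39 relaxes to the release rate divided by the degradation rate, which is what the integration converges to. For a positive `k3`, a lower red blood cell count gives a higher release rate, reproducing the direction of the feedback.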

The parameters used in the model are based on those derived in Fuertinger et al. 2012, except for $$a_3$$ and $$b_3$$, which are derived from Roberts, 2011.

Table 4. Parameters used in the red blood cell production model, with their values, units, and meaning.

| Parameter | Value | Unit | Meaning |
| --- | --- | --- | --- |
| $$\beta_{BFU-E}$$ | 0.2 | 1/day | Proliferation rate for BFU-E cells |
| $$\beta_{CFU-E}$$ | 0.57 | 1/day | Proliferation rate for CFU-E cells |
| $$\beta_{Erythroblasts}$$ | 1.024 | 1/day | Proliferation rate for erythroblasts |
| $$\mu_{BFU-E, \, max}$$ | 7 | days | Maximal maturity for BFU-E cells |
| $$\mu_{CFU-E, \, min}$$ | 7 | days | Minimal maturity for CFU-E cells |
| $$\mu_{CFU-E, \, max}$$ | 13 | days | Maximal maturity for CFU-E cells |
| $$\mu_{Erythroblasts, \, min}$$ | 13 | days | Minimal maturity for erythroblasts |
| $$\mu_{Erythroblasts, \, max}$$ | 18 | days | Maximal maturity for erythroblasts |
| $$\mu_{Reticulocytes, \, max}(EPO(t))$$ | 18.75 to 21 | days | Maximal maturity for marrow reticulocytes |
| $$\alpha_{Reticulocytes}$$ | 0.089 | 1/day | Rate of ineffective erythropoiesis in the marrow reticulocytes stage |
| $$\alpha_{RBCs}$$ | 0.005 | 1/day | Intrinsic mortality rate for erythrocytes |
| $$a_1, \, b_1$$ | 0.35, 0.07 | 1/day | Constants for the sigmoid apoptosis rate for CFU-E cells |
| $$c_1, \, k_1$$ | 3, 0.14 | dimensionless, ml/mU | Constants for the sigmoid apoptosis rate for CFU-E cells |
| $$a_2, \, b_2$$ | 3.225, 2.475 | days | Constants for the sigmoid maturation velocity/marrow transit time for marrow reticulocytes |
| $$c_2, \, k_2$$ | 2.3, 0.2 | dimensionless, ml/mU | Constants for the sigmoid maturation velocity/marrow transit time for marrow reticulocytes |
| $$a_3, \, b_3$$ | 9.1, 0.2 | dimensionless, ml/mU | Constants for the sigmoid function governing the release of EPO from the kidneys |
| $$\mu_{RBCs, \, neocytolysis \, min}$$ | 35 | days | Lower bound of erythrocytes which are possibly exposed to neocytolysis |
| $$\mu_{RBCs, \, neocytolysis \, max}$$ | 42 | days | Upper bound of erythrocytes which are possibly exposed to neocytolysis |
| $$\mu_{RBCs, \, max}$$ | 141 | days | Maximal life span for red blood cells |
| $$b_E$$ | 0.1 | 1/day | Constant in the mortality rate for red blood cells |
| $$c_E$$ | 3.5 | $$mU^3/(ml^3 \times day)$$ | Constant in the mortality rate for red blood cells |
| $$k_E$$ | 3 | dimensionless | Exponent in the mortality rate for red blood cells |
| $$\tau_E$$ | 9.8 | mU/ml | EPO threshold for neocytolysis |
| $$k_{deg\,protein\,extracellular}$$ | 1.04 | 1/day | Degradation rate of EPO in the blood |
| $$S_0$$ | $$10^8$$ | 1/day | Rate at which cells are committing to the erythroid lineage |
| $$TBV$$ | 5000 | ml | Total blood volume |

# 3. Results

### 3.1 sgRNA Array Model

The main purpose of our algorithm is to generate the array of sgRNAs needed to screen for gene doping with our novel targeted sequencing platform. The model works with any input gene. It was tested on the EPO gene, as EPO is our main target for gene doping. We worked with the following information:

• Gene cds sequence: Human EPO cds (GenBank: BC143225.1)
• Type of Cas: dxCas9
• PAM sequence: NG
• sgRNA length: 20 bp of target sgRNA
• Off target possibility (seed to target): 10 bp (50 % adjacent to PAM should be identical)

As seen in figure 20, the algorithm detected several PAM sequences close to exon-exon junctions and found the minimum number of necessary guides for each one.

Figure 20 shows that junction 3 contains the optimal PAM sequence, that is, the one requiring the smallest number of sgRNAs. The algorithm generates these sgRNAs and outputs the position of each one. In this specific case, two PAM sequences near junction 3 required the same minimal number of sgRNAs (12 guides each). The algorithm does not discard either one, but outputs both options (or more if available). This array can be used to generate the library of sgRNAs for targeted next generation sequencing of gene doping.
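The counting step behind this result can be illustrated with a short Python sketch. It is not our implementation of the algorithm: it only enumerates, for a window of an in-frame coding sequence, every nucleotide sequence that encodes the same amino acids under the standard genetic code, which is the quantity that determines how many guides a candidate site needs. The function name `window_variants` and the toy sequences are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TO_AA = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
SYNONYMS = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def window_variants(cds, start, end):
    """Distinct sequences of cds[start:end] over all synonymous recodings.

    cds is an uppercase DNA coding sequence starting in frame; start and
    end are 0-based positions (end exclusive).
    """
    if len(cds) % 3 != 0 or not 0 <= start < end <= len(cds):
        raise ValueError("cds must be in frame and the window inside it")
    first, last = start // 3, (end - 1) // 3
    codons = [cds[3 * i:3 * i + 3] for i in range(first, last + 1)]
    options = [SYNONYMS[CODON_TO_AA[codon]] for codon in codons]
    offset = start - 3 * first
    length = end - start
    # Recode every overlapping codon, then cut the window out of each result.
    return {"".join(combo)[offset:offset + length] for combo in product(*options)}


# Toy example: a window covering part of Glu-Phe (GAA TTT).
variants = window_variants("GAATTT", 2, 5)
guides_needed = len(variants)
```

Applied to a 20 bp protospacer next to each NG PAM, the size of the returned set would correspond to the number of sgRNAs required at that site, and the site with the smallest set is the preferred one. Restricting the window to the 10 bp seed adjacent to the PAM is one way the off-target setting above could be reflected, though that choice is an assumption here.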

### 3.2 Gene Doping Model

The concentration of EPO gene doping DNA in blood increases rapidly in the first 1.5 days after injection before decreasing exponentially. Due to the rapid clearance of the adenoviral vectors from the blood and muscle, the majority of the infection events occur right after injection. As more infected cells die, the number of infected cells decreases. This decreases the amount of doping DNA released into the blood over time, as seen in figure 21.

Using regression, the half-life of EPO gene doping DNA in blood was determined to be around 41 days. While the half-life of cfDNA in blood is around 10 minutes, the slow release of gene doping DNA from dying infected cells significantly extends the detection window. EPO gene doping cDNA administered to cynomolgus macaques using adeno-associated viral vectors was detectable in infected white blood cells for up to 57 weeks after injection (Ni et al. 2011). The concentrations ranged from 333 to 500 copies per mL. While the study targeted muscle cells, the DNA was found in white blood cells due to off-target infections. Lymphocytes, a type of white blood cell, can live for up to several months (Tough and Sprent, 1995). Our detection window could be extended further in the future if we moved away from detecting doping DNA in plasma and isolated white blood cells instead.
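A half-life of this kind can be estimated with a log-linear least-squares fit to the decaying part of the concentration curve. The sketch below shows the idea in plain Python. The data are synthetic and noise-free, generated with a 41-day half-life and an arbitrary starting concentration purely to check the fit; they are not model output or measurements, and the function name is an assumption.

```python
import math


def fit_half_life(times, concentrations):
    """Fit ln(C) = intercept + slope * t by least squares; return the half-life."""
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    slope = sxy / sxx                 # decay constant with a negative sign
    return -math.log(2.0) / slope


# Synthetic decay curve (assumed values, for illustration only).
days = list(range(10, 260, 10))
conc = [1.0e6 * 2.0 ** (-t / 41.0) for t in days]
half_life = fit_half_life(days, conc)
```

Only time points after the initial rise should be passed to such a fit, since the first 1.5 days after injection do not follow an exponential decay.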

Microdosing was determined to be the method best suited to doping athletes, because it avoids detection through the biological passport. Assuming that the athlete begins the treatment prior to becoming a professional athlete and continues microdosing throughout their career, their red blood cell count would appear to be constant and naturally high.

The red blood cell production model is combined with the compartment and infection model to determine the effects of EPO gene doping on red blood cell count. Intravenous (IV) and intramuscular (IM) injection are compared, and two methods for gene doping are tested. The first consists of a single large injection, while the second consists of a large initial dose followed by smaller doses every 20 days.

#### Large Single Dose Gene Doping

IV and IM injection exhibit similar behavior with regard to EPO and red blood cell production. However, IM injection requires more vectors than IV for the same response, because vectors are destroyed in the muscle and must pass into the bloodstream. For large dose administration, the doping DNA remains above the detection limit until 261 days after administration. The effect of EPO gene doping on red blood cell count persists until 136 days after injection: the count passes 30 trillion cells 19 days after injection, reaches its maximum of 35.9 trillion cells 57 days after injection, and drops below 30 trillion 136 days after injection. Assuming the athlete is trying to maximize their performance for a competition, the gene doping test could be conducted during the competition and the doping athlete would be caught.

Gene doping athletes are also at risk of being caught through the Athlete Biological Passport system implemented by WADA. The biological passport is an individual electronic record of a professional athlete which profiles biological markers of doping. Every time an athlete is tested for doping, their blood cell count is recorded. Any large fluctuation in red blood cell count would flag the athlete as a potential blood doper. Another problem with single dose administration is that the red blood cell count falls below normal levels after the doping EPO wears off, which decreases the athlete's performance following gene doping. For these reasons, athletes are likely to avoid single dose methods in favor of multidosing.

#### Microdosing Gene Doping

In multidose gene doping, an initial large dose of gene doping EPO DNA is injected to rapidly increase the red blood cell count. Following this, smaller doses are given to maintain the desired red blood cell count. This can be seen in Figure 22. For IV administration, an initial dose of 64 billion vectors is used followed by smaller doses of 18 billion vectors every 20 days. For IM administration, an initial dose of 96 billion vectors is used followed by smaller doses of 27 billion vectors every 20 days.
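The two dosing schedules can be written out explicitly with a small Python helper. This is only bookkeeping on the doses stated above; the helper name and the 100-day horizon are assumptions chosen for illustration, and the sketch does not model how the vectors are cleared.

```python
def dosing_schedule(initial_dose, maintenance_dose, interval_days, horizon_days):
    """List of (day, vectors) pairs: one initial dose, then regular microdoses."""
    schedule = [(0, initial_dose)]
    day = interval_days
    while day <= horizon_days:
        schedule.append((day, maintenance_dose))
        day += interval_days
    return schedule


# Doses from the text, in vectors; the 100-day horizon is an arbitrary choice.
iv_schedule = dosing_schedule(64_000_000_000, 18_000_000_000, 20, 100)
im_schedule = dosing_schedule(96_000_000_000, 27_000_000_000, 20, 100)

iv_total = sum(dose for _, dose in iv_schedule)
im_total = sum(dose for _, dose in im_schedule)
```

Both the initial and the maintenance IM doses are 1.5 times the corresponding IV doses, so the cumulative number of vectors administered by IM injection is 1.5 times the IV total over any horizon.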

The downside of this method is the requirement for constant microdosing. Repeated injections would cause noticeable damage to the veins. The athlete would then be required to disguise the injection sites or perform the IV microdosing in parts of the body normally covered. For this reason athletes would likely favor IM injection to IV injection as IM is less invasive.

Though the microdoses are smaller, the number of DNA copies stays above our detection limit (40,000 to 50,000 fragments per mL) because the IM injections occur every 20 days. While the athlete would bypass detection through the biological passport, they would be detected by our gene doping detection method.

To determine the quantity of EPO gene plasmids to be injected, the effect of the steady state concentration of EPO on the steady state red blood cell count is determined. This is achieved by removing the feedback loop that causes the red blood cell count to have an effect on EPO production. EPO concentration can then be set at a desired level and the PDEs used to model the system can be reduced to solvable interconnected ODEs. The resulting equations are shown below, where $$EPO_{SS}$$ is the set steady state EPO concentration in the blood.

#### BFU-E Cells

$$BFU{-}E(\mu_{BFU{-}E}) = S_0\times e^{\beta_{BFU-E} \times \mu_{BFU{-}E}} \tag{41}$$

#### CFU-E Cells

$$CFU{-}E(\mu_{CFU{-}E}) = BFU{-}E(\mu_{BFU{-}E, \, max}) \times e^{(\beta_{CFU-E}-\alpha_{CFU{-}E}(EPO_{SS})) \times (\mu_{CFU{-}E} - \mu_{BFU{-}E, \, max})} \tag{42}$$

#### Erythroblasts

$$Erythroblasts(\mu_{Erythroblasts}) = CFU{-}E(\mu_{CFU{-}E, \, max})\times e^{\beta_{Erythroblasts} \times (\mu_{Erythroblasts}-\mu_{CFU{-}E, \, max})} \tag{43}$$

#### Marrow Reticulocytes

$$Reticulocytes(\mu_{Reticulocytes}) = Erythroblasts(\mu_{Erythroblasts, \, max})\times e^{-\alpha_{Reticulocytes} \times (\mu_{Reticulocytes}-\mu_{Erythroblasts, \, max})} \tag{44}$$

#### Red Blood Cells and Blood Reticulocytes

$$RBCs(\mu_{RBCs}) = Reticulocytes(\mu_{Reticulocytes, \, max})\times e^{-\alpha_{RBCs}(EPO_{SS}, \, \mu_{RBCs}) \times (\mu_{RBCs}-\mu_{Reticulocytes, \, max})} \tag{45}$$
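Chaining Equations 41 to 45 gives the total steady state red blood cell count in closed form. The Python sketch below does this under several assumptions that should be read as such: the CFU-E apoptosis rate and the maximal reticulocyte maturity at $$EPO_{SS}$$ are passed in by the caller, because their sigmoid definitions are not repeated in this section; the CFU-E stage grows at its own proliferation rate $$\beta_{CFU-E}$$ from Table 4; $$S_0$$ is taken as $$10^8$$ cells per day; and the red blood cell mortality rate is constant, which holds when $$EPO_{SS}$$ is at or above the neocytolysis threshold. The example inputs are hypothetical.

```python
import math


def steady_state_rbc_count(alpha_cfu_e, mu_ret_max, s0=1e8,
                           beta_bfu_e=0.2, beta_cfu_e=0.57, beta_ery=1.024,
                           alpha_ret=0.089, alpha_rbc=0.005,
                           mu_bfu_max=7.0, mu_cfu_max=13.0,
                           mu_ery_max=18.0, mu_rbc_max=141.0):
    """Total circulating RBCs at steady state, from Eq. 41-45."""
    # Population density leaving each stage at its maximal maturity.
    bfu_out = s0 * math.exp(beta_bfu_e * mu_bfu_max)                  # Eq. 41
    cfu_out = bfu_out * math.exp(
        (beta_cfu_e - alpha_cfu_e) * (mu_cfu_max - mu_bfu_max))       # Eq. 42
    ery_out = cfu_out * math.exp(beta_ery * (mu_ery_max - mu_cfu_max))    # Eq. 43
    ret_out = ery_out * math.exp(-alpha_ret * (mu_ret_max - mu_ery_max))  # Eq. 44
    # Integrate Eq. 45 over maturity with a constant mortality rate.
    span = mu_rbc_max - mu_ret_max
    return ret_out * (1.0 - math.exp(-alpha_rbc * span)) / alpha_rbc


# Hypothetical inputs for the EPO-dependent quantities.
total_rbcs = steady_state_rbc_count(alpha_cfu_e=0.3, mu_ret_max=20.0)
```

Because the CFU-E apoptosis rate enters through an exponent, small changes in the steady state EPO concentration translate into large changes in the red blood cell count, which is what makes this reduced model useful for choosing a dose.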

#### Results of Steady State Analysis

When the EPO feedback loop is incorporated, the time dependent model converged to an EPO concentration of 9.8 mU/ml, just above the neocytolysis trigger. In the steady state model, the red blood cell count is 24.72 trillion total circulating cells at an EPO concentration of 9.8 mU/ml. The error between the steady state model and the estimated count by Lichtman et al. 2005 of 24.98 trillion is 1.05%. Due to the large range of red blood cell counts and the low error between the model steady state and the expected average, the model parameters were not optimized to exactly match the expected red blood cell count.

The density of the population mesh determines how many mesh points each day is split into. A mesh density of 10 means there are 10 points per day ($$\Delta t = \Delta \mu =$$ 0.1 days), while a mesh density of 100 means there are 100 points per day ($$\Delta t = \Delta \mu =$$ 0.01 days). Figure 26 shows that increasing the density of the population mesh results in a reduction in the difference (error) between the steady state solution and the dynamic steady state solution. It also shows that increasing the mesh density increases the run time of the model exponentially.

The error between the dynamic model with a mesh density of 100 and the steady state model is 0.55%. This is an acceptable degree of error between the two models. Increasing the mesh density from 100 to 200 reduces the error in the dynamic model by only 0.015%, while the run time of the model increases from about 2.5 minutes to around 15 minutes. For these reasons, the mesh density was kept at 100 throughout the EPO gene doping tests.
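The trade-off between mesh density and error can be reproduced on a single compartment. The Python sketch below is a simplified illustration and not the solver used for our model: it advances only the BFU-E population along its maturity axis with $$\Delta t = \Delta \mu$$, using a constant inflow $$S_0 = 10^8$$ (assumed) and the proliferation rate from Table 4, and compares the summed population with the integral of Equation 41. All function names are assumptions.

```python
import math


def bfu_e_total_numeric(mesh_density, s0=1e8, beta=0.2, mu_max=7.0, days=8.0):
    """Time-step the BFU-E maturity mesh from empty until it is fully populated."""
    step = 1.0 / mesh_density                    # delta t = delta mu, in days
    n_cells = int(round(mu_max * mesh_density))
    growth = math.exp(beta * step)
    population = [0.0] * n_cells
    for _ in range(int(round(days * mesh_density))):
        # Cells age by one mesh point and proliferate; new cells enter at mu = 0.
        population = [s0] + [p * growth for p in population[:-1]]
    return sum(population) * step


def bfu_e_total_analytic(s0=1e8, beta=0.2, mu_max=7.0):
    """Integral of Eq. 41 over maturity from 0 to mu_max."""
    return s0 * (math.exp(beta * mu_max) - 1.0) / beta


def relative_error(mesh_density):
    exact = bfu_e_total_analytic()
    return abs(bfu_e_total_numeric(mesh_density) - exact) / exact


error_10 = relative_error(10)
error_100 = relative_error(100)
```

In this toy case the error shrinks roughly in proportion to the mesh spacing, while the work grows with both the number of mesh points and the number of time steps, which is the same qualitative trade-off that motivated keeping the mesh density at 100.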

The red blood cell feedback loop on EPO release from the kidneys was implemented to determine the dynamic effect of changing EPO concentration. The initial population of the mesh was varied to determine whether the starting conditions had an effect on the final steady state. The initial mesh was set to 0, to the steady state values, to a population of $$10^{10}$$ at each maturity point, and to a population of $$10^{11}$$ at each maturity point. Negative populations were not considered.

For every initial population mesh, with the mesh density set to 100, the dynamic model reached a steady state of 24.59 trillion red blood cells. Due to the combination of feedback loops in the model, the red blood cell count reaches the same steady state regardless of the initial population in the mesh. To change the steady state of the dynamic model, the parameters in the feedback functions would need to be modified until the desired steady state red blood cell count is reached. This was not done in our model.