During the development of CAT-Seq, we met with various experts in biotechnology, hoping to uncover design flaws in our system and find ways to improve it. One of these experts was Audrius Laurinėnas. At that time, our main problem was that each droplet did not contain enough amplified DNA for robust sequencing. Audrius suggested using the multiple displacement amplification (MDA) reaction. However, he also pointed out that modified nucleotides might inhibit MDA, as phi29 DNA polymerase does not incorporate many modified nucleotides. It is therefore reasonable to assume that some of the Substrate Nucleotides might inhibit DNA replication. We hypothesised that this inhibitory effect might skew the size distribution and yield of the amplified DNA fragments. This is an important aspect to consider, as long fragments are crucial for high-quality Nanopore sequencing. Yet if the concentration of Substrate Nucleotides is too low, the dynamic range of our measurement might decrease dramatically.
Therefore, we wrote a mathematical description of phi29 DNA amplification that accounts for possible inhibition of phi29 by Substrate Nucleotides, in order to investigate the maximum amount of Substrate Nucleotides we can use without risking short sequencing reads.
The mathematical phi29 model imitates the amplification of the CAT-Seq Esterase DNA template with random hexamers. First, random hexamer primer sequences are generated and aligned to the DNA template, which maps the starting points of DNA amplification. Newly synthesized DNA strands can then be immediately primed again with random hexamers; this re-priming happens every few iterations of synthesis. Each site housing a guanine nucleotide acts as a potential spot for modified nucleotide incorporation.
The incorporation of nucleotides is based on the probabilities set by the initial ratios of substrate and reference nucleotides. The model assumes that incorporation of a modified nucleotide into a DNA strand during replication terminates synthesis and causes phi29 polymerase to dissociate from the DNA template. Every incorporation of a Substrate Nucleotide is followed by a recalculation of the probabilities.
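The termination rule described above can be illustrated with a minimal Python sketch. It is not the team's model: the function name `simulate_fragments`, the pool sizes and the toy template are hypothetical, hexamer alignment is replaced by a uniformly random start position, and re-priming of new strands is omitted. The sketch only shows how a pool-based incorporation probability, recalculated as nucleotides are consumed, shortens fragments.

```python
import random

def simulate_fragments(template, substrate, reference, n_primings=500, seed=0):
    """Return fragment lengths for repeated random priming on `template`.

    `substrate` and `reference` are hypothetical pool sizes (molecule counts)
    of modified and unmodified dCTP. Assumption: priming is a uniform random
    start position rather than a real hexamer alignment.
    """
    rng = random.Random(seed)
    lengths = []
    for _ in range(n_primings):
        start = rng.randrange(len(template))
        length = 0
        for base in template[start:]:
            if base == "G":
                total = substrate + reference
                if total == 0:
                    break  # dCTP pool exhausted: synthesis stalls
                # Probability is recalculated from the current pools
                if rng.random() < substrate / total:
                    substrate -= 1
                    length += 1  # modified nucleotide goes in, then phi29 dissociates
                    break
                reference -= 1
            length += 1
        lengths.append(length)
    return lengths

# Toy template (hypothetical, not the Esterase sequence)
template = "ATGC" * 50
control = simulate_fragments(template, substrate=0, reference=10**6)
inhibited = simulate_fragments(template, substrate=10**6, reference=10**6)
print(sum(control) / len(control), sum(inhibited) / len(inhibited))
```

With an equal amount of substrate and reference dCTP in the pool, most fragments in this sketch terminate within the first few guanine sites, whereas the control fragments run to the end of the template.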
Using different ratios of substrate and reference nucleotides as input, we generated DNA fragment size distributions. Because the kinetic parameters and features of phi29 polymerase and the multiple displacement amplification reaction are complex and not fully understood, the generated data were normalized to the fragment size distribution obtained with zero substrate nucleotides.
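The exact normalization used by the model is not spelled out here, so the following is only one plausible reading, given as a sketch: every fragment length is divided by the median fragment length of the zero-substrate control. The function name `normalise_to_control` and the choice of the median as the reference statistic are assumptions.

```python
from statistics import median

def normalise_to_control(lengths, control_lengths):
    """Scale fragment lengths by the median of the zero-substrate control.

    Assumption: the median is the reference statistic; the original model
    may normalize differently.
    """
    reference = median(control_lengths)
    if reference <= 0:
        raise ValueError("control distribution must have a positive median")
    return [length / reference for length in lengths]

# Hypothetical fragment lengths in nucleotides
print(normalise_to_control([50, 100, 200], [80, 100, 120]))
```

A normalized value of 1.0 then corresponds to a typical fragment from a reaction without substrate nucleotides, and values below 1.0 indicate shortened fragments.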
Figure 1. Normalized mathematical model output DNA fragment distribution.
The distribution of DNA fragment size for different initial substrate and reference nucleotide ratios was generated using the mathematical phi29 model. The data is normalized to the distribution output generated with zero substrate nucleotides.
As the violin plots in Figure 1 show, increasing the concentration of potentially inhibiting substrate nucleotides (their ratio to reference nucleotides) shifts the fragment size distribution towards shorter fragments, as expected. Such inhibition of amplification and the resulting shorter fragments could complicate Nanopore sequencing of the amplified DNA product, as well as the mapping and analysis of short sequencing reads.
Now that we were aware of the possible degree of Substrate Nucleotide inhibition, we set out to test and titrate the nucleotide concentrations and ratios used in the CAT-Seq workflow to obtain the best possible DNA fragment distribution.
Figure 2. Amplified DNA correlation to substrate nucleotide concentration in the reaction.
DNA was amplified by multiple displacement amplification using a 750 µM dNTP concentration and varied substrate nucleotide concentrations. As the concentration of substrate nucleotides decreases, the amount of amplified DNA increases, as seen from the increased intensity of the DNA band and smear.
First, the lower limit of substrate nucleotides (Sub-dCTP) was determined. MDA reactions were performed with different dNTP and substrate-dCTP concentrations, and the amplified DNA was analyzed on an agarose gel. The results shown in Fig. 2 hint that large concentrations of substrate-bound nucleotides might inhibit the reaction, as the larger amplified DNA product bands fade when more substrate nucleotides are added, just as the model predicted. The DNA smear seen in the gel is a characteristic feature of rolling circle amplification. The final concentration of substrate-bound dCTP molecules in the amplification reaction was chosen to be 25 µM.
Figure 3. Dependence of phi29-amplified DNA on nucleotide concentration.
DNA was amplified by multiple displacement amplification using dNTP concentrations from 750 µM to 100 µM, with or without 25 µM substrate nucleotides. Decreasing the concentration of dNTPs decreases the amount of DNA amplified only when concentrations lower than 200 µM are used. The red dashed line highlights the condition chosen for the MDA reaction.
Next, we sought to determine the lowest reference nucleotide concentration for the MDA reaction that is usable with the chosen 25 µM concentration of substrate nucleotides. While the substrate-dCTP concentration was kept constant, the concentration of dNTPs was gradually lowered. In addition, reactions without modified nucleotides were carried out to determine whether the nucleotide concentration or its ratio to substrate-dCTP is the product-limiting factor. The results indicate that final concentrations of 200 µM and 100 µM are too low to be used with 25 µM sub-dCTP (ratios 1:8 and 1:4), because the upper DNA product fades considerably. Taking this into consideration, 300 µM was chosen as the final working dNTP concentration.
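The ratios quoted above follow directly from the concentrations. A short sketch makes the arithmetic explicit; the helper name `substrate_to_dntp_ratio` is hypothetical, and concentrations are assumed to be whole numbers of µM.

```python
from fractions import Fraction

def substrate_to_dntp_ratio(substrate_um, dntp_um):
    """Return the substrate:dNTP ratio in lowest terms, e.g. '1:8'."""
    ratio = Fraction(substrate_um, dntp_um)
    return f"{ratio.numerator}:{ratio.denominator}"

# 25 µM sub-dCTP against each tested dNTP concentration (µM)
for dntp in (100, 200, 300, 750):
    print(dntp, substrate_to_dntp_ratio(25, dntp))
```

Under this calculation, the chosen working condition of 300 µM dNTP with 25 µM sub-dCTP corresponds to a ratio of 1:12.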
To conclude, the phi29 mathematical description made us aware of the potential shift in DNA fragment size caused by the inhibitory effects of substrate nucleotides. Based on the output of the model, we set out to investigate these inhibitory effects in the laboratory. The results generated by the model fit the experimental data well: higher substrate nucleotide concentrations indeed inhibit the production of long DNA molecules and lead to the synthesis of short ones. The model helped us improve our CAT-Seq prototype by revealing the potentially crucial negative effects of substrate nucleotides. This in turn allowed us to optimize the substrate nucleotide concentration in the laboratory in order to obtain the best possible sequencing quality and dynamic range for activity measurements.
The mathematical model for Modified Nucleotide Inhibitory Effect can be found here