To choose an enzyme for proving the concept of the CAT-Seq method, a library of hydrolytic enzymes was expressed using an in vitro transcription and translation kit. The newly synthesized proteins were mixed with the substrate nucleotide of our choice - N4-benzoyl-2'-deoxycytidine triphosphate. This substrate nucleotide absorbs strongly at 310 nm because the benzene ring is conjugated with the pyrimidine ring of cytosine. Removal (hydrolysis) of the substrate moiety (the benzoyl group) from the nucleotide abolishes this absorbance, which makes it convenient for tracking enzyme catalysis and kinetics. Fig. 1 shows the kinetics of 16 different hydrolase family enzymes as a decay of substrate absorbance at 310 nm. As seen from these results, some of the enzymes show no activity towards this type of substrate, and some catalyse only minor conversion. Based on the speed of catalysis, RM537 (the CAT-Seq esterase) was chosen.
Figure 1. The hydrolase library screening results. A larger decrease in absorbance signifies the removal of the substrate modification from cytidine. The chosen esterase for CAT-Seq is colored green (CAT-Seq Esterase).
Kinetic characterization of CAT-Seq Esterase
Once the biomolecule of interest for catalytic conversion of the substrate nucleotide had been chosen, we termed it the CAT-Seq esterase. This enzyme is used to prove the workflow concepts of CAT-Seq.
Figure 2. Michaelis-Menten curve generated for the CAT-Seq Esterase enzyme. The initial velocity of the enzyme was determined spectrophotometrically as a decay of absorbance at 310 nm at different starting substrate nucleotide concentrations. The graph shows the average values of three independent experiments.
Before starting experiments with the chosen enzyme, a kinetics assay was carried out. Spectrophotometric kinetic data, based on the decay of absorbance at 310 nm due to catalytic conversion of the substrate nucleotide, were gathered over a range of starting substrate nucleotide concentrations. The Michaelis-Menten curve for the CAT-Seq esterase was plotted from the acquired data. The data shown in Fig. 2 fit a standard Michaelis-Menten curve well (R² = 0.9449). Next, transformations of the Michaelis-Menten plot were performed to verify the Km and Vmax of the CAT-Seq enzyme. Fig. 3 displays the Lineweaver-Burk and Hanes-Woolf transformation plots. Both plots show a reasonable correlation (R² > 0.7) with a linear function. Based on the coefficients of the Hanes-Woolf linear equation, the experimentally determined Vmax is 17.2 µM/min and Km is 86 µM.
Figure 3. Two transformations of the Michaelis-Menten curve for the CAT-Seq Esterase enzyme. Lineweaver-Burk and Hanes-Woolf transformations of the Michaelis-Menten curve generated earlier. Both transformations were performed on data acquired from 3 independent experiments. The graphs show the fitted linear function and its correlation coefficient.
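The two linearizations can be illustrated with a minimal sketch. It assumes noise-free synthetic velocities generated from the reported constants (Km = 86 µM, Vmax = 17.2 µM/min); the substrate concentrations and the helper names `linear_fit`, `hanes_woolf` and `lineweaver_burk` are hypothetical and are not taken from the original analysis.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def hanes_woolf(substrate, velocity):
    """Fit [S]/v against [S]: slope = 1/Vmax, intercept = Km/Vmax."""
    ratios = [s / v for s, v in zip(substrate, velocity)]
    slope, intercept = linear_fit(substrate, ratios)
    vmax = 1.0 / slope
    return intercept * vmax, vmax  # (Km, Vmax)


def lineweaver_burk(substrate, velocity):
    """Fit 1/v against 1/[S]: slope = Km/Vmax, intercept = 1/Vmax."""
    inv_s = [1.0 / s for s in substrate]
    inv_v = [1.0 / v for v in velocity]
    slope, intercept = linear_fit(inv_s, inv_v)
    vmax = 1.0 / intercept
    return slope * vmax, vmax  # (Km, Vmax)


# Hypothetical substrate concentrations (µM); velocities follow the
# Michaelis-Menten equation exactly, so both fits should recover the inputs.
KM, VMAX = 86.0, 17.2
substrate = [10.0, 25.0, 50.0, 100.0, 200.0, 400.0]
velocity = [VMAX * s / (KM + s) for s in substrate]

km_hw, vmax_hw = hanes_woolf(substrate, velocity)
km_lb, vmax_lb = lineweaver_burk(substrate, velocity)
print(f"Hanes-Woolf:     Km = {km_hw:.1f} µM, Vmax = {vmax_hw:.1f} µM/min")
print(f"Lineweaver-Burk: Km = {km_lb:.1f} µM, Vmax = {vmax_lb:.1f} µM/min")
```

With real, noisy data the two transformations weight the measurement errors differently, which is one reason their estimates and R² values need not agree.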
2. Library preparation
CAT-Seq library preparation employs PCR amplification to add adapter sequences to the 5’ and 3’ ends of DNA templates. The 5’ adapter houses the T7 promoter and an RBS (or any other regulatory sequence). Both adapters can be barcoded and include the restriction site used to circularize the DNA.
Figure 4. Relative activity of in silico generated CAT-Seq esterase mutants. CAT-Seq esterase mutant sequences were generated in silico and synthesised using an in vitro transcription and translation kit. The graph shows the relative catalytic activity of each generated mutant, measured spectrophotometrically, and the corresponding mutation site. The activity was normalized to the wild type (WT) enzyme.
To test the viability of our system and to create additional parts for screening regulatory sequences of various strengths, 10 esterase mutants carrying mutations at bioinformatically predicted sites were created. Each mutant was constructed by PCR and synthesized using an in vitro transcription and translation kit, and its catalytic activity towards N4-benzoyl-2'-deoxycytidine triphosphate was tested. The reaction kinetics were measured spectrophotometrically as a decrease in absorbance. Figure 4 displays the relative hydrolysis speed of each mutant. As seen from these results, mutants with a variety of catalysis speeds were produced. Some amino acid changes affected the activity drastically, for example Trp224 to Tyr, Lys227 to Arg or Glu509 to Lys. Other in silico designed mutations, such as Asn107 to Asp or Glu194 to Ala, only modulated the activity. Additionally, a large 8 amino acid deletion at position Pro348-Hy356 caused only a moderate decrease in enzyme activity. These results show that a control library of esterase mutants with various activities was generated, which we can now use to test the viability of the overall CAT-Seq system and which can be employed for screening regulatory sequences of various strengths.
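As an illustration of how a relative activity of this kind can be derived from absorbance traces, the sketch below estimates each initial rate as the least-squares slope of absorbance against time and divides it by the wild-type rate. The traces, the variant label `mutant_A` and the function names are hypothetical; the original data processing may differ.

```python
def initial_rate(times_min, absorbance):
    """Least-squares slope of absorbance vs time, sign-flipped so that
    a decaying signal gives a positive rate (absorbance units per minute)."""
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_a = sum(absorbance) / n
    stt = sum((t - mean_t) ** 2 for t in times_min)
    sta = sum((t - mean_t) * (a - mean_a) for t, a in zip(times_min, absorbance))
    return -sta / stt


def relative_activity(rates, reference="WT"):
    """Normalize every rate to the reference variant."""
    ref = rates[reference]
    return {name: rate / ref for name, rate in rates.items()}


# Hypothetical absorbance readings at 310 nm (illustration only)
times = [0, 1, 2, 3, 4]
traces = {
    "WT": [1.00, 0.90, 0.80, 0.70, 0.60],
    "mutant_A": [1.00, 0.95, 0.90, 0.85, 0.80],
}

rates = {name: initial_rate(times, trace) for name, trace in traces.items()}
relative = relative_activity(rates)
for name, value in relative.items():
    print(f"{name}: {value:.2f} of WT activity")
```

Only the linear, early part of each trace should be used for the slope, since the rate falls as the substrate is depleted.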
Regulatory sequence library
To prove the viability of the CAT-Seq approach as a method for high-throughput regulatory sequence screening and orthogonality measurements, a mock library consisting of 4 ribosome binding sites (BBa_B0030, BBa_B0032, BBa_B0034, BBa_K2621038) and 9 toehold riboswitch variants based on Green et al., 2014 was designed.
Ribosome binding sites
To investigate the potential of CAT-Seq regulatory sequence screening, 4 ribosome binding sites were placed upstream of the esterase gene. The strength of each ribosome binding site was investigated by expressing the esterase enzyme using the in vitro transcription and translation kit and spectrophotometrically measuring its activity as a decrease of absorbance due to the hydrolysis of substrate nucleotides.
Figure 5. Relative activity of the esterase enzyme expressed using different RBS sites.
CAT-Seq esterase mutant sequences were placed under the control of 4 different RBSs. The enzyme was synthesised using a cell-free expression system and the catalytic activity of each generated variant was measured spectrophotometrically as a decrease of substrate absorbance at 310 nm. The data shown are normalized to the activity obtained with the BBa_B0034 RBS.
Figure 5 displays the relative hydrolysis speed (normalized to BBa_B0034) of the CAT-Seq esterase expressed using 4 different ribosome binding sites. Because the same enzyme and the same substrate nucleotide concentration are used, the difference in activity corresponds to a difference in the concentration of synthesised biomolecule. The expression of an enzyme depends on the strength of the ribosome binding site used. The measured relative activity therefore corresponds to the strength of the ribosome binding sites.
Toehold regulatory sequence library
To explore the capabilities of the CAT-Seq workflow as a method for screening novel transcription or translation regulatory sequences, a control riboregulator library was constructed based on Green et al., 2014. 9 library members, composed of 3 toehold sequences and 3 activating RNA sequences, termed Trigger RNA, were designed to test the orthogonality and regulatory characteristics of each part.
Each regulatory part, consisting of one toehold sequence upstream of the esterase gene and one trigger sequence, was constructed. First, the orthogonality of each toehold:activating RNA pair and their regulatory characteristics were tested in bulk. The constructed library members were synthesized using an in vitro transcription and translation kit and their catalytic activity towards N4-benzoyl-2'-deoxycytidine triphosphate (Substrate Nucleotide) was tested. The reaction kinetics were measured spectrophotometrically as a decrease of absorbance due to the hydrolyzed substrate nucleotide.
Figure 6 displays the relative hydrolysis speed of each regulatory part variant in the form of a matrix. The decrease of absorbance plotted on the Y axis corresponds to the catalytic conversion of substrate nucleotides. In the control experiment, in which the standard esterase was expressed, absorbance declines over time. The same decrease of absorbance is seen only on the diagonal of the matrix. This means that active catalytic molecules are expressed only when both regulatory molecules of the same group are present - Toehold 1 with Trigger 1, Toehold 2 with Trigger 2 and Toehold 3 with Trigger 3. None of the regulatory sequences shows any cross-talk with the other groups.
Figure 6. The catalytic activity of each Toehold:Trigger RNA construct, shown as a matrix. CAT-Seq esterase mutant sequences were placed downstream of the riboregulatory sequences together with the corresponding trigger part and their catalytic activity was measured. The decrease of absorbance corresponds to catalytically active enzyme. The matrix shows that only Toehold sequences expressed with their corresponding Trigger RNA produce an active enzyme molecule.
We can conclude that the three toehold switch pairs are working as intended and show little to no cross-interaction. These switch pairs can further be used in CAT-Seq in order to measure the orthogonality and strength of these pairs in a high throughput manner, and therefore assess the accuracy and precision of Catalytic Activity Sequencing.
3. Library encapsulation into droplets
Picoliter-volume droplets housing in vitro transcription and translation reagents, DNA template and substrate nucleotides are produced by cutting a constant flow of the water phase with a perpendicular flow of fluorinated carrier oil.
Movie 1. 10 pL biomolecule synthesis droplet generation.
The movie shows considerably slowed-down generation of a 10 pL water-in-oil emulsion. The horizontally arriving water phase, housing in vitro transcription and translation reagents, DNA template and substrate nucleotides, is cut by a perpendicular flow of fluorinated carrier oil at a rate of 20 × 10⁶ droplets per hour.
The formed emulsion is collected into polytetrafluoroethylene (PTFE) tubing and incubated off chip in a 37 °C incubator for 4 hours to synthesize the biomolecule which catalyses the conversion of substrate nucleotides.
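The stated generation rate and droplet volume together imply an aqueous-phase flow rate, which can be checked with a short calculation. This is a derived figure for illustration only; the flow rates actually set on the pumps are not given in the text, and the function name is hypothetical.

```python
def aqueous_flow_ul_per_h(droplets_per_hour, droplet_volume_pl):
    """Aqueous (dispersed) phase flow implied by droplet rate and volume."""
    picoliters_per_hour = droplets_per_hour * droplet_volume_pl
    return picoliters_per_hour / 1e6  # 1 µL = 1e6 pL


droplets_per_hour = 20e6  # 20 × 10^6 droplets per hour, as stated in the caption
droplet_volume_pl = 10    # 10 pL biomolecule synthesis droplets

flow_ul_per_h = aqueous_flow_ul_per_h(droplets_per_hour, droplet_volume_pl)
droplets_per_second = droplets_per_hour / 3600

print(f"Implied aqueous flow: {flow_ul_per_h:.0f} µL/h")
print(f"Droplet frequency: {droplets_per_second:.0f} droplets/s")
```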
4. Catalytic biomolecule production
The effect of DNA circularization
In vitro protein expression kits encapsulated in droplets together with only a single DNA template per droplet show extremely poor protein synthesis efficiency, as the template DNA concentration is very low. The DNA template was therefore circularized and stripped of any transcriptional terminators in order to initiate rolling circle transcription (RCT), as described in the literature (Diegelman and Kool, 1998). Rolling circle transcription is described in more detail in the design section of the CAT-Seq project. The effect of DNA circularization on the transcription rate, and thus the translation rate, was measured by employing the in vitro transcription and translation kit with two different DNA templates. Both templates encode an esterase gene flanked by a T7 promoter and an RBS, but one of them was circularized and had no terminator sequence. The enzymes synthesized from the two templates (one linear and one circularized) were mixed with N4-benzoyl-2'-deoxycytidine triphosphates - the Substrate Nucleotides. The hydrolysis rate of the substrate nucleotide was tracked by measuring the decay of absorbance with a spectrophotometer. As seen from the data shown in Fig. 7, DNA circularization results in ~3 times faster substrate nucleotide hydrolysis. As the screening conditions were the same (identical reaction setup and substrate concentrations), the difference in catalysis speed arises only from the difference in enzyme expression. This means that there is an approximately 3-fold increase in the amount of enzyme when RCT is employed. These results indicate that the highly processive T7 RNA polymerase produces more RNA, and thus increases the protein yield, if the DNA template used in transcription is circular and contains no terminating sequences.
Figure 7. The effect of rolling circle transcription. The effect of DNA circularization on the transcription rate, and thus the translation rate, was measured by employing the in vitro transcription and translation kit with two different DNA templates: circularized with no terminator sequences, and linear. The hydrolysis rate of the substrate nucleotide was tracked by measuring the decay of absorbance at 310 nm.
5. Information recording (merging, amplification, reference nucleotides)
Before the catalytic activity of biomolecule could be recorded into DNA sequence as a ratio of incorporated product and reference nucleotides, suitable conditions for multiple displacement amplification (MDA) reaction and droplet merging had to be determined.
However, we first chose to create a mathematical phi29 DNA amplification model that imitates the amplification of the CAT-Seq Esterase DNA template with random hexamers (https://2018.igem.org/Team:Vilnius-Lithuania-OG/Model). The data generated by the mathematical phi29 model informed us about potential inhibitory effects of substrate nucleotides and the effect they might have on the size distribution of amplified DNA fragments. Based on the output of the model, we set out to investigate the inhibitory effects caused by substrate nucleotides.
Figure 8. Dependence of amplified DNA on the substrate nucleotide concentration in the reaction. DNA was amplified by multiple displacement amplification using 750 µM dNTP with varied substrate nucleotide concentrations. As the concentration of substrate nucleotides decreases, the amount of amplified DNA increases, as seen by the increased intensity of the DNA band and smear.
The first checkpoint in determining the experimental conditions for the CAT-Seq workflow was to investigate the working concentrations of dNTP and substrate nucleotides and their potential inhibitory effect. MDA reactions were performed with different dNTP and substrate-dCTP concentrations and the amplified DNA was analyzed on an agarose gel. The results shown in Fig. 8 hint that large concentrations of substrate-bound nucleotides might inhibit the reaction, as the amplified DNA band fades as more substrate nucleotides are added. The DNA smear seen in the gel is a characteristic part of rolling circle amplification. The final concentration of substrate-bound dCTP molecules in the amplification reaction was chosen to be 25 µM.
Figure 9. Dependence of phi29-amplified DNA on nucleotide concentration. DNA was amplified by multiple displacement amplification using 750 - 100 µM dNTP with or without 25 µM substrate nucleotide. Decreasing the nucleotide concentration decreases the amount of amplified DNA only when concentrations lower than 200 µM are used. The red dashed line highlights the condition chosen for the MDA reaction.
Next, we sought to determine the lowest native nucleotide concentration for the MDA reaction usable with the corresponding 25 µM concentration of substrate nucleotides. While the substrate dCTP concentration was held constant, the concentration of dNTPs was gradually lowered. In addition, a reaction without modified nucleotides was carried out to determine whether the nucleotide concentration or its ratio to substrate-dCTP is the product-limiting factor. The results indicate that 200 µM and 100 µM final concentrations are too low to be used with 25 µM sub-dCTP, therefore 300 µM was chosen as the final working dNTP concentration (Fig. 9).
Figure 10. 50pL droplet MDA reaction emulsion brightfield and fluorescence image.
Circularized DNA templates were encapsulated into 50 pL droplets together with MDA reaction reagents. The emulsion was incubated off chip at 30 °C for 6 hours. The amplified DNA was stained with SYBR Green and imaged using a microscope. The pictures show brightfield and fluorescence at 488 nm of the same spot.
Conditioning the reaction in bulk allowed us to quickly screen the concentrations required for the workflow, as droplet microfluidics requires a separate, time-consuming experiment to try out each setting. The initial in-bulk conditioning was followed by experiments in picoliter-sized droplets. First, the chosen dNTP concentrations were tested in a water-in-oil emulsion by encapsulating single circular DNA templates into 50 pL droplets with the same MDA reaction mix. The video above displays a considerably slowed-down droplet generation procedure. The reaction mixture comes from the side, and stable droplets are made by cutting the water phase with inert carrier oil. The collected emulsion is incubated off chip at 30 °C for 6 hours. 10 µL of the emulsion is stained with SYBR Green and imaged at 488 nm to validate the amplification of DNA inside droplets in a digital PCR manner. Figure 10 displays the emulsion off the chip and the fluorescence of droplets stained with SYBR Green. The occupancy of droplets is kept at 1 template per 10 droplets (Poisson distribution, lambda = 0.1) in order to minimize the chance of partitioning two templates into a single droplet.
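The effect of loading at lambda = 0.1 can be quantified directly from the Poisson distribution. The sketch below uses only the stated lambda; the variable names are illustrative.

```python
import math


def poisson_pmf(k, lam):
    """Probability that a droplet contains exactly k templates."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 0.1  # mean number of templates per droplet, as stated above

p_empty = poisson_pmf(0, lam)
p_single = poisson_pmf(1, lam)
p_multi = 1.0 - p_empty - p_single          # two or more templates
multi_among_occupied = p_multi / (1.0 - p_empty)

print(f"Empty droplets:            {p_empty:.2%}")
print(f"Exactly one template:      {p_single:.2%}")
print(f"Two or more templates:     {p_multi:.2%}")
print(f"Multiples among occupied:  {multi_among_occupied:.2%}")
```

At this loading roughly 90% of droplets are empty and under 0.5% hold more than one template, so co-encapsulation is rare but not strictly zero.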
Figure 11. Comparison of the droplet MDA reaction product with the analogous in-bulk reaction product. Circularized DNA templates were encapsulated into 50 pL droplets together with MDA reaction reagents (Droplet). The same reagent concentrations were used for a 20 µL in-bulk MDA reaction (In bulk). Both reactions were incubated at 30 °C for 6 hours. The amplified DNA was analyzed on an agarose gel.
Additionally, the amplified DNA product was analysed on an agarose gel. As seen from Figure 11, the chosen reaction conditions work well for the droplet MDA reaction. The droplet reaction produces a much sharper and cleaner DNA band than the smeared DNA of the in-bulk reaction. This arises from the fact that each single DNA template is compartmentalized in a separate water-in-oil droplet (Rhee et al., 2016); therefore there is no template switching between different templates and each droplet has a fixed population of reagents for the amplification. Based on these results, it can be concluded that a single DNA template is amplified in picoliter droplets and produces cleaner DNA fragments than the analogous in-bulk reaction.
Figure 12. Agarose gel analysis of the product of droplet MDA reactions with standard cytidines or 5-methylcytidines. MDA reactions were performed with standard (dCTP) and 5-methylated (5mC) cytidines in 50 pL droplets. The emulsions were incubated at 30 °C for 6 hours. The amplified DNA was analyzed using agarose gel electrophoresis. L and HR correspond to GeneRuler 1kb and High Range DNA Ladders.
The next step was to determine whether the reference nucleotides - in this case 5-methylcytidine triphosphates (5mC) - are accessible to phi29 polymerase and do not inhibit the reaction. Once again, 50 pL droplets were formed containing the MDA reaction reagents as previously, only this time with 300 µM methylated dCTP instead of standard dCTP. The generated emulsion was incubated off chip at 30 °C for 6 hours. As seen from the agarose gel analysis of the amplified DNA (Fig. 12), the methylated nucleotides pose no problem for the reaction and produce slightly heavier DNA fragments (expected because of the additional -CH3 group). These results indicate that 5-methylcytidines can be used as reference nucleotides, which can be differentiated from other forms of cytidine nucleotides during nanopore sequencing.
Figure 13. Analysis of the activity amplification reaction product. The concept of catalytically converted nucleotide incorporation (activity recording) was tested by performing in-bulk (20 µL) or droplet (50 pL) MDA reactions. Both reactions were incubated at 30 °C for 6 hours. 1. No primer control; 2. MDA reaction with standard cytidine nucleotides; 3. MDA reaction with substrate nucleotides as the only source of cytidines; 4. MDA reaction with substrate nucleotides as the only source of cytidines with added CAT-Seq esterase enzyme. L and HR correspond to GeneRuler 1kb and High Range DNA Ladders.
Once the working concentrations of the MDA reaction had been established in droplets, we set out to test a simplified version of our system. For this, our catalytic biomolecule of interest, which catalyzes the hydrolysis of substrate-bound nucleotides, was purified using a 6x histidine tag. MDA reactions containing substrate nucleotides as the only source of cytidine triphosphate (no reference nucleotides) were performed in bulk and in 50 pL droplets, with or without the addition of the purified enzyme. The agarose gel analysis of the amplified DNA product shows that the presence of the added enzyme enables the MDA reaction to take place. The substrate-bound nucleotides are inaccessible to phi29 polymerase, meaning that the reaction mix is missing one type of nucleotide triphosphate. For this reason the polymerase is unable to amplify DNA, and no DNA product is detected (Fig. 13, lane 3). However, the purified catalytic biomolecule added to the reaction mix and to separate droplets catalyses the conversion of the phi29-inaccessible substrate nucleotides into accessible ones, which in turn enables the amplification reaction to occur (Fig. 13, lane 4). These results indicate that compartmentalization of the MDA reaction in picoliter droplets works as intended - substrate nucleotides are inaccessible to phi29 polymerase, while catalytically converted ones are accessible and thus can be incorporated into the DNA molecule.
Figure 14. Microscopic images of 10 pL, 40 pL and 50 pL emulsion.
After assessing the conditions for activity recording amplification, the next step was to optimise the droplet electrocoalescence step and prove that it enables the addition of amplification reagents to the biomolecule synthesis droplets without disrupting the compartmentalization of single DNA molecules and of the information stored in them as the concentration of catalytically converted substrate nucleotides.
Movie 2. Reinfusion of previously generated 10 pL droplets into the microfluidic droplet electrocoalescence device. The reinjected, closely packed emulsion is spaced by a continuous flow of fluorinated carrier oil from the sides and meets newly generated MDA amplification droplets in the main device channel.
The droplet merging technique involves a delicate step: reinfusion of the previously generated biomolecule synthesis droplets. We optimized the reinfusion methodology by trying out different ways to collect and reinfuse the emulsion. By collecting the 10 pL droplets generated during the previous library encapsulation step into a piece of tubing and using it as a vessel to incubate the droplets at 37 °C, we avoided any disruption of reinjected droplet integrity, as shown in the video above (Movie 2). The reinjected emulsion is closely packed, stable and contains no air bubbles, leading to smooth droplet reinfusion.
The most important factors of droplet coalescence are voltage applied to the device electrodes, the flow rates of droplets and the volume ratio of droplets merged together. Any of these parameters could decrease the efficiency of droplet merging and thus affect the overall performance of whole method. Videos shown below display the consequences of bad droplet merging parameters.
Movie 3. Droplet merging using not optimized parameters.
Figure 15. Three species of droplets after the electrocoalescence
To choose the best droplet merging parameters we relied on mathematical fluid modeling (see the model page). The constructed model predicted that the most suitable droplet volume ratios for electrocoalescence lie in the range of 1:3 to 1:5. Since the droplets in which biomolecules are synthesised by in vitro transcription and translation are 10 pL, we chose 40 pL for the MDA amplification droplets. Movie 4 displays the droplet merging we achieved after optimizing the parameters.
Movie 4. Optimized droplet merging
Movie 5 displays the generation of 40 pL droplets containing the multiple displacement amplification reaction mix and the reinfusion of previously generated 10 pL droplets. Droplet merging occurs at the junction between the two electrodes. The amplification droplet mix is prepared 1.25x concentrated, because it is diluted 1.25-fold upon droplet fusion (40 pL in a final volume of 10 pL + 40 pL = 50 pL). The videos below show that we achieved near perfect droplet merging without disrupting the compartmentalization or stability of the fused droplets.
Movie 5. 10 pL biomolecule synthesis reinfusion, 40pL amplification droplet generation and their corresponding merging
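The arithmetic behind the 1.25x amplification mix and the chosen volume ratio can be checked with a short sketch. The function name is hypothetical; the volumes, the 1.25x figure and the 1:3 to 1:5 range come from the text above.

```python
def concentration_after_merge(conc, own_volume_pl, partner_volume_pl):
    """Concentration of a component after its droplet fuses with a partner
    droplet that does not contain that component."""
    return conc * own_volume_pl / (own_volume_pl + partner_volume_pl)


ivtt_droplet_pl = 10  # biomolecule synthesis droplet
mda_droplet_pl = 40   # amplification droplet

# MDA mix prepared at 1.25x ends up at the 1x working concentration.
mda_final = concentration_after_merge(1.25, mda_droplet_pl, ivtt_droplet_pl)

# Contents of the 10 pL droplet are diluted by total volume / own volume.
ivtt_fold_dilution = (ivtt_droplet_pl + mda_droplet_pl) / ivtt_droplet_pl

# A 10 pL : 40 pL pairing is a 1:4 ratio, inside the modelled 1:3 to 1:5 range.
volume_ratio = mda_droplet_pl / ivtt_droplet_pl
ratio_in_range = 3 <= volume_ratio <= 5

print(f"MDA mix after merging: {mda_final:.2f}x")
print(f"10 pL droplet contents diluted {ivtt_fold_dilution:.0f}-fold")
print(f"Volume ratio 1:{volume_ratio:.0f}, within range: {ratio_in_range}")
```

The 5-fold dilution of the 10 pL droplet contents matters when choosing how much substrate nucleotide to load, because the converted nucleotides are diluted by the same factor before amplification.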
Droplet electrocoalescence and DNA amplification combined
Figure 16. Analysis of DNA amplification reagent addition using droplet merging. The first image (lane 1) depicts the amplified DNA product generated by merging 40 pL MDA reaction reagent (with dCTP) droplets with 10 pL droplets housing the DNA template. The second and third images show the amplified DNA product obtained by merging 40 pL MDA reaction (No dCTP) droplets with 10 pL droplets housing the DNA template and substrate nucleotides, with (+Est) or without (No Est) added CAT-Seq esterase. All droplet reactions were incubated at 30 °C for 6 hours. L and HR correspond to GeneRuler 1kb and High Range DNA Ladders.
To test the feasibility of the chosen droplet coalescence volume ratio, a simple MDA reaction was tested. First, single circular DNA templates were encapsulated into 10 pL droplets and the emulsion was collected in a 1 mL syringe. Then, these small droplets were reinjected into the microfluidic droplet merging device. At the same time, 40 pL droplets containing the MDA reaction components without a DNA template were generated at a constant rate (~ 100 000 000 droplets per hour). Droplet electrocoalescence was initiated by applying a 10 kHz 300 mV square electrical wave. The fused 50 pL droplets were collected and incubated off chip at 30 °C for 6 hours. The analysis of the reaction (Fig. 16, lane 1) shows that the contents of the two compartments fused and single DNA templates were amplified, as the product looks the same as that of a simple droplet MDA reaction.
Next, a control experiment mimicking the full microfluidic workflow of CAT-Seq was performed. First, 10 pL droplets housing in vitro transcription and translation reaction reagents, pUC19 DNA, substrate nucleotides and purified esterase were generated and incubated at 37 °C for 3 hours. After the incubation, the droplets were reinfused into the microfluidic merging device and fused with 40 pL droplets containing MDA reaction reagents without DNA template and dCTP molecules. The merged emulsion was collected and incubated at 30 °C for 6 hours. The agarose gel analysis shows that DNA is amplified only if the purified CAT-Seq esterase enzyme is added (Fig. 16, lane +Est). These results indicate that substrate nucleotides, catalytically converted by the added enzyme in the 10 pL droplets, are incorporated into the DNA strand by phi29 polymerase, which was successfully added to the reaction by droplet electrocoalescence. They also show that none of the reagents present in the reaction affects the stability of the substrate nucleotides, as no product is seen when the esterase is not added.
Proof of droplet merging.
Figure 17. Analysis of the CAT-Seq activity-embedded DNA library preparation workflow. Two species of 10 pL droplets, housing the Esterase DNA template (1) or pUC19 (2), IVTT reaction reagents and substrate nucleotides, were generated, incubated at 37 °C and merged with 40 pL droplets of amplification reagents with no cytidine nucleotides. After the 6-hour incubation at 30 °C, the amplified DNA product was analyzed on an agarose gel. The intense dark band at ~1 kb corresponds to RNA from the IVTT reaction kit (tRNA, rRNA, etc.). L and HR correspond to GeneRuler 1kb and High Range DNA Ladders.
To show that the CAT-Seq workflow for activity-recorded DNA library preparation works, two experiments with different starting DNA templates and no reference nucleotides were carried out. Two populations of 10 pL biomolecule synthesis droplets, housing IVTT reaction components, substrate nucleotides and either the circularized Esterase gene or a pUC19 (control) plasmid, were generated. After incubation at 37 °C for 3 hours, the droplets were reinfused into the droplet merging device. At the same time, droplets housing MDA reaction components without the reference nucleotides (5mC) were generated at ~2 500 000 droplets per hour. The merged droplets were collected and incubated at 30 °C for 6 hours. After that, the amplified DNA was extracted using Ampure beads and analyzed on an agarose gel. Because no reference nucleotides were used, the only source of cytidine nucleotides in the reaction was the substrate nucleotides. However, these nucleotides cannot be incorporated into the DNA unless the synthesized biomolecule catalyses their conversion, in this example the hydrolysis, of substrate nucleotides. The DNA band migration results in Fig. 17 show that:
Substrate nucleotides are stable and are not affected by any of the components in the CAT-Seq workflow, as there is no DNA product when the pUC19 template is used (Fig. 17, lane 2).
The biomolecule of interest is synthesised in the 10 pL droplets and indeed catalyzes the hydrolysis of substrate nucleotides, as DNA is amplified after the merging and a band is visible (Fig. 17, lane 1).
The amplification of DNA confirms the viability of the droplet electrocoalescence technique and shows that the catalytic information initially stored in the 10 pL droplets is carried over to the DNA during amplification, because the converted nucleotides are incorporated into it.
6. CAT-Seq data processing and dynamic range assessment
The final step of the CAT-Seq workflow is nanopore sequencing of the prepared DNA library, in which the biomolecule activity is embedded.
Movie 7. Nanopore DNA sequencing.
Data processing and pipeline preparation
For every analysis, the sequencing reads were basecalled using ONT Albacore v2.3.1 and demultiplexed using the Japsa package (https://github.com/mdcao/japsa). The demultiplexed reads were binned into separate folders using our NanoCycler.sh script. To identify whether a cytosine base is methylated, we used a forked version of the SignalAlign tool (Rand et al. 2017; GitHub branch: kmer_event_alignment). First, we trained our own HMM using a control sequencing run of a CAT-Seq Esterase gene amplified with only the reference nucleotides (dATP, 5’-mdCTP, dGTP, dTTP). The resulting HMM was used for all the downstream analyses involving methylation basecalling.
After demultiplexing, every sample is analyzed using SignalAlign and our prepared HMM to estimate probabilities that a k-mer in the sample contains product cytosines. Since the probability that a methylated cytosine will be incorporated at a specific site is uniform, the posterior probability distributions of having a methylated cytosine for every kmer are bimodal, with one peak close to 0, and the other close to 1 (Figure 18).
Figure 18. Example density plot of posterior probability frequencies for all kmers of different samples to have methylated cytosines.
However, this does not cause problems when calculating the mean for the same kmer. Ideally, if our sequenced molecules were uncut, we could calculate the means for every long read kmer and estimate an error, however due to this limitation we are only able to calculate errors between technical replicates.
To decrease noise in our analysis pipeline, for every experiment we select k-mers that differentiate well between the control runs, in which either only the reference or only the product cytidine was used. The thresholds for good differentiation are defined on the average probability that a k-mer contains product cytosines: in the range 0 - 0.2 for the reference control samples and 0.4 - 1.0 for the product cytidine control samples. After filtering for the selected k-mers, we extract the mean probability for every sample and normalize it by the boundary controls to obtain the final score.
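The k-mer selection step can be sketched as follows. This is a simplified illustration, not the team's pipeline: the dictionaries map k-mers to their average probability of containing product cytosines, the example k-mers and values are invented, and the function names are hypothetical. Only the two threshold ranges come from the text.

```python
def select_kmers(reference_control, product_control,
                 reference_range=(0.0, 0.2), product_range=(0.4, 1.0)):
    """Keep k-mers whose average product-cytosine probability falls inside
    the expected range in BOTH control runs."""
    selected = []
    for kmer, p_ref in reference_control.items():
        p_prod = product_control.get(kmer)
        if p_prod is None:
            continue  # k-mer not observed in the product control
        ref_ok = reference_range[0] <= p_ref <= reference_range[1]
        prod_ok = product_range[0] <= p_prod <= product_range[1]
        if ref_ok and prod_ok:
            selected.append(kmer)
    return sorted(selected)


def mean_probability(sample, kmers):
    """Mean product-cytosine probability of a sample over the selected k-mers."""
    values = [sample[k] for k in kmers if k in sample]
    return sum(values) / len(values)


# Invented example values (illustration only)
reference_control = {"ACGTCA": 0.05, "CCGTAC": 0.30, "TTCGAA": 0.10, "GACCTT": 0.15}
product_control = {"ACGTCA": 0.85, "CCGTAC": 0.90, "TTCGAA": 0.35, "GACCTT": 0.60}
sample = {"ACGTCA": 0.40, "CCGTAC": 0.90, "GACCTT": 0.30}

selected = select_kmers(reference_control, product_control)
sample_mean = mean_probability(sample, selected)
print(selected, round(sample_mean, 3))
```

In this example "CCGTAC" is dropped because it scores too high in the reference control and "TTCGAA" because it scores too low in the product control, so neither separates the two controls cleanly.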
CAT-Seq dynamic range assessment
Before the activity of the control libraries generated earlier could be measured, a control experiment was carried out. Its role was to test the sensitivity of nanopore sequencing as a tool for reading the information encoded in the DNA as a ratio of incorporated cytidine and reference nucleotides. For this, multiple displacement amplification reactions were carried out in 50 pL droplets containing the circularized CAT-Seq esterase gene and different ratios of substrate, reference and product nucleotides (the compositions are listed in the caption of Figure 19).
The amplified DNA was extracted and prepared for nanopore sequencing. By applying the data preparation and analysis pipeline discussed above, the mean probabilities for every k-mer in the sample DNA were filtered and extracted for 5’-methyl-cytidines and cytidines in all the sequencing reads. The mean methylation score (the reference nucleotide count) was then assigned to each of the sequenced products.
Figure 19. Mean methylation score (reference nucleotide count) assigned to DNA amplified in 50 pL droplet MDA reaction with 300 µM 5-methylated cytidines; 300 µM 5-methylated cytidines + 25 µM substrate nucleotide; 300 µM 5-methylated cytidines + 25 µM regular cytidines; 300 µM regular cytidines.
We then scaled the scores by subtracting the baseline of the 300 µM cytidine sequencing reads, which corresponds to zero methylated sites, and normalizing over the 300 µM 5mC sequencing reads (corresponding to a fully methylated sequence) to obtain the final score displayed in Fig. 20.
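The scaling described above amounts to a two-point normalization between the boundary controls. A minimal sketch, assuming each sample has already been reduced to a single raw mean methylation score (the function and argument names are hypothetical):

```python
def normalize_by_controls(raw_score, cyt_baseline, m5c_full):
    """Map the cytidine-only control to 0 and the 5mC-only control to 1.

    raw_score    -- raw mean methylation score of the sample
    cyt_baseline -- raw score of the 300 µM cytidine control
    m5c_full     -- raw score of the 300 µM 5mC control
    """
    span = m5c_full - cyt_baseline
    if span <= 0:
        raise ValueError("5mC control must score higher than the cytidine control")
    return (raw_score - cyt_baseline) / span
```

With this definition a sample whose raw score lies halfway between the two controls receives a normalized score of 0.5. Scores outside the 0 to 1 range are not clipped, so they remain visible as a sign of noise in the controls.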
Figure 20. Normalized mean methylation score (reference nucleotide count) assigned to DNA amplified in 50 pL droplet MDA reaction with 300 µM 5-methylated cytidines; 300 µM 5-methylated cytidines + 25 µM substrate nucleotide; 300 µM 5-methylated cytidines + 25 µM regular cytidines; 300 µM regular cytidines.
As seen from the results, the difference in mean methylation score between the 300 µM 5-methylated cytidines reads and the 300 µM 5-methylated cytidines + 25 µM regular cytidines reads is around 40%. This change in methylation score (reference nucleotide count) demonstrates the fundamental step of recovering catalytic activity information from the DNA. By identifying the mean methylation profile, it is possible to determine the ratio of reference (in this case 5mC) and product nucleotides incorporated into the DNA. This ratio corresponds to the starting concentrations of nucleotides present in the DNA amplification step. The starting nucleotide concentration ratio (5mC to catalytically converted nucleotide) depends on the catalytic activity of the analyzed enzyme, because more active enzymes convert a larger amount of nucleotides. Thus, by determining the methylation score of the sequenced DNA, it is possible to extract the catalytic activity of the enzyme that created the corresponding ratio of 5mC to product nucleotides. These findings indicate that Nanopore sequencing and current base calling algorithms are sensitive enough to differentiate sequencing reads that correspond to different starting nucleotide concentration ratios and to assign mean methylation scores to them.
7. CAT-Seq determined activities
The final step of the catalytic activity sequencing workflow is to sequence the control libraries (esterase mutants, toehold orthogonality and RBS parts) and compare the data to results measured in bulk experiments. A correlation between the results would show that the catalytic activity of a biomolecule can indeed be incorporated into DNA as a ratio of nucleotides and that it can be extracted from it using Nanopore next generation sequencing.
Recording the sequence-activity information
The in silico designed mutant library was subjected to catalytic activity sequencing. The DNA embedded with catalytic activity information was prepared and sequenced with Nanopore. By applying the same data preparation and analysis pipeline discussed above, the mean methylation scores arising from reference nucleotides for each barcoded mutant DNA template were filtered and extracted. The collected data was normalized to the wild-type CAT-Seq esterase and the K227R mutant (lowest activity).
Figure 21. Comparison of in bulk and CAT-Seq measured esterase mutant relative activity.
The in silico generated esterase mutant library was subjected to catalytic activity sequencing. The mean methylation scores for each barcoded mutant DNA template were filtered and extracted. The collected data was normalized to the wild-type CAT-Seq esterase and the K227R mutant (lowest activity). The relative activity extracted from the mean methylation score of each mutant read is compared to data gathered in standard sized reactions (in bulk).
The relative methylation score (reference nucleotide count) of each mutant read corresponds to the activity of the enzyme it encodes. The higher the activity of the expressed enzyme, the more substrate nucleotides it converts to product nucleotides. Because product nucleotides compete with reference nucleotides for incorporation into the DNA, more active variants are assigned lower methylation scores. The comparison of the results gathered with the CAT-Seq catalytic activity sequencing method and with kinetic measurements in standard sized reactions (Figure 21) supports the viability of the CAT-Seq approach. The activity reading extracted from the DNA sequence correlates closely with the in bulk kinetic measurement data: except for the mutant R509K, the activity of each esterase mutant is measured accurately and is assigned to the corresponding DNA sequence. These results show that the CAT-Seq approach enables screening the activity of millions of enzyme variant sequences and accurately assigns the phenotype of each variant to the genotype it arises from.
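The normalization and the comparison with bulk measurements can be sketched as below. This is an illustration under stated assumptions, not the analysis code used for Figure 21: it assumes that relative activity is obtained by a linear rescaling in which the K227R mutant (lowest activity, highest methylation score) maps to 0 and the wild type maps to 1, and it uses the Pearson coefficient as one possible measure of agreement. The function names and dictionary layout are hypothetical.

```python
from math import sqrt


def relative_activity(scores, high_ref="WT", low_ref="K227R"):
    """Convert mean methylation scores to relative activities.

    scores maps variant name -> mean methylation score. A lower score
    means more product nucleotide incorporated, hence higher activity.
    Assumed scaling: low_ref -> 0, high_ref -> 1.
    """
    span = scores[low_ref] - scores[high_ref]
    if span <= 0:
        raise ValueError("low-activity reference must have the higher score")
    return {name: (scores[low_ref] - s) / span for name, s in scores.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)
```

Given a dictionary of CAT-Seq scores and a matching dictionary of bulk relative activities, calling `pearson` on the two value lists (in the same variant order) gives a single number summarizing how well the two methods agree.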
Ribosome binding site activity sequencing
The constructed ribosome binding site library (BBa_B0030, BBa_B0032, BBa_B0034, BBa_K2621038 with a downstream CAT-Seq esterase gene BBa_K2621000) was subjected to the catalytic activity sequencing method. The DNA embedded with catalytic activity information was extracted and sequenced with Nanopore. The mean methylation scores (reference nucleotide count) for each barcoded DNA template housing a different RBS were filtered and extracted. The collected activity data was normalized to the BBa_B0034 data and is shown in Fig. 22.
Figure 22. Comparison of in bulk and CAT-Seq measured ribosome binding site relative strength.
The catalytic activity of the esterase gene, regulated by a library of ribosome binding sites, was measured using a cell free expression system in bulk or using the CAT-Seq approach, and the results were compared side by side. The mean methylation scores for each barcoded DNA template were filtered and extracted. The collected data was normalized to BBa_B0034, which corresponds to a mean strength of 1.
Stronger ribosome binding sites increase the yield of translated proteins and in turn increase the number of catalytically converted substrate nucleotides. This increase is inversely related to the assigned mean methylation score. Because the same enzyme and substrate nucleotide concentration were used during the activity recording, the activity results extracted from the mean methylation scores (reference nucleotide count) correspond to ribosome binding site strength. The catalytic activity sequencing results were compared to the RBS strength results measured earlier in bulk. The comparison once again supports the viability of the CAT-Seq approach. The ribosome binding site strength extracted from the reference nucleotide count of the DNA sequence correlates with measurements made with accurate standard assays. These results display the validity of CAT-Seq as a method for screening the strength of regulatory sequences and its ability to assign an accurate phenotype to genotype linkage.
Toehold riboswitch activity and orthogonality sequencing
The Toehold regulatory sequence library, consisting of different toehold and trigger pairs, was subjected to the catalytic activity sequencing method. The DNA embedded with catalytic activity information was extracted and sequenced with Nanopore. The mean methylation scores (reference nucleotide count) for each barcoded DNA template housing a different regulatory sequence were filtered and extracted.
Figure 23. The evaluation of Toehold-Trigger riboregulatory sequence orthogonality using CAT-Seq.
The catalytic activity of esterase genes regulated by different Toehold switches was measured using CAT-Seq. The mean methylation scores for each barcoded regulatory construct DNA template were filtered and assigned. Low methylation scores correspond to actively expressed protein and are only assigned when both Toehold and trigger sequences from the same group are present, verifying the already measured orthogonality of the regulatory parts.
The graph displays the mean methylation scores (reference nucleotide count) assigned to each barcoded toehold-trigger construct read. Based on the results, low methylation scores are only assigned to those parts that consist of Toehold and trigger sequences from the same group. This means that the esterase enzyme was expressed and catalyzed the conversion of substrate nucleotides. These results agree with the standard (not in droplet) measurement results. Based on this, it can be concluded that the CAT-Seq activity sequencing method can be used as a precise and accurate way to screen and assign the activity and orthogonality of regulatory sequences in a high throughput manner.
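The orthogonality criterion described above (low methylation score only for toehold and trigger sequences from the same group) can be expressed as a simple check over all pairs. The sketch below is illustrative only: the dictionary layout, the function name, and in particular the cutoff of 0.5 separating "low" from "high" normalized scores are assumptions, not values from this work.

```python
def orthogonality_report(scores, threshold=0.5):
    """Check toehold-trigger pairs against the orthogonality criterion.

    scores maps (toehold_group, trigger_group) -> normalized mean
    methylation score. A score below the (assumed) threshold is read
    as active expression of the esterase.

    Returns two sorted lists:
      missed    -- cognate pairs that were NOT active
      crosstalk -- non-cognate pairs that WERE active
    """
    missed = []
    crosstalk = []
    for (toehold, trigger), score in scores.items():
        active = score < threshold
        if toehold == trigger and not active:
            missed.append((toehold, trigger))
        elif toehold != trigger and active:
            crosstalk.append((toehold, trigger))
    return sorted(missed), sorted(crosstalk)
```

A library is orthogonal under this criterion when both returned lists are empty.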
The three proof of concept experiments presented above - mutant library sequence-activity recording, ribosome binding site strength determination and toehold regulatory sequence cross-interaction evaluation - show that the CAT-Seq catalytic activity sequencing workflow is a precise and high throughput method for screening millions of catalytic or regulatory molecules, and that it allows the measured phenotypic data to be linked precisely to the sequenced genotype.
Alexander A. Green, Pamela A. Silver, James J. Collins, Peng Yin (2014). Toehold Switches: De-Novo-Designed Regulators of Gene Expression. Cell, Volume 159, Issue 4, 6 November 2014, Pages 925-939.
Amy M. Diegelman, Eric T. Kool (1998). Generation of circular RNAs and trans-cleaving catalytic RNAs by rolling transcription of circular DNA oligonucleotides encoding hairpin ribozymes. Nucleic Acids Research, Volume 26, Issue 13, 1 July 1998, Pages 3235-3241.
Minsoung Rhee, Yooli K. Light, Robert J. Meagher, Anup K. Singh (2016). Digital Droplet Multiple Displacement Amplification (ddMDA) for Whole Genome Sequencing of Limited DNA Samples. PLOS ONE, May 4, 2016. https://doi.org/10.1371/journal.pone.0153699
Rand, A. C., Jain, M., Eizenga, J. M., Musselman-Brown, A., Olsen, H. E., Akeson, M., & Paten, B. (2017). Mapping DNA methylation with high-throughput nanopore sequencing. Nature Methods, 14(4), 411-413. http://doi.org/10.1038/nmeth.4189