Difference between revisions of "Team:Marburg/Measurement"

Revision as of 20:48, 17 October 2018

Measurement
Measure what is measurable, and make measurable what is not so.
-- Galileo Galilei

Highly characterized parts are an absolute prerequisite for the rational design of DNA constructs in synthetic biology. While thousands of parts have been analyzed for E. coli, no such data are available for V. natriegens, which makes the design of plasmids a guessing game.
To tackle this foundational challenge for our chassis, we established a measurement and data analysis workflow tailored to its unmatched growth rate. We first examined which plasmid context yields the highest dynamic range in reporter experiments and found that the combination of the lux operon and the ColE1 ori performed best. We then applied this protocol to obtain reproducible data for promoter strengths (including inducible promoters), insulation by our connectors, and the influence of different oris on expression. qPCR experiments provided additional insights into plasmid copy number dependent on reporter expression. We are certain that our data, in combination with our Marburg Collection, will foster the widespread utilization of V. natriegens in synthetic biology.

This page focuses on the development of an experimental and data analysis workflow for all experiments that were performed with a platereader. The characterization of promoters, RBS, terminators and our connectors can be found in the respective sections of the results page.
In addition to platereader experiments, we used further methods (e.g. qPCR) to support our measured data.
Establishing a fast and convenient workflow for platereader experiments
We realized that a new organism requires rethinking established methods. We therefore developed a workflow that is tailored to the extremely fast growth of V. natriegens. An experimental approach was combined with a novel data analysis workflow to obtain highly reproducible characterization data.
Experimental approach

Most of our data were obtained by measuring the expression of reporters in platereader experiments. In our first attempts, we failed to obtain reproducible data by applying workflows that are commonly used for E. coli. We realized that a new chassis requires rethinking existing procedures, and we decided to establish a workflow for platereader experiments tailored to V. natriegens that respects species-specific properties, primarily its unrivaled doubling time.

Many platereader workflows for E. coli start with growing overnight cultures of the samples, measuring the OD600 and diluting all samples in a 96 well plate to a defined OD600 (e.g. 0.05). Depending on the number of samples, pipetting the test plate can easily take up to 45 minutes.
When first following this approach with V. natriegens, we realized that a different workflow is needed for the world's fastest growing organism. We tested how much V. natriegens grows during the preparation of a full 96 well plate. We set up an experiment using a stationary culture of V. natriegens carrying a plasmid that weakly expresses lux (to achieve realistic conditions) and diluted this culture 1:100 in a 96 well plate, with one pipetting step taking 30 seconds. The result of this experiment can be seen in figure 1 A.
The data show an obvious trend towards higher OD600 for wells that were pipetted first. Seemingly, V. natriegens is able to recover from stationary phase and undergo almost one cell division in 45 minutes at room temperature. Please note that performing this experiment with a culture in the exponential phase, as described in some E. coli protocols, would most likely result in an even stronger trend.
With this in mind, we managed to establish a workflow that omits measuring and independently diluting individual wells.
We tested cultivating our precultures, inoculated from glycerol stocks, directly in 96 well plates and incubating this plate for five to six hours in a platereader or shaking incubator. For V. natriegens, this time frame is sufficient for all cultures to reliably reach stationary phase, which corresponds to an overnight culture for E. coli. The cultures are then diluted in two steps, 1:50 and 1:40, resulting in a final 1:2000 dilution of the preculture. In contrast to the commonly used workflows for E. coli, our approach does not consider the OD600 of individual wells but instead dilutes all cultures by the same factor.
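The two-step dilution can be checked with a few lines of arithmetic. The following is a minimal sketch; the dilution factors come from the text, whereas the 200 µL well volume and the resulting transfer volumes are assumptions made only for illustration.

```matlab
% Two-step dilution of the preculture (factors taken from the text).
step_factors = [50 40];                 % 1:50 followed by 1:40
total_factor = prod(step_factors);      % overall dilution factor

% Hypothetical transfer volumes for a 200 uL final volume per well
% (assumption: the actual pipetting volumes are not stated on this page).
final_volume_uL = 200;
transfer_uL = final_volume_uL ./ step_factors;   % culture volume per step
medium_uL   = final_volume_uL - transfer_uL;     % medium volume per step

fprintf('Total dilution 1:%d\n', total_factor);
```

Because every well receives the same factor, both steps can be carried out with a multichannel pipette, which is what keeps the preparation time short.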
The OD600 after the 1:50 dilution of the 96 well plate is shown in figure 1 B. Although dilutions were not calculated for individual wells, the range of values is markedly smaller and no positional bias can be observed.
Figure 1: Heterogeneity of the 96 well plate after preparation
A) One pipetting step every 30 s, starting with A1
B) Intermediate dilution (1:50) of a preculture
Moreover, we realized that inoculating with a low cell number is advantageous when working with V. natriegens. This prolongs the exponential phase, the period in which most relevant data are acquired. A 1:2000 dilution results in a cell density much lower than the inoculum used in most E. coli experiments. The subsequent kinetic measurement can be completed in as little as five to six hours.

Data analysis

Figure 2: Steady state level with constant degradation and production rates
The black horizontal line represents a constant production rate and the green curve indicates the degradation rate depending on the concentration
Besides the experimental workflow, we established an approach for fast and accurate data analysis. From a mathematical point of view, a constant production rate and a constant degradation rate constant result in a steady state level of a molecule. This is visualized in figure 2. While the absolute degradation increases with the concentration of a molecule (e.g. a reporter protein), synthesis from a constitutive promoter remains constant, independent of the concentration. In the context of our reporter experiments, the concentration represents the number of reporter molecules per cell. The intersection of both graphs yields the steady state concentration because degradation and synthesis rates are balanced at this point.

In most of our experiments we analyze the lux operon, which is constitutively expressed. From a mathematical point of view, the dilution of a molecule by an expanding volume is equivalent to degradation. For a growing bacterial culture, this dilution is caused by growth and cell division. We assume the dilution rate to be constant throughout the exponential growth phase of a culture. Consequently, the ratio of signal/OD600 is expected to be constant in this time window.
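The argument of the two preceding paragraphs can be condensed into a single balance equation. This is a sketch under stated assumptions: the symbols are introduced here for illustration only (R for the reporter amount per cell, k for the constant synthesis rate, δ for a first-order degradation rate constant, μ for the growth rate), and first-order loss is an assumption that is not stated explicitly in the text.

```latex
\[
\frac{dR}{dt} = k - (\delta + \mu)\,R
\qquad\Longrightarrow\qquad
R^{*} = \frac{k}{\delta + \mu}
\quad\text{at steady state}\quad \left(\frac{dR}{dt} = 0\right).
\]
```

As long as μ is constant, i.e. during exponential growth, R* is constant. If the measured signal is proportional to R times the number of cells, and the OD600 is proportional to the number of cells, then signal/OD600 is proportional to R* and therefore constant as well.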

With this in mind, we aimed to create an algorithm identifying the exponential growth phase and calculating the signal to OD600 ratio. In our first experiments we tried to identify a point in time at which all cultures grow exponentially, and failed.
We were not able to obtain reproducible data for two reasons. Firstly, cultures grow differently depending on the starting concentration and on fitness differences caused by varying test constructs and expression strengths. Secondly, individual measurements are highly variable (we saw this especially for OD600 measurements). Therefore, a calculation based on a single value is highly susceptible to outliers.
Figure 3: Sampling of time points
The threshold is indicated with a dotted line at OD600 = 0.2. The time point that first exceeded 0.2 is shown in red and the data points that are included in the range are displayed in green. The mean of the luminescence/OD600 ratio of all points is calculated to assign a value to this well.
We solved both issues by developing a Matlab script that identifies the exponential phase and includes a range of seven measuring points for each individual well. The culture is assumed to be in the mid exponential phase at an OD600 of 0.2. The first time point above 0.2 is used together with the three time points before and the three time points after it (measured at 5 min intervals). Then the mean of all seven signal to OD600 ratios is calculated. Through this calculation, a single value is assigned to each well, representing the strength of reporter expression in the exponential phase. All samples were measured in four technical replicates in three subsequent independent experiments.
Consequently, every result (e.g. a promoter strength) is, in total, the mean of 84 measurements (7 time points x 4 technical replicates x 3 experiments). This large number of raw data points leads to a high degree of reproducibility and, as we believe, to highly accurate characterization data.
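The per-well calculation described above can be sketched for a single well as follows. This is an illustration under stated assumptions, not the script we used (see the example Matlab code further down this page): the time course is synthetic, the variable names are chosen for this sketch, and the threshold criterion is simplified to the first reading above 0.2.

```matlab
% Synthetic single-well time course (assumed data, 5 min intervals).
t_min = 0:5:120;
od    = 0.01 * 2.^(t_min / 20);        % exponential growth, 20 min doubling
lux   = 1000 * od;                     % signal proportional to OD600

threshold   = 0.2;                     % OD600 threshold from the text
half_window = 3;                       % time points before and after

% First time point above the threshold.
idx = find(od > threshold, 1, 'first');

% Average the signal/OD600 ratio over the window, if it fits in the data.
if ~isempty(idx) && idx - half_window >= 1 && idx + half_window <= numel(od)
    window     = idx - half_window : idx + half_window;
    well_ratio = mean(lux(window) ./ od(window));
else
    well_ratio = NaN;                  % no valid window for this well
end

% Measurements behind one result: 7 time points x 4 replicates x 3 days.
n_points = (2*half_window + 1) * 4 * 3;
```

Returning NaN for wells without a valid window is a design choice of this sketch; it keeps such wells visible in later averaging steps instead of silently assigning them a value of zero.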

The OD600 threshold of 0.2 and the range of three time points were not chosen arbitrarily but selected after calculating the mean CV for combinations of OD600 thresholds and ranges. We tested OD600 thresholds from 0.1 to 0.6 with a step size of 0.01 and ranges around the threshold time point from 0 to 10. Please note that the range is applied bidirectionally: a range of 10 considers 10 data points before and 10 data points after the point at which a well reaches the OD600 threshold, so that 21 time points are averaged.
Figure 3: Evaluating the CV between different days depending on OD600 threshold and time point range
The threshold indicates the OD600 value which has to be reached by each individual well to determine the values considered for analysis. The range sets the number of time points before and after the well reached the threshold that are used for mean calculation.
The mean CV of all wells over the three independent experiments for each combination of OD600 threshold and range is shown on the Z-axis.
This 3D bar plot allows for a visual estimation of advantageous combinations of OD600 threshold and range of data points. A rather low OD600 threshold (0.15 - 0.25) combined with a medium range (2 - 5) yields a plateau of low coefficient of variation (CV) values, as low as 10 %. In contrast, high OD600 thresholds (>0.3) and high ranges (>5) result in a marked increase in CV, up to 38 %.
As discussed previously, expression data can be assumed to be constant throughout the exponential growth phase. In our experience, the cultures leave the fully exponential phase around OD600 = 0.5 - 0.6. The previously constant luminescence/OD600 ratio then starts to diverge, which leads to a loss of reproducibility for high OD600 thresholds. Slightly higher CV values can also be observed for very low OD600 thresholds. We noted that the OD600 fluctuates significantly when measuring very low cell concentrations, presumably due to technical inaccuracies. This can drastically affect data points at low OD600 values (<0.1). In general, we consider a high range useful because it increases the number of data points that are averaged and thus decreases the impact of single outliers. However, for a fast growing organism like V. natriegens, the time window in which the OD600 of a culture is high enough to yield reliable measurements but low enough for the culture to still grow exponentially is limited. An excessive range therefore leads to the inclusion of data points at either too low or too high OD600, which again reduces reproducibility.
We consider this analysis the foundation for selecting the OD600 threshold of 0.2 and the range of 3 that were used in the analysis of all our platereader experiments. This enabled us to achieve high reproducibility of our data from three subsequent, independent experiments.
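The parameter scan described above can be sketched as a grid search over thresholds and ranges. The following is a minimal illustration on synthetic data, assuming a deterministic perturbation in place of real measurement noise and a coarser grid than the one we used. All variable names are hypothetical.

```matlab
% Synthetic data (assumption): 4 wells x 40 time points x 3 days.
n_wells = 4; n_times = 40; n_days = 3;
t   = 0:n_times-1;                               % time point index
OD  = zeros(n_wells, n_times, n_days);
Lux = zeros(n_wells, n_times, n_days);
for d = 1:n_days
    for w = 1:n_wells
        growth     = 0.01 * 2.^(t / 4);          % doubling every 4 time points
        OD(w,:,d)  = min(growth, 1.0);           % OD600 saturates at 1
        % Deterministic perturbation stands in for measurement noise.
        Lux(w,:,d) = 1000 * w * growth .* (1 + 0.05 * sin(t + w + d));
    end
end

thresholds = (1:6) / 10;                         % coarse grid: 0.1 ... 0.6
ranges     = 0:5;                                % points before and after
meanCV     = NaN(numel(thresholds), numel(ranges));

for a = 1:numel(thresholds)
    for b = 1:numel(ranges)
        r    = ranges(b);
        vals = NaN(n_wells, n_days);             % one value per well and day
        for d = 1:n_days
            for w = 1:n_wells
                idx = find(OD(w,:,d) > thresholds(a), 1, 'first');
                if ~isempty(idx) && idx - r >= 1 && idx + r <= n_times
                    win       = idx - r : idx + r;
                    vals(w,d) = mean(Lux(w,win,d) ./ OD(w,win,d));
                end
            end
        end
        cv          = std(vals, 0, 2) ./ mean(vals, 2);  % CV across days
        meanCV(a,b) = mean(cv);                          % mean over wells
    end
end

% Combination with the lowest mean CV on this synthetic data set.
[~, best] = min(meanCV(:));
[ia, ib]  = ind2sub(size(meanCV), best);
fprintf('Lowest mean CV at threshold %.1f, range %d\n', thresholds(ia), ranges(ib));
```

The optimum found on synthetic data has no biological meaning; the sketch only shows how a surface like the one in the figure can be computed.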
View example Matlab Code
clear all
close all
%% Import Platereader raw data
Lux_Data_raw_Day1 = xlsread('Connector_Lux_071018.xlsx','All Cycles');
OD_Data_raw_Day1 = xlsread('Connector_OD_071018.xlsx','All Cycles');
 
Lux_Data_raw_Day2 = xlsread('Connector_Lux_071018_2.xlsx','All Cycles');
OD_Data_raw_Day2 = xlsread('Connector_OD_071018_2.xlsx','All Cycles');
 
Lux_Data_raw_Day3 = xlsread('Connector_Lux_081018.xlsx','All Cycles');
OD_Data_raw_Day3 = xlsread('Connector_OD_081018.xlsx','All Cycles');
 
%% Merge all data in two matrices
OD_Data(:,:,1) = OD_Data_raw_Day1;
OD_Data(:,:,2) = OD_Data_raw_Day2;
OD_Data(:,:,3) = OD_Data_raw_Day3;
 
Lux_Data(:,:,1) = Lux_Data_raw_Day1;
Lux_Data(:,:,2) = Lux_Data_raw_Day2;
Lux_Data(:,:,3) = Lux_Data_raw_Day3;
 
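%% Blank subtraction and cut-off
% Rows are wells, columns are time points, the third dimension is the day.
% Wells 60, 72, 84 and 96 serve as blanks for every time point and day.
for h = 1:size(Lux_Data,3)              % looping over days
    for k = 1:size(Lux_Data,2)          % looping over time points
        Blank = mean(Lux_Data([60,72,84,96],k,h));
        for j = 1:size(Lux_Data,1)      % looping over wells
            Lux_Data(j,k,h) = Lux_Data(j,k,h)-Blank;
            if Lux_Data(j,k,h) < 50
                Lux_Data(j,k,h) = 50; % Cut off for Lux signal
            end
        end
    end
end
 
for h = 1:size(OD_Data,3)               % looping over days
    for k = 1:size(OD_Data,2)           % looping over time points
        Blank = mean(OD_Data([60,72,84,96],k,h));
        for j = 1:size(OD_Data,1)       % looping over wells
            OD_Data(j,k,h) = OD_Data(j,k,h)-Blank;
            if OD_Data(j,k,h) < 0.01
                OD_Data(j,k,h) = 0.01; % Cut off for OD
            end
        end
    end
end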
 
%% Calculating ratio around OD 0.2
 
Ratios = zeros(size(OD_Data,1),size(OD_Data,3)); % one value per well and day
for q = 1:size(OD_Data,3) % looping over days
    for o = 1:size(OD_Data,1) % looping over samples
        % start at p = 4 so that p-3 is always a valid index
        for p = 4:size(OD_Data,2)-3 % looping through time points
            % use the first time point at which this and the next reading exceed 0.2
            if OD_Data(o,p,q) > 0.2 && OD_Data(o,p+1,q) > 0.2 && Ratios(o,q) == 0
                Ratios(o,q) = mean(Lux_Data(o,p-3:p+3,q)./OD_Data(o,p-3:p+3,q));
            end
        end
    end
end
 
%% Calculating mean and std of individual samples
 
for k = 1:12
    Samples(k,1,1:3) = mean(Ratios([k,k+12,k+24,k+36],:)); % calculate mean of the four technical replicates
    Samples(k,2,1:3) = std(Ratios([k,k+12,k+24,k+36],:)); % calculate std of the four technical replicates
end
 
for k = 13:24
    Samples(k,1,1:3) = mean(Ratios([k+36,k+48,k+60,k+72],:)); % calculate mean of the four technical replicates
    Samples(k,2,1:3) = std(Ratios([k+36,k+48,k+60,k+72],:)); % calculate std of the four technical replicates
end
 
%% Sort samples for top and bottom row
 
Names = {'J23100', '5Con1 long','5Con2 long','5Con3 long','5Con4 long',...
        '5Con5 long','5Con1 short','5Con2 short','5Con3 short',...
        '5Con4 short','5Con5 short','5Con1 long','5Con2 long','5Con3 long',...
        '5Con4 long','5Con5 long','5Con1 short','5Con2 short','5Con3 short',...
        '5Con4 short','5Con5 short', 'Promoter Dummy'};
Top_row = Samples(1:11,:,:);
Bot_row = Samples(13:23,:,:);
 
SortedValues = [];
for k = 1:size(Top_row,1)
SortedValues = [SortedValues; Top_row(k,:,:); Bot_row(k,:,:)];
end
SortedValues = [SortedValues;Samples(12,:,:)];
SortedValues(22,:,:) = [];
 
%% Calculate relative strength
 
Relative_strength = SortedValues;
for h = 1:size(SortedValues,3)
    % iterate from the last row to the first so that the reference value
    % in row 1 is only overwritten after all other rows have been normalized
    for k = 1:size(SortedValues,1)
        Relative_strength(size(SortedValues,1)-k+1,2,h) = ...
            Relative_strength(size(SortedValues,1)-k+1,2,h)/Relative_strength(1,1,h);
        Relative_strength(size(SortedValues,1)-k+1,1,h) = ...
            Relative_strength(size(SortedValues,1)-k+1,1,h)/Relative_strength(1,1,h);
    end
end
 
%% Plot relative strengths
 
figure(1) % constructs with J23100
hold on
bar(mean(Relative_strength([1:11,22],1,1:3),3),'facecolor',[125/255,202/255,97/255])
errorbar((mean(Relative_strength([1:11,22],1,1:3),3)),...
    std(Relative_strength([1:11,22],1,1:3),1,3),'linestyle','none','color','k')
set(gca, 'YScale', 'log')
ylabel('normalized Luminescence/OD_6_0_0')
set(gca,'Color','w')
xticks([1:12])
xticklabels(Names([1:11,22]))
xtickangle(45)
ylim([0.001 3])
yticks([0.001,0.01,0.05, 0.1, 0.5, 1.0, 2.0 ])
yticklabels([0.001,0.01,0.05, 0.1, 0.5, 1.0, 2.0])
 
figure(2) % constructs with promoter dummy
hold on
bar(mean(Relative_strength([1,12:22],1,1:3),3),'facecolor',[125/255,202/255,97/255])
errorbar((mean(Relative_strength([1,12:22],1,1:3),3)),...
    std(Relative_strength([1,12:22],1,1:3),1,3),'linestyle','none','color','k')
set(gca, 'YScale', 'log')
ylabel('normalized Luminescence/OD_6_0_0')
set(gca,'Color','w')
xticks([1:12])
xticklabels(Names([1,12:22]))
xtickangle(45)
ylim([0.001 3])
yticks([0.001,0.01,0.05, 0.1, 0.5, 1.0, 2.0 ])
yticklabels([0.001,0.01,0.05, 0.1, 0.5, 1.0, 2.0])
 

Evaluating the optimal plasmid context
The choice of ori and reporter can greatly influence the outcome of the characterization of parts. We tested different oris and reporters to identify a combination with superior performance.
Finding the best reporter

Figure 4: Mean ratio of reporter signal over medium blank during the course of the experiment.
After having established a reliable workflow for V. natriegens, we investigated four different reporters and measured the signal to blank ratio. Test constructs (shown in figure 5) were built by using the same set of parts except for the coding sequence. sfGFP, RFP, YFP and the lux operon were analyzed for their performance in V. natriegens. The best signal to blank ratio by far was achieved for the lux operon (2000), followed by sfGFP (3), RFP (1) and YFP (no detectable signal). The main explanation for the superior performance of the lux operon is the almost complete absence of background signal without reporter expression. This makes the lux operon a perfect reporter that can even be used to analyze extremely low levels of expression caused by very weak promoters or terminator read through. Based on this finding, we decided to use the lux operon as our reporter for all subsequent experiments.

Figure 5: Test constructs for reporter experiment
Plasmids were built with four different reporters.
A) Lux B) RFP C) sfGFP D) YFP
In contrast to fluorescence reporters, the enzymes expressed from the lux operon lead to continuous emission of light. This can result in increased cross talk between neighboring wells. The extent of cross talk depends strongly on the type of 96 well plate that is used in the experiment. We analyzed the cross talk in clear and black 96 well plates by placing a single lux expressing sample in well C3 and filling all remaining wells with medium. As can be seen in figure 6, the signal from a single well is sufficient to illuminate a large portion of the clear plate (~ 1 % signal overflow to neighboring wells), while the cross talk is reduced tenfold when using a black plate (~ 0.1 % signal overflow to neighboring wells).
Figure 6: Luminescence pattern in clear (A) and black (B) 96 well plate
200 µL of one Lux expressing sample was placed in C3 while all other wells were filled with medium
Thus, we used black plates and paid attention not to place the brightest cultures in direct proximity to the darkest cultures. We therefore do not see cross talk as a decisive argument against the lux operon. However, algorithms are under development that will allow for a mathematical correction to further improve the performance of the lux operon as a reporter (Mauri et al. unpublished).
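The signal overflow quoted for the cross talk experiment can be estimated from a plate readout as the mean signal of the directly neighboring wells relative to the source well. The sketch below uses invented luminescence values (assumption) that reproduce the ~ 1 % case; it is not our measured data and not the correction algorithm mentioned above.

```matlab
% Synthetic luminescence readout of a 96 well plate (assumed values).
plate   = 10 * ones(8, 12);          % medium-only wells, arbitrary units
src_row = 3; src_col = 3;            % well C3 holds the lux expressing sample

% Mark the eight wells directly adjacent to the source well.
neighbour_mask = false(8, 12);
neighbour_mask(src_row-1:src_row+1, src_col-1:src_col+1) = true;
neighbour_mask(src_row, src_col) = false;

plate(src_row, src_col) = 1e6;       % hypothetical signal of the source well
plate(neighbour_mask)   = 1e4;       % hypothetical cross talk signal

% Signal overflow to neighboring wells in percent of the source signal.
overflow_percent = 100 * mean(plate(neighbour_mask)) / plate(src_row, src_col);
fprintf('Signal overflow: %.2f %%\n', overflow_percent);
```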

Finding the best ori

Figure 7: Testing the Lux expression from plasmids with different oris
Data were normalized over the strongest construct ColE1. Error bars represent the standard deviation of the measurements of three independent experiments
The dynamic range of a reporter experiment depends not only on the reporter but also on the copy number of the tested plasmids, which is determined by the origin of replication. We wanted to identify the ori that yields the highest dynamic range when expressing the lux operon. To do so, we constructed three plasmids expressing Lux, in which all parts except for the ori were identical, and tested them for signal strength. We obtained the highest expression from the construct harboring the ColE1 ori, followed by p15A and pMB1 (figure 7). We suggest that ColE1 yields plasmids with the highest copy number. We performed qPCR experiments that support this hypothesis: we observed a qualitative correlation between copy number and expression strength. As a high dynamic range is essential for analyzing weak expression levels, we chose ColE1 as our default ori for all subsequent experiments.
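Normalizing to the strongest construct can be sketched as follows. The luminescence/OD600 values are invented for illustration (assumption), and normalizing each experiment separately before averaging is one possible choice that mirrors the per-day normalization in the example Matlab code above. With this choice the reference construct has a standard deviation of zero by construction.

```matlab
% Hypothetical luminescence/OD600 values (assumption, not measured data):
% rows = ori, columns = independent experiments.
ori_names = {'ColE1', 'p15A', 'pMB1'};
raw = [2.0e6 2.2e6 1.8e6;
       8.0e5 9.0e5 7.0e5;
       4.0e5 5.0e5 3.0e5];

% Normalize each experiment to the ColE1 construct (row 1).
normalized = raw ./ repmat(raw(1,:), size(raw,1), 1);

% Mean and standard deviation over the three experiments.
mean_rel = mean(normalized, 2);
std_rel  = std(normalized, 0, 2);

for k = 1:numel(ori_names)
    fprintf('%-6s %.2f +/- %.2f\n', ori_names{k}, mean_rel(k), std_rel(k));
end
```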

B. Marchal