Semantic Containment Modelling
Semantic Containment Failure Rate
Introduction
In any safety system it is vital to evaluate how likely failures are to occur, both to quantify risk and to assess the fitness of the system. Our semantic containment system is no different. We wanted to assess it in a way that would be comparable to safety systems in other disciplines. The performance level (PL) system was introduced in 2006 to assess the suitability of electronic parts used in control systems. It ranks parts across 5 levels, a to e, defined by the probability of a dangerous failure per hour. We therefore produced a mathematical model to assess our semantic containment system in the same manner.
To place the semantic containment system on the PL ranking we need to calculate the probability of a failure occurring in the system during a 1 hour period. For our purposes we define a failure of the system as a successful read through of a semantically contained part within a wild type organism, resulting in transcription. For a successful read through to occur, every reassigned amber codon within the part would have to be bound by a serine RNA rather than a molecule of release factor 1 during the time span of a single read.
Mass Action Equations
We developed a system of mass action equations describing amber codons in 3 distinct states: unbound, bound by serine RNA, and bound by release factor 1. Codons move between these states according to the binding and unbinding rates of serine RNA and release factor 1.
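As an illustration only, a three-state scheme of this kind could be written as two reversible binding reactions. The symbols here are our own notation and are not taken from the original model: U is an unbound amber codon, S a serine RNA, R a release factor 1 molecule, C_S and C_R the two bound states, and the k terms are the binding (on) and unbinding (off) rates.

```latex
\begin{align}
  U + S &\;\underset{k^{\mathrm{off}}_{S}}{\overset{k^{\mathrm{on}}_{S}}{\rightleftharpoons}}\; C_S \\
  U + R &\;\underset{k^{\mathrm{off}}_{R}}{\overset{k^{\mathrm{on}}_{R}}{\rightleftharpoons}}\; C_R
\end{align}
```

Under mass action, each forward step proceeds at a rate proportional to the product of the concentrations of its two reactants, and each reverse step at a rate proportional to the concentration of the bound state.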
Ordinary Differential Equations
From our mass action equations we derived a system of ordinary differential equations (ODEs) that, when solved, describe how the number of codons in each state changes over time. The system of ODEs was evaluated over a 24 hour period, with rates drawn from normal distributions whose means were affinities taken from the literature. For each parameter setting, the ratio between the different codon states was logged and used to calculate the probability of consecutive codons being bound by serine RNA. This probability is the same as the probability of a single read through of our semantically contained part. The time required for a single read through can be calculated from the traversal speed of RNA polymerase and the gene length, which gives the number of possible reads per hour. Multiplying this by the probability of a single successful read through gives the average frequency of failures per hour required by the performance level system.
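The chain of calculations described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the model actually used: the concentrations of serine RNA and release factor 1 are folded into constant effective binding rates, the per-codon probability is taken as the serine-bound share of all bound codons, a plain forward-Euler integrator is used, and every numerical value is a placeholder in arbitrary units rather than a literature value.

```python
import random

# Placeholder values only (arbitrary units); NOT the literature affinities.
PLACEHOLDER_MEAN_RATES = (1.0, 1.0, 4.0, 1.0)  # serine on, serine off, RF1 on, RF1 off
PLACEHOLDER_SPEED_NT_PER_S = 50.0              # hypothetical traversal speed
PLACEHOLDER_GENE_LENGTH_NT = 1000.0            # hypothetical gene length


def simulate_fractions(on_s, off_s, on_r, off_r, t_end=50.0, dt=0.01):
    """Forward-Euler integration of the three codon states.

    Returns the final fractions (unbound, serine-bound, RF1-bound),
    starting from all codons unbound.
    """
    u, cs, cr = 1.0, 0.0, 0.0
    for _ in range(int(t_end / dt)):
        du = -(on_s + on_r) * u + off_s * cs + off_r * cr
        dcs = on_s * u - off_s * cs
        dcr = on_r * u - off_r * cr
        u, cs, cr = u + dt * du, cs + dt * dcs, cr + dt * dcr
    return u, cs, cr


def serine_bound_probability(u, cs, cr):
    """Assumed per-codon probability: serine-bound share of bound codons."""
    return cs / (cs + cr)


def failures_per_hour(p_serine, n_codons, speed_nt_per_s, gene_length_nt):
    """Reads per hour multiplied by the chance that every codon is serine-bound."""
    p_read_through = p_serine ** n_codons
    reads_per_hour = 3600.0 * speed_nt_per_s / gene_length_nt
    return reads_per_hour * p_read_through


def sample_rate(mean, rel_sd, rng):
    """Draw a rate from a normal distribution, clipped to stay positive."""
    return max(rng.gauss(mean, rel_sd * mean), 1e-9)


def mean_failures_per_hour(n_codons, n_samples=20, seed=0):
    """Average failures per hour over randomly sampled rate settings."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        rates = [sample_rate(m, 0.1, rng) for m in PLACEHOLDER_MEAN_RATES]
        p = serine_bound_probability(*simulate_fractions(*rates))
        total += failures_per_hour(
            p, n_codons, PLACEHOLDER_SPEED_NT_PER_S, PLACEHOLDER_GENE_LENGTH_NT
        )
    return total / n_samples
```

Calling `mean_failures_per_hour(n)` for n from 1 to 16 would give one curve comparable in shape, though not in value, to the failure frequencies discussed in the results.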
Results
Figure 1 displays the probability of a successful read through of a semantically contained part for 1 to 16 amber codons.
We can observe from Figure 1 that, as could be expected, the probability of a successful read through decreases dramatically as more amber codons are added. The increasingly large probability range as more amber codons are considered results from increased variance across runs, caused by larger and larger pools of amber codons being considered.
Failures per Hour
Figure 2 displays the average frequency of failure per hour, the metric upon which performance level is assessed.
Figure 3 displays the probability of dangerous failures per hour corresponding to each level of the performance level system.
We can see by comparing Figures 2 and 3 that our 1 amber codon semantic containment part is too failure prone to be placed on the PL ranking, and that our 5 amber codon part only just enters the ranking with an a rating (the lowest performance level). Our 10 amber codon part, however, performs exceedingly well, far exceeding the performance required for the top ranked e rating.
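The comparison between a failure frequency and the PL bands can be expressed as a small lookup. The band edges below are the dangerous-failure-per-hour ranges as we understand them from ISO 13849-1 and should be checked against Figure 3. Treating any frequency below the lower edge of band e as an e rating is our assumption, chosen to match the reading of the 10 amber codon result above.

```python
# (level, lower bound inclusive, upper bound exclusive), in failures per hour.
# Assumed from ISO 13849-1; verify against Figure 3 before relying on them.
PL_BANDS = [
    ("a", 1e-5, 1e-4),
    ("b", 3e-6, 1e-5),
    ("c", 1e-6, 3e-6),
    ("d", 1e-7, 1e-6),
    ("e", 1e-8, 1e-7),
]


def performance_level(failures_per_hour):
    """Return the PL letter for a failure frequency, or None if unranked."""
    if failures_per_hour >= 1e-4:
        return None  # too failure prone to appear on the PL ranking
    for level, lower, upper in PL_BANDS:
        if lower <= failures_per_hour < upper:
            return level
    return "e"  # assumption: better than the e band still counts as e
```

For example, `performance_level(5e-5)` gives an a rating, while a frequency of `1e-3` falls outside the ranking altogether.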
Conclusion
The performance level system usually functions together with the required performance level (PLr), which specifies the PL ranking that parts should reach to be safe for a particular purpose. By following the flow chart below we can ascertain the PL rating that semantically contained parts should attain.
At the first junction, semantic containment clearly falls into slight severity of injury, as an instance of failure in our scenario is the production of an mRNA molecule from a gene. Because the cell continually attempts to read our semantic containment parts, the frequency of exposure to hazard is frequent-to-continuous. Finally, avoiding the hazard is scarcely possible, as we are unable to affect wild type cells in the environment to, for example, reduce their transcription rate. Hence, to fulfil the safety criteria of the PLr system, our semantic containment parts should attain PL c. As stated in the results, our 10 amber codon part far exceeds this with a top ranked e rating. To put this into perspective, the probability of failure of a 10 amber codon semantic containment part is roughly equal to that of being struck by lightning in consecutive years.
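The three decisions walked through above can be written as a lookup table. The mapping below follows the risk graph of ISO 13849-1 as it is commonly presented, with S for severity, F for frequency of exposure and P for possibility of avoidance. It is an assumption on our part that the flow chart referred to here matches it exactly, and the function names are hypothetical.

```python
# Keys: (severity, frequency of exposure, possibility of avoidance).
# S1 slight / S2 serious; F1 seldom / F2 frequent-to-continuous;
# P1 possible under conditions / P2 scarcely possible.
RISK_GRAPH = {
    ("S1", "F1", "P1"): "a",
    ("S1", "F1", "P2"): "b",
    ("S1", "F2", "P1"): "b",
    ("S1", "F2", "P2"): "c",
    ("S2", "F1", "P1"): "c",
    ("S2", "F1", "P2"): "d",
    ("S2", "F2", "P1"): "d",
    ("S2", "F2", "P2"): "e",
}

PL_ORDER = "abcde"  # lowest to highest performance level


def required_performance_level(severity, frequency, avoidance):
    """Look up the required performance level (PLr) for one hazard."""
    return RISK_GRAPH[(severity, frequency, avoidance)]


def meets_requirement(achieved_pl, required_pl):
    """True if the achieved PL is at least as high as the required PL."""
    return PL_ORDER.index(achieved_pl) >= PL_ORDER.index(required_pl)
```

With the choices made in the text, `required_performance_level("S1", "F2", "P2")` returns c, and `meets_requirement("e", "c")` is true for the 10 amber codon part.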
Ordinal Logistic Regression Classifier
Introduction
Despite the robustness of our semantic containment system, it is important to plan for scenarios of failure, no matter how unlikely. By using an ordinal logistic regression classifier we are able to predict which of our semantic containment parts an organism is expressing, based on its growth curve at different antibiotic concentrations. This allows us to predict which of our semantic containment parts may have been transferred by horizontal gene transfer.
Methodology
Ordinal variables are both discrete and ordered; for example, a temperature scale of cold, tepid, hot is composed of ordinal values. The number of amber codons in our semantic containment part is, for the purposes of this classifier, the ordinal variable which we wish to predict. Here we used the ordinal logistic regression (OLR) functionality of the Python library Mord. Mord follows the methodology set out by McCullagh et al. 1980. In order to train our classifier and evaluate its success we used k-fold cross validation. k-fold cross validation separates the input data (in our case growth curves) into k groups (in our case k=5). In each training iteration, k-1 groups are used to train the classifier whilst the remaining group is used as test data to assess model performance. Model performance is then evaluated as an average across all folds. Operating in this manner means every sample is used for both training and testing, which makes good use of a limited data set and gives a more reliable estimate of performance on totally unseen data than a single train/test split.
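The k-fold procedure described above can be sketched without any external library. In this sketch the function names are hypothetical, the score is mean absolute error between predicted and true amber codon counts (the metric actually used is not stated), and a trivial median predictor stands in for the classifier. An ordinal model from Mord, for example `mord.LogisticAT`, could be wrapped in `fit` and `predict` callables instead; which Mord variant was used is not stated either.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


def cross_validate(features, labels, fit, predict, k=5, seed=0):
    """Return the mean absolute error averaged across k folds."""
    fold_scores = []
    for train, test in k_fold_indices(len(labels), k, seed):
        model = fit([features[i] for i in train], [labels[i] for i in train])
        predictions = predict(model, [features[i] for i in test])
        errors = [abs(p - labels[i]) for p, i in zip(predictions, test)]
        fold_scores.append(sum(errors) / len(errors))
    return sum(fold_scores) / len(fold_scores)


def fit_median(train_features, train_labels):
    """Stand-in 'classifier': remembers the median training label."""
    return sorted(train_labels)[len(train_labels) // 2]


def predict_median(model, test_features):
    """Predict the remembered median label for every test sample."""
    return [model] * len(test_features)
```

A call such as `cross_validate(growth_curves, codon_counts, fit_median, predict_median, k=5)` would then give a baseline error against which an ordinal classifier can be compared, where `growth_curves` and `codon_counts` are placeholder names for the input data.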
Results