Difference between revisions of "Team:Edinburgh UG/Semantic Containment Modelling"

 

Latest revision as of 01:54, 18 October 2018

Edinburgh iGEM 2018

Semantic Containment Modelling

Semantic Containment Failure Rate

Introduction

In any safety system it is vitally important to evaluate how likely failures are to occur, both to quantify risk and to assess the fitness of the system; our semantic containment system is no different. We wanted to assess our semantic containment system in a way that is comparable to safety systems in other disciplines. The performance level (PL) system [1] was set in 2006 to assess the suitability of electronic parts used in control systems. It ranks parts in 5 categories defined by the probability of a dangerous failure within an hour, so we produced a mathematical model to assess our semantic containment system in the same manner.

To place the semantic containment system on the PL ranking we need to calculate the probability of a failure occurring in the system during a 1 hour time period. For our purposes we define a failure of the system as a successful read through of a semantically contained part within a wild type organism, resulting in translation of a complete polypeptide. For a successful read through to occur, every reassigned amber codon within the part has to be bound by a serine amber suppressor tRNA rather than a Release Factor 1 protein during the time span of a single read.

Mass Action Equations

We developed a system of mass action equations which describe amber codons in 3 distinct states: unbound, bound by serine amber suppressor tRNA (supD), and bound by Release Factor 1 (RF1). Codons move between these states according to the binding and unbinding rates of supD and RF1 [2][3].
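The original equation figure is not reproduced in this text, so the following is only a sketch of what a three-state mass action scheme of this kind typically looks like. The symbols are assumptions introduced here for illustration: C is an unbound amber codon, S is supD, R is RF1, and the k terms are the corresponding binding (+) and unbinding (-) rate constants.

```latex
\begin{align}
C + S &\underset{k_{-S}}{\overset{k_{+S}}{\rightleftharpoons}} CS
  && \text{(codon bound by \textit{supD})} \\
C + R &\underset{k_{-R}}{\overset{k_{+R}}{\rightleftharpoons}} CR
  && \text{(codon bound by RF1)}
\end{align}
```

In this sketch a codon can only switch between the two bound states by first returning to the unbound state.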

Ordinary Differential Equations

From our mass action equations we derived a system of ordinary differential equations (ODEs) that, when solved, describe how the number of codons in each state changes over time. The system of ODEs was evaluated over a 24 hour time period, with rates drawn from normal distributions whose means were the affinities we took from literature. For each parameter setting the ratio between the different codon states was logged and used to calculate the probability that consecutive codons are all bound by serine amber suppressor tRNA. This probability is the same as the probability of a single successful read through of our semantically contained part. The time required for a single read through can be calculated from the traversal speed of a ribosome and the gene length, which gives the number of possible reads per hour. Multiplying this by the probability of a single successful read through gives the average frequency of failures per hour required by the performance level system.
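The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used to produce the figures: the rate values, the 10% standard deviation, the forward Euler integrator, the choice of S / (S + R) as the per-codon ratio, and all function names are hypothetical, and the binding rates are treated as pseudo-first-order (free supD and RF1 concentrations folded into the rate).

```python
import random

def codon_state_fractions(k_on_supd, k_off_supd, k_on_rf1, k_off_rf1,
                          hours=24.0, dt=0.005):
    """Forward Euler integration of the three-state codon model.

    Rates are per hour (hypothetical units). Returns the fractions of
    codons that are unbound, supD-bound and RF1-bound after `hours`.
    """
    unbound, supd_bound, rf1_bound = 1.0, 0.0, 0.0
    for _ in range(int(hours / dt)):
        # Net flux from the unbound state into each bound state.
        to_supd = k_on_supd * unbound - k_off_supd * supd_bound
        to_rf1 = k_on_rf1 * unbound - k_off_rf1 * rf1_bound
        unbound -= (to_supd + to_rf1) * dt
        supd_bound += to_supd * dt
        rf1_bound += to_rf1 * dt
    return unbound, supd_bound, rf1_bound

def sample_rate(rng, mean, rel_sd=0.1):
    """Draw a rate from a normal distribution, kept strictly positive."""
    return max(rng.gauss(mean, rel_sd * mean), 1e-9)

def read_through_probability(n_codons, mean_rates, runs=50, seed=0):
    """Average probability that all n amber codons are bound by supD."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        rates = [sample_rate(rng, mean) for mean in mean_rates]
        _, supd_bound, rf1_bound = codon_state_fractions(*rates)
        # Assumed ratio: share of bound codons that carry supD.
        p_single = supd_bound / (supd_bound + rf1_bound)
        total += p_single ** n_codons
    return total / runs

def failures_per_hour(p_read, codons_per_second, gene_length_codons):
    """Failure frequency = P(successful read) x possible reads per hour."""
    reads_per_hour = 3600.0 * codons_per_second / gene_length_codons
    return p_read * reads_per_hour

if __name__ == "__main__":
    # Placeholder mean rates: supD on/off, RF1 on/off (not literature values).
    means = (2.0, 10.0, 20.0, 5.0)
    for n in (1, 5, 10):
        p = read_through_probability(n, means, runs=20)
        print(n, p, failures_per_hour(p, 15.0, 300.0))
```

Raising the per-codon probability to the power of the number of amber codons treats each codon as an independent event, which is what makes the read through probability fall so quickly as codons are added.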

Results

Figure 1 displays the probability of a successful read through of a semantically contained part for 1 to 16 amber codons.

We can observe from Figure 1 that, as could be expected, the probability of a successful read through decreases dramatically as more amber codons are added. The probability range widens as more amber codons are considered, which results from increased variance across runs as larger and larger pools of amber codons are considered.

Failures per Hour

Figure 2 displays the average frequency of failure per hour, the metric upon which performance level is assessed.

Figure 3 displays the probability of dangerous failures per hour corresponding to each level of the performance level system. [1]

Comparing Figures 2 and 3, we can see that the P1003* part (1 amber codon) is too failure prone to be placed on the PL system's ranking, and our 5 amber codon part sneaks into the ranking with an 'a' rating (the lowest performance level). Our 10 amber codon part, however, performs exceedingly well, far exceeding the performance required to achieve the top ranked 'e' rating.
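The comparison between a failure frequency and the PL bands can be written as a small lookup. Figure 3 is not reproduced in this text, so the band limits below are an assumption: they are the values commonly quoted for the performance level scheme (probability of dangerous failure per hour), and the function name is hypothetical.

```python
# Upper bound (exclusive) of each band, in dangerous failures per hour.
# Assumed values; check against Figure 3 / reference [1] before relying on them.
PL_UPPER_BOUNDS = [
    ("e", 1e-7),
    ("d", 1e-6),
    ("c", 3e-6),
    ("b", 1e-5),
    ("a", 1e-4),
]

def performance_level(failures_per_hour):
    """Return the PL letter for a failure frequency, or None if unranked.

    Frequencies below the 'e' band are reported as 'e', since they
    exceed the requirement for the top rating.
    """
    for level, upper in PL_UPPER_BOUNDS:
        if failures_per_hour < upper:
            return level
    return None  # too failure prone to be placed on the ranking

if __name__ == "__main__":
    for rate in (1e-3, 5e-5, 2e-6, 1e-12):
        print(rate, performance_level(rate))
```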

Conclusion

The performance level system usually functions together with the required performance level (PLr) [1], which gives the PL ranking that parts should reach to be safe for a particular purpose. By following the flow chart below [1] we can ascertain the PL rating that semantically contained parts should attain.

At the first junction semantic containment clearly falls into 'slight severity of injury', as an instance of failure in our scenario is the production of a complete polypeptide from a gene. Because the cell continually attempts to read our semantic containment parts, the frequency of exposure to hazard is 'frequent-to-continuous'. Finally, the possibility of avoiding hazard is 'scarcely possible', as we are unable to affect wild type cells in the environment to, for example, reduce their translation rate. Hence, to fulfil the safety criteria of the PLr system we should be able to attain PL level 'c' with our semantic containment parts. As stated in the results, our P1003 10* part far exceeds this with a top ranked 'e' rating. To put this into perspective, the probability of failure of a P1003 10* semantic containment part is roughly equal to that of being struck by lightning in 2 consecutive years [4].
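The three decisions above can be expressed as a lookup table. The flow chart itself is not reproduced in this text, so the table below is an assumption based on the risk graph as it is commonly presented for PLr (S = severity, F = frequency of exposure, P = possibility of avoidance); the names are hypothetical.

```python
# Assumed PLr risk graph: S1 slight / S2 serious injury,
# F1 seldom / F2 frequent-to-continuous exposure,
# P1 avoidable under conditions / P2 scarcely possible to avoid.
PLR_GRAPH = {
    ("S1", "F1", "P1"): "a",
    ("S1", "F1", "P2"): "b",
    ("S1", "F2", "P1"): "b",
    ("S1", "F2", "P2"): "c",
    ("S2", "F1", "P1"): "c",
    ("S2", "F1", "P2"): "d",
    ("S2", "F2", "P1"): "d",
    ("S2", "F2", "P2"): "e",
}

PL_ORDER = "abcde"  # 'a' is the lowest level, 'e' the highest

def required_performance_level(severity, frequency, avoidance):
    """Look up the required performance level for one path through the graph."""
    return PLR_GRAPH[(severity, frequency, avoidance)]

def meets_requirement(achieved, required):
    """True if the achieved PL is at least as high as the required PL."""
    return PL_ORDER.index(achieved) >= PL_ORDER.index(required)

if __name__ == "__main__":
    # Path taken in the text: slight injury, frequent-to-continuous, scarcely possible.
    plr = required_performance_level("S1", "F2", "P2")
    print(plr, meets_requirement("e", plr))
```

With this table the path chosen in the text gives a required level of 'c', which an 'e' rated part satisfies and an 'a' rated part does not.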

Ordinal Logistic Regression Classifier

Introduction

Despite the robustness of our semantic containment system, it is important to plan for scenarios of failure no matter how unlikely. By using an ordinal logistic regression classifier we are able to predict which of our semantic containment parts an organism is expressing, based on its growth curve at different antibiotic concentrations. This allows us to predict which of our semantic containment parts may have been transferred by horizontal gene transfer.

Methodology

Ordinal variables are both discrete and ordered; for example, a temperature scale of cold, tepid, hot consists of ordinal values. The number of amber codons in our semantic containment part is, for the purposes of this classifier, the ordinal variable which we wish to predict. Here we used the ordinal logistic regression (OLR) functionality of the Sklearn Python library [5]. Unfortunately our OLR classifier did not perform particularly well, making correct predictions only 55% of the time on a test data set. Whilst this is significantly above random performance, the results are not particularly encouraging. This can mostly be put down to the use of too small a dataset, as logistic regression classifiers are commonly trained on large datasets of hundreds or thousands of instances.
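For readers unfamiliar with ordinal logistic regression, one common formulation is the proportional odds (cumulative logit) model, sketched below. The text does not state which formulation the classifier used, so this is given as background under that assumption. The symbols are introduced here for illustration: x is the feature vector derived from a growth curve, y is the amber codon class among K ordered classes, w is a weight vector shared across classes, and the thresholds are ordered cut points.

```latex
\begin{align}
P(y \le j \mid x) &= \frac{1}{1 + \exp\!\left(-(\theta_j - w^{\top} x)\right)},
  \qquad j = 1, \dots, K-1, \qquad \theta_1 < \theta_2 < \dots < \theta_{K-1} \\
P(y = j \mid x) &= P(y \le j \mid x) - P(y \le j-1 \mid x),
  \qquad P(y \le 0 \mid x) = 0, \quad P(y \le K \mid x) = 1
\end{align}
```

The predicted class is the j with the largest P(y = j | x). Because the weights are shared and only the cut points differ between classes, the model respects the ordering of the classes in a way that an unordered multi-class classifier does not.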

References

  1. PL and PLr systems and figures - https://www.keyence.co.uk/ss/products/safetyknowledge/performance/level/
  2. Hetrick, Lee, Joseph, Kinetics of Stop Codon Recognition by Release Factor 1, Biochemistry 2009 - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2789991/
  3. Fluitt, Pienaar, Viljoen, Ribosome Kinetics and aa-tRNA Competition Determine Rate and Fidelity of Peptide Synthesis, Comp. Biol. Chem. 2007 - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2727733/#
  4. Probability of being struck by lightning - https://news.nationalgeographic.com/news/2004/06/flash-facts-about-lightning/
  5. Sklearn - http://scikit-learn.org/stable/
