Difference between revisions of "Team:Toulouse-INSA-UPS/Model"

Line 46: Line 46:
 
<ul class="nav nav-pills d-flex flex-row flex-wrap flex-lg-nowrap justify-content-around justify-content-lg-start" role="tablist">
 
<ul class="nav nav-pills d-flex flex-row flex-wrap flex-lg-nowrap justify-content-around justify-content-lg-start" role="tablist">
 
<li class="nav-item">
 
<li class="nav-item">
<a class="nav-link btn btn-secondary rounded-top corner-bottom mr-1 active" data-toggle="pill" href="#Model_Part1" role="tab" style="color:white !important;">Why model our protein?</a>
+
<a class="nav-link btn btn-secondary rounded-top corner-bottom mr-1 active" data-toggle="pill" href="#Model_Part1" role="tab" style="color:white !important;">Why model our platform?</a>
 
</li>
 
</li>
 
<li class="nav-item">
 
<li class="nav-item">
<a class="nav-link btn btn-secondary rounded-top corner-bottom mr-1" data-toggle="pill" href="#Model_Part2" role="tab" style="color:white !important;">How does it work?</a>
+
<a class="nav-link btn btn-secondary rounded-top corner-bottom mr-1" data-toggle="pill" href="#Model_Part2" role="tab" style="color:white !important;">Material and Methods</a>
 
</li>
 
</li>
 
<li class="nav-item">
 
<li class="nav-item">
<a class="nav-link btn btn-secondary rounded-top corner-bottom mr-1" data-toggle="pill" href="#Model_Part3" role="tab" style="color:white !important;">How did we model it?</a>
+
<a class="nav-link btn btn-secondary rounded-top corner-bottom mr-1" data-toggle="pill" href="#Model_Part3" role="tab" style="color:white !important;">Results</a>
 
</li>
 
</li>
 
<li class="nav-item">
 
<li class="nav-item">
Line 67: Line 67:
 
<!--PART 1 -->
 
<!--PART 1 -->
 
<div class="tab-pane fade show active" id="Model_Part1" role="tabpanel">
 
<div class="tab-pane fade show active" id="Model_Part1" role="tabpanel">
                     <h2 id="Why" class="heavy">Why do we want to model our protein?</h2>
+
                     <h2 id="Why" class="heavy">Why do we want to model our platform?</h2>
 
                     <hr/>
 
                     <hr/>
                     <p>Our novel fusion protein contains three binding sites: Streptavidin (SAv), a Carbohydrate Binding Module (CBM3a) and an unnatural amino acid, azidophenylanine (AzF). These “heads” of our Cerberus are connected by flexible regions that are 30-50 amino acids each. The linkers constitute Intrinsically Disordered Regions of our protein, or IDRs, due to this flexibility. These are known complex domains that lack a fixed or ordered 3D structure, and that can occur when predicting and modelling large proteins.</p>
+
                     <p>Our new molecular binding platform contains three binding sites: Streptavidin (mSA2), a Carbohydrate Binding Module (CBM3a) and an unnatural amino acid, azidophenylanine (AzF). These “heads” of our Cerberus are connected by flexible regions, composed of 30-50 amino acids each. These regions called linkers are described in the literature to be highly flexible and as disordered regions <sup>1 2</sup>.</p>
                    <img style="width : 100%; heigth = auto;" src="https://static.igem.org/mediawiki/2018/3/3d/T--Toulouse-INSA-UPS--Desc--OurSystem.png" class="figure-img img-fluid rounded" alt="A schematic representation of our system">
+
<img style="width : 100%; height: auto;" src="https://static.igem.org/mediawiki/2018/3/3d/T--Toulouse-INSA-UPS--Desc--OurSystem.png" class="figure-img img-fluid rounded" alt="A schematic representation of our system">
                     <p>
+
                     <p style="font-size: 15px" ><i><b>Figure 1:</b> General design of our Cerberus protein</i></p>
                    If possible, defining a range within which the linkers remain would give us a better idea of how the protein behaves in situ. This task presents a particular challenge, as their structures have never been successfully resolved by crystallography.</p>
+
<p>For more information about our platform, visit the <a href="https://2018.igem.org/Team:Toulouse-INSA-UPS/Description" >Description</a> or <a href="https://2018.igem.org/Team:Toulouse-INSA-UPS/Design">Design</a> pages.</p>
                     <p>As our linkers between domains are so flexible, it was possible that the CBM3a and SAv heads could be brought into contact. Their tertiary structures could also collapse due to new interactions between them. We needed to ensure that there will be no such interactions between these sites that would prevent them from binding to their intended ligands.
+
 
 +
                   
 +
                     <p>As our linkers between domains are so flexible, it was possible that the CBM3a and mSA2 heads could be brought into contact. Interactions between the different molecular components of the platform could disturb its capacities to bind the target ligands. Their tertiary structures could also not be stable in the overall macromolecular assembly. In order to try to answer to all these questions, we implemented multi-step and multi-level molecular modeling strategy.
 
                     </p>
 
                     </p>
                     <p>Incorporating an unnatural amino acid into a protein is a complex task not just for molecular biologists, but also for molecular modellers. Most protein prediction and modelling algorithms use predefined force field parameters for the 20 canonical amino acids, which simplify calculations. Using AzF meant building the 3D structure, minimising its potential energy then generating a specific force field for it. Rather than go through this whole pipeline multiple times, we opted to construct it bonded with the DBCO-Fluorescein, the fluorophore that we were using in the wetlab, from the start.</p>
+
                     <p>First, we built a 3D structure model of our binding platform and we assessed its stability over time. We also evaluated the distance between the different domains and its variations, which provide information about the size of the ligands that can be bound to the platform. We studied the interaction of our binding platform with target ligands. Especially, large-scale conformational sampling and exploration methods were used to assess the conformational rearrangements of these 3D structure models (the binding platform, either  free or in interaction with the target ligands) over time, hopefully maintaining their binding capacities.</p>
 
                     <p>
 
                     <p>
                    Finally, evaluating the potential interactions between ligands attached to our protein was the final step of our modelling process. For this, we had to evaluate the average distance between the different domains and its variations, which will also provide information about the maximum size of the ligands that we can use.</p>
+
On this wiki page, we detail the different steps of our molecular modeling strategy and our thought process behind the solutions that we chose to use.</p>
                    <p>
+
 
                     All of these questions can be answered through molecular modelling. However, each step requires its own approach and software. On this wiki page, we will detail these steps and our thought process behind the solutions that we chose to use.</p>
+
<h2 class="heavy">Strategy</h2>
 +
<p>To face all these challenges, we conceived a multi-level and multi-step molecular modelling workflow. Here is a graphical summary of this workflow:</p>
 +
 
 +
                     <img src="https://static.igem.org/mediawiki/2018/e/ef/T--Toulouse-INSA-UPS--Callum-Model-workflow.png" alt="workflow sketch" />
 +
                    <p style="font-size: 15px" ><i><b>Figure 2:</b> Workflow of our molecular modelling approach</i></p>
 +
                   
 +
                    <h3 class="heavy">References</h3>
 +
 
 +
                    <i>
 +
                        <ol style=" font-size:15px;">
 +
                      <li>Gunnoo et al., Nanoscale Engineering of Designer Cellulosomes, 2016</li>
 +
                        <li> E. Morag et al., Structure of a family 3a carbohydrate-binding module from the cellulosomal scaffoldin CipA of Clostridium thermocellum with flanking linkers: implications for cellulosome structure, 2013</li>
 +
                        </ol>
 +
                        </i>
 
                      
 
                      
 
             </div>
 
             </div>
  
 
             <div class="tab-pane fade" id="Model_Part2" role="tabpanel">
 
             <div class="tab-pane fade" id="Model_Part2" role="tabpanel">
                     <h2 id="HowWork" class="heavy">How does (protein) molecular modelling work?</h2>
+
                     <h2 id="MandM" class="heavy">Material and Methods</h2>
 
                     <hr/>
 
                     <hr/>
 +
<h3 class="heavy" >Available X-ray data</h3>
 +
                    <p>3D structures of the CBM3a and mSA2 protein domains are available on the Protein Data Bank (PDB). We chose to use 4JO5 <sup>2</sup> and 4JNJ <sup>3</sup>, both solved  by X-ray cristallography, with a resolution of 1.98 and 1.90 Å respectively.</p>
 +
                    <p>After obtaining these files, we faced two problems. First, the linkers connecting our heads had never been resolved by biophysics methods, due to their flexibility and length. Secondly, azidophenylalanine (AzF) had never been modelled, in its native state or covalently bonded to another molecule. This process will be detailed further on in this page.</p>
 +
 +
                    <h3 class="heavy">3D structure prediction</h3>
 
                     <p>
 
                     <p>
                     As biochemistry techniques give us a snapshot of the structure of a molecule, molecular modelling tends to integrate the multiple information and interactions possible within a molecule and between a molecule and its environment over time. In order to model a protein, we first need to understand the chemical basis forming it. </p>
+
                     This did however present a complex challenge for the prediction methods used. Our protein is 366 amino acids in length, containing two protein domains structurally organised in protein secondary structures (CBM, mSA2) and two peptide linkers, one connecting the CBM and SAv and the other one free on the C-terminus of the protein. The linkers are known in the literature to be highly flexible <sup>1 2</sup>> and as Intrinsically Disordered Regions (or IDRs) and therefore their structures have never been successfully resolved by biophysics methods. </p>
                   
+
                    <p>Protein structures roughly consist of a polyamide main chain with functional ramifications decorating it. The energetical stability of the bonds generally determines the underlying structure. It is important to define the different sources of conformational variability both within the main chain and on the lateral functions. Each of these many bonds presents a large amount of variation, intoducing multiple dimensions in which the full molecule can evolve.</p>
+
                   
+
                    <img src="https://static.igem.org/mediawiki/2018/3/3c/T--Toulouse-INSA-UPS--peptide_bond_angles.jpg" alt="bond rotations" width="45%" height="45%">
+
                    <p style="font-size: 15px" ><i><b>Figure 1:</b> Peptide bond between two amino acids.</i></p>
+
                   
+
                    <p>First, the distance between two atoms involved in a covalent bond can shift. Once three atoms are involved, we also need to study the angle separating the two distal atoms. Adding a fourth atom means needing to evaluate the torsion of the middle bond. This determines how far the fourth atoms deviates from the plane formed by the other three, on top of the previous parameters. This is considering the case where all four are connected sequentially. If not, the torsion is said to be improper.</p>
+
                   
+
                    <div class="row">
+
                      <div class="imgcolumn">
+
                        <img src="https://static.igem.org/mediawiki/2018/c/ca/T--Toulouse-INSA-UPS--Team--Callum-Model-bonds.png" alt="Screenshot from Sophie's slides showing these distances" />
+
                      </div>
+
                      <div class="imgcolumn">
+
                        <img src="https://static.igem.org/mediawiki/2018/1/16/T--Toulouse-INSA-UPS--Team--Callum-Model-torsions.png" alt="Screenshot from Sophie's slides showing these distances" />
+
                      </div>
+
                    </div>
+
                    <p style="font-size: 15px" ><i><b>Figure 2:</b> Variables involved for neighbouring atoms (length of bond, angle, torsion, improper torsion)</i></p>
+
                   
+
                    <p> These are the main factors to calculate for neighbouring atoms. When considering two distant atoms, we must also include a number interactions. </p>
+
                    <p> Hydrogen bonds are extremely common along the various folds of a protein, often maintaining its structure. If two atoms are close enough together, the Van Der Waals force plays a considerable role in defining the distance between them. Charged amino acids encountering another amino acid of opposite charge creates ionic interactions, which can appear or disappear depending on the pH of the surrounding environment.</p>
+
                   
+
                    <div class="row">
+
                      <div class="imgcolumn">
+
                        <img src="https://static.igem.org/mediawiki/2018/9/92/T--Toulouse-INSA-UPS--Team--Callum-Model-interactions1.png" alt="Screenshot from Sophie's slides showing these distances" />
+
                      </div>
+
                      <div class="imgcolumn">
+
                        <img src="https://static.igem.org/mediawiki/2018/c/c9/T--Toulouse-INSA-UPS--Team--Callum-Model-interactions2.png" alt="Screenshot from Sophie's slides showing these distances" />
+
                      </div>
+
                    </div>
+
                    <p style="font-size: 15px" ><i><b>Figure 3:</b> Interactions between distant atoms (Van der Waals, ionic, hydrogen bonds)</i></p>
+
                   
+
                    <p> When seeking the correct conformation of a protein, algorithms seek to minimise its potential energy. Minimisation algorithms therefore constitute a founding step through which all molecular modelling projects must go. A lot of different search methods exist to find local minima of the energy function of a protein. The most widespread use the first derivative of the energy function to find when it is equal to 0 (Steepest Descent, Conjugated Gradient). These are generally quite memory efficient for the computers running them, but have variable convergence speeds when approaching a minimum. A more complex solution would be to employ the Newton-Raphson method, which uses the second derivative of the energy function. It does however demand much more power from the computer, but is very accurate when close to the energy minimum.</p>
+
                   
+
                    <p> Once a minimised structure of a protein is obtained, it is possible to run a molecular dynamics simulation to study it. This allows us to glimpse its behaviour over time. It is however a very long process, requiring many hours of calculations on a traditional computer. Upscaling it to multiple processors is generally compatible with molecular dynamics software, and very useful to save time.</p>
+
                   
+
                    <p> By applying a starting velocity to each atom, even a weak one, we can observe how they will interact and move over time. This of course requires calculating values for the energy involved in each interaction listed above. Doing so requires vast amounts of calculations, and a few nanoseconds of simulation requires hundreds of CPU hours. As proteins generally operate in solution, representing the solvent and surrounding cofactors yields higher quality results at the cost of adding even more atoms to simulate.</p>
+
 
                     <p>
 
                     <p>
                     It is important to remember however that this process will always remain a mathematical approximation of reality. As such, specific values like the exact strength of a ligand - receptor interaction or length of tertiary protein structures are near impossible to obtain, even in high precision simulations.
+
                     The three prediction software tools we chose to use were I-TASSER, Swiss Model and MODELLER. This allowed us to obtain a 3D structure model of our protein on which we could start calculations.</p>
                    </p>
+
                   
+
                       
+
            </div>
+
            <div class="tab-pane fade" id="Model_Part3" role="tabpanel">
+
                    <h2 id="HowModel" class="heavy">How did we model our protein?</h2>
+
                    <hr/>
+
                    <p>3D structures of the CBM3a and SAv domains are available on the Protein Data Bank (PDB). We chose to use 4JO5 <sup>X</sup> and 4JNJ <sup>X</sup>, both obtained by X-ray diffraction, with a resolution of 1.98 and 1.90 A respectively.</p>
+
                    <p>After obtaining these files, we faced two problems. First, the linkers connecting our heads had never been resolved by crystallography, meaning that all sorts of interactions between the two could theoretically occur. Second, azidophenylalanine had never been modelled, in its native state or clicked to another molecule. This process will be detailed further on in this page.</p>
+
                    <p>
+
                    We decided to start by modelling our Cerberus under the Orthos form, with a simple phenylalanine replacing the AzF residue. Orthos was Cerberus’ little brother in Greek mythology, with only two heads, like this version of our fusion protein. This task was completed through homology-driven prediction algorithms. The three we chose to use were I-TASSER, Swiss Model and MODELLER. This allowed us to obtain a 3D structure of our protein on which we could start calculations.</p>
+
                    <p>
+
                    From their website: “I-TASSER is a hierarchical approach to protein structure and function prediction. It first identifies structural templates from the PDB by multiple threading approach LOMETS, with full-length atomic models constructed by iterative template fragment assembly simulations. Function insights of the target are then derived by threading the 3D models through protein function database BioLiP.”</p>
+
 
                     <p>
 
                     <p>
                     From their website: “SWISS-MODEL is a fully automated protein structure homology-modelling server, accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer).</p>
+
                     I-TASSER is an online platform hosted by the Zhang Lab (University of Michigan) that can perform 3D model protein prediction <sup>4</sup>. It computes it by searching reference template in the PDB database through the LOMETS server, filling the unidentified regions by replica-exchange Monte Carlo Simulation. It then assemble the input structure to the 10 most probable models found on the database. It performs optimization of the structure and search for protein domain enrichment search on different databases such as Gene Ontology. </p>
 
                     <p>
 
                     <p>
                     From their website: “MODELLER is used for homology or comparative modeling of protein three-dimensional structures. The user provides an alignment of a sequence to be modeled with known related structures and MODELLER automatically calculates a model containing all non-hydrogen atoms.</p>
+
                     SWISS-MODEL service proposes a protein sequence homology based 3D modeling <sup>5</sup>. It performs sequence homology research through BLAST and on the SWISS MODEL Template Library to begin the structural assembly. If there is no match on the database, the algorithm performs a Monte Carlo technique to find the best conformation.</p>
 
                     <p>
 
                     <p>
                     Only Modeller provided us with a satisfactory structure, most likely due to its improved de novo prediction capacities and the usage of restraints on the final model. The other two algorithms struggled to suitably resolve the position of the linkers. </p>
+
                     MODELLER is an software for protein 3D structure modeling. It uses homology for template reference but also implements a structure reference based prediction of the most potent conformation <sup>6</sup>. </p>
 
                      
 
                      
                     <img src="https://static.igem.org/mediawiki/2018/5/5c/T--Toulouse-INSA-UPS--Team--Callum-Model-orthos.png" alt="3D structure of our Orthos protein" />
+
                     <h3 class="heavy">Simulated Annealing</h3>
                    <p style="font-size: 15px" ><i>Figure 4: Structure of our Orthos protein obtained with MODELLER.</i></p>
+
                    <hr/>
+
 
                      
 
                      
 
                     <p>
 
                     <p>
                     The LAAS, Laboratoire d'Analyses et d'Architecture des Systèmes, a French National Scientific Research Centre, developed a novel software package called Psf-Amc (ref: A. Estana, N. Sibille, E. Delaforge, M. Vaisset, J. Cortes, P. Bernado. Realistic ensemble models of intrinsically disordered proteins using a structure-encoding coil database. Structure, in press.). </p>
+
                     When dealing with a protein as large and complex as ours, 3D prediction approaches described above  are not sufficient for obtaining an adequate structure. We then performed simulated annealing on the 3D model obtained through the previous approach. This process involves first a protein energy minimisation, then a series of heating cooling. This enables it to exit a local energy minimum and re-settle in a more global energy minimum.</p>
 
                     <p>
 
                     <p>
                     In robotics, by giving all the needed information about the various articulations in a mobile limb, AI software can calculate all positions which the arm can adopt. The LAAS applied this to proteins by defining subchains of the protein instead of individual atoms like other modelling software. By applying these  possibilities and basic physics and chemistry equations to the different possible positions, they can very rapidly calculate hundreds of approximate models of the protein </p>
+
                     Simulating Annealing was performed using sander, from the Amber16 suite. Amber is a suite of biomolecular simulation programs used both for generating parameters for molecules to be simulated and for simulating those molecules over time ⁷. It was first released in the late 1970s and is kept up to date consistently and features a large range of tutorials and forums to help users. All simulations performed with Amber16 used the ff14SB force field for topology and parameters of standard amino acids in the protein...</p>
 
                     <p>
 
                     <p>
                     Using the Orthos model, two algorithms were applied to it, the first using Single Residue Sampling (SRS) and the second using Triple Residue Sampling (TRS). </p>
+
                     The minimisation phase was performed over 5,000 cycles to find the lowest energy point, holding the streptavidin and CBM3a domains fixed.</p>
                   
+
 
                     <img src="https://static.igem.org/mediawiki/2018/2/22/T--Toulouse-INSA-UPS--Team--Callum-Model-SRS_dist_slow.gif" alt="SRS simulations GIF" />
+
                     <p>Orthos was heated for 200 picoseconds at 330K with an integration time of 2 femtoseconds, then once more at 310K with the same time parameters.</p>
                   
+
                    <div class="row">
+
                      <div class="imgcolumn">
+
                    <img src="https://static.igem.org/mediawiki/2018/6/61/T--Toulouse-INSA-UPS--Callum-Model-SRS_distance_distribution.png" alt="SRS distance distribution" />
+
                      </div>
+
                      <div class="imgcolumn">
+
                    <img src="https://static.igem.org/mediawiki/2018/7/79/T--Toulouse-INSA-UPS--Callum-Model-TRS_distance_distribution.png" alt="TRS distance distribution" />
+
                      </div>
+
                    </div>
+
                   
+
                    <div class="row">
+
                      <div class="imgcolumn">
+
                    <img src="https://static.igem.org/mediawiki/2018/2/22/T--Toulouse-INSA-UPS--Callum-Model-SRS_RMSD_distribution.png" alt="SRS RMSD distribution" />
+
                      </div>
+
                      <div class="imgcolumn">
+
                    <img src="https://static.igem.org/mediawiki/2018/4/40/T--Toulouse-INSA-UPS--Callum-Model-TRS_RMSD_distribution.png" alt="TRS RMSD distribution" />
+
                      </div>
+
                    </div>
+
                   
+
                    <div class="row">
+
                      <div class="imgcolumn">
+
                        <img src="https://static.igem.org/mediawiki/2018/4/45/T--Toulouse-INSA-UPS--Team--Callum-Model-heads_SRS.png" alt="Different positions the SAv head can adopt, from SRS" style="width:100%" />
+
                      </div>
+
                      <div class="imgcolumn">
+
                        <img src="https://static.igem.org/mediawiki/2018/e/e9/T--Toulouse-INSA-UPS--Team--Callum-Model-heads_TRS.png" alt="Different positions the SAv head can adopt, from TRS" style="width:100%" />
+
                      </div>
+
                    </div>
+
 
                      
 
                      
 +
                    <h3 class="heavy">IDR Conformational Sampling and Exploration</h3>
 +
 
                     <p>
 
                     <p>
                     The figures shown directly above were generated by loading 5 different structures calculated by the LAAS, then aligning the CBM3a domain of each of them. This presents the stability of our protein.</p>
+
                     The challenge here was the complexity of the IDRs. During the previous phase of our study, we were faced with the limitations of classic protein modelling methods.  These approaches generally search for secondary structures (helices and sheets) connected by short flexible connectors (3-10 residues), but when faced with a long region lacking in secondary structures, they fail to predict an appropriate structure.</p>
 
                     <p>
 
                     <p>
                     As you seen on the figures, both of these simulations show the range of positions that the streptavidin head can take in relation to the CBM3a, also demonstrating the high flexibility of the linkers. </p>
+
                     To overcome this limitation, we therefore needed to find an innovative method well-suited to predict the 3D conformation of long IDRs. </p>
 
                     <p>
 
                     <p>
                     With these structures, we were able to choose the most probable overall conformation of our Cerberus protein to continue modelling. The next step was to add in the functional molecules that we wished to attach to our protein.</p>
+
                     The LAAS, Laboratoire d'Analyses et d'Architecture des Systèmes, a French National Scientific Research Centre, developed original approaches for efficient large-scale molecular conformational sampling  and exploration. These methods rely on a modelisation of molecular structures as polyarticulated mechanisms <sup>8</sup>. Based on this modelisation, efficient algorithms derived from robotics have been adapted to explore molecular rearrangements. These algorithms allow avoiding expensive molecular modeling methods and accessing macromolecular rearrangements, thus overcoming limitations of traditional approaches. </p>
 
                     <p>
 
                     <p>
                     For the next step, we needed to determine a viable 3D model of azidophenylalanine. We opted to build it clicked onto a DBCO-fluorescein molecule, to simulate it functionalised from the start. This proved harder to accomplish than originally planned.</p>
+
                     These methods have recently been extended to explore IDRs and generate realistic conformational ensembles, and are integrated into the Psf-Amc library developed at LAAS. By disregarding secondary structure and seeking simply the angles between amino acids in an IDR, their approach builds protein chains incrementally from N- to C-termini. A database of experimentally-determined angles between tripeptides is sampled for each step, and the angle selected in this manner is added to the chain, then checked for collisions.</p>
                     <p>The basic chemical structure was drawn using Avogadro, an advanced molecular editor and visualiser, from a drawing of the AzF-DBCO-fluorescein molecule. It was also briefly minimised here for optimal geometric conformation. Next, LeAP, part of the AMBER suite (described below), was used to generate the force field of the molecule. This process calculates the various physical properties (mass, charge, energy, …) of each atom in the structure, to be reused for the molecular dynamics step. </p>
+
                     <p>Two strategies can be used. The first, Single Residue Sampling (SRS), considers each amino acid individually to find appropriate angles, ignoring the residues flanking it (except if it is followed by a proline). The second, Triple Residue Sampling (TRS), takes the neighbouring amino acids into account when selecting the angles to add to the chain. This does require additional constraints applied onto the selection, to respect the angles found when adding the previous amino acid. The TRS strategy requires more calculation time, but is better at finding partially structured regions in the protein chain <sup>8</sup>. </p>
                     <p>The complex amino acid then had to be integrated as a part of the protein, which occurs quite rarely in protein modelling. Usually, canonical amino acids are used, which the modelling algorithms understand simply. However, here, we needed to align the structure of our AzF residue with that of the phenylalanine at the end of the C-Ter linker, remove the phenyl then add the AzF with all the correct bonds (and none of the wrong ones) to the neighbouring amino acids in the chain.</p>
+
 
 +
                    <h3 class="heavy">AzF Building and Parameterization</h3>
 +
 
 +
                     <p>Incorporating an unnatural amino acid into a protein is a complex task not just for molecular biologists, but also for molecular modellers. Most protein prediction and modelling algorithms use predefined force field parameters for the 20 canonical amino acids, which simplify parameterization. Using AzF meant building the 3D structure and then generating specific parameters for it. Rather than go through this whole pipeline multiple times, we opted to construct it bonded with the DBCO-Fluorescein, the fluorophore that we were using in the wetlab. This proved harder to accomplish than originally planned.</p>
 
                     <p>
 
                     <p>
                     Biotin is a well known case study of ligand - receptor, so building and parametrising it was quick to perform. Again, we used the AMBER force field generating software to create the necessary files. Placing it into the correct position in the streptavidin binding site was used by aligning our streptavidin domain and aligning that with the original PDB file, which contained the biotin position obtained by X-ray analysis.
+
                     A first 3D structure was built using Avogadro, an advanced molecular editor and visualiser <sup>9</sup>, from a 2D model of AzF-DBCO-fluorescein molecule generated by ChemDraw. Next, the parameters of the molecule were generated using the GAFF force field <sup>10</sup> and different modules (antechamber, LeAP, parmchk) of the AMBER biomolecular software suite. The main difficulty here was to incorporate the functionalised AzF residue as a part of the main protein, which involves creating covalent bonds to the neighbouring amino acids.
 
                     </p>
 
                     </p>
 
                      
 
                      
 
                     <p>
 
                     <p>
                     From their website: “The term "Amber" refers to two things. First, it is a set of molecular mechanical force fields for the simulation of biomolecules (these force fields are in the public domain, and are used in a variety of simulation programs). Second, it is a package of molecular simulation programs which includes source code and demos.”  
+
                     Here, we needed to align the structure of our AzF residue with that of the phenylalanine near the end of the C-ter linker, remove the phenylalanine then add the AzF with all the correct bonds (and none of the wrong ones).   
 
                     </p>
 
                     </p>
 
                     <p>
 
                     <p>
                     From their website: “NAMD, recipient of a 2002 Gordon Bell Award and a 2012 Sidney Fernbach Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.”.</p>
+
                     Biotin is a well known case study of ligand - receptor, so building and parameterizing it was easier to perform. Additionally, this did not require any new bonds in our protein, as the interaction is not covalent. Again, we used the antechamber force field of AMBER18 to create the necessary files. Placing it into the correct position in the streptavidin binding site was used by aligning our streptavidin domain with the X-ray structure of the streptavidin-biotin complex (PDB: 4JNJ <sup>3</sup>). </p>
 +
 
 +
                    <h3 class="heavy">Explicit Solvent Model</h3>
 
                      
 
                      
                     <p>The minimisation phase employs a different calculation method to determine the optimal local minimum with greater precision than previously. It consists of multiple sub-phases, with the first one holding the protein firmly to avoid deviations, then slowly releasing it to allow it to relax into a suitable position.</p>
+
                     <p>To consider the protein as it behaves in its environment, we decided to simulate our platform explicitly in its solvent. This means adding a great number of water molecules around it, using the TIP3P model, and calculating values for all of these new interactions, instead of simply simulating the general physical and chemical conditions of the solvent. Due to the great flexibility of our linkers, we needed to generate the box large enough to accommodate the full length of our Cerberus protein when these linkers are fully stretched out. We therefore left a minimum distance of 20 Å around our protein when generating the water box to contain it, in the shape of a truncated octahedron. </p>
                     <p>The heating phase brings the protein past the bumps in front of local minima to find a more global minimum. This is necessary for understanding how it reacts to temperature with energy available around and inside it.</p>
+
                     <p>In total, this amounted to over 40,000 water molecules in our model, and 127,995 atoms with the protein. Incorporating this number of new atoms into out model meant adding millions, maybe even billions of new interactions.</p>
 
                     <p>The equilibration phase decreases the temperature of the protein and its solvent to allow it to settle back down into a local medium, usually a lower one than the previous minimum attained.</p>
 
                     <p>The equilibration phase decreases the temperature of the protein and its solvent to allow it to settle back down into a local medium, usually a lower one than the previous minimum attained.</p>
                     <p>The production phase is when the main part of the simulation occurs. While the other phases last fraction of a nanosecond, this one takes place over tens of nanoseconds, sometimes up to a few milliseconds. This allows us to view how the protein structure changes over time with all the factors taken into account.</p>
+
                     <p>5 Na<sup>+</sup> ions were also added to neutralise the protein.</p>
 
                      
 
                      
                     <p>Simulations were conducted first on Amber18 for the minimisation, heating and equilibration phases of the dynamics workflow, and NAMD for the production phase due to its higher multiprocessor capacities. The first three steps were run on 80 processors in parallel, whereas production was performed on 1000.</p>
+
                     <p>This model represents the full Cerberus system. It features the unnatural amino acid incorporated into the C-terminal linker, functions added onto the two binding sites that we wished to explore and characterise and a complete simulation environment containing water molecules and sodium ions. We were then ready to prepare it for the molecular dynamics phase.</p>
 +
 
 +
                    <h3 class="heavy">Molecular Dynamics</h3>
 
                      
 
                      
                     <p> Two separate production phases were performed. The first simulated the native protein. The second featured a constraint keeping the sulfur atom of the biotin molecule 7 A from the alpha carbon of the nearby XXX amino acid, to be sure to keep it within range of its associated binding site.
+
                     <p>Preparation of molecular dynamics simulations consisted of initial minimization steps (steepest descent and conjugate gradient methods) where atomic positions of solute were first restrained using a harmonic potential. The force constant was then progressively diminished along the minimization procedure until a final unrestrained minimization step. We first minimised our system over 5,000 steps with restraints on the whole protein to minimise the solvent and ions, then reduced the restraints gradually over five minimisation cycles, for 1,000 steps each cycle. One final minimisation was performed with no restraints for 10,000 steps.
 
                     </p>
 
                     </p>
                    <img src="https://static.igem.org/mediawiki/2018/5/5b/T--Toulouse-INSA-UPS--Team--Callum-Model-5step_dist.gif" alt="GIF of our protein simulation, maybe even a film?" />
 
 
                     <p>  
 
                     <p>  
                     Above, you cans see a video generated from the structural data generated by our simulation. By observing its general structure, we can see that the linker sections are indeed very flexible but seem to adopt some preferred positions. For example, the N-terminal linkers sticks to one side of the CBM3a domain due to hydrogen interactions. We can also see the biotin molecule exiting its binding site after around 15ns.</p>
+
                     The minimization procedure was then followed by a slow heating to 310 K over a period of 0.1ns, with a moderate restraint placed on the protein. At the final required temperature, the system was equilibrated for 100 ps using constant temperature (310 K) and pressure (1 bar) conditions via the Berendsen algorithm with a coupling constant of 0.2 ps for both parameters.  Along the equilibration, the restraints on the protein were gradually diminished. Electrostatic interactions were calculated using the Particle-mesh Ewald method with a nonbonded cutoff of 10 A. All bonds involving hydrogen were constrained with the SHAKE algorithm, permitting the use of 2 fs time steps to integrate the equations of motions. </p>
 
                     <p>
 
                     <p>
                     We also tracked the distance between the alpha carbon of the azidophenylalanine residue and the XXX residue, in the binding site of the streptavidin domain. Extracting the raw numbers for each frame displayed allowed us to study its evolution closely, as you can see on these plots:</p>
+
                     Next came the production phase, where we can observe how the model behaves over a large period of time.  Depending on the size of the system and the resources available, this phase can last from a few nanoseconds up to a few hundred ns. Due to the explicit solvent in our model and the size of our platform, we needed vast amounts of processing power to simulate it for a long time.</p>
                   
+
                    <div class="row">
+
                      <div class="imgcolumn">
+
                        <img src="https://static.igem.org/mediawiki/2018/5/5f/T--Toulouse-INSA-UPS--Callum-Model-free_distance.png" alt="Plots showing the amazing distance between our two binding heads, wow they're so pretty!!" />
+
                      </div>
+
                      <div class="imgcolumn">
+
                        <img src="https://static.igem.org/mediawiki/2018/b/b2/T--Toulouse-INSA-UPS--Callum-Model-cnstr_distance.png" alt="Plots showing the amazing distance between our two binding heads, wow they're so pretty!!" />
+
                      </div>
+
                    </div>  
+
 
                      
 
                      
 
                     <p>
 
                     <p>
                     The Root Mean Square Deviation (RMSD) calculates the sum of the distances between each atom of a molecule, compared to the first frame of the simulation. Through this, we can study the varibility of our protein and, by selecting only specific residues, the variability of certain regions of it. This way, we can compare the data obtained for the structured domains and the IDRs to remark on the formers stability and the latters flexibility.
+
                     Thanks to CALMIP, a regional high performance calculation mesocentre, we were able to model our platform for a sufficient period and obtain useable results. We used 100,000 CPU hours on their servers for our simulations, generating over 85GB of data.  
 
                     </p>
 
                     </p>
                     <img src="https://static.igem.org/mediawiki/2018/2/2b/T--Toulouse-INSA-UPS--Callum-Model-simulation_RMSD.png" alt="plots for showing the RMSD of CBM3a vs the N-ter linker" />
+
                     <p>Two separate production phases were performed. The first involved our Cerberus protein in its water box, with no constraints  on any region of the system. It ran for 120ns. The second featured a distance constraint between the sulfur atom of the biotin molecule and the alpha carbon of the nearby glycine 14 residue, to keep it within range of its associated binding site. This one ran for 80ns. </p>
 +
                    <p>Both simulations were performed at constant temperature (310K) and pressure (1 bar) and with an integration time step of 2fs. </p>
 +
                    <p>NAMD is a parallel molecular dynamics code designed for high-performance simulation for large systems <sup>11</sup>. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. It is based on Charm++ parallel object, and scales well on up to thousands of cores for the largest simulations.  NAMD is compatible with the Amber force fields generated during the previous phases, it was therefore easy to reuse these parameters. </p>
 +
                    <p>Simulations were performed on Amber16 for the minimisation, heating and equilibration phases of the dynamics workflow, and NAMD for the production phase due to its higher multiprocessor capacities. The first three steps were run on 80 processors in parallel, whereas production was performed on 1000. </p>
 +
 
 +
                    <h3 class="heavy">References</h3>
 +
 
 +
                    <i><ol>
 +
                        <li>Gunnoo et al., Nanoscale Engineering of Designer Cellulosomes, 2016</li>
 +
                        <li>E. Morag et al., Structure of a family 3a carbohydrate-binding module from the cellulosomal scaffoldin CipA of Clostridium thermocellum with flanking linkers: implications for cellulosome structure, 2013</li>
 +
                        <li>D. Demonte et al. , Structure-based engineering of streptavidin monomer with a reduced biotin dissociation rate, 2013</li>
 +
                        <li>J Yang et al. , The I-TASSER Suite: Protein structure and function prediction, 2015</li>
 +
                        <li>A. Waterhouse et al., SWISS-MODEL: homology modelling of protein structures and complexes, 2018</li>
 +
                        <li>B. Webb, A. Sali, Comparative Protein Structure Modeling Using Modeller, 2016</li>
 +
                        <li>R. Salomon-Ferrer, D.A. Case, R.C. Walker, An overview of the Amber biomolecular simulation package, 2013.</li>
 +
                        <li>A. Estana, N. Sibille, E. Delaforge, M. Vaisset, J. Cortes, P. Bernado, Realistic ensemble models of intrinsically disordered proteins using a structure-encoding coil database, in press (submitted Sep 2018).</li>
 +
                        <li>M. Hanwell et al., Avogadro: An advanced semantic chemical editor, visualization, and analysis platform, 2012</li>
 +
                        <li>J. Wang, Development and Testing of a General Amber Force Field, 2004</li>
 +
                        <li>James C. Phillips et al., Scalable molecular dynamics with NAMD,  2005</li>
 +
                    </ol></i>
 
                        
 
                        
 
             </div>
 
             </div>
 +
            <div class="tab-pane fade" id="Model_Part3" role="tabpanel">
 +
                    <h2 id="Results" class="heavy">Results</h2>
 +
                    <hr/>
 +
                    <h3 class="heavy">3D Structure Prediction</h3>
 +
                    <p>Only Modeller provided us with a satisfactory structure, most likely due to its improved de novo prediction capacities and the usage of restraints on the final model. The other two algorithms struggled to suitably resolve the position of the linkers.</p>
 +
 +
                    <img src="https://static.igem.org/mediawiki/2018/5/5c/T--Toulouse-INSA-UPS--Team--Callum-Model-orthos.png" alt="3D structure of our Orthos protein from Modeller" />
 +
                    <p style="font-size: 15px" ><i><b>Figure 3:</b> 3D structure of Orthos obtained from Modeller </i></p>
 +
                    <p>We can see that the linkers seem to tangle together quite significantly.
 +
                    </p>
 +
 +
                    <h3 class="heavy">Simulated Annealing</h3>
 +
                    <p>After completing these phases, we obtained this structure:</p>
 +
                    <img src="https://static.igem.org/mediawiki/2018/3/3e/T--Toulouse-INSA-UPS--Callum-Model-orthos_heat.png" alt="3D structure of our Orthos protein from Sim Anneal" />
 +
                    <p style="font-size: 15px" ><i><b>Figure 4:</b> 3D structure of Orthos obtained after simulated annealing</i></p>
 +
                    <p>We can see that once again, the linkers are considerably tangled. We wanted to investigate the possibilities further, which is why we continued the strategy with the robotics inspired methods.</p>
 +
 +
                    <h3 class="heavy">IDR Conformational Sampling and Exploration</h3>
 +
                    <p>Both the SRS and TRS methods were applied to Orthos, generating 1,000 structures each.</p>
 +
                    <p>Thanks to this large scale sampling of possible conformations, we conducted a statistical analysis studying the distance between the alpha carbon of the phenylalanine 366 and the alpha carbon of the glutamic acid 32 of the streptavidin binding site and the global RMSD of the protein.</p>
 +
 +
                    <img src="https://static.igem.org/mediawiki/2018/2/22/T--Toulouse-INSA-UPS--Team--Callum-Model-SRS_dist_slow.gif" alt="SRS simulations GIF" />
 +
                    <p style="font-size: 15px" ><i><b>Figure 5:</b> Various positions of the mSA2 domain (in red) in relation to the CBM3a (in green) as calculated by the SRS strategy developped by the LAAS, showing the distance between the mSA2 binding site and AzF alpha carbon (dotted line)</i></p>
 +
                 
 +
                    <div class="row">
 +
                    <div class="imgcolumn">
 +
                      <img src="https://static.igem.org/mediawiki/2018/4/45/T--Toulouse-INSA-UPS--Team--Callum-Model-heads_SRS.png" alt="Different positions the SAv head can adopt, from SRS" style="width:100%" />
 +
                    </div>
 +
                    <div class="imgcolumn">
 +
                      <img src="https://static.igem.org/mediawiki/2018/e/e9/T--Toulouse-INSA-UPS--Team--Callum-Model-heads_TRS.png" alt="Different positions the SAv head can adopt, from TRS" style="width:100%" />
 +
                    </div>
 +
                    <p style="font-size: 15px" ><i><b>Figure 6:</b> Various positions of the mSA2 domain (in blue or red) in relation to the CBM3a (in green) as calculated by the SRS strategy (left) and the TRS strategy (right)</i></p>
 +
                  </div>
 +
                   
 +
                    <div class="row">
 +
                    <div class="imgcolumn">
 +
                  <img src="https://static.igem.org/mediawiki/2018/6/61/T--Toulouse-INSA-UPS--Callum-Model-SRS_distance_distribution.png" alt="SRS distance distribution" />
 +
                    </div>
 +
                    <div class="imgcolumn">
 +
                  <img src="https://static.igem.org/mediawiki/2018/7/79/T--Toulouse-INSA-UPS--Callum-Model-TRS_distance_distribution.png" alt="TRS distance distribution" />
 +
                    </div>
 +
                    </div>
 +
                    <p style="font-size: 15px" ><i><b>Figure 7:</b> Distribution of the mSA2 binding site - AzF distance for each conformation calculated by the SRS strategy (left) and TRS strategy (right)</i></p>
 +
                 
 +
                    <div class="row">
 +
                    <div class="imgcolumn">
 +
                  <img src="https://static.igem.org/mediawiki/2018/2/22/T--Toulouse-INSA-UPS--Callum-Model-SRS_RMSD_distribution.png" alt="SRS RMSD distribution" />
 +
                    </div>
 +
                    <div class="imgcolumn">
 +
                  <img src="https://static.igem.org/mediawiki/2018/4/40/T--Toulouse-INSA-UPS--Callum-Model-TRS_RMSD_distribution.png" alt="TRS RMSD distribution" />
 +
                    </div>
 +
                    </div>
 +
                    <p style="font-size: 15px" ><i><b>Figure 8:</b> Distrubtion of the global RMSD for each conformation calculated by the SRS strategy (left) and TRS strategy (right) </i></p>
 +
 +
                    <p>The figures shown directly above were generated by loading 5 different structures calculated by the LAAS, then aligning the CBM3a domain of each of them. As you seen on the figures, both of these simulations show the range of positions that the streptavidin head can take in relation to the CBM3a, also demonstrating the high flexibility of the linkers.</p>
 +
                    <p>We selected the most represented overall conformation of our protein, and therefore the most probable, by analysing these parameters to continue our strategy.</p>
 +
 +
                    <h3 class="heavy">Molecular Dynamics</h3>
 +
 +
                    <img src="https://static.igem.org/mediawiki/2018/5/5b/T--Toulouse-INSA-UPS--Team--Callum-Model-5step_dist.gif" alt="GIF of our protein simulation" />
 +
                    <p style="font-size: 15px" ><i><b>Figure 9: Restrained biotin simulation, showing the distance between the mSA2 binding site and the AzF alpha carbon (dotted line)</b> </i></p>
 +
 +
                    <p>Above, you can see a video generated from the structural data generated by taking snapshots every 0.5ns along our simulation. By observing its general structure, we can see that the linker sections are indeed very flexible but seem to adopt some preferred positions.</p>
 +
                    <p>For example, the N-terminal linkers sticks to one side of the CBM3a domain due to hydrogen interactions. We can also see the biotin molecule exiting its binding site after around 15ns.</p>
 +
                    <p>We also tracked the distance between the alpha carbon of the azidophenylalanine residue and the glutamic acid 32 residue. Extracting the raw numbers for each frame displayed allowed us to study its evolution closely, as you can see on these plots:</p>
 +
 +
                    <div class="row">
 +
                        <div class="imgcolumn">
 +
                          <img src="https://static.igem.org/mediawiki/2018/5/5f/T--Toulouse-INSA-UPS--Callum-Model-free_distance.png" alt="Plots showing the amazing distance between our two binding heads, wow they're so pretty!!" />
 +
                        </div>
 +
                        <div class="imgcolumn">
 +
                          <img src="https://static.igem.org/mediawiki/2018/b/b2/T--Toulouse-INSA-UPS--Callum-Model-cnstr_distance.png" alt="Plots showing the amazing distance between our two binding heads, wow they're so pretty!!" />
 +
                        </div>
 +
                    <p style="font-size: 15px" ><i><b>Figure 10:</b> Evolution over time of the distance between the mSA2 binding site and AzF alpha carbon for the free biotin (left) and restrained biotin (right) simulations</i></p>
 +
                      </div>
 +
 +
                    <p>The graphs indicate that this distance stays between 45 and 70 Å. This is a very satisfactory result, showing that we can design uses for our platform both with separate functions used independently and with two halves of a molecule that need to be assembled to produce a function. We had also considered using FRET to test this distance experimentally, which is now possible seeing as the distance between these two heads remain at an appropriate distance to observe interactions of that nature.</p>
 +
 +
                    <p>The Root Mean Square Deviation (RMSD) calculates the sum of the geometric distances between each atom of a molecule, compared to the first frame of the simulation. Through this, we can study the variability of our protein and, by selecting only specific residues, the variability of certain regions of it. </p>
 +
 +
 +
                    <img src="https://static.igem.org/mediawiki/2018/2/2b/T--Toulouse-INSA-UPS--Callum-Model-simulation_RMSD.png" alt="plots for showing the RMSD of CBM3a vs the N-ter linker" />
 +
                    <p style="font-size: 15px" ><i><b>Figure 11:</b> Evolution of the RMSD of the different domains of our Cerberus protein over time: N-terminal linker in yellow, mSA2 in red and CBM3a in green</i></p>
 +
                    <p>The structured domains keep their stability over time, with the RMSD of the CBM3a staying at around 2 Å throughout the simulation, and the streptavidin at around 4 Å.</p>
 +
                    <p>The N-terminal linker however shows great flexibility, never keeping the same position during the time period studied here, with an RMSD peaking at close to 60 Å. The C-terminal linker presents a similar behaviour, with peaks at around 40 Å, it is not shown here for clarity’s sake. </p>
 +
 +
 +
            </div>
 +
 
             <div class="tab-pane fade" id="Model_Part4" role="tabpanel">
 
             <div class="tab-pane fade" id="Model_Part4" role="tabpanel">
 
                     <h2 id="Conclusion" class="heavy">Conclusion</h2>
 
                     <h2 id="Conclusion" class="heavy">Conclusion</h2>
 
                     <hr/>
 
                     <hr/>
                    <p> Calculations took a total of 100,000 hours of CPU time. Over 85GB of data was generated. </p>
+
                     <p>Thanks to these results, we are able to determine through multiple methods that out molecular binding platform behaves as planned when we designed it. Using the LAAS’ innovative robotics inspired methods, we overcame the challenge presented by the Intrinsically Disordered Regions. This novel approach circumvented the limitations that classic modelling software encounters and allowed us to progress in our project.</p>
                     <p> Thanks to these results, we are able to certify that our functional domains remain appropriately structured and retain their binding properties. Although we observe that the biotin molecule becomes released from the Streptavidin docking site, this is most likely due to small changes that occur in the arrangement of the various tertiary structures that compose it. However, it is important to remember that all of these data are obtained through mathematical approximations, and that it is important to verify them <i>in situ</i>. It would be extremely difficult to certifiably determine how affine for biotin our SAv domain is in the final structure through molecular modelling alone.</p>
+
                     <p>By measuring the RMSD of each domain of our protein and following their positions over the duration of our simulations, we can confirm that our linker regions are too flexible to resolve a definite conformation. Despite this extreme variability, we never observed entanglement between them, nor did we see them bring the binding sites of our domains into contact. With this information, we can also be sure that the linkers shouldn't adopt a conformation where these binding sites become unavailable to their ligands.</p>
                     <p>By measuring their RMSDs and following their positions over the duration of our simulations, we can confirm that our linker regions are too flexible to resolve a definite conformation. Despite this exteme variability, we never observed entanglement between them, nor did we see them bring the binding sites of our domains into contact. With this information, we can also be sure that the linkers shouldn't adopt a conformation where these binding sites become unavailable to their ligands. </p>
+
                    <p>Although we observe that the biotin molecule exits the streptavidin docking site during the molecular dynamics simulation, this occurs quite often in protein molecular modelling and is not a great source of concern for the viability of our platform. It is important to remember that all of these data are obtained through mathematical approximations, and that it is important to verify them <i>in situ</i>.</p>
                     <p> Finally, tracking the distance between the two distal heads showed that we could indeed attach medium to large funcitonal molecules to these heads without risking too many unwanted interactions. However, due to the fact that they remain within 100 A of each other, we could still imagine placing functions on the heads designed to interact with each other, for example to subunits of a protein that meet and activate under certain conditions.</p>
+
                     <p>Finally, tracking the distance between the two distal heads showed that we could indeed attach medium to large functional molecules to these heads without risking too many unwanted interactions. However, due to the fact that they remain within 100 Å of each other, we could still imagine placing functions on the heads designed to interact with each other, for example to subunits of a protein that meet and activate under certain conditions.</p>
 
                      
 
                      
 
             </div>
 
             </div>
Line 248: Line 297:
 
     </div>             
 
     </div>             
  
 
 
 
<h3>References</h3>
 
 
<i>
 
        <ol style=" font-size:15px;">
 
        <li> E. Morag et al., "Structure of a family 3a carbohydrate-binding module from the cellulosomal scaffoldin CipA of Clostridium thermocellum with flanking linkers: implications for cellulosome structure.", Acta Crystallogr Sect F Struct Biol Cryst Commun. 2013 Jul</li>
 
        <li>D. Demonte et al. , "Structure-based engineering of streptavidin monomer with a reduced biotin dissociation rate.", Proteins. 2013 Sep</li>
 
        </ol>
 
</i>
 
  
 
</div>
 
</div>

Revision as of 19:18, 16 October 2018

MOLECULAR MODELLING


Why do we want to model our platform?


Our new molecular binding platform contains three binding sites: Streptavidin (mSA2), a Carbohydrate Binding Module (CBM3a) and an unnatural amino acid, azidophenylanine (AzF). These “heads” of our Cerberus are connected by flexible regions, composed of 30-50 amino acids each. These regions called linkers are described in the literature to be highly flexible and as disordered regions 1 2.

A schematic representation of our system

Figure 1: General design of our Cerberus protein

For more information about our platform, visit the Description or Design pages.

As our linkers between domains are so flexible, it was possible that the CBM3a and mSA2 heads could be brought into contact. Interactions between the different molecular components of the platform could disturb its capacities to bind the target ligands. Their tertiary structures could also not be stable in the overall macromolecular assembly. In order to try to answer to all these questions, we implemented multi-step and multi-level molecular modeling strategy.

First, we built a 3D structure model of our binding platform and we assessed its stability over time. We also evaluated the distance between the different domains and its variations, which provide information about the size of the ligands that can be bound to the platform. We studied the interaction of our binding platform with target ligands. Especially, large-scale conformational sampling and exploration methods were used to assess the conformational rearrangements of these 3D structure models (the binding platform, either free or in interaction with the target ligands) over time, hopefully maintaining their binding capacities.

On this wiki page, we detail the different steps of our molecular modeling strategy and our thought process behind the solutions that we chose to use.

Strategy

To face all these challenges, we conceived a multi-level and multi-step molecular modelling workflow. Here is a graphical summary of this workflow:

workflow sketch

Figure 2: Workflow of our molecular modelling approach

References

  1. Gunnoo et al., Nanoscale Engineering of Designer Cellulosomes, 2016
  2. E. Morag et al., Structure of a family 3a carbohydrate-binding module from the cellulosomal scaffoldin CipA of Clostridium thermocellum with flanking linkers: implications for cellulosome structure, 2013

Material and Methods


Available X-ray data

3D structures of the CBM3a and mSA2 protein domains are available on the Protein Data Bank (PDB). We chose to use 4JO5 2 and 4JNJ 3, both solved by X-ray cristallography, with a resolution of 1.98 and 1.90 Å respectively.

After obtaining these files, we faced two problems. First, the linkers connecting our heads had never been resolved by biophysics methods, due to their flexibility and length. Secondly, azidophenylalanine (AzF) had never been modelled, in its native state or covalently bonded to another molecule. This process will be detailed further on in this page.

3D structure prediction

This did however present a complex challenge for the prediction methods used. Our protein is 366 amino acids in length, containing two protein domains structurally organised in protein secondary structures (CBM, mSA2) and two peptide linkers, one connecting the CBM and SAv and the other one free on the C-terminus of the protein. The linkers are known in the literature to be highly flexible 1 2> and as Intrinsically Disordered Regions (or IDRs) and therefore their structures have never been successfully resolved by biophysics methods.

The three prediction software tools we chose to use were I-TASSER, Swiss Model and MODELLER. This allowed us to obtain a 3D structure model of our protein on which we could start calculations.

I-TASSER is an online platform hosted by the Zhang Lab (University of Michigan) that can perform 3D model protein prediction 4. It computes it by searching reference template in the PDB database through the LOMETS server, filling the unidentified regions by replica-exchange Monte Carlo Simulation. It then assemble the input structure to the 10 most probable models found on the database. It performs optimization of the structure and search for protein domain enrichment search on different databases such as Gene Ontology.

SWISS-MODEL service proposes a protein sequence homology based 3D modeling 5. It performs sequence homology research through BLAST and on the SWISS MODEL Template Library to begin the structural assembly. If there is no match on the database, the algorithm performs a Monte Carlo technique to find the best conformation.

MODELLER is an software for protein 3D structure modeling. It uses homology for template reference but also implements a structure reference based prediction of the most potent conformation 6.

Simulated Annealing

When dealing with a protein as large and complex as ours, 3D prediction approaches described above are not sufficient for obtaining an adequate structure. We then performed simulated annealing on the 3D model obtained through the previous approach. This process involves first a protein energy minimisation, then a series of heating cooling. This enables it to exit a local energy minimum and re-settle in a more global energy minimum.

Simulating Annealing was performed using sander, from the Amber16 suite. Amber is a suite of biomolecular simulation programs used both for generating parameters for molecules to be simulated and for simulating those molecules over time ⁷. It was first released in the late 1970s and is kept up to date consistently and features a large range of tutorials and forums to help users. All simulations performed with Amber16 used the ff14SB force field for topology and parameters of standard amino acids in the protein...

The minimisation phase was performed over 5,000 cycles to find the lowest energy point, holding the streptavidin and CBM3a domains fixed.

Orthos was heated for 200 picoseconds at 330K with an integration time of 2 femtoseconds, then once more at 310K with the same time parameters.

IDR Conformational Sampling and Exploration

The challenge here was the complexity of the IDRs. During the previous phase of our study, we were faced with the limitations of classic protein modelling methods. These approaches generally search for secondary structures (helices and sheets) connected by short flexible connectors (3-10 residues), but when faced with a long region lacking in secondary structures, they fail to predict an appropriate structure.

To overcome this limitation, we therefore needed to find an innovative method well-suited to predict the 3D conformation of long IDRs.

The LAAS, Laboratoire d'Analyses et d'Architecture des Systèmes, a French National Scientific Research Centre, developed original approaches for efficient large-scale molecular conformational sampling and exploration. These methods rely on a modelisation of molecular structures as polyarticulated mechanisms 8. Based on this modelisation, efficient algorithms derived from robotics have been adapted to explore molecular rearrangements. These algorithms allow avoiding expensive molecular modeling methods and accessing macromolecular rearrangements, thus overcoming limitations of traditional approaches.

These methods have recently been extended to explore IDRs and generate realistic conformational ensembles, and are integrated into the Psf-Amc library developed at LAAS. By disregarding secondary structure and seeking simply the angles between amino acids in an IDR, their approach builds protein chains incrementally from N- to C-termini. A database of experimentally-determined angles between tripeptides is sampled for each step, and the angle selected in this manner is added to the chain, then checked for collisions.

Two strategies can be used. The first, Single Residue Sampling (SRS), considers each amino acid individually to find appropriate angles, ignoring the residues flanking it (except if it is followed by a proline). The second, Triple Residue Sampling (TRS), takes the neighbouring amino acids into account when selecting the angles to add to the chain. This does require additional constraints applied onto the selection, to respect the angles found when adding the previous amino acid. The TRS strategy requires more calculation time, but is better at finding partially structured regions in the protein chain 8.

AzF Building and Parameterization

Incorporating an unnatural amino acid into a protein is a complex task not just for molecular biologists, but also for molecular modellers. Most protein prediction and modelling algorithms use predefined force field parameters for the 20 canonical amino acids, which simplify parameterization. Using AzF meant building the 3D structure and then generating specific parameters for it. Rather than go through this whole pipeline multiple times, we opted to construct it bonded with the DBCO-Fluorescein, the fluorophore that we were using in the wetlab. This proved harder to accomplish than originally planned.

A first 3D structure was built using Avogadro, an advanced molecular editor and visualiser 9, from a 2D model of AzF-DBCO-fluorescein molecule generated by ChemDraw. Next, the parameters of the molecule were generated using the GAFF force field 10 and different modules (antechamber, LeAP, parmchk) of the AMBER biomolecular software suite. The main difficulty here was to incorporate the functionalised AzF residue as a part of the main protein, which involves creating covalent bonds to the neighbouring amino acids.

Here, we needed to align the structure of our AzF residue with that of the phenylalanine near the end of the C-ter linker, remove the phenylalanine then add the AzF with all the correct bonds (and none of the wrong ones).

Biotin is a well known case study of ligand - receptor, so building and parameterizing it was easier to perform. Additionally, this did not require any new bonds in our protein, as the interaction is not covalent. Again, we used the antechamber force field of AMBER18 to create the necessary files. Placing it into the correct position in the streptavidin binding site was used by aligning our streptavidin domain with the X-ray structure of the streptavidin-biotin complex (PDB: 4JNJ 3).

Explicit Solvent Model

To consider the protein as it behaves in its environment, we decided to simulate our platform explicitly in its solvent. This means adding a great number of water molecules around it, using the TIP3P model, and calculating values for all of these new interactions, instead of simply simulating the general physical and chemical conditions of the solvent. Due to the great flexibility of our linkers, we needed to generate the box large enough to accommodate the full length of our Cerberus protein when these linkers are fully stretched out. We therefore left a minimum distance of 20 Å around our protein when generating the water box to contain it, in the shape of a truncated octahedron.

In total, this amounted to over 40,000 water molecules in our model, and 127,995 atoms with the protein. Incorporating this number of new atoms into out model meant adding millions, maybe even billions of new interactions.

The equilibration phase decreases the temperature of the protein and its solvent to allow it to settle back down into a local medium, usually a lower one than the previous minimum attained.

5 Na+ ions were also added to neutralise the protein.

This model represents the full Cerberus system. It features the unnatural amino acid incorporated into the C-terminal linker, functions added onto the two binding sites that we wished to explore and characterise and a complete simulation environment containing water molecules and sodium ions. We were then ready to prepare it for the molecular dynamics phase.

Molecular Dynamics

Preparation of molecular dynamics simulations consisted of initial minimization steps (steepest descent and conjugate gradient methods) where atomic positions of solute were first restrained using a harmonic potential. The force constant was then progressively diminished along the minimization procedure until a final unrestrained minimization step. We first minimised our system over 5,000 steps with restraints on the whole protein to minimise the solvent and ions, then reduced the restraints gradually over five minimisation cycles, for 1,000 steps each cycle. One final minimisation was performed with no restraints for 10,000 steps.

The minimization procedure was then followed by a slow heating to 310 K over a period of 0.1ns, with a moderate restraint placed on the protein. At the final required temperature, the system was equilibrated for 100 ps using constant temperature (310 K) and pressure (1 bar) conditions via the Berendsen algorithm with a coupling constant of 0.2 ps for both parameters. Along the equilibration, the restraints on the protein were gradually diminished. Electrostatic interactions were calculated using the Particle-mesh Ewald method with a nonbonded cutoff of 10 A. All bonds involving hydrogen were constrained with the SHAKE algorithm, permitting the use of 2 fs time steps to integrate the equations of motions.

Next came the production phase, where we can observe how the model behaves over a large period of time. Depending on the size of the system and the resources available, this phase can last from a few nanoseconds up to a few hundred ns. Due to the explicit solvent in our model and the size of our platform, we needed vast amounts of processing power to simulate it for a long time.

Thanks to CALMIP, a regional high performance calculation mesocentre, we were able to model our platform for a sufficient period and obtain useable results. We used 100,000 CPU hours on their servers for our simulations, generating over 85GB of data.

Two separate production phases were performed. The first involved our Cerberus protein in its water box, with no constraints on any region of the system. It ran for 120ns. The second featured a distance constraint between the sulfur atom of the biotin molecule and the alpha carbon of the nearby glycine 14 residue, to keep it within range of its associated binding site. This one ran for 80ns.

Both simulations were performed at constant temperature (310K) and pressure (1 bar) and with an integration time step of 2fs.

NAMD is a parallel molecular dynamics code designed for high-performance simulation for large systems 11. NAMD was developed by the Theoretical and Computational Biophysics Group in the Beckman Institute for Advanced Science and Technology at the University of Illinois at Urbana-Champaign. It is based on Charm++ parallel object, and scales well on up to thousands of cores for the largest simulations. NAMD is compatible with the Amber force fields generated during the previous phases, it was therefore easy to reuse these parameters.

Simulations were performed on Amber16 for the minimisation, heating and equilibration phases of the dynamics workflow, and NAMD for the production phase due to its higher multiprocessor capacities. The first three steps were run on 80 processors in parallel, whereas production was performed on 1000.

References

  1. Gunnoo et al., Nanoscale Engineering of Designer Cellulosomes, 2016
  2. E. Morag et al., Structure of a family 3a carbohydrate-binding module from the cellulosomal scaffoldin CipA of Clostridium thermocellum with flanking linkers: implications for cellulosome structure, 2013
  3. D. Demonte et al. , Structure-based engineering of streptavidin monomer with a reduced biotin dissociation rate, 2013
  4. J Yang et al. , The I-TASSER Suite: Protein structure and function prediction, 2015
  5. A. Waterhouse et al., SWISS-MODEL: homology modelling of protein structures and complexes, 2018
  6. B. Webb, A. Sali, Comparative Protein Structure Modeling Using Modeller, 2016
  7. R. Salomon-Ferrer, D.A. Case, R.C. Walker, An overview of the Amber biomolecular simulation package, 2013.
  8. A. Estana, N. Sibille, E. Delaforge, M. Vaisset, J. Cortes, P. Bernado, Realistic ensemble models of intrinsically disordered proteins using a structure-encoding coil database, in press (submitted Sep 2018).
  9. M. Hanwell et al., Avogadro: An advanced semantic chemical editor, visualization, and analysis platform, 2012
  10. J. Wang, Development and Testing of a General Amber Force Field, 2004
  11. James C. Phillips et al., Scalable molecular dynamics with NAMD, 2005

Results


3D Structure Prediction

Only Modeller provided us with a satisfactory structure, most likely due to its improved de novo prediction capacities and the usage of restraints on the final model. The other two algorithms struggled to suitably resolve the position of the linkers.

3D structure of our Orthos protein from Modeller

Figure 3: 3D structure of Orthos obtained from Modeller

We can see that the linkers seem to tangle together quite significantly.

Simulated Annealing

After completing these phases, we obtained this structure:

3D structure of our Orthos protein from Sim Anneal

Figure 4: 3D structure of Orthos obtained after simulated annealing

We can see that once again, the linkers are considerably tangled. We wanted to investigate the possibilities further, which is why we continued the strategy with the robotics inspired methods.

IDR Conformational Sampling and Exploration

Both the SRS and TRS methods were applied to Orthos, generating 1,000 structures each.

Thanks to this large scale sampling of possible conformations, we conducted a statistical analysis studying the distance between the alpha carbon of the phenylalanine 366 and the alpha carbon of the glutamic acid 32 of the streptavidin binding site and the global RMSD of the protein.

SRS simulations GIF

Figure 5: Various positions of the mSA2 domain (in red) in relation to the CBM3a (in green) as calculated by the SRS strategy developped by the LAAS, showing the distance between the mSA2 binding site and AzF alpha carbon (dotted line)

Different positions the SAv head can adopt, from SRS
Different positions the SAv head can adopt, from TRS

Figure 6: Various positions of the mSA2 domain (in blue or red) in relation to the CBM3a (in green) as calculated by the SRS strategy (left) and the TRS strategy (right)

SRS distance distribution
TRS distance distribution

Figure 7: Distribution of the mSA2 binding site - AzF distance for each conformation calculated by the SRS strategy (left) and TRS strategy (right)

SRS RMSD distribution
TRS RMSD distribution

Figure 8: Distrubtion of the global RMSD for each conformation calculated by the SRS strategy (left) and TRS strategy (right)

The figures shown directly above were generated by loading 5 different structures calculated by the LAAS, then aligning the CBM3a domain of each of them. As you seen on the figures, both of these simulations show the range of positions that the streptavidin head can take in relation to the CBM3a, also demonstrating the high flexibility of the linkers.

We selected the most represented overall conformation of our protein, and therefore the most probable, by analysing these parameters to continue our strategy.

Molecular Dynamics

GIF of our protein simulation

Figure 9: Restrained biotin simulation, showing the distance between the mSA2 binding site and the AzF alpha carbon (dotted line)

Above, you can see a video generated from the structural data generated by taking snapshots every 0.5ns along our simulation. By observing its general structure, we can see that the linker sections are indeed very flexible but seem to adopt some preferred positions.

For example, the N-terminal linkers sticks to one side of the CBM3a domain due to hydrogen interactions. We can also see the biotin molecule exiting its binding site after around 15ns.

We also tracked the distance between the alpha carbon of the azidophenylalanine residue and the glutamic acid 32 residue. Extracting the raw numbers for each frame displayed allowed us to study its evolution closely, as you can see on these plots:

Plots showing the amazing distance between our two binding heads, wow they're so pretty!!
Plots showing the amazing distance between our two binding heads, wow they're so pretty!!

Figure 10: Evolution over time of the distance between the mSA2 binding site and AzF alpha carbon for the free biotin (left) and restrained biotin (right) simulations

The graphs indicate that this distance stays between 45 and 70 Å. This is a very satisfactory result, showing that we can design uses for our platform both with separate functions used independently and with two halves of a molecule that need to be assembled to produce a function. We had also considered using FRET to test this distance experimentally, which is now possible seeing as the distance between these two heads remain at an appropriate distance to observe interactions of that nature.

The Root Mean Square Deviation (RMSD) calculates the sum of the geometric distances between each atom of a molecule, compared to the first frame of the simulation. Through this, we can study the variability of our protein and, by selecting only specific residues, the variability of certain regions of it.

plots for showing the RMSD of CBM3a vs the N-ter linker

Figure 11: Evolution of the RMSD of the different domains of our Cerberus protein over time: N-terminal linker in yellow, mSA2 in red and CBM3a in green

The structured domains keep their stability over time, with the RMSD of the CBM3a staying at around 2 Å throughout the simulation, and the streptavidin at around 4 Å.

The N-terminal linker however shows great flexibility, never keeping the same position during the time period studied here, with an RMSD peaking at close to 60 Å. The C-terminal linker presents a similar behaviour, with peaks at around 40 Å, it is not shown here for clarity’s sake.

Conclusion


Thanks to these results, we are able to determine through multiple methods that out molecular binding platform behaves as planned when we designed it. Using the LAAS’ innovative robotics inspired methods, we overcame the challenge presented by the Intrinsically Disordered Regions. This novel approach circumvented the limitations that classic modelling software encounters and allowed us to progress in our project.

By measuring the RMSD of each domain of our protein and following their positions over the duration of our simulations, we can confirm that our linker regions are too flexible to resolve a definite conformation. Despite this extreme variability, we never observed entanglement between them, nor did we see them bring the binding sites of our domains into contact. With this information, we can also be sure that the linkers shouldn't adopt a conformation where these binding sites become unavailable to their ligands.

Although we observe that the biotin molecule exits the streptavidin docking site during the molecular dynamics simulation, this occurs quite often in protein molecular modelling and is not a great source of concern for the viability of our platform. It is important to remember that all of these data are obtained through mathematical approximations, and that it is important to verify them in situ.

Finally, tracking the distance between the two distal heads showed that we could indeed attach medium to large functional molecules to these heads without risking too many unwanted interactions. However, due to the fact that they remain within 100 Å of each other, we could still imagine placing functions on the heads designed to interact with each other, for example to subunits of a protein that meet and activate under certain conditions.