Team:Toulouse-INSA-UPS/Model

Modelling

Why do we want to model our protein?

Our novel fusion protein contains three binding sites: Streptavidin (SAv), a Carbohydrate Binding Module (CBM3a) and an unnatural amino acid, azidophenylalanine (AzF). These “heads” of our Cerberus are connected by flexible linkers of 30-50 amino acids each. Because of this flexibility, the linkers constitute Intrinsically Disordered Regions (IDRs) of our protein. IDRs are domains that lack a fixed or ordered 3D structure, which makes them difficult to handle when predicting and modelling large proteins.

Defining the range of space within which the linkers remain would give us a better idea of how the protein behaves in situ. This is a particular challenge, as the structure of such linkers has never been resolved by crystallography.

As the linkers between our domains are so flexible, it was possible that the CBM3a and SAv heads could be brought into contact. Their tertiary structures could also collapse due to new interactions between them. We needed to ensure that no such interactions between these sites would prevent them from binding to their intended ligands.

Incorporating an unnatural amino acid into a protein is a complex task not just for molecular biologists, but also for molecular modellers. Most protein prediction and modelling algorithms use predefined force field parameters for the 20 canonical amino acids, which simplifies calculations. Using AzF meant building its 3D structure, minimising its potential energy, then generating a specific force field for it. Rather than go through this whole pipeline multiple times, we opted to build it already bonded to the DBCO-fluorescein fluorophore that we were using in the wet lab.

The final step of our modelling process was to evaluate the potential interactions between ligands attached to our protein. For this, we had to evaluate the average distance between the different domains and its variations, which also provides information about the maximum size of the ligands that we can use.
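As a rough illustration of this kind of measurement, the sketch below computes the distance between the geometric centres of two domains in every frame of a trajectory, then its mean and standard deviation. The frame layout (a list of (x, y, z) tuples per frame), the atom index lists and the function names are assumptions made for the example; it is not taken from our actual analysis pipeline.

```python
import math
import statistics


def centroid(coords):
    """Geometric centre of a list of (x, y, z) coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def interdomain_distance_stats(frames, domain_a, domain_b):
    """Mean and spread of the centre-to-centre distance between two domains.

    frames   : list of frames, each a list of (x, y, z) tuples, one per atom
    domain_a : atom indices of the first domain (hypothetical selection)
    domain_b : atom indices of the second domain (hypothetical selection)
    """
    distances = []
    for frame in frames:
        centre_a = centroid([frame[i] for i in domain_a])
        centre_b = centroid([frame[i] for i in domain_b])
        distances.append(math.dist(centre_a, centre_b))
    # Population standard deviation: the frames are the whole data set here.
    return statistics.mean(distances), statistics.pstdev(distances)


# Toy trajectory: two frames, four atoms, two atoms per "domain".
toy_frames = [
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (10.0, 0.0, 0.0), (12.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (13.0, 0.0, 0.0), (15.0, 0.0, 0.0)],
]
mean_distance, spread = interdomain_distance_stats(toy_frames, [0, 1], [2, 3])
```

The mean gives an idea of the space available between two heads, while the spread reflects how much the flexible linkers let that distance vary.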

All of these questions can be answered through molecular modelling. However, each step requires its own approach and software. On this wiki page, we will detail these steps and our thought process behind the solutions that we chose to use.

How does (protein) molecular modelling work?

Whereas biochemical techniques give us a snapshot of the structure of a molecule, molecular modelling aims to integrate the many pieces of information and the interactions possible within a molecule and between a molecule and its environment. In order to model a protein, we first need to understand the chemical basis behind it.

Protein structures roughly consist of a polyamide main chain decorated with functional side chains. The energetic stability of the bonds generally determines the underlying structure. It is therefore important to define the different angles observed around the covalent bonds.

bond rotations
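A rotation around a bond is usually described by a dihedral (torsion) angle defined by four consecutive atoms. The sketch below shows one common way of computing it from Cartesian coordinates; the function names and the example points are illustrative only.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle (in degrees) around the p1-p2 bond."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Unit vector along the central bond.
    length = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / length, b1[1] / length, b1[2] / length)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Four points forming a quarter turn around the central bond.
quarter_turn = dihedral_degrees((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

In a protein backbone, the same calculation applied to consecutive backbone atoms gives the phi and psi angles that describe how the chain folds.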

List interactions between distant atoms (VdW, EEL, any others?)
Minimisation algorithms: theory, Steepest Descent vs Conjugate Gradient (cf Sophie’s classes, 2nd chapter)
Molecular dynamics: theory, Monte Carlo, explicit vs implicit solvent
Limitations
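To give an idea of what a minimisation algorithm does, here is a toy steepest descent on a single harmonic bond between two atoms. Everything in it is an assumption made for illustration: the force constant, the equilibrium length, the numerical gradient and the simple step-size rule are not the parameters or the implementation of any real modelling package.

```python
import math


def harmonic_bond_energy(coords, k=100.0, r0=1.5):
    """E = k * (r - r0)^2 for one bond between two atoms.

    coords is a flat list [x1, y1, z1, x2, y2, z2]; k and r0 are
    arbitrary illustrative values, not real force field parameters.
    """
    r = math.dist(coords[0:3], coords[3:6])
    return k * (r - r0) ** 2


def numerical_gradient(energy, coords, h=1e-6):
    """Central finite-difference gradient of the energy."""
    grad = []
    for i in range(len(coords)):
        plus = list(coords)
        minus = list(coords)
        plus[i] += h
        minus[i] -= h
        grad.append((energy(plus) - energy(minus)) / (2 * h))
    return grad


def steepest_descent(energy, coords, step=0.01, max_iter=1000, tol=1e-8):
    """Move downhill along the negative gradient until it vanishes."""
    coords = list(coords)
    current = energy(coords)
    for _ in range(max_iter):
        grad = numerical_gradient(energy, coords)
        norm = math.sqrt(sum(g * g for g in grad))
        if norm < tol:
            break
        trial = [c - step * g / norm for c, g in zip(coords, grad)]
        trial_energy = energy(trial)
        if trial_energy < current:
            # Accept the move and try a slightly longer step next time.
            coords, current = trial, trial_energy
            step *= 1.2
        else:
            # Overshot the minimum: retry with a shorter step.
            step *= 0.5
    return coords, current


# A bond stretched to 3.0 relaxes towards its equilibrium length of 1.5.
relaxed, final_energy = steepest_descent(
    harmonic_bond_energy, [0.0, 0.0, 0.0, 3.0, 0.0, 0.0]
)
```

Steepest descent always follows the local slope, which is robust far from the minimum but slow close to it; conjugate gradient methods reuse information from previous steps to converge faster in that final stage.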


How did we model our protein?

3D structures of the CBM3a and SAv domains are available on the Protein Data Bank (PDB). We chose to use 4JO5 (CBM3a-L domain with flanking linkers from scaffoldin cipA of cellulosome of Clostridium thermocellum) and 4JNJ (Structure based engineering of streptavidin monomer with a reduced biotin dissociation rate), both obtained by X-ray diffraction, with resolutions of 1.98 and 1.90 Å respectively.
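For readers unfamiliar with PDB files, the sketch below shows how the alpha-carbon coordinates can be read from the fixed-width ATOM records of such a file. The function name is hypothetical and the parser is deliberately minimal (it ignores alternate locations, multiple models and HETATM records).

```python
def read_ca_coordinates(pdb_lines, chain_id=None):
    """Return [(residue_number, (x, y, z)), ...] for the C-alpha atoms.

    pdb_lines : any iterable of lines in PDB format (for example an open file)
    chain_id  : optional one-letter chain identifier used as a filter
    """
    ca_atoms = []
    for line in pdb_lines:
        # Only standard residues; HETATM records (ligands, ions) are skipped.
        if not line.startswith("ATOM"):
            continue
        # Columns 13-16 hold the atom name.
        if line[12:16].strip() != "CA":
            continue
        # Column 22 holds the chain identifier.
        if chain_id is not None and line[21] != chain_id:
            continue
        residue_number = int(line[22:26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        ca_atoms.append((residue_number, xyz))
    return ca_atoms
```

With an entry saved locally (the file name here is an assumption), `read_ca_coordinates(open("4jo5.pdb"), chain_id="A")` would return one coordinate per residue of chain A.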

After obtaining these files, we faced two problems. First, the linkers connecting our heads had never been resolved by crystallography, meaning that any interactions between the two could theoretically occur. Second, azidophenylalanine had never been modelled, in its native state or clicked to another molecule. This process will be detailed further on in this page.

We decided to start by modelling our Cerberus in its Orthos form, with a simple phenylalanine replacing the AzF residue. In Greek mythology, Orthos was Cerberus’ little brother and had only two heads, like this version of our fusion protein. This task was completed using homology-driven prediction algorithms. The three we chose were I-TASSER, SWISS-MODEL and MODELLER. This gave us a 3D structure of our protein on which we could start calculations.

From their website: “I-TASSER is a hierarchical approach to protein structure and function prediction. It first identifies structural templates from the PDB by multiple threading approach LOMETS, with full-length atomic models constructed by iterative template fragment assembly simulations. Function insights of the target are then derived by threading the 3D models through protein function database BioLiP.”

From their website: “SWISS-MODEL is a fully automated protein structure homology-modelling server, accessible via the ExPASy web server, or from the program DeepView (Swiss Pdb-Viewer).”

From their website: “MODELLER is used for homology or comparative modeling of protein three-dimensional structures. The user provides an alignment of a sequence to be modeled with known related structures and MODELLER automatically calculates a model containing all non-hydrogen atoms.”

Only MODELLER provided us with a satisfactory structure, most likely due to its improved de novo prediction capacities and its use of restraints on the final model. The other two algorithms struggled to resolve the position of the linkers.


From their website: “The term "Amber" refers to two things. First, it is a set of molecular mechanical force fields for the simulation of biomolecules (these force fields are in the public domain, and are used in a variety of simulation programs). Second, it is a package of molecular simulation programs which includes source code and demos.” We used the AMBER suite for modelling.

From their website: “NAMD, recipient of a 2002 Gordon Bell Award and a 2012 Sidney Fernbach Award, is a parallel molecular dynamics code designed for high-performance simulation of large biomolecular systems. Based on Charm++ parallel objects, NAMD scales to hundreds of cores for typical simulations and beyond 500,000 cores for the largest simulations. NAMD uses the popular molecular graphics program VMD for simulation setup and trajectory analysis, but is also file-compatible with AMBER, CHARMM, and X-PLOR.”

Calculations were run at CALMIP with SANDER (Simulated Annealing with NMR-Derived Energy Restraints), a program of the AMBER suite, for the minimisation, heating and production steps.

The LAAS, a laboratory of the French National Centre for Scientific Research (CNRS), developed a novel software package called Psf-Amc (ref: A. Estana, N. Sibille, E. Delaforge, M. Vaisset, J. Cortes, P. Bernado. Realistic ensemble models of intrinsically disordered proteins using a structure-encoding coil database. Structure, in press.). In robotics, given all the necessary information about the various joints of a mobile limb, AI software can calculate all the positions that the arm can adopt. The LAAS applied this idea to proteins by working on subchains of the protein instead of individual atoms, as other modelling software does. By applying basic physics and chemistry equations to the different possible positions, they can very rapidly calculate hundreds of approximate models of the protein. The approach combines AI methods from robotics with statistical sampling from existing 3D structures (http://projects.laas.fr/Psf-Amc/).

We used the Orthos model, with a phenylalanine replacing the AzF and no biotin attached to our SAv head. Two algorithms were applied to it, the first using Single Residue Sampling (SRS) and the second using Triple Residue Sampling (TRS). The figures shown here were generated by loading 10 different structures calculated by the LAAS, then aligning the CBM3a domain of each of them. This illustrates the stability of our protein.

As can be seen in the figures, both of these simulations show that the complex structures of our protein keep their conformation, but also demonstrate the high flexibility of the linkers. With these structures, we were able to choose the most probable overall conformation of our Cerberus protein to continue modelling. The next step was to add in the functional molecules that we wished to attach to our protein.
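The robotics analogy used above, a chain of rigid segments connected by joints, can be illustrated with a toy model. The sketch below builds a planar chain by forward kinematics and samples random joint angles to see how far apart its two ends can be. It is only an illustration of the general idea and not the Psf-Amc algorithm: the 2D geometry, the uniform angle sampling, the function names and the 3.8 Å segment length (roughly the distance between consecutive alpha carbons) are all assumptions made for the example.

```python
import math
import random


def chain_end_position(joint_angles, link_length=3.8):
    """Forward kinematics of a planar chain of rigid links.

    Each joint angle is the turn (in radians) applied before adding
    the next link; the chain starts at the origin, pointing along x.
    """
    x = y = 0.0
    heading = 0.0
    for angle in joint_angles:
        heading += angle
        x += link_length * math.cos(heading)
        y += link_length * math.sin(heading)
    return x, y


def sample_end_to_end(n_links, n_samples, seed=0, link_length=3.8):
    """End-to-end distances of randomly sampled chain conformations."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    distances = []
    for _ in range(n_samples):
        angles = [rng.uniform(-math.pi, math.pi) for _ in range(n_links)]
        x, y = chain_end_position(angles, link_length)
        distances.append(math.hypot(x, y))
    return distances


# A 40-segment chain, in the size range of one of our linkers.
distances = sample_end_to_end(n_links=40, n_samples=200)
mean_distance = sum(distances) / len(distances)
```

Real samplers draw the joint angles from distributions observed in known structures rather than uniformly, which is what makes the resulting ensembles realistic.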

For the next step, we needed to determine a viable 3D model of azidophenylalanine, clicked onto a DBCO-fluorescein molecule. This proved harder to accomplish than originally planned.

The basic chemical structure was drawn using Avogadro, an advanced molecular editor and visualiser, from a drawing of the AzF-DBCO-fluorescein molecule. It was also briefly minimised there to obtain an optimal geometric conformation. Next, the AMBER suite (described above) was used to generate the force field of the molecule. This process calculates the various physical properties (mass, charge, energy, ...) of each atom in the structure, to be reused for the molecular dynamics step.

The complex amino acid then had to be integrated as a part of the protein, which is quite rare in protein modelling. Usually, only canonical amino acids are used, which the modelling algorithms handle natively. Here, however, we needed to align the structure of our AzF residue with that of the phenylalanine at the end of the C-terminal linker, remove the phenylalanine, then add the AzF with all the correct bonds to the neighbouring amino acids in the chain.

Generating biotin model and including that

Conclusions

So far: 30,000 hours of computation completed!
No non-specific interactions found.
Linkers remain flexible and tend to cross; the N-terminus seems to stick to one side of the CBM3a.
Size of ligands still to be determined but, for the moment, it seems like quite a bit.