Team:Tuebingen/Bioinformatics

Bioinformatics

To do something that you feel in your heart that's great, you need to make a lot of mistakes. Anything that's successful is a series of mistakes.- Billie Joe Armstrong
Title image


Software
BERT a powerful workflow for the deimmunization of proteins
Collaboration
Application of BERT on a protein of the iGEM team Paris-Bettencourt

Model

Overview

Working with proteins as large and potentially toxic as botolinium neurotoxin C without an experimentally derived structure poses a challenge for wetlab scientists. Any modifications on the protein may impair the structure and function of the protein in unexpected ways. Hence, we employed homology modeling algorithms to first-time derive the completely assembled structure of all botolinium neurotoxin C domains. To assess the deduced structure, and various mutations performed on the protein to detoxify it, we conducted molecular dynamics simulations. The immune system does not only attack malicious molecules, but also proteins which were not originally part of the body, but got injected as parts of drugs. Therefore, to improve the potential usage of botolinium neurotoxin C as a shuttle mechanism for fusion proteins we developed a deimmunization workflow applicable to proteins.


Structure Elucidation

'Form follows function' is a principle routinely applied in industrial design and modern architecture. However, this concept also commonly applies to proteins. Botolinium neurotoxin C's structure complete structure with all domains has so far not been experimentally derived in the laboratory and submitted to publicly accessible database. Detailed knowledge of the structure of botulinum neurotoxin C may aid scientists in the general understanding of the protein and to assess the impact of sequence modifications on the protein, especially in the context of safety. Instead of generating the structure of botolinium neurotoxin C using crystallography, we modeled a theoretical structure using homology modeling. Since the generated structure has not been derived in the laboratory, we attempted to verify the structure's folding capabilities and robustness using molecular dynamics simulations.


Homology Modeling

Homology modeling describes the process of the construction of an atomic-resolution model of a target amino acid sequence using experimental three-dimensional structures of related homologous proteins templates. The identification of suitable already known protein structures which resemble the structure of the query sequence and the following alignment of residues of the query sequence to residues in the template sequences heavily influence the quality of the homology model. Researchers have shown that protein structures are very conserved amongst homologues, even more so than protein sequences[]. Sequences with less than 20% sequence identity are likely to have different three-dimensional strucutes.[] The presence of alignment gaps in either solely the target or the template further complicate the modeling process, since they indicate a structural region present in only one of the two structures. Moreover, it has been shown that the quality of the homology model gradually decreases with the sequence identity. A typical homology model has ~1–2 Å root mean square deviation between the matched Cα atoms at 70% sequence identity but only 2–4 Cα agreement at 25% sequence identity. Furthermore, loop regions, where amino acid sequences of target and template proteins may completely differ, usually contain more errors[][]. The generated sequence alignment is then used for the creation of a structural model of the target protein. Significant structural similarity can usually be derived from the detectable levels of sequence similarity, since protein structure is more conserved than DNA[].


Materials and Methods

Molecular dynamics simulations allow for the mimicking of the movement and interactions of atoms and molecules. The system simulates the interactions of the atoms and molecules for a fixed period of time allowing for a detailed view of the dynamic progress of the system. Most commonly, the trajectories of the atoms are calculated by numerically solving Newton’s equations of motions for interacting particles. Molecular mechanics force fields are responsible for the estimation of interatomic potentials and energies. At each simulation step the forces on each atom are computed using the force field. Afterwards, the position and velocity of the atoms is updated by solving newton’s law of motion.
Our molecular dynamics simulations were done using GROMACS [], a free, open-source software design for simulations of proteins, lipids and nucleic acids. GROMACS' processor specific optimization and extensive GPU support results in one of the fastest software packages available[].
This allowed us to run molecular dynamics simulations even very large proteins such as BoNTC. The homology modelling was performed using the full automated SWISS-Model web servers, which computes trustworthy results regarding stability and accuracy. SWISS-Model finds the closest homologs of the input sequence via BLAST and transfers the coordinates defined by the target-template alignment followed by refinement steps.[]



Results

In order to compute a probable structure for our botulinum toxin C, we used the SWISS-Model fully automated web server for homology modelling, which automatically searches the most closely related sequences with known experimentally determined tertiary structure. The best match for the botulinum neurotoxin c was the botulinum neurotoxin b, which has a sequence identity of ~34 % and a sequence similarity of ~52 % . We created the homology model based on the structure 2PN0 as template (Figure 1).
Figure 1: Protein strucutre  of the homology model of botulinum neurotoxin c based on class B neurotoxin. The light chain of the proteins is defined as the globular domain on the left of the picture (blue), with a central zinc ion (grey). The remaining part of the protein is defined as the heavy chain
Figure 1: Protein strucutre of the homology model of botulinum neurotoxin c based on class B neurotoxin. The light chain of the proteins is defined as the globular domain on the left of the picture (blue), with a central zinc ion (grey). The remaining part of the protein is defined as the heavy chain
To check the quality of the homology model, we checked the zinc coordination of the active site of the protein with the participating amino acids. Especially the geometry and the distance of the amino acids to the zinc ion are particularly interesting, since those factors are crucial for the stability of the active site itself. We used the known protein structure of the botulinum toxin light chain, which has been crystallized without the heavy chain (PDB ID: 2QN0).
The homology modeled botulinum neurotoxin shows a similar zinc coordination then the actual light chain structure, with similar distance between the amino acids and the zinc ion. Examining the Ramachandran plot, which plots the psi and phi angles of the backbone atoms, no irregularities were found, which indicates a reasonable tertiary structure. According to the placement of the residues in the Ramachandran plot due to their phi and psi angles, it is possible to identify unlikely structures.
Figure 2.1: 
Active site of the homology model of the botulinum neurotoxin c.
Figure 2.1: Active site of the homology model of the botulinum neurotoxin c.
Figure 2.2: Active site of the botulinum neurotoxin c light chain (PDB ID: 2QN0)
Figure 2.2: Active site of the botulinum neurotoxin c light chain (PDB ID: 2QN0)
Figure 3: Ramachandran plot of the botulinum neurotoxin c homology model. The phi angle is shown on the x-axis and the psi angle is shown on the y-axis
Figure 3: Ramachandran plot of the botulinum neurotoxin c homology model. The phi angle is shown on the x-axis and the psi angle is shown on the y-axis
We then tried to verify the homology model using MD simulations. For this purpose, we used the MD simulation software GROMCAS 2016.3. For the force field we chose AMBER99SB since it allows the modelling of proteins containing metals like zinc. Furthermore, the AMBER99SB force field has been well established for such kinds of tasks.
However, we were not able to stabilize the zinc ion in the active site of the protein with reasonable distances between the zinc ion and the coordinating amino acids. As shown in Figure 2.1 and Figure 2.2 the initial distances between the zinc ion and the nitrogen atom of the two histidine amino acids was around two Angstrom. During the time of the simulation the distance of the histidine increased, because the histidine amino acids did not stay in their initial position. However the distance between the negatively charged glutamate and the zinc stayed the same.
We assumed that the structure of our botulinum toxin obtained from the homology modelling could be in an energetically suboptimal state. That’s why we decided to create another homology modelling based on another botulinum neurotoxin. Since most neurotoxin share nearly every characteristic of their tertiary structure, we computed another homology model based on the botulinum neurotoxin E which has a slightly different fold.
However, the simulations using the homology model based on the class E neurotoxin were not able to fix the zinc ion relatively to the histidine residues.
Then we decided to simulate the experimentally obtained crystal structure 2PN0, which resembles the actual distances between the atoms and therefore should be in the energetically optimal state. But also in the simulation of the actual light chain structure, there was a conformational refolding in the active site of the protein. We finally also tried to introduce harmonic distance restraints, which improved the situation of the conformational change, but did not completely solve the issue of the modelling a stable active site of the metalloprotease.
Figure 4: This visualisation of the active center of the botolinium toxin is shown exemplary for all MD simulations performed by us. It can be observed that the histidines are no longer in close contact with the zinc ion.
Figure 4: This visualisation of the active center of the botolinium toxin is shown exemplary for all MD simulations performed by us. It can be observed that the histidines are no longer in close contact with the zinc ion.


Discussion

Due to tight constraints on computational resources and time we were not able to work on this problem even further. A step in this process could have been custom parametrization of a force field for the zinc ion. But even then, success would have not been guaranteed, since other possible factors cannot be ruled out. Moreover, MD simulations are only simulations and sometimes may not be able to perfectly represent nature. Even though the MD models are by now quite sophisticated, they cannot always easily be applied to every protein. However, the boundaries of molecular dynamics simulations are shown, when complex geometry and binding angles are supposed to be simulated. Often those calculations are not correct or not even calculated at all, hence the simulation lacks certain interactions.
As the structure suggests, BoNTC has several forces acting in many different directions, rendering MD simulations difficult. In our case especially all kinds of forces played vital roles. The two times positively charged, for the toxic functions essential, zinc ion, interacts through electrostatic forces. However, as explained earlier, electrostatic forces are only a small part of the real interactions. The zinc ion is in reality also influenced by geometrical conditions, which were not simulated in the molecular dynamics simulation. Furthermore, there aren't many force fields available which can model metal ions and interactions with metal ions. As a result we were very limited in our choices of force fields. We thought of parameterizing a different force field specifically for metal ions and BoNTC, but quickly noticed that this would be far beyond the scope of the relatively short project time of iGEM.
Our results and extensive documentation of many different attempts summarize the current state of molecular dynamics simulations of complex proteins and the experimental field of theoretical structural biology in general. Molecular dynamics simulations, although a very powerful concept, can often lead to time consuming trial and error phases, where it is not guaranteed that a working configuration is finally found.