Bioinformatics

“To do something that you feel in your heart that's great, you need to make a lot of mistakes. Anything that's successful is a series of mistakes.” - Billie Joe Armstrong

Model

Overview

Working with a protein as large and potentially toxic as botulinum neurotoxin C without an experimentally derived structure poses a challenge for wet-lab scientists. Any modification may impair the structure and function of the protein in unexpected ways. Hence, we employed homology modeling to derive, for the first time, a completely assembled structure comprising all domains of botulinum neurotoxin C. To assess the deduced structure and the various mutations introduced to detoxify the protein, we conducted molecular dynamics simulations. The immune system attacks not only harmful molecules but also proteins that were not originally part of the body, such as those injected as components of drugs. Therefore, to improve the potential use of botulinum neurotoxin C as a shuttle mechanism for fusion proteins, we developed a deimmunization workflow applicable to proteins.

Structure Elucidation

'Form follows function' is a principle routinely applied in industrial design and modern architecture, and the same concept commonly applies to proteins. The complete structure of botulinum neurotoxin C, with all domains, has so far not been experimentally determined and submitted to a publicly accessible database. Detailed knowledge of this structure may aid the general understanding of the protein and help to assess the impact of sequence modifications, especially in the context of safety. Instead of determining the structure of botulinum neurotoxin C by crystallography, we built a theoretical structure using homology modeling. Since this structure has not been derived in the laboratory, we examined its folding and robustness using molecular dynamics simulations.

Homology Modeling

Homology modeling is the construction of an atomic-resolution model of a target amino acid sequence from experimentally determined three-dimensional structures of related homologous proteins (the templates). Two steps heavily influence the quality of the homology model: the identification of known protein structures that resemble the structure of the query sequence, and the subsequent alignment of residues in the query sequence to residues in the template sequences. Researchers have shown that protein structures are highly conserved amongst homologues, even more so than protein sequences^[]. Sequences with less than 20% sequence identity are likely to have different three-dimensional structures.^[] Alignment gaps that occur in only the target or only the template further complicate the modeling process, since they indicate a structural region present in only one of the two structures. Moreover, it has been shown that the quality of the homology model gradually decreases with decreasing sequence identity. A typical homology model has ~1–2 Å root mean square deviation between the matched Cα atoms at 70% sequence identity but only 2–4 Å agreement at 25% sequence identity. In addition, loop regions, where the amino acid sequences of target and template proteins may differ completely, usually contain more errors^[]^[]. The generated sequence alignment is then used to create a structural model of the target protein. Significant structural similarity can usually be inferred from detectable levels of sequence similarity, since protein structure is more conserved than the underlying DNA sequence^[].
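The root mean square deviation mentioned above can be illustrated with a minimal sketch. It assumes that the two structures are already superimposed and that the Cα atoms are matched one to one; the function name ca_rmsd and the toy coordinates are hypothetical and are not part of our workflow.

```python
import math

def ca_rmsd(coords_a, coords_b):
    """RMSD (in angstrom) between two lists of matched C-alpha coordinates.

    Assumption: both structures are already superimposed, and element i of
    coords_a corresponds to element i of coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))

# Toy data (hypothetical): every atom of the model is shifted by 1 angstrom in y.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
reference = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(ca_rmsd(model, reference))
```

In practice the structures would first have to be superimposed, which this sketch deliberately leaves out.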

Materials and Methods

Our molecular dynamics simulations were done using GROMACS ^[], a free, open-source software package designed for simulations of proteins, lipids and nucleic acids. Processor-specific optimizations and extensive GPU support make GROMACS one of the fastest software packages available^[].
This allowed us to run molecular dynamics simulations of even very large proteins such as BoNTC. The homology modelling was performed using the fully automated SWISS-Model web server, which yields reliable models with respect to stability and accuracy. SWISS-Model finds the closest homologs of the input sequence via BLAST and transfers the coordinates defined by the target-template alignment, followed by refinement steps.^[]

Results

To compute a probable structure for botulinum neurotoxin C, we used the fully automated SWISS-Model web server for homology modelling, which searches for the most closely related sequences with an experimentally determined tertiary structure (10.1093/nar/gky42). The best match for botulinum neurotoxin C was botulinum neurotoxin B, with a sequence identity of ~34% and a sequence similarity of ~52%. We created the homology model using the structure 2PN0 as template (Figure 1).
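To illustrate what sequence identity and sequence similarity measure, the following sketch computes both from a pairwise alignment. It is not the scoring used by SWISS-Model: the grouping of similar amino acids, the choice to ignore gapped columns, the function name and the toy alignment are all assumptions made for illustration.

```python
# Hypothetical grouping of amino acids regarded as "similar" (an assumption;
# real tools typically derive similarity from a substitution matrix).
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"),
                  set("NQ"), set("ST"), set("AG")]

def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, percent similarity) over ungapped columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # columns with a gap are not counted in this sketch
        compared += 1
        if a == b:
            identical += 1
            similar += 1  # identical residues also count as similar
        elif any(a in group and b in group for group in SIMILAR_GROUPS):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared

# Toy alignment (hypothetical sequences, not taken from the neurotoxins).
identity, similarity = identity_and_similarity("MKV-LDE", "MRVALEQ")
print(f"identity {identity:.1f} %, similarity {similarity:.1f} %")
```

Different tools normalize by different lengths (alignment length, shorter sequence, ungapped columns), so values from this sketch are not directly comparable to those reported by a given server.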

Figure 1: Protein structure of the homology model of botulinum neurotoxin C based on the class B neurotoxin. The light chain of the protein is the globular domain on the left of the picture (blue), with a central zinc ion (grey). The remaining part of the protein is the heavy chain.

To check the quality of the homology model, we examined the coordination of the zinc ion by the participating amino acids in the active site of the protein. The geometry and the distances of these amino acids to the zinc ion are of particular interest, since they are crucial for the stability of the active site. As a reference, we used the known structure of the botulinum toxin light chain, which has been crystallized without the heavy chain (PDB ID: 2QN0).
The homology model shows a zinc coordination similar to that of the experimental light chain structure, with similar distances between the amino acids and the zinc ion. The Ramachandran plot, which plots the phi and psi angles of the protein backbone, makes it possible to identify unlikely conformations from the placement of the residues. We found no irregularities, which indicates a reasonable tertiary structure.
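The phi and psi angles shown in a Ramachandran plot are dihedral angles defined by four consecutive backbone atoms. The sketch below shows one common way to compute such an angle from Cartesian coordinates. The helper names and the toy points are hypothetical, and this is not the code used to produce our plot.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (-180 to 180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / norm for x in b1)  # unit vector along the central bond
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))

def phi_psi(prev_c, n, ca, c, next_n):
    """phi uses C(i-1), N, CA, C; psi uses N, CA, C, N(i+1)."""
    return dihedral(prev_c, n, ca, c), dihedral(n, ca, c, next_n)

# Toy points (hypothetical): the two outer bonds are perpendicular when viewed
# along the central bond.
angle = dihedral((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0))
print(angle)
```

Applied to every residue of a model, the resulting (phi, psi) pairs are the points that a Ramachandran plot displays.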

Figure 2.1: Active site of the homology model of botulinum neurotoxin C.

Figure 2.2: Active site of the botulinum neurotoxin C light chain (PDB ID: 2QN0)

Figure 3: Ramachandran plot of the botulinum neurotoxin C homology model

We then tried to verify the homology model using MD simulations. For this purpose, we used the MD simulation software GROMACS 2016.3. As the force field we chose AMBER99SB, since it allows zinc-containing proteins to be modelled and is well established for this kind of task.
However, we were not able to stabilize the zinc ion in the active site of the protein with reasonable distances between the zinc ion and the coordinating amino acids. As shown in Figure 2.1 and Figure 2.2, the initial distance between the zinc ion and the coordinating nitrogen atom of each of the two histidine residues was around two Ångström. Over the course of the simulation these distances increased, because the histidine residues did not stay in their initial positions. In contrast, the distance between the negatively charged glutamate and the zinc ion stayed the same.
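Such a drift can be monitored by computing the zinc-ligand distance for every frame of a trajectory. The following sketch uses hypothetical toy coordinates, and the cutoff of 2.5 Å is an assumption chosen only for illustration, not a value from our analysis.

```python
import math  # math.dist requires Python 3.8 or newer

def frames_with_lost_coordination(zinc_frames, ligand_frames, cutoff=2.5):
    """Return the indices of frames in which the zinc-ligand distance
    (in angstrom) exceeds the cutoff. The cutoff is an assumed value."""
    return [index
            for index, (zinc, ligand) in enumerate(zip(zinc_frames, ligand_frames))
            if math.dist(zinc, ligand) > cutoff]

# Toy trajectory (hypothetical): the zinc ion stays at the origin.
zinc = [(0.0, 0.0, 0.0)] * 4
histidine_n = [(2.0, 0.0, 0.0), (2.1, 0.0, 0.0), (3.4, 0.0, 0.0), (4.0, 0.0, 0.0)]
glutamate_o = [(0.0, 2.0, 0.0), (0.0, 2.0, 0.0), (0.0, 2.1, 0.0), (0.0, 2.0, 0.0)]

print(frames_with_lost_coordination(zinc, histidine_n))  # histidine drifts away
print(frames_with_lost_coordination(zinc, glutamate_o))  # glutamate stays bound
```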

We assumed that the structure of our botulinum toxin obtained from the homology modelling could be in an energetically suboptimal state. That is why we decided to create another homology model based on a different botulinum neurotoxin. Since most neurotoxins share nearly every characteristic of their tertiary structure, we chose the class E botulinum neurotoxin, which has a slightly different fold, as the template.
However, the simulations using the homology model based on the class E neurotoxin were also unable to keep the zinc ion fixed relative to the histidine residues.

We then decided to simulate the experimentally obtained crystal structure 2PN0, which reflects the actual distances between the atoms and should therefore be in the energetically optimal state. However, even in the simulation of the actual light chain structure, a conformational rearrangement occurred in the active site of the protein. Finally, we also tried to introduce harmonic distance restraints, which reduced the conformational change but did not completely solve the problem of modelling a stable active site of the metalloprotease.
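The idea behind a harmonic distance restraint can be sketched with the generic textbook form E = 0.5 * k * (r - r0)^2. This is not necessarily the exact functional form applied by GROMACS, and the force constant and distances below are hypothetical values, not the parameters of our simulations.

```python
def harmonic_restraint(r, r0, k):
    """Energy and force of a simple harmonic restraint.

    r  : current distance (nm)
    r0 : reference distance (nm)
    k  : force constant (kJ mol^-1 nm^-2), an assumed value
    Returns (energy in kJ/mol, force along the distance in kJ mol^-1 nm^-1).
    """
    delta = r - r0
    energy = 0.5 * k * delta ** 2
    force = -k * delta  # negative when r > r0, i.e. pulls the atoms back together
    return energy, force

# Hypothetical example: a zinc-nitrogen distance stretched from 0.20 nm to 0.30 nm.
energy, force = harmonic_restraint(r=0.30, r0=0.20, k=1000.0)
print(energy, force)
```

The penalty grows quadratically with the deviation, so a restraint of this kind discourages but does not strictly prevent the coordinating residues from leaving the zinc ion.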

References

Ingles-Prieto, A., Ibarra-Molero, B., Delgado-Delgado, A., Perez-Jimenez, R., Fernandez, J. M., Gaucher, E. A., ... & Gavira, J. A. (2013) Conservation of protein structure over four billion years. Structure, 21(9), 1690-1697. https://doi.org/10.1016/j.str.2013.06.020
Chothia, C., & Lesk, A. M. (1986) The relation between the divergence of sequence and structure in proteins. The EMBO Journal, 5(4), 823-826. https://doi.org/10.1002/j.1460-2075.1986.tb04288.x
Sitbon, E., & Pietrokovski, S. (2007) Occurrence of protein structure elements in conserved sequence regions. BMC Structural Biology, 7(1), 3. https://doi.org/10.1186/1472-6807-7-3
Xiang, Z. (2006) Advances in homology protein structure modeling. Current Protein and Peptide Science, 7(3), 217-227. https://doi.org/10.2174/138920306777452312
Martí-Renom, M. A., Stuart, A. C., Fiser, A., Sánchez, R., Melo, F., & Šali, A. (2000) Comparative protein structure modeling of genes and genomes. Annual Review of Biophysics and Biomolecular Structure, 29(1), 291-325. https://doi.org/10.1146/annurev.biophys.29.1.291
Abraham, M. J., Murtola, T., Schulz, R., Páll, S., Smith, J. C., Hess, B., & Lindahl, E. (2015) GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 1, 19-25. https://doi.org/10.1016/j.softx.2015.06.001
Kutzner, C., Van Der Spoel, D., Fechner, M., Lindahl, E., Schmitt, U. W., De Groot, B. L., & Grubmüller, H. (2007) Speeding up parallel GROMACS on high-latency networks. Journal of Computational Chemistry, 28(12), 2075-2084. https://doi.org/10.1002/jcc.20703
Waterhouse, A., Bertoni, M., Bienert, S., Studer, G., Tauriello, G., Gumienny, R., ... & Lepore, R. (2017) SWISS-MODEL: homology modelling of protein structures and complexes. Nucleic Acids Research. https://doi.org/10.1093/nar/gky427