The minicellulosome scaffold complex used in our project contains a domain that is thought to bind the scaffold to a cellulose fiber, called the cellulose binding domain (CBD). By doing so it is thought to greatly enhance the efficiency of the bound enzymes simply by proximity to the substrate. Molecular dynamics was used to simulate a CBD protein in proximity to cellulose to gain insight in the molecular interactions between them.
The CBD structure we are using was crystallized by Tormo et al. . They hypothesize that the binding interface between CBD and cellulose must be at a planar strip of aromatic residues on one side of the beta-sheet sandwich. It is thought that the CBD binds to cellulose by stacking these aromatic amino acids on the ring structure of the glucose monomers in cellulose, as this is also the case in other known cellulose binding domains.
To verify the binding of CBD to cellulose, and to validate the proposed mechanism of binding, molecular dynamics modeling (MD) was used to simulate the binding of the CBD to cellulose. Furthermore, the resulting structure can be used to determine a free energy of binding.
MD is ideally suited for subjects involving biomolecules as these are often difficult to visualize experimentally. In particular, version 2.2 of the MARTINI forcefield is used . The forcefield in MD describes the interactions between all the different particles. Many different forcefields exist, but the coarse grained MARTINI forcefield is ideal for cheap, lengthy simulations of proteins. More recently polysaccharides have been properly parameterized for Martini 2 as well .
Figure 0: Our minicellulosome contains 3 enzymes and a cellulose binding domain bound together by a scaffold protein. An AGA2 mating receptor is fused to the scaffold to allow the scaffold to adhere to the cell surface.
To start, the CBD structure from Tormo et al. was used for its structural information. The structure was then coarse grained using the martinize tool. An elastic network was added to preserve the secondary structure of the protein, at a cutoff of 0.5 nm. The cutoff was chosen to minimize artificial restraints, while keeping the structure of the beta sheets intact.
Next, the cellulose model from Lopez et al.  was used to model cellulose in MARTINI 2. An infinite fiber spanning over the periodic boundary condition was realized by adding bonds between the first and last glucose monomers, effectively realizing an infinite fiber.
Cellulose and CBD were put together in the same simulation and water and ions were added (150mM NaCl). As Martini 2 water is represented as one spherical bead for every 4 water molecules, it has a tendency to pack in a crystalline lattice structure. This system in particular had a high tendency to freeze, even at 30℃. Most likely this is due to the regular crystalline structure of the cellulose acting as a nucleation point for crystallization of the water beads. To prevent this, 10% of antifreeze beads were added. Antifreeze beads act just like regular water, except other water beads perceive them as slightly larger, thus disrupting crystalline packing. This should not impact the accuracy of the simulation.
Common parameters for martini simulations were used for dealing with electrostatics . Temperature coupling was achieved by using the Berendsen thermostat at a ref-t of 303.15 K. Pressure coupling was done semiisotropically with Parenello-Rahman at a ref-p of 1.0 bar. While the compressibility in the xy plane was set to 3e-4, the compressibility in the z-direction was set to 0. This was done to keep the infinite fiber in tact and to keep the box from collapsing. The simulations were run for at least 400ns which should provide plenty of sampling. This took about 6 hours of wall time on the Groningen University’s Peregrine cluster.
4 simulations were setup where the CBD was inserted in different orientations around the cellulose fiber. Figure 1 shows the distance between CBD and cellulose over time. In all 4 simulations CBD binds to cellulose relatively quickly and never leaves.
Figure 1: Distance between CBD and cellulose over time. Every run is plotted separately, all bind to cellulose and never leave within the timeframe of the simulation.
With adequate sampling, a contact map between CBD and cellulose can be computed (Figure 2). By calculating a distance matrix between CBD and cellulose, and averaging this matrix, a visual representation of how CBD binds to cellulose on average. By taking the minimum distance along one axis of this matrix, a plot can be generated showing which residues of CBD are on average closer to cellulose (Figure 3).
Figure 2: Distance matrix between the backbone beads of CBD and all cellulose atoms. Distance (nm) is represented by yellow (close) to blue (far). Patches between indices 45-70 and 100-120 of CBD show the closest distance.
Figure 3: The average distance to cellulose for every backbone bead in the sequence of CBD. There is a clear contact patch around indices 56-67 and around index 112.
When these close residues are colored on the atomistic structure (Figure 4), it is striking to see that they correspond to planar strip that was hypothesized by Tormo et al. to bind to cellulose. Furthermore this binding patch contains many aromatic amino acids as was also hypothesized.
Figure 4: Structural visualization of the binding patch of CBD as predicted by Figure 3. Yellow beads are cellulose, the protein backbone of CBD is colored cyan. The space filling protein sequence represents the lowest valleys in the by-sequence distance data. This patch includes 4 of the 5 binding residues as hypothesized by Tormo et al. and contains many aromatic residues.
To gain more insight in the hypothetical molecular interactions, the atomistic structure of CBD was overlaid over the coarse grained structure, yielding Figure 5. When residues 118, 112, 56, 57 and 67 are visualized as “sticks”, it is striking to see that they would be in contact with the cellulose fiber exactly in the way as described by Tormo et al. This result strongly implies that these residues are indeed responsible for binding to cellulose, at the very least to straight, singular fibers. The planar patch of amino acids likely maximizes the contact area between CBD and cellulose, providing a strong binding affinity between the two.
Figure 5: Interface between cellulose and CBD. Non binding CBD residues are not visible. The backbone of the protein is in cyan, cellulose is in yellow. The binding patch of CBD consists of Trp 118, Arg 112, Asp 56, His 57 and Tyr 67.
Additionally a movie was rendered illustrating the strong binding between CBD and cellulose. The movie shows the separate CBD floating in aqueous solution. After some time it finds the cellulose fiber and binds at the binding site as hypothesized by Tormo et al.
Now that we have a configuration of the CBD strongly bound to cellulose, it would be interesting to measure the free energy of binding between the CBD and cellulose. This free energy of binding can be derived from the Potential of Mean Force (PMF). The PMF is a property that cannot be directly calculated from mere configurations and trajectories, but rather it has to be determined ‘experimentally’. By using steered MD simulations (SMD), an artificial force is applied on the CBD with the goal of increasing the distance between CBD and cellulose. As restorative forces keep the two molecules together, a high force is initially required to separate the two molecules. When the CBD is far enough away from cellulose, the force required to pull the CBD further away is low. The difference between these forces is known as the Potential of Mean Force.
The first pulling simulation was started to generate starting configurations for umbrella sampling. Figure 6 shows the pulling force increases over time to overcome the restorative forces of the molecular interactions, and decreases when CBD is pulled loose from cellulose. It is important to note that this pulling force is not directly related to the free energy of binding. The binding free energy is the difference in free energy between the bound and unbound states, and this is not what is measured here. Instead, umbrella sampling of the free energy landscape along the reaction coordinate is needed to probe the real difference in free energy between bound and unbound states.
Figure 6: Pulling force along the reaction coordinate over time. At first pulling force increases linearly. After CBD is pulled loose, the pulling force is allowed to decrease until the specified rate of pulling is reached.
Umbrella sampling was used to determine the potential of mean force of the interaction between cellulose and CBD. Umbrella sampling is a widely established method by which higher energy configurations can be sampled . When the reaction coordinate of the desired event is known, a molecule can be forced along this coordinate by steered MD. Then, separate configurations along the reaction coordinate are taken and restrained by an umbrella potential that mirrors the underlying local energy landscape. This makes it possible to sample the energy landscape along the entire reaction coordinate. This difference between the energy minimum and maximum in this landscape represents the free energy between both states (Figure 7).
Figure 7: Schematic representation of umbrella sampling. First starting configurations are generated by pulling along the reaction coordinate. Then these configurations are restrained by overlapping umbrella potentials, thereby sampling the energy landscape along the entire reaction coordinate. Figure taken from http://www.mdtutorials.com/gmx/umbrella/05_pull.html .
To setup the system for umbrella sampling, first the box from the finished simulation was enlarged as to provide plenty of room for pulling the CBD away from cellulose. The x axis of the box was changed to 30 nm, and the x axis was also used as the direction for pulling. Next, the system was resolvated and equilibrated.
Umbrella sampling windows of ~0.2 nm were chosen. These configurations were then equilibrated shortly and subjected to the harmonic umbrella potential for 3 ns with a force of 900 kJ/mol. Next gmx wham was used to extract the PMF over distance (Figure 8). As the Center Of Mass (COM) was some distance away from the COM of cellulose from the beginning, the PMF plot does not start at 0. The energy distance between the lowest energy state (binding to cellulose) and the highest energy state (unbound CBD) was determined to be -41.74 kJ/mol.
Figure 8: Plot of the potential of mean force for CBD-cellulose dissociation. PMF converges after around 2.5 nm of pulling. The energy difference between the lowest and highest energy states is roughly -41.74 kJ/mol.
When compared to partitioning free energies of around -30 kJ/mol, measured by Arola et al. , a dG of -41.74 seems a bit high. As this particular CBD was not measured by Arola et al. it is possible that our CBD of CipA3 Clostridium thermocellum does bind more strongly to cellulose than these other CBDs. It may be worthwhile to repeat the experiment in the soon to be released Martini 3 forcefield , as it is better equipped to handle ring structures and would allow for a more accurate description of both cellulose and the interaction site. This would require reparameterizing cellulose for Martini 3 however.
In conclusion, CBD binding to cellulose was modeled using coarse grained molecular dynamics. The results verify that CBD indeed binds to cellulose using the mechanism described by Tormo et al. The binding seems to be very strong as the simulations show CBD never leaves cellulose after it has bound. This observation was verified by PMF calculations, which determined the binding free energy between CBD and cellulose to be -41.74 kJ/mol which is a very strong bond according to experimental data from literature. These results justify keeping the CBD in our minicellulosome as the strong binding to cellulose will most likely enhance activity of the complex by proximity to cellulose.
 Tormo, J., Lamed, R., Chirino, A. J., Morag, E., Bayer, E. A., Shoham, Y., & Steitz, T. A. (1996). Crystal structure of a bacterial family-III cellulose-binding domain: a general mechanism for attachment to cellulose. The EMBO Journal, 15(21), 5739–5751. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/8918451
 Monticelli, L., Kandasamy, S. K., Periole, X., Larson, R. G., Tieleman, D. P., & Marrink, S. J. (2008). The MARTINI coarse-grained force field: Extension to proteins. Journal of Chemical Theory and Computation, 4(5), 819–834. https://doi.org/10.1021/ct700324x
 Schmalhorst, P. S., Deluweit, F., Scherrers, R., Heisenberg, C. P., & Sikora, M. (2017). Overcoming the Limitations of the MARTINI Force Field in Simulations of Polysaccharides. Journal of Chemical Theory and Computation, 13(10), 5039–5053. https://doi.org/10.1021/acs.jctc.7b00374
 López, C. A., Bellesia, G., Redondo, A., Langan, P., Chundawat, S. P. S., Dale, B. E., … Gnanakaran, S. (2015). MARTINI coarse-grained model for crystalline cellulose microfibers. Journal of Physical Chemistry B. https://doi.org/10.1021/jp5105938
 De Jong, D. H., Baoukina, S., Ingólfsson, H. I., & Marrink, S. J. (2016). Martini straight: Boosting performance using a shorter cutoff and GPUs. Computer Physics Communications. https://doi.org/10.1016/j.cpc.2015.09.014
 Domański, J., Hedger, G., Best, R. B., Stansfeld, P. J., & Sansom, M. S. P. (2017). Convergence and Sampling in Determining Free Energy Landscapes for Membrane Protein Association. Journal of Physical Chemistry B. https://doi.org/10.1021/acs.jpcb.6b08445
 Arola, S., & Linder, M. B. (2016). Binding of cellulose binding modules reveal differences between cellulose substrates. Scientific Reports. https://doi.org/10.1038/srep35358