Team:ETH Zurich/splitModel

For our approach B, DNA is used as a scaffold to bring two fusion proteins together. Each protein carries one half of the luciferase (either N-luc or C-luc). When both fusion proteins bind the DNA, the two halves can interact and reconstitute a complete, functional luciferase. We simulated protein-DNA docking in silico to answer three questions: how fusing N-luc to OmpR and C-luc to TetR affects the structure and DNA binding of these proteins, which distance between the two binding sites brings the two proteins close together, and which linkers facilitate luciferase complementation. The results shown here were obtained with a distance of 10 bp between the two binding sites and a linker consisting of 3 repeats of a flexible linker element.
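How close two bound proteins can get depends on both the separation of their binding sites along the DNA and their orientation around the helix. The sketch below estimates both for idealised B-DNA, assuming textbook values of 0.34 nm rise and 10.5 bp per helical turn. It ignores DNA bending and the footprint of the bound proteins, so it is only a first-pass guide for choosing spacings, not the method used for the docking.

```python
# Rough geometry of two protein binding sites on idealised B-DNA.
# Assumed textbook values; protein binding can bend or unwind the DNA.
RISE_NM_PER_BP = 0.34  # axial rise per base pair (nm)
BP_PER_TURN = 10.5     # helical repeat of B-DNA (bp per turn)


def site_geometry(spacing_bp: int) -> tuple[float, float]:
    """Return (axial distance in nm, rotation around the helix axis in degrees)
    between two positions that are `spacing_bp` base pairs apart."""
    axial_nm = spacing_bp * RISE_NM_PER_BP
    rotation_deg = (spacing_bp / BP_PER_TURN * 360.0) % 360.0
    return axial_nm, rotation_deg


for spacing in (5, 10, 15, 21):
    axial, rotation = site_geometry(spacing)
    # 0 and 360 degrees both mean "same face of the helix"
    off_face = min(rotation, 360.0 - rotation)
    print(f"{spacing:2d} bp: {axial:4.1f} nm apart, {off_face:5.1f} deg from the same face")
```

Under these assumptions a 10 bp spacing places the two sites about 3.4 nm apart and roughly 17 degrees from the same helical face, whereas a 5 bp spacing puts them on nearly opposite faces.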

Protein structure prediction of C-luc-FFF-TetR and OmpR-FFF-N-luc

To obtain 3D structures of our fusion proteins, we used the protein structure prediction server I-TASSER [1,2,3]. I-TASSER combines threading, which identifies template proteins with similar folds in the Protein Data Bank (PDB), with Monte Carlo-based simulations that assemble the final protein structure. The server takes the amino acid sequence of each fusion protein as input and optionally accepts a template PDB file. The predicted structures are displayed below. When C-luc-FFF-TetR is compared with the PDB structures of luciferase and TetR alone, each part of the fusion protein resembles the structure of the corresponding separate protein. The same applies to the OmpR-FFF-N-luc fusion protein.

Cluc-FFF-TetR

OmpR-FFF-Nluc

3D structures of the fusion proteins predicted by I-TASSER.
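Since I-TASSER needs the full amino acid sequence of each fusion protein, the input can be assembled programmatically. The sketch below joins two domains with a chosen number of flexible linker repeats and writes a FASTA record. The sequence of one linker element is not given here, so the common GGGGS motif is used as a hypothetical stand-in, and the domain sequences are placeholders to be replaced with the real C-luc and TetR sequences.

```python
# Assemble a fusion-protein sequence for structure prediction.
# FLEX_ELEMENT is a hypothetical stand-in for one flexible linker repeat.
FLEX_ELEMENT = "GGGGS"


def build_fusion(n_part: str, c_part: str, repeats: int, element: str = FLEX_ELEMENT) -> str:
    """Join an N-terminal and a C-terminal domain with `repeats` linker elements."""
    if repeats < 0:
        raise ValueError("repeats must be non-negative")
    return n_part + element * repeats + c_part


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format a sequence as a FASTA record, wrapping lines at `width` residues."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join([f">{name}"] + lines)


# Placeholder sequences only: replace with the real C-luc and TetR sequences.
c_luc = "ACDEFGHIKL"
tetr = "MNPQRSTVWY"
fusion = build_fusion(c_luc, tetr, repeats=3)
print(to_fasta("Cluc-FFF-TetR", fusion))
```

Changing `repeats` is then the only edit needed to generate the longer-linker variants for a new prediction run.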

Evaluation and further improvement of C-luc-FFF-TetR and OmpR-FFF-N-luc 3D protein structures

We then analyzed both structures with ProSA [4, 5] and Procheck [6]. ProSA assigns each protein structure a Z-score, an overall quality score evaluated in the context of known protein structures. The Z-score of the model is plotted together with the range of Z-scores of experimentally determined proteins of similar size. A Z-score outside this range points towards an erroneous structure, whereas a Z-score within this range is consistent with a correctly folded protein. The results are displayed in figure 1; both Z-scores lie within the expected range.

Cluc-FFF-TetR

OmpR-FFF-NLuc

Z-scores obtained for the two fusion proteins

Next, the structures were analysed with Procheck, which generates a Ramachandran plot for each fusion protein. A Ramachandran plot shows the backbone dihedral angle psi (ψ) against phi (φ) for every residue and marks which combinations are energetically allowed. Because both proteins contained multiple residues in 'disallowed' regions, we used ModRefiner [7] to perform energy minimization and improve the quality of our models. The Ramachandran plots of the refined structures are displayed below.

Cluc-FFF-TetR

OmpR-FFF-NLuc

Ramachandran plots of the refined structures, showing the backbone dihedral angle psi (ψ) against phi (φ) for each residue together with the energetically allowed regions. Most of our residues lie in the energetically favoured regions.

The two plots show 88% and 82% of residues in the most favoured regions. Visualizing the refined structures in PyMOL showed no drastic changes in 3D structure. These scores are close to, though below, the 90% that Procheck expects for a good-quality model. We therefore used these refined structures for protein-DNA docking.
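Each point in a Ramachandran plot is a pair of backbone dihedral angles computed from atomic coordinates: φ from the atoms C(i-1), N(i), Cα(i), C(i) and ψ from N(i), Cα(i), C(i), N(i+1). A minimal NumPy sketch of the signed dihedral calculation follows. It is an illustration of the geometry, not Procheck's own code, and reading coordinates from a PDB file is left out.

```python
import numpy as np


def dihedral(p0, p1, p2, p3) -> float:
    """Signed dihedral angle in degrees for four points (IUPAC sign convention)."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Components of b0 and b2 perpendicular to the central bond p1-p2
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def phi_psi(c_prev, n, ca, c, n_next) -> tuple[float, float]:
    """Backbone phi and psi of residue i from the five backbone atoms involved."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy check with simple coordinates: 0 = cis, 180 = trans, +90 = clockwise twist
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

Using `arctan2` rather than `arccos` keeps the sign of the angle, which is needed to place a residue in the correct quadrant of the plot.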

Protein – DNA docking

With the refined structures, we performed protein-DNA docking separately for each fusion protein. A DNA strand containing both binding sites was built with C-DART and used, together with the refined protein structures, as input for the docking program HADDOCK [8]. HADDOCK uses a data-driven approach in which information from NMR experiments, mutagenesis experiments and bioinformatic predictions is combined to find the optimal docking structures. We inspected the top 10 structures for each protein and combined the most promising ones into a single model with both proteins docked to the same DNA strand. The two best options are shown below:

Final result 1

Final result 2

Docking of Cluc-FFF-TetR and OmpR-FFF-Nluc to the constructed DNA scaffolds
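How the two docked complexes were combined into one model is not described in detail. One common way, assumed here, is to superimpose the DNA of the second complex onto the DNA of the first and apply the same rigid-body transform to the second protein; PyMOL's `align` command can do this interactively. The sketch below shows the underlying least-squares superposition (Kabsch algorithm) in NumPy, with random coordinates standing in for matched DNA atoms.

```python
import numpy as np


def kabsch(mobile: np.ndarray, target: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Rotation r and translation t minimising the RMSD between
    mobile @ r.T + t and target; both are (N, 3) arrays of matched atoms."""
    mu_m = mobile.mean(axis=0)
    mu_t = target.mean(axis=0)
    h = (mobile - mu_m).T @ (target - mu_t)  # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # correct for a possible reflection
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = mu_t - r @ mu_m
    return r, t


def transform(coords: np.ndarray, r: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply a rigid-body transform to (N, 3) coordinates."""
    return coords @ r.T + t


# Synthetic example: DNA atoms of complex 2 are a rotated, shifted copy of complex 1.
rng = np.random.default_rng(0)
dna_complex1 = rng.normal(size=(40, 3))
angle = np.radians(30.0)
rz = np.array([[np.cos(angle), -np.sin(angle), 0.0],
               [np.sin(angle), np.cos(angle), 0.0],
               [0.0, 0.0, 1.0]])
dna_complex2 = dna_complex1 @ rz.T + np.array([5.0, -2.0, 1.0])
protein_complex2 = rng.normal(size=(100, 3))  # placeholder protein atoms

# Fit complex 2's DNA onto complex 1's DNA, then move complex 2's protein the same way.
r, t = kabsch(dna_complex2, dna_complex1)
protein_in_frame1 = transform(protein_complex2, r, t)
rmsd = np.sqrt(np.mean(np.sum((transform(dna_complex2, r, t) - dna_complex1) ** 2, axis=1)))
print(f"DNA RMSD after superposition: {rmsd:.2e}")
```

Only the DNA atoms drive the fit, so the relative position of the two proteins in the combined model is inherited directly from the two independent docking runs.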

The first result shows a reconstituted luciferase. However, comparison with existing PDB structures of DNA-bound TetR (https://www.rcsb.org/structure/1qpi) and PhoB (https://www.rcsb.org/structure/1gxp), an OmpR homologue used because no comparable OmpR structure exists, raises doubts about whether OmpR would dock in this way. The second result has a docking profile that resembles both the TetR and the PhoB structures and is therefore more likely to be correct. Since the luciferase halves are not reconstituted in this second model, our screening should include linkers longer than the one considered here to increase the chance of luciferase reconstitution.

Conclusion/Recommendations

Altogether, the in silico docking of our fusion proteins to the DNA showed that reconstitution of the split luciferase is possible in our set-up. However, we recommend increasing the number of repeats in the linkers, as longer linkers are expected to facilitate luciferase reconstitution.
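A rough polymer-physics estimate indicates how much extra reach additional linker repeats would provide. The sketch treats the linker as a worm-like chain and reports both its contour length (fully extended) and its root-mean-square end-to-end distance. All numbers are assumptions, not values from this page: 5 residues per repeat (as in a GGGGS element), about 0.38 nm per residue, and a persistence length of 0.4 nm for a flexible Gly/Ser-rich chain. The output would be compared with the gap between the C-luc and N-luc termini measured in the docked models.

```python
import math

# Hypothetical linker parameters (not taken from this page)
RESIDUES_PER_REPEAT = 5  # e.g. one GGGGS element
NM_PER_RESIDUE = 0.38    # approximate contour length per residue (nm)
PERSISTENCE_NM = 0.4     # assumed persistence length of a flexible linker (nm)


def linker_reach(repeats: int) -> tuple[float, float]:
    """Return (contour length, RMS end-to-end distance) in nm for a worm-like chain."""
    if repeats < 1:
        raise ValueError("need at least one linker repeat")
    contour = repeats * RESIDUES_PER_REPEAT * NM_PER_RESIDUE
    ratio = PERSISTENCE_NM / contour
    # Worm-like chain: <R^2> = 2 P L [1 - (P / L)(1 - exp(-L / P))]
    mean_sq = 2.0 * PERSISTENCE_NM * contour * (1.0 - ratio * (1.0 - math.exp(-1.0 / ratio)))
    return contour, math.sqrt(mean_sq)


for n in (1, 3, 6, 9):
    contour, rms = linker_reach(n)
    print(f"{n} repeats: contour {contour:4.1f} nm, typical end-to-end {rms:3.1f} nm")
```

Because the typical end-to-end distance of a flexible chain grows roughly with the square root of its length, tripling the number of repeats increases the typical reach by less than a factor of two, while the maximal (contour) reach grows linearly.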

Kinetic model of split-luciferase

Goals

Besides protein-DNA docking, we constructed an ODE model to guide the experimental design of our biosensor based on luciferase complementation. Simulating the system for different expression levels of the transcription factor (OmpR) and the receptor (Taz) helps us identify conditions that maximize binding of the transcription factor to the DNA.

Model

Schematic representation of the split-luciferase kinetic model.

The receptor Tar-EnvZ (Taz) autophosphorylates upon ligand binding. The phosphate group is then quickly transferred to the transcription factor OmpR, which promotes OmpR dimerization and binding to the DNA. Because OmpR carries one half of the split luciferase, it can complement the other half, which is fused to the repressor TetR bound to the DNA next to the OmpR binding site. OmpR$_p$ is dephosphorylated by binding to an unphosphorylated Taz receptor.

Reactions

The reactions below describe the split-protein complementation system. Because all reactions take place at the post-translational level, we neglect protein production and degradation.

$Taz + L \leftrightarrow L-Taz$
$L-Taz \rightarrow L-Taz_{p}$
$L-Taz_{p} + OmpR \leftrightarrow L-Taz_{p}-OmpR$
$L-Taz_{p}-OmpR \rightarrow L-Taz + OmpR_{p}$
$OmpR_{p} + Taz \leftrightarrow Taz-OmpR_{p}$
$Taz-OmpR_{p} \rightarrow OmpR + Taz$
$OmpR + OmpR \leftrightarrow OmpR_{2}$
$OmpR_{p} + OmpR_{p} \leftrightarrow OmpR_{p2}$
$OmpR_{p2} + PompC \leftrightarrow OmpR_{p2}*$
$TetR + TetR \leftrightarrow TetR_{2}$
$TetR_{2} + PTetR \leftrightarrow TetR*$
$TetR* + OmpR_{p2}* \leftrightarrow Luminescence$
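Because no production or degradation terms are included, the OmpR part of the network should conserve the total amounts of receptor, ligand, OmpR and ompC promoter. The sketch below encodes the forward direction of each reaction in a stoichiometric matrix and checks that every reaction leaves these totals unchanged. The TetR reactions and the luminescence step are left out because they do not appear in the ODE system; the species labels are our own shorthand for the terms used above.

```python
import numpy as np

species = ["L", "Taz", "L-Taz", "L-Taz_p", "OmpR", "L-Taz_p-OmpR",
           "OmpR_p", "Taz-OmpR_p", "PompC", "OmpR_2", "OmpR_p2*", "OmpR_p2"]
index = {name: i for i, name in enumerate(species)}

# Forward direction of each reaction; a reverse step only flips the signs.
reactions = [
    {"Taz": -1, "L": -1, "L-Taz": 1},
    {"L-Taz": -1, "L-Taz_p": 1},
    {"L-Taz_p": -1, "OmpR": -1, "L-Taz_p-OmpR": 1},
    {"L-Taz_p-OmpR": -1, "L-Taz": 1, "OmpR_p": 1},
    {"OmpR_p": -1, "Taz": -1, "Taz-OmpR_p": 1},
    {"Taz-OmpR_p": -1, "OmpR": 1, "Taz": 1},
    {"OmpR": -2, "OmpR_2": 1},
    {"OmpR_p": -2, "OmpR_p2": 1},
    {"OmpR_p2": -1, "PompC": -1, "OmpR_p2*": 1},
]

stoich = np.zeros((len(species), len(reactions)))
for j, reaction in enumerate(reactions):
    for name, coeff in reaction.items():
        stoich[index[name], j] = coeff


def weights(counts: dict[str, int]) -> np.ndarray:
    """Vector counting how many copies of a molecule each species contains."""
    w = np.zeros(len(species))
    for name, c in counts.items():
        w[index[name]] = c
    return w


conserved = {
    "total Taz": weights({"Taz": 1, "L-Taz": 1, "L-Taz_p": 1, "L-Taz_p-OmpR": 1, "Taz-OmpR_p": 1}),
    "total ligand": weights({"L": 1, "L-Taz": 1, "L-Taz_p": 1, "L-Taz_p-OmpR": 1}),
    "total OmpR": weights({"OmpR": 1, "L-Taz_p-OmpR": 1, "OmpR_p": 1, "Taz-OmpR_p": 1,
                           "OmpR_2": 2, "OmpR_p2": 2, "OmpR_p2*": 2}),
    "total PompC": weights({"PompC": 1, "OmpR_p2*": 1}),
}
for label, w in conserved.items():
    # w @ stoich is the change of the total caused by each reaction; it should be zero
    print(label, np.allclose(w @ stoich, 0.0))
```

These totals give a simple sanity check for any numerical simulation of the model: they must stay constant over time.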

The reactions show that TetR binding to its operator is independent of the EnvZ/OmpR pathway and can therefore be tuned separately; a high concentration of DNA-bound TetR is needed for a strong luminescent signal. The ODE model therefore focuses on maximizing the binding of OmpR$_{p2}$ to the DNA.

Equations

\begin{equation} \frac{d[L]}{dt} = k_{-1} \cdot [L-Taz] - k_1 \cdot [L] \cdot [Taz] \end{equation}

\begin{equation} \frac{d[Taz]}{dt} = k_{-1} \cdot [L-Taz] - k_1 \cdot [L] \cdot [Taz] + k_{-5} \cdot [Taz-OmpR_p] - k_5 \cdot [OmpR_p] \cdot [Taz] + k_6 \cdot [Taz-OmpR_p] \end{equation}

\begin{equation} \frac{d[L-Taz]}{dt} = k_1 \cdot [L] \cdot [Taz] - k_{-1} \cdot [L-Taz] - k_2 \cdot [L-Taz] + k_4 \cdot [L-Taz_p-OmpR] \end{equation}

\begin{equation} \frac{d[L-Taz_p]}{dt} = k_2 \cdot [L-Taz] - k_3 \cdot [L-Taz_p] \cdot [OmpR] + k_{-3} \cdot [L-Taz_p-OmpR] \end{equation}

\begin{equation} \frac{d[OmpR]}{dt} = - k_3 \cdot [L-Taz_p] \cdot [OmpR] + k_{-3} \cdot [L-Taz_p-OmpR] + k_6 \cdot [Taz-OmpR_p] - 2 \cdot k_{dim1} \cdot [OmpR]^2 + 2 \cdot k_{-dim1} \cdot [OmpR_2] \end{equation}

\begin{equation} \frac{d[L-Taz_p-OmpR]}{dt} = k_3 \cdot [L-Taz_p] \cdot [OmpR] - k_{-3} \cdot [L-Taz_p-OmpR] - k_4 \cdot [L-Taz_p-OmpR] \end{equation}

\begin{equation} \frac{d[OmpR_p]}{dt} = k_4 \cdot [L-Taz_p-OmpR] - 2 \cdot k_{dim2} \cdot [OmpR_p]^2 + 2 \cdot k_{-dim2} \cdot [OmpR_{p2}] + k_{-5} \cdot [Taz-OmpR_p] - k_5 \cdot [OmpR_p] \cdot [Taz] \end{equation}

\begin{equation} \frac{d[Taz-OmpR_p]}{dt} = k_5 \cdot [OmpR_p] \cdot [Taz] - k_{-5} \cdot [Taz-OmpR_p] - k_6 \cdot [Taz-OmpR_p] \end{equation}

\begin{equation} \frac{d[PompC]}{dt} = -k_8 \cdot [OmpR_{p2}] \cdot [PompC] + k_{-8} \cdot [OmpR_{p2}*] \end{equation}

\begin{equation} \frac{d[OmpR_2]}{dt} = k_{dim1} \cdot [OmpR]^2 - k_{-dim1} \cdot [OmpR_2] \end{equation}

\begin{equation} \frac{d[OmpR_{p2}*]}{dt} = k_8 \cdot [OmpR_{p2}] \cdot [PompC] - k_{-8} \cdot [OmpR_{p2}*] \end{equation}

\begin{equation} \frac{d[OmpR_{p2}]}{dt} = k_{dim2} \cdot [OmpR_p]^2 - k_{-dim2} \cdot [OmpR_{p2}] - k_8 \cdot [OmpR_{p2}] \cdot [PompC] + k_{-8} \cdot [OmpR_{p2}*] \end{equation}
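The system can be integrated numerically, for example with SciPy's `solve_ivp`. The sketch below is our own illustration, not the team's simulation code: all rate constants are placeholders (the actual values are on the parameter page), concentrations are in µM, and the initial amounts assume a cell volume of about 1 fL, in which one molecule corresponds to roughly 1.7 nM. Each net reaction rate is written once and then combined per species, so the code can be compared term by term with the mass-action reactions (net rates v5 and v8 use OmpR$_p$ and OmpR$_{p2}$ as the binding partners).

```python
import numpy as np
from scipy.integrate import solve_ivp

# Placeholder rate constants (uM^-1 s^-1 for bimolecular, s^-1 for unimolecular steps).
# These are NOT the values from the parameter page.
P = dict(k1=1.0, km1=0.1, k2=1.0, k3=1.0, km3=0.1, k4=1.0, k5=1.0, km5=0.1,
         k6=1.0, kdim1=1.0, kmdim1=1.0, kdim2=1.0, kmdim2=0.1, k8=1.0, km8=0.1)

# State order: L, Taz, L-Taz, L-Taz_p, OmpR, L-Taz_p-OmpR,
# OmpR_p, Taz-OmpR_p, PompC, OmpR_2, OmpR_p2*, OmpR_p2
BOUND = 10  # index of OmpR_p2*, the DNA-bound dimer


def rhs(t, y, p):
    L, Taz, LTaz, LTazp, OmpR, C, OmpRp, TazOmpRp, PompC, OmpR2, OmpRp2s, OmpRp2 = y
    v1 = p["k1"] * L * Taz - p["km1"] * LTaz           # ligand binding (net)
    v2 = p["k2"] * LTaz                                 # autophosphorylation
    v3 = p["k3"] * LTazp * OmpR - p["km3"] * C          # OmpR docking (net)
    v4 = p["k4"] * C                                    # phosphotransfer
    v5 = p["k5"] * OmpRp * Taz - p["km5"] * TazOmpRp    # phosphatase complex (net)
    v6 = p["k6"] * TazOmpRp                             # dephosphorylation
    vd1 = p["kdim1"] * OmpR**2 - p["kmdim1"] * OmpR2    # OmpR dimerization (net)
    vd2 = p["kdim2"] * OmpRp**2 - p["kmdim2"] * OmpRp2  # OmpR_p dimerization (net)
    v8 = p["k8"] * OmpRp2 * PompC - p["km8"] * OmpRp2s  # promoter binding (net)
    return [-v1, -v1 - v5 + v6, v1 - v2 + v4, v2 - v3, -v3 + v6 - 2 * vd1,
            v3 - v4, v4 - 2 * vd2 - v5, v5 - v6, -v8, vd1, v8, vd2 - v8]


def simulate(ligand=100.0, taz=0.33, ompr=5.0, pompc=0.0017, t_end=1000.0, p=P):
    """Integrate from an unphosphorylated start; returns the solve_ivp result."""
    y0 = np.zeros(12)
    y0[[0, 1, 4, 8]] = [ligand, taz, ompr, pompc]
    return solve_ivp(rhs, (0.0, t_end), y0, args=(p,), method="LSODA",
                     rtol=1e-8, atol=1e-12)


def totals(y):
    """Conserved totals (Taz, OmpR, promoter), useful as a sanity check."""
    taz_tot = y[1] + y[2] + y[3] + y[5] + y[7]
    ompr_tot = y[4] + y[5] + y[6] + y[7] + 2 * (y[9] + y[10] + y[11])
    pompc_tot = y[8] + y[10]
    return taz_tot, ompr_tot, pompc_tot


sol = simulate()
print(f"DNA-bound OmpR_p2 at t = {sol.t[-1]:.0f} s: {sol.y[BOUND, -1]:.2e} uM")
```

A scan like the one in the Results would call `simulate` over a grid of `taz` and `ompr` values for each ligand level and record `sol.y[BOUND, -1]`; the quantitative outcome depends entirely on the real parameter values.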

Parameters

The parameters used in this model are listed on the parameter page. Because the ODE model was only intended to provide qualitative insights for guiding the experimental set-up, no experimental data were incorporated.

Results

Using the equations above, we examined how different Taz and OmpR protein levels influence the binding of OmpR$_{p2}$ to the DNA. The figure below shows the results for induction of the pathway with aspartate concentrations of 10 µM, 0.1 mM and 1 mM. Because Taz is a membrane receptor that is expressed at low levels in the cell, we simulated Taz concentrations from 0 to 0.33 µM, which corresponds to a maximum of 200 Taz molecules. For OmpR, we used a range of 0 to 4000 molecules, as the normal expression level of OmpR is around 3000 molecules.

Concentration of phosphorylated OmpR bound to DNA as a function of the Taz and OmpR concentrations for different ligand concentrations.

All three panels show that the Taz level is critical for the amount of OmpR$_{p2}$ bound to the DNA: the higher the expression level of Taz, the more OmpR$_{p2}$ localizes to the DNA. However, because overexpression of Taz leads to cell death, it is best to put Taz under an inducible promoter, which allows its expression to be tuned precisely. OmpR behaves differently: even a relatively low OmpR expression level is sufficient to reach the maximum amount of DNA-bound OmpR$_{p2}$ for a given ligand concentration. Finally, the amount of OmpR$_{p2}$ bound to the DNA increases with increasing ligand concentration, supporting the possibility of turning this pathway into a biosensor.
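The conversion between molecule numbers and concentrations used in these simulations depends on the cell volume, which is not stated here. Assuming an E. coli volume of about 1 fL (an assumption), the numbers are consistent: 200 molecules correspond to about 0.33 µM. A small helper makes the conversion explicit:

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)


def molecules_to_uM(n: float, volume_fL: float = 1.0) -> float:
    """Concentration (uM) of n molecules in a volume given in femtolitres."""
    return n / AVOGADRO / (volume_fL * 1e-15) * 1e6


def uM_to_molecules(c_uM: float, volume_fL: float = 1.0) -> float:
    """Number of molecules corresponding to c_uM in a volume given in femtolitres."""
    return c_uM * 1e-6 * (volume_fL * 1e-15) * AVOGADRO


for label, n in (("max Taz", 200), ("typical OmpR", 3000), ("max OmpR", 4000)):
    print(f"{label}: {n} molecules = {molecules_to_uM(n):.2f} uM")
```

With a 1 fL volume, the OmpR range of 0 to 4000 molecules corresponds to roughly 0 to 6.6 µM; a different assumed volume rescales all values proportionally.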

Conclusions

Our split-protein ODE model suggests putting the Taz receptor under an inducible promoter. This way, its expression level can be tuned to balance high protein expression against continued cell proliferation.

References

[1] Y Zhang. I-TASSER server for protein 3D structure prediction. BMC Bioinformatics, 9: 40 (2008). doi: 10.1186/1471-2105-9-40.

[2] J Yang, R Yan, A Roy, D Xu, J Poisson, Y Zhang. The I-TASSER Suite: Protein structure and function prediction. Nature Methods, 12: 7-8 (2015). doi:10.1038/nmeth.3213

[3] Yang J, Yan R, Roy A, Xu D, Poisson J, Zhang Y. 2015. The I-TASSER Suite: protein structure and function prediction. Nature Methods. 12(1): 7-8. doi.org/10.1038/nmeth.3213

[4] Wiederstein & Sippl (2007) ProSA-web: interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Research 35, W407-W410.

[5] Sippl, M.J. (1993) Recognition of Errors in Three-Dimensional Structures of Proteins. Proteins 17, 355-362

[6] Laskowski R A, MacArthur M W, Moss D S, Thornton J M (1993). PROCHECK - a program to check the stereochemical quality of protein structures. J. App. Cryst., 26, 283-291

[7] Dong Xu and Yang Zhang. Improving the Physical Realism and Structural Accuracy of Protein Models by a Two-step Atomic-level Energy Minimization. Biophysical Journal, vol 101, 2525-2534 (2011)

[8] Van Zundert, G. C. P., et al. "The HADDOCK2.2 web server: user-friendly integrative modeling of biomolecular complexes." Journal of Molecular Biology 428.4 (2016): 720-725.