Using protein modelling to predict the functionality of an unknown fusion protein
Given only an amino acid sequence, how can we get a preliminary idea of whether the resulting protein construct is functional? Since no Cas13b structure was available in the Protein Data Bank (PDB) for our novel fusion protein, we used an online protein threading programme to construct 3D models of our RESCUE Editor system (APOBEC-dCas13b) fusion protein with different linker sequences. The 3D models were then optimized using molecular dynamics simulations. We hope that this approach will help identify an optimal linker that preserves the individual structures of the functional domains.
To construct a proposed model, we first made the following assumptions.
- A protein’s function is dependent on its structure.
- Each domain adopts a similar conformation within the fusion protein and in its independent native state.
- The final protein structures generated are in their respective global native states.
We first generated the linker sequences in Table 1 below.
Table 1. Linker sequences considered for the design of the fusion protein constructs. The amino acid sequences of the linkers were manually converted to codons, choosing randomly among the synonymous codons, and then reverse-transcribed to their DNA sequences. The linker designs were adapted from Komor et al. (2016).
The rAPOBEC DNA sequence was provided by Dr. Komor, and the dPspCas13b sequence was taken from the gene construct by Cox et al. (2017). The DNA sequences were joined with APOBEC at the 5’ end, followed by one of the linkers in the center, and dPspCas13b at the 3’ end.
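The codon conversion and assembly steps described above can be sketched in Python. This is only an illustration, not the procedure actually used (the conversion was done manually): the linker `GGSGGS` and the two short flanking sequences are hypothetical placeholders, not the sequences from Table 1 or the real rAPOBEC and dPspCas13b genes, and the function names are our own.

```python
import random

# Standard genetic code as DNA codons, grouped by one-letter amino acid code.
CODONS = {
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "R": ["CGT", "CGC", "CGA", "CGG", "AGA", "AGG"],
    "N": ["AAT", "AAC"],
    "D": ["GAT", "GAC"],
    "C": ["TGT", "TGC"],
    "Q": ["CAA", "CAG"],
    "E": ["GAA", "GAG"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "H": ["CAT", "CAC"],
    "I": ["ATT", "ATC", "ATA"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
    "K": ["AAA", "AAG"],
    "M": ["ATG"],
    "F": ["TTT", "TTC"],
    "P": ["CCT", "CCC", "CCA", "CCG"],
    "S": ["TCT", "TCC", "TCA", "TCG", "AGT", "AGC"],
    "T": ["ACT", "ACC", "ACA", "ACG"],
    "W": ["TGG"],
    "Y": ["TAT", "TAC"],
    "V": ["GTT", "GTC", "GTA", "GTG"],
}
CODON_TO_AA = {codon: aa for aa, codons in CODONS.items() for codon in codons}


def back_translate(peptide, rng):
    """Pick one random synonymous codon per residue and return the DNA."""
    return "".join(rng.choice(CODONS[aa]) for aa in peptide)


def translate(dna):
    """Translate an in-frame coding sequence that contains no stop codons."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length is not a multiple of 3")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


def assemble_fusion(apobec_dna, linker_dna, cas13b_dna):
    """Join 5'-APOBEC / linker / dCas13b-3', checking each part is in frame.

    The upstream parts must not end in a stop codon, otherwise translation
    would terminate before the linker.
    """
    for name, seq in (("APOBEC", apobec_dna), ("linker", linker_dna),
                      ("dCas13b", cas13b_dna)):
        if len(seq) % 3 != 0:
            raise ValueError(f"{name} would shift the reading frame")
    return apobec_dna + linker_dna + cas13b_dna


def to_fasta(name, seq, width=60):
    """Format a sequence as a FASTA record with fixed-width lines."""
    lines = [seq[i:i + width] for i in range(0, len(seq), width)]
    return f">{name}\n" + "\n".join(lines)


rng = random.Random(0)                      # fixed seed for reproducibility
linker_dna = back_translate("GGSGGS", rng)  # hypothetical linker
fusion = assemble_fusion("ATGAGCTCT", linker_dna, "GAAAAACTG")  # placeholders
print(to_fasta("APOBEC-linker-dCas13b_placeholder", fusion))
```

Translating the assembled sequence back to protein is a quick check that the linker did not shift the reading frame of the downstream domain.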
The generated DNA sequences were submitted to functional domain scanning servers (ExPASy ScanProsite, InterPro and NCBI CD-Search) for functional domain analysis. All three servers returned matches only to the cytidine deaminase family, towards the N-terminus (not shown), suggesting that no Cas13b structure had been reported in these databases.
Each of the designed FASTA sequences was then submitted to the RaptorX protein threading server for structural modelling. The resulting PDB structures were then visualised in UCSF Chimera.
Closer inspection of the generated structures (Figure 1) revealed long linear motifs in the models, suggesting that parts of the models were in an energetically unfavourable state. Ramachandran plots of our models supported this observation: the dihedral angles were more widely scattered than is typical for protein crystal structures (Figure 2).
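For readers unfamiliar with Ramachandran plots, the backbone dihedral angles phi and psi that they display can be computed from atomic coordinates as in the minimal sketch below. It assumes the backbone N, CA and C coordinates have already been read from a PDB file into NumPy arrays; the function names are our own, and the random coordinates at the end are placeholders rather than data from our models.

```python
import numpy as np


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) about the p1-p2 bond."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = b0 - np.dot(b0, b1) * b1
    w = b2 - np.dot(b2, b1) * b1
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))


def backbone_torsions(n, ca, c):
    """Return (phi, psi) arrays for the interior residues of one chain.

    n, ca, c: arrays of shape (L, 3) holding backbone atom coordinates.
    phi(i) uses C(i-1), N(i), CA(i), C(i);
    psi(i) uses N(i), CA(i), C(i), N(i+1).
    The first and last residues are skipped because one angle is undefined.
    """
    phi, psi = [], []
    for i in range(1, len(ca) - 1):
        phi.append(dihedral(c[i - 1], n[i], ca[i], c[i]))
        psi.append(dihedral(n[i], ca[i], c[i], n[i + 1]))
    return np.array(phi), np.array(psi)


# Placeholder coordinates for a 5-residue chain (not real structural data).
rng = np.random.default_rng(0)
n_xyz, ca_xyz, c_xyz = rng.normal(size=(3, 5, 3))
phi, psi = backbone_torsions(n_xyz, ca_xyz, c_xyz)
print(phi.shape, psi.shape)
```

Plotting psi against phi for every residue gives the Ramachandran plot; a well-folded structure concentrates its points in a few regions, while widely scattered points indicate strained backbone geometry.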
To obtain thermodynamically stable 3D structures, we performed molecular dynamics simulations with GROMACS (Abraham et al., 2015). The APOBEC-XTEN-dCas13b structure was selected because XTEN was reported to yield the highest efficiency in APOBEC-Cas9 constructs (Komor et al., 2016). A topology file was first created using the CHARMM27 force field. The structure was then solvated in a cubic box, and ions were added to neutralize the net charge of the protein. The system was subsequently subjected to energy minimization, followed by temperature and pressure equilibration. After equilibration, a short 10-picosecond molecular dynamics simulation was run to generate a trajectory file as a proof of concept (Figure 3). The bash script and the MDP files required to run the simulations were uploaded to GitHub.
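As a rough guide to the sequence of steps described above, the sketch below assembles a typical series of GROMACS commands in Python and prints them without running anything. It is not the script we uploaded: the file names (`model.pdb`, `ions.mdp`, `minim.mdp`, `nvt.mdp`, `npt.mdp`, `md.mdp`), the TIP3P water model, the 1.0 nm box margin and the 2 fs time step are all assumptions chosen for illustration.

```python
import shlex


def steps_for(duration_ps, dt_ps=0.002):
    """Number of integration steps (the MDP `nsteps` value) for a run length."""
    return round(duration_ps / dt_ps)


def gromacs_commands(pdb="model.pdb", forcefield="charmm27", water="tip3p"):
    """Return the workflow as a list of argument lists (nothing is executed)."""
    top = "topol.top"
    return [
        # 1. Build the topology with the chosen force field.
        ["gmx", "pdb2gmx", "-f", pdb, "-o", "processed.gro", "-p", top,
         "-ff", forcefield, "-water", water],
        # 2. Centre the protein in a cubic box with a 1.0 nm margin (assumed).
        ["gmx", "editconf", "-f", "processed.gro", "-o", "boxed.gro",
         "-c", "-d", "1.0", "-bt", "cubic"],
        # 3. Fill the box with water.
        ["gmx", "solvate", "-cp", "boxed.gro", "-cs", "spc216.gro",
         "-o", "solvated.gro", "-p", top],
        # 4. Add counter-ions to neutralize the system
        #    (genion asks interactively which group to replace).
        ["gmx", "grompp", "-f", "ions.mdp", "-c", "solvated.gro",
         "-p", top, "-o", "ions.tpr"],
        ["gmx", "genion", "-s", "ions.tpr", "-o", "ionized.gro", "-p", top,
         "-pname", "NA", "-nname", "CL", "-neutral"],
        # 5. Energy minimization.
        ["gmx", "grompp", "-f", "minim.mdp", "-c", "ionized.gro",
         "-p", top, "-o", "em.tpr"],
        ["gmx", "mdrun", "-deffnm", "em"],
        # 6. Temperature (NVT) equilibration, then pressure (NPT) equilibration.
        ["gmx", "grompp", "-f", "nvt.mdp", "-c", "em.gro", "-r", "em.gro",
         "-p", top, "-o", "nvt.tpr"],
        ["gmx", "mdrun", "-deffnm", "nvt"],
        ["gmx", "grompp", "-f", "npt.mdp", "-c", "nvt.gro", "-r", "nvt.gro",
         "-t", "nvt.cpt", "-p", top, "-o", "npt.tpr"],
        ["gmx", "mdrun", "-deffnm", "npt"],
        # 7. Production run; md.mdp would set nsteps = steps_for(10.0).
        ["gmx", "grompp", "-f", "md.mdp", "-c", "npt.gro", "-t", "npt.cpt",
         "-p", top, "-o", "md.tpr"],
        ["gmx", "mdrun", "-deffnm", "md"],
    ]


for cmd in gromacs_commands():
    print(shlex.join(cmd))
print("nsteps for 10 ps at dt = 0.002 ps:", steps_for(10.0))
```

With a 2 fs time step, the 10-picosecond proof-of-concept run corresponds to 5000 integration steps, which illustrates how short it is compared with the nanosecond-scale runs usually needed to observe domain rearrangements.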
From our preliminary molecular dynamics run, we found that the APOBEC domain remained visibly separated from the dCas13b domain and that the molecule as a whole underwent minimal changes. However, further optimization and debugging are needed to gain clearer insight into the native structure that the protein may adopt in its functional state. On the basis of this preliminary result, the selected fusion protein construct, APOBEC-XTEN-dPspCas13b, is still regarded as a viable option for the wet laboratory experiments.
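The two qualitative observations above (separated domains, minimal structural change) could be quantified in a follow-up analysis. The sketch below shows one possible way, assuming coordinates of matching atoms from the first and last trajectory frames are available as NumPy arrays: the RMSD after optimal superposition (Kabsch algorithm) measures overall change, and the distance between domain centroids measures separation. The function names, the residue index ranges and the synthetic coordinates are hypothetical, and we did not perform this analysis in the present work.

```python
import numpy as np


def kabsch_rmsd(ref, mob):
    """RMSD between two (N, 3) coordinate sets after optimal superposition."""
    ref_c = ref - ref.mean(axis=0)
    mob_c = mob - mob.mean(axis=0)
    # Covariance matrix and its SVD give the optimal rotation.
    u, _, vt = np.linalg.svd(mob_c.T @ ref_c)
    # Correct for a possible reflection so the result is a proper rotation.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    aligned = mob_c @ rot.T
    return float(np.sqrt(((aligned - ref_c) ** 2).sum(axis=1).mean()))


def centroid_distance(coords, idx_a, idx_b):
    """Distance between the centroids of two atom selections."""
    a = coords[idx_a].mean(axis=0)
    b = coords[idx_b].mean(axis=0)
    return float(np.linalg.norm(a - b))


# Synthetic example: the "last frame" is the first frame rotated and shifted,
# so the RMSD after superposition should be close to zero.
rng = np.random.default_rng(1)
first = rng.normal(size=(10, 3))
theta = np.pi / 6
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
last = first @ rz.T + np.array([1.0, 2.0, 3.0])
print("RMSD:", kabsch_rmsd(first, last))

# Hypothetical domain selections: atoms 0-4 as "APOBEC", 5-9 as "dCas13b".
print("centroid distance:", centroid_distance(first, slice(0, 5), slice(5, 10)))
```

A small RMSD together with a roughly constant centroid distance across frames would support the visual impression that the two domains stay apart and the structure changes little.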
References
Abraham, M. J., Murtola, T., Schulz, R., Páll, S., Smith, J. C., Hess, B., & Lindahl, E. (2015). GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers. SoftwareX, 1–2, 19–25. https://doi.org/10.1016/J.SOFTX.2015.06.001
Bondi, A. (1964). van der Waals Volumes and Radii. The Journal of Physical Chemistry, 68(3), 441–451. https://doi.org/10.1021/j100785a001
Cox, D. B. T., Gootenberg, J. S., Abudayyeh, O. O., Franklin, B., Kellner, M. J., Joung, J., & Zhang, F. (2017). RNA editing with CRISPR-Cas13. Science (New York, N.Y.), 358(6366), 1019–1027. https://doi.org/10.1126/science.aaq0180
Komor, A. C., Kim, Y. B., Packer, M. S., Zuris, J. A., & Liu, D. R. (2016). Programmable editing of a target base in genomic DNA without double-stranded DNA cleavage. Nature, 533(7603), 420–424. https://doi.org/10.1038/nature17946
Pettersen, E. F., Goddard, T. D., Huang, C. C., Couch, G. S., Greenblatt, D. M., Meng, E. C., & Ferrin, T. E. (2004). UCSF Chimera--A visualization system for exploratory research and analysis. Journal of Computational Chemistry, 25(13), 1605–1612. https://doi.org/10.1002/jcc.20084