Team:IISc-Bangalore/Software

Software

PhageModifier: Overview and Motivation

PhageModifier is a pipeline of open-source computational tools designed to modify proteins, through mutation of selected residues, to have high affinity for a small molecule. The pipeline is written in Bash and combines fpocket, PocketAnneal, and AutoDock into a single workflow for protein modification. It comes with a database of crystal structures of phage tail proteins that we have curated, as well as important structural information on the usage of these tail proteins. After mutation, the pipeline uses i3Drefine (installed separately) and DSSP to energy minimize the modified protein so that it better resembles its native structure.

Our PACMAN project involved a lot of time-consuming computational work, as we sought to modify the tail fiber protein, gp37, to have high affinity for phosphoethanolamine, the agent that confers colistin resistance. However, PACMAN is not limited to colistin resistance: its principle can be used to modify a wide variety of phage proteins to tackle other antimicrobial resistance agents. We were thus motivated to create a pipeline that enables users, irrespective of their bioinformatics background, to carry out the necessary computational work before designing their experiments.

Earlier, the pipeline was built such that the T4 tail protein had all its pockets identified, and was limited to mutating pocket residues. This was sufficient for PACMAN; but we decided to take it a step further, and expand PhageModifier to be able to modify ANY protein to have a high affinity for ANY ligand, with just one input line! The name PhageModifier was retained keeping in mind the broader context for which this software tool was developed.

Disclaimer: All the software programs listed here, as well as their own dependencies, are to be cited separately in any application of PhageModifier. The algorithms given here are reproduced from the cited papers (under Fair Use) and are only presented here to help users better understand each pipeline tool separately. All tools which are a part of the PhageModifier download are under a GPL license as of 16 October 2018.




Software in the pipeline

fpocket 1.0

fpocket is an open source protein pocket detection algorithm based on Voronoi tessellation. It has been widely used and its usage is well documented [1]. We have used version 1.0 here as it provided the necessary features for our purposes. fpocket is available under a GPL license. It was developed in the C programming language.




AutoDockTools

AutoDock is one of the most popular protein-ligand docking tools, and has been shown to be reliable in its prediction of binding affinities [2]. We have used it to determine the best natural pocket to modify and to characterize the improved affinity of our modified protein. AutoDock4 is available under a GPL license.








PocketAnneal

PocketAnneal utilizes a simulated annealing algorithm, and combines the AutoDock scoring function and Dunbrack rotamers to mutate protein residues in a pocket-ligand complex to generate a pocket of higher affinity [3]. The algorithm is given below. It is available under a GPL license.
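
To give a feel for the simulated annealing idea that PocketAnneal builds on, here is a minimal generic sketch in Python. It is not PocketAnneal's code: the toy `score` function, the `propose_mutation` helper, the target string and the cooling constants are all illustrative assumptions that stand in for the AutoDock scoring function and the rotamer-based mutations.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def score(sequence, target="WWYFH"):
    """Toy stand-in for a docking score: lower is better (-5 = all positions match)."""
    return -sum(1.0 for a, b in zip(sequence, target) if a == b)

def propose_mutation(sequence, rng):
    """Mutate one randomly chosen position to a randomly chosen residue."""
    pos = rng.randrange(len(sequence))
    return sequence[:pos] + rng.choice(AMINO_ACIDS) + sequence[pos + 1:]

def anneal(start, n_steps=1000, t_start=1.0, t_end=0.01, seed=0):
    """Return the best sequence found and its score."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for step in range(n_steps):
        # Exponential cooling from t_start down to t_end.
        fraction = step / float(max(n_steps - 1, 1))
        temperature = t_start * (t_end / t_start) ** fraction
        candidate = propose_mutation(current, rng)
        delta = score(candidate) - current_score
        # Metropolis criterion: always accept improvements,
        # accept worse moves with probability exp(-delta / T).
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, current_score + delta
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score

if __name__ == "__main__":
    print(anneal("AAAAA"))
```

Accepting some worse moves early on, while the temperature is high, is what lets the search escape local optima; as the temperature drops, the search becomes increasingly greedy.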




i3Drefine

i3Drefine uses an iterative and highly convergent energy minimization algorithm with a powerful all-atom composite physics and knowledge-based force field and a hydrogen bonding (HB) network optimization technique [4]. In our pipeline, we use it to energy minimise the modified protein so that it better resembles the actual structure. The license of i3Drefine was not clear, and thus it must be downloaded separately.




DSSP

DSSP calculates the most likely secondary structure assignment given the 3D structure of a protein. It reads the position of the atoms followed by calculations of the H-bond energy between all atoms.[5][6] i3Drefine requires the DSSP executable for its functioning, which is provided in the PhageModifier download. DSSP is available free of cost with no license restrictions.

Dependencies and Installation

PhageModifier requires Python 2.7.12 and recommends Ubuntu 16.04 for its usage. PhageModifier has been successfully tested on Ubuntu 14, Ubuntu 16 and Elementary OS. These OS are freely available with plenty of help on their usage.

PhageModifier is available for download at: https://github.com/preetham-v/phagemodifier

The downloaded zipped file can be extracted using either the user interface or by entering

>>unzip PhageModifier-master.zip

in the terminal. Once the folder is unzipped, follow these instructions to set up the various software tools. To set up fpocket, go to the PhageModifier-master folder (or the user-given name) and enter the commands given below. This may need root privileges.

>>cd fpocket-src-1.0/
>>make
>>make test
>>sudo make install


To run PocketAnneal, NumPy needs to be downgraded to version 1.8 and AutoDockTools needs to be installed. This can be done with:
>>sudo pip install numpy==1.8
>>sudo apt-get install autodocktools


Autodock and Autogrid also need to be installed using:

>>sudo apt-get install autogrid
>>sudo apt-get install autodock


After downloading i3Drefine, use the following instructions to set it up in the PhageModifier pipeline.

>>tar -xzf i3Drefine.tar.gz
>>cp -r i3Drefine /path/to/PhageModifier-master
>>cd /path/to/PhageModifier-master
>>cp dssp i3Drefine/software/DSSP
>>cd i3Drefine
>>export PATH=$PATH:/home/path/to/PhageModifier-master/jdk1.7.0_01/bin
>>export CLASSPATH=$CLASSPATH:/home/path/to/PhageModifier-master/i3Drefine/software/3Drefine
>>export CLASSPATH=$CLASSPATH:/home/path/to/PhageModifier-master/i3Drefine/software/3Drefine/programs
>>./configure.pl


This should complete the installation of i3Drefine. A test run can be performed by:

>>cd test
>>../bin/i3Drefine.sh start_model.pdb 1


This installation procedure has been tested on various systems without any problem. If you face any issues, please check the user manual of the individual software (given in the respective directory).

Usage and Test Run

The usage of PhageModifier is as follows:
>>bash pipeline.sh -i protein.pdb -l ligand.pdb -o output_directory


If your input files are in any other format, you can easily convert them into PDB format using OpenBabel.
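
As an illustration, a conversion to PDB could look like the sketch below. It assumes the Open Babel command-line program `obabel` is installed and on the PATH. The file name `ligand.mol2` is a placeholder, and the two helper functions are hypothetical and not part of PhageModifier.

```python
import os
import subprocess

def obabel_command(input_path, output_path=None):
    """Build the obabel argument list that converts input_path to PDB."""
    if output_path is None:
        output_path = os.path.splitext(input_path)[0] + ".pdb"
    # obabel infers both formats from the file extensions; -O names the output file.
    return ["obabel", input_path, "-O", output_path]

def convert_to_pdb(input_path, output_path=None):
    """Run the conversion; raises CalledProcessError if obabel exits with an error."""
    cmd = obabel_command(input_path, output_path)
    subprocess.check_call(cmd)
    return cmd[-1]

if __name__ == "__main__":
    # Only prints the command; call convert_to_pdb() to actually run it.
    print(" ".join(obabel_command("ligand.mol2")))
```

The resulting PDB file can then be passed to pipeline.sh through the -i or -l option.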

To help users familiarize themselves with the software, we have provided two example files for a sample run: example_2xgf.pdb (the input protein) and example_petn.pdb (the input ligand). Enter the following command in the terminal:



>>bash pipeline.sh -i example_2xgf.pdb -l example_petn.pdb -o TestRun



The terminal screen will then display the commands being run. Once the pipeline has finished, a new folder called TestRun will have been created containing the output files.

Output

We have modified PocketAnneal's output files to be clearer and more helpful, after taking feedback from test users on what additional information they would like in the output. After the entire pipeline has run, a new folder with the name of the output directory will be generated, containing the following files. Sample output is given below.



1) Results/BEST_protein.pdb : This is the new modified protein with increased affinity for the small molecule (without energy minimization)

The modified protein with the mutated residues highlighted in yellow


2) Results/BEST_protein_A.pdb : Chain A of BEST_protein.pdb for energy minimisation

3) Results/BEST_score.csv : The best score that was available at the time of each run

4) Results/NEW_protein.pdb : Last protein considered by PocketAnneal

5) Results/OP_ligand : Ligand docked at best position

The conformation of best docking for the ligand


6) Results/score_log.csv: Score compared against at each run

7) Results/score_log.png : Graph of score_log.csv

The output file showing an affinity increase from -1.4 kcal/mol to -4.54 kcal/mol


8) 3DRefiner_Output/RESULT/REFINED_1.pdb : Energy minimised final modified protein structure

The aligned structures of the energy minimised protein and the original protein


9) PocketScores : Folder containing docking scores from all pockets

10) ListOfMutations.txt : Gives the list of original residues on the left, followed by the chain, residue number and the mutated residue

Handy list of mutations to make to get modified protein from the original protein


11) Summary_of_pocket_scores : Docking scores on each pocket identified by fpocket

12) RMSD.txt : Gives the RMSD between the original protein and the energy minimised final protein structure

13) pocket.pdb : The pocket with the highest affinity in the WT protein

The original pocket which was inputted to PocketAnneal


14) protein.pdb : The original protein which has been formatted, stripped of ligands and had H added to it
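
To make the RMSD.txt entry (item 12) more concrete, the sketch below shows how an RMSD between two structures can be calculated from PDB coordinates. It is a simplified illustration, not the pipeline's own routine: it assumes the two structures are already superposed, uses only C-alpha atoms, and matches them by chain and residue number. The helper names and the two-residue example coordinates are made up for the demonstration.

```python
import math

def read_ca_coords(pdb_lines):
    """Collect C-alpha coordinates keyed by (chain, residue number) from ATOM records."""
    coords = {}
    for line in pdb_lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            key = (line[21], int(line[22:26]))
            coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation over the residues present in both structures."""
    common = sorted(set(coords_a) & set(coords_b))
    if not common:
        raise ValueError("no matching C-alpha atoms")
    total = 0.0
    for key in common:
        total += sum((p - q) ** 2 for p, q in zip(coords_a[key], coords_b[key]))
    return math.sqrt(total / len(common))

def make_ca_line(serial, chain, resnum, x, y, z):
    """Format a fixed-column PDB ATOM record for an alanine C-alpha atom."""
    return "ATOM  %5d  CA  ALA %s%4d    %8.3f%8.3f%8.3f" % (serial, chain, resnum, x, y, z)

if __name__ == "__main__":
    original = [make_ca_line(1, "A", 1, 0.0, 0.0, 0.0),
                make_ca_line(2, "A", 2, 1.0, 0.0, 0.0)]
    # Same two atoms, each displaced by 1 angstrom along z.
    refined = [make_ca_line(1, "A", 1, 0.0, 0.0, 1.0),
               make_ca_line(2, "A", 2, 1.0, 0.0, 1.0)]
    print(rmsd(read_ca_coords(original), read_ca_coords(refined)))
```

For real files, the lines would come from reading protein.pdb and 3DRefiner_Output/RESULT/REFINED_1.pdb. Structures that are not already superposed would first need an alignment step, which this sketch leaves out.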

Parameters

The Lamarckian Genetic Algorithm used by AutoDock and the Simulated Annealing [7] algorithm used by PocketAnneal are stochastic in nature, so their parameters can be modified to trade accuracy against run time. The current default parameters have been optimised for the example inputs and should be good enough for most purposes. The parameters are explained below:

1) ga_num_evals: Upper limit on the number of energy evaluations
2) ga_pop_size: Number of individuals in genetic population
3) ga_run: Number of dockings
4) n (PocketAnneal) - Number of mutations that will be tried. We noticed saturation at n=1000 in most runs.
5) a (PocketAnneal) - Cooling schedule for Simulated Annealing. An exponential decay (a=2) was the best performing in our test runs.

An increase in the genetic algorithm parameters and n (PocketAnneal) should theoretically increase accuracy; however, keep run time and saturation of docking scores in mind while doing so.
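
For orientation, the three genetic algorithm settings above are keywords in an AutoDock 4 docking parameter file (DPF). The sketch below shows one way to rewrite them in DPF text. The helper `set_dpf_parameters` and the sample values are illustrative assumptions; they are neither PhageModifier's defaults nor the mechanism the pipeline itself uses to set parameters.

```python
def set_dpf_parameters(dpf_text, overrides):
    """Return DPF text with the values of the given keywords replaced.

    Lines look like 'ga_run 10  # comment'; trailing comments are preserved.
    A KeyError is raised if a requested keyword is not present, because the
    order of lines matters in a DPF and appending blindly could be wrong.
    """
    remaining = dict(overrides)
    out_lines = []
    for line in dpf_text.splitlines():
        parts = line.split("#", 1)
        fields = parts[0].split()
        if fields and fields[0] in remaining:
            new_line = "%s %s" % (fields[0], remaining.pop(fields[0]))
            if len(parts) == 2:
                new_line += "  #" + parts[1]
            line = new_line
        out_lines.append(line)
    if remaining:
        raise KeyError("keywords not found: %s" % ", ".join(sorted(remaining)))
    return "\n".join(out_lines) + "\n"

if __name__ == "__main__":
    # Sample fragment with made-up values, for illustration only.
    sample = (
        "ga_pop_size 150  # number of individuals in population\n"
        "ga_num_evals 2500000  # maximum number of energy evaluations\n"
        "ga_run 10  # number of dockings\n"
    )
    print(set_dpf_parameters(sample, {"ga_run": 20, "ga_num_evals": 5000000}))
```

Raising ga_run or ga_num_evals in this way lengthens each docking roughly in proportion, which is why the note above about run time and score saturation matters.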

Applications and usability

Originally designed for mutating just phage proteins, PhageModifier has since been generalised to make it applicable to any protein-ligand combination. This makes it highly useful for future iGEM teams as well as researchers looking to modify proteins. Applications include, but are not limited to, ligand extraction, drug design, modifying protein function, peptide inhibitors and structure prediction.

The output files will give users a complete understanding of the final protein structure, and the output PDB file can be further used for Molecular Dynamics simulations or pipelined into another software.

The software, once set up, is extremely easy to use: it is just a one-line command! Also, while PhageModifier requires a UNIX operating system to run, the output files generated are platform independent and can thus be analyzed on Windows, Linux or macOS.

Since it inputs and outputs files in a standardised PDB format and uses tools under a GPL license, PhageModifier can be easily integrated into the design of new software tools and projects.

Runtime and other technicalities

PhageModifier was tested on a laptop with the following specifications:

model name : Intel(R) Core(TM) i3-5010U CPU @ 2.10GHz
cpu cores : 2

and the process, for the example inputs on default parameters, took 62m15.734s for one complete run.

The downloaded folder, named "PhageModifier-master" by default, is ~175 MB in size. The externally downloaded i3Drefine v1.0 is ~500 MB in size. Each run of the pipeline produces an output folder of ~5 MB.

Validation

Experimental

PhageModifier was built for its application in PACMAN, and thus our experiments to validate the two of them are described in the Experiments and the Results page. In brief, we were able to perform Circular Dichroism (CD) experiments for the wild type gp37 protein. The modified gp37 protein could not be purified in time for us to verify its spectrum.

Computational

We verified the improvement in binding affinity for phosphoethanolamine (pEtN) of our PhageModifier-designed modification of gp37 (hereafter referred to as gp37*) by comparing docking scores of the gp37-pEtN and gp37*-pEtN complexes predicted by a variety of molecular docking tools. These tools use different scoring algorithms and force fields, which lets us check whether the improved affinity predicted by PhageModifier is seen irrespective of the algorithm used. Since each of these docking tools has been experimentally verified to be reliable to a large extent, the results given below provide confidence in the use of PhageModifier for experimental design.

| Docking/Complex Analysis Tool | gp37 score | gp37* score | Conclusion |
|---|---|---|---|
| UCSF Pose&Rank | -6.34 | -7.47 | gp37* has greater affinity |
| AutoDock 4.2 | -4.2 kcal/mol | -7.98 kcal/mol | gp37* has greater affinity |
| PATCHDOCK | 2352 | 2464 | gp37* has slightly greater affinity |
| Score CHARMM22 | 6116507 | 233105504 | gp37* has greater affinity |
| Achilles | -4.80 | -4.80 | gp37* and gp37 have equal affinity |
| DSX Online | -14.7 | -18.049 | gp37* has greater affinity |
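
As a rough guide to scale, the AutoDock 4.2 row can be turned into a score difference and, if one takes docking scores at face value as binding free energies, into an approximate fold change in the dissociation constant via ΔΔG = RT ln(Kd*/Kd). That is a strong assumption, since docking scores are only estimates, and the temperature of 298.15 K is also assumed here. The function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def fold_change_in_kd(dg_wild_type, dg_modified, temperature=298.15):
    """Ratio Kd(wild type) / Kd(modified) implied by two binding free energies (kcal/mol)."""
    ddg = dg_modified - dg_wild_type
    return math.exp(-ddg / (R_KCAL * temperature))

if __name__ == "__main__":
    # AutoDock 4.2 scores for gp37 and gp37* from the table above.
    ddg = -7.98 - (-4.2)
    print("score difference: %.2f kcal/mol" % ddg)
    print("implied fold change in Kd: %.0f" % fold_change_in_kd(-4.2, -7.98))
```

Under these assumptions, a difference of -3.78 kcal/mol corresponds to a Kd that is lower by a factor of several hundred. The other tools report scores in their own units, so the same conversion does not apply to them.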

Error messages (and what to do about them)

While all efforts have been made to minimise errors, the following messages may be encountered. The part after | tells you what to do about it.

1) ERROR CTab(/usr/local/reduce_wwPDB_het_dict.txt): could not open | Typical during Format.sh, ignore

2) autodock4: *** Caution! Non-integral total charge (0.500 e) on ligand may indicate a problem... *** | This is common among many ligand files. Usually, this is not a problem and can be ignored safely

3) swig/python detected a memory leak of type 'BHtree *', no destructor found. | Typical during PocketAnneal and AutoDock, ignore

4) NumpyError: no module named oldnumeric | This is because you are using a version of numpy > 1.8. Please downgrade using instructions given in Dependencies.

5) Error: Could not find or load main class Refine | Please configure i3Drefine again using steps given in Dependencies.

6) Any error due to importing image in Python | Python image libraries are in a state of flux. If the Image library is giving you trouble, simply comment out the following two lines in Pocketanneal.py:

**import Image, ImageDraw, ImageFont** (Pocketanneal.py, line 26)

**grapher(XY_data, str(working_directory)+"/score_log.png", str(IP_database))** (Pocketanneal.py, line 995)

References


[1] Le Guilloux, Vincent, Peter Schmidtke, and Pierre Tuffery. "Fpocket: an open source platform for ligand pocket detection." BMC Bioinformatics 10 (2009): 168.
[2] Morris, Garrett M., et al. "AutoDock4 and AutoDockTools4: Automated docking with selective receptor flexibility." Journal of computational chemistry 30.16 (2009): 2785-2791.
[3] Nagarajan, Deepesh, et al. "Design of a heme-binding peptide motif adopting a β-hairpin conformation." Journal of Biological Chemistry (2018): jbc-RA118.
[4] Bhattacharya, Debswapna, and Jianlin Cheng. "i3Drefine software for protein 3D structure refinement and its assessment in CASP10." PloS one 8.7 (2013): e69648.
[5] Joosten, Robbie P., et al. "A series of PDB related databases for everyday needs." Nucleic acids research 39.suppl_1 (2010): D411-D419.
[6] Kabsch, Wolfgang, and Christian Sander. "Dictionary of protein secondary structure: pattern recognition of hydrogen‐bonded and geometrical features." Biopolymers: Original Research on Biomolecules 22.12 (1983): 2577-2637.
[7] Park, Moon-Won, and Yeong-Dae Kim. "A systematic procedure for setting parameters in simulated annealing algorithms." Computers & Operations Research 25.3 (1998): 207-217.