Summary

This tutorial presents the setup and analysis of Scaled Molecular Dynamics runs for the estimation of the relative residence time of a series of small molecules with a protein. In this example, the unbinding of a ligand from the N-Terminal domain of HSP90 is simulated.

Theoretical background/Overview

The evaluation of the unbinding kinetics properties of a small molecule ligand from its protein target requires a proper sampling of the ligand unbinding pathways. As this is generally out of reach with standard Molecular Dynamics simulations, especially when tens of ligands would need to be prioritized in an industrial context, several biasing strategies have been proposed for enhancing the sampling of the transition from the bound to the unbound state.

Among them, the use of scaled potentials during Molecular Dynamics on multiple replicas of the system, together with a statistical treatment for estimating the confidence in the estimations has been shown to perform well for ranking k_offs in congeneric series, see references. The procedure has been refined and integrated by Biki Technologies ( http://www.bikitech.com/ ) in a software package called Biki-Netics. The core of this package is a patched version of Gromacs implementing Scaled Molecular Dynamics.

In this tutorial, we will recapitulate the steps for setting up, running and analyzing Scaled Molecular Dynamics simulations starting from structures equilibrated with Amber (see the RAMD tutorial).

List of the Tools/Software used

1. Biki-Netics (v1.0.7) : GROMACS 4.6.1 patched with Scaled M.D. implementation
The Biki-Netics installation directory is referred to as $BIKI 2. Acpype : included in the Biki suite 3. Amber 11 4. Python 2.7 with numpy, scipy and pandas libraries 5. Shell scripts Input files and scripts used in tutorial The files used in the tutorial are zipped into the file SMD_tutorial.tar.gz which needs to be unzipped in a directory defined by the environment variable$TOPDIR

tar -xzvf  SMD_tutorial_files.tar.gz  -C  $TOPDIR The unzipped archive contains : •$TOPDIR/utils/ : system preparation utilities
• $TOPDIR/1931144/template/ : input files for compound 1931144 •$TOPDIR/1931144/rep_1/ : example of the output files for one replica
• $TOPDIR/1931144/post_proc/ : post_processing utilities and related example files The following scripts and input files will be required during different steps of the tutorial: 1. Preparation of Gromacs-compatible protein and ligand structure and topology files from system equilibrated with Amber (obtained at step 3 of the RAMD tutorial) •$TOPDIR/1931144/template/ref-equil-NPT.crd – Coordinates of the protein-ligand equilibrated complex in Amber format
• $TOPDIR/1931144/template/ref.prmtop – Amber topology file for the protein-ligand complex •$TOPDIR/1931144/template/script – Commands for Amber ptraj utility
2. Preparation of Gromacs-compatible structure and topology files
• $TOPDIR/1931144/template/output.rst.1 – Amber restart file generated at step 1 •$TOPDIR/1931144/template/ref.prmtop – Amber topology file for the protein-ligand complex
• $BIKI/biki/scripts/acpype.py – Amber to Gromacs conversion script 3. Correction of the atom naming in the topology file •$TOPDIR/1931144/template/ref_GMX.top – Gromacs topology file generated at step 2
4. Reordering of the ions in the Gromacs gro file
• $TOPDIR/1931144/template/ref_GMX.gro – Gromacs structure file generated at step 2 •$TOPDIR/utils/reorder.py – Python script for grouping Na+/Cl- ions together in the gro file
5. Preprocessing of the corrected structure and topology files
• $TOPDIR/1931144/template/system.mdp – Gromacs M.D. parameters file •$TOPDIR/1931144/template/ref_GMX_reordered.gro – Gromacs structure file generated at step 4
• $TOPDIR/1931144/template/ref_GMX_corr.top – Gromacs topology file generated at step 3 6. Setup of positional restraints •$TOPDIR/1931144/template/system.tpr – Gromacs portable binary run input file generated at step 5
• $TOPDIR/utils/make_ndx_commands.txt – Commands for Gromacs make_ndx utility •$TOPDIR/1931144/template/index.ndx – Index file generated previously at step 6
• $TOPDIR/1931144/template/posre.itp - Gromacs include topology file generated previously at step 6 •$TOPDIR/1931144/template/ref_GMX_corr.top – Gromacs topology file generated at step 3
• $TOPDIR/utils/include_pos_restraints.py – Script for including the positional restraints in the Gromacs topology file 7. Generation of the corrected binary run input file •$TOPDIR/1931144/template/system_posres.mdp – Gromacs M.D. parameters file
• $TOPDIR/1931144/template/ref_GMX_reordered.gro – Gromacs structure file generated at step 4 •$TOPDIR/1931144/template/ref_GMX_corr.top – Gromacs topology file generated at step 3
8. Generation of one directory for each replica
• $TOPDIR/utils/prepare.sh – Script for generating copies of the template directory 9. Submission of the simulations for all replica •$TOPDIR/utils/submit.sh – Script for generating copies of the template directory
10. Extraction of unbinding times from trajectories
• $TOPDIR/1931144/post_proc/extract_unbinding_times.sh – Script for processing trajectories for exit events •$TOPDIR/1931144/post_proc/get_times.py – Script for extracting exit times from raw extract_unbinding_times.sh log file
• $TOPDIR/1931144/post_proc/unbinding_times.txt – Example of a extract_unbinding_times.sh log file 11. Calculation of median and mean exit time and bootstrap analysis •$TOPDIR/1931144/post_proc/get_error_estimate.py – Script for computing median, mean and performing the bootstrap analysis
• $TOPDIR/1931144/post_proc/times.txt – Input file example for error estimation Procedure 1. The system preparation is performed in the$TOPDIR/1931144/template/ directory. Biki and Amber executables should be in $PATH. The first step is the cleaning of the starting structure ref-equil-NPT.crd so that the protein is centered within the water box. cd$TOPDIR/1931144/template/
cat script
parm ref.prmtop
trajin ref-equil-NPT.crd
center @CA mass
image
trajout output.rst restart

ptraj ref.prmtop script
1. Preparation of Gromacs-compatible structure and topology files.

python $BIKI/biki/scripts/acpype.py -p ref.prmtop -x output.rst.1 2. Systems were prepared with Amber14FF. Atom namings specific to Amber14FF are not taken properly into account by acpype.py, so they need to be corrected in the topology file. sed 's/ 2C / CT /g' ref_GMX.top | sed 's/ 3C / CT /g' | sed 's/ IP / Na+/' | sed 's/ IM / Cl-/' > ref_GMX_corr.top 3. In the gro file generated by acpype.py, the Na+ and Cl- ions are not grouped together, this leads to grompp errors if it is not fixed. python ../../utils/reorder.py –i ref_GMX.gro –o ref_GMX_reordered.gro 4. The corrected structure and topology files can be now preprocessed with grompp grompp_mpi -f system.mdp -c ref_GMX_reordered.gro -p ref_GMX_corr.top -o system.tpr -maxwarn 2 5. Application of positional restraints on the system. A 50 kJ/mol/nm positional restraint is imposed on the backbone of the protein residues far enough from the ligand (31-40, 77-98, 116-122, 167-170) • First, the corresponding set of atoms is created as a group in an index file cat ../../utils/make_ndx_commands.txt ri 31-40 | ri 77-98 | ri 116-122 | ri 167-170 "Backbone" & ! "r_31-40_r_77-98_r_116-122_r_167-170" del 24 name 24 constrained q cat ../../utils/make_ndx_commands.txt | make_ndx -f system.tpr • Then the application of the restraint to this group of atoms generates a posre.itp file echo "24" | genrestr -f ref_GMX_reordered.gro -n index.ndx -fc 50 50 50 • Finally the topology file is updated for taking into account the restraints python ../../utils/include_pos_restraints.py 1. The system.tpr file needs to be regenerated with the corrected files (and the right grompp version if system preparation and job submission are performed on different systems) grompp_mpi -f system_posres.mdp -c ref_GMX_reordered.gro -p ref_GMX_corr_posre.top -o system.tpr -maxwarn 3 2. Several copies of the template directory are generated for running the simulations on several replica: cd ../ ../utils/prepare.sh 3. Simulations are submitted using scaling factor l=0.4 : ../utils/submit.sh In order to keep the archive reasonably small, only the directory rep_1 corresponding to the first replica is included. 4. When all simulations are completed, the trajectories obtained for each replica are post-processed in the$TOPDIR/1931144/post_proc/ directory :
cd \$TOPDIR/1931144/post_proc/
First, the trajectory file(s) corresponding to each replica are searched and processed for exit events. Then the exit times are extracted and the replica where no exit event is found are listed.
./extract_unbinding_times.sh
python get_times.py

5. Mean and median exit times are computed and bootstraping is performed (1000 iterations) for estimating the errors.
python get_error_estimate.py

References

Mollica, L.; Decherchi, S.; Zia, S. R.; Gaspari, R.; Cavalli, A.; Rocchia, W. Kinetics of protein-ligand unbinding via smoothed potential molecular dynamics simulations. Scientific Reports 2015, 5

Decherchi, S.; Bottegoni, G.; Spitaleri, A.;Rocchia, W.; Cavalli, A. BiKi Life Sciences: A New Suite for Molecular Dynamics and Related Methods in Drug Discovery. Journal of Chemical Information and  Modeling 2018, 58, 219–224.