!> \file sda_flex.html !! main page for doxygen documentation /** * @mainpage SDA 7 ( SDA_Flex ) * @author Michael Martinez, Julia Romanowska, Neil Bruce, Daria Kokh, Paolo Mereghetti, Razif Gabdoulline, Stefan Richter and Rebecca Wade * @date 2014 * * * \tableofcontents * Table of contents: * * * @section intro Introduction: SDA 7 Main Features * *

* The functionalities from SDA 6 and SDAMM have been unified into a single codebase, making it easier to develop new functionalities for either or both codes. *

*

*

* New features : *

*

* * \image html workflow_sda7.png "Workflow for SDA 7" width=3cm * * @section section_archi Architecture of the Solute-Grid relationship * * * SDA 7 is written in Fortran 90. The additional features of Fortran 90 allow for a more "object oriented design" for the software than was possible with Fortran 77 which was used for previous versions of SDA. * * * *

The main new feature of SDA 7 is the introduction of 'flexibility' for one or all of the solutes. * Flexibility is introduced by considering several rigid body representations of a protein taken from an NMR ensemble, molecular dynamics simulation, normal mode analysis, or another relevant conformational sampling technique. * SDA 7 can then swap between these representations based on e.g. a Monte Carlo criterion. * Thus multiple conformations can be treated within a single trajectory.

*

* The diagram below: "Diagram of Solutes, Grids and their relationship", shows the relationships between key modules and how multiple proteins and flexibility are implemented: *

*

* *

* In the diagram solute 1 is flexible, and has two possible conformations. *

*

*

*

* *

* Solutes 2 and 3 are identical solutes and are not flexible. *

*

*

*

*

* It is straightforward to specify many identical flexible solutes: *

*

*

*

* * \image html StructureModuleProteins.png "Diagram of Proteins, Grids and their relationship" width=5cm * *

* The design presented is efficient for the memory usage, but has one important limitation: *

*

* Everything related to one conformation is stored in one \ref mod_setofgrid::sogrid "setofgrid", including the position of the atoms, charges and reaction criteria. * But the positions evolve during the trajectory unless the solute is fixed. (We refer to them as the "movable atoms"). * It would be possible to retrieve the instantaneous position of these "movable atoms" from the position and orientation of the solute stored for each \ref mod_protein::protein "protein" (by rigid-body transformation). * However, it is not efficient in practice. In particular, rotations require a matrix multiplication which is computationally expensive.
* The compromise made in SDA (since its origin in 1998) is to apply these rotations at every step, when the Brownian Dynamics update is applied in \ref make_bd_move_2prot.f90 "make_bd_move". * Only the rotation is applied, not the translation. *

*

* The translation is indeed recomputed each time that the instantaneous position of a movable atom is needed. * The full transformations (rotation and translation) are required at different stages, e.g. for computing the interactions, checking overlap with the exclusion grid, checking reaction criteria, etc., * and many other functionalities. *

* There were many reasons to use this implementation in SDA 6 (and previous versions): * *

* * @section section_parallel Parallelisation in SDA 7 * * @subsection subsection_sdamm Multiple molecule simulation (SDAMM) parallelisation * *

* Different parallelisation schemes are used in SDA 7. To help understand the parallelisation schemes, * we briefly discuss each simulation type, and then the strategy used for parallisation with some pseudo codes. *

*

* In SDAMM simulations, only one trajectory is generated and the trajectory has a predefined number of steps. * The most computationally expensive subroutines are : *

*

*

* The method of parallelisation is a straightforward application of OpenMP, with the main loop split between threads in theses specific subroutines. *

*

* For the force and energy calculations: *

*

*

* For the BD update : there is only one loop over all proteins *

*

* Parallel Performance *

*

* The scaling of parallel calculations can be heavily system dependent, * so you should test for your system before carrying out large calculations, especially at high levels of parallelisation. * You can expect that larger systems will scale better on more cores. * We tested a 128 lysozyme molecules on a shared memory AMD Magny Cours machine with 4 CPUs of 12 cores for a total of 48 cores. We present the following results as a guideline. *

* *

* * @subsection subsection_sda Two solute simulation (sda_2proteins) parallelisation * *

* The implementation of the parallelisation with two solutes is very different from that for SDAMM: *

*

*

* The approach chosen for parallelisation is to split the total number of trajectories across the different threads. * Each thread is responsible for running a full trajectory. When one trajectory is finished, the thread will start calculating a new trajectory, * until no more trajectories need to be conducted.

* The difficulties with this approach are: *

*

*

* The following choices were made for a thread safe implementation for two-solute calculations: *

*

* *

* A schematic outline of \ref MainLoop_sda_2proteins_omp.f90 (variables assignment as SHARED or PRIVATE are omitted) * is given below: *

*

* All objects have been previously initialized in \ref read_input_file.f90 *

* \verbatim MainLoop( tab_protein_unique ) !$OMP PARALLEL ! allocate local array, OpenMP or not allocate ( local_escape, local_sumtime,...) ! if OpenMP make a copy of the Solutes !$ if ( tid > 0 ) then !$ call copy_array_protein ( tab_protein_unique, ptab_protein, type_calc ) ! normal if not OpenMP or master !$ else ! write (*,*) 'Associate ptab_protein to tab_protein_unique' ptab_protein => tab_protein_unique !$ end if ! if OpenMP make a local copy of the list of complexes ( here empty, only copy data members ) !$ if ( tid >= 0 ) then !$ call copy_record ( o_complexes, p_local_complexe, .false., size_thread_complexe ) ! else normal if not openMP !$ else ! associate the global pointer to o_complexe if not openmp if ( associated ( o_complexes % pcomplexe ) ) then p_local_complexe => o_complexes end if !$ end if ! not necessary but convenient, simpler to refer to solute 1 and 2 protein1 => ptab_protein % array ( 1 ) protein2 => ptab_protein % array ( 2 ) !$OMP BARRIER !$OMP DO SCHEDULE( DYNAMIC ) LOOP_RUN: & do run = 1, total_traj !$ if need to merge complexes every 100 runs !$OMP CRITICAL ( crit_merge_p_complexe ) !$ call merge_complexe ( o_complexes, p_local_complexe ) !$OMP END CRITICAL ( crit_merge_p_complexe ) !$ endif call init_position_protein() ! begin BD loop LOOP_STEP: & do call force_epedhdlj_2proteins_fast( protein1, protein2 ) call make_bd_move_2prot( protein1, protein2 ) call check_reaction_pairs( protein1, protein2 ) ! add to the list of complexes if ( complexe_to_record .eqv. .true. ) call energy_epedhdlj_2proteins_fast( protein1, protein2, energy ) call add_record ( p_local_complexe, ptab_protein % array ( 2:2 ), energy ) end if ! use atomic inside call update_residence_time ( resid_time, protein2 % position, dist, dtnow ) end do LOOP_STEP ! only at the end of the trajectory call update_escape() end do LOOP_RUN ! deallocate Solutes for all threads, except the MASTER nullify ( protein1, protein2 ) !$ if ( tid > 0 ) then !option true for NOT deleting the grid !$ call delete_array_protein ( ptab_protein, .true. ) ! now need explicit deallocation of the pointer !$ deallocate ( ptab_protein, stat = ierr ) !$ nullify ( ptab_protein ) !$ end if ! same deallocation for local_complexes \endverbatim *

* The use of OpenMP pragma allows a compilation with or without OpenMP (without loss of performance). * With this implementation a high level of parallelization is achieved because there is no sequential code in the main loop. * The merging of the lists of complexes is still the main limitation, but their frequency can be tuned for a specific system and computer architecture. *

* * \subsubsection subsubsection_endeffect Run completion (end effect) issues *

* A demonstration simulation with 6 trajectories and 3 available threads can be represented by the scheme below * \image html EndEffect.png "End Effect : threads are waiting at the end of the simulation" width=1.8cm * * Each thread runs calculations for one trajectory. When a trajectory is completed, the thread will run the next trajectory. * At the very end of the calculation there are no more trajectories to perform and N-1 CPUs are in a waiting state * This may be a problem for association rate calculations where trajectories can, in principle, last for an infinite amount of time. * In practice this effect is usually minor.
* It is difficult to estimate the end effect during runs to test the performance of SDA 7 * (relatively short runs with X trajectories, where X is generally small ~ 100), * so the scaling is generally underestimated compared to that expected for a real simulation (> 10,000 trajectories).
*

* In the most favorable case, association rates calculations (with no complexes recorded) have scaling reaching 80 % on 48 CPUs. *

* * @section Force Computation of forces and energy in SDA 7 * *

* The implementation of computations of forces and of energies are similar in sda_2protein and SDAMM, * but they are nevertheless implemented in different functions for optimization. * The differences are the double-loop over all pairs of solutes in the case of SDAMM (as seen in the \ref section_parallel "parallel" section) * and the fact than the first solute is always kept fixed in sda_2proteins. *

*

* In the following sections, we use the example of the computation of the energy of the charges on solute 1 in the potential grid of solute 2. *

* * \subsection subsection_charge_grid Charges and Grids * *

* The following five types of interaction are described as "charges" interacting with a potential grid. *

*

* *

* The other energy terms: metal desolvation, and the correction to the image-charge have a different implementation and * will not be considered in this section. *

* *

* All energies are computed as :
* \f[ * E_{12} = \sum_{i=1}^n q_1^i(x,y,z) \, U_2(x,y,z) * \f] * where \f$E_{12}\f$ is the energy of the effective charges from solute 1, \f$q_1\f$, in the potential of solute 2, \f$U_2\f$.
* With the exception of the electrostatic energy, the total energy is evaluated as the sum of those components, \f$E = E_{12} + E_{21}\f$.
* * For the electrostatic interaction energy, the definition is sligthly different. The total electrostatic energy should satisfy:
* \f[ * E_{el} = E_{12} = E_{21} * \f] * But due to approximations in the electrostatic potential grid and effective charges computations, the results of \f$E_{12}\f$ do not exactly equal \f$E_{21}\f$. * In order to use the same algorithm for each type of interaction, we estimate the total electrostatic energy as the average of the two components:
* \f[ * E_{el} = \frac{1}{2} ( E_{12} + E_{21} ) * \f] * Where the factor \f$\frac{1}{2}\f$ is applied on the electrostatic potential grids when they are read into memory. * It can be adjusted in the SDA7 input file with the parameter epfct (default 0.5) *

*

* The same applies for the forces, with the difference that we need the derivatives of the energy:
* \f[ * \vec{f_{12}} = \sum_{i=1}^n - \, q_1^i(x,y,z) \, \nabla U_2(x,y,z) * \f] * * The values of the energy and the derivatives are extrapolated from the grids (see functions \ref mod_force_energy::potential_atom_uhbd and \ref mod_force_energy::force_atom_uhbd ). * Note that pointers to the grids are used, rather than the original grid. The use of pointers allows for these functions to be used on any 3-dimensional grid of real*8. * It is therefore be relatively straighforward to implement calculations of new energy and force terms. *

* * \subsection subsection_choice_grid Choice of the Grids *

* All grids and charges have a type associated with them (enumerated in \ref mod_gridtype), all grids derive from the same class \ref mod_grid. * The types are used to select the correct association between charges and grids. This is performed by the function \ref mod_protein::select_charge. * Given the grid_number, the \ref mod_protein::select_charge function associates pointers to the charges and the grids. * Pointers are then used for computing the interactions by the same subroutine. * * A simple and general method for evaluating the forces is summarized here schematically. It is simplified pseudo-code, from \ref mod_compute_force::force_epedhdlj_2proteins, which does not take into account * asymmetric interactions. * \verbatim subroutine energy_epedhdlj_2proteins ( prot1, prot2,...) ! loop over grid, sure to be associated because a pre-sorting is done in param_force_energy GRID: do i = 1, param_force_energy % nb_grid ! extract the grid number, identical to accessing the sogrid directly grid_number = param_force_energy % array_grid_calc ( i, 1 ) call compute_force_sda_2proteins ( prot1, prot2, grid_number,..) end do end subroutine subroutine compute_energy_sda_2proteins ( prot1, prot2, grid_number,..) ! assign correctly the pointers from the grid_number call select_charge ( grid_number, prot1, prot2, nq1, nq2, p_charge1, p_charge2, & p_charge_position1, p_charge_position2, dummy_array ) ! compute force, discussed in detail the next chapter Energy 12 = Sum of p_charge1 * U ( p_charge_position1, prot1 % p_sogrid % pset ( grid_number ) ) Energy 21 = Sum of p_charge2 * U ( p_charge_position2, prot2 % p_sogrid % pset ( grid_number ) ) end subroutine \endverbatim * * * \subsection subsection_normal_algorithm Simple Algorithm for computing forces and energies * *

* There are several possible methods for calculating the force (and energy), here we present the simplest one. * Its advantage is, as its name suggests, simplicity, and in addition generality. * This method should be used as the first when a new type of interaction is being added to the code, since it greatly reduces the risk of introducing errors. * The principal difficulty is dealing with the coordinate transformation between frames of reference, because the grids are fixed and never rotated. *

* * \image html ForceDrawFinal2.png "Computation of forces" width=4cm * *

* The figure "Computation of forces" shows the general computation of \f$f_{21}\f$, the force applied on solute 2, * when both solutes can rotate (e.g. the case of SDAMM). *

* The grid shown with the dashed line represents the grid orientation in the fixed frame of reference R', and is stored in memory. * The solid lines represents the grid orientation corresponding to the instantaneous orientation of solute 1 (frame R). *

* The simple algorithm is a loop over the charges of solute 1: * * * Below is an extract from the SDA 7 code (\ref mod_compute_force_sdamm::force_epedhdlj_sdamm) for the computation of the force and the torque on solute 2. * * \verbatim subroutine compute_force_sdamm_2proteins( prot1, prot2, pos_relat,..) ! get the orientation of the first solute ex = prot1 % orientation_x ey = prot1 % orientation_y ez = prot1 % orientation_z ! inverse the matrix, it is a unity matrix : transposed equal the inverse xe(1) = ex(1) xe(2) = ey(1) xe(3) = ez(1) ye(1) = ex(2) ye(2) = ey(2) ye(3) = ez(2) ze(1) = ex(3) ze(2) = ey(3) ze(3) = ez(3) ! loop over the charges of solute 2 FORCE2: do j=1,nq2 ! pos_relat is the relative position of the center of geometry of the solutes, and p_charge_position2 have the correct orientation p2 = p_charge_position2 (:,j) ! relative position of the charge in the frame of reference R rto = p2 + pos_relat ! make the trnasformation between the frames of reference, rtn is the relative position in the frame R' call tr ( rto, rtn, ex, ey, ez ) ! get the derivative of the potential at the position rtn call force_atom_uhbd ( rtn, force, prot1 % p_sogrid % pset_grid ( grid_number) ) ! transform back into R call tr ( force, force_o, xe, ye, ze ) ! multiply by the charge to get f in R. Back transformation and multiplication can be inverted here qf = force_o * p_charge2(j) ! call the cross product to get the torque, directly in R call cross (tmp2, p2, qf ) ! add to the total torque and force tt2 = tt2 + tmp2 ff2 = ff2 + qf end do FORCE2 \endverbatim *

*

* With the code presented, we could compute all interactions, both for sda_2proteins and SDAMM.
* There are nevertheless different implementations for optimizing the calculations. * In sda_2proteins, solute 1 is always fixed, so that some computations are simpler: * the charges of solute 2 do not need to be expressed in the rotated frame. *

* * \subsection subsection_fast_algorithm Fast Algorithm * *

* Two improvements of the simple algorithm are made: *

*

* *

* The functions that implement these improvements have an equivalent name to the functions implementing the simple algorithm, with the "fast" suffix added.
* The module \ref mod_force_energy "mod_force_energy" stores many information about the required calculations. * In particluar, it always tries to use the optimized functions.

*

* The fast algorithm is more difficult to maintain than the simple algorithm, but in the best case (SDAMM with all four interactions) there is a 50% gain in performance. *

* * \section Record Complexes and trajectories files *

* The implementation of those main outputs has been largely modified, compared to SDA6. The specifications for the new design were : *

* The first release of SDA 7 does not fully support all those requirements (features are implemented but interfaces should be simplified), but it already presents considerable advantages.
* A versioning has also been included, to maintain the compatibility with previously generated files. *

* * \subsection design_record Design of the module record *

* The requirements cited above should apply to both complexes and trajectories files. Their functions are indeed very similar: * they record information about the solute (solute number, conformation number...), the solute's position and orientation, * the energy interactions terms and additional statistics dependent on the type of simulation. *

*

* But their usage differs a lot. Data for trajectories are always appended into the file. * They are usually large files with a simple and chronological ordering.
* On an other hand, complexes are ordered by energy values. The best complexes (with a minimum energy) are at the top of the file * and those with large energies are discarded, the idea being to maintain a maximum number of relevant complexes, * in order to analyse them afterwards with external tools like the clustering. *

*

* The implementation of complexes and trajectories output in SDA 7 groups the similarities into a main class \ref mod_record::record "record". * All the functions/tools that use complexes and trajectories objects, should call functions from \ref mod_record::record "record".
* Moreover, each "line" of those files is defined in class \ref mod_onecomplexe::one_complexe "one_complex" (it's called 'complex' but it refers to one solute).

*

* The main difference between complexes and trajectories files is in the implementation of intermediate types: \ref mod_record::listcomplexe "ListComplexes" and \ref mod_record::trajectory "Trajectory". * There are different algorithms for collecting the objects of type \ref mod_onecomplexe::one_complexe "one_complex" into complexes and trajectories: *