diff --git a/README.md b/README.md index 65766f66..27d4d92e 100644 --- a/README.md +++ b/README.md @@ -4,6 +4,11 @@ [![codecov](https://codecov.io/gh/m3g/MolSimToolkit.jl/branch/main/graph/badge.svg)](https://codecov.io/gh/m3g/MolSimToolkit.jl) [![Aqua QA](https://JuliaTesting.github.io/Aqua.jl/dev/assets/badge.svg)](https://github.com/JuliaTesting/Aqua.jl) +# MyDevMolSimToolkit + +[MolSimToolkit.jl](https://github.com/m3g/MolSimToolkit.jl) provides a set of tools to +analyse molecular dynamics simulations, and a framework for the development of custom +analysis tools. This repository is a experimental version of the package for Work in Progress functionalities # MolSimToolkit.jl **MolSimToolkit.jl** is a Julia package for analyzing molecular dynamics (MD) simulations. It provides fast, flexible tools for trajectory analysis, structural alignment, secondary structure mapping, and more. The toolkit is designed to be lightweight and extensible, making it easy to build custom analysis workflows. diff --git a/docs/src/Reweighting.md b/docs/src/Reweighting.md index cc87c8ff..94354694 100644 --- a/docs/src/Reweighting.md +++ b/docs/src/Reweighting.md @@ -39,40 +39,46 @@ julia> i1 = PDBTools.selindex(atoms(simulation), "resname TFE and name O") julia> i2 = PDBTools.selindex(atoms(simulation), "protein and name O") ``` +In the picture below, it is shown the interaction which will be perturbed in this system: + +!!!!INSERT FIGURE ## Setting perturbation function In order to obtain these weights, we have to use two functions: the ```reweight``` function, which will calculate each weight and the ```perturbation``` function, responsible for taking each computed distance between atomic pairs in every frame and determine the resulting energy using these distances in that particular frame based on the applied perturbation. -So, secondly, we define some "perturbation" function (here we call it ```gaussian decay```) and set up its parameters. Please, take a look at the interface: +So, secondly, we define some "perturbation" function (here we call it ```polynomial_decay```) and set up its parameters. Please, take a look at the interface: ```julia-repl -julia> gaussian_decay(r, α, β) = α*exp(-abs(β)*r^2) +julia> poly_decay_perturbation(r, α, cut) = α * ((r/cut)^2 - 1)^2 gaussian_decay (generic function with 1 method) julia> α = 5.e-3 0.005 -julia> β = 5.e-3 -0.005 +julia> cut = 10.0 +10.0 ``` -As it can be seen, the function has to receive two parameters: `r` which corresponds to the distance between two selected atoms and some parameter to account a modification and change its magnitude, here, we inserted two of them in the same function, `α`, to change the maximum value of the gaussian curve and `β`, to adjust its decay behaviour with a given value of `r`. +As it can be seen, the function has to receive three parameters: `r` which corresponds to the distance between two selected atoms and some parameter to account a modification and change its magnitude, here, we inserted two of them in the same function, `α`, to change the maximum value of the curve (at r = 1) and `cut`, the distance `r` where the function equals zero. In the image below, we can see the curve and how it changes with different values of `α` and `cut`: + +!!! INSERT FIGURE ## Computing the new weights Finally, using the ```reweight``` function, we pass both the ```simulation``` and the last function anonymously in the input. Again, watch the interface: ```julia-repl -julia> cut_off = 12.0 -12.0 -julia> weights = reweight(simulation, (i,j,r) -> gaussian_decay(r, α, β), i1, i2; cutoff = cut_off) +julia> weights = reweight(simulation, r -> gaussian_decay(r, α, cut), i1, i2; cutoff = cut) ``` -`i and j`: if you selected two atom types, `i` will be the index for either the first, the second, the third and so on up to the last atom of the first group and `j` will be same, but now for the second one. With these two parameters, it is possible to determine every combination of two atoms, each one coming from one group, and compute the associated dsitance `r`, so that we are taking into account all interactions between these two atom types to our perturbation. However, if we are dealing with just one group, both of them are indexes for all the atoms of the selected group. Bear in mind that repeated combinations (like `i,j = 1,2 or 2,1`) will no be computed, since the `reweight` function calls the `map_pairwise!` function, from [CellListMap.jl](https://github.com/m3g/CellListMap.jl) that is able to avoid this problem. +`r`: the distance between the twos atoms -`r`: the distance between the twos atoms with indexes `i` and `j` in the selected groups. +`cutoff`: the maximum distance that will be computed between two atoms. The default value is `12.0 Å`. -`cutoff`: the maximum distance that will be computed between two atoms. The default value is `12.0` Angstrom. +!!! warning + It is highly recommended to set the same value of `cutoff` for both `perturbation` and `reweight` functions. + With this in mind, calculations will be done more quickly and you do not need to worry about your input function + behaviour above the `cutoff` value, since distances out of the perturbation range will not be computed. Once the calculations are finished, the resulted interface is shown, like the example below: ```julia-repl @@ -125,6 +131,11 @@ julia> weights.probability 0.12988266765019355 ``` +!!! tip + Note that these values (and, consequently, calculations that use them) are functions of `r`, + so in other to avoid mathematical complications, a good piece of advice is to create functions that + are continous in the closed interval from zero to the cutoff value. + ## Reference Functions ```@autodocs Modules = [MolSimToolkit.Reweighting] diff --git a/src/MolSimToolkit.jl b/src/MolSimToolkit.jl index c557fcf6..84043276 100644 --- a/src/MolSimToolkit.jl +++ b/src/MolSimToolkit.jl @@ -25,6 +25,7 @@ using Reexport: @reexport using ProgressMeter: Progress, next!, @showprogress using Statistics: mean using Printf: @sprintf +using OrderedCollections export wrap, wrap_to_first export distances diff --git a/src/Reweighting/Reweight.jl b/src/Reweighting/Reweight.jl index cf520664..891b1137 100644 --- a/src/Reweighting/Reweight.jl +++ b/src/Reweighting/Reweight.jl @@ -8,16 +8,86 @@ Structure that contains the result of the reweighting analysis of the sequence. `energy` is a vector that contains the energy difference for each frame in the simulation after applying some perturbation. """ struct ReweightResults{T<:Real} - probability::Vector{T} - relative_probability::Vector{T} - energy::Vector{T} + probabilities::Vector{Vector{T}} + relative_probabilities::Vector{Vector{T}} + energies::Vector{Vector{T}} + distances::Vector{Int} + entropy_diff::Vector{T} + scaling::Vector{<:Real} end -""" +struct Perturbation{F<:Function} + subgroup1::Vector{Int} + subgroup2::Vector{Int} + perturbation_function::F + scaling::Vector{<:Real} +end + +struct SystemPerturbations + group1::Vector{Int} + number_atoms_group1::Int + group2::Vector{Int} + number_atoms_group2::Int + perturbations::OrderedCollections.OrderedDict{Any, Perturbation} +end + +struct SystemPerturbationsOneGroup + group1::Vector{Int} + number_atoms_group1::Int + perturbations::OrderedCollections.OrderedDict{Any, Perturbation} +end + +####CellListMap configuration +mutable struct EnergyAndDistances + energy::Float64 + distances::Int64 +end + +CellListMap.copy_output(x::EnergyAndDistances) = EnergyAndDistances(copy(x.energy), copy(x.distances)) + +function CellListMap.reset_output!(output::EnergyAndDistances) + output.energy = 0.0 + output.distances = 0 + return output +end + +function CellListMap.reducer(x::EnergyAndDistances, y::EnergyAndDistances) + x.energy += y.energy + x.distances += y.distances + return EnergyAndDistances(x.energy, x.distances) +end +#### + +function Perturbation( + atoms, + subgroup1::Union{String, Function}, + subgroup2::Union{String, Function}, + perturbation_function::Function, + scaling::Vector{<:Real}; + one_gp::Bool = false +) + + v1 = PDBTools.selindex(atoms, subgroup1) + v2 = PDBTools.selindex(atoms, subgroup2) + if one_gp == false || v1 == v2 || (v1 != v2 && isdisjoint(v1, v2)) + return Perturbation(v1, v2, perturbation_function, scaling) + else + return error( + """ + Selected subgroups have overlap of atoms but they are not equal. + Subgroups must be either equal or completly different. + """) + end +end + + """ reweight( simulation::Simulation, f_perturbation::Function, - group_1::AbstractVector{<:Integer}; + group for i in eachindex(perturbations) + perturbations[i][1] = findall(i1 -> i1 in PDBTools.select(simulation.atoms, perturbations[i][1]), simulation.atoms[gp1]) + perturbations[i][2] = findall(i2 -> i2 in PDBTools.select(simulation.atoms, perturbations[i][2]), simulation.atoms[gp2]) + endist::Bool = false, cutoff::Real = 12.0, k::Real = 1.0, T::Real = 1.0 @@ -25,10 +95,13 @@ end reweight( simulation::Simulation, f_perturbation::Function, - group_1::AbstractVector{<:Integer}, - group_2::AbstractVector{<:Integer}; + group_1::AbstractVector{<:Integer}, + n_atoms_per_molecule_gp_1::Int, + group_2::AbstractVector{<:Integer}, + n_atoms_per_molecule_gp_2::Int; + all_dist::Bool = false, cutoff::Real = 12.0, - k::Real = 1.0, + k::Real = 1.0, T::Real = 1.0 ) @@ -36,11 +109,16 @@ Function that calculates the energy difference when a perturbation is applied on It returns "ReweightResults" structure that contains three results: `probability`, `relative_probability` and `energy` vectors. -The function needs a MolSimToolkit's simulation object, another function to compute the perturbation and one or two types of atoms. +The function needs a MolSimToolkit's simulation object, another function to compute the perturbation, and one or two types of atoms. Additionally, you can also define a cutoff distance, the constant "k" (in some cases, it will be Boltzmann constant) and the temperature, T, of the system. -Group_1 (and group_2) is a vector of atoms indexes, extracted, for example, from a pdb file. +group_1 (and group_2) is a vector of atoms indexes, extracted, for example, from a pdb file. + +n_atoms_per_molecule is the number of atoms per molecules of each group + +By default, only minimum distances are computed in order to perturb the system. +Using all_dist option allows the computation of all possible distances between the selected group of atoms # Example @@ -68,7 +146,7 @@ julia> i2 = PDBTools.selindex(atoms(simulation), "residue 15 and name HB3") 1-element AbstractVector{<:Integer}: 171 -julia> sum_of_dist = reweight(simulation, (i,j,d2) -> d2, i1, i2; cutoff = 25.0) +julia> sum_of_dist = reweight(simulation, r -> r, i1, 2, i2, 1; all_dist = true, cutoff = 25.0) ------------- FRAME WEIGHTS ------------- @@ -106,66 +184,6 @@ julia> sum_of_dist.energy This result is the energy difference between the perturbed frame and the original one. In this case, it is the sum of distances between the reffered atoms ``` """ -function reweight( - simulation::Simulation, - f_perturbation::Function, - group_1::AbstractVector{<:Integer}; - cutoff::Real = 12.0, - k::Real = 1.0, - T::Real = 1.0 -) - prob_vec = zeros(length(simulation)) - prob_rel_vec = zeros(length(simulation)) - energy_vec = zeros(length(simulation)) - for (iframe, frame) in enumerate(simulation) - coordinates = positions(frame) - first_coors = coordinates[group_1] - system = ParticleSystem( - xpositions = first_coors, - unitcell = unitcell(frame), - cutoff = cutoff, - output = 0.0, - output_name = :total_energy - ) - energy_vec[iframe] = map_pairwise!((x, y, i, j, d2, total_energy) -> total_energy + f_perturbation(i, j, sqrt(d2)/10), system) - end - @. prob_rel_vec = exp(-(energy_vec)/k*T) - prob_vec = prob_rel_vec/sum(prob_rel_vec) - output = ReweightResults(prob_vec, prob_rel_vec, energy_vec) - return output -end -function reweight( - simulation::Simulation, - f_perturbation::Function, - group_1::AbstractVector{<:Integer}, - group_2::AbstractVector{<:Integer}; - cutoff::Real = 12.0, - k::Real = 1.0, - T::Real = 1.0 -) - prob_vec = zeros(length(simulation)) - prob_rel_vec = zeros(length(simulation)) - energy_vec = zeros(length(simulation)) - for (iframe, frame) in enumerate(simulation) - coordinates = positions(frame) - uc = unitcell(frame) - first_coors = coordinates[group_1] - second_coors = coordinates[group_2] - system = ParticleSystem( - xpositions = first_coors, - ypositions = second_coors, - unitcell = uc.orthorhombic ? diag(uc.matrix) : uc.matrix, - cutoff = cutoff, - output = 0.0, - output_name = :total_energy - ) - energy_vec[iframe] = map_pairwise!((x, y, i, j, d2, total_energy) -> total_energy + f_perturbation(i, j, sqrt(d2)/10), system) - end - @. prob_rel_vec = exp(-(energy_vec)/k*T) - prob_vec = prob_rel_vec/sum(prob_rel_vec) - output = ReweightResults(prob_vec, prob_rel_vec, energy_vec) - return output -end import Base.show import Statistics @@ -178,22 +196,305 @@ function Base.show(io::IO, mime::MIME"text/plain", res::ReweightResults) FRAME WEIGHTS ------------- - Average probability = $(Statistics.mean(res.probability)) - standard deviation = $(Statistics.std(res.probability)) - - ------------------------------------------- - FRAME WEIGHTS RELATIVE TO THE ORIGINAL ONES - ------------------------------------------- - - Average probability = $(Statistics.mean(res.relative_probability)) - standard deviation = $(Statistics.std(res.relative_probability)) + Average probability = $(Statistics.mean.(res.probabilities)) + standard deviation = $(Statistics.std.(res.probabilities)/sqrt(length(first(res.probabilities)))) ---------------------------------- - COMPUTED ENERGY AFTER PERTURBATION + AVERAGE PERTURBED ENERGIES ---------------------------------- - Average energy = $(Statistics.mean(res.energy)) - standard deviation = $(Statistics.std(res.energy)) + Average Energies = $(Statistics.mean.(res.energies)) + Standard Deviations = $(Statistics.std.(res.energies)/sqrt(length(first(res.energies)))) + + ------------------------------------------- + ENTROPY LOSS DUE TO REWEIGHTING + ------------------------------------------- + Entropy = $(res.entropy_diff) + """) +end + +#Function for two molecular entities +function reweight( + simulation::Simulation, #Simulation object + pert_input::SystemPerturbations; #Data structure containing molecular entities and their subgroups whose distances will be perturbed + all_distances::Bool = false, #Flag to use CellListMap, computing all possible distances between the entities besides minimum distances + k::Real = 1.0, #Boltzmann constant + T::Real = 1.0, #Temperature + cutoff::Real = 12.0, #Cutoff of distances + tol::Real = 1.e-16 #Tolerance to consider a distance contribution +) + + #Defining results + raw_output = OrderedCollections.OrderedDict{Any, Vector{AbstractVector}}(k => + [ + [[zeros(length(simulation))] for δ in pert_input.perturbations[k].scaling], + [[zeros(length(simulation))] for δ in pert_input.perturbations[k].scaling], + zeros(length(simulation)), + zeros(Int, length(simulation)), + ] + for k in keys(pert_input.perturbations) + ) + + #Number of atoms per molecule + n_molecules_gp1 = length(pert_input.group1) ÷ pert_input.number_atoms_group1 + computed_energy = 0 + + #Defining function if CellListMap option is activated + function energy_and_distances!(i,j,d, subgroup1, subgroup2, pert_func::Function, out::EnergyAndDistances) + if is_in(subgroup1, pert_input.group1[i]) && is_in(subgroup2, pert_input.group2[j]) + out.energy += pert_func(d) + if abs(out.energy) >= tol + out.distances += 1 + end + end + return out + end + + #Performing computation for every frame + @showprogress for (iframe, frame) in enumerate(simulation) + coordinates = positions(frame) + gp_1_coord = coordinates[pert_input.group1] + gp_2_coord = coordinates[pert_input.group2] + uc = unitcell(frame) + if all_distances + system = ParticleSystem( + xpositions = gp_1_coord, + ypositions = gp_2_coord, + unitcell = uc.orthorhombic ? diag(uc.matrix) : uc.matrix, + cutoff = cutoff, + output = EnergyAndDistances(0.0, 0), + output_name = :energy_and_distances + ) + for pk in keys(pert_input.perturbations) + map_pairwise!( + (x, y, i, j, d2, output) -> energy_and_distances!( + i, + j, + sqrt(d2), + pert_input.perturbations[pk].subgroup1, + pert_input.perturbations[pk].subgroup2, + pert_input.perturbations[pk].perturbation_function, + output + ), + system + ) + raw_output[pk][3][iframe] = system.energy_and_distances.energy + raw_output[pk][4][iframe] = system.energy_and_distances.distances + system.energy_and_distances.energy = 0.0 + system.energy_and_distances.distances = 0 + end + else + for mol_ind in 1:n_molecules_gp1 + gp_1_coord_ref = gp_1_coord[(mol_ind - 1) * pert_input.number_atoms_group1 + 1 : mol_ind * pert_input.number_atoms_group1] + gp_1_list, gp_2_list = minimum_distances( + xpositions = gp_1_coord_ref, + ypositions = gp_2_coord, + xn_atoms_per_molecule = pert_input.number_atoms_group1, + yn_atoms_per_molecule = pert_input.number_atoms_group2, + unitcell = uc.orthorhombic ? diag(uc.matrix) : uc.matrix, + cutoff = cutoff + ) + for pk in keys(pert_input.perturbations) + for d_i in eachindex(gp_2_list) + if gp_2_list[d_i].within_cutoff && is_in(pert_input.perturbations[pk].subgroup2, pert_input.group2[gp_2_list[d_i].i]) && is_in(pert_input.perturbations[pk].subgroup1, pert_input.group1[gp_2_list[d_i].j]) + computed_energy = pert_input.perturbations[pk].perturbation_function(gp_2_list[d_i].d) + raw_output[pk][3][iframe] += computed_energy + raw_output[pk][4][iframe] += abs(computed_energy) >= tol ? 1 : 0 + end + end + computed_energy = 0 + end + end + end + end + for pk in keys(raw_output) + raw_output[pk][3] = [δ * raw_output[pk][3] for δ in pert_input.perturbations[pk].scaling] + raw_output[pk][2] = [exp.(-raw_output[pk][3][δ]/(k*T)) for δ in eachindex(raw_output[pk][3])] + raw_output[pk][1] = [raw_output[pk][2][δ]/sum(raw_output[pk][2][δ]) for δ in eachindex(raw_output[pk][2])] + end + output = OrderedCollections.OrderedDict{Any, ReweightResults}(kys => + ReweightResults( + raw_output[kys][1], + raw_output[kys][2], + raw_output[kys][3], + raw_output[kys][4], + [-k*(sum(raw_output[kys][1][δ] .* log.(raw_output[kys][1][δ])) - log(1/length(simulation))) for δ in eachindex(pert_input.perturbations[kys].scaling)], + pert_input.perturbations[kys].scaling + ) + for kys in keys(pert_input.perturbations) + ) + return output +end + +#Function for one molecular entity +function reweight( + simulation::Simulation, #Simulation object + pert_input::SystemPerturbationsOneGroup; #Data structure containing molecular entities and their subgroups whose distances will be perturbed + all_distances::Bool = false, #Flag to use CellListMap, computing all possible distances between the entities besides minimum distances + k::Real = 1.0, #Boltzmann constant + T::Real = 1.0, #Temperature + cutoff::Real = 12.0, #Cutoff of distances + tol::Real = 1.e-32 #Tolerance to consider a distance contribution +) + #Case vector for min distances + cases = OrderedCollections.OrderedDict{Any, Vector{<:Real}}(k => + zeros(4) + for k in keys(pert_input.perturbations) + ) + + for k in keys(pert_input.perturbations) + cases[k] = check_vecs(pert_input.perturbations[k].subgroup1, pert_input.perturbations[k].subgroup2) + end + + #Defining results + raw_output = OrderedCollections.OrderedDict{Any, Vector{AbstractVector}}(k => + [ + [[zeros(length(simulation))] for δ in pert_input.perturbations[k].scaling], + [[zeros(length(simulation))] for δ in pert_input.perturbations[k].scaling], + zeros(length(simulation)), + zeros(length(simulation)), + ] + for k in keys(pert_input.perturbations) + ) + + #Number of atoms per molecule + n_molecules_gp1 = length(pert_input.group1) ÷ pert_input.number_atoms_group1 + computed_energy = 0 + + #division factor for double counting + dvf = 1 + + #Defining function if CellListMap option is activated + function energy_and_distances!(i,j,d, subgroup1, subgroup2, pert_func::Function, output::EnergyAndDistances) + gp1 = pert_input.group1 + mol1 = (i - 1) ÷ pert_input.number_atoms_group1 + 1 + mol2 = (j - 1) ÷ pert_input.number_atoms_group1 + 1 + if (is_in(subgroup1, gp1[i]) && is_in(subgroup2, gp1[j]) && mol1 != mol2) || (is_in(subgroup1, gp1[j]) && is_in(subgroup2, gp1[i]) && mol1 != mol2) + output.energy += pert_func(d) + if abs(output.energy) >= tol + output.distances += 1 + end + end + return output + end + + #Performing computation for every frame + @showprogress for (iframe, frame) in enumerate(simulation) + coordinates = positions(frame) + gp_coord = coordinates[pert_input.group1] + uc = unitcell(frame) + if all_distances + system = ParticleSystem( + xpositions = gp_coord, + unitcell = uc.orthorhombic ? diag(uc.matrix) : uc.matrix, + cutoff = cutoff, + output = EnergyAndDistances(0.0, 0), + output_name = :energy_and_distances + ) + for pk in keys(pert_input.perturbations) + map_pairwise!( + (x, y, i, j, d2, output) -> energy_and_distances!( + i, + j, + sqrt(d2), + pert_input.perturbations[pk].subgroup1, + pert_input.perturbations[pk].subgroup2, + pert_input.perturbations[pk].perturbation_function, + output + ), + system + ) + raw_output[pk][3][iframe] = system.energy_and_distances.energy + raw_output[pk][4][iframe] = system.energy_and_distances.distances + system.energy_and_distances.energy = 0.0 + system.energy_and_distances.distances = 0 + end + else + for mol_ind in 1:n_molecules_gp1 + i_index = (mol_ind - 1) * pert_input.number_atoms_group1 + 1 + f_index = mol_ind * pert_input.number_atoms_group1 + gp_coord_ref = gp_coord[i_index : f_index] + gp_1_list, gp_2_list = minimum_distances( + xpositions = gp_coord_ref, + ypositions = gp_coord, + xn_atoms_per_molecule = pert_input.number_atoms_group1, + yn_atoms_per_molecule = pert_input.number_atoms_group1, + unitcell = uc.orthorhombic ? diag(uc.matrix) : uc.matrix, + cutoff = cutoff + ) + for pk in keys(pert_input.perturbations) + sg1 = pert_input.perturbations[pk].subgroup1 + sg2 = pert_input.perturbations[pk].subgroup2 + for d_i in eachindex(gp_2_list) + if gp_2_list[d_i].within_cutoff && is_in(collect(i_index:1:f_index), gp_2_list[d_i].i) == false && is_in(sg2, pert_input.group1[gp_2_list[d_i].i]) && is_in(sg1, pert_input.group1[gp_2_list[d_i].j]) + if cases[pk][1] == 1 + dvf = 2 + end + computed_energy = pert_input.perturbations[pk].perturbation_function(gp_2_list[d_i].d) + raw_output[pk][3][iframe] += computed_energy / dvf + raw_output[pk][4][iframe] += abs(computed_energy) >= tol ? 1 / dvf : 0 + end + end + computed_energy = 0 + dvf = 1 + end + end + end + end + for pk in keys(raw_output) + raw_output[pk][3] = [δ * raw_output[pk][3] for δ in pert_input.perturbations[pk].scaling] + raw_output[pk][2] = [exp.(-raw_output[pk][3][δ]/(k*T)) for δ in eachindex(raw_output[pk][3])] + raw_output[pk][1] = [raw_output[pk][2][δ]/sum(raw_output[pk][2][δ]) for δ in eachindex(raw_output[pk][2])] + end + output = OrderedCollections.OrderedDict{Any, ReweightResults}(kys => + ReweightResults( + raw_output[kys][1], + raw_output[kys][2], + raw_output[kys][3], + Int.(raw_output[kys][4]), + [-k*(sum(raw_output[kys][1][δ] .* log.(raw_output[kys][1][δ])) - log(1/length(simulation))) for δ in eachindex(pert_input.perturbations[kys].scaling)], + pert_input.perturbations[kys].scaling + ) + for kys in keys(pert_input.perturbations) + ) + return output +end + +#Checking if PDB file and input match +function check_input(simulation::Simulation, atom_group::Union{String, Function}, n_mol_per_group::Int, name::String, debug::Bool) + check = length(unique(PDBTools.residue.(PDBTools.select(simulation.atoms, atom_group)))) #check number of residues (number of molecules) in PDB + division = check ÷ n_mol_per_group + quotient = mod(check, n_mol_per_group) + if debug + if ((division == 1 && quotient == 0) || n_mol_per_group == 1) + println("Number of molecules in the system seems to be correct for $name ($n_mol_per_group)") + elseif division != 1 && quotient == 0 + return @warn(""" + Number of molecules in the system ($check) seems to be a multiple of your input for $name ($n_mol_per_group). + """) + elseif quotient != 0 + return @warn(""" + The number of residues ($check) in the PDB file for $name is different than the number of molecules based on the number on the input ($n_mol_per_group) + """) + end + end +end + +function is_in(x, i) + j = searchsortedfirst(x, i) + j > length(x) && return false + if x[j] == i + return true + else + return false + end +end + +function check_vecs(v1, v2) + if v1 == v2 + return [1,0] + end + return [0,1] end \ No newline at end of file diff --git a/src/Reweighting/Reweighting.jl b/src/Reweighting/Reweighting.jl index 7b19e74a..8ed862e3 100644 --- a/src/Reweighting/Reweighting.jl +++ b/src/Reweighting/Reweighting.jl @@ -1,10 +1,14 @@ module Reweighting +using ProgressMeter using LinearAlgebra: diag -using CellListMap: ParticleSystem, map_pairwise, map_pairwise! +using CellListMap using ..MolSimToolkit: Simulation, positions, unitcell +using ..MolSimToolkit.MolecularMinimumDistances +import OrderedCollections +import PDBTools -export reweight, lennard_jones_perturbation, poly_decay_perturbation, gaussian_decay_perturbation +export reweight, lennard_jones_perturbation, Perturbation, SystemPerturbations, SystemPerturbationsOneGroup, gauss const testdir = "$(@__DIR__)/test" @@ -13,69 +17,411 @@ include("./perturbation_examples.jl") end #Module Reweighting -@testitem "Reweighting one frame" begin +@testitem "Reweight with small trajectory using minimum distances and atoms from the same molecule" begin import PDBTools + import OrderedCollections using MolSimToolkit.Reweighting using MolSimToolkit.Reweighting: testdir - simulation = Simulation("$testdir/Testing_reweighting.pdb", "$testdir/Testing_reweighting_one_frame.xtc") + simulation = Simulation("$testdir/Testing_reweighting.pdb", "$testdir/Testing_reweighting_10_frames_trajectory.xtc") + + g1 = PDBTools.selindex(simulation.atoms, "residue 4981 and name F12") + + g2 = PDBTools.selindex(simulation.atoms, at -> at.residue == 4981 && at.name in ["H", "H21", "H22"]) + + c1 = at -> at = true + + c2 = at -> at.residue == 4981 && at.name in ["H21", "H22"] + + dist(r) = r - i1 = PDBTools.selindex(atoms(simulation), "resname TFE and name O") + Dict = OrderedCollections.OrderedDict(1 => Perturbation(simulation.atoms, c1, c2, dist, [1])) - i2 = PDBTools.selindex(atoms(simulation), "residue 11") + input = SystemPerturbations(g1, 1, g2, 3, Dict) - sum_of_dist = reweight(simulation, (i,j,r) -> r, [i1[239]], i2; cutoff = 25.0) - @test sum_of_dist.energy ≈ [7.4295543149] + res = reweight(simulation, + input; + all_distances = false, + k = 1.0, + T = 1.0, + cutoff = 12.0, + ) + + @test res[1].energies[1] ≈ [ + 2.66700, + 0.0, + 0.0, + 2.64641, + 2.42596, + 2.61084, + 2.40289, + 2.48584, + 2.86653, + 2.86769 + ] atol = 1.e-4 + + @test res[1].distances ≈ [ + 1, + 0, + 0, + 1, + 1, + 1, + 1, + 1, + 1, + 1, + ] atol = 1.e-4 end -@testitem "Reweighting small trajectory" begin +@testitem "Reweight with small trajectory using all distances and atoms from the same molecule" begin import PDBTools + import OrderedCollections using MolSimToolkit.Reweighting using MolSimToolkit.Reweighting: testdir simulation = Simulation("$testdir/Testing_reweighting.pdb", "$testdir/Testing_reweighting_10_frames_trajectory.xtc") - i1 = PDBTools.selindex(atoms(simulation), "index 97 or index 106") + g1 = PDBTools.selindex(simulation.atoms, "residue 4981 and name F12") - i2 = PDBTools.selindex(atoms(simulation), "residue 15 and name HB3") + g2 = PDBTools.selindex(simulation.atoms, at -> at.residue == 4981 && at.name in ["H", "H21", "H22"]) - sum_of_dist = reweight(simulation, (i,j,r) -> r, i1, i2, cutoff = 25.0) - @test sum_of_dist.energy ≈ [ - 1.773896547670759, 1.5923698293115915, 1.716614676290554, - 1.933003841107648, 1.602329229247863, 1.9639005665480983, - 3.573986006775934, 2.188798265022823, 2.066180657974777, - 1.6845109623700647 - ] + c1 = at -> at = true + + c2 = at -> at.residue == 4981 && at.name in ["H21", "H22"] + + dist(r) = r + + Dict = OrderedCollections.OrderedDict(1 => Perturbation(simulation.atoms, c1, c2, dist, [-2, -1, 0, 1, 2])) + + input = SystemPerturbations(g1, 1, g2, 3, Dict) + + res = reweight(simulation, + input; + all_distances = true, + k = 1.0, + T = 1.0, + cutoff = 12.0, + ) + @test res[1].energies ≈ [ + -2*[5.450897,6.018059,6.05581,5.926243,5.067425,5.816867,5.229802,5.382748,6.127764, 6.212621], + -[5.450897,6.018059,6.05581,5.926243,5.067425,5.816867,5.229802,5.382748,6.127764, 6.212621], + zeros(10), + [5.450897,6.018059,6.05581,5.926243,5.067425,5.816867,5.229802,5.382748,6.127764, 6.212621], + 2*[5.450897,6.018059,6.05581,5.926243,5.067425,5.816867,5.229802,5.382748,6.127764, 6.212621], + ] atol = 1.e-4 end -@testitem "Reweighting small trajectory probs" begin +@testitem "Reweight with small trajectory using minimum distances and contributions from different residues" begin import PDBTools + import OrderedCollections using MolSimToolkit.Reweighting using MolSimToolkit.Reweighting: testdir simulation = Simulation("$testdir/Testing_reweighting.pdb", "$testdir/Testing_reweighting_10_frames_trajectory.xtc") - i1 = PDBTools.selindex(atoms(simulation), "resname TFE and name O") + g1 = PDBTools.selindex(simulation.atoms, at -> at.residue == 15 && at.name in ["H", "N", "C", "OC1", "OC2"]) + + g2 = PDBTools.selindex(simulation.atoms,at -> at.residue == 11 && at.name in ["CB", "HB1", "HB2", "HB3"]) + + c1 = at -> at.name in ["H"] + + c2 = at -> at.name in ["HB3"] + + c11 = at -> at.name in ["H"] + + c12 = at -> at = true + + dist(r) = r + + Dict = OrderedCollections.OrderedDict( + "a" => Perturbation(simulation.atoms, c1, c2, dist, [-1, 0, 1]), + "b" => Perturbation(simulation.atoms, c11, c12, dist, [1.0]) + ) + + input = SystemPerturbations(g1, 5, g2, 4, Dict) + + res = reweight(simulation, + input; + all_distances = false, + k = 1.0, + T = 1.0, + cutoff = 15.0, + ) + + @test res["a"].energies ≈ [ + -[6.568418, 0, 6.064767, 0, 0, 0, 9.888358, 0, 7.498538, 0], + zeros(10), + [6.568418, 0, 6.064767, 0, 0, 0, 9.888358, 0, 7.498538, 0], + ] atol = 1.e-3 + + @test res["a"].probabilities ≈ [ + [0.031440, 4.41428e-5, 0.0190000, 4.41428e-5, 4.41428e-5, 4.41428e-5, 0.869599, 4.41428e-5, 0.079695, 4.41428e-5], + ones(10)/10, + [0.000234, 0.166546, 0.000387, 0.166546, 0.166546, 0.166546, 8.4542e-6, 0.166546, 9.224899e-5, 0.166546], + ] atol = 1.e-5 + + @test res["a"].distances ≈ [ + 1, 0, 1, 0, 0, 0, 1, 0, 1, 0, + ] atol = 1.e-1 + + @test res["b"].energies[1] ≈ [ + 6.568418, + 5.697282, + 6.064767, + 0, + 6.03079, + 6.464387, + 9.888358, + 9.800954, + 7.498538, + 6.192204 + ] atol = 1.e-4 +end + +@testitem "Reweight with small trajectory using all distances and contributions from different residues" begin + import PDBTools + import OrderedCollections + using MolSimToolkit.Reweighting + using MolSimToolkit.Reweighting: testdir + + simulation = Simulation("$testdir/Testing_reweighting.pdb", "$testdir/Testing_reweighting_10_frames_trajectory.xtc") + + g1 = PDBTools.selindex(simulation.atoms, at -> at.residue == 15 && at.name in ["H", "N", "C", "OC1", "OC2"]) + + g2 = PDBTools.selindex(simulation.atoms,at -> at.residue == 11 && at.name in ["CB", "HB1", "HB2", "HB3"]) + + c1 = at -> at.name in ["H"] + + c2 = at -> at.name in ["HB3"] + + c11 = at -> at.name in ["H"] + + c12 = at -> at = true + + dist(r) = r + + Dict = OrderedCollections.OrderedDict( + "a" => Perturbation(simulation.atoms, c1, c2, dist, [1]), + "b" => Perturbation(simulation.atoms, c11, c12, dist, [1]) + ) + + input = SystemPerturbations(g1, 5, g2, 4, Dict) - i2 = PDBTools.selindex(atoms(simulation), "protein and name O") + res = reweight(simulation, + input; + all_distances = true, + k = 1.0, + T = 1.0, + cutoff = 15.0, + ) - α = 5.e-3 + @test res["a"].energies[1] ≈ [ + 6.568418, + 6.00216, + 6.064767, + 10.650489, + 6.989886, + 7.265108, + 9.888358, + 9.993368, + 7.498538, + 6.640912 + ] atol = 1.e-3 - β = 5.e-3 - cut_off = 12.0 + @test res["b"].energies[1] ≈ [ + 27.963684, + 24.046826, + 26.315003, + 41.866706, + 26.591741, + 27.698368, + 43.392474, + 40.395707, + 32.164286, + 26.216961 + ] atol = 1.e-4 +end + +@testitem "Reweight with small trajectory using minimum distances and contributions from one group" begin #ADICIONAR TESTE COM CONTRIBUIÇÕES NÃO REPETIDAS (OXIGENIO E HIDROGENIOS, POR EX.) + import PDBTools + import OrderedCollections + using MolSimToolkit.Reweighting + using MolSimToolkit.Reweighting: testdir + + simulation = Simulation("$testdir/Testing_reweighting.pdb", "$testdir/Testing_reweighting_10_frames_trajectory.xtc") + + g1 = PDBTools.selindex(simulation.atoms, at -> at.resname == "SOL" && at.residue in [100,120,140] && at.name != "MW") + + c1 = at -> at = true + + c2 = at -> at = true + + c11 = at -> at.name in ["HW1", "HW2"] + + c12 = at -> at.name in ["OW"] + + dist(r) = r + + Dict = OrderedCollections.OrderedDict( + "a" => Perturbation(simulation.atoms, c1, c2, dist,[1]; one_gp=true), + "b" => Perturbation(simulation.atoms, c11, c12, dist, [-1,2]; one_gp=true) + ) + + input = SystemPerturbationsOneGroup(g1, 3, Dict) + + res = reweight(simulation, + input; + all_distances = false, + k = 1.0, + T = 1.0, + cutoff = 27.0, + ) + + @test res["a"].energies[1] ≈ [ + 33.45344, + 54.10159, + 16.77879, + 49.15888, + 17.86500, + 67.33274, + 48.39516, + 24.73005, + 20.73088, + 0.0 + ] atol = 1.e-3 + + @test res["a"].probabilities[1] ≈ [ + 2.96043532e-15 + 3.19137596e-24 + 5.16492547e-8 + 4.47269872e-22 + 1.7431271e-8 + 5.7248292e-30 + 9.5995092e-22 + 1.8191801e-11 + 9.9241468e-10 + 0.9999999 + ] atol = 1.e-7 + + + @test res["a"].distances ≈ [ + 2, + 3, + 1, + 2, + 1, + 3, + 2, + 1, + 1, + 0 + ] atol = 1.e-3 + + @test res["b"].energies ≈ [ + -[12.58697, 54.10159, 16.778796108272978, 0, 0, 0, 0, 0, 0, 0], + [25.17394, 108.20318, 33.557592216545956, 0, 0, 0, 0, 0, 0, 0] + ] atol = 1.e-4 + + @test res["b"].distances ≈ [ + 1, + 3, + 1, + 0, + 0, + 0, + 0, + 0, + 0, + 0 + ] atol = 1.e-4 +end + +@testitem "Reweight with small trajectory using all distances and contributions from one group" begin + import PDBTools + import OrderedCollections + using MolSimToolkit.Reweighting + using MolSimToolkit.Reweighting: testdir + + simulation = Simulation("$testdir/Testing_reweighting.pdb", "$testdir/Testing_reweighting_10_frames_trajectory.xtc") + + g1 = PDBTools.selindex(simulation.atoms, at -> at.resname == "SOL" && at.residue in [100,120,140] && at.name != "MW") + + c1 = at -> at = true + + c2 = at -> at = true + + c11 = at -> at.name in ["HW1", "HW2"] + + c12 = at -> at.name in ["OW"] + + dist(r) = r + + Dict = OrderedCollections.OrderedDict( + "a" => Perturbation(simulation.atoms, c1, c2, dist, [1]; one_gp=true), + "b" => Perturbation(simulation.atoms, c11, c12, dist, [1]; one_gp=true) + ) + + input = SystemPerturbationsOneGroup(g1, 3, Dict) + + res = reweight(simulation, + input; + all_distances = true, + k = 1.0, + T = 1.0, + cutoff = 27.0, + ) + + @test res["a"].energies[1] ≈ [ + 316.9570, + 510.3088, + 155.5428, + 293.7871, + 168.9915, + 630.7433, + 450.7814, + 231.5664, + 189.6638, + 0.0 + ] atol = 1.e-3 + + @test res["a"].distances == [ + 18, + 27, + 9, + 12, + 9, + 27, + 18, + 9, + 9, + 0 + ] + + @test res["b"].energies[1] ≈ [ + 140.8590, + 226.0973, + 68.6984, + 148.9806, + 75.3751, + 280.1446, + 199.8053, + 102.9400, + 84.8140, + 0.0 + ] atol = 1.e-3 - probs_test = reweight(simulation, (i,j,r) -> gaussian_decay_perturbation(r, α, β), i1, i2; cutoff = cut_off) - @test probs_test.probability ≈ [ - 0.08987791339898044 - 0.07326337222373071 - 0.0973116226496827 - 0.10965810145525891 - 0.09829891590498603 - 0.0916792371461855 - 0.08548699059703141 - 0.12480704633057726 - 0.09973413264337352 - 0.12988266765019355 + @test res["b"].distances == [ + 8, + 12, + 4, + 6, + 4, + 12, + 8, + 4, + 4, + 0 ] end \ No newline at end of file diff --git a/src/Reweighting/perturbation_examples.jl b/src/Reweighting/perturbation_examples.jl index c11ba23d..2cfc6d96 100644 --- a/src/Reweighting/perturbation_examples.jl +++ b/src/Reweighting/perturbation_examples.jl @@ -1,32 +1,32 @@ -#SETTING EQUATIONS FOR PERTUBATION +#FUNCTIONS FOR PERTURBATIONS -#Perturbing Lennard-Jones interactions +#Lennard-Jones Interactions function switching_func(r, switch, cut) switching = ((cut^2-r^2)^2*(cut^2-r^2-3*(switch^2-r^2)))/(cut^2-switch^2)^3 return switching end -function lennard_jones(r, sig, ep) - V = 4 * ep * ((sig/r)^12 - (sig/r)^6) +function lennard_jones(r, σ, ϵ) + V = 4 * ϵ * ((σ/r)^12 - (σ/r)^6) return V end -function lennard_jones_perturbation(r, sig1, ep1, sig2, ep2; alpha = 0.0, beta = 0.0, smooth::Bool = false, switch = 8.0, cut = 12.0) - epsilon_original = sqrt(ep1*ep2) - epsilon_perturbed = sqrt(ep1*(1+beta)*ep2) - sigma_original = (sig1 + sig2)/2 - sigma_perturbed = (sig1*(1 + alpha) + sig2)/2 - V_perturbed = (lennard_jones(r, sigma_perturbed, epsilon_perturbed) - lennard_jones(r, sigma_original, epsilon_original)) +function lennard_jones_perturbation(r, σ1, ϵ1, σ2, ϵ2, α, β, smooth::Bool = false, switch = 8.0, cut = 12.0) + ϵ_orig = sqrt(ϵ1*ϵ2) + ϵ_pert = (1 + β) * sqrt(ϵ1*ϵ2) + σ_orig = (σ1 + σ2)/2 + σ_pert = α * (σ1* + σ2)/2 + V_pert = (lennard_jones(r, σ_pert, ϵ_pert) - lennard_jones(r, σ_orig, ϵ_orig)) if smooth && switch < r <= cut - V_perturbed = switching_func(r,switch,cut) * V_perturbed + V_pert = switching_func(r, switch, cut) * V_pert else - V_perturbed = V_perturbed - (lennard_jones(cut, sigma_perturbed, epsilon_perturbed) - lennard_jones(cut, sigma_original, epsilon_original)) + V_pert = V_pert - (lennard_jones(cut, σ_pert, ϵ_pert) - lennard_jones(cut, σ_orig, ϵ_orig)) end - return V_perturbed + return V_pert end -#Applying a polynomial decay pertubation -poly_decay_perturbation(r, alpha, cut) = r > cut ? zero(r) : alpha*((r/cut)^2 - 1)^2 +#Asymmetric Gaussian +skew(r, A, B, C, D) = 1 + A*exp(-B*(r - C)^2) / (1 + exp(D*(r - C))) -#Applying a half-gaussian pertubation -gaussian_decay_perturbation(r, alpha, beta) = beta > 0 ? alpha*exp(-beta*r^2) : error("you need to insert positive values for β") \ No newline at end of file +#Gaussian Perturbation +gauss(r, α, β, μ) = 1 + α*exp(-β*(r - μ)^2) diff --git a/src/Reweighting/test/Testing_reweighting_one_frame.xtc b/src/Reweighting/test/Testing_reweighting_one_frame.xtc deleted file mode 100644 index d92b1a21..00000000 Binary files a/src/Reweighting/test/Testing_reweighting_one_frame.xtc and /dev/null differ