-
Notifications
You must be signed in to change notification settings - Fork 234
feat: implement Fast-SNP, speed up loopless FVA #1444
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks so much. This looks great! Sorry for the issues with the CI. I will try to fix them this week, but you might have to rebase on the devel branch after that. In general I think it will be a great contribution. I will try to review as soon as possible. |
|
@cdiener Hello. I've merged devel into this PR (I've seen you did some python versions adjustment). Please approve the CI workflow to check. Is it true, that now CI should work successfully? |
|
Sorry for the delay, currently on paternity leave. Looks like a great addition. Hopefully I can get to it in the next weeks. I am actually not that opposed to changing the loopless argument to a string. It seems the most explicit and it would be an easy fix in existing code. |
|
@cdiener Hello. I've added a small fix to fastSNP algorithm implementation to make it stable. Now tests should pass, please approve the CI workflow to check. |
cdiener
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi thanks, realy great addition.
Are the FastSNP variables and constraints incompatible with the original LP? If not it would probably be much more efficient to just modify the existing LP instance with a context which will also make sure all changes are reverted after.
| model = solver.Model() | ||
|
|
||
| q_list = [] | ||
| for i in range(s_int.shape[1]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be faster to add those to the original model and use a context? For instance the code below re-adds all the constraints from S that are already in the original model
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this function it is important to work only with internal reactions, however, the original model contains boundary reactions and additional constraints. It is harder to filter them, the easier way is just to create the new problem.
We checked that creating a new problem before solving is not a bottleneck and takes negligible time comparing with the whole function running time. Also this function is typically not called in a loop, so we won't create more than one additional problem.
| model.add(q_list) | ||
|
|
||
| for idx, row in enumerate(s_int): | ||
| nnz_list = np.flatnonzero(np.abs(row) > zero_cutoff) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will ignore additional constraints from model.constraints that the user may have defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this function, it is important that we find cyclic reactions based only on stoichiometric matrix and reaction directions. Otherwise, the implemented method does not work. This function is used for filtering reactions, so it is ok. I've added a comment to docstring about what this function exactly does.
p.s.
To satisfy all constraints (even reactions bounds, not just directions) the algorithm should be much slower. We definitely know, that in many models there are some cyclic reactions, that cannot carry any non-zero flux (if all bounds are considered), however, it is harder problem to prove that.
| modulo_list = [] | ||
| v_list = [] | ||
|
|
||
| for i in range(n): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might miss user-defined custom variables
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nullspace_fast_snp is a helper function and it just takes only a matrix and directions as an input and calculates its nullspace. There are no user-defined constraints for this subtask.
Actually both nullspace_fast_snp and find_cyclic_reactions functions are two helper subtasks, that are solved independently before the flux variability optimization which is compatible with all the additional custom constraints and variables.
| {x: 1.0, v: 1.0} | ||
| ) | ||
|
|
||
| for idx, row in enumerate(S): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this might miss user-defined constraints
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reply is similar as in the thread about custom user-defined variables
| q_list.append(q) | ||
| model.add(q_list) | ||
|
|
||
| for idx, row in enumerate(s_int): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be more efficient to iterate over model.constrains directly and clone or modify
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I written in another reply, it is easier to create a new helper optimization problem, rather than filtering all unnecessary variables and constraints from the copy of the original problem. Also, from my experience it takes approximately the same time to copy the original problem and to create the new one.
|
Hello, thank you very much for your review. I've added an update:
For non-accepted suggestions I've added a response. Looking forward for tests run and next review! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for the explanation. It makes sense to me now. It would still be possible to identify only the equalities from the set of all constraints quite easily, but using the S matrix is not that much worse because you are already using the fast contraint setters from optlang.
Just one small move for a file and it is still missing the release notes. Maybe also rebase on devel if possible.
|
I moved the file Also I added release notes. |
cdiener
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks good now. 😄
|
Note to self: this is an API breaking change because the type of the loopless arg has been changed. |
TL;DR
This PR implements Fast-SNP algorithm for adding loopless constraints. It consists of two key optimizations:
As a result a 10-100x (or more) speed-up for loopless FVA is achieved.
This PR is somewhat similar to #841, but differs in implementation.
Code modifications
One of the key optimizations is that
add_looplessandflux_variability_analysisfunctions use only reactions that can be a part of a cycle (we call them cyclic). This greatly improves performance due to the number of cyclic reactions is usually much smaller than the number of internal reactions.Function
find_cyclic_reactionsWe add a new function
find_cyclic_reactionswhich finds all reactions in the model that can be a part of a cycle.Also for each such reaction we identify if it can have positive and negative flux in a cycle. This is tested by restricting boundary reaction fluxes to zero and optimizing flux through each of the internal reactions.
There are two methods (
methodparameter) that can be called:basic: a straightforward procedure that iterates over all of the reactions;optimized: an optimized version inspired by Fast-SNP random weights idea, which discovers many cyclic reactions in one go.The second method uses less LP programs and usually works at least 2x faster. It is a custom procedure, so we don't have a citation for the algorithm.
Function
add_looplessWe introduced new
methodparameter. Ifmethod="fastSNP"we:find_cyclic_reactions.If
method="original"then the current SVD-based algorithm is ran on all of the internal reactions.Function
flux_variability_analysisWe made some optimizations for this function if
loopless=True.find_cyclic_reactionsfunction. Reactions which are not cyclic are optimized using the basic FVA mode—their extreme values are guaranteed to be achievable without using cycle fluxes. This greatly improves the performance of the function even when using the CycleFreeFlux method.add_looplessregime to this function, which adds loopless constraints (re-using the calculated cyclic reactions andadd_looplessfunction) to solve loopless FVA for the cyclic reactions. In this regime the exact FVA bounds are calculated (unlike for the CycleFreeFlux method).Performance
FVA quick example
For the quick demonstration of the performance improvement we assembled the following script, which solves FVA for a single reaction with the original and new Fast-SNP loopless constraints.
Here are the example run times for the tested model/reaction pairs (CPLEX solver was used for MILP):
iIT341, reactionACKr:iAF692, reactionMDHy:iNJ661, reactionNADTRHD_copy1:The fastest complete loopless FVA
The most efficient way to calculate loopless FVA for all of the reactions is to directly call
flux_variability_analysisfunction, without using explicitadd_loopless:Discussion points
There are several points regarding the API we wanted to highlight for discussion:
flux_variability_analysis(model, loopless=True, params_add_loopless={})is the most efficient way to calculate loopless FVA. The functionflux_variability_analysisidentifies which method (add_loopless-based or CycleFreeFlux) will be used based on theparams_add_looplesspresence. This form enables backward-compatibility, but is a bit clunky. A better way could be to makelooplessparameter deprecated and replace it with a new string-basedmethodparameter to control the behavior, with the corresponding backward compatibility-handling procedure that setsmethod="CycleFreeFlux"ifloopless=Trueis present.add_looplessfunction the originaluse_cyclic=Trueto enable that, however we decided not to implement it. Our thinking is thatmethod=originalis provided for the reproducibility of the previous behavior, but ultimately shouldn't be used: the Fast-SNP algorithm should always provide a better performance.flux_variability_analysis(loopless=True)can still be ran when the full loopless FVA is not feasible. Thus we decided to implement running CycleFreeFlux only for the cyclic reactions, which significantly improves the performance. This changes the behavior, but should produce the same results as before.We would be happy to hear your feedback and are ready to implement any additional changes to API/docs/tests/etc that are needed for this PR to be merged.