Formal Verification and Model-Based Testing in Strimzi #158
Conversation
Signed-off-by: see-quick <[email protected]>
Thanks for the proposal, which is IMO very well written.
I appreciate your work on the PoC and your direct help in the Quint project.
However, I have a few comments, which are anyway mentioned in the Disadvantages and Maintainability section.
The biggest scarecrow for me is maintainability. You know from your own experience with the performance tests, which are the most recent contribution to the testing part of the Strimzi project, that they are mainly maintained by you. Others, me included, are not really using them on any regular basis; nobody else has written additional tests or triggered the pipelines you wrote. I'm a bit afraid that the same thing will happen here. And not because it's not useful (performance tests are IMO really useful when you want to see what influence a change had in terms of performance), but because "it's another thing that we need to maintain". That is, in some way, connected to the learning curve you mentioned in that section as well.
From what I understand, because the PoC includes an integration test, it will be executed as part of our build pipeline. And if there is a change that requires updating the model, the build will fail. A contributor without experience with Quint and the overall procedure will then be confused and have to ask other maintainers for help. It will (again, just my opinion) make development more complicated, especially when the contributor is a newcomer who has never heard of Quint before.
The simulation feature is useful for developers, but I assume a developer will first have to learn Quint, how it works, and how to properly use this feature.
I don't want to block this proposal, but it would be great to discuss these situations, plus what exactly it means for contributors (in terms of learning and maintainability).
Because if we all approve it now, you work on the implementation, and then you end up being the only one who actually uses and maintains it, it will just add another burden on you (and that's not desired).
Thanks for the proposal and for the effort of experimenting with this.
I think at this point, our main issue as a project is not quality but the lack of long-term contributors. I'm not sure a higher barrier to entry for contributors is something we can afford right now.
The QNT model specification seems really complex, especially given that the User Operator is a pretty straightforward code base without much complexity. That likely also means a huge maintenance effort for any change and rather big space for bugs in the MBT tests themselves. I cannot imagine what this would look like for something more complex, such as the Cluster Operator.
It also does not seem to give us any clues about what it actually tests, how well the code base is covered, or whether the tests are executed in the right way and don't produce false positives.
So, to be honest, I struggle to see where this would add enough value to compensate for the negative aspects. I do not want to block this if others want to go this way, but I'm pretty much something like -0.9 on this.
One can find the model and PoC implementation here:
- PoC of Formal Specification for User Operator in Quint: [UserOperatorModel.qnt](https://github.com/see-quick/verification/blob/main/formal_verification/quint/strimzi/user-operator/UserOperatorModel.qnt).
- Model-Based Testing Implementation PoC: [UserOperatorModelMbtIT.java](https://github.com/strimzi/strimzi-kafka-operator/compare/main...see-quick:strimzi-kafka-operator:model-based-testing-user-operator-with-specification)
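To give a feel for what such a specification looks like, here is a minimal, purely illustrative Quint sketch. The variable names, actions, and invariant below are assumptions for the sake of the example and are not taken from the actual UserOperatorModel.qnt; a real run would then check the invariant with something like `quint run --invariant=secretImpliesUser`.

```quint
module userOperatorSketch {
  // Illustrative state: does the KafkaUser exist, is reconciliation paused,
  // and how many times have its credentials been (re)generated.
  var userExists: bool
  var paused: bool
  var secretVersion: int

  action init = all {
    userExists' = false,
    paused' = false,
    secretVersion' = 0
  }

  // Creating a KafkaUser also creates its credentials secret.
  action createUser = all {
    not(userExists),
    userExists' = true,
    paused' = false,
    secretVersion' = secretVersion + 1
  }

  // A regular reconciliation may rotate credentials, but only when not paused.
  action reconcile = all {
    userExists,
    not(paused),
    userExists' = true,
    paused' = false,
    secretVersion' = secretVersion + 1
  }

  action pause = all {
    userExists,
    userExists' = true,
    paused' = true,
    secretVersion' = secretVersion
  }

  action deleteUser = all {
    userExists,
    userExists' = false,
    paused' = false,
    secretVersion' = 0
  }

  action step = any { createUser, reconcile, pause, deleteUser }

  // State invariant: credentials only ever exist for an existing user.
  val secretImpliesUser = (secretVersion > 0) implies userExists
}
```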
Out of curiosity ... are the huge JSONs you have in the PR really expected to be committed to the GitHub repository?
Yes, generating these JSONs during build would be very time-consuming, so having them pre-generated is the best way IMO.
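For context, the replay idea behind those pre-generated traces can be sketched as below. This is a hedged, self-contained illustration, not the actual Strimzi MBT harness: the `TraceStep` record, the action names, and the toy state machine are all assumptions; the real test drives the User Operator and parses the Quint-generated JSON instead.

```java
import java.util.List;

// Illustrative sketch of trace-replay MBT: each pre-generated trace step names
// an action and the state the model expects afterwards; the test applies the
// action to the system under test and compares states.
public class TraceReplay {

    // One step of a model trace (hypothetical shape; the real traces are JSON).
    record TraceStep(String action, String expectedState) {}

    // A tiny in-memory stand-in for the system under test.
    static String apply(String state, String action) {
        return switch (action) {
            case "createUser" -> "READY";
            case "pauseReconciliation" -> "PAUSED";
            case "deleteUser" -> "ABSENT";
            default -> state; // unknown actions leave the state unchanged
        };
    }

    // Replays every step and checks the implementation ends up where the model says.
    static boolean replay(List<TraceStep> trace) {
        String state = "ABSENT";
        for (TraceStep step : trace) {
            state = apply(state, step.action());
            if (!state.equals(step.expectedState())) {
                return false; // divergence between model and implementation
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<TraceStep> trace = List.of(
                new TraceStep("createUser", "READY"),
                new TraceStep("pauseReconciliation", "PAUSED"),
                new TraceStep("deleteUser", "ABSENT"));
        System.out.println(replay(trace) ? "trace ok" : "trace failed");
    }
}
```

With pre-generated traces committed, a run like the one in `main` is cheap: the expensive part (exploring the model) happened once, offline.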
| Line Coverage | **89%** | 83% |
| Branch Coverage | **78%** | 71% |
To be honest, over my professional life, I learned that having good line or branch coverage is a lot less important than understanding the system and the risks of where the issues might come from, and covering that.
So, from the practical perspective, are there any issues that this helped discover in the User Operator?
> To be honest, over my professional life, I learned that having good line or branch coverage is a lot less important than understanding the system and the risks of where the issues might come from, and covering that.

I agree it's not mainly about improving test coverage but about proving that the UO is correct with respect to the specification and never reaches a bad state.
> So, from the practical perspective, are there any issues that this helped discover in the User Operator?

Well, in the case of the UO, I didn't find any bugs, even though there were scenarios where I was questioning myself. But the model did help clarify edge cases, validate assumptions (like auth transitions and paused-reconciliation behaviour), and ensure the invariants always held (were never violated). So even though I did not discover a new bug, it gave us more confidence in the correctness of the implementation.
But applying the same approach to other components might reveal subtle bugs or design flaws that would otherwise go unnoticed :).
Thanks for the proposal and for the effort with this. I think it was a very useful exercise.
That’s a fair point, but the impact should be minimal. The MBT model is limited to the user-operator, which rarely changes. Most small updates won’t affect the model, and if major logic changes do, it’s likely worth updating the model anyway.
🙏. It works well; it just requires running one command, and that could be documented or referenced in Quint's documentation. It's not a big deal, trust me. The effort-to-reward ratio is very favourable.
That's a great point, and totally fair to discuss. Having a formal model actually improves contributor onboarding by making the User Operator's design much easier to understand. IMO, when designing the UO model, I figured out a lot about how it works at its core, and it really improved my overall understanding of the UO. On maintainability: yes, it adds some overhead, but it pays off. If we ever change the design of the UO, it's incredibly valuable to validate the change both in the model and in the real implementation, giving us high confidence that the new behaviour is correct, which avoids future bugs.
That's a fair concern. In that case, we don't need to model the entire Cluster Operator; we can focus on specific subsystems like KafkaRoller, where correctness is critical. The goal isn't to model everything, but to selectively apply MBT where it brings the most value.
To be honest, I do not think this is a good argument for MBT: if the code rarely changes, we do not need to invest a lot of effort and create a big contributor barrier to test it better. The value of tests comes when things are changing.
This proposal introduces a potential integration of formal verification and model-based testing into Strimzi.