
Formal Verification and Model-Based Testing in Strimzi #158


Open · wants to merge 2 commits into base: main

Conversation

see-quick (Member)

This proposal introduces potential integration with formal verification and model-based testing within Strimzi.

@im-konge (Member) left a comment:

Thanks for the proposal, which is IMO very well written.
I appreciate your work on the PoC and your help directly in the Quint project.

However, I have a few concerns about this, which are anyway mentioned in the Disadvantages and Maintainability sections.

The biggest scarecrow for me is maintainability. You know from your own experience with the performance tests, your most recent contribution to the testing part of the Strimzi project, that they are mainly maintained by you. Others, me included, are not really using them on any regular basis; nobody else has written more tests or triggered the pipelines you wrote. I'm a bit afraid that the same thing will happen with this. And not because it's not useful (the performance tests are IMO really useful when you want to see what influence a change had in terms of performance), but because "it's another thing that we need to maintain". That, in some way, is connected to the learning curve you mentioned in that section as well.

From what I understand, because the PoC includes an integration test, it will be executed as part of our build pipeline. And if some change requires a change to the model, the build will fail. A contributor who has no experience with Quint and the overall procedure will then be confused and will ask other maintainers for help. It will (again, it's just my opinion) make development more complicated - especially when the contributor is a newcomer who has never heard of Quint before.

The simulation feature is useful for developers, but I assume a developer will have to learn about Quint, how it works, and how to use this feature properly.

I don't want to block this proposal, but it would be great to discuss these situations, plus what exactly this means for contributors (in terms of learning and maintainability).

Because if we all approve it now, you work on the implementation, and then you end up being the only one who actually uses and maintains it, then I guess it will just add another burden on you (and that's not desired).

@scholzj (Member) left a comment:

Thanks for the proposal and for the effort of experimenting with this.

I think at this point, our main issue as a project is not quality but a lack of long-term contributors. I'm not sure if setting up a higher barrier to entry for contributors is something we can afford right now.

The QNT model specification seems really complex - especially given that the User Operator is a pretty straightforward code base without much complexity. That likely also means a huge maintenance effort for any change, and a rather big space for bugs in the MBT tests themselves. I cannot imagine what this would look like for something more complex such as the Cluster Operator.

It also does not seem to give us any clues about what it actually tests, how to understand how well the code base is covered, or whether the tests are executed in the right way and don't produce false positives.

So - to be honest - I struggle to see where this would add enough value to at least compensate for the negative aspects. I do not want to block this if others want to go this way, but I'm pretty much something like -0.9 on this.

Comment on lines +61 to +63
One can find the model and PoC implementation here:
- PoC of Formal Specification for User Operator in Quint: [UserOperatorModel.qnt](https://github.com/see-quick/verification/blob/main/formal_verification/quint/strimzi/user-operator/UserOperatorModel.qnt).
- Model-Based Testing Implementation PoC: [UserOperatorModelMbtIT.java](https://github.com/strimzi/strimzi-kafka-operator/compare/main...see-quick:strimzi-kafka-operator:model-based-testing-user-operator-with-specification)
Member

Out of curiosity ... are the huge JSONs you have in the PR really expected to be committed to the GitHub repository?

Member Author

Yes, generating these JSONs during the build would be very time-consuming, so having them pre-generated is the best way IMO.
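(Editor's note, for readers unfamiliar with these files: Quint can export simulation runs in the Informal Trace Format (ITF), a JSON encoding of a sequence of model states that an MBT test can replay against the real system. The variable names below are made up for illustration; the actual traces in the PR follow the same general shape.)

```json
{
  "#meta": { "format": "ITF", "source": "UserOperatorModel.qnt" },
  "vars": ["paused", "userGeneration", "secretGeneration"],
  "states": [
    { "#meta": { "index": 0 }, "paused": false, "userGeneration": 1, "secretGeneration": 0 },
    { "#meta": { "index": 1 }, "paused": false, "userGeneration": 1, "secretGeneration": 1 }
  ]
}
```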

Comment on lines +144 to +145
| Line Coverage | **89%** | 83% |
| Branch Coverage | **78%** | 71% |
Member

To be honest, over my professional life I have learned that having good line or branch coverage is a lot less important than understanding the system and the risks of where the issues might come from, and covering that.

So, from the practical perspective, are there any issues that this helped discover in the User Operator?

@see-quick (Member Author) commented May 29, 2025:

> To be honest, over my professional life I have learned that having good line or branch coverage is a lot less important than understanding the system and the risks of where the issues might come from, and covering that.

I agree, it's not mainly about improving test coverage but about proving that the UO is correct with respect to the specification and never reaches a bad state.

> So, from the practical perspective, are there any issues that this helped discover in the User Operator?

Well, in the case of the UO, I didn't find any bugs, even though there were scenarios where I questioned myself. But the model did help clarify edge cases, validate assumptions (like auth transitions and paused reconciliation behaviour), and confirm that the invariants always held (were never violated). So even though I did not discover a new bug, it gave us more confidence in the correctness of the implementation.


But applying the same approach to other components might reveal subtle bugs or design flaws that would otherwise go unnoticed :).
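(Editor's note: for illustration, a toy Quint module in the spirit of the invariants described above might look like the sketch below. The module, variables, actions, and invariant name are hypothetical, not taken from the actual UserOperatorModel.qnt.)

```
// Toy sketch (hypothetical names, not the PoC spec): reconciliation is
// disabled while paused, and the operator never produces a Secret for a
// generation of the KafkaUser it has not yet observed.
//
// Checkable with the Quint CLI, e.g.:
//   quint run --invariant=secretNotAhead UserOperatorSketch.qnt
module UserOperatorSketch {
  var paused: bool
  var userGeneration: int    // generation of the KafkaUser custom resource
  var secretGeneration: int  // generation the Secret was last produced from

  action init = all {
    paused' = false,
    userGeneration' = 1,
    secretGeneration' = 0,
  }

  // A user edits the KafkaUser CR
  action editUser = all {
    userGeneration' = userGeneration + 1,
    paused' = paused,
    secretGeneration' = secretGeneration,
  }

  // Pause or unpause reconciliation
  action togglePause = all {
    paused' = not(paused),
    userGeneration' = userGeneration,
    secretGeneration' = secretGeneration,
  }

  // Reconciliation is only enabled when not paused
  action reconcile = all {
    not(paused),
    secretGeneration' = userGeneration,
    userGeneration' = userGeneration,
    paused' = paused,
  }

  action step = any { editUser, togglePause, reconcile }

  // Invariant: the Secret never runs ahead of the custom resource
  val secretNotAhead = secretGeneration <= userGeneration
}
```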

@ppatierno (Member)

Thanks for the proposal and for the effort with this. I think it was a very useful exercise.
On my side, I can only echo what Jakub and Lukas said. I see more drawbacks than advantages.
The maintainability, and the fact that adding this could raise the contribution barrier for other users, is what bothers me most. I would be -1, but I'm not sure if others really want to move forward with it.

@see-quick (Member Author)

> From what I understand, because the PoC includes an integration test, it will be executed as part of our build pipeline. And if some change requires a change to the model, the build will fail. A contributor who has no experience with Quint and the overall procedure will then be confused and will ask other maintainers for help. It will (again, it's just my opinion) make development more complicated - especially when the contributor is a newcomer who has never heard of Quint before.

That’s a fair point, but the impact should be minimal. The MBT model is limited to the user-operator, which rarely changes. Most small updates won’t affect the model, and if major logic changes do, it’s likely worth updating the model anyway.

> The simulation feature is useful for developers, but I assume a developer will have to learn about Quint, how it works, and how to use this feature properly.

🙏. It works well; it's just a matter of running one command, which could be documented or referenced from Quint's documentation. It's not a big deal, trust me. The effort-to-reward ratio is very favourable.

> what exactly does it mean for the contributors (in terms of learning and maintainability).

That's a great point, and totally fair to discuss. Having a formal model actually improves contributor onboarding by making the User Operator's design much easier to understand. IMO, while designing the UO model I figured out a lot about how it works at its core, and it really improved my overall understanding of the UO.

On maintainability: yes, it adds some overhead, but it pays off. If we ever change the design of the UO, it's incredibly valuable to validate the change both in the model and in the real implementation, giving us high confidence that the new behaviour is correct, which avoids future bugs.


> The QNT model specification seems really complex - especially given that the User Operator is a pretty straightforward code base without much complexity. That likely also means a huge maintenance effort for any change, and a rather big space for bugs in the MBT tests themselves. I cannot imagine what this would look like for something more complex such as the Cluster Operator.

That's a fair concern. In that case, we don't need to model the entire Cluster Operator -- we can focus on specific Cluster Operator subsystems like KafkaRoller, where correctness is critical. The goal isn't to model everything, but to selectively apply MBT where it brings the most value.

@scholzj (Member) commented May 29, 2025:

> That's a fair point, but the impact should be minimal. The MBT model is limited to the user-operator, which rarely changes. Most small updates won't affect the model, and if major logic changes do, it's likely worth updating the model anyway.

To be honest, I do not think this is a good argument for MBT: if the code rarely changes, we do not need to invest a lot of effort and create a big contributor barrier to test it better. The value of tests comes when things are changing.
