test: hookcmds integration tests for secret access and port management by tonyandrewmeyer · Pull Request #2562 · canonical/operator

tonyandrewmeyer · 2026-06-11T06:38:56Z

Follow-up to #2520. This PR adds three integration tests to test/integration/test_hookcmds.py:

test_secret_grant_revoke. Exercises secret_grant and secret_revoke over the real cross-app relation.
test_ports_endpoint_scoped. Opens TCP port 7766 scoped to the peer endpoint, asserts opened_ports(endpoints=True) returns endpoints=['peer'], then closes it.
test_relation_model_get. This really exists for CMR, but doing that seems too much for these tests, so it just validates it's the right shape and consistent across use.

I also (had Claude) rewrite test_juju_reboot_queues_reboot. I wasn't paying enough last time - you can't reboot in K8s, and you can't reboot in an action (I definitely knew both of these already). I did the same kind of workaround as I've done previously: trigger it via config.

Also a few extra fixes for problems that running against all of 3/stable, 4.0/stable, 4.0/edge, 4.1/edge × machine/k8s) showed up. Some of this was that I failed to run the whole matrix after changes from refiew: I'll make sure to do that this time, sorry!

test-file resource declaration needed filename: for juju 3/stable and newer to accept the deploy.
data storage needed multiple: { range: 1-2 } so test_storage_add could attach a second instance.
test_setup deploys with trust=True so credential_get works.
test_network_get_returns_addresses xfails the Juju 4.0/k8s network-get placeholder issue (juju/juju#22616) via the new shape; the interface-name non-empty assertion is gated on machine substrates.
_xfail_juju4_commit_bug helper centralises the xfail-on-error pattern for the Juju 4 uniter commit-phase bugs (juju/juju#22523, juju/juju#22524), widened to catch TimeoutError alongside TaskError since the symptom on 4.0/k8s is sometimes a driver timeout.

Run on my fork: tonyandrewmeyer/operator run 27414403142. All eight hookcmds cells green:

Cell	Result
`hookcmds (machine, 3/stable)`	✅ 8m37s
`hookcmds (k8s, 3/stable)`	✅ 6m13s
`hookcmds (machine, 4.0/stable)`	✅ 7m38s
`hookcmds (k8s, 4.0/stable)`	✅ 10m03s
`hookcmds (machine, 4.0/edge)`	✅ 7m59s
`hookcmds (k8s, 4.0/edge)`	✅ 9m14s
`hookcmds (machine, 4.1/edge)`	✅ 7m31s
`hookcmds (k8s, 4.1/edge)`	✅ 9m28s

The secrets (4.0/stable) failure is the pre-existing flaky test_secrets cell from main, not anything this PR touches.

…-scoped ports) Follow-up 3: add any-charm as a second application in test_setup, add test-secret-grant and test-secret-revoke actions and a corresponding test_secret_grant_revoke jubilant test that exercises secret_grant and secret_revoke over a real cross-app relation. Follow-up 4: add test-ports-endpoint-scoped action and test_ports_endpoint_scoped jubilant test that opens a TCP port scoped to the peer endpoint, verifies it appears with endpoints=['peer'] from opened_ports(endpoints=True), and confirms it is closed cleanly. 31 tests collected; no import or fixture-resolution errors.

The lockfile pinned ops at 3.8.0.dev0 from before the 3.7.1 release; charmcraft pack failed with "Package metadata version 3.7.1 does not match given version 3.8.0.dev0" because uv built ops.tar.gz from the current 3.7.1 source while the lock demanded 3.8.0.dev0. The other three integration-test charms (test_relation, test_secrets, test_tracing) ship the same pyproject shape with no uv.lock and resolve ops cleanly against whatever the local tarball provides. Match that pattern so a future ops version bump doesn't break the hookcmds matrix again.

- test_network_get_returns_addresses: on Juju 4.0/k8s, network-get puts an empty "k8s service" placeholder at bind_addresses[0] and the cloud-container with the pod IP at bind_addresses[1], so the action's `first-address` field is absent. Filed as juju/juju#22616. Juju 3.6 and Juju 4.1 both put the address-carrying entry first. - test_secret_grant_revoke: jubilant raises Python's built-in TimeoutError (not jubilant.TaskError) when juju run times out, so widen the except. Same Juju 4.0 uniter commit-phase root cause as the existing _JUJU4_COMMIT_BUG xfails (juju/juju#22524). - test_ports_endpoint_scoped: same TimeoutError symptom on Juju 4.0/k8s endpoint-scoped opens; xfail with the same constant pending a more specific upstream issue. A new _is_caas() helper checks model.type for the k8s gate on network-get; the constant _JUJU4_NETGET_K8S_BUG points at juju/juju#22616.

…grant step too The grant action (secret_add + secret_grant) queues changes that the Juju 4.0 uniter commit-phase bug also affects, and on 4.0/k8s the grant times out at least as often as the revoke. Move the existing try/(TaskError, TimeoutError) wrapper around the grant call too.

Widen the _xfail_juju4_commit_bug helper to accept any Exception (the follow-up tests catch TimeoutError in addition to jubilant.TaskError), and route test_secret_grant_revoke and test_ports_endpoint_scoped through it. They now also fail loudly if Juju 4 ever stops hitting the bug.

The matrix expanded in the merged PR canonical#2520 to include juju 3/stable (newer 3.x), 4.0/edge, and 4.1/edge. Those releases reject a deploy of a charm whose 'type: file' resource declaration is missing the 'filename:' field with the error 'resource missing filename'. The older 3.6/stable cell tolerated it. Set filename: test-file.txt; the existing integration assert that the resource path's basename equals 'test-file.txt' continues to hold.

After canonical#2520 squash-merged, the matrix expanded to 3/stable, 4.0/edge and 4.1/edge — channels the original PR was never tested against. Five distinct problems showed up across cells. Fix all in one pass: - test_network_get_returns_addresses: the rebase took the follow-up's old-shape assertions against the new-shape charm action (which was rewritten in PR review to emit JSON). Restore the JSON-parsing assertions and re-express the Juju 4.0/k8s xfail in terms of the new shape (first bind-address has an empty addresses list, juju/juju#22616). - _xfail_juju4_commit_bug: was over-aggressive on Juju 4 machine cells — the action succeeded so the strict pytest.fail fired ('no longer hits ... remove the guard'), but the commit-phase bug only affects k8s. Gate the strict-fail on _is_caas(juju). - test_storage_add: storage-add queues an end-of-hook commit and hits the same Juju 4.0/k8s commit-phase regression. Route through the helper. - test_relation_model_get: was calling juju.show_model() which doesn't exist on the pinned jubilant~=1.5. Read the UUID from 'juju show-model' directly. - test_credential_get / test_juju_reboot_queues_reboot: the action fails on substrates without --trust (credential-get) or where the unit can't actually reboot (containers, k8s). The hookcmd plumbing itself is fine; skip with a clear reason rather than fail.

Iteration on top of the previous matrix-fix commit, after the dispatch narrowed the failures down to three: - _xfail_juju4_commit_bug was over-gated: the previous version only fired on Juju 4 + k8s, but the commit-phase bug is observed on machine too (state-delete and secret-remove fail on both substrates, per the existing module-level comment). Restore xfail-on-error for any Juju >=4 while still strict-failing only on Juju 4 k8s success (where the bug is most reliable). - test_relation_model_get was comparing the action's uuid result to 'juju show-model'.model-uuid. That equality holds on Juju 3.6 machine but not on Juju 4 or Juju 3 k8s — relation_model_get is really meant for cross-model relations, and the local-model coincidence on 3.6 was incidental. Drop the comparison and assert only that the call returned a UUID-shaped string. Equality belongs in a future cross-model scenario. - test_storage_add was failing 'cannot attach, storage is singular'. The data storage in test_hookcmds/charmcraft.yaml had no 'multiple:' clause, so Juju treated it as singular and rejected the second attach. Add 'multiple: { range: 1-2 }'. - test_network_get_returns_addresses was asserting non-empty interface-name; on k8s the first bind-address may be a service- network entry with no NIC, so the field is empty. Gate the assertion on machine substrates.

The strict-fail semantics work for a single-version test target but break in a multi-channel matrix: 4.0/edge and 4.1/edge ship the upstream fix for juju/juju#22523/#22524, so the action succeeds there — but 4.0/stable still has the bug, so the guard must stay. Strict-failing on edge success makes the guard impossible to keep. Drop the strict-fail branch; xfail-on-error remains, which is what the original PR shipped. The 'remind me to remove the guard' signal is the cells that succeed in the matrix — no test-author action needed until 4.0/stable also moves past the fix.

…ften the relation-model-get note - test_credential_get: deploy with trust=True instead of try/except-skip. The hookcmd needs trust to read the credential; granting it in test_setup is the right default. Negative-case testing (no trust → fails; juju.trust() → succeeds; clean up) is plausible but out of scope here. - test_juju_reboot_queues_reboot: skip when _is_caas(juju) instead of catching the runtime TaskError. Kubernetes pods can't reboot themselves; it's an upfront substrate condition, not a runtime failure to recover from. - test_relation_model_get: keep the shape-only check, but soften the rationale comment. Whether the action's uuid should equal the local model UUID for a peer relation is being validated against a real juju locally; tighten the assertion back once that lands.

…peer relations Validated live on Juju 3.6.23 and 4.0.5 LXD: for a peer relation, the server-side path (apiserver/facades/agent/uniter/uniter.go) defaults otherApplication.ModelUUID to the local model UUID and only overrides on a successful cross-model lookup. So relation_model_get returns the local model UUID, matching JUJU_MODEL_UUID inside the hook env. The earlier comparison via 'juju show-model' from the test side was a fragile cross-environment dance and was getting the wrong UUID under CI; emit JUJU_MODEL_UUID from the action and assert equality directly inside the same hook context.

The reboot hookcmd needs an actual rebootable substrate; both Kubernetes pods and LXD/localhost containers refuse it (juju-reboot exits status 1). Skip on cloud == 'localhost' too so machine cells don't fail with an unactionable error.

…tly forbidden Verified live on Juju 3.6 LXD: juju-reboot returns 'ERROR juju-reboot is not supported when running an action.' regardless of substrate. The previous CAAS + localhost gating attributed it to container substrates; the real constraint is the action context itself. Exercising this hookcmd needs a regular-hook trigger (e.g. install / config-changed) and a harness that drives that hook, not an action- invoked path. Skip with the accurate reason until such a harness lands.

juju forbids juju-reboot in action contexts (verified live + confirmed in juju source at internal/worker/uniter/runner/jujuc/reboot.go:82-85). So the previous action-based test could never run. Add a config option 'reboot-trigger'; the config-changed handler calls hookcmds.juju_reboot() the first time it sees the trigger value, using a marker file under JUJU_CHARM_DIR to prevent the post-reboot config-changed re-run from looping. A read-only action exposes the marker existence so the integration test can verify the path actually ran. Drop the test-juju-reboot action and handler (always failed in juju 4.x and could only ever fail with 'not supported when running an action'); replace with test-reboot-marker. Skip on CAAS — Kubernetes pods can't reboot themselves and the hookcmd surfaces as a runtime error there.

…uptime - Reformat the _REBOOT_MARKER constant to match ruff --preview's preferred layout (lint was failing on the previous form). - Have the test-reboot-marker action also return /proc/uptime so the test can compare before/after. The marker alone only proves the config-changed handler ran the path; a fresh (lower) uptime after the trigger is what proves juju-reboot actually rebooted the unit.

… list test/charms/test_hookcmds/.gitignore had a single '*' line, which silently hid every new file inside that directory and made git add emit an 'ignored' warning even for tracked files. Replace with the explicit pattern test_secrets already uses (uv.lock, *.tar.gz, *.charm). Tracked files are unaffected; new files under this dir will now show up in git status as intended.

tonyandrewmeyer · 2026-06-11T11:39:11Z

@dwilding and @tromai sorry to hit you up again immediately following the other one, but I assume bringing over that context is the most efficient thing to do. This one should be the end of this work!

dwilding · 2026-06-12T07:45:39Z

+    # On Juju 4.0/k8s this action times out on the driver side, matching the
+    # uniter commit-phase regression pattern. May be a separate upstream bug
+    # from the secret ones; xfail until pinned down.


Is this (potential) timeout bug tracked somewhere for further investigation?

I think it has the same underlying cause and will get fixed at the same time as the other one. To be honest, partly I got tired digging into Juju bugs deep enough to write a decent issue, even with agent help, and didn't want to do another one when I thought it was likely the same.

When the other one is fixed, if this doesn't start passing, I'll do the investigation then. The tracking is this comment plus my subscriptions to the Juju issues.

Co-authored-by: Dave Wilding <tech@dpw.me>

Review follow-ups on canonical#2562: - Rename the test-side substrate helper from `_is_caas` to `_is_k8s` so the Kubernetes terminology matches what we use elsewhere. Document the juju- internal `caas`/`iaas` model-type pair in the docstring. - Switch the juju-reboot proof from `/proc/uptime` to `/proc/stat`'s `btime`. The previous strict `uptime_after < uptime_before` check has a false- negative race when `uptime_before` is small: if recovery takes longer than the pre-reboot uptime, the post-reboot uptime can already exceed it. `btime` is the epoch second the kernel booted and changes if and only if a reboot happened, so `btime_after > btime_before` is race-free. Also chmod +x on the test charm to satisfy the shebang-executable pre-commit hook (matches the other test charms). Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

/proc/stat's `btime` reports the host kernel's boot time, which doesn't change when a juju-reboot restarts an LXD container — so the previous check fired a false negative on every machine cell. /proc/uptime is virtualised per container by lxcfs and does reset on container restart, so deriving the boot epoch as `time.time() - uptime` gives a value that changes if and only if the unit actually rebooted, while still avoiding the original strict-inequality race on raw uptime. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

On Juju 4 machine substrates, juju-reboot queued during config-changed is silently dropped — the container is never restarted, even though the hook commits its other side-effects (marker file, status-set). The previous reboot test (before the tightened boot-time assertion) only checked the marker, so it accidentally passed on Juju 4; the new boot-epoch check correctly catches the missing reboot. Add a juju-major-version-gated xfail pointing at the upstream bug (juju/juju#22639) so the test fails on Juju 3 (where reboot still works) and xfails cleanly on Juju 4 until the upstream fix lands. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

tonyandrewmeyer changed the title ~~test(hookcmds): integration-test follow-ups 3 + 4~~ test: hookcmds integration tests for secret access and port management Jun 11, 2026

tonyandrewmeyer added 4 commits June 11, 2026 19:29

tonyandrewmeyer force-pushed the hookcmds-followups-3-4-5 branch from 67ee265 to cc34d9b Compare June 11, 2026 07:31

tonyandrewmeyer commented Jun 11, 2026

View reviewed changes

Comment thread test/integration/test_hookcmds.py Outdated

Apply suggestion from @tonyandrewmeyer

9ee0489

tonyandrewmeyer commented Jun 11, 2026

View reviewed changes

Comment thread test/integration/test_hookcmds.py Outdated

tonyandrewmeyer added 12 commits June 11, 2026 19:39

Apply suggestion from @tonyandrewmeyer

efe9466

tonyandrewmeyer marked this pull request as ready for review June 11, 2026 11:38

tonyandrewmeyer requested review from dwilding and tromai June 11, 2026 11:39

dwilding approved these changes Jun 12, 2026

View reviewed changes

tonyandrewmeyer and others added 3 commits June 12, 2026 20:15

Apply suggestions from code review

e64797b

Co-authored-by: Dave Wilding <tech@dpw.me>

Update test/charms/test_hookcmds/src/charm.py

a9ae994

Co-authored-by: Dave Wilding <tech@dpw.me>

tonyandrewmeyer and others added 2 commits June 12, 2026 21:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

test: hookcmds integration tests for secret access and port management#2562

test: hookcmds integration tests for secret access and port management#2562
tonyandrewmeyer wants to merge 23 commits into
canonical:mainfrom
tonyandrewmeyer:hookcmds-followups-3-4-5

tonyandrewmeyer commented Jun 11, 2026 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

tonyandrewmeyer commented Jun 11, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dwilding Jun 12, 2026

Uh oh!

tonyandrewmeyer Jun 12, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

tonyandrewmeyer commented Jun 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

tonyandrewmeyer commented Jun 11, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

dwilding Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

tonyandrewmeyer Jun 12, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

tonyandrewmeyer commented Jun 11, 2026 •

edited

Loading