Skip to content

Conversation

@danzatt
Copy link
Contributor

@danzatt danzatt commented Aug 1, 2025

Sign published sysexts

Use systemd-repart for creating signed DDI sysexts. The keys are expected to be stored in GitHub secrets SYSEXT_PRIVATE_KEY and SYSEXT_CERTIFICATE.

Note/TODO: this needs additional changes in the GH repo setup, there needs to be a signing key and certificate in GH secrets. Additionally, the signing certificate should be published as a release, so that consumers can use it to verify the sysexts. This can be automated through new GH actions, so that the signing key is would be generated and uploaded right in GH actions.

How to use

The following butane config can be used for testing (pulls wasmedge sysext as an example, puts signing certificate into /etc/verity.d and forces sysext signature verification using service dropin). For the sysext verification to work, you need to recompile kernel with CONFIG_DM_VERITY_VERIFY_ROOTHASH_SIG (implemented in a separate PR flatcar/scripts#3162).

Note: it uses releases from my fork of sysext-bakery

variant: flatcar
version: 1.0.0

storage:
  files:
    - path: /opt/extensions/wasmedge-0.14.1-x86-64.raw
      mode: 0420
      contents:
        source: https://github.com/danzatt/sysext-bakery/releases/download/wasmedge-0.14.1/wasmedge-0.14.1-x86-64.raw
    - path: /etc/sysupdate.wasmedge.d/wasmedge.conf
      contents:
        source: https://extensions.flatcar.org/extensions/wasmedge.conf
    - path: /etc/verity.d/sysext-bakery.crt
      mode: 0420
      contents:
        source: https://github.com/danzatt/sysext-bakery/releases/download/signing-key/sysext.crt
  links:
    - target: /opt/extensions/wasmedge-0.14.1-x86-64.raw
      path: /etc/extensions/wasmedge.raw
      hard: false
systemd:
  units:
    - name: systemd-sysext.service
      enabled: true
      dropins:
        - name: force-signature.conf
          contents: |
            [Service]
            TimeoutStartSec=10
            ExecStart=
            ExecStart=systemd-sysext refresh --image-policy="root=verity+signed+absent:usr=verity+signed+absent"
            ExecReload=
            ExecReload=systemd-sysext refresh --image-policy="root=verity+signed+absent:usr=verity+signed+absent"
            [Unit]
            JobRunningTimeoutSec=5

Testing done

[Describe the testing you have done before submitting this PR. Please include both the commands you issued as well as the output you got.]

  • Changelog entries added in the respective changelog/ directory (user-facing change, bug fix, security fix, update)
  • Inspected CI output for image differences: /boot and /usr size, packages, list files for any missing binaries, kernel modules, config files, kernel modules, etc.

@danzatt danzatt requested a review from a team as a code owner August 1, 2025 15:45
lib/generate.sh Outdated
erofs)
mkfs.erofs "${fname}" "${basedir}"
;;
(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make signing opt-in, so wrap the the direct invocations in one if branch?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also reuse the exported repart flag options env var.
FYI: I also noticed problems with erofs and have filed a PR to fix this and if merged earlier you might need to adapt the settings when rebasing here #176

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should make signing opt-in, so wrap the the direct invocations in one if branch?

I've just modified the PR to include both raw squashfs and signed DDI in the release (see releases in my fork).

run: |
pushd bakery
echo "${{ secrets.SYSEXT_PRIVATE_KEY }}" > sysext.key
echo "${{ secrets.SYSEXT_CERTIFICATE }}" > sysext.crt
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should publish the cert as release artifact - and maybe include it in the sysupdate definition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we definitely should publish it as a release. However, I'm not so sure about updating it using sysupdate. If I understand correctly, sysupdate cannot cryptographically verify the integrity of the update (besides using HTTPS). So it could potentially be used to inject a malicious certificate during sysupdate. Maybe we could ship up-to date sysext bakery certificate in OS image (e.g. in /opt) and the user would just symlink the cert to /etc/verity.d to trust it. This way the bakery cert would be updated with OS update (that is verified using Flatcar release key).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sysupdate has an option to verify with GPG (which is recommended) but anyway, this then would have to be part of the Ignition config. I think for now we can also provision the cert via Ignition. If the cert doesn't change it sounds ok to have it in the OS but if it's expiring every now and then, having it in the OS might be more complex for managing the lifecycle?

lib/generate.sh Outdated
systemd-repart \
--empty=create \
--size=auto \
--private-key=sysext.key \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does systemd-repart support pkcs11 tokens? that would be more secure than making the private key accessible to github actions workers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it does. I've even tested it with Keyvault PKCS11 token. I'll try to figure out how to use keyvault from github actions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just reworked the PR to use PKCS11 token. I've also included script to set up Azure keyvault and managed identity for GiHub Action.

@danzatt danzatt force-pushed the signed-ddi-sysexts branch from 1f2a6ab to e6114b7 Compare August 6, 2025 14:51
@danzatt
Copy link
Contributor Author

danzatt commented Aug 28, 2025

I've just updated the PR. It now:

  • builds both signed DDI and unsigned raw image (as before)
    • there are two sysupdate files, one for the signed and one for the raw file
  • the bakery CI will now run inside a container based on Ubuntu 24.10
    • Github runner only offers 24.04, which has too old systemd-repart (without PKCS support)
    • the docker image will contain prebuild Keyvault PKCS11 token
    • I had to pull off some tricks to make it work, because bakery itself runs containers (i.e. Docker-in-Docker)
  • sysexts are signed using Azure Keyvault via PKCS11 token
    • there is a script (lib/setup_azure_keyvault.sh) to set up the Keyvault, managed identity for GitHub Actions and permissions

@chewi
Copy link
Contributor

chewi commented Aug 29, 2025

Are you sure we actually need Azure CLI? This is normally only needed when signing manually. In CI, you're more likely to use a managed identity or a token set in an environment variable. We currently used a managed identity for signing the boot binaries in CI. You may see errors about the az command not being found, but that's only because it tries that before other mechanisms.

lib/generate.sh Outdated
SQUASHFS_ARG+=('-xattrs-exclude' "'^btrfs.'")
fi

# Note: We didn't chown to root:root, meaning that the file ownership is left as is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Note: We didn't chown to root:root, meaning that the file ownership is left as is
# Note: We didn't chown to root:root, meaning that the file ownership is left as is with btrfs

@pothos
Copy link
Member

pothos commented Sep 1, 2025

  • I had to pull off some tricks to make it work, because bakery itself runs containers (i.e. Docker-in-Docker)

Maybe unpriv. Podman could help here because it can be nested many times (with the right invocation options).

lib/generate.sh Outdated
SYSTEMD_REPART_MKFS_OPTIONS_SQUASHFS="${SQUASHFS_ARG[*]}"
# Compression recommendations from https://erofs.docs.kernel.org/en/latest/mkfs.html
# and forcing a zero UUID for reproducibility (maybe we could also hash the name and version)
SYSTEMD_REPART_MKFS_OPTIONS_EROFS="-zlz4hc,12 -C65536 -Efragments,ztailpacking -Uclear --all-root"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
SYSTEMD_REPART_MKFS_OPTIONS_EROFS="-zlz4hc,12 -C65536 -Efragments,ztailpacking -Uclear --all-root"
SYSTEMD_REPART_MKFS_OPTIONS_EROFS="-zlz4hc,12 -C65536 -Efragments,ztailpacking -U00000000-0000-0000-0000-000000000000 --all-root"

From #181

@danzatt
Copy link
Contributor Author

danzatt commented Sep 1, 2025

Are you sure we actually need Azure CLI? This is normally only needed when signing manually. In CI, you're more likely to use a managed identity or a token set in an environment variable. We currently used a managed identity for signing the boot binaries in CI. You may see errors about the az command not being found, but that's only because it tries that before other mechanisms.

@chewi We need it because the job isn't running on an Azure VM with attached managed identity (like in our CI). As the job is running in GitHub it's not possible to directly assign an Azure identity to the job/worker. Instead I configure the managed identity to trust this github repo to provide federated identity to Azure (I basically followed this guide https://learn.microsoft.com/en-us/azure/developer/github/connect-from-azure-openid-connect). Azure then gives the github action a one-time token to login into the managed identity. The PKCS11 token than uses az to authenticate (using the AzureCliCredential).

@chewi
Copy link
Contributor

chewi commented Sep 1, 2025

I figured managed identities wouldn't be possible, but I thought environment variables might be. They are, but service principal secrets are not recommended. However! I chatted with Copilot, and it has told me that if azure-keyvault-pkcs11 adds support for workload identities (one extra line), then it should work without Azure CLI. The only clunky part is you have to write the token out to a file first. It admits this part could be better, but it's still preferable over installing Azure CLI.

Azure::Identity::ChainedTokenCredential::Sources{
    std::make_shared<Azure::Identity::EnvironmentCredential>(),
    std::make_shared<Azure::Identity::WorkloadIdentityCredential>(),
    getClientSecretCredential(),
    std::make_shared<Azure::Identity::AzureCliCredential>(),
    std::make_shared<Azure::Identity::ManagedIdentityCredential>()});
permissions:
  id-token: write
  contents: read

jobs:
  authenticate:
    runs-on: ubuntu-latest
    steps:
      - name: Request OIDC token and save to file
        id: oidc-token
        uses: actions/github-script@v7
        with:
          script: |
            const fs = require('fs');
            const token = await github.actions.getIDToken();
            fs.writeFileSync('/tmp/azure-oidc-token', token);
            return '/tmp/azure-oidc-token';
      - name: Set AZURE_FEDERATED_TOKEN_FILE
        run: echo "AZURE_FEDERATED_TOKEN_FILE=${{ steps.oidc-token.outputs.result }}" >> $GITHUB_ENV

I'm also wondering whether you really need a Ubuntu image for this. The Flatcar SDK already has everything needed... except Docker? I'm confused about why that is needed though.

Add an extra signing step, which takes the built sysext image and wraps
it into a signed discoverable disk image (DDI) using systemd-repart.
Signing is performed inside Flatcar SDK container via Azure Keyvault
PKCS11 token. We build both, signed discoverable disk image and raw
unsigned image.
This adds a script which sets up Azure keyvault with a signing key,
managed identity and federated credentials to allow GitHub Actions
running within the context of this repo to perform sign operations using
the keyvault. This also adds an action which pulls the certificate from
keyvault and publishes it as a release.
@danzatt danzatt force-pushed the signed-ddi-sysexts branch from a4233dd to 2b6a03c Compare October 17, 2025 07:09
@danzatt
Copy link
Contributor Author

danzatt commented Oct 17, 2025

I've just reworked the PR. Now instead of building the sysext directly using systemd-repart, I keep the original workflow and then wrap the .raw file into a signed DDI using custom config for systemd-repart. This final signing/wrapping step is done in latest Flatcar SDK container. Currently I have to pull code from my WIP PR on scripts branch (adds +cryptsetup useflag to systemd, which is required for signing). I've also switched to OIDC authentication and workload identity as @chewi suggested.

We have to drop the last commit, once the scripts PR is merged and release is made. You can have a look at example signed sysexts on my fork https://github.com/danzatt/sysext-bakery

@chewi
Copy link
Contributor

chewi commented Oct 17, 2025

I've just reworked the PR. Now instead of building the sysext directly using systemd-repart, I keep the original workflow and then wrap the .raw file into a signed DDI using custom config for systemd-repart. This final signing/wrapping step is done in latest Flatcar SDK container.

Sounds good, but why did you have to do that?

@@ -0,0 +1,15 @@
#!/bin/bash

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
set -euo pipefail

We should stop script execution on errors

done

echo "Waiting 30 seconds for the permissions to apply. If the script fails, please wait and run it again."
sleep 30
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have any chance to poll for the completion instead of defining a wait time that may be exceeded?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've just fixed it. There seems to be no way to poll if the rights are actually effective other than trying to use them. So, instead I've just added retry loop when actually using them.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not pushed yet, or?

@danzatt
Copy link
Contributor Author

danzatt commented Nov 11, 2025

I've just reworked the PR. Now instead of building the sysext directly using systemd-repart, I keep the original workflow and then wrap the .raw file into a signed DDI using custom config for systemd-repart. This final signing/wrapping step is done in latest Flatcar SDK container.

Sounds good, but why did you have to do that?

Two reasons:

  1. systemd-repart doesn't separate the build and sign steps and does both in a single invocation. This is a problem, since the OIDC token lasts just 5 minutes and would be expired by the time systemd-repart would build the sysext and get to signing (at least for larger sysexts). So I had to use a hack, where a refresher script would run in background every couple of minutes, which is not very neat.
  2. The current setup is battle-tested by our users, switching to systemd-repart might bring slight discrepancies to mkfs arguments. Although not highly probable, this might break some worloads. We want to still ship both signed (DDI) and unsigned (raw) versions of the sysexts anyway. So instead of running the image build twice for every sysext, we build the .raw as before and then just wrap it into a DDI.

Comment on lines +8 to +12
token=$(curl -sSL \
-H "Authorization: Bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
"${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=api://AzureADTokenExchange" \
| jq -r '.value')
echo "$token" > "$AZURE_FEDERATED_TOKEN_FILE"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
token=$(curl -sSL \
-H "Authorization: Bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
"${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=api://AzureADTokenExchange" \
| jq -r '.value')
echo "$token" > "$AZURE_FEDERATED_TOKEN_FILE"
curl -sSL \
-H "Authorization: Bearer $ACTIONS_ID_TOKEN_REQUEST_TOKEN" \
"${ACTIONS_ID_TOKEN_REQUEST_URL}&audience=api://AzureADTokenExchange" \
| jq -r '.value' > "$AZURE_FEDERATED_TOKEN_FILE"

@pothos
Copy link
Member

pothos commented Nov 24, 2025

We need to think a bit more about the cert management.

I agree that the cert should be under /etc and ideally we would manage that cert with sysupdate as well. Can you add this?

Security wise that doesn't add any stronger protection than now but I also don't think that it would add any security if the rootfs is anyway unprotected - in this case an attacker can write other files that compromise the system (a systemd unit, for example). So it always depends on the threat model.

I would even suggest the following. Maybe we can get rid of the key vault and also use throwaway certs here instead of a global cert that we renew every now and then?
That would mean generating a throwaway cert for every sysupdate component, and have it be part of the same sysupdate configuration so that we don't need to think about cert lifecycles and secure key material storage.
This doesn't make the system weaker - in fact it makes it stronger because downgrade attacks aren't possible this way - but again, I don't think the security story is even a big factor. We rather want to have this out so that users can configure a policy if they want to.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants