Description
Currently we are using the name:tag of an image, which has some issues:
- it fails to reload changed images, such as :latest
- it fails to recognize re-tagged images as duplicates
Previously we have also used digests, which is another can of worms:
- they are not preserved when saving to an archive
- they vary depending on the registry, and require network access
So it would be better to add support for the "id" to our image loading code.
It is calculated from the contents of the image itself, and is also available in CRI.
It looks something like this: sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698
instead of busybox:latest (short name) or docker.io/library/busybox:latest (canonical name).
Then we can compare this with crictl images (or docker images, when not using CRI).
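With ids on both sides, deciding which images to (re)load becomes a set lookup. A minimal sketch, assuming the expected ids and the ids reported by the runtime have already been collected in full sha256:<hex> form; missingImages is a hypothetical helper, not existing minikube code:

```go
package main

import "sort"

// missingImages (hypothetical helper) returns, sorted, the references in want
// whose image id is not among the ids reported by the runtime.
// want maps a reference such as "busybox:latest" to the id we expect to load;
// have holds the ids listed by the runtime (crictl images or docker images).
// All ids are assumed to be in full "sha256:<hex>" form.
func missingImages(want map[string]string, have []string) []string {
	present := make(map[string]bool, len(have))
	for _, id := range have {
		present[id] = true
	}
	var missing []string
	for ref, id := range want {
		// A re-tagged image shares its id with one already present, so it is
		// not reported; a changed :latest has a new id, so it is.
		if !present[id] {
			missing = append(missing, ref)
		}
	}
	sort.Strings(missing)
	return missing
}
```

A re-tagged image points at an id the runtime already has, so it is skipped, while a changed :latest points at a new id and gets loaded again.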
So we should avoid digests, which are confusing because they look similar:
$ docker pull busybox:latest
latest: Pulling from library/busybox
Digest: sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb
Status: Image is up to date for busybox:latest
docker.io/library/busybox:latest
And instead use the "image id" to tell two images apart:
$ docker images busybox:latest
REPOSITORY   TAG       IMAGE ID       CREATED      SIZE
busybox      latest    c55b0f125dc6   4 days ago   1.24MB
$ docker image inspect busybox:latest | head
[
    {
        "Id": "sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698",
        "RepoTags": [
            "busybox:latest"
        ],
        "RepoDigests": [
            "busybox@sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb"
        ],
        "Parent": "",
Note that the image id changes with the architecture, while the repo digest stays the same. The same commands on a different architecture:
$ docker images busybox:latest
REPOSITORY   TAG       IMAGE ID       CREATED      SIZE
busybox      latest    f6467c4e9e15   4 days ago   1.4MB
$ docker image inspect busybox:latest | head
[
    {
        "Id": "sha256:f6467c4e9e1526a6e856444fde786794014c628ab8d64a2eaca5ca1a95ff13de",
        "RepoTags": [
            "busybox:latest"
        ],
        "RepoDigests": [
            "busybox@sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb"
        ],
        "Parent": "",
We can still use digests for the kicbase, of course; this was about the handling in the cache and image commands.
--base-image='gcr.io/k8s-minikube/kicbase:v0.0.22@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e'
Some pseudo-code
Docker
import "github.com/docker/docker/client"
cli, err := client.NewClientWithOpts(client.FromEnv)
// handle err
cli.NegotiateAPIVersion(ctx)
img, _, err := cli.ImageInspectWithRaw(ctx, ref)
// handle err
id := img.ID // "sha256:..."
This is similar to the CLI:
docker image inspect --format "{{ .Id }}" $ref
Podman
(No daemon, no API; we just call the CLI instead.)
sudo podman image inspect --format "sha256:{{ .Id }}" $ref
podman doesn't have the sha256: prefix in its id, so we add it for comparison.
We also need to make sure to add it to the crictl output when using cri-o.
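The prefix handling can be kept in one place. A minimal sketch, assuming ids arrive either as sha256:<hex> or as bare hex (as described for podman above), possibly with a trailing newline from CLI output; normalizeID and sameImage are hypothetical helpers:

```go
package main

import "strings"

const idPrefix = "sha256:"

// normalizeID (hypothetical helper) trims whitespace and adds the "sha256:"
// prefix when it is missing, so ids from different tools compare equal.
func normalizeID(id string) string {
	id = strings.TrimSpace(id)
	if strings.HasPrefix(id, idPrefix) {
		return id
	}
	return idPrefix + id
}

// sameImage (hypothetical helper) reports whether two ids refer to the same
// image. A truncated id, such as the IMAGE ID column of docker images, matches
// when it is a prefix of the other id. An empty id never matches.
func sameImage(a, b string) bool {
	a, b = normalizeID(a), normalizeID(b)
	if len(a) > len(b) {
		a, b = b, a // make a the shorter of the two
	}
	return len(a) > len(idPrefix) && strings.HasPrefix(b, a)
}
```

Prefix matching of truncated ids, like the 12-character IMAGE ID column of docker images, is only safe when the short id is long enough to be unique among the images present.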
Tarball
import "github.com/google/go-containerregistry/pkg/v1/tarball"
img, err := tarball.ImageFromPath(path, nil) // nil: no tag given
// handle err
cn, err := img.ConfigName() // hash of the config file
// handle err
id := cn.String() // "sha256:..."
Implementation details:
- The Id is actually the checksum of the config file:
$ docker save busybox:latest > busybox_latest.tar
$ tar -tf busybox_latest.tar
a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/
a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/VERSION
a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/json
a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/layer.tar
c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698.json
manifest.json
repositories
$ tar -xf busybox_latest.tar --wildcards "*.json"
$ jq . manifest.json
[
  {
    "Config": "c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698.json",
    "RepoTags": [
      "busybox:latest"
    ],
    "Layers": [
      "a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/layer.tar"
    ]
  }
]
$ sha256sum c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698.json
c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698 c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698.json
- It is possible to have more than one image per archive, so we can give the tag when getting the id (or nil, to get all)
- Since we only need the config and not the layers, the operation is still fast, even for large images (such as python:latest)
Previous issues: