Getting the Id of an image and not just the name:tag #11347

Open
@afbjorklund

Currently we are using the name:tag of an image, which has some issues:

  • it fails to reload changed images, such as :latest
  • it fails to recognize re-tagged images as duplicates

Previously we have also used digests, which are another can of worms:

  • they are not preserved when saving to an archive
  • they vary depending on the registry and require network access

So it would be better to add support for the "id" to our image loading code.

The id is calculated from the contents of the image itself, and it is also available in CRI.

It looks something like this: sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698

Instead of busybox:latest (short name) or docker.io/library/busybox:latest (canonical name)

Then we can compare this with crictl images

(or docker images, when not using CRI)
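Comparing by id would also address the re-tagging problem: names that resolve to the same id are the same image. A minimal sketch, assuming we already have a name-to-id map from the runtime; the helper `groupByID` and the name `my-busybox:v1` are hypothetical and only for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// groupByID inverts a name -> id map, so that all names sharing one id
// (for instance a re-tagged image) end up in the same group.
// Hypothetical helper, not existing minikube code.
func groupByID(ids map[string]string) map[string][]string {
	groups := map[string][]string{}
	for name, id := range ids {
		groups[id] = append(groups[id], name)
	}
	// Sort the names so the result does not depend on map iteration order.
	for _, names := range groups {
		sort.Strings(names)
	}
	return groups
}

func main() {
	const id = "sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698"
	ids := map[string]string{
		"busybox:latest": id,
		"my-busybox:v1":  id, // hypothetical re-tag of the same image
	}
	for id, names := range groupByID(ids) {
		fmt.Println(id, names)
	}
}
```

Any group with more than one name is a duplicate that only needs to be loaded once.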

So we should avoid digests, which are confusing because they look similar to the id:

$ docker pull busybox:latest
latest: Pulling from library/busybox
Digest: sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb
Status: Image is up to date for busybox:latest
docker.io/library/busybox:latest

And instead use the "image id", for separating two images from each other:

$ docker images busybox:latest
REPOSITORY   TAG       IMAGE ID       CREATED      SIZE
busybox      latest    c55b0f125dc6   4 days ago   1.24MB
$ docker image inspect busybox:latest | head
[
    {
        "Id": "sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698",
        "RepoTags": [
            "busybox:latest"
        ],
        "RepoDigests": [
            "busybox@sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb"
        ],
        "Parent": "",

Note that the image id changes with the architecture, while the repo digest remains the same.

$ docker images busybox:latest
REPOSITORY   TAG       IMAGE ID       CREATED      SIZE
busybox      latest    f6467c4e9e15   4 days ago   1.4MB
$ docker image inspect busybox:latest | head
[
    {
        "Id": "sha256:f6467c4e9e1526a6e856444fde786794014c628ab8d64a2eaca5ca1a95ff13de",
        "RepoTags": [
            "busybox:latest"
        ],
        "RepoDigests": [
            "busybox@sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb"
        ],
        "Parent": "",

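The difference between the two outputs can be checked mechanically. The sketch below parses both `docker image inspect` results (trimmed here to the three fields shown) and compares them; the type `inspected` and the helpers `parseInspect` and `sameImage` are illustrative names, not existing minikube code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// inspected holds the three fields of "docker image inspect" used here.
type inspected struct {
	ID          string   `json:"Id"`
	RepoTags    []string `json:"RepoTags"`
	RepoDigests []string `json:"RepoDigests"`
}

// parseInspect decodes the JSON array printed by "docker image inspect".
func parseInspect(data []byte) ([]inspected, error) {
	var imgs []inspected
	if err := json.Unmarshal(data, &imgs); err != nil {
		return nil, err
	}
	return imgs, nil
}

// sameImage compares by id only; tags and repo digests are ignored.
func sameImage(a, b inspected) bool {
	return a.ID != "" && a.ID == b.ID
}

// The two outputs quoted in this issue, trimmed to the relevant fields.
const first = `[{"Id": "sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698",
  "RepoTags": ["busybox:latest"],
  "RepoDigests": ["busybox@sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb"]}]`

const second = `[{"Id": "sha256:f6467c4e9e1526a6e856444fde786794014c628ab8d64a2eaca5ca1a95ff13de",
  "RepoTags": ["busybox:latest"],
  "RepoDigests": ["busybox@sha256:be4684e4004560b2cd1f12148b7120b0ea69c385bcc9b12a637537a2c60f97fb"]}]`

func main() {
	a, errA := parseInspect([]byte(first))
	b, errB := parseInspect([]byte(second))
	if errA != nil || errB != nil || len(a) == 0 || len(b) == 0 {
		fmt.Println("could not parse inspect output")
		return
	}
	fmt.Println("same repo digest:", a[0].RepoDigests[0] == b[0].RepoDigests[0])
	fmt.Println("same image id:   ", sameImage(a[0], b[0]))
}
```

Same name and same repo digest, but two different images: only the id tells them apart.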
We can still use digests for the kicbase of course; this is only about the handling in cache and image.

      --base-image='gcr.io/k8s-minikube/kicbase:v0.0.22@sha256:7cc3a3cb6e51c628d8ede157ad9e1f797e8d22a1b3cedc12d3f1999cb52f962e'

Some pseudo-code

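Docker

	import "github.com/docker/docker/client"

	cli, err := client.NewClientWithOpts(client.FromEnv) // settings from the environment
	// (handle err)
	cli.NegotiateAPIVersion(ctx)
	img, _, err := cli.ImageInspectWithRaw(ctx, ref)
	// (handle err)
	id := img.ID // includes the "sha256:" prefix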

similar to the CLI:

    docker image inspect --format "{{ .Id }}" $ref

Podman

(no daemon, no api - we just call the CLI instead.)

    sudo podman image inspect --format "sha256:{{ .Id }}" $ref

podman does not seem to have the sha256: prefix in the id, so add it for comparison.

We also need to make sure to add it to the crictl output when using cri-o.
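Rather than hard-coding the prefix in each format string, the ids could be normalized in one place before comparing. A minimal sketch; `normalizeID` and `sameID` are hypothetical helper names, and the only assumption is that an id is either bare hex or already prefixed with sha256:.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeID returns the id with a "sha256:" prefix, adding it when missing.
// Surrounding whitespace (such as a trailing newline from CLI output) is dropped.
// Hypothetical helper, not existing minikube code.
func normalizeID(id string) string {
	id = strings.TrimSpace(id)
	if strings.HasPrefix(id, "sha256:") {
		return id
	}
	return "sha256:" + id
}

// sameID reports whether two ids are equal after normalization.
func sameID(a, b string) bool {
	return normalizeID(a) == normalizeID(b)
}

func main() {
	docker := "sha256:c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698"
	podman := "c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698\n"
	fmt.Println(normalizeID(podman))
	fmt.Println(sameID(docker, podman))
}
```

With this in place the podman format string could stay a plain "{{ .Id }}", and the same helper would cover the crictl output.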

Tarball

	import "github.com/google/go-containerregistry/pkg/v1/tarball"

	img, err := tarball.ImageFromPath(path, nil)
	// (handle err)
	cn, err := img.ConfigName() // hash of the config file
	// (handle err)
	id := cn.String() // "sha256:..."

Implementation details:

  • The Id is actually the checksum of the config file:
$ docker save busybox:latest > busybox_latest.tar
$ tar -tf busybox_latest.tar 
a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/
a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/VERSION
a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/json
a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/layer.tar
c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698.json
manifest.json
repositories
$ tar -xf busybox_latest.tar --wildcards "*.json"
$ jq . manifest.json 
[
  {
    "Config": "c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698.json",
    "RepoTags": [
      "busybox:latest"
    ],
    "Layers": [
      "a355ae7461fdd43484ed16e7e48620ff19b187adc03bcd4b5cfd5ba3ce2ee670/layer.tar"
    ]
  }
]
$ sha256sum c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698.json
c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698  c55b0f125dc65ee6a9a78307d9a2dfc446e96af7477ca29ddd4945fd398cc698.json
  • It is possible to have more than one image per archive, so we can give the tag when getting the id (or nil, to get all)
  • Since we only need the config and not the layers, the operation is still fast even for large images (such as python:latest)
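To illustrate these points without any library, here is a standard-library sketch that streams through an archive, hashes only the top-level *.json config files, and maps every tag in manifest.json to its id. It assumes the archive layout shown above (other layouts may name the config differently); `imageIDs`, `manifestEntry` and `sampleArchive` are hypothetical names, and `sampleArchive` only builds a tiny in-memory archive for the demonstration.

```go
package main

import (
	"archive/tar"
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"strings"
)

// manifestEntry mirrors the fields of one manifest.json element used here.
type manifestEntry struct {
	Config   string   `json:"Config"`
	RepoTags []string `json:"RepoTags"`
}

// imageIDs returns tag -> "sha256:<checksum of the config file>" for every
// image in the archive. Layer contents are skipped, never read into memory.
func imageIDs(r io.Reader) (map[string]string, error) {
	sums := map[string]string{} // config file name -> hex checksum
	var manifest []manifestEntry
	tr := tar.NewReader(r)
	for {
		hdr, err := tr.Next()
		if errors.Is(err, io.EOF) {
			break
		}
		if err != nil {
			return nil, err
		}
		switch {
		case hdr.Name == "manifest.json":
			if err := json.NewDecoder(tr).Decode(&manifest); err != nil {
				return nil, err
			}
		case strings.HasSuffix(hdr.Name, ".json"):
			h := sha256.New()
			if _, err := io.Copy(h, tr); err != nil {
				return nil, err
			}
			sums[hdr.Name] = hex.EncodeToString(h.Sum(nil))
		}
	}
	if manifest == nil {
		return nil, errors.New("manifest.json not found in archive")
	}
	ids := map[string]string{}
	for _, m := range manifest {
		sum, ok := sums[m.Config]
		if !ok {
			return nil, fmt.Errorf("config %q not found in archive", m.Config)
		}
		for _, tag := range m.RepoTags {
			ids[tag] = "sha256:" + sum
		}
	}
	return ids, nil
}

// sampleArchive builds a minimal archive with one config file and a manifest.
func sampleArchive(config, tag string) io.Reader {
	manifest, err := json.Marshal([]manifestEntry{{Config: "config.json", RepoTags: []string{tag}}})
	if err != nil {
		panic(err)
	}
	var buf bytes.Buffer
	tw := tar.NewWriter(&buf)
	files := map[string][]byte{"config.json": []byte(config), "manifest.json": manifest}
	for name, data := range files {
		hdr := &tar.Header{Name: name, Typeflag: tar.TypeReg, Mode: 0o644, Size: int64(len(data))}
		if err := tw.WriteHeader(hdr); err != nil {
			panic(err)
		}
		if _, err := tw.Write(data); err != nil {
			panic(err)
		}
	}
	if err := tw.Close(); err != nil {
		panic(err)
	}
	return &buf
}

func main() {
	ids, err := imageIDs(sampleArchive(`{"architecture":"amd64"}`, "busybox:latest"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for tag, id := range ids {
		fmt.Println(tag, id)
	}
}
```

For a real archive, an opened file (for example from os.Open) could be passed to `imageIDs` instead of the sample reader.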

Previous issues:
