zimdump: new function to analyze ZIM content size with hints about compression factor #957

@benoit74

For openzim/mwoffliner#2180 I had to analyze the ZIM content.

I did it with the python-libzim binding because I'm way more comfortable with it.

The struggle I had (which luckily was not a blocker) is that while it is possible to get an Item's size (uncompressed, AFAIK), I did not find any way to get its compressed size. It was hence hard to be 100% sure where the increase in ZIM size came from.

Is that expected, given that only clusters are compressed, not individual items, so no per-item compressed size exists? Or is it just something that is missing from the binding(s)? Should I have used another tool / zimtool to do this analysis?

Having at least a rough estimate of the compression factor for every item would help to analyze such situations a bit more deeply. Maybe simply exposing clusters, which cluster each item belongs to, and each cluster's compression factor (compressed and uncompressed size, for instance) would be sufficient.
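To illustrate the kind of estimate I have in mind, here is a minimal sketch in plain Python. It does not use libzim at all: the `estimate_cluster` function and `ItemEstimate` class are hypothetical names, the "cluster" is just a dict of path to bytes, and stdlib `lzma` stands in for whatever compression the real cluster uses. The idea is only to compute one compression factor per cluster and spread the cluster's compressed size over its items in proportion to their uncompressed size.

```python
from __future__ import annotations

import lzma
from dataclasses import dataclass


@dataclass
class ItemEstimate:
    path: str
    size: int                     # uncompressed size in bytes
    estimated_compressed: float   # item's proportional share of the compressed cluster


def estimate_cluster(items: dict[str, bytes]) -> tuple[float, list[ItemEstimate]]:
    """Compress all blobs of a (hypothetical) cluster together and return
    the cluster compression factor plus a per-item compressed-size estimate."""
    uncompressed = sum(len(blob) for blob in items.values())
    if uncompressed == 0:
        # Nothing to compress: report a neutral factor.
        return 1.0, [ItemEstimate(path, 0, 0.0) for path in items]

    # Assumption: lzma is used here only as a stand-in compressor.
    compressed = len(lzma.compress(b"".join(items.values())))
    factor = uncompressed / compressed

    estimates = [
        ItemEstimate(path, len(blob), len(blob) / factor)
        for path, blob in items.items()
    ]
    return factor, estimates


if __name__ == "__main__":
    cluster = {
        "A/page1": b"hello world " * 1000,
        "A/page2": b"some other text " * 200,
    }
    factor, estimates = estimate_cluster(cluster)
    print(f"cluster compression factor: {factor:.1f}")
    for e in estimates:
        print(f"{e.path}: {e.size} B -> ~{e.estimated_compressed:.0f} B")
```

This is only a rough attribution: items in the same cluster share one factor, so a highly compressible item next to an incompressible one would be misjudged. It would still be enough to see which clusters (and hence which groups of items) account for a size increase.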
