Description
Flyte & Flytekit version
v1.15.3
Describe the bug
Flytekit now uses msgpack for serialization. Specifically, the DictTransformer uses the MessagePackEncoder, which calls msgpack.packb. A major problem with this is that msgpack.packb does not sort or otherwise normalize key order when serializing dictionaries, so two equal dictionaries built in different orders serialize to different bytes.
Expected behavior
Flyte Propeller should not produce different cache keys for different orderings of a dictionary. If possible, Flyte Propeller should unmarshal the literal like it does for attribute resolution before computing the cache keys.
Additional context to reproduce
Using the repro described in #2924, if you reorder the dictionary, you get different cache keys:
>>> calculate_cache_key_multiple_times(dict(a=1, b=2, c=3))
task_name-cache_version-4301e138bbdea0ddd2bf116844d2e6f9 1000
Name: count, dtype: int64
>>> calculate_cache_key_multiple_times(dict(b=2, a=1, c=3))
task_name-cache_version-4de598fac8ead0a6caa30b3881caebb9 1000
Name: count, dtype: int64
We see Flyte inputs reorder dictionaries more often when the dictionaries are large, but we've found that the ordering is non-deterministic in general (Python dictionaries preserve insertion order, but nothing guarantees the same insertion order across runs).
On top of this, when Flyte Propeller calculates a hash for the cache key, it uses the raw input literals, which are likely the msgpack-serialized bytes rather than the actual value of the object. As a result, the same inputs non-deterministically cause cache misses.
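The effect can be illustrated without Flyte. The sketch below is only an analogy: it uses `json.dumps` from the standard library as a stand-in for `msgpack.packb` (both write dictionary entries in insertion order) and SHA-256 as a stand-in for whatever hash Propeller actually uses. `raw_key` and `canonical_key` are hypothetical names, not Flyte APIs.

```python
import hashlib
import json


def raw_key(inputs: dict) -> str:
    """Hash the serialized bytes as-is, so insertion order leaks into the key."""
    payload = json.dumps(inputs).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def canonical_key(inputs: dict) -> str:
    """Hash a canonical form (keys sorted), so the key depends only on the value."""
    payload = json.dumps(inputs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


first = dict(a=1, b=2, c=3)
second = dict(b=2, a=1, c=3)

# The two dicts are equal as Python objects but serialize differently.
same_value = first == second
raw_keys_match = raw_key(first) == raw_key(second)
canonical_keys_match = canonical_key(first) == canonical_key(second)
```

Here `same_value` and `canonical_keys_match` are true while `raw_keys_match` is false, which mirrors the cache-key mismatch in the repro above.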
As a workaround, we override the DictTransformer to use ormsgpack, as shown below. ormsgpack has some limitations, though: it's a bit slower and may not support non-string keys.
class DictTransformer(FlyteDictTransformer):
    @staticmethod
    async def dict_to_binary_literal(
        ctx: FlyteContext, v: dict, python_type: Type[dict], allow_pickle: bool
    ) -> Literal:
        """
        Converts a Python dictionary to a Flyte-specific ``Literal`` using _sorted_ MessagePack encoding.
        Falls back to Flyte's default dictionary encoding if encoding fails.
        """
        try:
            # Handle dictionaries with non-string keys (e.g., Dict[int, Type])
            msgpack_bytes = ormsgpack.packb(v, option=ormsgpack.OPT_SORT_KEYS)
            return Literal(scalar=Scalar(binary=Binary(value=msgpack_bytes, tag=MESSAGEPACK)))
        except TypeError:
            log.error("Error converting dictionary to binary literal using ormsgpack")
            return await FlyteDictTransformer.dict_to_binary_literal(
                ctx, v, python_type, allow_pickle
            )
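Since msgpack.packb writes entries in the order the dictionary yields them, another possible approach is to rebuild the dictionary in sorted key order before it reaches the encoder. The following is a minimal sketch of that idea using only the standard library. `sort_dict_keys` is a hypothetical helper, not part of Flytekit, and it has not been tested against the DictTransformer.

```python
from typing import Any


def sort_dict_keys(value: Any) -> Any:
    """Return a copy of `value` with every nested dict rebuilt in sorted key order."""
    if isinstance(value, dict):
        # Sort on (type name, repr) so mixed key types such as int and str
        # can be compared. The order is deterministic, not numeric.
        ordered = sorted(
            value.items(), key=lambda kv: (type(kv[0]).__name__, repr(kv[0]))
        )
        return {k: sort_dict_keys(v) for k, v in ordered}
    if isinstance(value, list):
        # Sequence order is meaningful, so only recurse into the elements.
        return [sort_dict_keys(v) for v in value]
    if isinstance(value, tuple):
        return tuple(sort_dict_keys(v) for v in value)
    return value


a = sort_dict_keys({"b": {"y": 1, "x": 2}, "a": [{"d": 4, "c": 3}], 1: "one"})
b = sort_dict_keys({1: "one", "a": [{"c": 3, "d": 4}], "b": {"x": 2, "y": 1}})

# Both inputs now iterate in the same order at every nesting level.
same_order = list(a) == list(b)
```

The result could then be passed to an insertion-order encoder such as msgpack.packb. Unlike the ormsgpack override, this keeps non-string keys, at the cost of an extra copy of the input.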
Screenshots
No response
Are you sure this issue hasn't been raised already?
- Yes
Have you read the Code of Conduct?
- Yes