
test_predict_kmeans sklearn test can sometimes fail because of non-deterministic cluster relocation #97

@fcharras

Description


Our cluster relocation function relies on a parallel argpartition function that does not use the same tie-breaking strategy as np.argpartition and, moreover, breaks ties non-deterministically.
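For reference, here is a minimal NumPy sketch of the semantics a deterministic tie-break could target: select the indices of the k largest values and, among equal values, always prefer the smallest index. The function name `top_k_deterministic` is hypothetical and not part of sklearn_numba_dpex; it uses a full sort rather than a partition, so it only illustrates the expected output, not an efficient kernel.

```python
import numpy as np


def top_k_deterministic(values, k):
    """Return the indices of the k largest entries of `values`.

    Ties are broken by preferring the smaller index, so the result
    depends only on the input array, never on execution order.
    (Hypothetical helper, for illustration only.)
    """
    values = np.asarray(values)
    idx = np.arange(values.shape[0])
    # np.lexsort sorts by its LAST key first: the primary key is -values
    # (i.e. descending values), the secondary key is the index (ascending).
    order = np.lexsort((idx, -values))
    return order[:k]


print(top_k_deterministic([1.0, 3.0, 3.0, 2.0, 3.0], k=2))
```

With three entries tied at 3.0, this always returns indices 1 and 2, whichever thread would have "seen" index 4 first in a parallel implementation.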

This means that two consecutive calls to KMeans.fit with the sklearn_numba_dpex engine and the same seed are not guaranteed to converge to the same list of centroids, only to the same list of centroids up to a permutation. This is not user-friendly.
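To make "the same centroids up to a permutation" concrete, here is a small NumPy sketch of a check that matches each center of one run to its nearest center in the other run. The helper name `same_centers_up_to_permutation` and the `atol` tolerance are illustrative assumptions, not part of sklearn or sklearn_numba_dpex; it also assumes the centers within a run are distinct.

```python
import numpy as np


def same_centers_up_to_permutation(a, b, atol=1e-6):
    """Return True if rows of `a` equal rows of `b` in some order.

    Hypothetical helper: assumes centers within one run are distinct.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        return False
    # Pairwise squared Euclidean distances between rows of a and rows of b.
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    match = d2.argmin(axis=1)
    # Each center of `a` must map to a distinct center of `b`...
    if np.unique(match).size != a.shape[0]:
        return False
    # ...and every matched pair must coincide within `atol`.
    return bool(np.all(d2[np.arange(a.shape[0]), match] <= atol**2))


run_1 = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
run_2 = run_1[[2, 0, 1]]  # same centroids, different order
print(np.allclose(run_1, run_2), same_centers_up_to_permutation(run_1, run_2))
```

A test that compares centroids (or labels) position by position, as with `np.allclose` here, fails on such a pair even though both runs found the same clustering.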

This can (rarely) cause the sklearn test test_predict_kmeans to fail.

Is this a strong enough argument to justify the cost of adding some synchronization to our argpartition kernels, so as to at least guarantee a deterministic tie-breaking strategy?

Alternatively, we could sort the cluster centers in a deterministic order after the fit.
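A minimal NumPy sketch of that second option, assuming a lexicographic order on the center coordinates is an acceptable canonical order: sort the centers row-wise and relabel the samples to match. `canonical_center_order` is a hypothetical name, not an existing sklearn or sklearn_numba_dpex function.

```python
import numpy as np


def canonical_center_order(centers, labels):
    """Sort centers lexicographically and remap labels accordingly.

    Hypothetical post-fit step, for illustration only.
    """
    centers = np.asarray(centers)
    labels = np.asarray(labels)
    # np.lexsort uses its LAST key as the primary key, so passing the
    # reversed transpose sorts rows by feature 0, then feature 1, etc.
    order = np.lexsort(centers.T[::-1])
    # inverse[old_label] is the new label of the cluster formerly
    # numbered old_label.
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)
    return centers[order], inverse[labels]


centers = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
labels = np.array([0, 1, 2, 0])
new_centers, new_labels = canonical_center_order(centers, labels)
print(new_centers.tolist(), new_labels.tolist())
```

Two runs that differ only by a permutation of the clusters then yield identical centers and labels. One caveat: the order is only canonical if the runs produce bit-identical center coordinates; if floating-point reductions make them differ slightly, two centers whose leading coordinates are nearly equal could still swap places.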

WDYT?
