`test_predict_kmeans` sklearn test can sometimes fail because of non-deterministic cluster relocation #97

Our cluster relocation function relies on a parallel `argpartition` function that doesn't use the same tie-breaking strategy as `np.argpartition`. Worse, it breaks ties in a non-deterministic way.
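To illustrate why ties leave room for different answers, here is a minimal sketch. The helper `is_valid_argpartition` and the `distances` array are hypothetical and only meant for illustration; they are not part of `sklearn_numba_dpex`. It shows that two different index arrays can both be correct partitions of the same input while selecting a different point as "the largest".

```python
import numpy as np


def is_valid_argpartition(values, indices, kth):
    """Return True if `indices` is a valid argpartition of `values` around `kth`.

    Hypothetical helper for illustration only.
    """
    values = np.asarray(values)
    indices = np.asarray(indices)
    # `indices` must be a permutation of 0..n-1.
    if sorted(indices.tolist()) != list(range(values.size)):
        return False
    partitioned = values[indices]
    pivot = partitioned[kth]
    # Everything before position kth is <= pivot, everything after is >= pivot.
    return bool(
        np.all(partitioned[:kth] <= pivot) and np.all(partitioned[kth:] >= pivot)
    )


# Three points are tied for the largest value.
distances = np.array([3.0, 1.0, 3.0, 3.0])

# Two candidate answers that differ only in how the tie is broken.
candidate_a = [1, 0, 2, 3]
candidate_b = [1, 3, 0, 2]
```

Both candidates pass the validity check for `kth=3`, yet they pick different points (index 3 versus index 2) in the last position, so an implementation that breaks ties differently from one run to the next can change its selection between runs.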

This means that two consecutive `KMeans.fit` calls run with the `sklearn_numba_dpex` engine and the same seed are not guaranteed to converge to the same list of centroids, only to the same list of centroids up to a permutation. This is not user-friendly.
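As a rough sketch of what "the same up to a permutation" means in practice, the hypothetical helper below compares two arrays of centroids after putting their rows in lexicographic order. It assumes the two runs produce bitwise-identical centroids that differ only in row order; if floating-point values also differ slightly, a tolerance-based comparison would be needed instead.

```python
import numpy as np


def same_centers_up_to_permutation(centers_a, centers_b):
    """Check whether two (n_clusters, n_features) arrays hold the same rows.

    Hypothetical helper for illustration; uses exact equality.
    """
    a = np.asarray(centers_a)
    b = np.asarray(centers_b)
    if a.shape != b.shape:
        return False
    # np.lexsort treats the LAST key as the primary one, so the feature
    # columns are reversed to sort by the first feature first.
    a_sorted = a[np.lexsort(a.T[::-1])]
    b_sorted = b[np.lexsort(b.T[::-1])]
    return bool(np.array_equal(a_sorted, b_sorted))


run_1 = np.array([[0.0, 1.0], [5.0, 5.0], [2.0, 3.0]])
run_2 = np.array([[2.0, 3.0], [0.0, 1.0], [5.0, 5.0]])
```

Here `run_1` and `run_2` are made-up arrays standing in for the `cluster_centers_` of two fits: a strict `np.array_equal(run_1, run_2)` fails, while the permutation-aware comparison succeeds.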

This can (rarely) cause the sklearn test [`test_predict_kmeans`](https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/cluster/tests/test_k_means.py#L621) to fail.

This seems like a solid argument for accepting the cost of adding some synchronization to our `argpartition` kernels, to at least ensure a deterministic tie-breaking strategy?

Or maybe sort the cluster centers in a deterministic way after the fit?
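A minimal sketch of the sorting option, assuming a lexicographic order on centroid coordinates is acceptable as the canonical order. The function name `canonicalize_centers` is hypothetical and this is not the engine's actual code; it also remaps the labels so that each sample still points at the same centroid after reordering.

```python
import numpy as np


def canonicalize_centers(centers, labels):
    """Reorder centroids lexicographically and remap labels accordingly.

    Hypothetical post-fit step, shown for illustration only.
    """
    centers = np.asarray(centers)
    labels = np.asarray(labels)
    # np.lexsort uses the LAST key as the primary key, hence the reversal:
    # rows are ordered by feature 0, then feature 1, and so on.
    order = np.lexsort(centers.T[::-1])
    # inverse[old_index] gives the new index of that centroid.
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)
    return centers[order], inverse[labels]


centers = np.array([[2.0, 0.0], [1.0, 5.0], [1.0, 3.0]])
labels = np.array([0, 1, 2, 0])
sorted_centers, sorted_labels = canonicalize_centers(centers, labels)
```

With this approach, two fits that differ only by a permutation of the centroids end up with identical `sorted_centers` and `sorted_labels`. The trade-off is an extra pass over the centroids and labels after each fit, and it only hides the non-determinism in the ordering rather than removing it from the kernels.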

WDYT?
