Our cluster relocation function relies on a parallel argpartition function that does not use the same tie-breaking strategy as np.argpartition and, moreover, breaks ties non-deterministically.
This means that two consecutive KMeans.fit runs with the sklearn_numba_dpex engine and the same seed are not guaranteed to converge to the same list of centroids, only to the same list of centroids up to a permutation. This is not user-friendly.
This can (rarely) cause the scikit-learn test test_predict_kmeans to fail.
This seems to be a solid argument for paying the cost of adding some synchronization in our argpartition kernels, to at least ensure a deterministic tie-breaking strategy?
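To make "deterministic tie-break" concrete, one possible rule is "among equal values, prefer the lowest index". Here is a minimal NumPy reference for that rule. It is only a sketch of the expected output, not the parallel kernel, and the function name `smallest_k_indices` is hypothetical:

```python
import numpy as np


def smallest_k_indices(values, k):
    """Return the indices of the k smallest values, ties broken by lowest index.

    Hypothetical reference helper: it uses a full stable sort, so it is
    slower than a real argpartition, but its output is fully deterministic.
    """
    values = np.asarray(values)
    # A stable sort keeps equal values in increasing index order.
    order = np.argsort(values, kind="stable")
    return order[:k]


# Three candidates tie for the smallest value; the two lowest indices win.
picked = smallest_k_indices([3.0, 1.0, 1.0, 1.0, 2.0], k=2)
```

A kernel with such a rule could be tested against this reference on inputs that contain ties.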
Or maybe sort the cluster centers in a deterministic way after the fit?
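A minimal sketch of the sorting option, assuming the centroids of two runs are exactly equal up to a permutation (bitwise-identical floats). The helper name `canonicalize_centers` is hypothetical, and the lexicographic order is just one possible choice:

```python
import numpy as np


def canonicalize_centers(centers, labels=None):
    """Sort centroids lexicographically (feature 0 first, then feature 1, ...).

    Hypothetical post-fit step. Returns the reordered centers and, if labels
    are given, the labels remapped to the new cluster indices.
    """
    centers = np.asarray(centers)
    # np.lexsort treats the LAST key as the primary one, so the feature
    # columns are passed in reverse order to make feature 0 the primary key.
    order = np.lexsort(centers.T[::-1])
    sorted_centers = centers[order]
    if labels is None:
        return sorted_centers, None
    # inverse[old_index] = new_index, used to remap the sample labels.
    inverse = np.empty_like(order)
    inverse[order] = np.arange(order.size)
    return sorted_centers, inverse[np.asarray(labels)]


# Two runs that differ only by a permutation of the centroids.
run_a = np.array([[1.0, 2.0], [0.0, 5.0], [1.0, 0.0]])
run_b = run_a[[2, 0, 1]]
sorted_a, _ = canonicalize_centers(run_a)
sorted_b, _ = canonicalize_centers(run_b)
```

With this, `sorted_a` and `sorted_b` are identical. Note that labels_ would need the same remapping, which is why the sketch also returns the remapped labels.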
WDYT?