Skipped flaky tests #300

@asmeurer

Description

Several tests are completely skipped right now because they are "flaky".

  • test_reshape
  • test_std
  • test_var
  • test_remainder

This is a pretty high-priority issue because these functions are effectively untested, even though they appear to be tested.

Tests should be written in such a way that they aren't flaky, for instance by using high numerical tolerances (or, if necessary, avoiding value testing entirely).
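
As an illustration only (not code from this suite), the "high tolerance" idea could look like the following; assert_values_close and the tolerance value are made-up names:

    import math

    def assert_values_close(actual, expected, rel_tol=1e-3):
        # A deliberately loose tolerance keeps the check meaningful
        # without failing on benign floating-point differences.
        assert math.isclose(actual, expected, rel_tol=rel_tol), (actual, expected)

    assert_values_close(0.1 + 0.2, 0.3)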

Note that health check failures for timeouts should just be suppressed, while health check failures for filtering too much should be fixed by adjusting the strategy.
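
A minimal sketch of that suggestion, assuming plain Hypothesis strategies rather than the suite's own helpers: suppress only the timeout health check, and constrain the strategy directly instead of filtering so the filter-too-much check never fires.

    from hypothesis import HealthCheck, given, settings, strategies as st

    @settings(suppress_health_check=[HealthCheck.too_slow])  # timeouts: suppress the check
    @given(
        # Constrain the strategy instead of using .filter(), so the
        # "filtered too much" health check is never triggered.
        st.lists(st.floats(min_value=-1e6, max_value=1e6), min_size=2)
    )
    def test_example(values):
        assert len(values) >= 2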

EDIT:

Activity

ev-br commented on Nov 17, 2024

Looking at test_std, https://github.com/data-apis/array-api-tests/blob/master/array_api_tests/test_statistical_functions.py#L262, it does not seem to attempt any value testing. Then what is flaky, assert_dtype or assert_keepdimable_shape?

asmeurer commented on Nov 18, 2024

I've no idea what's flaky with any of these. The first order of business would be to remove that decorator and figure out why the test was failing. It's also possible that some of these were only flaky with certain libraries.
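
For reference, the kind of marker being discussed looks roughly like this (hypothetical sketch; the actual decorator and reason string in the test suite may differ):

    import pytest

    @pytest.mark.skip(reason="flaky")  # removing this line re-enables the test
    def test_std():
        ...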

asmeurer commented on Nov 18, 2024

Also, it's possible the flakiness was fixed and the skip was never removed. It looks like the skip for std was added in #233 (with no explanation), if you want to check previous versions.

At best, if the test seems to be passing, we can just remove the skip and see if any upstream failures are found. Like I mentioned in another issue, it's really easy to just revert changes here if they break stuff since we don't even have releases, so I wouldn't be too worried about that.

ev-br commented on Nov 23, 2024

test_reshape is fixed in gh-319

ev-br commented on Nov 25, 2024

Caught a test_std failure with array_api_compat.numpy:

array_api_tests/test_statistical_functions.py::test_std FAILED                   [100%]

    @given(
>       x=hh.arrays(
            dtype=hh.real_floating_dtypes,
            shape=hh.shapes(min_side=1),
            elements={"allow_nan": False},
        ).filter(lambda x: math.prod(x.shape) >= 2),
        data=st.data(),
    )

array_api_tests/test_statistical_functions.py:254: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def inconsistent_generation():
>       raise FlakyStrategyDefinition(
            "Inconsistent data generation! Data generation behaved differently "
            "between different runs. Is your data generation depending on external "
            "state?"
        )
E       hypothesis.errors.FlakyStrategyDefinition: Inconsistent data generation! Data generation behaved differently between different runs. Is your data generation depending on external state?

../../miniforge3/envs/array-api-tests/lib/python3.11/site-packages/hypothesis/internal/conjecture/datatree.py:52: FlakyStrategyDefinition
-------------------------------------- Hypothesis --------------------------------------
You can add @seed(146745493194750825545715057348996307346) to this test or run pytest with --hypothesis-seed=146745493194750825545715057348996307346 to reproduce this failure.
=================================== warnings summary ===================================
array_api_tests/test_statistical_functions.py: 59 warnings
  /home/br/miniforge3/envs/array-api-tests/lib/python3.11/site-packages/numpy/_core/_methods.py:227: RuntimeWarning: Degrees of freedom <= 0 for slice
    ret = _var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=============================== short test summary info ================================
FAILED array_api_tests/test_statistical_functions.py::test_std - hypothesis.errors.FlakyStrategyDefinition: Inconsistent data generation! Data gener...
===================== 1 failed, 7 deselected, 59 warnings in 3.14s =====================

asmeurer commented on Nov 25, 2024

I can reproduce that with:

    ARRAY_API_TESTS_MODULE=array_api_compat.numpy pytest --disable-warnings array_api_tests/test_statistical_functions.py -k std -v --hypothesis-seed=146745493194750825545715057348996307346 --max-examples=10000

asmeurer commented on Nov 25, 2024

I can't tell what is causing it. None of the strategies seem to be that unusual. The only thing I see that's a little different from the other tests is that the input array is filtered to have at least 2 elements, but that shouldn't be causing this error.

Unfortunately, hypothesis makes it quite hard to tell what's going on with this error. The only thing I can suggest would be to refactor the input strategies, e.g., to use shared instead of data.draw. Otherwise, we may want to report this upstream on the hypothesis repo, and see if the hypothesis devs can offer any advice. It may also just be a bug in hypothesis.
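
A rough sketch of the shared-strategy idea, using plain lists in place of the suite's array strategies (all names here are hypothetical):

    from hypothesis import given, strategies as st

    # One shared base example: every strategy derived from it sees the same
    # value within a single test case, instead of drawing dependent values
    # inside the test body via data.draw().
    base = st.shared(
        st.lists(st.floats(min_value=-1e3, max_value=1e3), min_size=2), key="x"
    )

    @given(
        x=base,
        # Derive the dependent parameter from the same shared example.
        correction=base.flatmap(
            lambda xs: st.integers(min_value=0, max_value=len(xs) - 1)
        ),
    )
    def test_std_like(x, correction):
        assert 0 <= correction < len(x)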
