text diffs on huge files are slow #12406

Open
@d-b-w

Description

Ok, of course they are... But usually* when a dev commits the test, the test is passing, so the dev may not notice how preposterous a diff they are inadvertently asking for.

In my case, a 5-year-old test happened to be comparing text files about 2 million lines long as strings, functionally:

    assert fh1.read() == fh2.read()

This was fine, until the order of some fields changed and the test started hanging in CI for hours. The right thing to do is to fix this annoying test, but I thought that it might also make sense to push a fix up to pytest.
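For the test-side fix, one option is to compare the files line by line and stop at the first mismatch, so a failure never hands pytest two multi-million-line strings to diff. This is only a minimal sketch: `first_difference` is a hypothetical helper, not part of pytest or of the original test.

```python
import itertools


def first_difference(lines_a, lines_b):
    """Return (line_number, line_a, line_b) for the first mismatch, else None.

    Hypothetical helper. Accepts any two iterables of lines, such as open
    file handles, and stops reading as soon as a difference is found.
    """
    pairs = itertools.zip_longest(lines_a, lines_b)
    for lineno, (a, b) in enumerate(pairs, start=1):
        # zip_longest pads the shorter input with None, so a length
        # mismatch is reported as a line compared against None.
        if a != b:
            return lineno, a, b
    return None
```

In the test, `assert fh1.read() == fh2.read()` would then become `assert first_difference(fh1, fh2) is None`, and a failure reports one line number and two lines instead of a full diff.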

tl;dr - _diff_text() already knows the verbosity level - would it make sense to truncate the length of the diff calculated in "non verbose" mode? By default, the diff is truncated to the first 10 lines, so _diff_text() is doing extra computation that the caller will never see or use.
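To make the idea concrete, here is a rough sketch of bounding the diff input before calling `difflib.ndiff`. It is not pytest's `_diff_text()`: the function name `truncated_line_diff` and the `max_lines` parameter are assumptions for illustration, and the default of 10 just mirrors the truncation figure mentioned above.

```python
import difflib


def truncated_line_diff(left, right, max_lines=10):
    """Diff only a small window of lines around the first mismatch.

    Hypothetical sketch, not pytest's implementation. Returns the number
    of identical leading lines skipped and at most max_lines diff lines.
    """
    left_lines = left.splitlines(keepends=True)
    right_lines = right.splitlines(keepends=True)

    # Skip the common prefix with a cheap linear scan.
    start = 0
    limit = min(len(left_lines), len(right_lines))
    while start < limit and left_lines[start] == right_lines[start]:
        start += 1

    # Only feed ndiff as many lines as could ever be displayed.
    left_window = left_lines[start:start + max_lines]
    right_window = right_lines[start:start + max_lines]
    diff = list(difflib.ndiff(right_window, left_window))
    return start, diff[:max_lines]
```

The trade-off is that the windowed diff can differ from the head of the full diff, for example when a line has only moved far away, which may be acceptable for output that is truncated anyway.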

Metadata

    Labels

    topic: rewrite (related to the assertion rewrite mechanism); type: performance (performance or memory problem/improvement)
