feat: collect once during display() in jupyter notebooks #1167

Open: wants to merge 4 commits into base: main


@timsaucer timsaucer commented Jun 23, 2025

Which issue does this PR close?

None

Rationale for this change

By design, display() in a Jupyter notebook calls both __repr__ and _repr_html_. Each of these currently triggers its own collect() on the DataFrame, so the query runs twice and evaluation can take twice as long. This PR makes collect() happen only once.

What changes are included in this PR?

If we are in a Jupyter notebook, we cache the result of whichever of __repr__ or _repr_html_ is called first. When the other call happens, it consumes the cached result instead of collecting again. This means that once display() in a Jupyter notebook has made both calls, the collected data is freed.
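
For readers who want to picture the mechanism, here is a minimal sketch of the cache-then-consume idea described above. It is not the code in this PR: `CollectOnceFrame`, `_take_rows`, and the `in_notebook` flag are hypothetical names chosen for illustration, and the real DataFrame class is replaced by a plain callable that returns rows.

```python
from typing import Callable, List, Optional


class CollectOnceFrame:
    """Hypothetical stand-in for a DataFrame wrapper; not the PR's implementation."""

    def __init__(self, collect: Callable[[], List[dict]], in_notebook: bool) -> None:
        self._collect = collect          # expensive call that runs the query
        self._in_notebook = in_notebook  # assumed to be detected elsewhere
        self._cached: Optional[List[dict]] = None

    def _take_rows(self) -> List[dict]:
        # Outside a notebook only one repr method is called, so never cache.
        if not self._in_notebook:
            return self._collect()
        # Second call of the pair: hand back the cached rows and drop them.
        if self._cached is not None:
            rows, self._cached = self._cached, None
            return rows
        # First call of the pair: collect once and keep the rows for the other call.
        rows = self._collect()
        self._cached = rows
        return rows

    def __repr__(self) -> str:
        return "\n".join(str(row) for row in self._take_rows())

    def _repr_html_(self) -> str:
        cells = "".join(f"<tr><td>{row}</td></tr>" for row in self._take_rows())
        return f"<table>{cells}</table>"
```

In this sketch the cache is only released when the second method runs, so a frontend that calls just one of the two methods would leave rows cached and the next call would return them without re-collecting. That is why the sketch takes an explicit `in_notebook` flag rather than caching unconditionally.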

Are there any user-facing changes?

None.

@timsaucer timsaucer self-assigned this Jun 23, 2025

@alamb alamb left a comment

Makes sense to me

@kylebarron kylebarron left a comment

I don't think this is a reasonable workaround because there are many Jupyter-protocol frontends that do not support displaying HTML output. This means that repr would be broken for the IPython console, for example.

@kylebarron

> By design in a Jupyter notebook display() calls both __repr__ and _repr_html_.

Ref https://discourse.jupyter.org/t/find-out-if-my-code-runs-inside-a-notebook-or-jupyter-lab/6935/8

You might be able to look in the IPython config to see what's running, but this answer is 10+ years old and the behavior might have changed since: https://stackoverflow.com/a/24937408

@timsaucer timsaucer marked this pull request as draft June 24, 2025 00:25
@timsaucer

> I don't think this is a reasonable workaround because there are many Jupyter-protocol frontends that do not support displaying HTML output. This means that repr would be broken for the IPython console, for example

Thanks for the feedback! I changed the code to check the environment as you suggested, and tested it in Jupyter, the IPython console, and the regular Python console.
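
As a rough illustration of the kind of environment check under discussion, the sketch below uses one commonly cited heuristic: inspecting the class name of the active IPython shell. This is an assumption for illustration, not the check this PR uses, and `detect_shell` is a hypothetical helper name. The heuristic only tells you that a ZMQ-based kernel is running, so it cannot by itself tell a notebook apart from other frontends attached to a kernel, such as jupyter console.

```python
def detect_shell() -> str:
    """Best-effort guess at the interactive environment (hypothetical helper)."""
    try:
        from IPython import get_ipython
    except ImportError:
        # IPython is not installed, so this must be a plain Python process.
        return "python"

    shell = get_ipython()
    if shell is None:
        # IPython is installed but no interactive shell is active.
        return "python"

    name = type(shell).__name__
    if name == "ZMQInteractiveShell":
        # A Jupyter kernel: notebook, JupyterLab, qtconsole, or jupyter console.
        return "kernel"
    if name == "TerminalInteractiveShell":
        # The terminal-based IPython console.
        return "ipython-terminal"
    return "unknown"


if __name__ == "__main__":
    print(detect_shell())
```

Because the "kernel" result covers frontends with and without HTML support, a check like this would need to be combined with some other signal before deciding whether both repr methods will be called.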

@kylebarron

As mentioned in a comment on the Stack Overflow answer, that check fails in jupyter console, and I verified that it still fails:

image

@kylebarron kylebarron left a comment

I think that looks a lot more stable, even though it's kinda annoying

@timsaucer timsaucer marked this pull request as ready for review June 25, 2025 11:27
@timsaucer timsaucer force-pushed the feat/collect-once-in-jupyter-notebook branch from 8d65e99 to ae65240 Compare June 25, 2025 13:02