Skip to content

feat: dataframe string formatter #1170

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 25, 2025

Conversation

timsaucer
Copy link
Contributor

Which issue does this PR close?

None

Rationale for this change

We recently added a html formatter, which is useful for generating html tables as a customer. This adds the similar feature to __repr__ to allow for text based formatting.

What changes are included in this PR?

Add method to formatter to format a dataframe as a string.
Renamed html_formatter to dataframe_formatter.

Are there any user-facing changes?

Yes, the html_formatter module has been renamed to dataframe_formatter. A deprecation warning exists if any users try the old method. This was only released a couple of weeks ago so I doubt there has been time for adoption yet.

@timsaucer timsaucer requested a review from kosiew June 24, 2025 19:56
@timsaucer timsaucer changed the title Feat/dataframe str formatter feat: dataframe string formatter Jun 24, 2025
@timsaucer timsaucer force-pushed the feat/dataframe_str_formatter branch from a9b9679 to f04b973 Compare June 25, 2025 00:24
Copy link
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.
Left some suggestions.

Comment on lines 1115 to 1127
@staticmethod
def default_str_repr(
batches: list[pa.RecordBatch],
schema: pa.Schema,
has_more: bool,
table_uuid: str | None = None,
) -> str:
"""Return the default string representation of a DataFrame.

This method is used by the default formatter and implemented in Rust for
performance reasons.
"""
return DataFrameInternal.default_str_repr(batches, schema, has_more, table_uuid)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

default_str_repr is at the bottom of the class.

It might be clearer to place it alongside other repr/formatter methods (repr, repr_html) for cohesion.

Comment on lines +263 to +265
@classmethod
def is_styles_loaded(cls) -> bool:
"""Check if HTML styles have been loaded in the current session.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of a better idea to implement this
#1171

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent! Maybe we open a separate PR for this?

"The module 'html_formatter' is deprecated and will be removed in the next release."
"Please use 'dataframe_formatter' instead.",
DeprecationWarning,
stacklevel=2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using stacklevel=2 may not point users to their import site. Consider stacklevel=3 or more, so the warning originates where they imported it.

@timsaucer timsaucer merged commit 03921d9 into apache:main Jun 25, 2025
17 checks passed
@timsaucer timsaucer deleted the feat/dataframe_str_formatter branch June 25, 2025 12:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants