
initial pass at implementing a data summary tool for Python #8208


Open
wants to merge 7 commits into
base: main
from

Conversation

melissa-barca
Contributor

@melissa-barca melissa-barca commented Jun 19, 2025

First pass at #7114

Gives Assistant a getDataSummary tool, currently implemented only for Python, that returns a JSON-structured summary of a data object by using the Positron API to communicate with the variables comm. I updated the variables Python backend to reuse existing functionality from the data explorer.

I used the inspectVariables tool as a guide for retrieving info from the variables comm.

image

Release Notes

New Features

  • N/A

Bug Fixes

  • N/A

QA Notes

@:data-explorer
@:assistant
@:variables
@:plots
@:viewer

@melissa-barca melissa-barca requested a review from wesm June 19, 2025 21:15

github-actions bot commented Jun 19, 2025

E2E Tests 🚀
This PR will run tests tagged with: @:critical @:data-explorer @:assistant @:variables @:plots @:viewer

readme  valid tags

Contributor

@wesm wesm left a comment


This looks like a great start. My main suggestion is to rename the API that routes requests to the variables comm to something more generic (it can just query a single session variable at a time), so that we can use it to add more data-querying tools without having to modify the Positron API each time.

The other change we will want is to make the handling of these tool calls "asynchronous" so that they do not block the functioning of the variables comm. This basically means copying the pattern from the data explorer comm for the get_column_profiles request (and its corresponding return_column_profiles front-end API; see https://github.com/posit-dev/positron/blob/main/extensions/positron-python/python_files/posit/positron/data_explorer.py#L492-L519).
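To illustrate the first suggestion, here is a minimal sketch of a generic "query one session variable" entry point that dispatches on a query type, so that new data-querying tools only register a handler instead of adding a new API method. Everything in it is hypothetical: `query_variable`, `register_query`, the query-type names, and the dict-of-lists stand-in for a table are illustrative assumptions, not the Positron API or the code in this PR.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical handler signature: (variable value, params) -> JSON-serializable dict.
QueryHandler = Callable[[Any, Dict[str, Any]], Dict[str, Any]]
_QUERY_HANDLERS: Dict[str, QueryHandler] = {}


def register_query(query_type: str) -> Callable[[QueryHandler], QueryHandler]:
    """Register a handler under a query type name (illustrative only)."""

    def decorator(fn: QueryHandler) -> QueryHandler:
        _QUERY_HANDLERS[query_type] = fn
        return fn

    return decorator


@register_query("schema")
def _schema(table: Dict[str, list], params: Dict[str, Any]) -> Dict[str, Any]:
    # A table is modeled here as a dict of column name -> list of values.
    return {
        "columns": [
            {
                "name": name,
                "type_display": type(col[0]).__name__ if col else "unknown",
            }
            for name, col in table.items()
        ]
    }


@register_query("row_count")
def _row_count(table: Dict[str, list], params: Dict[str, Any]) -> Dict[str, Any]:
    return {"num_rows": max((len(col) for col in table.values()), default=0)}


def query_variable(
    namespace: Dict[str, Any],
    name: str,
    query_type: str,
    params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Route one query about one session variable to its registered handler."""
    if name not in namespace:
        raise KeyError(f"No such variable: {name}")
    handler = _QUERY_HANDLERS.get(query_type)
    if handler is None:
        raise ValueError(f"Unsupported query type: {query_type}")
    return handler(namespace[name], params or {})
```

With this shape, adding a new tool is a matter of writing one more `@register_query(...)` function; the routing function and its signature stay unchanged.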

"type_display": column.type_display,
"summary_stats": summary_stats,
}
)
Contributor


It's a good starting point to have this tool surfaced in the variables comm. Since computing summary stats or other profiles can be expensive (and thus block other message handling in the variables comm), we'll probably want to separate "expensive" requests (e.g. summary stats, frequency tables, histograms) from "cheap" requests (like asking for the schema), and make sure that the expensive requests are performed in an asynchronous-response pattern like the get_column_profiles request in the data explorer. This doesn't all have to get done in this PR, so it can be follow-up work.
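A rough sketch of the cheap/expensive split described above, assuming a comm-like object that can push events back to the front end. Cheap requests are answered inline; expensive ones return an acknowledgement immediately and deliver the result later in a separate event keyed by a callback ID. The class name, the `return_summary_stats` event name, and the use of a thread pool are all assumptions made for illustration; this is not how data_explorer.py is known to implement get_column_profiles.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Any, Callable, Dict, List

Table = Dict[str, List[float]]  # stand-in for a real data frame


class SummaryRequestHandler:
    """Hypothetical handler: inline replies for cheap requests, events for expensive ones."""

    def __init__(self, send_event: Callable[[Dict[str, Any]], None]) -> None:
        self._send_event = send_event
        # A single worker keeps expensive jobs off the message-handling thread.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def handle_request(self, kind: str, callback_id: str, table: Table) -> Dict[str, Any]:
        if kind == "schema":
            # Cheap: answer synchronously.
            return {"columns": list(table)}
        if kind == "summary_stats":
            # Expensive: acknowledge now, send the result later as an event.
            self._pool.submit(self._compute_and_send, callback_id, table)
            return {"status": "pending", "callback_id": callback_id}
        raise ValueError(f"Unsupported request: {kind}")

    def _compute_and_send(self, callback_id: str, table: Table) -> None:
        event: Dict[str, Any] = {
            "method": "return_summary_stats",  # hypothetical event name
            "callback_id": callback_id,
        }
        try:
            event["result"] = {
                name: {"min": min(col), "max": max(col), "mean": mean(col)}
                for name, col in table.items()
            }
        except Exception as e:  # report failures through the same channel
            event["error"] = str(e)
        self._send_event(event)

    def close(self) -> None:
        # Wait for any in-flight jobs before shutting down.
        self._pool.shutdown(wait=True)
```

The caller matches the later event to its original request through `callback_id`, which is the same role the callback ID plays in a request/return pair such as get_column_profiles and return_column_profiles.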


    schema = TableSchema(columns=column_schemas)
except Exception as e:
    raise ValueError(f"Failed to get schema: {e}") from e
Contributor


We should be able to push some of these tool-calling helpers into functions in data_explorer.py that we can unit test there

Contributor Author


I haven't added any unit tests, but I did add _get_column_profiles and _get_table_schema_from_view as helpers to data_explorer.py

@melissa-barca melissa-barca force-pushed the feature/ai-data branch 2 times, most recently from 29b64a0 to 94cb220 Compare June 27, 2025 04:03
@melissa-barca melissa-barca requested a review from jmcphers June 27, 2025 04:42
@melissa-barca melissa-barca marked this pull request as ready for review June 27, 2025 04:49

2 participants