-
Notifications
You must be signed in to change notification settings - Fork 105
initial pass at implementing a data summary tool for Python #8208
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
E2E Tests 🚀 |
5c7173e
to
5567433
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great start. My main suggestion is to rename the API that routes requests to the variables comm to something more generic (and it can just query a single session variable at a time) so that we can use it to add more data querying tools without having to modify the Positron API each time
The other changes that we will want to make is to make the handling of these tool calls "asynchronous" so they they do not block the functioning of the variables comm — this means basically copying the pattern from the data explorer comm for the get_column_profiles request (and its corresponding return_column_profiles front-end API, see https://github.com/posit-dev/positron/blob/main/extensions/positron-python/python_files/posit/positron/data_explorer.py#L492-L519)
extensions/positron-python/python_files/posit/positron/variables_comm.py
Show resolved
Hide resolved
"type_display": column.type_display, | ||
"summary_stats": summary_stats, | ||
} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a good starting point to have this tool surfaced in the variables comm — since computing summary stats or other computed profiles can be expensive (and thus block other messaging handling in the variables comm), we'll probably want to separate "expensive" requests (e.g. summary stats, frequency tables, histograms, etc.) from "cheap" requests (like asking for the schema), and make sure that the expensive requests and performed in an asynchronous-response pattern like the get_column_profiles
request in the data explorer. This doesn't all have to get done in this PR so can be follow up work
|
||
schema = TableSchema(columns=column_schemas) | ||
except Exception as e: | ||
raise ValueError(f"Failed to get schema: {e}") from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should be able to push some of these tool-calling helpers into functions in data_explorer.py that we can unit test there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't added any unit tests, but I did add _get_column_profiles
and _get_table_schema_from_view
as helpers to data_explorer.py
src/vs/workbench/api/common/positron/extHost.positron.protocol.ts
Outdated
Show resolved
Hide resolved
29b64a0
to
94cb220
Compare
'path' parameter, also rename summarizeData function to make it more generic
274cd0f
to
b902acc
Compare
First pass at #7114
Provides Assistant with a
getDataSummary
tool, currently only implemented for Python, that provides a JSON structured summary of a data object by using the Positron API to communicate with the Variables Comm. I updated the variable's python backend to reuse existing functionality from the data explorer.I used the
inspectVariables
tool as a guide for retrieving info from the variables comm.Release Notes
New Features
Bug Fixes
QA Notes
@:data-explorer
@:assistant
@:variables
@:plots
@:viewer