BUG: Fix for Issue 53028 - Fix DictCursor returning column names #63503
base: main
Force-pushed from e78320d to d6a9f62
Force-pushed from d6a9f62 to 60e1ef2
Hi @rhshadrach, good day! Could you please review this when you get a chance?
Pull request overview
This PR fixes Issue #53028 where pd.read_sql_query() with dictionary-returning cursors (like pymysql's DictCursor) incorrectly populated DataFrames with column names instead of actual values. The root cause was that lib.to_object_array_tuples() expected tuples/lists but received dictionaries, and iterating over dictionaries yields keys rather than values.
Key Changes:
- Modified _convert_arrays_to_dataframe() in pandas/io/sql.py to detect dict-like rows and convert them to tuples in the correct column order
- Added a test case to verify the fix works correctly with dictionary-based cursor results
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| pandas/io/sql.py | Added logic to detect and convert dictionary rows to tuples before processing in _convert_arrays_to_dataframe() |
| pandas/tests/io/test_sql.py | Added test case test_convert_arrays_to_dataframe_with_dictcursor() to verify dictionary cursor handling and imported _convert_arrays_to_dataframe for testing |
pandas/tests/io/test_sql.py (outdated)
```python
import pandas._testing as tm
from pandas import DataFrame
from pandas.io.sql import _convert_arrays_to_dataframe


def test_convert_arrays_to_dataframe_with_dictcursor():
    """
    Test for GH#53028: DictCursor returns dictionaries which cause
    _convert_arrays_to_dataframe to populate DataFrame with column names
    instead of actual values.
    """
    # Simulate DictCursor output (e.g., from pymysql with DictCursor)
    dict_data = [
        {"id": 117, "value": "ABCDEF", "state_id": 5},
        {"id": 163, "value": "DEFRDC", "state_id": 5},
    ]
    columns = ["id", "value", "state_id"]

    result = _convert_arrays_to_dataframe(dict_data, columns)

    expected = DataFrame(
        [[117, "ABCDEF", 5], [163, "DEFRDC", 5]],
        columns=columns,
    )

    tm.assert_frame_equal(result, expected)
```
Copilot (AI), Jan 6, 2026
The test only covers the happy path where all dictionaries have all the expected keys. Consider adding test cases for edge cases such as:
- Empty data list
- Dictionaries missing some columns (to verify error handling)
- Column order preservation (dictionaries with keys in a different order than the columns parameter)
- Mixed types in values (NULL/None values, different data types)
These additional tests would help ensure the fix is robust and handles various real-world scenarios that might occur with different database cursors.
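A minimal sketch of what such edge-case tests could check. It is written against a hypothetical stand-alone helper, dict_rows_to_tuples, that copies the patch's expression, with collections.abc.Mapping standing in for pandas' is_dict_like(), so it does not exercise pandas itself. It simply records how that expression behaves, including that a row missing a column raises KeyError rather than producing a missing value.

```python
from collections.abc import Mapping

COLUMNS = ["id", "value", "state_id"]


def dict_rows_to_tuples(data, columns):
    # Hypothetical helper copying the patch's expression; Mapping stands in
    # for pandas' is_dict_like().
    if data and isinstance(data[0], Mapping):
        return [tuple(row[col] for col in columns) for row in data]
    return data


def test_empty_data():
    assert dict_rows_to_tuples([], COLUMNS) == []


def test_key_order_follows_columns():
    rows = [{"state_id": 5, "value": "ABCDEF", "id": 117}]
    assert dict_rows_to_tuples(rows, COLUMNS) == [(117, "ABCDEF", 5)]


def test_none_values_are_kept():
    rows = [{"id": 163, "value": None, "state_id": 5}]
    assert dict_rows_to_tuples(rows, COLUMNS) == [(163, None, 5)]


def test_missing_column_raises_keyerror():
    rows = [{"id": 117, "value": "ABCDEF"}]  # no "state_id" key
    try:
        dict_rows_to_tuples(rows, COLUMNS)
    except KeyError as err:
        assert err.args[0] == "state_id"
    else:
        raise AssertionError("expected KeyError for the missing column")
```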
modified to test via the user API
pandas/io/sql.py (outdated)
```python
# Fix for GH#53028: DictCursor returns dictionaries which cause
# to_object_array_tuples to populate DataFrame with column names instead
# of actual values. Convert dictionaries to tuples in column order.
if data and len(data) > 0 and is_dict_like(data[0]):
    data = [tuple(row[col] for col in columns) for row in data]
```
The comment repeats a lot of the code. I think we can make this more concise.
Suggested change:
```diff
-# Fix for GH#53028: DictCursor returns dictionaries which cause
-# to_object_array_tuples to populate DataFrame with column names instead
-# of actual values. Convert dictionaries to tuples in column order.
-if data and len(data) > 0 and is_dict_like(data[0]):
-    data = [tuple(row[col] for col in columns) for row in data]
+if data and len(data) > 0 and is_dict_like(data[0]):
+    # GH#53028 - handle cursors that return dictionaries
+    data = [tuple(row[col] for col in columns) for row in data]
```
done
pandas/tests/io/test_sql.py (outdated)
```python
    drop_table(table_name, sqlite_buildin)


def test_convert_arrays_to_dataframe_with_dictcursor():
```
Instead of testing the internal function, can you test via the user API? Perhaps something like #52437 (comment).
done
Force-pushed from c5a1ad3 to 60e1ef2
remove redundant len(data)
Co-authored-by: Copilot <[email protected]>
… values in read_sql_query
Problem:
When using pymysql with DictCursor (or similar dictionary-returning cursors), pd.read_sql_query() returned a DataFrame where cells contained column names instead of actual values.
RCA:
- DictCursor returns rows as dictionaries: [{'id': 117, 'value': 'ABCDEF'}, ...]
- read_sql_query() calls _convert_arrays_to_dataframe(), which calls lib.to_object_array_tuples(data).
- to_object_array_tuples() expects tuples/lists.
- When it receives dictionaries, it iterates over each row, and iterating over a dictionary yields its keys rather than its values, so every cell ends up holding a column name.
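The failure can be reproduced without a database. The sketch below uses a hypothetical stand-in, to_object_rows_naive, that converts each row with tuple() before copying its cells; it is a simplification of the behaviour described above, not pandas' actual lib.to_object_array_tuples code. Tuple rows come through intact, while dictionary rows yield their keys.

```python
def to_object_rows_naive(rows):
    # Hypothetical simplification: each row is turned into a tuple before its
    # cells are copied out. Not pandas' real Cython implementation.
    return [list(tuple(row)) for row in rows]


tuple_rows = [(117, "ABCDEF", 5), (163, "DEFRDC", 5)]
dict_rows = [  # shape of DictCursor output
    {"id": 117, "value": "ABCDEF", "state_id": 5},
    {"id": 163, "value": "DEFRDC", "state_id": 5},
]

print(to_object_rows_naive(tuple_rows))
# [[117, 'ABCDEF', 5], [163, 'DEFRDC', 5]]
print(to_object_rows_naive(dict_rows))
# [['id', 'value', 'state_id'], ['id', 'value', 'state_id']]
```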
Solution:
- Check whether data contains dictionaries (using is_dict_like() on the first row).
- If it does, convert each row to a tuple in column order: tuple(row[col] for col in columns).
- Pass the converted data to to_object_array_tuples().
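The steps above can be sketched as a stand-alone function. The name normalize_rows is hypothetical, and collections.abc.Mapping is used as a stdlib stand-in for pandas' is_dict_like(), which is a duck-typed attribute check rather than an isinstance test. Rows are rebuilt in the order given by columns, so a dictionary whose keys arrive in a different order still lands in the right columns; tuple rows are returned unchanged.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def normalize_rows(data: Sequence[Any], columns: Sequence[str]) -> Sequence[Any]:
    """Hypothetical stand-alone version of the conversion step in this PR."""
    # Only the first row is inspected, as in the patch.
    if data and isinstance(data[0], Mapping):
        # Rebuild each row as a tuple ordered by ``columns``.
        return [tuple(row[col] for col in columns) for row in data]
    # Non-dict rows (tuples/lists) are passed through unchanged.
    return data


# The result is what would then be handed to lib.to_object_array_tuples().
columns = ["id", "value", "state_id"]
rows = [
    {"state_id": 5, "id": 117, "value": "ABCDEF"},  # keys in a different order
    {"id": 163, "value": "DEFRDC", "state_id": 5},
]
print(normalize_rows(rows, columns))
# [(117, 'ABCDEF', 5), (163, 'DEFRDC', 5)]
```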