Conversation

@AMRUTH-ASHOK AMRUTH-ASHOK commented Dec 29, 2025

… values in read_sql_query

Problem:
When using pymysql with DictCursor (or similar dictionary-returning cursors), pd.read_sql_query() returned a DataFrame where cells contained column names instead of actual values.

RCA:
DictCursor returns rows as dictionaries: [{'id': 117, 'value': 'ABCDEF'}, ...]
read_sql_query() calls _convert_arrays_to_dataframe(), which calls lib.to_object_array_tuples(data)
to_object_array_tuples() expects tuples/lists.

When it receives dictionaries instead, the following happens:

  • Iterating a dict yields its keys, not its values
  • Converting a dict to a tuple therefore returns the keys: tuple({'id': 117, 'value': 'ABCDEF'}) gives ('id', 'value')
  • Result: the DataFrame cells hold column names instead of values
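
The behaviour described above can be reproduced in plain Python, without pandas or a database. This is only a minimal illustration of the dict-iteration rule; the variable names are made up for the example.

```python
# One row as a dictionary-returning cursor would produce it.
row = {"id": 117, "value": "ABCDEF"}

# tuple() iterates the dict, and iterating a dict yields its keys.
keys_only = tuple(row)

# Looking up each column explicitly, in a known order, gives the data.
columns = ["id", "value"]
values = tuple(row[col] for col in columns)

print(keys_only)  # ('id', 'value')
print(values)     # (117, 'ABCDEF')
```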

Solution:
Check whether data contains dictionaries (using is_dict_like() on the first row).
If it does, convert each row to a tuple in column order: tuple(row[col] for col in columns)
Then pass the converted data to to_object_array_tuples().
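
A rough sketch of that conversion step, written outside pandas so it runs on its own. The helper name rows_to_tuples is hypothetical, and isinstance(..., Mapping) stands in for pandas' is_dict_like() here; the actual change lives inside _convert_arrays_to_dataframe().

```python
from collections.abc import Mapping


def rows_to_tuples(data, columns):
    """Hypothetical helper: turn dict rows into tuples in column order."""
    # Only the first row is inspected, mirroring the check described above.
    # Mapping is an assumption standing in for pandas' is_dict_like().
    if data and isinstance(data[0], Mapping):
        return [tuple(row[col] for col in columns) for row in data]
    # Tuple or list rows are passed through untouched.
    return data


dict_rows = [
    {"id": 117, "value": "ABCDEF"},
    {"id": 163, "value": "DEFRDC"},
]
tuple_rows = rows_to_tuples(dict_rows, ["id", "value"])
unchanged = rows_to_tuples([(1, "a")], ["id", "value"])
```

Because the lookup is driven by columns rather than by the dict's own key order, the resulting tuples line up with the column names regardless of how each row's keys are ordered.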

@AMRUTH-ASHOK force-pushed the fix-dictcursor-gh53028 branch from e78320d to d6a9f62 on December 29, 2025 16:18
@AMRUTH-ASHOK force-pushed the fix-dictcursor-gh53028 branch from d6a9f62 to 60e1ef2 on December 29, 2025 16:19
@AMRUTH-ASHOK changed the title from "Fix for Issue 53028: Fix DictCursor returning column names" to "BUG: Fix for Issue 53028 - Fix DictCursor returning column names" on Dec 29, 2025
@AMRUTH-ASHOK
Author

Hi @rhshadrach, good day! Could you please review this when you get a chance?

@fangchenli requested a review from Copilot on January 6, 2026 00:30
@fangchenli added the labels Bug, IO SQL (to_sql, read_sql, read_sql_query), and Regression (functionality that used to work in a prior pandas version) on Jan 6, 2026
Copilot AI left a comment

Pull request overview

This PR fixes Issue #53028 where pd.read_sql_query() with dictionary-returning cursors (like pymysql's DictCursor) incorrectly populated DataFrames with column names instead of actual values. The root cause was that lib.to_object_array_tuples() expected tuples/lists but received dictionaries, and iterating over dictionaries yields keys rather than values.

Key Changes:

  • Modified _convert_arrays_to_dataframe() in pandas/io/sql.py to detect dict-like rows and convert them to tuples in the correct column order
  • Added a test case to verify the fix works correctly with dictionary-based cursor results

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| pandas/io/sql.py | Added logic to detect and convert dictionary rows to tuples before processing in _convert_arrays_to_dataframe() |
| pandas/tests/io/test_sql.py | Added test case test_convert_arrays_to_dataframe_with_dictcursor() to verify dictionary cursor handling and imported _convert_arrays_to_dataframe for testing |

Comment on lines 4398 to 4418
def test_convert_arrays_to_dataframe_with_dictcursor():
    """
    Test for GH#53028: DictCursor returns dictionaries which cause
    _convert_arrays_to_dataframe to populate DataFrame with column names
    instead of actual values.
    """
    # Simulate DictCursor output (e.g., from pymysql with DictCursor)
    dict_data = [
        {"id": 117, "value": "ABCDEF", "state_id": 5},
        {"id": 163, "value": "DEFRDC", "state_id": 5},
    ]
    columns = ["id", "value", "state_id"]

    result = _convert_arrays_to_dataframe(dict_data, columns)

    expected = DataFrame(
        [[117, "ABCDEF", 5], [163, "DEFRDC", 5]],
        columns=columns,
    )

    tm.assert_frame_equal(result, expected)
Copilot AI Jan 6, 2026

The test only covers the happy path where all dictionaries have all the expected keys. Consider adding test cases for edge cases such as:

  1. Empty data list
  2. Dictionaries missing some columns (to verify error handling)
  3. Column order preservation (dictionaries with keys in a different order than the columns parameter)
  4. Mixed types in values (NULL/None values, different data types)

These additional tests would help ensure the fix is robust and handles various real-world scenarios that might occur with different database cursors.
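
As a rough illustration of those four cases, the checks below run against a hypothetical stand-in for the conversion step (to_tuples is not a pandas function, and it uses isinstance(..., dict) where the PR uses is_dict_like()). It only shows what the row[col] lookup implies for each case; it is not the test added in this PR.

```python
def to_tuples(data, columns):
    # Hypothetical stand-in for the conversion step under review.
    if data and isinstance(data[0], dict):
        return [tuple(row[col] for col in columns) for row in data]
    return data


columns = ["id", "value"]

# 1. Empty data passes through unchanged.
empty_result = to_tuples([], columns)

# 2. A row missing a column raises KeyError instead of misaligning values.
try:
    to_tuples([{"id": 1}], columns)
    missing_key_raised = False
except KeyError:
    missing_key_raised = True

# 3. Key order inside the dict is irrelevant; `columns` decides the order.
reordered = to_tuples([{"value": "ABCDEF", "id": 117}], columns)

# 4. None values (SQL NULL) are carried through as-is.
with_null = to_tuples([{"id": 163, "value": None}], columns)
```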

Author

Modified to test via the user API.

pandas/io/sql.py Outdated
Comment on lines 166 to 170
    # Fix for GH#53028: DictCursor returns dictionaries which cause
    # to_object_array_tuples to populate DataFrame with column names instead
    # of actual values. Convert dictionaries to tuples in column order.
    if data and len(data) > 0 and is_dict_like(data[0]):
        data = [tuple(row[col] for col in columns) for row in data]
Member

The comment repeats a lot of the code. I think we can make this more concise.

Suggested change
-    # Fix for GH#53028: DictCursor returns dictionaries which cause
-    # to_object_array_tuples to populate DataFrame with column names instead
-    # of actual values. Convert dictionaries to tuples in column order.
-    if data and len(data) > 0 and is_dict_like(data[0]):
-        data = [tuple(row[col] for col in columns) for row in data]
+    if data and len(data) > 0 and is_dict_like(data[0]):
+        # GH#53028 - handle cursors that return dictionaries
+        data = [tuple(row[col] for col in columns) for row in data]

Author

done

drop_table(table_name, sqlite_buildin)


def test_convert_arrays_to_dataframe_with_dictcursor():
Member

Instead of testing the internal function, can you test via the user API? Perhaps something like #52437 (comment).
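
The linked comment is not reproduced here, so the following is only one possible setup and not necessarily what it proposes. Assuming a test wants dictionary rows from a real DB-API connection without a MySQL server, the standard-library sqlite3 module can imitate a DictCursor through a custom row_factory. The table name t and the helper dict_factory are made up for the sketch.

```python
import sqlite3


def dict_factory(cursor, row):
    # Build one dict per row from the cursor's column names.
    fields = [col[0] for col in cursor.description]
    return dict(zip(fields, row))


con = sqlite3.connect(":memory:")
con.row_factory = dict_factory  # every fetched row is now a dict
con.execute("CREATE TABLE t (id INTEGER, value TEXT)")
con.execute("INSERT INTO t VALUES (117, 'ABCDEF')")

cursor = con.execute("SELECT id, value FROM t")
columns = [col[0] for col in cursor.description]
rows = cursor.fetchall()
con.close()

print(rows)  # [{'id': 117, 'value': 'ABCDEF'}]
```

A user-API test could then pass such a connection (before closing it) to pd.read_sql_query() and compare the result with an expected DataFrame; that step is left out here so the sketch stays free of a pandas dependency.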

Author

done

@AMRUTH-ASHOK force-pushed the fix-dictcursor-gh53028 branch from c5a1ad3 to 60e1ef2 on January 6, 2026 22:31
Labels

Bug, IO SQL (to_sql, read_sql, read_sql_query), Regression (functionality that used to work in a prior pandas version)

Development

Successfully merging this pull request may close these issues.

BUG: BUG: read_sql_query duplicates column names in cells in pandas v2.0.0

3 participants