Description
import os

import pandas as pd
from fastparquet import ParquetFile, write
from pandas.testing import assert_frame_equal


def test_categorical_with_nulls_and_filters(tempdir):
    """Test categorical data with nulls and read with filters"""
    fn = os.path.join(str(tempdir), 'test.parquet')

    # Create DataFrame with categorical and nullable columns
    df = pd.DataFrame({
        'cat_col': ['A', 'B', None, 'C'] * 2,
        'filter_col': list(range(8)),
        'nullable_int': pd.array([1, None, 3, 4] * 2, dtype="Int64")
    })
    df['cat_col'] = df['cat_col'].astype('category')
    print("df")
    print(df)

    # Write DataFrame as two row groups of four rows each
    write(fn, df, file_scheme='hive', row_group_offsets=[0, 4])

    # Read back with row_filter=True and filter_col > 6 (only the last row matches)
    pf = ParquetFile(fn)
    df_filtered = pf.to_pandas(filters=[('filter_col', '>', 6)], row_filter=True)
    expected = df[df['filter_col'] > 6].reset_index(drop=True)
    print("df_filtered")
    print(df_filtered)
    print("expected")
    print(expected)
    assert_frame_equal(df_filtered, expected)
Running this test shows:
testing.pyx:173: AssertionError
---------------------------- Captured stdout call -----------------------------
df
  cat_col  filter_col  nullable_int
0       A           0             1
1       B           1          <NA>
2     NaN           2             3
3       C           3             4
4       A           4             1
5       B           5          <NA>
6     NaN           6             3
7       C           7             4
df_filtered
  cat_col  filter_col   nullable_int
0     NaN           7  2267141176816
expected
  cat_col  filter_col  nullable_int
0       C           7             4
=========================== short test summary info ===========================
FAILED fastparquet/test/test_output.py::test_categorical_with_nulls_and_filters - AssertionError: DataFrame.iloc[:, 0] (column name="cat_col") are different
DataFrame.iloc[:, 0] (column name="cat_col") values are different (100.0 %)
[index]: [0]
[left]: [NaN]
Categories (3, object): ['A', 'B', 'C']
[right]: ['C']
Categories (3, object): ['A', 'B', 'C']
At positional index 0, first diff: nan != C
============================== 1 failed in 8.17s ==============================
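Describe the issue:
This test points to a problem when row filtering (row_filter=True) is combined with nulls in the same dataframe. The single row matching filter_col > 6 comes back with NaN in cat_col instead of C, and with 2267141176816 in nullable_int instead of 4.
I have pushed this test case in PR #956 and will try to investigate.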
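As a starting point for the investigation, here is a minimal pure-Python sketch of how a row mask has to interact with nulls in a flat optional Parquet column, where only the non-null values are stored and definition levels mark which rows are null. This is an illustration of the general technique, not a confirmed diagnosis and not fastparquet's actual code: the helper names `expand_nulls` and `filter_rows` are hypothetical, and the data mirrors the second row group of the test above.

```python
from typing import List, Optional, Sequence


def expand_nulls(values: Sequence[str],
                 definition_levels: Sequence[int]) -> List[Optional[str]]:
    """Rebuild one entry per row from the stored non-null values.

    A definition level of 1 means the row has a value, 0 means it is null.
    (Hypothetical helper, assuming a flat optional column.)
    """
    stored = iter(values)
    return [next(stored) if level == 1 else None for level in definition_levels]


def filter_rows(rows: Sequence[Optional[str]],
                mask: Sequence[bool]) -> List[Optional[str]]:
    """Keep only the rows whose mask entry is True."""
    return [row for row, keep in zip(rows, mask) if keep]


# Second row group of the test: cat_col is ['A', 'B', None, 'C'] and
# filter_col is [4, 5, 6, 7], so `filter_col > 6` keeps only the last row.
stored_values = ["A", "B", "C"]    # the null is not stored
definition_levels = [1, 1, 0, 1]   # 0 marks the null row
row_mask = [value > 6 for value in [4, 5, 6, 7]]

# The mask is expressed in row positions, so it is applied only after the
# nulls have been expanded back to one entry per row.
rows = expand_nulls(stored_values, definition_levels)
result = filter_rows(rows, row_mask)
print(result)  # ['C']
```

The point of the sketch is that the stored values (3 entries) and the row mask (4 entries) have different lengths once a null is present, so any step that indexes one with positions from the other would need to account for the definition levels. Whether this is what goes wrong here remains to be checked.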
Environment:
- Python version: 3.13
- Operating System: Windows
- Install method (conda, pip, source): mix of source + unzipping windows wheels to retrieve compiled code.