Description
Feature matrix calculation can fail in some cases when using the Equal and NotEqual primitives to compare categorical columns with different categories. An example is included below using the attached data.
Code Sample, a copy-pastable example to reproduce your bug.
import pandas as pd
import featuretools as ft

df = pd.read_csv("SBAcase_cleaned_train.csv")
es = ft.EntitySet()
logical_types = {
    "NAICS": "Categorical",
    "NewExist": "Categorical",
}
es.add_dataframe(
    dataframe_name="df",
    dataframe=df,
    index="id",
    make_index=True,
    logical_types=logical_types,
)
ft.dfs(entityset=es, target_dataframe_name="df", trans_primitives=["equal", "not_equal"])
TypeError: Categoricals can only be compared if 'categories' are the same.
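The same error can likely be reproduced with pandas alone, without the attached CSV. The sketch below is not taken from the Featuretools source: the sample values and the `compare_categoricals` helper are hypothetical, and the fallback to object dtype is only one possible workaround, not necessarily the fix the primitives should adopt.

```python
import pandas as pd

# Two categorical columns whose category sets differ.
# These values are made up for illustration; they are not from the attached data.
left = pd.Series(["a", "b", "a"], dtype="category")
right = pd.Series(["a", "c", "a"], dtype="category")

# Direct comparison raises, because the categories are not the same.
try:
    left == right
    raised = False
    message = ""
except TypeError as err:
    raised = True
    message = str(err)


def compare_categoricals(x, y):
    """Hypothetical helper: elementwise equality that tolerates differing categories."""
    try:
        return x == y
    except TypeError:
        # Fall back to comparing the underlying values as plain Python objects.
        return x.astype(object) == y.astype(object)


result = compare_categoricals(left, right)
print(raised, message)
print(result.tolist())
```

Casting to object avoids the category check at the cost of losing the categorical dtype for the duration of the comparison; aligning both columns to a shared set of categories before comparing would be another option.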