Feature Request: Add PandasCategoricalEncoder to encode categorical features as pandas categorical #828

Open
@ClaudioSalvatoreArcidiacono

Description

Some libraries, such as LightGBM, integrate well with pandas categorical types.
I could not find a good implementation that encodes categorical features as pandas
categorical columns while preserving the categories across different datasets. I would like to
propose adding a PandasCategoricalEncoder to the feature_engine library to
address this issue.

Is your feature request related to a problem? Please describe.
Yes, I often encounter issues when working with categorical data in pandas. Existing
approaches do not guarantee that the same categories are used across different datasets,
which can lead to errors when a model fitted on one dataset is applied to another.
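To make the problem concrete, here is a minimal illustration using only pandas. The column name and values are made up for this example. Casting each dataset to `category` independently infers the categories from that dataset alone, so the same value can end up with a different underlying code.

```python
import pandas as pd

# Toy data, made up for illustration.
train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["green", "yellow"]})

# Each cast infers its categories from the data it sees.
train_cat = train["color"].astype("category")
test_cat = test["color"].astype("category")

print(list(train_cat.cat.categories))  # ['blue', 'green', 'red']
print(list(test_cat.cat.categories))   # ['green', 'yellow']

# "green" gets a different integer code in each dataset.
train_code = train_cat.cat.codes[train["color"] == "green"].iloc[0]
test_code = test_cat.cat.codes[test["color"] == "green"].iloc[0]
print(train_code, test_code)  # 1 0
```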

Describe the solution you'd like
I would like to implement the PandasCategoricalEncoder class, which will transform
categorical features into pandas categorical types. This encoder will ensure that
categories are encoded consistently between training and testing datasets, and it will
handle unseen categories gracefully based on specified parameters.
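A rough sketch of what such an encoder could look like is shown here. It is only an illustration under stated assumptions, not a proposed final implementation. The class name `PandasCategoricalEncoderSketch`, the `variables` and `unseen` parameters, and the choice to keep categories in order of first appearance are all hypothetical, and the sketch deliberately avoids feature_engine's base classes.

```python
import pandas as pd


class PandasCategoricalEncoderSketch:
    """Hypothetical sketch; names and parameters are assumptions."""

    def __init__(self, variables, unseen="ignore"):
        if unseen not in ("ignore", "raise"):
            raise ValueError("unseen must be 'ignore' or 'raise'")
        self.variables = list(variables)
        self.unseen = unseen

    def fit(self, X, y=None):
        # Remember the categories seen in the training data,
        # in order of first appearance, excluding missing values.
        self.categories_ = {
            var: X[var].dropna().unique().tolist() for var in self.variables
        }
        return self

    def transform(self, X):
        X = X.copy()
        for var in self.variables:
            known = self.categories_[var]
            if self.unseen == "raise":
                new = set(X[var].dropna().unique()) - set(known)
                if new:
                    raise ValueError(f"Unseen categories in {var!r}: {sorted(new)}")
            # Values outside the fitted categories become NaN.
            X[var] = X[var].astype(pd.CategoricalDtype(categories=known))
        return X


train = pd.DataFrame({"color": ["red", "green", "blue"]})
test = pd.DataFrame({"color": ["green", "yellow"]})

encoder = PandasCategoricalEncoderSketch(variables=["color"]).fit(train)
train_t = encoder.transform(train)
test_t = encoder.transform(test)

# Both outputs share one categorical dtype, so codes are consistent.
print(train_t["color"].dtype == test_t["color"].dtype)  # True
print(test_t["color"].cat.codes.tolist())               # [1, -1]
```

With `unseen="ignore"` the unseen value `"yellow"` becomes a missing value (code `-1`), while `unseen="raise"` would raise a `ValueError` instead.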

Describe alternatives you've considered
I have considered using existing categorical encoding libraries, but they do not provide
this feature.

Additional context
The PandasCategoricalEncoder will include features such as handling missing values,
allowing for flexible unseen category management, and providing methods for inverse
transformation to retrieve original values. This will enhance the usability and
reliability of categorical data processing in pandas.
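One possible way to combine unseen-category handling, missing values, and inverse transformation is sketched here with plain pandas. The placeholder label `__unseen__` and the variable names are assumptions made for this example. Mapping unseen values to a dedicated category, rather than to NaN, keeps them distinguishable from genuinely missing values after the round trip.

```python
import pandas as pd

# Categories assumed to have been learned from the training data.
categories = ["red", "green", "blue"]
UNSEEN = "__unseen__"  # hypothetical placeholder label

dtype = pd.CategoricalDtype(categories=categories + [UNSEEN])

test = pd.Series(["green", "yellow", None], name="color")

# Keep known and missing values; replace everything else with the placeholder.
known_or_missing = test.isin(categories) | test.isna()
encoded = test.where(known_or_missing, UNSEEN).astype(dtype)
print(encoded.cat.codes.tolist())  # [1, 3, -1]

# Inverse transformation: cast back to plain Python objects.
restored = encoded.astype(object)
print(restored.isna().tolist())  # [False, False, True]
```

Note that the original value `"yellow"` cannot be recovered by the inverse step; only the placeholder comes back.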
