ENH: pandas mutate, add R's mutate functionality to enable users to easily create new columns in data frames

### Feature Type

- [X] Adding new functionality to pandas

- [ ] Changing existing functionality in pandas

- [ ] Removing existing functionality in pandas


### Problem Description

I wish feature engineering (i.e. creating new columns from old ones) could be more efficient and convenient in pandas.
Mainly, common ways of adding features to dataframes in pandas include

1. using chained `.assign` statements (which are hard to debug and contain many hard-to-read lambda expressions) or
2. calling `df['new_column'] = ...` repeatedly in some `add_features` function, this is better for debugging purposes but also hard to read and inconvenient as the user always has to type quotes and the word `df`.

In R's `mutate` function, the series are accessible directly from the scope which makes code much more readable (debugging in R is something else to discuss).

### Feature Description

We could easily add this functionality by providing a context manager (perhaps pd.mutate, to follow R's naming here) which temporarily moves all columns of a dataframe into the caller's `locals`, allows the caller to create new pd.Series while calling and then (upon the context manager's __exit__) all those new pd.Series (or the modified old ones) could be formed to a data frame again.

This makes feature engineering much more convenient, efficient and likely also more debuggable that using chained .assign statements (in the debugger, one could directly access all the pd.Series in that scope).
A minimal example implementation could look like the following:

```python

# %%

from multiprocessing import context
import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "col_a": [1, 2, 3, 4, 5],
        "col_b": [10, 20, 30, 40, 50],
    }
)

df

# %%

import inspect
from contextlib import contextmanager
from copy import deepcopy

class mutate_df:
    def __init__(self, df: pd.DataFrame):
        self.df = df

    def __enter__(self):
        frame = inspect.currentframe().f_back
        self.scope_keys = deepcopy(self._extract_locals_keys_from_frame(frame))

        for col in self.df.columns:
            if col in self.scope_keys:
                # Maybe give a warning here?
                pass

            frame.f_locals[col] = self.df[col]

    def _extract_locals_keys_from_frame(self, frame):
        s = {
            str(key)
            for key in frame.f_locals.keys()
            if not key.startswith("_")
        }
        return s

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type:
            raise exc_type(exc_val)

        frame = inspect.currentframe().f_back
        current_keys = self._extract_locals_keys_from_frame(frame)

        added_keys = current_keys - self.scope_keys
        added_keys = set.union(added_keys, set(self.df.columns))

        for key in added_keys:
            try:
                val = frame.f_locals[key]
                self.df[key] = val

            except:
                pass

        return True

with mutate_df(df):
    # All of df's columns are available in the scope
    # as pd.Series objects

    # Set the entire column to one value
    c = 10
    # Use columns defined previously
    col_c = col_a * 20

    # Create new columns
    rolling_mean = col_b.rolling(2).mean()
    col_b_cumsum = col_b.cumsum()

df


# %%

```

The drawback of this feature is that we are fiddling with the caller's `locals` which is not the most elegant.
However, I believe that feature engineering like this is much better to debug and makes the code more readable (than using chained `.assign`s or repeatedly calling `df['new_feature'] = 2 * df['old_feature'] ** 2`).

Therefore I think this feature would make life easier and pandas more useful (and users faster) in data science tasks.

### Alternative Solutions

One might want to handle the `locals` better here to make the usage of this feature less error-prone.
Perhaps one would want to cache previous `locals` and then *only* have the dataframe's columns as the locals in the caller's scope.

This would make debugging even more clean, because if a user sets a breakpoint in such a `with pd.mutate` statement, then that user sees all the columns in the scope's locals clearly instead of having to inspect the dataframe's columns values in the debugger.

### Additional Context

_No response_

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

ENH: pandas mutate, add R's mutate functionality to enable users to easily create new columns in data frames #56499

Feature Type

Problem Description

Feature Description

Alternative Solutions

Additional Context

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

ENH: pandas mutate, add R's mutate functionality to enable users to easily create new columns in data frames #56499

Description

Feature Type

Problem Description

Feature Description

Alternative Solutions

Additional Context

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions