Implemented NumbaExecutionEngine #61487
base: main
```diff
@@ -178,6 +178,60 @@ def apply(
         """
 
 
+class NumbaExecutionEngine(BaseExecutionEngine):
+    """
+    Numba-based execution engine for pandas apply and map operations.
+    """
+
+    @staticmethod
+    def map(
+        data: np.ndarray | Series | DataFrame,
+        func,
+        args: tuple,
+        kwargs: dict,
+        decorator: Callable | None,
+        skip_na: bool,
+    ):
+        """
+        Elementwise map for the Numba engine. Currently not supported.
+        """
+        raise NotImplementedError("Numba map is not implemented yet.")
+
+    @staticmethod
+    def apply(
+        data: np.ndarray | Series | DataFrame,
+        func,
+        args: tuple,
+        kwargs: dict,
+        decorator: Callable,
+        axis: int | str,
+    ):
+        """
+        Apply `func` along the given axis using Numba.
+        """
+        engine_kwargs: dict[str, bool] | None = (
+            decorator if isinstance(decorator, dict) else None
+        )
+
+        looper_args, looper_kwargs = prepare_function_arguments(
+            func,
+            args,
+            kwargs,
+            num_required_args=1,
+        )
+        # error: Argument 1 to "__call__" of "_lru_cache_wrapper" has
+        # incompatible type "Callable[..., Any] | str | list[Callable
+        # [..., Any] | str] | dict[Hashable,Callable[..., Any] | str |
+        # list[Callable[..., Any] | str]]"; expected "Hashable"
+        nb_looper = generate_apply_looper(
+            func,
+            **get_jit_arguments(engine_kwargs),
+        )
+        result = nb_looper(data, axis, *looper_args)
+        # If we made the result 2-D, squeeze it back to 1-D
+        return np.squeeze(result)
+
+
 def frame_apply(
     obj: DataFrame,
     func: AggFuncType,
```
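The `apply` method above hands the raw 2-D values to a looper built by `generate_apply_looper` and then calls `np.squeeze` on the result. The plain-Python sketch below illustrates that data flow under stated assumptions: it is not pandas code, it performs no JIT compilation, it uses nested lists instead of NumPy arrays, and `make_looper` and `squeeze` are hypothetical names. Its loop structure (call the UDF on each row or column, normalize every result to 1-D so the buffer is always 2-D, then drop a size-1 dimension) is only assumed to resemble the pandas helper. The `functools.cache` decorator also shows why the mypy comment in the diff expects a `Hashable` argument: a cached factory can only key on hashable functions.

```python
import functools
from typing import Callable, Sequence


@functools.cache  # one looper per func; func must therefore be hashable
def make_looper(func: Callable) -> Callable:
    """Hypothetical stand-in for a cached looper factory (no JIT compilation)."""

    def looper(values: Sequence[Sequence[float]], axis: int, *args) -> list:
        # The UDF receives columns for axis=0 and rows for axis=1.
        if axis == 0:
            slices = [list(col) for col in zip(*values)]
        else:
            slices = [list(row) for row in values]
        # Normalize each result to 1-D (like np.atleast_1d), so the buffer is always 2-D.
        results = []
        for s in slices:
            res = func(s, *args)
            results.append(list(res) if isinstance(res, (list, tuple)) else [res])
        if axis == 0:
            # One output column per input column: buffer shape is (k, n_cols).
            return [list(row) for row in zip(*results)]
        return results  # buffer shape is (n_rows, k)

    return looper


def squeeze(buf: list):
    """Drop size-1 dimensions of a 2-D nested list, like np.squeeze."""
    if len(buf) == 1 and len(buf[0]) == 1:
        return buf[0][0]
    if len(buf[0]) == 1:
        return [row[0] for row in buf]
    if len(buf) == 1:
        return list(buf[0])
    return buf
```

For example, with `values = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]`, `squeeze(make_looper(sum)(values, 1))` gives the row sums `[6.0, 15.0]` and `axis=0` gives the column sums `[5.0, 7.0, 9.0]`. Passing an unhashable object such as a list to `make_looper` raises `TypeError`, which is the runtime counterpart of the type error quoted in the diff.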
```diff
@@ -1094,23 +1148,15 @@ def wrapper(*args, **kwargs):
             return wrapper
 
         if engine == "numba":
-            args, kwargs = prepare_function_arguments(
-                self.func,  # type: ignore[arg-type]
+            engine_obj = NumbaExecutionEngine()
+            result = engine_obj.apply(
```
Review thread on `result = engine_obj.apply(`:

This is the ideal, but what would be even better is that … So, if we call … So, an idea is that, before we delegate the execution to a third-party execution engine, we could do something like:

```python
if engine == "numba":
    numba.jit.__pandas_udf__ = NumbaExecutionEngine
```

This way, when …

Thanks for the suggestion! The implementation would look something like this, correct?

```python
numba.jit.__pandas_udf__ = NumbaExecutionEngine
result = numba.jit.__pandas_udf__.apply(
    self.values,
    self.func,
    self.args,
    self.kwargs,
    engine_kwargs,
    self.axis,
)
```

Also, just to clarify: should this implementation wait until Numba writes its own executor?

Yes, that's correct. But since we already support numba, I wouldn't wait until it's implemented in numba. I would create the execution engine class ourselves and just simulate that things work the way you describe. So, in the future we expect `numba.jit` to have the … There may be other options, but this approach will keep backward compatibility for now when … For reference, this is the implementation of the interface for blosc2, another JIT compiler: https://github.com/Blosc/python-blosc2/pull/418/files. There are differences, since blosc2 is mostly for vectorized NumPy operations while numba should work well with jitting loops over NumPy arrays, but the idea is similar.
```diff
+                self.values,
+                self.func,
                 self.args,
                 self.kwargs,
-                num_required_args=1,
-            )
-            # error: Argument 1 to "__call__" of "_lru_cache_wrapper" has
-            # incompatible type "Callable[..., Any] | str | list[Callable
-            # [..., Any] | str] | dict[Hashable,Callable[..., Any] | str |
-            # list[Callable[..., Any] | str]]"; expected "Hashable"
-            nb_looper = generate_apply_looper(
-                self.func,  # type: ignore[arg-type]
-                **get_jit_arguments(engine_kwargs),
+                engine_kwargs,
+                self.axis,
             )
-            result = nb_looper(self.values, self.axis, *args)
-            # If we made the result 2-D, squeeze it back to 1-D
-            result = np.squeeze(result)
         else:
             result = np.apply_along_axis(
                 wrap_function(self.func),
```
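The reviewer's suggestion amounts to a lookup-and-delegate protocol: pandas checks the engine object for a `__pandas_udf__` attribute and forwards the call to the executor class stored there, and until numba ships its own executor, pandas attaches its own class to `numba.jit`. Here is a minimal stdlib-only sketch of that flow. Everything in it is hypothetical: `fake_jit` stands in for `numba.jit`, `HypotheticalLoopEngine` for `NumbaExecutionEngine`, and `dispatch_apply` for the dispatch inside pandas. Passing the engine object itself as the `decorator` argument is an assumption of the sketch, not what this PR does (the PR passes `engine_kwargs` there).

```python
from typing import Any, Callable


class HypotheticalLoopEngine:
    """Hypothetical stand-in for an executor such as NumbaExecutionEngine."""

    @staticmethod
    def apply(data, func, args, kwargs, decorator, axis):
        # "Compile" the UDF with the engine's decorator (fake_jit just returns it).
        jitted = decorator(func)
        # axis=1 / "columns": call the UDF once per row; otherwise once per column.
        if axis in (1, "columns"):
            slices = [list(row) for row in data]
        else:
            slices = [list(col) for col in zip(*data)]
        return [jitted(s, *args, **kwargs) for s in slices]


def fake_jit(func: Callable) -> Callable:
    """Hypothetical stand-in for a JIT decorator such as numba.jit (no compilation)."""
    return func


def dispatch_apply(data, func, engine, args=(), kwargs=None, axis=0) -> list[Any]:
    """Delegate to engine.__pandas_udf__ if the engine exposes one."""
    executor = getattr(engine, "__pandas_udf__", None)
    if executor is None:
        raise ValueError(f"{engine!r} does not implement the __pandas_udf__ interface")
    # Assumption: the engine object itself is passed as the `decorator` argument.
    return executor.apply(data, func, args, kwargs or {}, engine, axis)


# The reviewer's idea: attach the executor to the decorator ourselves
# until the JIT library provides its own implementation.
fake_jit.__pandas_udf__ = HypotheticalLoopEngine
```

With `data = [[1, 2], [3, 4]]`, `dispatch_apply(data, sum, fake_jit, axis=1)` returns the row sums `[3, 7]` and `axis=0` returns the column sums `[4, 6]`. Because this dispatch only looks for the attribute, a JIT library that later defines its own `__pandas_udf__` would be picked up without changes on the dispatch side, while a decorator lacking the attribute is rejected with `ValueError`.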
This note is for end users of pandas, so I don't think we need to share much about the internal implementation (users in general won't know about `NumbaExecutionEngine`). What the change in this PR will ideally mean for users is that they'll be able to use `df.apply(func, engine=numba.jit)`. I'd mention that instead.