Closed
Description
Describe the bug
In 37.1.0 and earlier the substr
function in python called the built in scalar function Substr, which allowed for being called with either two or three arguments (string, start position, length). In 38.0.0 this now uses a macro that generates a single form that only takes two arguments and the user needs to call substring
if they want to specify length.
To Reproduce
import datafusion
from datafusion import lit, functions as F
ctx = datafusion.SessionContext()
df = ctx.from_pydict({"a": [1]})
df.select(F.substr(lit("test text"), lit(0), lit(3)).alias("subst")).show()
In 37.1.0 produces:
DataFrame()
+-------+
| subst |
+-------+
| te |
+-------+
In 38.0.0 produces:
Traceback (most recent call last):
File "/Users/tsaucer/src/personal/datafusion-python/examples/test_substr.py", line 22, in <module>
df.select(F.substr(lit("test text"), lit(0), lit(3)).alias("subst")).show()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: substr() takes 2 positional arguments but 3 were given
Expected behavior
No change in performance, but it is probably better to have users update to calling substring
. Currently this function is not exposed, but it is included as part of PR #711
Additional context
This two line commit will resolve the issue: