When differentiating with respect to an empty array, the behaviour varies across backends:
```julia
using DifferentiationInterface, ForwardDiff, ReverseDiff, Mooncake, Enzyme

ADTYPES = [
    AutoForwardDiff(),
    AutoReverseDiff(),
    AutoMooncake(; config=nothing),
    AutoEnzyme(; mode=Forward),
    AutoEnzyme(; mode=Reverse),
    # and more...
]

for adtype in ADTYPES
    DifferentiationInterface.value_and_gradient(sum, adtype, Float64[])
end
```
ReverseDiff, Mooncake, and reverse Enzyme all happily return `(0.0, [])` 😄
Forward Enzyme tries to use a batch size of 0 and errors:
`DifferentiationInterface.jl/DifferentiationInterface/ext/DifferentiationInterfaceEnzymeExt/utils.jl`, lines 11 to 14 at 6a58124:

```julia
function DI.pick_batchsize(::AutoEnzyme, N::Integer)
    B = DI.reasonable_batchsize(N, 16)
    return DI.BatchSizeSettings{B}(N)
end
```
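For context on the zero batch size: if the heuristic in `DI.reasonable_batchsize` is roughly a cap like `min(N, Bmax)` (an assumption on my part, not something I checked), then `N = 0` naturally yields `B = 0`, which Enzyme's batched forward mode can't work with. A standalone sketch of the failure mode and one possible guard (names are illustrative):

```julia
# Assumed shape of the heuristic: cap the batch size at Bmax.
# For an empty input, N == 0 and the "reasonable" batch size is 0.
reasonable_batchsize_sketch(N::Integer, Bmax::Integer) = min(N, Bmax)

reasonable_batchsize_sketch(0, 16)          # 0 -> zero-width batch, errors downstream
max(reasonable_batchsize_sketch(0, 16), 1)  # 1 -> one possible guard, besides special-casing empty x
```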
And ForwardDiff tries to construct a `GradientResult`, which errors (lines 315 to 318 at 6a58124):

```julia
fc = DI.fix_tail(f, map(DI.unwrap, contexts)...)
result = GradientResult(x)
result = gradient!(result, fc, x)
return DR.value(result), DR.gradient(result)
```
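My guess is that `GradientResult(x)` is the culprit: as far as I understand, `DiffResults.GradientResult` needs a sample element of `x` to allocate the value slot, so it can't be built from a zero-length array. A sketch of an early return that sidesteps it entirely (the function name is mine, just for illustration):

```julia
using ForwardDiff, DiffResults

function value_and_gradient_sketch(f, x::AbstractVector)
    # Hypothetical guard: skip the GradientResult machinery for empty inputs.
    isempty(x) && return f(x), similar(x)
    result = DiffResults.GradientResult(x)
    result = ForwardDiff.gradient!(result, f, x)
    return DiffResults.value(result), DiffResults.gradient(result)
end

value_and_gradient_sketch(sum, Float64[])  # (0.0, Float64[])
```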
Funnily enough, `gradient` with ForwardDiff (rather than `value_and_gradient`) is fine because it doesn't try to construct the `GradientResult`. I imagine the other operators would also have varying behaviour.
I suppose it is a bit of a trivial edge case, but would it be possible to unify the behaviour of the AD backends?
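For what it's worth, the behaviour I'd expect everywhere is an early return of `(f(x), similar(x))` for empty inputs, which is effectively what the reverse-mode backends already give. A user-side sketch of that convention (the wrapper is mine, not a proposal for the actual API):

```julia
using DifferentiationInterface, ForwardDiff

# Illustrative wrapper: agree on (f(x), empty gradient) for empty inputs,
# and defer to the backend otherwise.
function value_and_gradient_empty_safe(f, backend, x::AbstractVector)
    isempty(x) && return f(x), similar(x)
    return DifferentiationInterface.value_and_gradient(f, backend, x)
end

value_and_gradient_empty_safe(sum, AutoForwardDiff(), Float64[])  # (0.0, Float64[])
```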