Description
Background
The example showcases two amazing abilities of PyMC-BART:
- the ability to have non-normal error distributions for BART RVs, such as the `pm.Gamma` distribution used there, and
- the ability to model functions other than the mean, such as modeling non-constant variance.
While the first ability is easily understood and a small step from existing work, the latter is both novel and amazing. I would love to use it in my work; I just think the implementation feels a little wonky, for lack of a better word. Additionally, the documentation in PyMC-BART's paper [Quiroga et al., 2022] does not completely explain how this works.
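To make the first ability concrete, here is a small sketch of the mean/std parameterization behind a Gamma error model. The helper and all numbers below are illustrative stand-ins, not code from the example: `pm.Gamma` already accepts `mu` and `sigma` directly, and this only spells out the shape/rate mapping that lets a BART estimate of the standard deviation vary with the predictor.

```python
import numpy as np

def gamma_params(mu, sigma):
    """Map a (mean, std) parameterization to Gamma (shape, rate).

    alpha = mu^2 / sigma^2, beta = mu / sigma^2, so that the Gamma
    distribution has mean alpha/beta = mu and std sqrt(alpha)/beta = sigma.
    """
    alpha = mu**2 / sigma**2
    beta = mu / sigma**2
    return alpha, beta

# Non-constant variance: sigma grows with the predictor, as in a
# heteroscedastic setup (x, mu, sigma here are made-up placeholders
# for BART estimates of the mean and std).
x = np.linspace(0.0, 1.0, 5)
mu = 10.0 + 5.0 * x
sigma = 1.0 + 3.0 * x
alpha, beta = gamma_params(mu, sigma)

# The implied Gamma distributions recover the requested moments.
assert np.allclose(alpha / beta, mu)              # mean
assert np.allclose(np.sqrt(alpha) / beta, sigma)  # standard deviation
```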
The Issue
It is my understanding that when growing trees for computing estimated variance, every proposed tree has leaf values drawn from
For that example, these initial guesses on the scale of "sales" are much higher than one would guess for initial guesses of a standard deviation. Hence, in this picture:
it is not surprising that the 94% HDI is too wide. We should expect around 12 observations to fall outside the 94% HDI band, but in reality only 2 or 3 do. I am guessing the estimate of the standard deviation is systematically too high because of these initial conditions.
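The coverage argument above can be checked numerically. This is a minimal sketch, assuming roughly 200 observations (so that 6% is about 12 points) and using a central normal band as a stand-in for the posterior-predictive HDI; the sample size, sigma values, and the 50% inflation factor are all made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 200                            # assumed sample size: 6% of 200 ~ 12 points
y = rng.normal(0.0, 1.0, size=n)   # synthetic data with true sigma = 1

Z94 = 1.8808  # ~97th percentile of the standard normal (central 94% band)

def frac_outside(y, sigma_hat):
    """Fraction of points outside a central 94% normal band built from sigma_hat."""
    return float(np.mean(np.abs(y) > Z94 * sigma_hat))

well_calibrated = frac_outside(y, 1.0)  # sigma estimated correctly: ~6% outside
inflated = frac_outside(y, 1.5)         # sigma systematically 50% too high

# An inflated sigma leaves far fewer points outside the band than the
# nominal 6%, mimicking the 2-3 out-of-band observations seen in the plot.
assert inflated < well_calibrated
```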
Thoughts on Implementation
So, with all of the above, I am requesting that we:
- allow a PyMC RV prior (or multiple priors when `size > 1`) to be specified on leaf-node values, and/or
- better document how leaf-node values are currently computed, so that at least the mechanism is more transparent.
Thanks for all the hard work. Plugging BART models in as components of larger probabilistic programs is very much a winning idea and should dominate applied workflows where uncertainty quantification is important. I would love to see this cleaned up a little, and I am happy to help with documentation or code changes, but I need more direction on the math/sampling side of things. Thanks again.