Description
Hello! Thanks so much for this package! I'm learning a ton about making inference from random forest models, and I really appreciate the effort you've put into making this more understandable.
I came across an issue when running randomForestExplainer::plot_predict_interaction() on a {ranger} model built using {spatialRF}. It seems that the method used by {randomForestExplainer} to get the list of dependent variable names is fragile, and can error out if the formula syntax wasn't used to create the {ranger} model.
For instance, with {ranger}, you can build a model like this:
forest_ranger <- ranger::ranger(x = mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")], y = mtcars[, "cyl"])
Which will then error out when trying to run:
plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp")
But it doesn't error out when building the same model using the formula syntax:
forest_ranger <- ranger::ranger(cyl ~ ., data = mtcars)
plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp")
The issue arises in this line in {randomForestExplainer}:
The {spatialRF} package doesn't build the {ranger} model using the formula syntax, so randomForestExplainer::plot_predict_interaction() won't work on the resulting model:
forest_ranger <- spatialRF::rf(
  dependent.variable.name = "cyl",
  predictor.variable.names = c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"),
  data = mtcars
)
plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp")
I documented this issue and my workaround in the repo for {spatialRF}, but I thought I'd add it here too, since the issue seems more relevant to {randomForestExplainer} and how it captures what the dependent variables are in a {ranger} model.
It looks like, in a {ranger} model, you can get the independent variables directly from the $forest$independent.variable.names component? Maybe this is a more robust way to capture that info for plot_predict_interaction()?
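To illustrate the idea, here's a minimal sketch of reading the predictor names off the fitted object rather than parsing the model call. The helper name get_predictor_names() is hypothetical (it isn't part of {randomForestExplainer} or {ranger}), and I'm assuming the model was fit with the default write.forest = TRUE so that the $forest component is stored:

```r
library(ranger)

# Hypothetical helper (not part of {randomForestExplainer}): read the
# predictor names from the stored forest instead of parsing forest$call.
get_predictor_names <- function(forest) {
  stopifnot(inherits(forest, "ranger"))
  vars <- forest$forest$independent.variable.names
  if (is.null(vars)) {
    # Assumption: $forest is missing when the model was fit with
    # write.forest = FALSE, so there is nothing to read the names from.
    stop("No stored forest found; was the model fit with write.forest = FALSE?")
  }
  vars
}

predictors <- setdiff(names(mtcars), "cyl")

# Same model built with the x/y interface and with the formula interface
forest_xy <- ranger(x = mtcars[, predictors], y = mtcars[, "cyl"])
forest_formula <- ranger(cyl ~ ., data = mtcars)

# Both calls should list the same predictors, regardless of interface
get_predictor_names(forest_xy)
get_predictor_names(forest_formula)
```

Since this doesn't touch the call at all, it shouldn't matter whether the model came from the formula interface, the x/y interface, or a wrapper like {spatialRF}.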
What do you think?