fragile approach to getting names of independent variables?

Hello! Thanks so much for this package! I'm learning a ton about making inference from random forest models, and I really appreciate the effort you've put into making this more understandable.

I came across an issue when running `randomForestExplainer::plot_predict_interaction()` on a {ranger} model built with {spatialRF}. It seems that the method {randomForestExplainer} uses to get the list of independent variable names is fragile, and can error out if the `formula` syntax wasn't used to create the {ranger} model.

For instance, with {ranger}, you can build a model like this:

```
forest_ranger <- ranger::ranger(x = mtcars[, c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb")], y = mtcars[, "cyl"])
```

That model then errors out when you try to run:

```
plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp")
```

But it doesn't error out when building the same model using the `formula` syntax:

```
forest_ranger <- ranger::ranger(cyl ~ ., data = mtcars)
plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp")
```

The issue arises in this line in {randomForestExplainer}: [https://github.com/ModelOriented/randomForestExplainer/blob/630c4fe9f7ddcc0a9a586dc4c4fc1822e9d30776/R/min_depth_interactions.R#L363](https://github.com/ModelOriented/randomForestExplainer/blob/630c4fe9f7ddcc0a9a586dc4c4fc1822e9d30776/R/min_depth_interactions.R#L363)

The {spatialRF} package doesn't build the {ranger} model using the `formula` syntax, so `randomForestExplainer::plot_predict_interaction()` won't work on the resulting model:

```
forest_ranger <- spatialRF::rf(dependent.variable.name = "cyl", 
                               predictor.variable.names = c("mpg", "disp", "hp", "drat", "wt", "qsec", "vs", "am", "gear", "carb"), 
                               data = mtcars)
plot_predict_interaction(forest_ranger, mtcars, "mpg", "hp")
```

I documented [this issue and my workaround in the repo for {spatialRF}](https://github.com/BlasBenito/spatialRF/issues/10), but I thought I'd add it here too, since the issue seems more relevant to {randomForestExplainer} and how it captures what the independent variables are in a {ranger} model.

It looks like, in a {ranger} model, you can get the independent variables directly from the `$forest$independent.variable.names` component. Maybe this would be a more robust way to capture that info for `plot_predict_interaction()`?
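
For what it's worth, here is a minimal sketch of what I mean. The helper name `get_predictor_names()` is hypothetical (it is not part of {randomForestExplainer}), and I'm assuming the forest was saved with the model, which is the {ranger} default (`write.forest = TRUE`). It only shows that the component is populated whichever interface was used to fit the model; it isn't meant as a drop-in patch.

```r
library(ranger)

# Hypothetical helper (not part of {randomForestExplainer}): read the predictor
# names from the fitted ranger object rather than from the model call.
get_predictor_names <- function(model) {
  model$forest$independent.variable.names
}

predictors <- setdiff(names(mtcars), "cyl")

# The same model fitted through the x/y interface and the formula interface.
forest_xy <- ranger(x = mtcars[, predictors], y = mtcars[, "cyl"], num.trees = 50)
forest_formula <- ranger(cyl ~ ., data = mtcars, num.trees = 50)

names_xy <- get_predictor_names(forest_xy)
names_formula <- get_predictor_names(forest_formula)

print(names_xy)
print(names_formula)
```

Both calls should list the same predictors, so a lookup like this wouldn't depend on how the model was specified.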

What do you think?
