
BART model save, reload and new predictions #123

@twj8CDC

Description

Hi, I have been trying to save, reload, and generate new predictions with a model that includes a BARTRV.

I can save the trace as a pickle (netCDF works too), instantiate a new model, and get posterior predictions on the training data. However, when I try to swap in new data I get shape errors. These errors are odd because, in the session where the model was trained, I can update the data and get predictions without any issues; it is only in the newly instantiated model that updating the input data fails.

Below is a minimal example:

from pathlib import Path

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pymc_bart as pmb

import cloudpickle as cpkl
import dill

print(f"Running on PyMC v{pm.__version__}")
print(f"Running on PyMC-BART v{pmb.__version__}")
try:
    bikes = pd.read_csv(Path("..", "data", "bikes.csv"))
except FileNotFoundError:
    bikes = pd.read_csv(pm.get_data("bikes.csv"))

features = ["hour", "temperature", "humidity", "workingday"]

X = bikes[features]
Y = bikes["count"]

xt = X[0:10]  # small held-out slice used later for out-of-sample predictions
yt = Y[0:10]

with pm.Model() as model_bikes:
    xdata = pm.MutableData("xdata", X)
    a = pm.Exponential("a", 1)
    # BART prior over the log of the expected count
    mu_ = pmb.BART("mu_", xdata, np.log(Y), m=20)
    mu = pm.Deterministic("mu", pm.math.exp(mu_))
    # shape is tied to xdata so the likelihood resizes when the data is swapped
    y = pm.NegativeBinomial("y", mu=mu, alpha=a, observed=Y, shape=xdata.shape[0])
    idata_bikes = pm.sample(random_seed=99, draws=100, tune=100, compute_convergence_checks=False)
idata_bikes

Pickle the idata instead of saving to netCDF; this seems to work fine:

# pickle the InferenceData to disk
with open('test4.pkl', mode='wb') as file:
    cpkl.dump(idata_bikes, file)

with open("test4.pkl", mode="rb") as file:
    idata4 = cpkl.load(file)
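For completeness, the netCDF route mentioned above would look roughly like this (a sketch; the "test4.nc" filename is just a placeholder):

# save / reload the same InferenceData via netCDF instead of pickle
idata_bikes.to_netcdf("test4.nc")
idata4_nc = az.from_netcdf("test4.nc")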

Posterior predictions on the updated data work with the original model, using both the original idata and the saved-and-reloaded idata:

with model_bikes:
    pm.set_data({"xdata": xt})
    post1 = pm.sample_posterior_predictive(idata_bikes, var_names=["mu", "y"])

with model_bikes:
    pm.set_data({"xdata": xt})
    post2 = pm.sample_posterior_predictive(idata4, var_names=["mu", "y"])
print(post1.posterior_predictive["mu"].values.mean((0,1)))
print(post2.posterior_predictive["mu"].values.mean((0,1)))

Restart the session to test loading from a clean slate, and reload the data from above:

from pathlib import Path

import arviz as az
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc as pm
import pymc_bart as pmb

import cloudpickle as cpkl
import dill

print(f"Running on PyMC v{pm.__version__}")
print(f"Running on PyMC-BART v{pmb.__version__}")
try:
    bikes = pd.read_csv(Path("..", "data", "bikes.csv"))
except FileNotFoundError:
    bikes = pd.read_csv(pm.get_data("bikes.csv"))

features = ["hour", "temperature", "humidity", "workingday"]

X = bikes[features]
Y = bikes["count"]

xt = X[0:10]
yt = Y[0:10]

Specify the new model; the only intended difference from the original is the Python variable names:

with pm.Model() as model2:
    # same structure, priors, and variable names as model_bikes above;
    # only the Python identifiers differ
    xdata2 = pm.MutableData("xdata", X)
    a2 = pm.Exponential("a", 1)
    mu_2 = pmb.BART("mu_", xdata2, np.log(Y), m=20)
    mu2 = pm.Deterministic("mu", pm.math.exp(mu_2))
    y2 = pm.NegativeBinomial("y", mu=mu2, alpha=a2, observed=Y, shape=xdata2.shape[0])

Load the saved idata:

with open("test4.pkl", mode="rb") as file:
    idata4 = cpkl.load(file)

Get posterior predictions on the training data:

with model2:
    post5 = pm.sample_posterior_predictive(idata4, var_names=["mu", "y"])

This works, apart from a slight difference in the predicted values, possibly due to a difference in random state. post5 compares well to post1 and post2 above.
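To rule out random state as the source of the small discrepancy, a fixed seed can be passed to the posterior-predictive call (a sketch; the seed value 123 is arbitrary):

with model2:
    # pm.sample_posterior_predictive accepts random_seed, so repeated runs are reproducible
    post5_seeded = pm.sample_posterior_predictive(idata4, var_names=["mu", "y"], random_seed=123)
print(post5_seeded.posterior_predictive["mu"].values.mean((0, 1)))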

Get the posterior predictions with new data:

with model2:
    pm.set_data({"xdata": xt})
    post4 = pm.sample_posterior_predictive(idata4, var_names=["mu", "y"])

This fails with the following error:

ValueError: size does not match the broadcast shape of the parameters. (10,), (10,), (348,)
Apply node that caused the error: nbinom_rv{0, (0, 0), int64, True}(RandomGeneratorSharedVariable(<Generator(PCG64) at 0x7F1CCA3F2A40>), MakeVector{dtype='int64'}.0, 4, a, Composite{...}.1)
Toposort index: 5
Inputs types: [RandomGeneratorType, TensorType(int64, shape=(1,)), TensorType(int64, shape=()), TensorType(float64, shape=()), TensorType(float64, shape=(None,))]
Inputs shapes: ['No shapes', (1,), (), (), (348,)]
Inputs strides: ['No strides', (8,), (), (), (8,)]
Inputs values: [Generator(PCG64) at 0x7F1CCA3F2A40, array([10]), array(4), array(1.50162583), 'not shown']
Outputs clients: [['output'], ['output']]

HINT: Re-running with most PyTensor optimizations disabled could provide a back-trace showing when this node was created. This can be done by setting the PyTensor flag 'optimizer=fast_compile'. If that does not work, PyTensor optimizations can be disabled with 'optimizer=None'.
HINT: Use the PyTensor flag `exception_verbosity=high` for a debug print-out and storage map footprint of this Apply node.
...

I can't figure out where this shape error arises. The trained model specified at the top allows updating the data without issues, so I am not sure whether there is a general problem with the way the model is specified.
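For what it is worth, checking the shared data after set_data suggests the swap itself does happen (a sketch; indexing the model by variable name and the expected shapes are my assumptions):

with model2:
    pm.set_data({"xdata": xt})
# the shared "xdata" variable should now hold the 10-row slice...
print(model2["xdata"].get_value().shape)  # expected: (10, 4)
# ...yet the error above still reports a (348,)-shaped parameter,
# i.e. the number of rows in the full training data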

Is there a different process for saving and reloading a model with a BARTRV?

I also tried pickling the whole model, but that does not work because of the multiprocessing components in the BART object; I get a socket error when trying to reload the pickled object.
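For reference, the whole-model attempt (in the original training session) looked roughly like this; the bundling into a dict and the "model_and_trace.pkl" filename are incidental:

# attempt to pickle the fitted model object alongside its trace
with open("model_and_trace.pkl", mode="wb") as file:
    cpkl.dump({"model": model_bikes, "idata": idata_bikes}, file)

# reloading this in a fresh session is what raises the socket error,
# presumably from the multiprocessing machinery inside the BART RV
with open("model_and_trace.pkl", mode="rb") as file:
    bundle = cpkl.load(file)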

I have also posted this on Discourse, as I was not sure where it makes the most sense to discuss this issue.

Thanks!
