Upgrade synthetic control to model multiple treated units

## What
Currently, the synthetic control functionality is limited to a single treated unit, which is the minimum needed for a working synthetic control analysis. This still offers non-trivial functionality: the docs include a generic example with simulated data and an analysis of the effects of Brexit (where the UK is the only treated unit).

However, many situations involve more than one treated unit. This can happen in many domains, but it is especially common in marketing geolift studies. We have a docs page on geolift with a single treated geo, and another on multi-cell geolift analysis with multiple treated geos. The latter currently walks through a pooled analysis, where we simply average the outcome variable across the treated geos and then model the result as a single treated unit case of synthetic control. The alternative is to treat the geos as unpooled, in which case we simply run _multiple_ independent single treated unit synthetic control analyses.
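As a rough illustration of the difference between the two existing approaches, here is a minimal NumPy sketch. The data are simulated and the variable names are hypothetical; this is not the code used in the docs page.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical outcome data: 10 time points, 3 treated geos (one column each).
treated = rng.normal(loc=100.0, scale=5.0, size=(10, 3))

# Pooled: average the treated geos into one series, then fit a single
# synthetic control model to that series.
pooled_outcome = treated.mean(axis=1)

# Unpooled: keep each geo separate and fit one independent model per column.
unpooled_outcomes = [treated[:, j] for j in range(treated.shape[1])]

print(pooled_outcome.shape)    # (10,)
print(len(unpooled_outcomes))  # 3
```

Neither approach shares information about the control weights across treated geos, which is the gap this issue aims to close.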

## Why
This issue proposes that we add the ability to model multiple treated units (or geos). This has a number of motivations:
* it is a more general solution
* it would allow a single modeling approach to geo testing (or any other multiple treatment unit situation)
* it would retain the full flexibility of the pooled and unpooled analysis approaches, and would add a new partially pooled analysis where information could be shared across weights.
* it will lay the foundation for implementing synthetic differences in differences #47

## Changes

### Changes to the `WeightedSumFitter` class

This PyMC model class would need to be changed so that we have a weight _matrix_, rather than a weight _vector_.

https://github.com/pymc-labs/CausalPy/blob/4227edf88fbc861181b9adfa8b0f949fc306f2be/causalpy/pymc_models.py#L254-L271

So rather than `dims="coeffs"` (where `coeffs` corresponds to control units), it would be `dims=("control_units", "treated_units")`. This would give us an unpooled set of weights over the control units for each treated unit. A later step could then implement partial pooling of these weights across the `treated_units` dimension.
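To make the shapes concrete, here is a minimal NumPy sketch of the weight matrix idea. It is only an illustration of the linear algebra, not a PyMC model. The sizes and variable names are assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
n_time, n_control, n_treated = 20, 4, 2

# Hypothetical control-unit outcomes, shape (time, control_units).
X_control = rng.normal(size=(n_time, n_control))

# One weight vector per treated unit, each drawn from a flat Dirichlet so it
# is non-negative and sums to 1. rng.dirichlet returns an array of shape
# (n_treated, n_control), so transpose to get (control_units, treated_units).
W = rng.dirichlet(np.ones(n_control), size=n_treated).T

# Each column of mu is the synthetic control for one treated unit.
mu = X_control @ W  # shape (time, treated_units)

print(W.shape)   # (4, 2)
print(mu.shape)  # (20, 2)
```

One thing to check when porting this to PyMC: `pm.Dirichlet` places the simplex constraint on the last axis, so the variable may need to be declared with dims `("treated_units", "control_units")` and transposed before the dot product.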

The `WeightedSumFitter.build_model` method would also need to change, because the raw data would no longer be long form: the incoming data (currently a design matrix `X`) would now be a 2D matrix, probably of shape `("time", "unit")`.
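For readers unfamiliar with the distinction, this small pandas sketch shows long-form data being reshaped to the wide `("time", "unit")` layout. The column names `time`, `unit`, and `y` are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical long-form data: one row per (time, unit) observation.
long_df = pd.DataFrame(
    {
        "time": [0, 0, 0, 1, 1, 1],
        "unit": ["a", "b", "c", "a", "b", "c"],
        "y": [1.0, 2.0, 3.0, 1.5, 2.5, 3.5],
    }
)

# Wide form: rows are time points, columns are units.
wide_df = long_df.pivot(index="time", columns="unit", values="y")

print(wide_df.shape)  # (2, 3)
```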

### Changes to the `SyntheticControl` class

* `SyntheticControl` would no longer inherit from the `PrePostFit` class, so all the logic currently in `PrePostFit.__init__` would move to the new `SyntheticControl.__init__`. This would leave `InterruptedTimeSeries` as the only class that inherits from `PrePostFit`, so there would be an opportunity to collapse that class hierarchy, but that is a peripheral issue. The core point is that `SyntheticControl` would change a lot.
* The incoming dataframe is still split into pre and post treatment
* Remove the `formula` argument and no longer use a design matrix approach (with patsy). This would result in quite a lot of change to the logic in `SyntheticControl.__init__`
* Update the `_bayesian_plot` method. 
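A rough sketch of what the formula-free data preparation could look like is below. The helper name `split_wide_data`, its arguments, and the convention that the pre period is `index < treatment_time` are all assumptions for illustration, not the proposed API.

```python
import pandas as pd


def split_wide_data(df, treatment_time, control_units, treated_units):
    """Hypothetical helper: split a (time, unit) dataframe into four arrays."""
    pre = df[df.index < treatment_time]
    post = df[df.index >= treatment_time]
    return {
        "pre_control": pre[control_units].to_numpy(),
        "pre_treated": pre[treated_units].to_numpy(),
        "post_control": post[control_units].to_numpy(),
        "post_treated": post[treated_units].to_numpy(),
    }


# Toy wide-form data: 4 time points, 2 control units, 2 treated units.
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [2.0, 3.0, 4.0, 5.0],
        "t1": [1.5, 2.5, 5.0, 6.0],
        "t2": [1.2, 2.2, 4.0, 5.0],
    },
    index=[0, 1, 2, 3],
)

blocks = split_wide_data(
    df, treatment_time=2, control_units=["a", "b"], treated_units=["t1", "t2"]
)
print(blocks["pre_control"].shape)   # (2, 2)
print(blocks["post_treated"].shape)  # (2, 2)
```

With this layout the user names the control and treated columns directly, which is what would replace the `formula` argument.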

### Changes to tests
* Update all the integration tests to deal with the changed API
* Add new tests to cover the new multiple treated unit case

### Changes to docs
* We'd have to update the docs to use the new API.
* We would also want to update the existing multi-cell geolift analysis docs.

For reference, the current `WeightedSumFitter.build_model` method from the permalink above:

	def build_model(self, X, y, coords):
	    """
	    Defines the PyMC model
	    """
	    with self:
	        self.add_coords(coords)
	        n_predictors = X.shape[1]
	        X = pm.Data("X", X, dims=["obs_ind", "coeffs"])
	        y = pm.Data("y", y[:, 0], dims="obs_ind")
	        # TODO: we should allow user-specified priors here
	        beta = pm.Dirichlet("beta", a=np.ones(n_predictors), dims="coeffs")
	        # beta = pm.Dirichlet(
	        #     name="beta", a=(1 / n_predictors) * np.ones(n_predictors),
	        #     dims="coeffs"
	        # )
	        sigma = pm.HalfNormal("sigma", 1)
	        mu = pm.Deterministic("mu", pm.math.dot(X, beta), dims="obs_ind")
	        pm.Normal("y_hat", mu, sigma, observed=y, dims="obs_ind")

Upgrade synthetic control to model multiple treated units #456
