Propensity score Joint Estimation Versus 2 stage Estimation #500

Open
wants to merge 23 commits into base: main

Conversation

NathanielF
Contributor

@NathanielF NathanielF commented Jul 6, 2025

This is a draft PR to improve or augment the existing propensity score weighting implementation, in particular to make it a bit faster and more "Bayesian". As it currently stands, we use a two-step strategy: we fit a propensity score model and then push the posterior estimates of the propensity score through a re-weighting routine to estimate the causal contrast.

But we could explore a more properly Bayesian model in which we fit the propensity score model and the outcome model at once, in the same model context. This is more properly Bayesian and a good bit faster.

See for instance, work here: https://github.com/ajnafa/Latent-Bayesian-MSM by Jordan Nafa and Andrew Heiss
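
For readers unfamiliar with the re-weighting step of the two-stage strategy, here is a minimal sketch in plain Python. It is not the CausalPy implementation: the function name `ipw_ate`, the toy data, and the use of the normalised (Hajek) weighting form are all assumptions made for illustration. In the two-stage setting, `p` would be one posterior draw of the propensity scores from the first-stage model; here the true propensities are simply plugged in.

```python
def ipw_ate(outcomes, treatments, propensities):
    """Normalised (Hajek) inverse-propensity-weighted estimate of the ATE.

    Hypothetical helper for illustration only.
    """
    w_treated = wy_treated = 0.0
    w_control = wy_control = 0.0
    for y_i, t_i, p_i in zip(outcomes, treatments, propensities):
        if t_i == 1:
            w = 1.0 / p_i            # treated units are weighted by 1 / p
            w_treated += w
            wy_treated += w * y_i
        else:
            w = 1.0 / (1.0 - p_i)    # control units are weighted by 1 / (1 - p)
            w_control += w
            wy_control += w * y_i
    return wy_treated / w_treated - wy_control / w_control


# Toy data: a binary confounder x drives both treatment uptake and the outcome.
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 0, 0, 0, 1, 1, 1, 0]
y = [2 * t_i + 10 * x_i for t_i, x_i in zip(t, x)]   # true treatment effect is 2
p = [0.25 if x_i == 0 else 0.75 for x_i in x]        # propensity score per unit

treated = [y_i for y_i, t_i in zip(y, t) if t_i == 1]
control = [y_i for y_i, t_i in zip(y, t) if t_i == 0]
naive = sum(treated) / len(treated) - sum(control) / len(control)

print("naive difference in means:", naive)
print("IPW estimate:", ipw_ate(y, t, p))
```

On this toy data the naive difference in means is confounded by `x`, while the weighted contrast recovers the effect built into `y`.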

---- EDIT ----

So I think I've finally understood the role of the propensity score in Bayesian estimation, and I demonstrate that the two-stage method is what we actually want. The changes I've made are actually quite small: I've added a function to the PyMC model associated with the inverse propensity experiment class.

But in the notebook example I show why this two-stage process is to be preferred over the joint fit model.


📚 Documentation preview 📚: https://causalpy--500.org.readthedocs.build/en/500/

codecov bot commented Jul 6, 2025

Codecov Report

Attention: Patch coverage is 96.36364% with 2 lines in your changes missing coverage. Please review.

Project coverage is 94.62%. Comparing base (fdce5b0) to head (6160a3f).
Report is 2 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| causalpy/pymc_models.py | 95.12% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #500      +/-   ##
==========================================
+ Coverage   94.59%   94.62%   +0.03%     
==========================================
  Files          28       28              
  Lines        2053     2104      +51     
==========================================
+ Hits         1942     1991      +49     
- Misses        111      113       +2     

@NathanielF NathanielF marked this pull request as ready for review July 11, 2025 22:58
@NathanielF NathanielF changed the title Propensity score latent - DRAFT Propensity score Joint Estimation Versus 2 stage Estimation Jul 11, 2025
review-notebook-app bot commented Jul 12, 2025

juanitorduz commented on 2025-07-12T08:29:09Z
----------------------------------------------------------------

Can you briefly describe what you want to compare, just to keep the storyline flow?


juanitorduz commented on 2025-07-12T08:29:10Z
----------------------------------------------------------------

Can you please remove this output ?


juanitorduz commented on 2025-07-12T08:29:11Z
----------------------------------------------------------------

@NathanielF Thank you for this simulation! I wanted to do this for years, and this makes everything very clear!


juanitorduz commented on 2025-07-12T08:29:11Z
----------------------------------------------------------------

Could you please add an explanation for this section?


juanitorduz commented on 2025-07-12T08:29:12Z
----------------------------------------------------------------

Can you please add an explanation of what these numbers represent (again, for the non-expert users)?


juanitorduz commented on 2025-07-12T08:29:12Z
----------------------------------------------------------------

can you please remove this output?


juanitorduz commented on 2025-07-12T08:29:13Z
----------------------------------------------------------------

Can you add an explanation of this model comparison?


juanitorduz commented on 2025-07-12T08:29:13Z
----------------------------------------------------------------

Again, can you please add some explanatory text here?


juanitorduz commented on 2025-07-12T08:29:14Z
----------------------------------------------------------------

Can you please remove this output?


juanitorduz commented on 2025-07-12T08:29:14Z
----------------------------------------------------------------

Can you please remove this output?


Signed-off-by: Nathaniel <[email protected]>
@NathanielF NathanielF requested a review from drbenvincent July 13, 2025 05:28
review-notebook-app bot commented Jul 16, 2025

drbenvincent commented on 2025-07-16T19:59:40Z
----------------------------------------------------------------

I've not seen this mp_ctx stuff before. Any chance you can add in a link, maybe to the PyMC docs? Feel free to expand on this, possibly with an admonition box if you think it's worth it. What does it do? Is it necessary? What will happen with this setting on non-Macs?
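
As background for the question, here is a minimal standard-library sketch. It assumes `mp_ctx` refers to the `mp_ctx` argument of `pm.sample`, which selects the Python multiprocessing context used for parallel chains; the value the notebook actually passes is not shown in this thread. The helper `choose_start_method` is hypothetical. The available start methods differ by platform: `spawn` exists everywhere, while `fork` and `forkserver` are not available on Windows.

```python
import multiprocessing as mp


def choose_start_method(preferred="forkserver"):
    """Return `preferred` if this platform supports it, else fall back to "spawn".

    Hypothetical helper for illustration; "spawn" is available on every platform.
    """
    available = mp.get_all_start_methods()
    return preferred if preferred in available else "spawn"


method = choose_start_method()
ctx = mp.get_context(method)  # a context object bound to that start method

print("available start methods:", mp.get_all_start_methods())
print("chosen start method:", ctx.get_start_method())
```

Under that assumption, either the string `method` or the context object `ctx` is the kind of value that would be handed to the sampler, and a platform-aware fallback like this would avoid an error on systems where the preferred method does not exist.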


drbenvincent commented on 2025-07-16T19:59:41Z
----------------------------------------------------------------

Any way we can use the align environment here to get the equations lined up? Might need to expand into more lines - could help readability anyway

Can we add a proper glossary term for the first mention of propensity score?

Where you say "we've seen", where is that? Linkify please.


drbenvincent commented on 2025-07-16T19:59:41Z
----------------------------------------------------------------

Possibly annoying, but if it's possible to add [sphinx] internal links to the sections (in your bullet point outline), that would be cool

For the links to blog posts, I've got a mild preference to add these in as bibtex references rather than links


drbenvincent commented on 2025-07-16T19:59:42Z
----------------------------------------------------------------

Will leave it up to you, but is it worth adding a graphviz (or daft) DAG here?


drbenvincent commented on 2025-07-16T19:59:42Z
----------------------------------------------------------------

It could be worth adding a note admonition box here. You've got a number of quite big code blocks, so it could be worth flagging that the nice CausalPy API for "just making it work" will be coming up in a later section, and that this is exploratory/illustrative code meant to explore the topic.


drbenvincent commented on 2025-07-16T19:59:43Z
----------------------------------------------------------------

Line #7.    N = df1.shape[0]

Might be overkill, but I like keeping the global scope relatively tidy. Would it be worth making a simple dataclass here? Consider that an idea; I'll leave it up to you to decide whether it's worth it.
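
A minimal sketch of what such a dataclass could look like. The class name `ModelInputs` is hypothetical; the field names are borrowed from the `make_model(X_trt, X_outcome, T_data, Y_data, coords, ...)` call quoted elsewhere in this review, and the toy values are made up.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class ModelInputs:
    """Bundle the arrays a model-building function needs (hypothetical container)."""

    X_trt: Any       # design matrix for the treatment (propensity) model
    X_outcome: Any   # design matrix for the outcome model
    T_data: Any      # observed treatment indicator
    Y_data: Any      # observed outcome
    coords: dict = field(default_factory=dict)

    @property
    def N(self) -> int:
        # Number of observations, derived rather than stored as a loose global.
        return len(self.Y_data)


inputs = ModelInputs(
    X_trt=[[1.0, 0.3], [1.0, 0.9]],
    X_outcome=[[1.0, 0.3], [1.0, 0.9]],
    T_data=[0, 1],
    Y_data=[1.2, 3.4],
    coords={"obs": [0, 1]},
)
print(inputs.N)
```

Deriving `N` from the data keeps it from drifting out of sync with the arrays, which is one of the risks of a free-standing `N = df1.shape[0]` in the global scope.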


drbenvincent commented on 2025-07-16T19:59:44Z
----------------------------------------------------------------

Line #30.        with pm.Model(coords=coords) as model:

Also, sorry if annoying, but for docs I think it helps a lot if we use dims extensively so that we get the dimension labels (not just the sizes) in the graphviz


drbenvincent commented on 2025-07-16T19:59:44Z
----------------------------------------------------------------

I'm very tempted here to add a reference about the spline component and a bit of explanation/intuition on how this works with "non-linearity in the treatment effect". Presumably this is the treatment effect as a function of propensity? Can you give some background or intuition, with example situations where there might be a non-linear effect? Feel free to add it in an admonition box so that we don't interrupt the logical flow of the notebook.


drbenvincent commented on 2025-07-16T19:59:45Z
----------------------------------------------------------------

Line #78.            chosen = np.random.choice(range(propensity_scores.shape[1]))

Not sure why, but I found this very interesting. If you felt like it, you could add an admonition box briefly highlighting this as a cool way to get distributional information from one model into another.

Would that be useful in other 2-stage models, like instrumental variables? Though from memory you implemented that as a 1-stage model?
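
The idea in the quoted line can be sketched without NumPy as follows. This assumes, based on the `.shape[1]` in the quoted code, that `propensity_scores` is laid out as observations by posterior draws; the helper name and the numbers are made up for illustration.

```python
import random


def pick_propensity_draw(propensity_scores, rng):
    """Pick one posterior draw at random and return (index, scores for that draw).

    Assumes one row per observation and one column per posterior draw.
    """
    n_draws = len(propensity_scores[0])
    chosen = rng.randrange(n_draws)
    return chosen, [row[chosen] for row in propensity_scores]


# Three observations, three posterior draws each (made-up numbers).
propensity_scores = [
    [0.20, 0.25, 0.22],
    [0.70, 0.74, 0.69],
    [0.50, 0.48, 0.55],
]

rng = random.Random(0)  # seeded so the sketch is reproducible
chosen, draw = pick_propensity_draw(propensity_scores, rng)
print(chosen, draw)
```

Repeating the second-stage fit over different randomly chosen draws is what carries the first-stage uncertainty into the outcome model, rather than conditioning on a single point estimate of the propensity score.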


drbenvincent commented on 2025-07-16T19:59:45Z
----------------------------------------------------------------

Line #9.    joint_model = make_model(X_trt, X_outcome, T_data, Y_data, coords, priors=priors)

Can we rename make_model into something like make_joint_model?


drbenvincent commented on 2025-07-16T19:59:46Z
----------------------------------------------------------------

Can we add a hide-input cell tag on this, and maybe some brief markdown explanation of the table? What should the reader take away from this big table of summary stats that's most relevant to the story?


drbenvincent commented on 2025-07-16T19:59:47Z
----------------------------------------------------------------

Got any references for the reader to follow up the Bayesian feedback / collider bias via the likelihood?

Raised earlier, but any implications for 1 vs 2 stage methods of estimation in instrumental variables?


drbenvincent commented on 2025-07-16T19:59:48Z
----------------------------------------------------------------

Can't remember if we have it, but add glossary term for potential outcomes please


drbenvincent commented on 2025-07-16T19:59:48Z
----------------------------------------------------------------

Can we output just as the raw scalars? The DataArray HTML repr is adding a bit of clutter here


drbenvincent commented on 2025-07-16T19:59:49Z
----------------------------------------------------------------

Same point as above


drbenvincent commented on 2025-07-16T19:59:49Z
----------------------------------------------------------------

Add link or reference again here please


drbenvincent commented on 2025-07-16T19:59:50Z
----------------------------------------------------------------

"Here we some divergence" grammatical issue


drbenvincent commented on 2025-07-16T19:59:51Z
----------------------------------------------------------------

add hide-input cell tag


drbenvincent commented on 2025-07-16T19:59:51Z
----------------------------------------------------------------

Line #1.    compare_prop_dists(idata_treatment_2s_lalonde, idata_lalonde)

I missed this when it was defined, but can we change the function name to have "propensity" fully spelled out?


drbenvincent commented on 2025-07-16T19:59:52Z
----------------------------------------------------------------

Missing "." at end of last sentence.


drbenvincent commented on 2025-07-16T19:59:53Z
----------------------------------------------------------------

Add link/reference. This is a long notebook, so the reader probably has no memory of what the NHEFS data is :)


drbenvincent commented on 2025-07-16T19:59:53Z
----------------------------------------------------------------

Add a hide-output cell tag? Not 100% sure about this one.


drbenvincent commented on 2025-07-16T19:59:54Z
----------------------------------------------------------------

Also not 100% sure, but consider adding hide-output cell tag


drbenvincent commented on 2025-07-16T19:59:54Z
----------------------------------------------------------------

Just a similar comment to above. This is very interesting and I think readers will want to know where to read more. Can you add in some citations here, including on the doubly-robust inference.

Any chance of adding a glossary term for doubly-robust inference?
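
For readers who want the gist of "doubly robust" before following up references, here is a minimal sketch of the standard augmented inverse-propensity-weighted (AIPW) estimator in plain Python. It is a textbook form, not necessarily what the notebook implements; the function name `aipw_ate` and the toy data are assumptions for illustration. The estimate stays on target if either the propensity model or the outcome model is correct.

```python
def aipw_ate(y, t, p, mu1, mu0):
    """Augmented IPW estimate of the ATE.

    p   : propensity score per unit
    mu1 : outcome-model prediction under treatment, per unit
    mu0 : outcome-model prediction under control, per unit
    """
    total = 0.0
    for y_i, t_i, p_i, m1, m0 in zip(y, t, p, mu1, mu0):
        total += m1 - m0                                # outcome-model contrast
        total += t_i * (y_i - m1) / p_i                 # weighted residual, treated
        total -= (1 - t_i) * (y_i - m0) / (1.0 - p_i)   # weighted residual, control
    return total / len(y)


# Toy data with a binary confounder x; the true treatment effect is 2.
x = [0, 0, 0, 0, 1, 1, 1, 1]
t = [1, 0, 0, 0, 1, 1, 1, 0]
y = [2 * t_i + 10 * x_i for t_i, x_i in zip(t, x)]
true_p = [0.25 if x_i == 0 else 0.75 for x_i in x]
true_mu1 = [2 + 10 * x_i for x_i in x]
true_mu0 = [10 * x_i for x_i in x]

# Wrong propensity model (constant 0.5), correct outcome model.
est_outcome_ok = aipw_ate(y, t, [0.5] * 8, true_mu1, true_mu0)
# Correct propensity model, wrong outcome model (predicts 0 everywhere).
est_propensity_ok = aipw_ate(y, t, true_p, [0.0] * 8, [0.0] * 8)

print(est_outcome_ok, est_propensity_ok)
```

The toy data is constructed so that the treated fraction within each stratum of `x` matches the propensity exactly, which is why the second estimate is exact here; on real data it would only be approximately so.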

