Propensity score Joint Estimation Versus 2 stage Estimation #500
base: main
Conversation
Signed-off-by: Nathaniel <[email protected]>
Codecov Report

```
@@            Coverage Diff             @@
##             main     #500      +/-   ##
==========================================
+ Coverage   94.59%   94.62%   +0.03%
==========================================
  Files          28       28
  Lines        2053     2104      +51
==========================================
+ Hits         1942     1991      +49
- Misses        111      113       +2
==========================================
```

View full report in Codecov by Sentry.
juanitorduz commented on 2025-07-12T08:29:09Z: Can you briefly describe what you want to compare, just to keep the storyline flowing?

juanitorduz commented on 2025-07-12T08:29:10Z: Can you please remove this output?

juanitorduz commented on 2025-07-12T08:29:11Z: @NathanielF Thank you for this simulation! I wanted to do this for years, and this makes everything very clear!

juanitorduz commented on 2025-07-12T08:29:11Z: Could you please add an explanation for this section?

juanitorduz commented on 2025-07-12T08:29:12Z: Can you please add an explanation of what these numbers represent (again, for non-expert users)?

juanitorduz commented on 2025-07-12T08:29:12Z: Can you please remove this output?

juanitorduz commented on 2025-07-12T08:29:13Z: Can you add an explanation of this model comparison?

juanitorduz commented on 2025-07-12T08:29:13Z: Again, can you please add some explanatory text here?

juanitorduz commented on 2025-07-12T08:29:14Z: Can you please remove this output?

juanitorduz commented on 2025-07-12T08:29:14Z: Can you please remove this output?
drbenvincent commented on 2025-07-16T19:59:40Z: Not seen this mp_ctx stuff before. Any chance you can add a link, maybe to the PyMC docs? Feel free to expand on this, possibly with an admonition box if you think it's worth it. What does it do? Is it necessary? What will happen with this setting on non-Macs?

drbenvincent commented on 2025-07-16T19:59:41Z: Any way we can use the align environment here to get the equations lined up? Might need to expand into more lines, which could help readability anyway. Can we add a proper glossary term for the first mention of "propensity score"? Where you say "we've seen", where is that? Linkify, please.

drbenvincent commented on 2025-07-16T19:59:41Z: Possibly annoying, but if it's possible to add Sphinx internal links to the sections (in your bullet-point outline), that would be cool. For the links to blog posts, I've got a mild preference for adding these as BibTeX references rather than links.

drbenvincent commented on 2025-07-16T19:59:42Z: Will leave it up to you, but is it worth adding a graphviz (or daft) DAG here?

drbenvincent commented on 2025-07-16T19:59:42Z: Could potentially be worth adding a note admonition box. You've got a number of quite big code blocks, so it could be worth flagging that the nice CausalPy API for "just making it work" will be coming up in a later section, and that this is exploratory/illustrative code in order to explore the topic.

drbenvincent commented on 2025-07-16T19:59:43Z: Line #7. N = df1.shape[0] Might be overkill, but I like keeping the global scope relatively tidy. Would it be worth making a simple dataclass here? Consider that an idea; I'll leave it up to you as to whether it's worth it.

drbenvincent commented on 2025-07-16T19:59:44Z: Line #30. with pm.Model(coords=coords) as model: Also, sorry if annoying, but for docs I think it helps a lot if we use dims extensively so that we get the dimension labels (not just the sizes) in the graphviz.

drbenvincent commented on 2025-07-16T19:59:44Z: I'm very tempted here to add a reference about the spline component and a bit of explanation/intuition on how this works with "non-linearity in the treatment effect". Presumably this is treatment effect as a function of propensity? Can you give some background or intuition about example situations where there might be a non-linear effect? Feel free to add it in an admonition box so that we don't interrupt the logical flow of the notebook.

drbenvincent commented on 2025-07-16T19:59:45Z: Line #78. chosen = np.random.choice(range(propensity_scores.shape[1])) Not sure why, but I found this very interesting. If you felt like it, you could add an admonition box briefly highlighting this as a cool way to get distributional information from one model into another. Would that be useful in other two-stage models, like instrumental variables? Though from memory you implemented that as a one-stage model?
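For readers following along, the draw-selection trick discussed in this comment can be illustrated with a small NumPy sketch. Everything here (the array layout, the fake "posterior", the function name) is invented for illustration and may differ from the notebook's actual code:

```python
import numpy as np

rng = np.random.default_rng(7)
n_draws, n_units = 500, 100

# Hypothetical stand-in for a stage-one posterior: each row is one MCMC draw
# of the propensity scores for all units (faked here as noisy copies of a
# "true" score, clipped away from 0 and 1).
true_scores = rng.uniform(0.2, 0.8, size=n_units)
propensity_draws = np.clip(
    true_scores + rng.normal(scale=0.05, size=(n_draws, n_units)), 0.01, 0.99
)

def sample_scores():
    # Pick one whole posterior draw at random each time the second stage
    # needs propensity scores, so stage-one uncertainty flows through,
    # rather than plugging in a single posterior-mean point estimate.
    chosen = rng.integers(propensity_draws.shape[0])
    return propensity_draws[chosen]

# Averaging many such picks recovers the posterior mean, but any single
# pick carries the draw-to-draw variability a plug-in estimate would lose.
picks = np.stack([sample_scores() for _ in range(2000)])
```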
drbenvincent commented on 2025-07-16T19:59:45Z: Line #9. joint_model = make_model(X_trt, X_outcome, T_data, Y_data, coords, priors=priors) Can we rename make_model to something like make_joint_model?

drbenvincent commented on 2025-07-16T19:59:46Z: Can you add a hide-input cell tag on this and maybe some brief markdown explanation of the table? What should the reader take away from this big table of summary stats that's most relevant to the story?

drbenvincent commented on 2025-07-16T19:59:47Z: Got any references for the reader to follow up on the Bayesian feedback / collider bias via the likelihood? Raised earlier, but any implications for one- vs two-stage methods of estimation in instrumental variables?

drbenvincent commented on 2025-07-16T19:59:48Z: Can't remember if we have it, but please add a glossary term for "potential outcomes".

drbenvincent commented on 2025-07-16T19:59:48Z: Can we output just the raw scalars? The DataArray HTML repr is adding a bit of clutter here.

drbenvincent commented on 2025-07-16T19:59:49Z: Same point as above.

drbenvincent commented on 2025-07-16T19:59:49Z: Add a link or reference again here, please.

drbenvincent commented on 2025-07-16T19:59:50Z: "Here we some divergence" is a grammatical issue.

drbenvincent commented on 2025-07-16T19:59:51Z: Add hide-input cell tag.

drbenvincent commented on 2025-07-16T19:59:51Z: Line #1. compare_prop_dists(idata_treatment_2s_lalonde, idata_lalonde) Missed this when it was defined, but can we change the function name to have "propensity" fully spelled out?

drbenvincent commented on 2025-07-16T19:59:52Z: Missing "." at end of last sentence.

drbenvincent commented on 2025-07-16T19:59:53Z: Add a link/reference. This is a long notebook, so the reader probably has no memory of what the NHEFS data is :)

drbenvincent commented on 2025-07-16T19:59:53Z: Add hide-output cell tag? Not 100% sure about this one.

drbenvincent commented on 2025-07-16T19:59:54Z: Also not 100% sure, but consider adding a hide-output cell tag.

drbenvincent commented on 2025-07-16T19:59:54Z: Just a similar comment to above. This is very interesting and I think readers will want to know where to read more. Can you add some citations here, including on doubly-robust inference? Any chance of adding a glossary term for "doubly-robust inference"?
Working on a draft PR to improve or augment the existing propensity-score weighting implementation, in particular to make it a bit faster and more "Bayesian". As it currently stands, we perform a two-step strategy: we fit a propensity score model, then push the posterior estimates of the propensity score through a re-weighting routine to estimate the causal contrast.

But we could explore a more properly Bayesian model in which we fit the propensity score and the outcome at once, in the same model context. This would be more properly Bayesian and a good bit faster.

See, for instance, the work by Jordan Nafa and Andrew Heiss here: https://github.com/ajnafa/Latent-Bayesian-MSM
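The two-step strategy described above can be sketched in plain NumPy. This is an illustrative toy, not CausalPy's actual implementation: the simulated data, the hand-rolled logistic regression, and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 5000
x = rng.normal(size=N)                       # observed confounder

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

T = rng.binomial(1, sigmoid(x))              # treatment assignment depends on x
Y = 2.0 * T + x + rng.normal(size=N)         # outcome; true ATE = 2

# --- Stage 1: fit a logistic propensity model p(T=1 | x) by gradient descent ---
X = np.column_stack([np.ones(N), x])
beta = np.zeros(2)
for _ in range(2000):
    p_hat = sigmoid(X @ beta)
    beta -= 0.5 * X.T @ (p_hat - T) / N      # gradient of the mean negative log-likelihood

p_hat = sigmoid(X @ beta)

# --- Stage 2: push the propensity scores through inverse-probability weighting ---
ate_ipw = np.mean(T * Y / p_hat) - np.mean((1 - T) * Y / (1 - p_hat))
ate_naive = Y[T == 1].mean() - Y[T == 0].mean()  # confounded difference in means
```

The naive difference in means is biased upward by the confounder, while the re-weighted contrast recovers something close to the true effect of 2.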
---- EDIT ----
So I think I've finally understood the role of the propensity score in Bayesian estimation, and I demonstrate that the two-stage method is what we actually want. The changes I've made are actually quite small: I've added a function to the PyMC model associated with the inverse-propensity experiment class.

But in the notebook example I show why this two-stage process is to be preferred over the joint-fit model.
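One way to see why the joint fit behaves differently is to look at the gradient of the joint log-likelihood: when the propensity score enters the outcome mean, the outcome residuals push on the propensity coefficients ("Bayesian feedback"), whereas a two-stage fit cuts that feedback by fixing the propensity model in stage one. The NumPy sketch below is purely illustrative; the model form and all names are invented here, not taken from the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000
x = rng.normal(size=N)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

b_true = 1.0
T = rng.binomial(1, sigmoid(b_true * x))
Y = 2.0 * T + x + rng.normal(size=N)

# Hypothetical joint model: the outcome mean uses the propensity score,
# mu_i = tau * T_i + gamma * p_i(b), so b appears in BOTH likelihoods.
tau, gamma, b = 2.0, 1.0, b_true
p_b = sigmoid(b * x)
mu = tau * T + gamma * p_b

# Gradient of the joint log-likelihood with respect to b, split into its
# two sources:
grad_treatment = np.sum((T - p_b) * x)                          # treatment likelihood
grad_feedback = np.sum((Y - mu) * gamma * p_b * (1 - p_b) * x)  # outcome likelihood

# In a joint fit, grad_feedback is non-zero: outcome residuals reshape the
# propensity coefficients. A two-stage fit sets this term to zero by
# construction, which is the "cut" the notebook argues for.
```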
📚 Documentation preview 📚: https://causalpy--500.org.readthedocs.build/en/500/