GSoC 2026 projects
New contributors should first read the contributing guide and learn the basics of PyTensor. They should also read through some of the examples in the PyMC docs.
To be considered as a GSoC student, you should make a PR to PyMC / PyTensor. It can be something small, like a doc fix or a simple bug fix. Some beginner-friendly issues can be found here.
If you are a student interested in participating, please contact us via our Discourse site.
Below is a list of possible topics for your GSoC project. We are also open to other topics; contact us on Discourse. Keep in mind that these are only ideas, and that some of them can't be completely solved in a single GSoC project. When writing your proposal, choose some specific tasks and make sure your proposal is adequate for the GSoC time commitment. We expect all projects to be 350h projects; if you'd like to be considered for a 175h project, you must reach out on Discourse. We will not accept 175h applications from people with whom we haven't discussed their time commitments before submitting the application.
This project will build on previous GSoC projects to continue improving PyMC's support for modeling spatial processes. There are many possible algorithms one may choose to work on, such as Gaussian-process-based methods for point processes, like Nearest Neighbor GPs or the Vecchia approximation, and models that are types of Gaussian Markov Random Fields, like CAR, ICAR, and BYM models. Implementations of these can be found in the R packages CARBayes and INLA.
- Bill Engels
- Chris Fonnesbeck
- Hours: 350
- Expected outcome: An implementation of one or more of the methods listed above, along with one or more notebook examples that can be added to the PyMC docs demonstrating these techniques.
- Skills required: Python, statistics, GPs
- Difficulty: Medium
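To make the GMRF side of this project concrete, the ICAR prior underlying ICAR and BYM models is, up to normalization and a sum-to-zero constraint, just a sum of squared differences of the spatial field over graph edges. A minimal NumPy sketch of that kernel (the `icar_logp` helper and the toy path graph are illustrative, not PyMC API):

```python
import numpy as np

def icar_logp(phi, edges):
    """Unnormalized ICAR log-density: -0.5 * sum over edges (phi_i - phi_j)^2."""
    i, j = edges[:, 0], edges[:, 1]
    return -0.5 * np.sum((phi[i] - phi[j]) ** 2)

# Toy spatial graph: a 4-node path 0-1-2-3
edges = np.array([[0, 1], [1, 2], [2, 3]])
phi = np.array([0.1, -0.2, 0.3, -0.2])
print(icar_logp(phi, edges))  # ≈ -0.295
```

A PyMC implementation would express the same density as a distribution over the field, with the adjacency structure supplied by the user.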
This project works to extend the existing Minibatch functionality to support the streaming case. This would allow PyMC's Variational inference methods to be used on data larger than could fit in memory. This project would also work to introduce Minibatch support to all other inference methods in the library that would benefit from it, such as the recently introduced Pathfinder functionality.
We strongly suspect this project should integrate with Dask APIs, so prior knowledge of Dask would help in this project.
- Hours: 350
- Expected outcome: An improved Minibatch implementation for all inference methods that support it. A notebook demonstrating inference using a streaming data source.
- Skills required: Python, Dask, Optimization
- Difficulty: Medium
- Chris Fonnesbeck
- Rob Zinkov
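To make the streaming idea concrete, here is a minimal sketch of the kind of iterator-based batching such an implementation could build on; it consumes an arbitrary (possibly unbounded) data source without ever materializing the full dataset. The `stream_minibatches` helper is hypothetical, not existing PyMC API:

```python
import itertools

def stream_minibatches(stream, batch_size):
    """Yield fixed-size minibatches from an arbitrary iterator,
    without loading the full dataset into memory."""
    it = iter(stream)
    while True:
        batch = list(itertools.islice(it, batch_size))
        if not batch:
            return
        yield batch

# Works on any iterable, e.g. a generator reading records from disk or a socket.
batches = list(stream_minibatches(range(10), batch_size=4))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

A real implementation would additionally need to rescale log-likelihood contributions by the (possibly unknown) total data size, which is part of what makes the streaming case interesting.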
PyMC has support for Variational inference using blackbox methods which use a hardcoded guide program autogenerated for every model. It would be nice to give users the ability to write their own guide programs as is done in libraries like Pyro. This project would work to introduce a guide program module as well as generalising the existing inference algorithms to support them.
- Hours: 350
- Expected outcome: A working implementation of guide programs for blackbox optimization using the ELBO as the loss. This should also include an example notebook showcasing the feature.
- Skills required: Python, Variational Inference, Optimization
- Difficulty: Hard
- Rob Zinkov
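To illustrate what a guide program is optimized against, here is a toy NumPy sketch of a Monte Carlo ELBO estimate for a hand-written Gaussian guide and a known Gaussian target. All names are illustrative; PyMC's actual VI machinery operates on full model graphs rather than closed-form densities:

```python
import numpy as np

def log_normal(z, mu, sigma):
    # Gaussian log-density
    return -0.5 * np.log(2 * np.pi * sigma**2) - (z - mu) ** 2 / (2 * sigma**2)

def elbo(guide_mu, guide_sigma, n_draws=100_000, seed=0):
    """Monte Carlo ELBO: E_q[log p(z) - log q(z)] for a toy target p(z) = N(2, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(guide_mu, guide_sigma, size=n_draws)  # draws from the guide q
    return np.mean(log_normal(z, 2.0, 1.0) - log_normal(z, guide_mu, guide_sigma))

# The ELBO is maximized (here exactly 0, the log-evidence of a normalized
# density) when the guide matches the target; a mismatched guide pays the
# negative KL divergence to the target.
print(elbo(2.0, 1.0))  # 0.0
print(elbo(0.0, 1.0))  # ≈ -2 (negative KL of N(0,1) from N(2,1))
```

Generalizing the existing algorithms would mean accepting an arbitrary user-written guide in place of the hardcoded mean-field one while optimizing this same objective.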
The COLA library implements several optimizations for speeding up linear algebra operations. This project would work to introduce these optimizations to PyTensor as a collection of graph rewrites. This issue tracks the current state of the effort; there is potential for massive speedups.
- Hours: 350
- Expected outcome: The creation of a sizeable portion of these rewrites, along with a notebook demonstrating the potential speedups they offer on typical PyMC programs.
- Skills required: Python, Linear Algebra
- Difficulty: Medium
- Jesse Grabowski
- Rob Zinkov
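As one example of the kind of structure-exploiting rewrite involved, a Kronecker-structured solve can avoid ever forming the large product matrix. A NumPy sketch using the standard identity (A ⊗ B) vec(X) = vec(B X Aᵀ), with column-major vec (the matrix sizes here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.normal(size=(n, n)) + n * np.eye(n)  # well-conditioned small factors
B = rng.normal(size=(m, m)) + m * np.eye(m)
C = rng.normal(size=(m, n))

# Naive: materialize the (n*m x n*m) Kronecker product, then one O((nm)^3) solve.
x_naive = np.linalg.solve(np.kron(A, B), C.flatten(order="F"))

# Rewritten: (A kron B) vec(X) = vec(B X A^T), so X = B^{-1} C A^{-T},
# i.e. two solves against the small factors instead of one huge solve.
X = np.linalg.solve(B, np.linalg.solve(A, C.T).T)
x_fast = X.flatten(order="F")

print(np.allclose(x_naive, x_fast))  # True
```

A PyTensor graph rewrite would detect the `solve(kron(A, B), c)` pattern in the graph and substitute the factored form automatically, so that model authors get the speedup without changing their code.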
PyMC-Extras includes specialized PyMC functionality that can marginalize (and recover) finite discrete univariate variables for more efficient MCMC sampling. Recently we also added support for marginalization of DiscreteMarkovChain, yielding automatically derived Hidden Markov Models.
A non-trivial example using this functionality in a multiple-changepoint model can be found in this gist.
This project would aim to extend this functionality in several ways:
- Support marginalization of truncated versions of other discrete distributions like Truncated Binomial or Truncated Poisson.
- Support marginalization of variables with closed-form solutions, such as Beta + Binomial = BetaBinomial.
- Contribute new pymc-examples showcasing the new/existing functionality.
These points are suggestions and not an exhaustive list. Not all points must be tackled in the proposed project.
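The Beta + Binomial case above can be checked numerically: integrating the Binomial success probability against a Beta prior yields exactly the BetaBinomial pmf, which is the closed form a marginalization rewrite would substitute. A SciPy sketch (the grid integration here is only for verification):

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

# p ~ Beta(a, b); k | p ~ Binomial(n, p). Marginalizing p gives BetaBinomial(n, a, b).
a, b, n = 2.0, 3.0, 10

p_grid = np.linspace(0, 1, 20_001)
beta_pdf = stats.beta.pdf(p_grid, a, b)
for k in range(n + 1):
    # integral of Binomial(k | n, p) * Beta(p | a, b) dp over [0, 1]
    marginal = trapezoid(stats.binom.pmf(k, n, p_grid) * beta_pdf, p_grid)
    closed_form = stats.betabinom.pmf(k, n, a, b)
    assert np.isclose(marginal, closed_form, atol=1e-6)
print("marginalized Binomial matches BetaBinomial")
```

The project's job would be to recognize such conjugacy patterns in the model graph and rewrite them to the closed-form distribution automatically.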
This project will require interacting with PyTensor, which is the backend used by PyMC. See https://www.pymc.io/projects/docs/en/v5.0.2/learn/core_notebooks/pymc_pytensor.html for more details. An understanding of probability theory is helpful but not a requirement (you can learn as you go).
- Hours: 350
- Expected outcome: Support for marginalization of Truncated distributions, as well as finding closed-form solutions for some conjugacy pairs.
- Skills required: Python, Probability
- Difficulty: Hard
- Rob Zinkov