
Conversation

@achiefa (Collaborator) commented on Dec 9, 2025

This PR implements the computation of the Neural Tangent Kernel (NTK) in Colibri. The idea is to compute the NTK for any PDF model that is trained with the Monte Carlo replica method and a gradient-based optimiser.
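
As a rough sketch of what is being computed: the empirical NTK is the Gram matrix of the Jacobian of the model outputs with respect to the parameters. Something along these lines, assuming a JAX model function `model_fn(params, x)` (the names here are illustrative, not the actual Colibri interface):

```python
import jax
import jax.numpy as jnp

def empirical_ntk(model_fn, params, x):
    """Empirical NTK: Theta_ij = sum_a (dF_i/dtheta_a)(dF_j/dtheta_a),
    with F = model_fn(params, x) and theta the flattened parameters."""
    # Jacobian of the outputs with respect to the parameter pytree.
    jac = jax.jacrev(model_fn)(params, x)
    # Flatten each leaf to (n_outputs, n_params_leaf) and concatenate.
    jac_flat = jnp.concatenate(
        [leaf.reshape(leaf.shape[0], -1) for leaf in jax.tree_util.tree_leaves(jac)],
        axis=1,
    )
    # The NTK is the Gram matrix of the flattened Jacobian.
    return jac_flat @ jac_flat.T
```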

To compute the NTKs, the model parameters are stored on disk during training at a user-specified recording frequency. This creates a new directory called `parameters` in each replica folder, containing the set of parameters for each recorded epoch.
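
A minimal sketch of the recording step (the file format, names, and the `recording_frequency` argument are illustrative, not necessarily what the PR uses):

```python
from pathlib import Path

import numpy as np

def record_parameters(replica_dir, epoch, flat_params, recording_frequency):
    """Dump the flattened parameter vector every `recording_frequency` epochs."""
    if epoch % recording_frequency != 0:
        return
    out_dir = Path(replica_dir) / "parameters"
    out_dir.mkdir(parents=True, exist_ok=True)
    np.save(out_dir / f"epoch_{epoch}.npy", np.asarray(flat_params))
```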

The user can then compute the NTK for each replica at any recorded epoch using the `compute_ntk` action.
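
For example, reading back one recorded epoch could look roughly like this (the file layout and names follow the sketch above and are assumptions, not the actual `compute_ntk` implementation):

```python
from pathlib import Path

import numpy as np

# Hypothetical layout: <fit_folder>/replica_<n>/parameters/epoch_<k>.npy
replica_dir = Path("my_fit") / "replica_1"
flat_params = np.load(replica_dir / "parameters" / "epoch_500.npy")

# After restoring the parameter pytree (model dependent), the NTK at this
# epoch can be evaluated with the empirical_ntk sketch above:
# ntk = empirical_ntk(model_fn, unflatten(flat_params), x_grid)
```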

@achiefa self-assigned this on Dec 9, 2025
@LucaMantani (Member) left a comment

A little comment

@LucaMantani (Member) commented

Hi @achiefa, thanks for starting this. Just from a quick look, may I suggest not having this write to disk directly, but rather adding the recorded parameters to `GradientDescentResult`? That way the writing is delegated to dedicated functions and you don't need to modify much here. Similarly for the `MonteCarloFit` class.
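
Something along these lines, perhaps (just a sketch; the actual fields of `GradientDescentResult` may differ):

```python
from dataclasses import dataclass, field
from typing import Optional

import jax.numpy as jnp

@dataclass
class GradientDescentResult:
    # ... existing fields (final parameters, loss history, ...) ...
    # Parameters recorded every `recording_frequency` epochs,
    # stored with shape (n_recorded_epochs, n_parameters).
    recorded_parameters: Optional[jnp.ndarray] = None
    recorded_epochs: list = field(default_factory=list)
```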

@achiefa (Collaborator, Author) commented

Hi @LucaMantani, thanks for your comment. Indeed, we did consider this option, which I agree is the sounder design. However, I was worried that storing the parameters for all recorded epochs could cause memory issues during training. If instead we use a buffer that is saved to disk and freed at the end of each epoch, we avoid any potential memory issue. Maybe this is not a problem at all, and we can simply store all parameters in a big array and add it to `GradientDescentResult`.

Just to quantify the problem: for a neural network with 763 parameters (float64), a single parameter array is about 6 KB. This is then multiplied by the number of epochs at which we want to save the parameters. For instance, with 100 recorded epochs this adds up to well under 1 MB per replica. So we can probably afford this in favour of a better code design. What do you think?
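
Spelled out:

```python
per_snapshot_bytes = 763 * 8                      # 6104 bytes, ~0.006 MB per recorded epoch
per_replica_mb = 100 * per_snapshot_bytes / 1e6   # ~0.6 MB per replica for 100 recorded epochs
```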

@LucaMantani (Member) commented

I think 1 MB is nothing; we already load several GB into memory for the data and FK tables. Even if one had a model with 1000 parameters, saving it 1000 times would be 8 MB. So I would say memory is far from being an issue?

@achiefa (Collaborator, Author) commented

I agree. Let's put it in `GradientDescentResult` then.
