Implementation of the NTK in Colibri #422
base: main
Conversation
LucaMantani left a comment
A little comment
Hi @achiefa, thanks for starting this. Just from a quick look, may I suggest not having this write stuff directly, but rather adding things to the GradientDescentResult? That way the writing is delegated to dedicated functions and you don't need to modify much here. Similarly for the MonteCarloFit class.
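For illustration, a minimal sketch of what this could look like; the field names and dataclass layout here are assumptions, not Colibri's actual API:

```python
from dataclasses import dataclass, field
from typing import Optional

import numpy as np


@dataclass
class GradientDescentResult:
    """A sketch only; the real Colibri class will differ."""

    final_params: np.ndarray
    loss_history: list = field(default_factory=list)
    # Hypothetical addition discussed here: one parameter snapshot per
    # recorded epoch, stacked into an array of shape (n_records, n_params).
    parameter_history: Optional[np.ndarray] = None
```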
Hi @LucaMantani, thanks for your comment. Indeed, we considered this option, which I agree follows a more solid design principle. However, I was worried that storing the parameters for all recorded epochs could cause memory issues during training. If instead we use a buffer that is saved to disk and freed at the end of each epoch, we avoid any potential memory issue. Maybe this is not a problem at all, and we can simply store all parameters in a big array and then add it to GradientDescentResult.
Just to quantify the problem: for a neural network with 763 parameters (float64), a single array is about 0.01 MB. This is then multiplied by the number of epochs for which we want to save the parameters. For instance, with 100 recorded epochs this adds up to ~1 MB for one replica. We can probably afford this in favour of a better code design. What do you think?
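The back-of-the-envelope arithmetic checks out directly (a sketch using only the numbers quoted above):

```python
n_params = 763          # parameters in the network
bytes_per_param = 8     # float64
n_epochs = 100          # recorded epochs

per_epoch_mb = n_params * bytes_per_param / 1e6
total_mb = per_epoch_mb * n_epochs
print(f"{per_epoch_mb:.4f} MB per epoch, {total_mb:.2f} MB per replica")
# -> 0.0061 MB per epoch, 0.61 MB per replica (~1 MB, as stated)
```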
I think 1 MB is nothing: we already load several GB into memory for the data and FK tables. Even if one had a model with 1000 parameters, saving it 1000 times would be 8 MB. So I would say memory is far from being an issue.
I agree. Let's put it in GradientDescentResult then.
This PR implements the Neural Tangent Kernel (NTK) in Colibri. The idea is to compute the NTK for any PDF model that is trained using the Monte Carlo replica method and gradient-based optimisers.
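For reference, the empirical NTK of a model f(x; θ) is the Gram matrix of parameter gradients, Θ(x_i, x_j) = ∇_θ f(x_i; θ) · ∇_θ f(x_j; θ). A minimal JAX sketch of this computation follows; the `apply_fn` signature and names are placeholders for illustration, not Colibri's actual interface:

```python
import jax
import jax.numpy as jnp


def empirical_ntk(apply_fn, params, x):
    """Empirical NTK: Theta[i, j] = <df(x_i)/dtheta, df(x_j)/dtheta>.

    Assumes apply_fn(params, x) returns model outputs of shape
    (n_points,); this is a placeholder signature.
    """
    # Jacobian of the outputs with respect to the parameters, returned
    # as a pytree whose leaves carry a leading axis of size n_points.
    jac = jax.jacobian(apply_fn)(params, x)
    # Flatten each per-point gradient into one row vector.
    jac_flat = jnp.concatenate(
        [j.reshape(x.shape[0], -1) for j in jax.tree_util.tree_leaves(jac)],
        axis=1,
    )
    # The inner product over the parameter axis gives the kernel.
    return jac_flat @ jac_flat.T
```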
To compute the NTKs, the model parameters are stored on disk during training with a user-specified recording frequency. This creates a new directory called `parameters` in each replica folder, which contains the set of parameters for each recorded epoch. The user can then compute the NTK for each replica at any recorded epoch using the action `compute_ntk`.
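For orientation, a hypothetical example of reading one recorded snapshot back from a replica folder; the file naming inside `parameters` is an assumption, as the description does not specify it:

```python
from pathlib import Path

import numpy as np

# Hypothetical fit-folder layout and file naming scheme.
replica_dir = Path("fit_folder/replica_1/parameters")
epoch = 500

params = np.load(replica_dir / f"epoch_{epoch}.npy")
# ntk = empirical_ntk(model.apply, params, x_grid)  # see the sketch above
```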