@rasbt I noticed that the following comment is added. This notebook uses:

```python
class Llama3Model(nn.Module):
    ...
    self.trf_blocks = nn.Sequential(
        *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])
```

while the code snippet below for `class Llama3Model(nn.Module)` uses:

```python
class Llama3Model(nn.Module):
    ...
    self.trf_blocks = nn.ModuleList(  # ModuleList since Sequential can only accept one input, and we need `x, mask, cos, sin`
        [TransformerBlock(cfg) for _ in range(cfg["n_layers"])]
    )
```

If you think that the comment is incorrect, I would be happy to create a PR to remove/amend the comment.
Hello, I believe Sebastian left that comment from the old implementation. The comment is right when you (usually) use `nn.Sequential`, since its `forward` threads only a single input from block to block. But here, he looped over the blocks explicitly, so each call can take `x, mask, cos, sin`. Imo, you're right that it's PR-worthy in any case: switching llama2-llama3 to use `nn.ModuleList` would keep things consistent.
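The point above can be sketched with a toy example. The `Block` class below is a hypothetical stand-in for the repo's `TransformerBlock`, not its actual code; it just shows that an explicit loop can pass all four arguments, while `nn.Sequential.forward` accepts only one input:

```python
# Minimal sketch: why looping over nn.ModuleList works where nn.Sequential
# does not. `Block` is a toy stand-in, NOT the repo's TransformerBlock.
import torch
import torch.nn as nn

class Block(nn.Module):
    def forward(self, x, mask, cos, sin):
        # stand-in for attention + feed-forward; just combines the inputs
        return x + mask + cos + sin

blocks = nn.ModuleList([Block() for _ in range(3)])

x = torch.zeros(2)
mask = cos = sin = torch.ones(2)

for block in blocks:  # explicit loop: all four args reach every block
    x = block(x, mask, cos, sin)
# each block adds mask + cos + sin (3.0), so x ends up at 9.0 everywhere

seq = nn.Sequential(Block(), Block(), Block())
try:
    seq(torch.zeros(2), mask, cos, sin)
except TypeError:
    # Sequential.forward threads exactly one value from block to block,
    # so extra positional arguments raise a TypeError
    print("nn.Sequential accepts only a single input")
```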
Thanks for the comments @Shamik-07 and @casinca. As @casinca pointed out, the comment is actually correct, as we use: