In the EncoderLayer from section 2.1.5, the add & norm operation seems to first apply norm to the original input, feed the normalized result into the MHA/FFN, and then add the output back to the original input. Isn't that the pre-norm ordering? As far as I understand, the original Transformer uses post-norm.
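For reference, the two orderings differ only in where the LayerNorm sits (a minimal sketch in pseudocode; mha, ffn, and norm are illustrative names, not the ones in the section's code):

# post-norm, as in the original "Attention Is All You Need" paper
x = norm(x + mha(x, x, x, mask))
x = norm(x + ffn(x))

# pre-norm, which is what the 2.1.5 code appears to do
h = norm(x)
x = x + mha(h, h, h, mask)
x = x + ffn(norm(x))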
YingxuanW commented on Mar 21, 2024
I had the same question when I got here. Personally, I would implement it like this:
x = norm(x + attention(x, x, x, mask))
x = norm(x + ffn(x))
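Fleshed out with standard PyTorch modules, that post-norm ordering would look roughly like this (just a sketch; the class name, hyperparameters, and module choices are placeholders, not taken from the repo):

import torch
import torch.nn as nn

class PostNormEncoderLayer(nn.Module):
    # Post-norm: add the sublayer output to its input first, then apply LayerNorm.
    def __init__(self, d_model=512, n_head=8, d_ff=2048, dropout=0.1):
        super().__init__()
        self.attention = nn.MultiheadAttention(d_model, n_head, dropout=dropout, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # x = norm(x + attention(x, x, x, mask))
        attn_out, _ = self.attention(x, x, x, attn_mask=mask)
        x = self.norm1(x + self.dropout(attn_out))
        # x = norm(x + ffn(x))
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x

# quick check with a (batch, seq_len, d_model) tensor
layer = PostNormEncoderLayer()
out = layer(torch.randn(2, 10, 512))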
hedes1992 commented on Jun 13, 2024
Yes, I think the following is the correct (post-norm) implementation:
https://github.com/hyunwoongko/transformer/blob/master/models/blocks/encoder_layer.py