
Commit b0f8273

Finished Transformer!
1 parent 6a4bb8d commit b0f8273

File tree

1 file changed (+3, -3 lines)


NLP/16.7 Transformer/README.md

Lines changed: 3 additions & 3 deletions
@@ -56,7 +56,7 @@ The transformer model lacks a way to account for the order of the words in the input sequence; it

Finally, this Positional Encoding is added to the embedding values, and the sum is passed as input to the next layer.

-![](https://gitee.com/kkweishe/images/raw/master/ML/2019-9-26_9-25-43.png)
+![](https://gitee.com/kkweishe/images/raw/master/ML/2019-9-26_14-45-31.png)

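As a side note on the positional-encoding step in this hunk: below is a minimal sketch of how a sinusoidal encoding can be built and added to the embeddings, assuming NumPy; all names and shapes are illustrative, not taken from this repository.

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sin/cos positional encoding table of shape (seq_len, d_model); d_model assumed even."""
    positions = np.arange(seq_len)[:, None]                      # (seq_len, 1)
    dims = np.arange(d_model)[None, :]                           # (1, d_model)
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
    angles = positions * angle_rates
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])                        # sine on even dimensions
    pe[:, 1::2] = np.cos(angles[:, 1::2])                        # cosine on odd dimensions
    return pe

# The encoding is simply added to the token embeddings before the next layer.
seq_len, d_model = 10, 16                                        # illustrative sizes
embeddings = np.random.randn(seq_len, d_model)                   # placeholder embeddings
layer_input = embeddings + sinusoidal_positional_encoding(seq_len, d_model)
```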
@@ -82,7 +82,7 @@ The animal didn't cross the street because it was too tired

4. The next step is to multiply the Values by the softmax scores and sum the results; this sum is the self-attention output at the current position.

-![](https://gitee.com/kkweishe/images/raw/master/ML/2019-9-26_9-4-8.png)
+![](https://gitee.com/kkweishe/images/raw/master/ML/2019-9-26_14-47-17.png)

In practice, to speed up computation, we work in matrix form: the Query, Key, and Value matrices are obtained by multiplying the embeddings directly with the three weight matrices; the resulting matrix Q is then multiplied with K, scaled by a constant, passed through a softmax, and finally multiplied with the V matrix.

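The matrix form described in the last line of this hunk (project the embeddings to Q, K, V, multiply Q with K, scale by a constant, softmax, then multiply by V) can be sketched roughly as below; a NumPy illustration with made-up weight matrices and shapes, not the repository's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)       # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    """Scaled dot-product self-attention over a sequence of embeddings X."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v           # project embeddings to Query, Key, Value
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # Q·Kᵀ scaled by the constant 1/sqrt(d_k)
    weights = softmax(scores, axis=-1)            # attention weights for each position
    return weights @ V                            # weighted sum of the Values

seq_len, d_model, d_k = 5, 16, 8                  # illustrative sizes
X = np.random.randn(seq_len, d_model)             # placeholder embeddings
W_q, W_k, W_v = (np.random.randn(d_model, d_k) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)            # shape (seq_len, d_k)
```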
@@ -98,7 +98,7 @@ The animal didn't cross the street because it was too tired

An even more impressive part of the paper is that it adds another mechanism to self-attention, called "multi-headed" attention. The idea is simple: **instead of initializing just one set of Q, K, V matrices, several sets are initialized; the transformer uses 8 of them**, so the final result is 8 output matrices.

-![](https://gitee.com/kkweishe/images/raw/master/ML/2019-9-26_9-13-0.png)
+![](https://gitee.com/kkweishe/images/raw/master/ML/2019-9-26_14-49-14.png)

![](https://gitee.com/kkweishe/images/raw/master/ML/2019-9-26_9-13-50.png)

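For the multi-headed attention described in this hunk, here is a rough sketch of running 8 independent heads and concatenating their outputs; the output projection W_o and all shapes are illustrative assumptions (NumPy), not the repository's code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    return softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V

def multi_head_attention(X, heads, W_o):
    """Each head owns its (W_q, W_k, W_v); the head outputs are concatenated and projected."""
    outputs = [attention(X, W_q, W_k, W_v) for (W_q, W_k, W_v) in heads]
    return np.concatenate(outputs, axis=-1) @ W_o          # back to (seq_len, d_model)

num_heads, d_model = 8, 64                                 # the paper uses 8 heads
d_k = d_model // num_heads
X = np.random.randn(5, d_model)                            # 5 placeholder token embeddings
heads = [tuple(np.random.randn(d_model, d_k) for _ in range(3)) for _ in range(num_heads)]
W_o = np.random.randn(num_heads * d_k, d_model)            # output projection
out = multi_head_attention(X, heads, W_o)                  # shape (5, d_model)
```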
0 commit comments
