Commit aaa2234

Add duee_v1, update tplinker_plus results
1 parent 3d2b9c2 commit aaa2234

26 files changed: +4236 -429 lines changed

.gitignore

Lines changed: 3 additions & 1 deletion
@@ -126,4 +126,6 @@ dmypy.json
 
 # Pyre type checker
 .pyre/
-outputs
+outputs
+data_caches
+./data/spo.zip.lock

README.md

Lines changed: 28 additions & 12 deletions
@@ -8,16 +8,31 @@ GPLinker_pytorch
 - The `TPLinker_Plus` code may differ slightly in the model part.
 
 # Updates
-- 2022/03/01 Added `tplinker_plus` results on `duie_v1`.
+- 2022/03/03 Added results for `tplinker_plus` with `bert-base-chinese` weights on `duie_v1`. Added training code for the `duee_v1` task; see the `duee_v1` directory.
+- 2022/03/01 Added results for `tplinker_plus` with `hfl/chinese-roberta-wwm-ext` weights on `duie_v1`.
 - 2022/02/25 The Dev branch now carries the latest code built on the full huggingface stack; the main branch holds the older code (slower to run).
 
 # Results
-Tips: `gplinker` takes `5-6h` to train on an `RTX3090`.
-| method | pretrained_model_name_or_path | f1 | precision | recall |
-| ------------- | ----------------------------- | ------------------ | ------------------ | ------------------ |
-| gplinker | hfl/chinese-roberta-wwm-ext | 0.8214065255731926 | 0.8250077498782166 | 0.8178366038895478 |
-| gplinker | bert-base-chinese | 0.8198087178424598 | 0.8146470447994109 | 0.8250362175688137 |
-| tplinker_plus | bert-base-chinese | 0.8202398259785976 | 0.8169624387588463 | 0.8235436147328684 |
+Tips: on an `RTX3090` with `20epoch`, `gplinker` takes `5-6h` to train, while `tplinker_plus` takes `16-17h`.
+|dataset | method | pretrained_model_name_or_path | f1 | precision | recall |
+|-------------- | ------------- | ----------------------------- | ------------------ | ------------------ | ------------------ |
+|duie_v1 | gplinker | hfl/chinese-roberta-wwm-ext | 0.8214065255731926 | 0.8250077498782166 | 0.8178366038895478 |
+|duie_v1 | gplinker | bert-base-chinese | 0.8198087178424598 | 0.8146470447994109 | 0.8250362175688137 |
+|duie_v1 | tplinker_plus | hfl/chinese-roberta-wwm-ext | 0.8256425523469291 | 0.8295114656031908 | 0.8218095614381671 |
+|duie_v1 | tplinker_plus | bert-base-chinese | 0.8216261688290682 | 0.8076458240569943 | 0.8360990385881737 |
+
+
+# Tensorboard logs
+## gplinker training log
+<p align="left">
+<img src="figure/gplinker.jpg" width="70%" />
+</p>
+
+## tplinker_plus training log
+<p align="left">
+<img src="figure/tplinker_plus.jpg" width="70%" />
+</p>
+
 
 # Dependencies
 The required dependencies are as follows:
@@ -75,9 +90,10 @@ accelerate launch train.py \
 - `max_length`: maximum sentence length; the `tokenizer` truncates any input longer than this.
 - `topk`: keep the `topk` models; defaults to `1`.
 - `num_workers`: the `num_workers` argument of the `dataloader`. On `linux`, try a value greater than `0` when `GPU` utilization is low; on `windows` it is best left at `0`, otherwise an error is raised.
+- `use_efficient`: whether to use `EfficientGlobalPointer`; defaults to `False`.
 
-
-# Tensorboard logs
-<p align="center">
-<img src="figure/tensorboard_log.jpg" width="100%" />
-</p>
+# Reference
+- Su Jianlin (苏剑林). (Jan. 30, 2022). 《GPLinker:基于GlobalPointer的实体关系联合抽取》 (GPLinker: joint entity and relation extraction based on GlobalPointer) [Blog post]. Retrieved from https://kexue.fm/archives/8888
+- https://github.com/bojone/GPLinker
+- https://github.com/bojone/bert4keras/tree/master/examples/task_relation_extraction_gplinker.py
+- https://github.com/131250208/TPlinker-joint-extraction/tree/master/tplinker_plus
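
The `f1` column in the README results table appears to be the harmonic mean of `precision` and `recall`. A minimal sketch that recomputes it from the reported numbers; the helper name `f1_score` and the `rows` list are illustrative and not taken from the repository:

```python
import math


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (method, pretrained model, reported f1, precision, recall), copied from the table.
rows = [
    ("gplinker", "hfl/chinese-roberta-wwm-ext",
     0.8214065255731926, 0.8250077498782166, 0.8178366038895478),
    ("gplinker", "bert-base-chinese",
     0.8198087178424598, 0.8146470447994109, 0.8250362175688137),
    ("tplinker_plus", "hfl/chinese-roberta-wwm-ext",
     0.8256425523469291, 0.8295114656031908, 0.8218095614381671),
    ("tplinker_plus", "bert-base-chinese",
     0.8216261688290682, 0.8076458240569943, 0.8360990385881737),
]

for method, model, reported, precision, recall in rows:
    recomputed = f1_score(precision, recall)
    # Flag rows whose reported f1 disagrees with the recomputed value.
    status = "ok" if math.isclose(recomputed, reported, abs_tol=1e-6) else "MISMATCH"
    print(f"{method:14s} {model:28s} {recomputed:.6f} {status}")
```

A check like this is mainly useful when copying new rows into the table by hand, since a transposed precision or recall value shows up as a mismatch.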

data/spo.py

Lines changed: 0 additions & 2 deletions
@@ -40,8 +40,6 @@ class SPO(datasets.GeneratorBasedBuilder):
     VERSION = datasets.Version("1.0.0")
 
     BUILDER_CONFIGS = [
-        # en
-        # nested
         datasets.BuilderConfig(name="spo", version=VERSION, description="spo"),
     ]
 

data/spo.zip

38.3 MB
Binary file not shown.

duee_v1/data/README.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+Please download the data; this folder only provides sample data.

duee_v1/data/duee_dev.json

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+{"text": "消失的“外企光环”,5月份在华裁员900余人,香饽饽变“臭”了", "id": "cba11b5059495e635b4f95e7484b2684", "event_list": [{"event_type": "组织关系-裁员", "trigger": "裁员", "trigger_start_index": 15, "arguments": [{"argument_start_index": 17, "role": "裁员人数", "argument": "900余人", "alias": []}, {"argument_start_index": 10, "role": "时间", "argument": "5月份", "alias": []}], "class": "组织关系"}]}
+{"text": "前两天,被称为 “ 仅次于苹果的软件服务商 ” 的 Oracle( 甲骨文 )公司突然宣布在中国裁员。。", "id": "b90900665eee74f658eed125a321ee06", "event_list": [{"event_type": "组织关系-裁员", "trigger": "裁员", "trigger_start_index": 48, "arguments": [{"argument_start_index": 0, "role": "时间", "argument": "前两天", "alias": []}, {"argument_start_index": 4, "role": "裁员方", "argument": "被称为 “ 仅次于苹果的软件服务商 ” 的 Oracle( 甲骨文 )公司", "alias": []}], "class": "组织关系"}]}
+{"text": "不仅仅是中国IT企业在裁员,为何500强的甲骨文也发生了全球裁员", "id": "c970d67bb8d3e57db77c4dbbdfbe9769", "event_list": [{"event_type": "组织关系-裁员", "trigger": "裁员", "trigger_start_index": 11, "arguments": [{"argument_start_index": 4, "role": "裁员方", "argument": "中国IT企业", "alias": []}], "class": "组织关系"}, {"event_type": "组织关系-裁员", "trigger": "裁员", "trigger_start_index": 30, "arguments": [{"argument_start_index": 16, "role": "裁员方", "argument": "500强的甲骨文", "alias": []}], "class": "组织关系"}]}
+{"text": "据猛龙随队记者Josh Lewenberg报道,消息人士透露,猛龙已将前锋萨加巴-科纳特裁掉。此前他与猛龙签下了一份Exhibit 10合同。在被裁掉后,科纳特下赛季大概率将前往猛龙的发展联盟球队效力。", "id": "8a440ade8a8bac469e0357cc519ec9c0", "event_list": [{"event_type": "组织关系-裁员", "trigger": "裁掉", "trigger_start_index": 44, "arguments": [{"argument_start_index": 31, "role": "裁员方", "argument": "猛龙", "alias": []}], "class": "组织关系"}, {"event_type": "组织关系-加盟", "trigger": "签下", "trigger_start_index": 53, "arguments": [{"argument_start_index": 35, "role": "加盟者", "argument": "前锋萨加巴-科纳特", "alias": []}, {"argument_start_index": 51, "role": "所加盟组织", "argument": "猛龙", "alias": []}], "class": "组织关系"}, {"event_type": "组织关系-裁员", "trigger": "裁掉", "trigger_start_index": 73, "arguments": [{"argument_start_index": 31, "role": "裁员方", "argument": "猛龙", "alias": []}], "class": "组织关系"}]}
+{"text": "冠军射手被裁掉,欲加入湖人队,但湖人却无意,冠军射手何去何从", "id": "80691881637ea5a0fee0ad01a689ce4a", "event_list": [{"event_type": "组织关系-裁员", "trigger": "裁掉", "trigger_start_index": 5, "arguments": [], "class": "组织关系"}]}
+{"text": "中关村在线消息:据国外媒体报道,著名的国际网约车公司Uber进行首次大幅度裁员,全球裁员人数约为400人,主要集中在市场营销部门,这是Uber自2009年成立以来最大规模的裁员行动。", "id": "312d8e8b74cf4d1ac548143b27c949a3", "event_list": [{"event_type": "组织关系-裁员", "trigger": "裁员", "trigger_start_index": 37, "arguments": [{"argument_start_index": 16, "role": "裁员方", "argument": "著名的国际网约车公司Uber", "alias": []}, {"argument_start_index": 46, "role": "裁员人数", "argument": "约为400人", "alias": []}], "class": "组织关系"}]}
+{"text": "6月7日报道,IBM将裁员超过1000人。IBM周四确认,将裁减一千多人。据知情人士称,此次裁员将影响到约1700名员工,约占IBM全球逾34万员工中的0.5%。IBM股价今年累计上涨16%,但该公司4月发布的财报显示,一季度营收下降5%,低于市场预期。", "id": "bc436949df35727c97d96c2fee166113", "event_list": [{"event_type": "组织关系-裁员", "trigger": "裁员", "trigger_start_index": 11, "arguments": [{"argument_start_index": 0, "role": "时间", "argument": "6月7日", "alias": []}, {"argument_start_index": 7, "role": "裁员方", "argument": "IBM", "alias": []}, {"argument_start_index": 53, "role": "裁员人数", "argument": "1700名员工", "alias": []}], "class": "组织关系"}]}
+{"text": "41岁程序员被裁 北京有1500万的房产,网友:去跑滴滴在出租两套", "id": "319284cdc1cd75ee0033ca61dbb96752", "event_list": [{"event_type": "组织关系-裁员", "trigger": "被裁", "trigger_start_index": 6, "arguments": [], "class": "组织关系"}]}
+{"text": "计算机行业大变革?甲骨文中国区裁员,IBM收购红帽公司", "id": "4acb307de2f61a9989c7616af6ff9f10", "event_list": [{"event_type": "组织关系-裁员", "trigger": "裁员", "trigger_start_index": 15, "arguments": [{"argument_start_index": 9, "role": "裁员方", "argument": "甲骨文中国区", "alias": []}], "class": "组织关系"}, {"event_type": "财经/交易-出售/收购", "trigger": "收购", "trigger_start_index": 21, "arguments": [{"argument_start_index": 18, "role": "收购方", "argument": "IBM", "alias": []}, {"argument_start_index": 23, "role": "出售方", "argument": "红帽公司", "alias": []}], "class": "财经/交易"}]}
+{"text": "近期,蔚来美国裁员 70 人,其中有 20 人位于圣何塞的北美总部办公室和研发中心,50 人位于旧金山办公室,此外,旧金山办公室也在这次裁员中正式关闭。", "id": "7c1f296c17f91ed0bf6fb0620367b0fa", "event_list": [{"event_type": "组织关系-裁员", "trigger": "裁员", "trigger_start_index": 7, "arguments": [{"argument_start_index": 0, "role": "时间", "argument": "近期", "alias": []}, {"argument_start_index": 3, "role": "裁员方", "argument": "蔚来", "alias": []}, {"argument_start_index": 10, "role": "裁员人数", "argument": "70 人", "alias": []}], "class": "组织关系"}]}
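
Each line of `duee_dev.json` is one JSON object with a `text` field and an `event_list` whose triggers and arguments carry start offsets. A minimal sketch for sanity-checking those offsets, assuming they are zero-based character indices into `text` (which holds for the record used here); the helper name `check_offsets` is hypothetical and not part of the repository:

```python
import json


def check_offsets(record: dict) -> list:
    """Return (kind, span, start) tuples whose offset does not match record['text']."""
    text = record["text"]
    mismatches = []
    for event in record["event_list"]:
        # Collect the trigger and every argument as (kind, span, start) triples.
        spans = [("trigger", event["trigger"], event["trigger_start_index"])]
        spans += [
            (arg["role"], arg["argument"], arg["argument_start_index"])
            for arg in event["arguments"]
        ]
        for kind, span, start in spans:
            # Assumption: offsets count Python characters, not bytes or tokens.
            if text[start:start + len(span)] != span:
                mismatches.append((kind, span, start))
    return mismatches


# One record copied from the sample data above.
sample_line = (
    '{"text": "计算机行业大变革?甲骨文中国区裁员,IBM收购红帽公司", '
    '"id": "4acb307de2f61a9989c7616af6ff9f10", "event_list": ['
    '{"event_type": "组织关系-裁员", "trigger": "裁员", "trigger_start_index": 15, '
    '"arguments": [{"argument_start_index": 9, "role": "裁员方", '
    '"argument": "甲骨文中国区", "alias": []}], "class": "组织关系"}, '
    '{"event_type": "财经/交易-出售/收购", "trigger": "收购", "trigger_start_index": 21, '
    '"arguments": [{"argument_start_index": 18, "role": "收购方", "argument": "IBM", "alias": []}, '
    '{"argument_start_index": 23, "role": "出售方", "argument": "红帽公司", "alias": []}], '
    '"class": "财经/交易"}]}'
)

record = json.loads(sample_line)
print(check_offsets(record))
```

To check a whole file, the same function can be applied to `json.loads(line)` for each non-empty line; an empty result for every record means all spans line up with the text.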
