Commit af16c95

[PIR] Update simcse to apply pir (#10396)
1 parent 868f6ee commit af16c95

3 files changed, with 17 additions and 3 deletions.

slm/applications/neural_search/recall/simcse/README.md

Lines changed: 6 additions & 0 deletions
@@ -149,6 +149,12 @@ simcse/
 
 <a name="模型训练"></a>
 
+Download the dataset and unzip it into the current directory:
+```shell
+wget https://bj.bcebos.com/v1/paddlenlp/data/literature_search_data.zip
+unzip literature_search_data.zip
+```
+
 ## 5. Model Training
 
 **Download link for the semantic indexing pretrained model:**

slm/applications/neural_search/recall/simcse/deploy/python/deploy.sh

Lines changed: 1 addition & 1 deletion
@@ -12,4 +12,4 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-python predict.py --model_dir=../../output
+python deploy/python/predict.py --model_dir=./output

slm/applications/neural_search/recall/simcse/deploy/python/predict.py

Lines changed: 10 additions & 2 deletions
@@ -16,12 +16,17 @@
 import os
 import sys
 
+import numpy as np
 import paddle
 from paddle import inference
 from scipy import spatial
 
 from paddlenlp.data import Pad, Tuple
 from paddlenlp.transformers import AutoTokenizer
+from paddlenlp.utils.env import (
+    PADDLE_INFERENCE_MODEL_SUFFIX,
+    PADDLE_INFERENCE_WEIGHTS_SUFFIX,
+)
 from paddlenlp.utils.log import logger
 
 sys.path.append(".")
@@ -90,8 +95,8 @@ def __init__(
         self.max_seq_length = max_seq_length
         self.batch_size = batch_size
 
-        model_file = model_dir + "/inference.get_pooled_embedding.pdmodel"
-        params_file = model_dir + "/inference.get_pooled_embedding.pdiparams"
+        model_file = model_dir + f"/inference{PADDLE_INFERENCE_MODEL_SUFFIX}"
+        params_file = model_dir + f"/inference{PADDLE_INFERENCE_WEIGHTS_SUFFIX}"
         if not os.path.exists(model_file):
             raise ValueError("not find model file path {}".format(model_file))
         if not os.path.exists(params_file):
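
The hunk above replaces hard-coded file names with names built from suffix constants. The following is a minimal standalone sketch of that pattern, not the code from this commit. The helper name `resolve_inference_files` is hypothetical, the suffixes are passed in as plain arguments instead of being imported from `paddlenlp.utils.env`, and `os.path.join` is used in place of the string concatenation in `predict.py`. The suffix values in the usage note are placeholders, since the diff does not show what the constants resolve to.

```python
import os


def resolve_inference_files(model_dir, model_suffix, weights_suffix, prefix="inference"):
    """Build the model and weights paths from suffixes and check that both exist.

    Hypothetical helper for illustration; the suffix arguments stand in for
    PADDLE_INFERENCE_MODEL_SUFFIX and PADDLE_INFERENCE_WEIGHTS_SUFFIX.
    """
    model_file = os.path.join(model_dir, prefix + model_suffix)
    params_file = os.path.join(model_dir, prefix + weights_suffix)
    for path in (model_file, params_file):
        if not os.path.exists(path):
            # Fail early with the exact path that is missing.
            raise FileNotFoundError("inference file not found: {}".format(path))
    return model_file, params_file
```

A call such as `resolve_inference_files("./output", ".json", ".pdiparams")` would look for `./output/inference.json` and `./output/inference.pdiparams`. Keeping the suffixes out of the path literals means the same lookup works whichever export format the installed Paddle version produces.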
@@ -238,6 +243,9 @@ def predict(self, data, tokenizer):
 
         if args.benchmark:
             self.autolog.times.end(stamp=True)
+
+        query_logits = np.atleast_2d(query_logits)
+        title_logits = np.atleast_2d(title_logits)
         result = [float(1 - spatial.distance.cosine(arr1, arr2)) for arr1, arr2 in zip(query_logits, title_logits)]
         return result
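
`np.atleast_2d` turns a 1-D array of shape `(d,)` into shape `(1, d)` and leaves 2-D input unchanged, so the `zip` over rows always yields one vector per example. The commit does not state its motivation; presumably the predictor can return a single 1-D embedding, in which case iterating over it would yield scalars instead of vectors. The sketch below illustrates the effect under that assumption. The function name `cosine_similarities` is hypothetical, and the similarity is computed directly with NumPy as a stand-in for `scipy.spatial.distance.cosine` used in `predict.py`.

```python
import numpy as np


def cosine_similarities(query_embeddings, title_embeddings):
    """Row-wise cosine similarity that also accepts a single 1-D embedding pair.

    Hypothetical helper for illustration; it assumes non-zero vectors.
    """
    # Promote (d,) inputs to (1, d) so each row is one embedding.
    q = np.atleast_2d(np.asarray(query_embeddings, dtype=float))
    t = np.atleast_2d(np.asarray(title_embeddings, dtype=float))
    dots = np.sum(q * t, axis=1)
    norms = np.linalg.norm(q, axis=1) * np.linalg.norm(t, axis=1)
    return [float(value) for value in dots / norms]
```

With this promotion in place, a single pair such as `cosine_similarities([1.0, 0.0], [1.0, 0.0])` and a batch of several pairs go through the same code path and both return a list with one similarity per pair.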
