Commit 62f988d

stylistic update based on Codacy
1 parent 3e2ff96 commit 62f988d

9 files changed, 20 additions and 22 deletions

notebooks/README.md

Lines changed: 3 additions & 3 deletions
@@ -1,14 +1,14 @@
 # 笔记本示例 Notebooks

-### ceval_example_for_chinese_alpaca.ipynb
+### ceval_example_for_chinese_alpaca.ipynb

 利用Chinese Alpaca模型解码C-Eval数据集的示例。

 Example of decoding C-Eval dataset with Chinese Alpaca.

 建议查看Colab上的最新版 / Check latest notebook:<a href="https://colab.research.google.com/drive/12YewimRT7JuqJGOejxN7YG8jq2de4DnF?usp=sharing" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

-### convert_and_quantize_chinese_llama_and_alpaca.ipynb
+### convert_and_quantize_chinese_llama_and_alpaca.ipynb

 Colab上的转换和量化中文LLaMA/Alpaca(含Plus版本)的运行示例(仅供流程参考)。

@@ -40,7 +40,7 @@ Example of running the Gradio demo on Colab.

 在Colab中打开 / Open the notebook in Colab: [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/ymcui/Chinese-LLaMA-Alpaca/blob/main/notebooks/gradio_web_demo.ipynb)

-### legacy/
+### legacy/

 旧版notebook,供参考,但不会再更新。

scripts/README.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# 代码与脚本 Code and Scripts
22

3-
### training/
3+
### training/
44

55
预训练与指令精调代码,Wiki:
66

@@ -12,13 +12,13 @@ Pre-training and instruction finetuning code, Wiki:
1212
- Pre-training: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Pretraining-Script
1313
- Instruction finetuning: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/SFT-Script
1414

15-
### inference/
15+
### inference/
1616

1717
使用🤗transformers进行推理,Wiki:[https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/使用Transformers推理](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/使用Transformers推理)
1818

1919
Inference using 🤗transformers, Wiki: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Inference-with-Transformers
2020

21-
### langchain/
21+
### langchain/
2222

2323
使用LangChain进行检索式问答和文本摘要的示例,Wiki:[https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/与LangChain进行集成](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/与LangChain进行集成)
2424

@@ -30,25 +30,25 @@ Using LangChain for Retrieval QA and Summarization, Wiki: https://github.com/ymc
3030

3131
A server that implements OPENAI API using fastapi, Wiki: [https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/API-Calls](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/API-Calls)
3232

33-
### merge_tokenizer/
33+
### merge_tokenizer/
3434

3535
中文词表扩充代码,Wiki: [https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/训练细节#准备工作词表扩充](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/训练细节#准备工作词表扩充)
3636

3737
Code for extending Chinese vocabulary, Wiki: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Training-Details#preparation-vocabulary-expansion
3838

39-
### merge_llama_with_chinese_lora.py
39+
### merge_llama_with_chinese_lora.py
4040

4141
合并LLaMA/Alpaca LoRA脚本,Wiki: [https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/手动模型合并与转换](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/手动模型合并与转换)
4242

4343
Script for merging LLaMA/Alpaca LoRA. Wiki: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Manual-Conversion
4444

45-
### merge_llama_with_chinese_lora_low_mem.py
45+
### merge_llama_with_chinese_lora_low_mem.py
4646

4747
(推荐)低资源版合并LLaMA/Alpaca LoRA脚本,Wiki: [https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/手动模型合并与转换](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/手动模型合并与转换)
4848

4949
(recommended)Script for merging LLaMA/Alpaca LoRA (low-resource version). Wiki: https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/Manual-Conversion
5050

51-
### crawl_prompt.py
51+
### crawl_prompt.py
5252

5353
指令数据爬取脚本,Wiki:[https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/训练细节#训练数据](https://github.com/ymcui/Chinese-LLaMA-Alpaca/wiki/训练细节#训练数据)
5454

scripts/ceval/evaluator.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ def generate_few_shot_prompt(self, subject, dev_df):
 for i in range(k):
 prompt += self.format_example(dev_df.iloc[i, :])
 return prompt
-
+
 def eval_subject(self, subject_name, test_df, dev_df=None, few_shot=False, save_result_dir=None):
 pass

scripts/langchain/langchain_sum.py

Lines changed: 2 additions & 3 deletions
@@ -15,10 +15,9 @@
 from langchain import HuggingFacePipeline
 from langchain.text_splitter import RecursiveCharacterTextSplitter
 from langchain.prompts import PromptTemplate
-from langchain.docstore.document import Document
 from langchain.chains.summarize import load_summarize_chain

-prompt_template = """Below is an instruction that describes a task.
+prompt_template = """Below is an instruction that describes a task.
 Write a response that appropriately completes the request.\n\n
 ### Instruction:\n请为以下文字写一段摘要:\n{text}\n\n### Response: """
 refine_template = (
@@ -41,7 +40,7 @@
 device = torch.device(0)
 else:
 device = torch.device('cpu')
-
+
 text_splitter = RecursiveCharacterTextSplitter(chunk_size=600, chunk_overlap=100, length_function=len)
 with open(file_path) as f:
 text = f.read()

scripts/merge_llama_with_chinese_lora_low_mem.py

Lines changed: 2 additions & 2 deletions
@@ -210,7 +210,7 @@ def merge_shards(output_dir, num_shards: int):
 shards_merged = {}
 for d in shards_dicts:
 shards_merged |= d
-
+
 print(f"Saving the merged shard to " + os.path.join(output_dir, f"consolidated.0{i}.pth"))
 torch.save(shards_merged, os.path.join(output_dir, f"consolidated.0{i}.pth"))

@@ -305,7 +305,7 @@ def merge_shards(output_dir, num_shards: int):
 print(f"merging {lora_key_A} and lora_B.weight form {tl_idx}-th LoRA weight to {k}")
 state_dict[k] += (
 transpose(
-t_and_l['state_dict'][lora_key_B].float()
+t_and_l['state_dict'][lora_key_B].float()
 @ t_and_l['state_dict'][lora_key_A].float(), t_and_l['fan_in_fan_out']) * t_and_l['scaling']
 )
 weight_size = state_dict[k].numel() * dtype_byte_size(state_dict[k].dtype)
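
The second hunk above touches the step that folds a LoRA update into a base weight: the product of the `lora_B` and `lora_A` matrices is optionally transposed, scaled, and added to `state_dict[k]`. The sketch below reproduces that arithmetic on plain Python lists so it can be checked by hand. It is an illustration only, not the script's code: `merge_lora` and `matmul` are hypothetical names, and the `transpose` helper is assumed to return the transposed matrix when `fan_in_fan_out` is true and the matrix unchanged otherwise.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(matrix, fan_in_fan_out):
    """Assumed helper behavior: transpose only when fan_in_fan_out is set."""
    return [list(row) for row in zip(*matrix)] if fan_in_fan_out else matrix


def merge_lora(weight, lora_A, lora_B, scaling, fan_in_fan_out=False):
    """Return weight + transpose(lora_B @ lora_A) * scaling as a new matrix."""
    delta = transpose(matmul(lora_B, lora_A), fan_in_fan_out)
    return [
        [w + d * scaling for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(weight, delta)
    ]


# Rank-1 toy example: lora_B is (out=2, r=1), lora_A is (r=1, in=2).
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_A = [[1.0, 2.0]]
lora_B = [[0.5], [1.0]]
merged = merge_lora(weight, lora_A, lora_B, scaling=2.0)
print(merged)  # [[2.0, 2.0], [2.0, 5.0]]
```

The script itself performs the same update in place on tensors, after casting both LoRA matrices to float.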

scripts/merge_tokenizer/merge_tokenizers.py

Lines changed: 0 additions & 1 deletion
@@ -62,6 +62,5 @@
 text='''白日依山尽,黄河入海流。欲穷千里目,更上一层楼。
 The primary use of LLaMA is research on large language models, including'''
 print("Test text:\n",text)
-print
 print(f"Tokenized by LLaMA tokenizer:{llama_tokenizer.tokenize(text)}")
 print(f"Tokenized by Chinese-LLaMA tokenizer:{chinese_llama_tokenizer.tokenize(text)}")

scripts/openai_server_demo/README.md

Lines changed: 1 addition & 1 deletion
@@ -116,7 +116,7 @@ json返回体:

 `top_k`: 在随机采样(random sampling)时,前top_k高概率的token将作为候选token被随机采样。

-`top_p`: 在随机采样(random sampling)时,累积概率超过top_p的token将作为候选token被随机采样,越低随机性越大,举个例子,当top_p设定为0.6时,概率前5的token概率分别为[0.23, 0.20, 0.18, 0.11, 0.10]时,前三个token的累积概率为0.61,那么第4个token将被过滤掉,只有前三的token将作为候选token被随机采样。
+`top_p`: 在随机采样(random sampling)时,累积概率超过top_p的token将作为候选token被随机采样,越低随机性越大,举个例子,当top_p设定为0.6时,概率前5的token概率分别为{0.23, 0.20, 0.18, 0.11, 0.10}时,前三个token的累积概率为0.61,那么第4个token将被过滤掉,只有前三的token将作为候选token被随机采样。

 `repetition_penalty`: 重复惩罚,具体细节可以参考这篇文章:<https://arxiv.org/pdf/1909.05858.pdf>
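
The `top_p` entry edited in this hunk gives a worked example: with `top_p` set to 0.6 and the five most probable tokens at 0.23, 0.20, 0.18, 0.11 and 0.10, the first three tokens reach a cumulative probability of 0.61, so the fourth token onward is filtered out and only the first three remain sampling candidates. A minimal sketch of that filtering rule, using the README's own numbers, is shown here. The function name `top_p_filter` is hypothetical, and this is not taken from `openai_api_server.py`.

```python
def top_p_filter(probs, top_p):
    """Return the indices of the tokens kept by top-p (nucleus) filtering."""
    # Rank token indices from most to least probable.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        # Stop as soon as the kept tokens cover at least top_p of the mass.
        if cumulative >= top_p:
            break
    return kept


probs = [0.23, 0.20, 0.18, 0.11, 0.10]
kept = top_p_filter(probs, top_p=0.6)
print(kept)  # [0, 1, 2]
```

Sampling would then draw only from the kept tokens, typically after renormalizing their probabilities.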
122122

scripts/openai_server_demo/openai_api_server.py

Lines changed: 2 additions & 2 deletions
@@ -182,7 +182,7 @@ async def create_chat_completion(request: ChatCompletionRequest):
 else:
 msgs = [ChatMessage(role=x['role'],content=x['message']) for x in msgs]
 output = predict(
-input=msgs,
+input=msgs,
 max_new_tokens=request.max_tokens,
 top_p=request.top_p,
 top_k=request.top_k,
@@ -200,7 +200,7 @@ async def create_chat_completion(request: ChatCompletionRequest):
 async def create_completion(request: CompletionRequest):
 """Creates a completion"""
 output = predict(
-input=request.prompt,
+input=request.prompt,
 max_new_tokens=request.max_tokens,
 top_p=request.top_p,
 top_k=request.top_k,

scripts/training/run_clm_sft_with_peft.py

Lines changed: 2 additions & 2 deletions
@@ -322,8 +322,8 @@ def main():
 files = [os.path.join(path,file.name) for file in path.glob("*.json")]
 logger.info(f"training files: {' '.join(files)}")
 train_dataset = buid_instruction_dataset(
-data_path=files,
-tokenizer=tokenizer,
+data_path=files,
+tokenizer=tokenizer,
 max_seq_length=data_args.max_seq_length,
 data_cache_dir = None,
 preprocessing_num_workers = data_args.preprocessing_num_workers)
