Commit c22b912

Merge pull request RVC-Project#1850 from RVC-Project/dev
chore(sync): merge dev into main
2 parents 8c0cec1 + 3dcd788 commit c22b912

53 files changed, with 1879 additions and 4545 deletions.

.env

Lines changed: 1 addition & 0 deletions
@@ -5,4 +5,5 @@ no_proxy = localhost, 127.0.0.1, ::1
 weight_root = assets/weights
 weight_uvr5_root = assets/uvr5_weights
 index_root = logs
+outside_index_root = assets/indices
 rmvpe_root = assets/rmvpe
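
The hunk above only adds the `outside_index_root` key; this excerpt does not show how the application consumes it. As a minimal sketch, assuming the value is meant to be a second directory searched for index files alongside `index_root`, the helpers below (both hypothetical, not part of the commit) parse `.env`-style text and collect the two roots:

```python
from typing import Dict, List

# Sample text mirroring the .env lines visible in the diff.
ENV_TEXT = """\
weight_root = assets/weights
weight_uvr5_root = assets/uvr5_weights
index_root = logs
outside_index_root = assets/indices
rmvpe_root = assets/rmvpe
"""


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple `KEY = VALUE` lines, skipping blanks and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def index_search_roots(env: Dict[str, str]) -> List[str]:
    """Hypothetical helper: return the directories to scan for index files.

    Assumption: both index_root and outside_index_root are searched.
    """
    roots: List[str] = []
    for key in ("index_root", "outside_index_root"):
        root = env.get(key)
        if root and root not in roots:
            roots.append(root)
    return roots


if __name__ == "__main__":
    print(index_search_roots(parse_env_text(ENV_TEXT)))
```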

.github/workflows/unitest.yml

Lines changed: 1 addition & 1 deletion
@@ -33,4 +33,4 @@ jobs:
 python infer/modules/train/preprocess.py logs/mute/0_gt_wavs 48000 8 logs/mi-test True 3.7
 touch logs/mi-test/extract_f0_feature.log
 python infer/modules/train/extract/extract_f0_print.py logs/mi-test $(nproc) pm
-python infer/modules/train/extract_feature_print.py cpu 1 0 0 logs/mi-test v1
+python infer/modules/train/extract_feature_print.py cpu 1 0 0 logs/mi-test v1 True

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -21,3 +21,8 @@ rmvpe.pt
 
 # To set a Python version for the project
 .tool-versions
+
+/runtime
+/assets/weights/*
+ffmpeg.*
+ffprobe.*

README.md

Lines changed: 70 additions & 97 deletions
@@ -14,35 +14,28 @@
 
 [![Discord](https://img.shields.io/badge/RVC%20Developers-Discord-7289DA?style=for-the-badge&logo=discord&logoColor=white)](https://discord.gg/HcsmBBGyVk)
 
-[**Changelog**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/Changelog_CN.md) | [**FAQ**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98%E8%A7%A3%E7%AD%94) | [**AutoDL·Train an AI singer for 0.5 CNY**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/Autodl%E8%AE%AD%E7%BB%83RVC%C2%B7AI%E6%AD%8C%E6%89%8B%E6%95%99%E7%A8%8B) | [**Controlled experiment records**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/Autodl%E8%AE%AD%E7%BB%83RVC%C2%B7AI%E6%AD%8C%E6%89%8B%E6%95%99%E7%A8%8B](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/%E5%AF%B9%E7%85%A7%E5%AE%9E%E9%AA%8C%C2%B7%E5%AE%9E%E9%AA%8C%E8%AE%B0%E5%BD%95)) | [**Online demo**](https://modelscope.cn/studios/FlowerCry/RVCv2demo)
+[**Changelog**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/cn/Changelog_CN.md) | [**FAQ**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/%E5%B8%B8%E8%A7%81%E9%97%AE%E9%A2%98%E8%A7%A3%E7%AD%94) | [**AutoDL·Train an AI singer for 0.5 CNY**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/Autodl%E8%AE%AD%E7%BB%83RVC%C2%B7AI%E6%AD%8C%E6%89%8B%E6%95%99%E7%A8%8B) | [**Controlled experiment records**](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/Autodl%E8%AE%AD%E7%BB%83RVC%C2%B7AI%E6%AD%8C%E6%89%8B%E6%95%99%E7%A8%8B](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/wiki/%E5%AF%B9%E7%85%A7%E5%AE%9E%E9%AA%8C%C2%B7%E5%AE%9E%E9%AA%8C%E8%AE%B0%E5%BD%95)) | [**Online demo**](https://modelscope.cn/studios/FlowerCry/RVCv2demo)
+
+</div>
+
+------
 
 [**English**](./docs/en/README.en.md) | [**中文简体**](./README.md) | [**日本語**](./docs/jp/README.ja.md) | [**한국어**](./docs/kr/README.ko.md) ([**韓國語**](./docs/kr/README.ko.han.md)) | [**Français**](./docs/fr/README.fr.md)| [**Türkçe**](./docs/tr/README.tr.md)
 
-</div>
+Click here to watch our [demo video](https://www.bilibili.com/video/BV1pm4y1z7Gm/)!
+
+Training and inference interface: go-web.bat
+
+![image](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/129054828/092e5c12-0d49-4168-a590-0b0ef6a4f630)
+
+Real-time voice changing interface: go-realtime-gui.bat
+
+![image](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/129054828/143246a9-8b42-4dd1-a197-430ede4d15d7)
 
 > The base model was trained on nearly 50 hours of the open-source, high-quality VCTK dataset, so there are no copyright concerns; feel free to use it.
 
 > Look forward to the RVCv3 base model: more parameters, more data, better results, roughly the same inference speed, and less training data required.
 
-<table>
-<tr>
-<td align="center">Training and inference interface</td>
-<td align="center">Real-time voice changing interface</td>
-</tr>
-<tr>
-<td align="center"><img src="https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/129054828/092e5c12-0d49-4168-a590-0b0ef6a4f630"></td>
-<td align="center"><img src="https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/assets/129054828/730b4114-8805-44a1-ab1a-04668f3c30a6"></td>
-</tr>
-<tr>
-<td align="center">go-web.bat</td>
-<td align="center">go-realtime-gui.bat</td>
-</tr>
-<tr>
-<td align="center">You can freely choose the operation you want to perform.</td>
-<td align="center">We have achieved an end-to-end latency of 170 ms. With ASIO input and output devices, an end-to-end latency of 90 ms is achievable, but this depends heavily on hardware driver support.</td>
-</tr>
-</table>
-
 ## Introduction
 This repository has the following features
 + Uses top-1 retrieval to replace input source features with training-set features, eliminating timbre leakage
@@ -54,52 +47,47 @@
 + Uses the state-of-the-art [vocal pitch extraction algorithm InterSpeech2023-RMVPE](#参考项目) to eliminate the muted-sound problem. It gives (significantly) the best results while being faster and lighter on resources than crepe_full
 + Acceleration support for AMD and Intel GPUs
 
-Click here to watch our [demo video](https://www.bilibili.com/video/BV1pm4y1z7Gm/)!
-
 ## Environment setup
 The following commands must be run in an environment with a Python version greater than 3.8.
 
-### General method for Windows/Linux/MacOS and other platforms
-Choose one of the following methods.
-#### 1. Install dependencies with pip
-1. Install Pytorch and its core dependencies; skip if already installed. Reference: https://pytorch.org/get-started/locally/
+(Windows/Linux)
+First, install the main dependencies with pip:
 ```bash
+# Install Pytorch and its core dependencies; skip if already installed
+# Reference: https://pytorch.org/get-started/locally/
 pip install torch torchvision torchaudio
-```
-2. On Windows with the Nvidia Ampere architecture (RTX30xx), based on the experience in #21, you need to specify the cuda version that matches pytorch
-```bash
-pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
-```
-3. Install the dependencies that match your GPU
-- Nvidia GPUs
-```bash
-pip install -r requirements.txt
-```
-- AMD/Intel GPUs
-```bash
-pip install -r requirements-dml.txt
-```
-- AMD GPUs with ROCM (Linux)
-```bash
-pip install -r requirements-amd.txt
-```
-- Intel GPUs with IPEX (Linux)
-```bash
-pip install -r requirements-ipex.txt
+
+#On Windows with the Nvidia Ampere architecture (RTX30xx), based on the experience in #21, you need to specify the cuda version that matches pytorch
+#pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
 ```
 
-#### 2. Install dependencies with poetry
-Install the Poetry dependency management tool; skip if already installed. Reference: https://python-poetry.org/docs/#installation
+You can use poetry to install the dependencies:
 ```bash
+# Install the Poetry dependency management tool; skip if already installed
+# Reference: https://python-poetry.org/docs/#installation
 curl -sSL https://install.python-poetry.org | python3 -
+
+# Install dependencies with poetry
+poetry install
 ```
-Install dependencies with poetry
+
+You can also install the dependencies with pip:
 ```bash
-poetry install
+Nvidia GPUs:
+pip install -r requirements.txt
+
+AMD/Intel GPUs:
+pip install -r requirements-dml.txt
+
+AMD GPUs with Rocm (Linux):
+pip install -r requirements-amd.txt
+
+Intel GPUs with IPEX (Linux):
+pip install -r requirements-ipex.txt
 ```
 
-### MacOS
-You can install the dependencies with `run.sh`
+------
+Mac users can install the dependencies with `run.sh`
 ```bash
 sh ./run.sh
 ```
@@ -109,48 +97,48 @@ RVC requires some other pre-trained models for inference and training.
 
 You can download these models from our [Hugging Face space](https://huggingface.co/lj1995/VoiceConversionWebUI/tree/main/).
 
-### 1. Download assets
-Below is a list of the names of all pre-trained models and other files that RVC needs. You can find scripts to download them in the `tools` folder.
+Below is a list of the names of all pre-trained models and other files that RVC needs:
+```bash
+./assets/hubert/hubert_base.pt
 
-- ./assets/hubert/hubert_base.pt
+./assets/pretrained
 
-- ./assets/pretrained
+./assets/uvr5_weights
 
-- ./assets/uvr5_weights
+If you want to test the v2 models, you also need to download
 
-If you want to use the v2 models, you also need to download
+./assets/pretrained_v2
 
-- ./assets/pretrained_v2
+If you are using Windows, you may need this file; skip it if ffmpeg and ffprobe are already installed. ubuntu/debian users can install these 2 libraries with apt install ffmpeg, and Mac users can install them with brew install ffmpeg (brew must be installed first)
 
-### 2. Install ffmpeg
-Skip if ffmpeg and ffprobe are already installed.
+./ffmpeg
 
-#### Ubuntu/Debian users
-```bash
-sudo apt install ffmpeg
-```
-#### MacOS users
-```bash
-brew install ffmpeg
-```
-#### Windows users
-After downloading, place them in the root directory.
-- Download [ffmpeg.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe)
+https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffmpeg.exe
 
-- Download [ffprobe.exe](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe)
+./ffprobe
 
-### 3. Download the files required by the rmvpe vocal pitch extraction algorithm
+https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/ffprobe.exe
 
-If you want to use the latest RMVPE vocal pitch extraction algorithm, you need to download the pitch extraction model parameters and place them in the RVC root directory
+If you want to use the latest RMVPE vocal pitch extraction algorithm, you need to download the pitch extraction model parameters and place them in the RVC root directory
 
-- Download [rmvpe.pt](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/rmvpe.pt)
+https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/rmvpe.pt
 
-#### Download the dml environment for rmvpe (optional, AMD/Intel GPU users)
+AMD and Intel GPU users who need the dml environment should download
 
-- Download [rmvpe.onnx](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/rmvpe.onnx)
+https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/rmvpe.onnx
 
-### 4. Rocm for AMD GPUs (optional, Linux only)
+```
+Then use the following command to start the WebUI:
+```bash
+python infer-web.py
+```
+If you are using Windows or macOS, you can download and extract `RVC-beta.7z` directly; on Windows run `go-web.bat` to start the WebUI, and on macOS run the command `sh ./run.sh` to start the WebUI.
 
+Intel GPU users who need IPEX should first run `source /opt/intel/oneapi/setvars.sh` in a terminal (Linux only).
+
+The repository also includes `小白简易教程.doc` (a simple beginner tutorial) for reference.
+
+## Rocm for AMD GPUs (Linux only)
 If you want to run RVC on Linux using AMD's Rocm technology, first install the required drivers [here](https://rocm.docs.amd.com/en/latest/deploy/linux/os-native/install.html).
 
 If you are using Arch Linux, you can install the required drivers with pacman:
@@ -167,25 +155,10 @@ export HSA_OVERRIDE_GFX_VERSION=10.3.0
 sudo usermod -aG render $USERNAME
 sudo usermod -aG video $USERNAME
 ````
-
-## Getting started
-### Direct launch
-Use the following command to start the WebUI
+Then run the WebUI:
 ```bash
 python infer-web.py
 ```
-### Using the all-in-one package
-Download and extract `RVC-beta.7z`
-#### Windows users
-Double-click `go-web.bat`
-#### MacOS users
-```bash
-sh ./run.sh
-```
-### For Intel GPU users who need IPEX (Linux only)
-```bash
-source /opt/intel/oneapi/setvars.sh
-```
 
 ## Reference projects
 + [ContentVec](https://github.com/auspicious3000/contentvec/)

Retrieval_based_Voice_Conversion_WebUI.ipynb

Lines changed: 1 addition & 1 deletion
@@ -290,7 +290,7 @@
 "\n",
 "!python3 extract_f0_print.py logs/{MODELNAME} {THREADCOUNT} {ALGO}\n",
 "\n",
-"!python3 extract_feature_print.py cpu 1 0 0 logs/{MODELNAME}"
+"!python3 extract_feature_print.py cpu 1 0 0 logs/{MODELNAME} True"
 ]
 },
 {

Retrieval_based_Voice_Conversion_WebUI_v2.ipynb

Lines changed: 1 addition & 1 deletion
@@ -309,7 +309,7 @@
 "\n",
 "!python3 extract_f0_print.py logs/{MODELNAME} {THREADCOUNT} {ALGO}\n",
 "\n",
-"!python3 extract_feature_print.py cpu 1 0 0 logs/{MODELNAME}"
+"!python3 extract_feature_print.py cpu 1 0 0 logs/{MODELNAME} True"
 ]
 },
 {

assets/indices/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+*
+!.gitignore

configs/config.json

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-{"pth_path": "assets/weights/kikiV1.pth", "index_path": "logs/kikiV1.index", "sg_input_device": "VoiceMeeter Output (VB-Audio Vo (MME)", "sg_output_device": "VoiceMeeter Input (VB-Audio Voi (MME)", "sr_type": "sr_model", "threhold": -60.0, "pitch": 12.0, "rms_mix_rate": 0.5, "index_rate": 0.0, "block_time": 0.2, "crossfade_length": 0.08, "extra_time": 2.00, "n_cpu": 4.0, "use_jit": false, "use_pv": false, "f0method": "fcpe"}
+{"pth_path": "assets/weights/kikiV1.pth", "index_path": "logs/kikiV1.index", "sg_hostapi": "MME", "sg_wasapi_exclusive": false, "sg_input_device": "VoiceMeeter Output (VB-Audio Vo", "sg_output_device": "VoiceMeeter Input (VB-Audio Voi", "sr_type": "sr_device", "threhold": -60.0, "pitch": 12.0, "rms_mix_rate": 0.5, "index_rate": 0.0, "block_time": 0.15, "crossfade_length": 0.08, "extra_time": 2.0, "n_cpu": 4.0, "use_jit": false, "use_pv": false, "f0method": "fcpe"}
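
In the new `configs/config.json`, the host API name (`MME`) moves out of the device strings into a separate `sg_hostapi` key, and `sg_wasapi_exclusive` appears. The commit does not show any migration code, so the following is only a sketch of how an old-style entry could be converted by hand; `split_hostapi` and `migrate_device_fields` are hypothetical names, and the assumption is that the host API is always the last parenthesised suffix of the device string.

```python
import re
from typing import Dict, Optional, Tuple

# Matches a final "(...)" group that contains no nested parentheses.
_TRAILING_PAREN = re.compile(r"\s*\(([^()]+)\)$")


def split_hostapi(device: str) -> Tuple[str, Optional[str]]:
    """Split 'Name (HostApi)' into ('Name', 'HostApi'); hypothetical helper."""
    match = _TRAILING_PAREN.search(device)
    if match is None:
        return device, None
    return device[: match.start()], match.group(1)


def migrate_device_fields(old: Dict[str, object]) -> Dict[str, object]:
    """Return a copy of an old-style config using the new key layout."""
    new = dict(old)
    in_name, in_api = split_hostapi(str(old["sg_input_device"]))
    out_name, out_api = split_hostapi(str(old["sg_output_device"]))
    new["sg_input_device"] = in_name
    new["sg_output_device"] = out_name
    # Assumption: input and output share one host API, as in the diff.
    new["sg_hostapi"] = in_api or out_api or ""
    new.setdefault("sg_wasapi_exclusive", False)
    return new


OLD_CONFIG = {
    "sg_input_device": "VoiceMeeter Output (VB-Audio Vo (MME)",
    "sg_output_device": "VoiceMeeter Input (VB-Audio Voi (MME)",
}

if __name__ == "__main__":
    print(migrate_device_fields(OLD_CONFIG))
```

The device names in the sample are truncated exactly as they appear in the diff, which is why the pattern only strips the final, fully closed parenthesised group.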

configs/config.py

Lines changed: 16 additions & 13 deletions
@@ -2,6 +2,7 @@
 import os
 import sys
 import json
+import shutil
 from multiprocessing import cpu_count
 
 import torch
@@ -58,13 +59,17 @@ def __init__(self):
             self.dml,
         ) = self.arg_parse()
         self.instead = ""
+        self.preprocess_per = 3.7
         self.x_pad, self.x_query, self.x_center, self.x_max = self.device_config()
 
     @staticmethod
     def load_config_json() -> dict:
         d = {}
         for config_file in version_config_list:
-            with open(f"configs/{config_file}", "r") as f:
+            p = f"configs/inuse/{config_file}"
+            if not os.path.exists(p):
+                shutil.copy(f"configs/{config_file}", p)
+            with open(f"configs/inuse/{config_file}", "r") as f:
                 d[config_file] = json.load(f)
         return d
 
@@ -123,15 +128,13 @@ def has_xpu() -> bool:
     def use_fp32_config(self):
         for config_file in version_config_list:
             self.json_config[config_file]["train"]["fp16_run"] = False
-            with open(f"configs/{config_file}", "r") as f:
+            with open(f"configs/inuse/{config_file}", "r") as f:
                 strr = f.read().replace("true", "false")
-            with open(f"configs/{config_file}", "w") as f:
+            with open(f"configs/inuse/{config_file}", "w") as f:
                 f.write(strr)
-        with open("infer/modules/train/preprocess.py", "r") as f:
-            strr = f.read().replace("3.7", "3.0")
-        with open("infer/modules/train/preprocess.py", "w") as f:
-            f.write(strr)
-        print("overwrite preprocess and configs.json")
+            logger.info("overwrite " + config_file)
+        self.preprocess_per = 3.0
+        logger.info("overwrite preprocess_per to %d" % (self.preprocess_per))
 
     def device_config(self) -> tuple:
         if torch.cuda.is_available():
@@ -161,10 +164,7 @@ def device_config(self) -> tuple:
                 + 0.4
             )
             if self.gpu_mem <= 4:
-                with open("infer/modules/train/preprocess.py", "r") as f:
-                    strr = f.read().replace("3.7", "3.0")
-                with open("infer/modules/train/preprocess.py", "w") as f:
-                    f.write(strr)
+                self.preprocess_per = 3.0
         elif self.has_mps():
             logger.info("No supported Nvidia GPU found")
             self.device = self.instead = "mps"
@@ -247,5 +247,8 @@ def device_config(self) -> tuple:
                     )
                 except:
                     pass
-        print("is_half:%s, device:%s" % (self.is_half, self.device))
+        logger.info(
+            "Half-precision floating-point: %s, device: %s"
+            % (self.is_half, self.device)
+        )
         return x_pad, x_query, x_center, x_max
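
The `config.py` change replaces in-place rewriting of `infer/modules/train/preprocess.py` (swapping the literal `3.7` for `3.0`) with a `preprocess_per` attribute on the config object. How the WebUI forwards that attribute is not visible in this excerpt. The sketch below assumes it is passed as the trailing command-line argument, matching the `3.7` at the end of the `preprocess.py` call in `.github/workflows/unitest.yml`; `choose_preprocess_per` and `build_preprocess_argv` are hypothetical names, and the other arguments are copied from that CI line without interpreting them.

```python
from typing import List, Optional

DEFAULT_PER = 3.7  # initial value set in __init__ in the diff
REDUCED_PER = 3.0  # value used for fp32 mode or GPUs with <= 4 GB


def choose_preprocess_per(gpu_mem_gb: Optional[int], force_fp32: bool) -> float:
    """Mirror the two conditions under which the diff lowers preprocess_per."""
    if force_fp32:
        return REDUCED_PER
    if gpu_mem_gb is not None and gpu_mem_gb <= 4:
        return REDUCED_PER
    return DEFAULT_PER


def build_preprocess_argv(per: float) -> List[str]:
    """Build an argv list shaped like the CI invocation.

    Assumption: the last positional argument carries preprocess_per.
    """
    return [
        "python",
        "infer/modules/train/preprocess.py",
        "logs/mute/0_gt_wavs",
        "48000",
        "8",
        "logs/mi-test",
        "True",
        str(per),
    ]


if __name__ == "__main__":
    per = choose_preprocess_per(gpu_mem_gb=4, force_fp32=False)
    print(" ".join(build_preprocess_argv(per)))
```

Keeping the value in memory means the tracked source file is no longer modified at runtime, so a low-memory run does not leave the working tree dirty.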

configs/inuse/.gitignore

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+*
+!.gitignore
+!v1
+!v2
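
Together with the `load_config_json` change in `configs/config.py`, this ignore file sets up a working-copy pattern: the tracked defaults under `configs/` are copied into `configs/inuse/` on first use, and later rewrites (such as the fp32 override) touch only the ignored copy. The following is a minimal, self-contained sketch of that pattern, not the project's code: `load_working_config` and the file name `v1/32k.json` are illustrative assumptions, and unlike the diff the sketch creates the destination directory itself.

```python
import json
import os
import shutil
import tempfile
from typing import Tuple


def load_working_config(defaults_dir: str, inuse_dir: str, name: str) -> dict:
    """Load a config from the working copy, creating it from defaults if absent."""
    src = os.path.join(defaults_dir, name)
    dst = os.path.join(inuse_dir, name)
    if not os.path.exists(dst):
        # Assumption: the sketch creates the directory; the commit appears to
        # rely on the tracked v1/v2 folders already existing.
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        shutil.copy(src, dst)
    with open(dst, "r") as f:
        return json.load(f)


def demo() -> Tuple[bool, bool, bool]:
    """Show that editing the working copy leaves the default untouched."""
    with tempfile.TemporaryDirectory() as root:
        defaults = os.path.join(root, "configs")
        inuse = os.path.join(defaults, "inuse")
        name = os.path.join("v1", "32k.json")  # hypothetical file name
        os.makedirs(os.path.join(defaults, "v1"))
        with open(os.path.join(defaults, name), "w") as f:
            json.dump({"train": {"fp16_run": True}}, f)

        first = load_working_config(defaults, inuse, name)

        # Simulate a runtime override written to the working copy only.
        with open(os.path.join(inuse, name), "w") as f:
            json.dump({"train": {"fp16_run": False}}, f)

        second = load_working_config(defaults, inuse, name)
        with open(os.path.join(defaults, name), "r") as f:
            default_now = json.load(f)
        return (
            first["train"]["fp16_run"],
            second["train"]["fp16_run"],
            default_now["train"]["fp16_run"],
        )


if __name__ == "__main__":
    print(demo())
```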
