Commit 7883aa6

update the stage of run.sh and synthesize_e2e.sh, to be clear (#4057)
* run.sh: add a `--stage` parameter to `synthesize` and `synthesize_e2e` to control vocoder model selection; README.md: document the `stage` parameter and clarify the vocoder selection logic
* add comments for the `stage` parameter in run.sh
* change HiFiGAN to Multi-Band MelGAN
* revert the cmsc file to its original place (No.15 unchanged); only No.6 is modified here
* update the stage of run.sh and synthesize_e2e.sh, to be clear
* fix the md
1 parent a9d9d5f commit 7883aa6

File tree

1 file changed: +2 −2 lines changed

examples/aishell3/ernie_sat/README.md

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ In ERNIE-SAT, we propose two innovations:
 ## Dataset
 ### Download and Extract
 Download AISHELL-3 from it's [Official Website](http://www.aishelltech.com/aishell_3) and extract it to `~/datasets`. Then the dataset is in the directory `~/datasets/data_aishell3`.
-
+
 ### Get MFA Result and Extract
 We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
 You can download from here [aishell3_alignment_tone.tar.gz](https://paddlespeech.cdn.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your MFA model reference to [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (use MFA1.x now) of our repo.
@@ -138,7 +138,7 @@ You can check the text of downloaded wavs in `source/README.md`.
 ```bash
 ./run.sh --stage 3 --stop-stage 3 --gpus 0
 ```
-`stage 3` of `run.sh` calls `local/synthesize_e2e.sh`, `stage 0` of it is **Speech Synthesis** and `stage 1` of it is **Speech Editing**.
+`stage 3` of `run.sh` calls `local/synthesize_e2e.sh`. `synthesize_e2e.sh` is a script for performing both **Speech Synthesis** and **Speech Editing** tasks by default. It converts input text into speech for synthesis and modifies existing speech based on new text content for editing.

 You can modify `--wav_path``--old_str` and `--new_str` yourself, `--old_str` should be the text corresponding to the audio of `--wav_path`, `--new_str` should be designed according to `--task_name`, both `--source_lang` and `--target_lang` should be `zh` for model trained with AISHELL3 dataset.
 ## Pretrained Model
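The `./run.sh --stage 3 --stop-stage 3` invocation shown in the diff relies on the Kaldi-style stage gating used throughout PaddleSpeech example scripts. Below is a minimal sketch of that pattern, not the actual `run.sh`; the `run_stage` helper is hypothetical (real scripts inline the `if` checks), and the stage variables are normally set from `--stage`/`--stop-stage` by an option parser.

```shell
#!/usr/bin/env bash
# Sketch of Kaldi-style stage gating, as commonly used in run.sh scripts.
stage=3       # normally set via --stage
stop_stage=3  # normally set via --stop-stage

run_stage() {
    local n=$1; shift
    # A stage runs only when stage <= n <= stop_stage.
    if [ "${stage}" -le "${n}" ] && [ "${stop_stage}" -ge "${n}" ]; then
        echo "running stage ${n}: $*"
    fi
}

run_stage 0 "preprocess data"
run_stage 1 "train the model"
run_stage 2 "synthesize"
run_stage 3 "synthesize_e2e (speech synthesis and speech editing)"
```

With `stage=3` and `stop_stage=3`, only the last call fires, which is why `--stage 3 --stop-stage 3` runs only the `synthesize_e2e` step.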
