By admin, 11 October, 2024

Common Voice Dataset

We’re building an open source, multi-language dataset of voices that anyone can use to train speech-enabled applications.

It includes both Cantonese and Mandarin Chinese.

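I sampled the Cantonese (Chinese Hong Kong) speech data and the quality is poor: the speakers' voices are not clear enough (nowhere near voice-actor level), the background noise is fairly heavy, and the label files contain errors. There is also a separate Cantonese category.

My impression is that data generated with an existing TTS system would probably be of much better quality.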
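
The spot check described above can be sketched roughly as follows. This is only a minimal sketch: it assumes the release ships a tab-separated index file (for example `validated.tsv`) with `path` and `sentence` columns, and the helper name `sample_clips` and the demo rows are made up for illustration.

```python
import csv
import io
import random


def sample_clips(tsv_text, k=5, seed=0):
    """Return up to k random (path, sentence) pairs from TSV text."""
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    rows = [(row["path"], row["sentence"]) for row in reader]
    rng = random.Random(seed)  # fixed seed so the same sample can be revisited
    return rng.sample(rows, min(k, len(rows)))


# Made-up rows for illustration; real data would be read from the index file.
demo = "path\tsentence\nclip_1.mp3\t你好\nclip_2.mp3\t早晨\nclip_3.mp3\t多謝\n"
picked = sample_clips(demo, k=2)
for path, sentence in picked:
    # Listen to each clip and compare it with the transcript by hand.
    print(path, sentence)
```

Listening to a few dozen sampled clips next to their transcripts is usually enough to get a feel for noise levels and labeling errors before committing to the whole set.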


By admin, 11 October, 2024

1. Open Visual Studio Code.  
2. Go to the Extensions tab (Ctrl+Shift+X) and install the Python extension.  
3. Go to File > Preferences > Settings.  
4. In the left pane, search for “conda”.  
5. In the right pane, search for “python.condaPath” and set it to the path of the conda executable inside your Anaconda installation.  
6. In the left pane, search for “conda env”.  
7. In the right pane, search for “python.condaEnvFile” and set the path to your environment file.  
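
The `python.condaPath` value from the steps above can also be written straight into `settings.json`. The sketch below only builds the fragment to paste in; the helper name `conda_path_setting` and the install location `/opt/anaconda3/bin/conda` are assumptions for illustration, so substitute your own path.

```python
import json
import shutil


def conda_path_setting(conda_executable=None):
    """Build the settings.json fragment for python.condaPath."""
    # Fall back to whatever "conda" resolves to on PATH (may be None).
    path = conda_executable or shutil.which("conda")
    if path is None:
        raise FileNotFoundError("conda not found on PATH; pass the path explicitly")
    return {"python.condaPath": path}


# Hypothetical install location, used only for illustration.
fragment = conda_path_setting("/opt/anaconda3/bin/conda")
print(json.dumps(fragment, indent=2))
```

Calling `conda_path_setting()` with no argument looks conda up on `PATH` instead, which is a quick way to confirm where the executable actually lives.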


By admin, 28 August, 2024

Install the following module to automatically extract the translatable strings from a module:

https://www.drupal.org/project/potx

There is also a translation file editor (it does not look very useful, though; a plain text editor is enough): https://poedit.net/

The extracted po file needs one line changed before it can be imported; otherwise the import fails with an error:

"Plural-Forms: nplurals=2; plural=(n!=1);\n"
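
If this has to be done repeatedly, the one-line change can be scripted. The following is a minimal sketch, not part of potx: the helper name `fix_plural_forms` is made up, and the placeholder header in the demo (`nplurals=INTEGER; plural=EXPRESSION;`) is an assumption about what the extracted file contains.

```python
import re

# The header line exactly as it should appear in the po file
# (the \n here is a literal backslash followed by "n", not a newline).
PLURAL_FORMS = '"Plural-Forms: nplurals=2; plural=(n!=1);\\n"'


def fix_plural_forms(po_text):
    """Replace the first Plural-Forms header line; report whether one was found."""
    pattern = re.compile(r'^"Plural-Forms:.*"$', flags=re.MULTILINE)
    # A callable replacement avoids backslash processing in the new text.
    new_text, count = pattern.subn(lambda m: PLURAL_FORMS, po_text, count=1)
    return new_text, count == 1


# Assumed example of an extracted header with an unfilled placeholder.
demo = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=utf-8\\n"\n'
    '"Plural-Forms: nplurals=INTEGER; plural=EXPRESSION;\\n"\n'
)
fixed, changed = fix_plural_forms(demo)
print(fixed)
```

The second return value makes it easy to notice files that have no Plural-Forms line at all, in which case the header needs to be added by hand.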

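When the source text is Chinese and needs to be translated into English, first set the development system's language to en-gb and include the translations when exporting the po file (an en export cannot include translations, which makes later incremental changes awkward), then import the file into the target system that uses en as its default language.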