Home | CTO Notes (CTO笔记)

By admin, 18 October 2024

Large-Scale Chinese NLP Corpora

https://github.com/brightmart/nlp_chinese_corpus

CLUECorpus2020: https://github.com/CLUEbenchmark/CLUECorpus2020

Coqui TTS

🐸 (frog) TTS

The first time it runs, TTS needs to download a model. If that download fails, later runs fail as well instead of retrying. To force a retry, remove the empty model folder from the path below:

/home/hgneng/.local/share/tts/
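
The cleanup can be scripted. This is a minimal sketch, assuming the failed download leaves a completely empty folder directly under the TTS data directory and that GNU or BSD find is available. The function name clean_empty_tts_models is made up for illustration, and the default path generalizes the per-user path above to $HOME.

```shell
# Remove empty model folders left behind by a failed download.
# Hypothetical helper; the default directory mirrors the path in these notes.
clean_empty_tts_models() {
    tts_home="${1:-$HOME/.local/share/tts}"
    [ -d "$tts_home" ] || { echo "no such directory: $tts_home" >&2; return 1; }
    # -mindepth 1 protects the top-level folder itself; -empty matches only
    # directories with no entries, so downloaded models are left untouched.
    find "$tts_home" -mindepth 1 -maxdepth 1 -type d -empty -print -delete
}
```

Calling clean_empty_tts_models with no argument prints each folder it deletes; running tts again afterwards should trigger a fresh download.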

Librosa

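Audio and music processing in Python.

AISHELL-3: High-Fidelity Chinese Speech Database

AISHELL-3, the Mandarin Chinese speech database from AISHELL (希尔贝壳), contains 85 hours of speech in 88,035 utterances and can be used for multi-speaker synthesis systems. The recordings were made in a quiet indoor environment with high-fidelity microphones (44.1 kHz, 16-bit). 218 speakers from different accent regions of China took part. Professional annotators labeled the pinyin and prosody, and after strict quality inspection the transcription accuracy of the database is above 98%.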

https://www.aishelltech.com/aishell_3
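
A quick back-of-the-envelope check of what those figures imply, as a sketch only: the 85 hours is a rounded total, so the averages below are approximate, and the variable names are chosen here for illustration.

```shell
# Figures taken from the AISHELL-3 description above.
hours=85
utterances=88035
speakers=218

# Average utterance length in seconds and average audio per speaker in minutes.
avg_utt=$(awk -v h="$hours" -v n="$utterances" 'BEGIN { printf "%.2f", h * 3600 / n }')
avg_spk=$(awk -v h="$hours" -v s="$speakers" 'BEGIN { printf "%.1f", h * 60 / s }')

echo "average utterance length: ${avg_utt} s"
echo "average audio per speaker: ${avg_spk} min"
```

So each utterance is about 3.5 seconds long and each speaker contributes roughly 23 minutes of audio on average.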

Common Voice Dataset

We’re building an open source, multi-language dataset of voices that anyone can use to train speech-enabled applications.

Includes both Cantonese and Mandarin Chinese!!

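A sample of the Cantonese (Chinese Hong Kong) speech data turned out to be of poor quality: the speakers' voices are not clear enough (not voice-actor quality), the background noise is high, and the label files contain errors. There is also a separate Cantonese category.

Data generated with an existing TTS system would probably be of much better quality.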

How to activate conda env in Visual Studio Code?

1. Open Visual Studio Code.
2. Go to the Extensions tab (Ctrl+Shift+X) and install the Python extension.
3. Go to File > Preferences > Settings.
4. In the Settings search bar, search for “conda”.
5. In the results, find “python.condaPath” and set it to the path of your conda executable.
6. In the Settings search bar, search for “conda env”.
7. In the results, find “python.condaEnvFile” and set the path to your environment file.

XTTS Investigation

According to https://docs.coqui.ai/en/stable/models/xtts.html, XTTS supports Chinese.

Run this to check:

 tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 \
   --list_language_idx

If the download fails, try setting a proxy (note that the https_proxy value also uses the "http" scheme).
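
A minimal sketch of such a proxy setup, assuming a hypothetical local HTTP proxy at 127.0.0.1:7890. The address and the PROXY_URL variable are placeholders and are not taken from these notes; substitute your own proxy.

```shell
# Hypothetical proxy address; replace with the one you actually use.
PROXY_URL="http://127.0.0.1:7890"

export http_proxy="$PROXY_URL"
# The value for https_proxy also starts with "http://", not "https://".
export https_proxy="$PROXY_URL"

# Then rerun the language check in the same shell:
# tts --model_name tts_models/multilingual/multi-dataset/xtts_v2 --list_language_idx
```

The exports only affect the current shell session, so the tts command has to be run from the same terminal.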

Linux System Logs

/var/log/journal/*

View the logs (add -r to show the newest entries first):

journalctl
journalctl -r

Clean up the logs, keeping only the last 180 days:

sudo journalctl --vacuum-time=180d

Reference: https://linuxhandbook.com/clear-systemd-journal-logs/
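
To see how much space the cleanup frees, the vacuum command can be wrapped between two disk-usage reports. This is a sketch: the function name vacuum_journal and its optional day-count argument are made up for illustration, while --disk-usage and --vacuum-time are standard journalctl options.

```shell
# Report journal size, drop entries older than N days (default 180), report again.
# Hypothetical helper name; pass a different day count as the first argument.
vacuum_journal() {
    days="${1:-180}"
    journalctl --disk-usage                     # size before cleanup
    sudo journalctl --vacuum-time="${days}d"    # remove archived entries older than N days
    journalctl --disk-usage                     # size after cleanup
}
```

For example, vacuum_journal 30 keeps only the last 30 days.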

MIT App Inventor

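Appears to be a web-based tool that helps children build apps through a graphical interface. It may be related to Scratch (the block-based programming interface looks similar).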

https://appinventor.mit.edu/
