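A Roundup of Recent TTS Models
Published: June 6, 2024
From XTTS to Pheme and from OpenVoice to VITS, each model is listed with its source repository, supported languages, and more. A very handy collection!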
| Name | Repo | Weights | License | Fine-tuning | Languages | Paper | Demo |
| --- | --- | --- | --- | --- | --- | --- | --- |
| XTTS | [Repo](https://github.com/coqui-ai/TTS) | [Hub](https://huggingface.co/coqui/XTTS-v2) | [CPML](https://coqui.ai/cpml) | [Yes](https://huggingface.slack.com/archives/C05QZTQJUDD/p1705418518292139) | Multilingual | [Technical notes](https://erogol.substack.com/p/xttsv2-notes) | [Space](https://huggingface.co/spaces/coqui/xtts) |
| TorToiSe-TTS | [Repo](https://github.com/neonbjb/tortoise-tts) | [Hub](https://huggingface.co/jbetker/tortoise-tts-v2) | [Apache 2.0](https://github.com/neonbjb/tortoise-tts/blob/main/LICENSE) | [Yes](https://git.ecker.tech/mrq/tortoise-tts) | English | [Technical report](https://arxiv.org/abs/2305.07243) | [Space](https://huggingface.co/spaces/Manmay/tortoise-tts) |
| VITS / MMS-TTS | [Repo](https://github.com/huggingface/transformers/tree/7142bdfa90a3526cfbed7483ede3afbef7b63939/src/transformers/models/vits) | [Hub](https://huggingface.co/kakao-enterprise) / [MMS](https://huggingface.co/models?search=mms-tts) | [Apache 2.0](https://github.com/huggingface/transformers/blob/main/LICENSE) | [Yes](https://github.com/ylacombe/finetune-hf-vits) | English | [Paper](https://arxiv.org/abs/2106.06103) | [Space](https://huggingface.co/spaces/kakao-enterprise/vits) |
| Pheme | [Repo](https://github.com/PolyAI-LDN/pheme) | [Hub](https://huggingface.co/PolyAI/pheme) | [CC-BY](https://github.com/PolyAI-LDN/pheme/blob/main/LICENSE) | [Yes](https://github.com/PolyAI-LDN/pheme#training) | English | [Paper](https://arxiv.org/abs/2401.02839) | [Space](https://huggingface.co/spaces/PolyAI/pheme) |
| OpenVoice | [Repo](https://github.com/myshell-ai/OpenVoice) | [Hub](https://huggingface.co/myshell-ai/OpenVoice) | [CC-BY-NC 4.0](https://github.com/myshell-ai/OpenVoice/blob/main/LICENSE) | No | ZH + EN | [Paper](https://arxiv.org/abs/2312.01479) | [Space](https://huggingface.co/spaces/myshell-ai/OpenVoice) |
| IMS-Toucan | [Repo](https://github.com/DigitalPhonetics/IMS-Toucan) | [GH release](https://github.com/DigitalPhonetics/IMS-Toucan/tags) | [Apache 2.0](https://github.com/DigitalPhonetics/IMS-Toucan/blob/ToucanTTS/LICENSE) | [Yes](https://github.com/DigitalPhonetics/IMS-Toucan#build-a-toucantts-pipeline) | Multilingual | [Paper](https://arxiv.org/abs/2206.12229) | [Space](https://huggingface.co/spaces/Flux9665/IMS-Toucan) |
| Matcha-TTS | [Repo](https://github.com/shivammehta25/Matcha-TTS) | [GDrive](https://drive.google.com/drive/folders/17C_gYgEHOxI5ZypcfE_k1piKCtyR0isJ) | [MIT](https://github.com/shivammehta25/Matcha-TTS/blob/main/LICENSE) | [Yes](https://github.com/shivammehta25/Matcha-TTS/tree/main#train-with-your-own-dataset) | English | [Paper](https://arxiv.org/abs/2309.03199) | [Space](https://huggingface.co/spaces/shivammehta25/Matcha-TTS) |
| pflowTTS | [Unofficial Repo](https://github.com/p0p4k/pflowtts_pytorch) | [GDrive](https://drive.google.com/drive/folders/1x-A2Ezmmiz01YqittO_GLYhngJXazaF0) | [MIT](https://github.com/p0p4k/pflowtts_pytorch/blob/master/LICENSE) | [Yes](https://github.com/p0p4k/pflowtts_pytorch#instructions-to-run) | English | [Paper](https://openreview.net/pdf?id=zNA7u7wtIN) | Not Available |
| StyleTTS 2 | [Repo](https://github.com/yl4579/StyleTTS2) | [Hub](https://huggingface.co/yl4579/StyleTTS2-LibriTTS/tree/main) | [MIT](https://github.com/yl4579/StyleTTS2/blob/main/LICENSE) | [Yes](https://github.com/yl4579/StyleTTS2#finetuning) | English | [Paper](https://arxiv.org/abs/2306.07691) | [Space](https://huggingface.co/spaces/styletts2/styletts2) |
| VALL-E | [Unofficial Repo](https://github.com/enhuiz/vall-e) | Not Available | [MIT](https://github.com/enhuiz/vall-e/blob/main/LICENSE) | [Yes](https://github.com/enhuiz/vall-e#get-started) | NA | [Paper](https://arxiv.org/abs/2301.02111) | Not Available |
| HierSpeech++ | [Repo](https://github.com/sh-lee-prml/HierSpeechpp) | [GDrive](https://drive.google.com/drive/folders/1-L_90BlCkbPyKWWHTUjt5Fsu3kz0du0w) | [CC-BY-NC-SA 4.0](https://github.com/sh-lee-prml/HierSpeechpp/blob/main/LICENSE) | No | KR + EN | [Paper](https://arxiv.org/abs/2311.12454) | [Space](https://huggingface.co/spaces/LeeSangHoon/HierSpeech_TTS) |
| Bark | [Repo](https://github.com/huggingface/transformers/tree/main/src/transformers/models/bark) | [Hub](https://huggingface.co/suno/bark) | [MIT](https://github.com/suno-ai/bark/blob/main/LICENSE) | No | Multilingual | [Paper](https://arxiv.org/abs/2209.03143) | [Space](https://huggingface.co/spaces/suno/bark) |
| EmotiVoice | [Repo](https://github.com/netease-youdao/EmotiVoice) | [GDrive](https://drive.google.com/drive/folders/1y6Xwj_GG9ulsAonca_unSGbJ4lxbNymM) | [Apache 2.0](https://github.com/netease-youdao/EmotiVoice/blob/main/LICENSE) | [Yes](https://github.com/netease-youdao/EmotiVoice/wiki/Voice-Cloning-with-your-personal-data) | ZH + EN | Not Available | Not Available |
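
For readers who want to narrow the list down, the entries above can be kept as small structured records and filtered by field. The following is only a minimal sketch: the `TTSModel` record type and the `select` helper are hypothetical names introduced here for illustration, and the values are transcribed from the entries above (license, fine-tuning support, languages), so they may lag behind the upstream projects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTSModel:
    """One entry from the list above (hypothetical record type, not from the source)."""
    name: str
    license: str
    fine_tuning: bool  # True where the list says "Yes", False where it says "No"
    languages: str     # kept as the free-text label used in the list


# Values transcribed from the entries above.
MODELS = [
    TTSModel("XTTS", "CPML", True, "Multilingual"),
    TTSModel("TorToiSe-TTS", "Apache 2.0", True, "English"),
    TTSModel("VITS / MMS-TTS", "Apache 2.0", True, "English"),
    TTSModel("Pheme", "CC-BY", True, "English"),
    TTSModel("OpenVoice", "CC-BY-NC 4.0", False, "ZH + EN"),
    TTSModel("IMS-Toucan", "Apache 2.0", True, "Multilingual"),
    TTSModel("Matcha-TTS", "MIT", True, "English"),
    TTSModel("pflowTTS", "MIT", True, "English"),
    TTSModel("StyleTTS 2", "MIT", True, "English"),
    TTSModel("VALL-E", "MIT", True, "NA"),
    TTSModel("HierSpeech++", "CC-BY-NC-SA 4.0", False, "KR + EN"),
    TTSModel("Bark", "MIT", False, "Multilingual"),
    TTSModel("EmotiVoice", "Apache 2.0", True, "ZH + EN"),
]


def select(models, *, fine_tuning=None, language=None):
    """Return the names of models matching the filters (None means 'any').

    `language` is matched as a substring of the free-text languages label.
    """
    result = []
    for m in models:
        if fine_tuning is not None and m.fine_tuning != fine_tuning:
            continue
        if language is not None and language not in m.languages:
            continue
        result.append(m.name)
    return result


if __name__ == "__main__":
    # Models listed as supporting both Chinese and fine-tuning.
    print(select(MODELS, fine_tuning=True, language="ZH"))
    # Models listed without fine-tuning support.
    print(select(MODELS, fine_tuning=False))
```

Because the languages field is matched as plain text, a filter such as `language="ZH"` only finds entries whose label literally contains "ZH"; models labelled "Multilingual" would need a separate check against their own documentation.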
Reference:
https://github.com/Vaibhavs10/open-tts-tracker/tree/main
Source: https://mp.weixin.qq.com/s/c2sICIdX3lcFBgpZS4uEzg