
Open-source TTS model support with timestamps #77

Open
alesstracker21 opened this issue Nov 30, 2024 · 1 comment

@alesstracker21

First of all, I'd like to say this is a great project! I am looking for ways to integrate it into my own project.
I have an open-ended question here. It feels to me like this project relies heavily on cloud services, but I am running everything locally because I aim to end up with a self-contained service I can use myself. As far as I can tell, no open-source TTS model (at least none that I know of, e.g. Melo/Parler/Coqui and so on) supports timestamps, so the phonemes output by the TTS might have to be converted into timestamps manually.
Has anybody managed to make this run with open-source TTS models while keeping timepoint data available for lipsync?

Thanks for being a part of the conversation!

@alesstracker21 alesstracker21 changed the title Open-source TTS model support Open-source TTS model support with timestamps Nov 30, 2024
@met4citizen
Owner

met4citizen commented Dec 2, 2024

I've been looking for such a TTS project for some time (with no luck), so thank you for starting this thread.

One potential candidate on my radar is Piper. I haven't had time to explore it in depth, but based on the documentation and demos, it has several qualities I like: it uses neural voices, it is fast, it is released under an MIT License, and the project seems active. There are even WASM versions available, so you could run it entirely in a browser. Additionally, related to the TalkingHead project, there appears to be an open PR for generating word-level alignment data.

(Edit: It seems that the referenced PR generates audio twice: once for the entire sentence and then again separately for each word. This approach is not optimal. The duration of each phoneme is already available from the first run, but this information is simply not exposed in the lower-level API. More about this here. It seems, however, that one of the maintainers is currently rewriting Piper due to some licensing concerns and new features, so for now it might be best to wait and see how the project evolves.)
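
For reference, once those per-phoneme durations are exposed, deriving word-level timestamps from a single synthesis run is just a matter of accumulating them. Here is a minimal Python sketch; the input format (each word paired with the durations of its phonemes, in seconds) is hypothetical and not part of Piper's current API:

```python
# A minimal sketch, not Piper's actual API: assume the TTS front end
# returns each word together with the durations (in seconds) of the
# phonemes that make it up, taken from the single synthesis pass.

def words_to_timestamps(words):
    """words: list of (word, [phoneme_duration_s, ...]) pairs.
    Returns a list of (word, start_s, end_s) tuples."""
    timestamps = []
    t = 0.0
    for word, durations in words:
        start = t
        t += sum(durations)  # a word ends where its last phoneme ends
        timestamps.append((word, start, t))
    return timestamps

# Example with made-up durations:
words = [("hello", [0.08, 0.06, 0.09, 0.11]),
         ("world", [0.10, 0.07, 0.12])]
for word, start, end in words_to_timestamps(words):
    print(f"{word}: {start:.2f}s - {end:.2f}s")
```

(In practice you would also have to account for inter-word pauses and silence tokens, but the principle stands: no second synthesis pass should be needed.)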
