First of all, I'd like to say this is a great project! I am looking for ways to integrate it into my own project.
I have an open-ended question. It feels to me like this project relies heavily on cloud services, but I am running everything locally because I want a self-contained service I can use myself. As far as I can tell, no open-source model (at least none that I know of, e.g. Melo, Parler, or Coqui) supports timestamps, so I would have to manually convert the phonemes output by the TTS into timestamps.
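For illustration, here is a minimal sketch of what that conversion might look like. Everything in it is hypothetical: the `PhonemeEvent` shape and the millisecond durations are assumptions about what a local engine could expose, not the API of any of the models above.

```ts
// Hypothetical shape for what a local TTS engine might emit per phoneme.
interface PhonemeEvent {
  phoneme: string;    // e.g. "AH"
  durationMs: number; // duration predicted by the model
  wordIndex: number;  // which word this phoneme belongs to
}

// Accumulate per-phoneme durations into word-level start times and
// durations, i.e. the timepoint data a lipsync pipeline needs.
function phonemesToWordTimestamps(
  events: PhonemeEvent[],
  words: string[]
): { words: string[]; startsMs: number[]; durationsMs: number[] } {
  const startsMs: number[] = new Array(words.length).fill(-1);
  const durationsMs: number[] = new Array(words.length).fill(0);
  let cursor = 0; // running time from the start of the utterance
  for (const ev of events) {
    if (startsMs[ev.wordIndex] < 0) startsMs[ev.wordIndex] = cursor;
    durationsMs[ev.wordIndex] += ev.durationMs;
    cursor += ev.durationMs;
  }
  return { words, startsMs, durationsMs };
}
```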
Has anybody managed to make this run with open-source TTS models while keeping timepoint data available for lipsync?
Thanks for being a part of the conversation!
alesstracker21 changed the title from "Open-source TTS model support" to "Open-source TTS model support with timestamps" on Nov 30, 2024.
I've been looking for such a TTS project for some time (with no luck), so thank you for starting this thread.
One potential candidate on my radar is Piper. I haven't had time to explore it in depth, but based on the documentation and demos, it has several qualities I like: it uses neural voices, it is fast, it is released under the MIT license, and the project seems active. There are even WASM builds available, so you could run it entirely in a browser. Additionally, and relevant to the TalkingHead project, there appears to be an open PR for generating word-level alignment data.
(Edit: It seems that the referenced PR generates audio twice: once for the entire sentence and then again separately for each word. This approach is not optimal, since the duration of each phoneme is already available from the first run; that information is simply not exposed in the lower-level API. More about this here. It seems, however, that one of the maintainers is currently rewriting Piper because of licensing concerns and planned new features, so for now it might be best to wait and see how the project evolves.)
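If the lower-level API ever does expose those durations, turning them into timestamps should be straightforward. Here is a minimal sketch, assuming a VITS-style voice whose duration predictor emits per-phoneme frame counts; the 22050 Hz sample rate and 256-sample hop length are assumptions and should be read from the specific voice's config.

```ts
// Assumed audio parameters for the voice (check the voice's config file).
const SAMPLE_RATE = 22050; // Hz
const HOP_LENGTH = 256;    // audio samples per spectrogram frame

// Convert a predicted frame count into seconds.
function framesToSeconds(frames: number): number {
  return (frames * HOP_LENGTH) / SAMPLE_RATE;
}

// Given the phoneme sequence and the model's per-phoneme frame counts,
// compute a start time and duration for each phoneme in a single pass.
function phonemeTimeline(
  phonemes: string[],
  frameCounts: number[]
): { phoneme: string; startSec: number; durationSec: number }[] {
  let t = 0;
  return phonemes.map((p, i) => {
    const d = framesToSeconds(frameCounts[i]);
    const entry = { phoneme: p, startSec: t, durationSec: d };
    t += d;
    return entry;
  });
}
```

From there, grouping phonemes by word (as in the earlier sketch) would yield word-level alignment data from a single synthesis pass, without generating each word twice.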