
New Fish model #58

Open
jmtatsch opened this issue Sep 13, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@jmtatsch

Have you seen the new fish speech model https://github.com/fishaudio/fish-speech ?
Wonderful voice cloning and intonation performance.
Would you consider supporting it?

@matatonic
Owner

I am considering it. So far I've heard it's not as good as xtts, but I haven't tried it myself yet.

@jmtatsch
Author

Imho it's far superior to xtts: less robotic and more emotional.
https://www.youtube.com/watch?v=Ghc8cJdQyKQ
The only catch is its non-commercial license.

@thiswillbeyourgithub

I see a major reason to implement support for Fish: it seems to support quantization.

I have an old GPU with 8 GB of VRAM, so every byte matters to me, and I really struggled to find any good information on how to quantize XTTS. I conclude that it's not something that can be relied upon, so seeing this PR that adds quantization support for Fish Speech makes me very interested!

PS: what's up with deepspeed for XTTS btw? I see that it takes a pip install deepspeed. If you can't support it in the official image, could you give me some pointers to use it on my side? XTTS is pretty slow for me, too slow for interactivity.

@matatonic
Owner

> I see a major reason to implement support for Fish: it seems to support quantization.
>
> I have an old GPU with 8G of RAM so every byte matters to me and I really struggled to find any good information on how to quantize XTTS. I conclude that it's not something that can be relied upon so seeing this PR that adds quantization support for Fish Speech makes me very interested!

That's a great point, thanks for that.

Re: deepspeed, can you start a new issue or discussion? It's worth its own space. I know it would help low-VRAM folks a lot, but it's a bit complex, especially for Windows.
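For anyone who lands here before that discussion exists, here is a minimal sketch of what "enabling deepspeed" usually amounts to for Coqui XTTS. The availability check is plain Python; the commented-out loading call assumes the `use_deepspeed` flag that recent Coqui TTS versions expose on `Xtts.load_checkpoint`, and the checkpoint paths are placeholders:

```python
# Hedged sketch (not this project's code): guard DeepSpeed usage behind an
# availability check, since `pip install deepspeed` can fail on Windows.
import importlib.util

def deepspeed_available() -> bool:
    """True if the deepspeed package is installed and importable."""
    return importlib.util.find_spec("deepspeed") is not None

# Loading the model itself needs the XTTS weights, so it is only sketched:
# from TTS.tts.configs.xtts_config import XttsConfig
# from TTS.tts.models.xtts import Xtts
# config = XttsConfig(); config.load_json("/path/to/config.json")
# model = Xtts.init_from_config(config)
# model.load_checkpoint(config, checkpoint_dir="/path/to/xtts/",
#                       use_deepspeed=deepspeed_available())

print(deepspeed_available())
```

The guard lets the same code run with or without deepspeed installed, which is handy in a shared Docker image.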


@thiswillbeyourgithub

thiswillbeyourgithub commented Oct 12, 2024

Hi, I took a quick look at fish audio again. I'm sharing this to make it easier to give it a try!

Their reference is at https://speech.fish.audio/, but I ended up doing my own thing:

    git clone https://github.com/fishaudio/fish-speech/
    cd fish-speech

Then create docker-compose.yml with content:

    services:
      fish-speech:
        image: fishaudio/fish-speech:latest-dev  # avoid building it
        volumes:
          - ./:/exp
        deploy:
          resources:
            reservations:
              devices:
                - driver: nvidia
                  count: all
                  capabilities: [gpu]
        network_mode: host  # to access their gradio

Run docker compose up, then open localhost:7860 to check out their gradio UI.

My takeaway is that it's of super high quality, and quite fast. Hard to quantify, but I never saw it take more than 2.2 GB of VRAM, whereas xtts often took all of my 8 GB (might actually be a bug, come to think of it?!). Fish on my old GPU seems to take 60s to generate 30s of audio, but I have done zero optimization. I don't really understand how to enable quantization. There seem to be some args like --compile and --half, but I don't have the time right now.
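For reference, that speed measurement corresponds to a real-time factor (RTF) of 2; a quick check with the numbers from above (the helper function is just for illustration):

```python
# Real-time factor: generation time divided by audio duration.
# RTF < 1 would be needed for interactive use; lower is faster.
def rtf(generation_seconds: float, audio_seconds: float) -> float:
    return generation_seconds / audio_seconds

print(rtf(60.0, 30.0))  # 2.0: twice as slow as real time on this old GPU
```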

I think to go further I would need to build it from the repo and modify the entry point to run the other python gradio scripts. Some of those relate directly to quantization.


3 participants