On silence, the mic hallucinates #68

Open
aiaimimi0920 opened this issue Apr 11, 2024 · 6 comments
Labels
bug Something isn't working

Comments

@aiaimimi0920
Contributor

The current version (bd9c18a) still produces a large amount of output when the microphone is silent:
https://github.com/V-Sekai/godot-whisper/assets/153103332/13ce75ed-2c6f-4224-bdd7-6bc0b118caa2

As I remember it, the previous version (about three months ago?) didn't have nearly as many microphone hallucinations:
https://github.com/V-Sekai/godot-whisper/assets/153103332/aad89a0c-f965-4349-9b4c-6d0233161b79

If possible, it would be great to get this fixed.

@Ughuuu
Collaborator

Ughuuu commented Apr 11, 2024

That's true. In the new version I decoupled the logic as much as possible so that it can be called from GDScript independently. It's true that hallucination is worse. I'll try to look into combining iree.gd for hallucination, now that that's done. @fire? Ideas?

@fire
Member

fire commented Apr 11, 2024

People have mentioned combining silence detection with whisper as a first thought, but I am concerned about the total latency of the voice transcription.

@Ughuuu
Collaborator

Ughuuu commented Apr 11, 2024

I see. I'll look into the vad_detection logic; most likely I didn't migrate that part correctly. I'll compare it against the old version and see what is different in this one.

@fire
Member

fire commented Apr 11, 2024

AI-based VAD is also a thing, and that was my approach for iree and whisper-jax.

@Ughuuu
Collaborator

Ughuuu commented Apr 18, 2024

The silence detection may already work, but these two project settings need to be configured:
- audio/input/transcribe/vad_treshold
- audio/input/transcribe/freq_treshold

For now I'm increasing vad_treshold to 2, as that seems to give good results in my case. Increasing it to 5 is even better in terms of silence detection.
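
To illustrate why raising a threshold like this can suppress output on a quiet mic, here is a minimal Python sketch of a generic energy-ratio gate. It is not the addon's implementation: `mean_abs`, `is_speech`, the `noise_floor` value, and the synthetic signals are all assumptions made up for this example.

```python
import math
from typing import Sequence


def mean_abs(samples: Sequence[float]) -> float:
    """Average absolute amplitude of a chunk (0.0 for an empty chunk)."""
    if not samples:
        return 0.0
    return sum(abs(s) for s in samples) / len(samples)


def is_speech(chunk: Sequence[float], noise_floor: float, threshold: float) -> bool:
    """Hypothetical gate: treat the chunk as speech only if its level
    exceeds `threshold` times the assumed noise floor."""
    return mean_abs(chunk) > threshold * noise_floor


# Synthetic data for illustration only (not real microphone input).
noise_floor = 0.01
hiss = [0.025 * math.sin(i) for i in range(1600)]       # low-level mic noise
voice = [0.5 * math.sin(i * 0.1) for i in range(1600)]  # much louder signal

for threshold in (1.0, 2.0, 5.0):
    print(
        f"threshold={threshold}: "
        f"hiss passes={is_speech(hiss, noise_floor, threshold)}, "
        f"voice passes={is_speech(voice, noise_floor, threshold)}"
    )
```

With a low threshold the hiss is passed on as if it were speech, which is the situation where a transcription model tends to hallucinate. A higher threshold rejects it while the louder signal still passes, at the cost of possibly dropping very quiet speech.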

@fire fire changed the title from "the silence mic hallucinating" to "On silence, the mic hallucinates" Apr 20, 2024
@Ughuuu
Collaborator

Ughuuu commented Apr 24, 2024

@aiaimimi0920, let me know if you get a chance to try it.

@fire fire added the bug Something isn't working label Nov 4, 2024