Feature Suggestion - Existing subtitle text accuracy enhancement #93
Hi @codefaux, glad to hear that prompting gives you highly accurate transcriptions. Currently this project transcribes the whole audio without segmenting it. In line with your idea, the enclosing method could be modified to take in the original subtitle cues, though of course that means the original timecodes need to be reasonably accurate. Your approach sounds like a promising way to improve subtitle quality, and it would also be interesting to know how well Whisper works on very short audio segments without surrounding context.
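Here is a minimal sketch of what "taking in the original subtitle cues" could look like. It assumes plain SRT input and ffmpeg on the PATH. `Cue`, `parse_srt`, `ffmpeg_extract_cmd` and the optional `pad` margin are hypothetical names, not part of this project. The padding is one possible way to give Whisper a little surrounding context on very short cues.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:04,250"
TIMING_RE = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def _to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse SRT content into cues; blocks without a timing line are skipped."""
    cues = []
    normalized = srt_text.replace("\r\n", "\n").strip()
    # SRT blocks are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.strip().split("\n")
        for i, line in enumerate(lines):
            match = TIMING_RE.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the cue's text
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(len(cues) + 1, _to_seconds(*g[:4]),
                                _to_seconds(*g[4:]), text))
                break
    return cues


def ffmpeg_extract_cmd(media: str, cue: Cue, out_path: str,
                       pad: float = 0.0) -> list[str]:
    """Build an ffmpeg command that cuts one cue's audio to 16 kHz mono WAV."""
    start = max(0.0, cue.start - pad)      # never seek before 0
    duration = (cue.end + pad) - start
    return [
        "ffmpeg", "-y",
        "-ss", f"{start:.3f}",
        "-i", media,
        "-t", f"{duration:.3f}",
        "-vn", "-ac", "1", "-ar", "16000",  # drop video, mono, 16 kHz
        out_path,
    ]
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. 16 kHz mono matches the sample rate Whisper resamples its input to anyway.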
Hey there, thanks for your attention. I think there's strong merit here, but after a few days of poking at it, I can say there are a few issues, and I've hit my limit on improving the situation. Using a DVD source whose timecodes I'd judge to be very accurate, despite its less accurate wording, I've found a few key notes:
I've reached the end of the road regarding my own ability to improve this. I'm not deeply versed in Python or in AI model manipulation, and it's been a somewhat brute-force approach from the start, so I don't know how to begin implementing the required changes either here or in my own project. If you'd like help testing implementations of the above, I'd gladly provide it, but as of now I'm only worth as much as any other Ideas Guy, lol.
No worries at all. It has been quite common for people with less insight into the codebase to use issues to pitch ideas. Do you have a royalty-free pair of video and subtitle files I could test with? If you'd like to open a draft PR, that's also welcome.
Unfortunately, I don't have any properly 'free' clips to share; my sole test subject has been a DVD box set processed with MakeMKV. I also can't provide a PR: my work so far is literally a pair of Python scripts driving ffmpeg directly, completely separate from this project.
Hi @codefaux, I just checked in a naive implementation that accepts prompts during transcription. Feel free to try it with your clips and let me know whether it improves the outcome. Examples of usage:
During my tests I did spot hallucinations, some of which were mitigated by the internal filtering logic, but not all. Configuring
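The project's internal filtering logic isn't shown here, but a post-filter along these lines is one possible extra safety net. This is a hypothetical sketch (`looks_hallucinated` and `filter_segments` are illustrative names). It reads the `no_speech_prob`, `avg_logprob` and `compression_ratio` fields that openai-whisper includes in each entry of `result["segments"]`. The default thresholds mirror openai-whisper's own `transcribe()` defaults (0.6, -1.0 and 2.4) and would likely need tuning for very short clips.

```python
def looks_hallucinated(seg: dict,
                       no_speech_threshold: float = 0.6,
                       logprob_threshold: float = -1.0,
                       compression_ratio_threshold: float = 2.4) -> bool:
    """Heuristic check on one Whisper segment dict (hypothetical helper)."""
    if not seg.get("text", "").strip():
        return True  # empty output carries no usable text
    # Probably silence or noise that the model filled with text anyway
    if (seg.get("no_speech_prob", 0.0) > no_speech_threshold
            and seg.get("avg_logprob", 0.0) < logprob_threshold):
        return True
    # Highly repetitive output (e.g. a phrase looping) compresses very well
    if seg.get("compression_ratio", 0.0) > compression_ratio_threshold:
        return True
    return False


def filter_segments(segments: list[dict], **thresholds) -> list[dict]:
    """Keep only segments that pass the heuristic check."""
    return [s for s in segments if not looks_hallucinated(s, **thresholds)]
```

Because Whisper already retries with higher temperatures when these thresholds are crossed, this filter only catches what survives that fallback. It drops segments rather than trying to repair them.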
Hi there.
Amazing project! It looks like almost exactly what I'm looking for, and I think this feature might be worth implementing.
Right now I'm trying to improve the subtitles available for a series. The series has subtitles, but they're very... poor. Timing is only OK, but word accuracy is dog poo. One example, within seconds of the opening of the first episode:
Subtitle:
Actual scene:
The problem is that the series is sci-fi, so planet, race, and character names and technobabble come up frequently, which often makes transcription models get too... creative.
EDIT: Forgot to mention that this is why I'm using the original subtitle text as initial_prompt, as described below. I believe it helps Whisper match those difficult terms, and in my experience it has.
My current (very WIP) effort uses the .srt timings to cut each subtitle's audio segment out of the media file, then processes each clip individually with Whisper, using the original subtitle text as initial_prompt and patience=2. The text I'm getting from Whisper is a great accuracy match, but I'm struggling with the timing.
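One way the timing problem could be approached is sketched below. It assumes each clip was cut from the media starting at `max(0, cue_start - pad)` seconds and transcribed with the openai-whisper package. The segment timestamps Whisper returns are relative to the start of each clip, so they need the clip's offset added back and then clamping to the original cue window. `transcribe_kwargs` and `remap_segments` are hypothetical helpers. Note that in openai-whisper, `patience` only applies to beam search and requires `beam_size` to be given.

```python
def transcribe_kwargs(cue_text: str, beam_size: int = 5,
                      patience: float = 2.0) -> dict:
    """Keyword arguments for openai-whisper's model.transcribe() on one clip."""
    return {
        "initial_prompt": cue_text,  # original subtitle text as vocabulary hint
        "beam_size": beam_size,      # patience has no effect without beam search
        "patience": patience,
    }


def remap_segments(segments: list[dict], cue_start: float, cue_end: float,
                   pad: float = 0.0) -> list[dict]:
    """Shift clip-relative segment times onto the media timeline.

    Results are clamped to [cue_start, cue_end]; segments that fall entirely
    in the padding margin collapse to zero length and are dropped.
    """
    clip_offset = max(0.0, cue_start - pad)  # where the clip began in the media
    out = []
    for seg in segments:
        start = min(max(seg["start"] + clip_offset, cue_start), cue_end)
        end = min(max(seg["end"] + clip_offset, cue_start), cue_end)
        if end > start:
            out.append({"start": start, "end": end, "text": seg["text"].strip()})
    return out
```

With openai-whisper installed, usage might look like `result = whisper.load_model("small").transcribe(clip_path, **transcribe_kwargs(cue_text))` followed by `remap_segments(result["segments"], cue_start, cue_end, pad)`. Clamping to the cue window keeps the (reportedly accurate) original timecodes authoritative and uses Whisper only for the wording.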
Is this worth implementing here?
Is there a better way?
Can this project already accomplish my goal, but somehow I've missed it?
Thanks for your time.