feat: optimize cuda for vapoursynth #27
Conversation
Reviewer's Guide by Sourcery

This PR optimizes CUDA support for VapourSynth by introducing separate inference functions for multi-frame and one-frame output models, and by implementing CUDA stream-based processing. The changes refactor the inference logic to use dedicated streams for frame-to-tensor conversion, model inference, and tensor-to-frame conversion, along with proper synchronization and thread-safety mechanisms.

Class diagram for VapourSynth inference functions

```mermaid
classDiagram
    class InferenceVSR {
        +inference_vsr(inference, clip, scale, length, device, one_frame_out)
    }
    class InferenceVSRMultiFrameOut {
        +inference_vsr_multi_frame_out(inference, clip, scale, length, device)
    }
    class InferenceVSROneFrameOut {
        +inference_vsr_one_frame_out(inference, clip, scale, length, device)
    }
    InferenceVSR --> InferenceVSRMultiFrameOut
    InferenceVSR --> InferenceVSROneFrameOut
    class InferenceSR {
        +inference_sr(inference, clip, scale, device)
    }
    class InferenceSRGeneral {
        +inference_sr_general(inference, clip, scale, device)
    }
    class InferenceSRCuda {
        +inference_sr_cuda(inference, clip, scale, device)
    }
    InferenceSR --> InferenceSRGeneral
    InferenceSR --> InferenceSRCuda
    note for InferenceVSR "Handles both multi-frame and one-frame output models"
    note for InferenceSR "Handles both general and CUDA devices"
```
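The stream-based processing the guide describes can be sketched as below. This is a minimal illustration, not the PR's actual code: the three-stream/three-lock layout is inferred from the snippets quoted later in the review, `run_stage` is a hypothetical helper, and the sketch falls back to plain execution on CPU-only machines.

```python
import threading

import torch

# Dedicated stream + lock per pipeline stage (frame-to-tensor, inference,
# tensor-to-frame), so stages can overlap across frames while staying safe
# under VapourSynth's multi-threaded frame callbacks.
USE_CUDA = torch.cuda.is_available()
f2t_stream = torch.cuda.Stream() if USE_CUDA else None
inference_stream = torch.cuda.Stream() if USE_CUDA else None
t2f_stream = torch.cuda.Stream() if USE_CUDA else None
f2t_stream_lock = threading.Lock()
inference_stream_lock = threading.Lock()
t2f_stream_lock = threading.Lock()


def run_stage(stream, lock, fn, *args):
    # Queue one stage's work on its own stream and block until it finishes,
    # so the result is safe to hand to the next stage's stream.
    with lock:
        if USE_CUDA:
            with torch.cuda.stream(stream):
                out = fn(*args)
            stream.synchronize()
        else:
            out = fn(*args)  # CPU fallback: streams do not apply
        return out
```

For example, running the model on a single frame tensor would look like `run_stage(inference_stream, inference_stream_lock, model, img)`.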
Hey @Tohrusky - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding back the documentation about input/output tensor shapes (e.g. `(1, n, c, h, w)`) to the main `inference_vsr` function, as this is important API information that callers need to know.
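For illustration only, the shape documentation the reviewer asks to restore might look like the sketch below. The parameter names follow the class diagram in the guide and the `(1, n, c, h, w)` convention comes from the comment itself; the docstring wording, the contract check, and the function body are hypothetical.

```python
import torch


def inference_vsr_contract(inference, frames: torch.Tensor, scale: int, one_frame_out: bool) -> torch.Tensor:
    """Sketch of the tensor contract the review asks to document.

    Args:
        frames: input tensor of shape (1, n, c, h, w), i.e. a batch of
            n consecutive video frames.
        scale: spatial upscaling factor.
        one_frame_out: whether the model emits one frame or all n frames.

    Returns:
        (1, n, c, h * scale, w * scale) for multi-frame-out models,
        (1, c, h * scale, w * scale) for one-frame-out models.
    """
    out = inference(frames)
    # Sanity-check the documented spatial contract.
    expected_tail = (frames.shape[3] * scale, frames.shape[4] * scale)
    assert out.shape[-2:] == expected_tail, "model output violates the documented shape contract"
    return out
```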
Here's what I looked at during the review
- 🟡 General issues: 1 issue found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟢 Complexity: all looks good
- 🟢 Documentation: all looks good
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
```python
def _inference(n: int, f: list[vs.VideoFrame]) -> vs.VideoFrame:
    with f2t_stream_lock, torch.cuda.stream(f2t_stream):
        img = frame_to_tensor(f[0], device).unsqueeze(0)
        f2t_stream.synchronize()
```
suggestion (performance): Consider optimizing CUDA stream synchronization points to allow more operation overlap
The current implementation synchronizes after each operation, which may prevent beneficial overlapping of operations. Consider synchronizing only when results are actually needed.
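One way to act on this suggestion, sketched under the assumption that the multi-stream layout is kept: record a CUDA event on the upload stream and have the inference stream wait on it device-side, deferring the host-blocking `synchronize()` until the result is actually consumed. On CPU-only machines the sketch simply runs the model directly.

```python
import torch


def upload_then_infer(x: torch.Tensor, model) -> torch.Tensor:
    # Chain upload -> inference with a device-side event wait instead of a
    # host-blocking synchronize after every stage.
    if not torch.cuda.is_available():
        return model(x)  # CPU fallback: nothing to overlap
    f2t_stream = torch.cuda.Stream()
    inference_stream = torch.cuda.Stream()
    uploaded = torch.cuda.Event()
    with torch.cuda.stream(f2t_stream):
        img = x.to("cuda", non_blocking=True)
        uploaded.record(f2t_stream)
    with torch.cuda.stream(inference_stream):
        inference_stream.wait_event(uploaded)  # GPU waits; the host keeps going
        out = model(img)
    inference_stream.synchronize()  # synchronize only when the result is needed
    return out
```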
@sourcery-ai review
Hey @Tohrusky - I've reviewed your changes and they look great!
```python
def _inference(n: int, f: list[vs.VideoFrame]) -> vs.VideoFrame:
    with f2t_stream_lock, torch.cuda.stream(f2t_stream):
        img = frame_to_tensor(f[0], device).unsqueeze(0)
        f2t_stream.synchronize()
```
suggestion (performance): Consider removing explicit stream synchronizations to allow more CUDA operation overlap
The current implementation synchronizes after each operation, which prevents potential parallel execution. Consider letting operations overlap naturally and only synchronizing when absolutely necessary.
#21
Summary by Sourcery
Optimize VapourSynth by introducing separate inference functions for multi-frame and one-frame output models and enhancing CUDA support. Update documentation to reflect these changes and fix a typo in the Makefile.