
feat: optimize cuda for vapoursynth #27

Merged
merged 4 commits on Dec 10, 2024
Conversation

@Tohrusky (Member) commented Dec 10, 2024

#21

Summary by Sourcery

Optimize VapourSynth by introducing separate inference functions for multi-frame and one-frame output models and enhancing CUDA support. Update documentation to reflect these changes and fix a typo in the Makefile.

New Features:

  • Introduce separate inference functions for multi-frame and one-frame output models in VapourSynth.

Enhancements:

  • Optimize CUDA support for VapourSynth by adding dedicated inference functions for CUDA devices.

Documentation:

  • Update README.md to reflect changes in model instantiation and usage.

Chores:

  • Fix typo in Makefile by correcting the filename in the vspipe command.

sourcery-ai bot commented Dec 10, 2024

Reviewer's Guide by Sourcery

This PR optimizes CUDA support for VapourSynth by introducing separate inference functions for multi-frame and one-frame output models, and implementing CUDA stream-based processing. The changes include refactoring the inference logic to use dedicated streams for frame-to-tensor conversion, model inference, and tensor-to-frame conversion, along with proper synchronization and thread safety mechanisms.
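The stage-per-stream pattern described above can be sketched as follows. This is an illustrative, minimal version, not the actual ccrestoration implementation: `infer_frame` and the plain-tensor stand-ins for `frame_to_tensor`/`tensor_to_frame` are hypothetical, and the sketch falls back to no-op contexts on CPU-only machines.

```python
# Sketch of the PR's pattern: one CUDA stream plus one lock per pipeline stage
# (frame-to-tensor, inference, tensor-to-frame), synchronizing after each stage.
import threading
from contextlib import nullcontext

import torch

USE_CUDA = torch.cuda.is_available()
DEVICE = torch.device("cuda" if USE_CUDA else "cpu")

# Dedicated stream and lock per stage, mirroring the PR's structure.
f2t_stream = torch.cuda.Stream(DEVICE) if USE_CUDA else None
inf_stream = torch.cuda.Stream(DEVICE) if USE_CUDA else None
t2f_stream = torch.cuda.Stream(DEVICE) if USE_CUDA else None
f2t_lock, inf_lock, t2f_lock = threading.Lock(), threading.Lock(), threading.Lock()


def _on(stream):
    # Run subsequent CUDA work on `stream`; no-op context on CPU.
    return torch.cuda.stream(stream) if USE_CUDA else nullcontext()


def infer_frame(frame: torch.Tensor, model) -> torch.Tensor:
    """Push one frame through the three-stage pipeline."""
    with f2t_lock, _on(f2t_stream):
        img = frame.to(DEVICE).unsqueeze(0)  # stand-in for frame_to_tensor
        if USE_CUDA:
            f2t_stream.synchronize()         # tensor ready before inference
    with inf_lock, _on(inf_stream):
        out = model(img)                     # model inference stage
        if USE_CUDA:
            inf_stream.synchronize()
    with t2f_lock, _on(t2f_stream):
        result = out.squeeze(0).cpu()        # stand-in for tensor_to_frame
        if USE_CUDA:
            t2f_stream.synchronize()
    return result
```

The locks matter because VapourSynth may call the frame callback from multiple worker threads; each stage's stream must not be re-entered concurrently.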

Class diagram for VapourSynth inference functions

classDiagram
    class InferenceVSR {
        +inference_vsr(inference, clip, scale, length, device, one_frame_out)
    }
    class InferenceVSRMultiFrameOut {
        +inference_vsr_multi_frame_out(inference, clip, scale, length, device)
    }
    class InferenceVSROneFrameOut {
        +inference_vsr_one_frame_out(inference, clip, scale, length, device)
    }
    InferenceVSR --> InferenceVSRMultiFrameOut
    InferenceVSR --> InferenceVSROneFrameOut

    class InferenceSR {
        +inference_sr(inference, clip, scale, device)
    }
    class InferenceSRGeneral {
        +inference_sr_general(inference, clip, scale, device)
    }
    class InferenceSRCuda {
        +inference_sr_cuda(inference, clip, scale, device)
    }
    InferenceSR --> InferenceSRGeneral
    InferenceSR --> InferenceSRCuda

    note for InferenceVSR "Handles both multi-frame and one-frame output models"
    note for InferenceSR "Handles both general and CUDA devices"

File-Level Changes

Refactored VSR (Video Super Resolution) inference to support both multi-frame and one-frame output models (ccrestoration/vs/vsr.py):
  • Split the main inference function into two specialized functions for the different model types
  • Added a one_frame_out parameter to control inference behavior
  • Implemented thread safety with a Lock() mechanism
  • Removed redundant frame conversion function parameters

Implemented CUDA-optimized inference with stream processing (ccrestoration/vs/sr.py):
  • Added dedicated CUDA streams for frame-to-tensor conversion, inference, and tensor-to-frame conversion
  • Implemented proper stream synchronization
  • Created separate inference paths for CUDA and non-CUDA devices
  • Added thread safety with multiple stream locks

Simplified the model interface and configuration (ccrestoration/model/vsr_base_model.py, README.md, example/vapoursynth.py, Makefile):
  • Updated the VSR base model to use a unified inference interface
  • Modified the example code to include a tile parameter
  • Fixed a file naming inconsistency in the Makefile
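The device dispatch named in the class diagram (inference_sr choosing between the general and CUDA paths) could look roughly like this. The function bodies here are placeholders, not the real ccrestoration/vs/sr.py code; only the function names come from the diagram.

```python
# Hypothetical sketch of the inference_sr dispatch from the class diagram:
# pick the CUDA-optimized path when the target device is a CUDA device.
import torch


def inference_sr_general(inference, clip, scale, device):
    return ("general", clip)  # placeholder for the generic path


def inference_sr_cuda(inference, clip, scale, device):
    return ("cuda", clip)     # placeholder for the stream-based path


def inference_sr(inference, clip, scale, device):
    # torch.device(...).type normalizes strings like "cuda:0" to "cuda"
    if torch.device(device).type == "cuda":
        return inference_sr_cuda(inference, clip, scale, device)
    return inference_sr_general(inference, clip, scale, device)
```

Keeping the branch at the entry point means callers keep a single unified interface while each path can assume its device type internally.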



@sourcery-ai sourcery-ai bot left a comment


Hey @Tohrusky - I've reviewed your changes - here's some feedback:

Overall Comments:

  • Consider adding back the documentation about input/output tensor shapes (e.g. (1, n, c, h, w)) to the main inference_vsr function, as this is important API information that callers need to know.
Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

def _inference(n: int, f: list[vs.VideoFrame]) -> vs.VideoFrame:
    with f2t_stream_lock, torch.cuda.stream(f2t_stream):
        img = frame_to_tensor(f[0], device).unsqueeze(0)
        f2t_stream.synchronize()

suggestion (performance): Consider optimizing CUDA stream synchronization points to allow more operation overlap

The current implementation synchronizes after each operation, which may prevent beneficial overlapping of operations. Consider synchronizing only when results are actually needed.
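One way to act on this suggestion is to replace per-stage `stream.synchronize()` calls (which block the host) with CUDA events, so ordering is enforced on the GPU and the host blocks only when the result is consumed. This is a hedged sketch under that assumption, not code from the PR; `scale_async` and its two-stage math are illustrative, and it falls back to plain CPU math when CUDA is unavailable.

```python
# Event-based alternative to eager synchronization: record an event on the
# producing stream and have the consuming stream wait on it. The host thread
# never blocks until the final .cpu() copy.
import torch


def scale_async(x: torch.Tensor) -> torch.Tensor:
    if not torch.cuda.is_available():
        return (x / 255.0) * 2.0             # CPU fallback: same math, no streams

    f2t_stream = torch.cuda.Stream()
    inf_stream = torch.cuda.Stream()
    x = x.to("cuda", non_blocking=True)

    done_f2t = torch.cuda.Event()
    with torch.cuda.stream(f2t_stream):
        img = x / 255.0                      # stage 1 on f2t_stream
        done_f2t.record(f2t_stream)          # mark stage-1 completion on-device

    inf_stream.wait_event(done_f2t)          # GPU-side ordering; host not blocked
    with torch.cuda.stream(inf_stream):
        out = img * 2.0                      # stage 2 starts once the event fires

    return out.cpu()                         # the only host sync, at consumption
```

Compared with synchronizing after every stage, this lets stage 1 of the next frame overlap with stage 2 of the current one, which is the overlap the review comment is pointing at.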

@Tohrusky (Member, Author)

@sourcery-ai review


@sourcery-ai sourcery-ai bot left a comment


Hey @Tohrusky - I've reviewed your changes and they look great!

Here's what I looked at during the review
  • 🟡 General issues: 1 issue found
  • 🟢 Security: all looks good
  • 🟢 Testing: all looks good
  • 🟢 Complexity: all looks good
  • 🟢 Documentation: all looks good


def _inference(n: int, f: list[vs.VideoFrame]) -> vs.VideoFrame:
    with f2t_stream_lock, torch.cuda.stream(f2t_stream):
        img = frame_to_tensor(f[0], device).unsqueeze(0)
        f2t_stream.synchronize()

suggestion (performance): Consider removing explicit stream synchronizations to allow more CUDA operation overlap

The current implementation synchronizes after each operation, which prevents potential parallel execution. Consider letting operations overlap naturally and only synchronizing when absolutely necessary.

@Tohrusky merged commit af59ebb into TensoRaws:main on Dec 10, 2024
12 checks passed