New warp-specialized persistent FP8 GEMM kernel schedules and mainloops targeting the Hopper architecture that achieve great performance with TMA, WGMMA, and threadblock clusters. An example showcasing Hopper warp-specialized FP8 GEMMs.
New Epilogue Visitor Tree (EVT) support for Hopper TMA epilogues. EVTs allow user-defined, customized epilogue fusion patterns without having to write a new epilogue.
Stream-K feature for Hopper. Note that this is only a functional implementation of stream-K, and should not be used for performance comparison. Optimizations are expected in a future release.
Improved CTA rasterization and support for CTA swizzling for Hopper kernels using the Tile Scheduler.
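
For readers unfamiliar with stream-K, the general idea can be sketched on the host in plain C++: rather than assigning whole output tiles to CTAs, the total number of mainloop (k) iterations across all tiles is split as evenly as possible across a fixed number of CTAs, so a CTA may start or stop in the middle of a tile. The sketch below is only an illustration of that decomposition under simplifying assumptions (every tile has the same number of k-iterations, and no fix-up or reduction step is modeled). The names `WorkRange`, `stream_k_partition`, and `tile_of` are hypothetical and are not part of the CUTLASS API or its Tile Scheduler.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical illustration type; not a CUTLASS type.
// A half-open range [iter_begin, iter_end) of global k-iterations,
// where iterations are numbered tile by tile.
struct WorkRange {
    std::int64_t iter_begin;
    std::int64_t iter_end;
};

// Split all k-iterations of all output tiles as evenly as possible across
// `num_ctas` workers. Assumes every tile needs `iters_per_tile` iterations.
std::vector<WorkRange> stream_k_partition(std::int64_t num_tiles,
                                          std::int64_t iters_per_tile,
                                          std::int64_t num_ctas) {
    const std::int64_t total = num_tiles * iters_per_tile;
    const std::int64_t base = total / num_ctas;  // iterations every CTA gets
    const std::int64_t rem = total % num_ctas;   // first `rem` CTAs get one extra

    std::vector<WorkRange> ranges;
    ranges.reserve(static_cast<std::size_t>(num_ctas));

    std::int64_t begin = 0;
    for (std::int64_t cta = 0; cta < num_ctas; ++cta) {
        const std::int64_t len = base + (cta < rem ? 1 : 0);
        ranges.push_back({begin, begin + len});
        begin += len;
    }
    return ranges;
}

// Index of the output tile that a global k-iteration belongs to.
inline std::int64_t tile_of(std::int64_t iter, std::int64_t iters_per_tile) {
    return iter / iters_per_tile;
}
```

With 3 tiles of 5 iterations each and 4 CTAs, the first three CTAs receive 4 iterations and the last receives 3, so the second CTA finishes tile 0 and then starts tile 1. In a real kernel, tiles shared between CTAs need a reduction of partial accumulators, which this sketch leaves out.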
This discussion was created from the release CUTLASS 3.2.