[FEA] FP8 sparse tensor cores support A(row+dense) x B(sparse) = C(row+dense) #2032

zhink · 2025-01-08T08:10:10Z

Is your feature request related to a problem? Please describe.
I would like to use FP8 sparse tensor cores to speed up FP8 GEMMs in LLMs (for example, Llama).

Describe the solution you'd like

FP8 sparse tensor core support for A (row-major, dense) x B (sparse, and possibly required to be column-major) = C (row-major, dense).
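
To make the requested computation concrete, here is a minimal pure-Python sketch of its reference semantics. It is not CUTLASS code and uses no CUTLASS API. It assumes 2:4 structured sparsity along the K dimension of B (the request does not state the sparsity pattern), uses plain floats in place of FP8 values, and the helper names `compress_2_4` and `gemm_dense_a_sparse_b` are hypothetical. B is stored column by column so that each group of four K entries is contiguous, which is one possible reason B might need to be column-major.

```python
# Reference semantics only: plain Python floats stand in for FP8 values,
# and the 2:4 pattern along K is an assumption, not stated in the request.

def compress_2_4(b_col):
    """Compress one column of B (length K, K % 4 == 0) into (values, indices).

    Each group of 4 consecutive K entries keeps its 2 largest-magnitude values.
    """
    values, indices = [], []
    for g in range(0, len(b_col), 4):
        group = b_col[g:g + 4]
        by_magnitude = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)
        for i in sorted(by_magnitude[:2]):
            values.append(group[i])
            indices.append(g + i)  # absolute position along K
    return values, indices


def gemm_dense_a_sparse_b(a, b_compressed):
    """C = A x B, with A dense (list of rows) and B given as compressed columns."""
    c = []
    for row in a:
        c.append([sum(row[k] * v for v, k in zip(values, indices))
                  for values, indices in b_compressed])
    return c


# A is 2 x 4 (row-major, dense); B is 4 x 2 with two nonzeros per column.
a = [[1.0, 2.0, 3.0, 4.0],
     [5.0, 6.0, 7.0, 8.0]]
b_cols = [[1.0, 0.0, 0.0, 2.0],   # column 0 of B
          [0.0, 3.0, 4.0, 0.0]]   # column 1 of B
b_compressed = [compress_2_4(col) for col in b_cols]
c = gemm_dense_a_sparse_b(a, b_compressed)
print(c)  # [[9.0, 18.0], [21.0, 46.0]]
```

The inner product only touches the kept entries of B, which is where a sparse tensor core kernel would save work relative to the dense GEMM.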

Describe alternatives you've considered

Additional context

zhink added the "? - Needs Triage" and "feature request" labels on Jan 8, 2025
