[test_locality]: Fails to run correctly on PVC GPU. #154

Closed
etiotto opened this issue Dec 18, 2023 · 2 comments
Labels: bug (Something isn't working), tests: ut

etiotto commented Dec 18, 2023

The end-to-end core test test_locality fails to run correctly on a PVC GPU. We need to investigate the cause of the functional problem(s).

@alexbaden alexbaden self-assigned this Dec 27, 2023
@vlad-penkin vlad-penkin added the bug Something isn't working label Jan 5, 2024
alexbaden commented

The problem is an unresolved function:

L0 module build log:
error : unresolved external symbol _Z12get_global_sizej at offset 676 in instructions segment #0 (aka kernel : kernel_0d1d2de)
error : unresolved external symbol _Z12get_global_sizej at offset 692 in instructions segment #0 (aka kernel : kernel_0d1d2de)

The OpenCL function `get_global_size` cannot be resolved when lowering LLVM to SPIR-V. Specifically, we fail during `zeModuleCreate`, though the failure is not detected until later, when we actually attempt to launch the kernel.

Coming out of the MLIR pipeline in Triton, we have several external, mangled (OpenCL?) functions. If I dump the IR from the IGC shader compilation pipeline, those mangled functions gradually disappear, except for `get_global_size`, which is still in place.
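The mangled names follow the Itanium C++ ABI scheme that OpenCL C and SPIR builtins use: `_Z`, the identifier length in decimal, the identifier, then one code per parameter type (`j` is `unsigned int`, `f` is `float`). Below is a minimal sketch of a decoder; `mangle_builtin` and `demangle_builtin` are hypothetical helpers, not part of Triton or IGC, and they only handle free functions whose parameters are builtin scalar types, which covers the declarations in the dumps. Decoding the symbols this way shows that `_Z12get_local_idj`, `_Z12get_group_idj`, `_Z21sub_group_shuffle_xorfj` and `_Z7barrierj` are well formed, but `_Z12get_global_sizej` is not: `get_global_size` has 15 characters, so the conventional mangling of `get_global_size(unsigned int)` would be `_Z15get_global_sizej`. Whether this mismatch is what keeps IGC from resolving the call is an assumption that has not been verified here.

```python
# Itanium C++ ABI <builtin-type> codes (subset; enough for the symbols in this issue).
BUILTIN_TYPES = {
    "v": "void",
    "b": "bool",
    "c": "char",
    "h": "unsigned char",
    "s": "short",
    "t": "unsigned short",
    "i": "int",
    "j": "unsigned int",
    "l": "long",
    "m": "unsigned long",
    "f": "float",
    "d": "double",
    "e": "long double",
    "z": "...",
    "Dh": "half",
}


def mangle_builtin(name: str, param_codes: str) -> str:
    """Mangle a free function as _Z <identifier length> <identifier> <param codes>."""
    return f"_Z{len(name)}{name}{param_codes}"


def demangle_builtin(symbol: str) -> str:
    """Demangle `_Z<len><name><builtin param codes>`; reject anything richer."""
    if not symbol.startswith("_Z"):
        raise ValueError(f"not an Itanium-mangled name: {symbol}")
    pos = 2
    while pos < len(symbol) and symbol[pos].isdigit():
        pos += 1
    if pos == 2:
        raise ValueError(f"unsupported mangling (nested or special name): {symbol}")
    length = int(symbol[2:pos])
    name = symbol[pos:pos + length]
    if len(name) != length:
        raise ValueError(f"identifier length {length} overruns symbol: {symbol}")
    pos += length
    params = []
    while pos < len(symbol):
        two = symbol[pos:pos + 2]
        if len(two) == 2 and two in BUILTIN_TYPES:  # two-letter codes such as "Dh"
            params.append(BUILTIN_TYPES[two])
            pos += 2
        elif symbol[pos] in BUILTIN_TYPES:
            params.append(BUILTIN_TYPES[symbol[pos]])
            pos += 1
        else:
            raise ValueError(f"unsupported type code {symbol[pos]!r} in {symbol}")
    if params == ["void"]:  # a lone "v" encodes an empty parameter list
        params = []
    return f"{name}({', '.join(params)})"


if __name__ == "__main__":
    for sym in ["_Z12get_local_idj", "_Z21sub_group_shuffle_xorfj", "_Z12get_global_sizej"]:
        print(sym, "->", demangle_builtin(sym))
    print(mangle_builtin("get_global_size", "j"))
```

Read with the 12-character length prefix, the symbol decodes to `get_global_s(int, ..., long double, unsigned int)` rather than `get_global_size(unsigned int)`. In practice `c++filt` or `llvm-cxxfilt` is the better tool; the hand-rolled version just makes the length-prefix rule explicit.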

IGC Before Unification Passes:


; Function Attrs: nounwind
declare spir_func i64 @_Z12get_local_idj(i32) #0

; Function Attrs: nounwind
declare spir_func i64 @_Z12get_group_idj(i32) #0

; Function Attrs: nounwind
declare spir_func i64 @_Z12get_global_sizej(i32) #0

; Function Attrs: nounwind
declare spir_func float @_Z21sub_group_shuffle_xorfj(float, i32) #0

; Function Attrs: nounwind
declare spir_func void @_Z7barrierj(i32) #0

; Function Attrs: nounwind
define spir_kernel void @kernel_0d1d2de(float addrspace(1)* nocapture readonly %0, float addrspace(1)* nocapture %1, i32 %2, float addrspace(3)* nocapture %3) #0 !dbg !265 {
  %5 = call spir_func i64 @_Z12get_local_idj(i32 0) #0, !dbg !268
  %6 = trunc i64 %5 to i32, !dbg !268
  %7 = and i32 %6, 31, !dbg !268
  %8 = lshr i32 %6, 3, !dbg !268
  %9 = and i32 %8, 15, !dbg !268
  %10 = or i32 %9, 16, !dbg !268
  %11 = shl i32 %6, 2, !dbg !268
  %12 = and i32 %11, 28, !dbg !268
  %13 = call spir_func i64 @_Z12get_group_idj(i32 0) #0, !dbg !269
  %14 = trunc i64 %13 to i32, !dbg !269
  %15 = call spir_func i64 @_Z12get_group_idj(i32 1) #0, !dbg !270
  %16 = trunc i64 %15 to i32, !dbg !270
  %17 = call spir_func i64 @_Z12get_global_sizej(i32 1) #0, !dbg !271
  %18 = trunc i64 %17 to i32, !dbg !271
  %19 = shl i32 %14, 5, !dbg !272
  %20 = or i32 %19, %9, !dbg !273
  %21 = or i32 %10, %19, !dbg !273
  %22 = or i32 %19, %7, !dbg !273
  %23 = add i32 %2, 31, !dbg !274
  %24 = sdiv i32 %23, 32, !dbg !278
  %25 = mul i32 %20, %2, !dbg !279
  %26 = mul i32 %21, %2, !dbg !279
  %27 = sext i32 %25 to i64, !dbg !280
  %28 = getelementptr float, float addrspace(1)* %0, i64 %27, !dbg !280
  %29 = sext i32 %26 to i64, !dbg !280
  %30 = getelementptr float, float addrspace(1)* %0, i64 %29, !dbg !280
  %31 = icmp sgt i32 %24, %16, !dbg !281
  br i1 %31, label %.lr.ph, label %._crit_edge, !dbg !281
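To make the "before" prologue easier to follow, here is a line-by-line Python transliteration of `%5` through `%31`. It is a reading aid, not code from the project: the hypothetical parameters `lid`, `pid0`, `pid1` and `n` stand for `get_local_id(0)`, `get_group_id(0)`, `get_group_id(1)` and the kernel argument `%2`. The sketch assumes non-negative inputs small enough that the i32 arithmetic does not wrap (so Python's `//` agrees with `sdiv`), and it reads `%0` as a row-major matrix with row stride `n`, which is an interpretation of the `mul`/`getelementptr` pair. The `get_global_size(1)` result (`%17`/`%18`) is not used in the lines shown, so it is presumably consumed in the loop body that the excerpt cuts off.

```python
def prologue(lid: int, pid0: int, pid1: int, n: int) -> dict:
    """Transliterate %5..%31 of the 'before' dump; assumes no i32 wrap-around."""
    v7 = lid & 31                 # %7
    v9 = (lid >> 3) & 15          # %8, %9
    v10 = v9 | 16                 # %10
    v12 = (lid << 2) & 28         # %11, %12
    v19 = pid0 << 5               # %19: get_group_id(0) * 32
    v20 = v19 | v9                # %20: first row index (v19 + v9, since v9 < 32)
    v21 = v10 | v19               # %21: second row index, 16 rows below %20
    v22 = v19 | v7                # %22
    v24 = (n + 31) // 32          # %23, %24: ceil(n / 32), equal to sdiv for n >= 0
    v25 = v20 * n                 # %25: element offset of row %20 in %0
    v26 = v21 * n                 # %26: element offset of row %21 in %0
    enter_loop = v24 > pid1       # %31: icmp sgt %24, %16
    return {
        "%7": v7, "%9": v9, "%10": v10, "%12": v12,
        "%20": v20, "%21": v21, "%22": v22, "%24": v24,
        "%25": v25, "%26": v26, "enter_loop": enter_loop,
    }


if __name__ == "__main__":
    print(prologue(lid=9, pid0=1, pid1=0, n=40))
```

For example, with `lid = 9`, `pid0 = 1`, `pid1 = 0` and `n = 40`, the two rows are 33 and 49, and the loop is entered because ceil(40 / 32) = 2 is greater than `get_group_id(1)` = 0.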

IGC After Unification Pass(es):

; Function Attrs: nounwind
declare spir_func i64 @_Z12get_global_sizej(i32) #0

; Function Attrs: convergent nounwind
define spir_kernel void @kernel_0d1d2de(float addrspace(1)* nocapture readonly %0, float addrspace(1)* nocapture %1, i32 %2, float addrspace(3)* nocapture %3, <8 x i32> %r0, <8 x i32> %payloadHeader, i16 %localIdX, i16 %localIdY, i16 %localIdZ, i8* %privateBase, i32 %bufferOffset, i32 %bufferOffset1) #1 !dbg !327 {
  call void @llvm.genx.GenISA.CatchAllDebugLine(), !dbg !331
  %scalar = extractelement <8 x i32> %r0, i32 0
  %scalar22 = extractelement <8 x i32> %r0, i32 1
  %scalar23 = extractelement <8 x i32> %r0, i32 2
  %scalar24 = extractelement <8 x i32> %r0, i32 3
  %scalar25 = extractelement <8 x i32> %r0, i32 4
  %scalar26 = extractelement <8 x i32> %r0, i32 5
  %scalar27 = extractelement <8 x i32> %r0, i32 6
  %scalar28 = extractelement <8 x i32> %r0, i32 7
  %5 = zext i16 %localIdX to i32, !dbg !332
  %6 = and i32 %5, 31, !dbg !332
  %7 = lshr i32 %5, 3, !dbg !332
  %8 = and i32 %7, 15, !dbg !332
  %9 = or i32 %8, 16, !dbg !332
  %10 = shl nuw nsw i32 %5, 2, !dbg !332
  %11 = and i32 %10, 28, !dbg !332
  %12 = call spir_func i64 @_Z12get_global_sizej(i32 1) #7, !dbg !333
  %13 = trunc i64 %12 to i32, !dbg !333
  %14 = shl i32 %scalar22, 5, !dbg !334
  %15 = or i32 %14, %8, !dbg !335
  %16 = or i32 %9, %14, !dbg !335
  %17 = or i32 %14, %6, !dbg !335
  %18 = add i32 %2, 31, !dbg !336
  %19 = sdiv i32 %18, 32, !dbg !340
  %20 = mul i32 %15, %2, !dbg !341
  %21 = mul i32 %16, %2, !dbg !341
  %22 = sext i32 %20 to i64, !dbg !342
  %23 = getelementptr float, float addrspace(1)* %0, i64 %22, !dbg !342
  %24 = sext i32 %21 to i64, !dbg !342
  %25 = getelementptr float, float addrspace(1)* %0, i64 %24, !dbg !342
  %26 = icmp sgt i32 %19, %scalar27, !dbg !343
  br i1 %26, label %.lr.ph, label %._crit_edge, !dbg !343
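Rather than comparing the two dumps by eye, the surviving external calls can be listed mechanically. The sketch below uses a hypothetical helper, `unresolved_calls` (not part of Triton or IGC), that scans textual LLVM IR with regular expressions. It assumes unquoted global names and at most one `declare` or `call` per line, and it skips `llvm.*` intrinsics on the assumption that the backend lowers those itself; it reports functions that are declared externally and still called.

```python
import re

# `declare <attrs/return type> @name(` on one line; quoted names are not handled.
DECLARE_RE = re.compile(r"^\s*declare\b[^@\n]*@([A-Za-z0-9_.$]+)\(", re.MULTILINE)
# `call <calling conv/return type> @name(` anywhere on a line.
CALL_RE = re.compile(r"\bcall\b[^@\n]*@([A-Za-z0-9_.$]+)\(")


def declared_functions(ir_text: str) -> set[str]:
    """Names of functions that are declared (not defined) in the module."""
    return set(DECLARE_RE.findall(ir_text))


def called_functions(ir_text: str) -> set[str]:
    """Names of functions that appear as direct call targets."""
    return set(CALL_RE.findall(ir_text))


def unresolved_calls(ir_text: str) -> set[str]:
    """External functions that are still called, ignoring llvm.* intrinsics."""
    return {
        name
        for name in declared_functions(ir_text) & called_functions(ir_text)
        if not name.startswith("llvm.")
    }


if __name__ == "__main__":
    # Trimmed excerpt of the "after" dump above.
    after = """
declare spir_func i64 @_Z12get_global_sizej(i32) #0
  call void @llvm.genx.GenISA.CatchAllDebugLine(), !dbg !331
  %12 = call spir_func i64 @_Z12get_global_sizej(i32 1) #7, !dbg !333
"""
    print(sorted(unresolved_calls(after)))  # ['_Z12get_global_sizej']
```

Running the same function on the full "before" dump would list all five mangled builtins, so a diff of the two results isolates the one call that unification did not resolve.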

alexbaden added a commit that referenced this issue Jan 17, 2024
Enables the locality test by using an IGC intrinsic instead of an OpenCL
function for `get_global_size`. IGC does not appear to support the
OpenCL function `get_global_size`, and SPIR-V builtins are not currently
an option. This PR uses intel/llvm#12383 and then simply removes the XPU skip from `test_locality`.

Closes #154
etiotto commented Jan 23, 2024

Closing because PR #247 has been merged.

@etiotto etiotto closed this as completed Jan 23, 2024
@vlad-penkin vlad-penkin added this to the UT pass rate milestone Feb 9, 2024