Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix non-strict HIP device lib order #2217

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Snektron
Copy link

@Snektron Snektron commented Jul 5, 2024

In #2045 initial support for HIP was added. While trying it out, I noticed that different runs across different machines did not yield an expected cache hit. After some investigation, it turns out that the list of bitcode device libraries is not sorted after discovering them from the file system, and this resulted in a different order across those machines. I've added a fix that simply sorts those libraries after discovering them, that seems to be similar to how its handled elsewhere.

@sylvestre
Copy link
Collaborator

could you please add a test for this ? thanks

@codecov-commenter
Copy link

codecov-commenter commented Jul 5, 2024

Codecov Report

Attention: Patch coverage is 0% with 3 lines in your changes missing coverage. Please review.

Project coverage is 40.82%. Comparing base (0cc0c62) to head (48d6ed7).
Report is 99 commits behind head on main.

Files with missing lines Patch % Lines
src/compiler/c.rs 0.00% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2217      +/-   ##
==========================================
+ Coverage   30.91%   40.82%   +9.91%     
==========================================
  Files          53       55       +2     
  Lines       20112    20675     +563     
  Branches     9755     9801      +46     
==========================================
+ Hits         6217     8440    +2223     
- Misses       7922     8100     +178     
+ Partials     5973     4135    -1838     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@Snektron Snektron force-pushed the fix-device-libs-order branch from 4d87a24 to c3eb3c5 Compare July 8, 2024 18:10
@Snektron
Copy link
Author

Snektron commented Jul 8, 2024

Im not 100% sure what the best way is to add the tests, considering that fully testing that issue requires deleting and then re-adding a file from the HIP bitcode directory (/opt/rocm/amdgcn/bitcode usually). Regardless, I'm away for a few weeks and will finish this when I get back.

@Snektron Snektron force-pushed the fix-device-libs-order branch 3 times, most recently from e0a58c1 to 48d6ed7 Compare July 25, 2024 08:45
@Snektron
Copy link
Author

bump

@sylvestre
Copy link
Collaborator

still need tests, sorry :)

@Snektron Snektron force-pushed the fix-device-libs-order branch 4 times, most recently from 3d28979 to b2355f7 Compare November 22, 2024 17:21
@Snektron
Copy link
Author

Seems that fuse doesn't work in github actions. I guess the only other approach swapping out calls to fs::read_dir with something that can be mocked.

@Snektron
Copy link
Author

Hmm, it might be possible to LD_PRELOAD a library which returns random results from readdir...

@Snektron Snektron force-pushed the fix-device-libs-order branch 12 times, most recently from 84c025c to c7c9136 Compare November 26, 2024 16:46
@Snektron
Copy link
Author

@sylvestre What do you think about this approach? It seems to be working fine in our fork's CI, though I feel like theres some confirmation missing that the shim is actually running. One details is that I've had to set LD_PRELOAD separately on each run, it seems that using $GITHUB_ENV will make github try to load the shim before its set up correctly in the container. I'm not really sure how that works...

@@ -0,0 +1,196 @@
use ctor::ctor;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add the license header

This is a simple libc shim that randomizes the results of readdir().
For some functionality, cache hashes are dependent on the order that
values are returned from fs::read_dir(), which calls lib readdir().
This library can be inserted into the target application's runtime
using LD_PRELOAD=target/debug/librandmize_readdir.so, after which the
results of readdir() will return in random order. It is only intended
to be used while testing sccache.
@Snektron Snektron force-pushed the fix-device-libs-order branch from c7c9136 to 7c11cce Compare November 28, 2024 16:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants