Support collection of counters with rocprofiler #12

Draft: jordap wants to merge 8 commits into main from jorda/rocprofiler

jordap (Collaborator) commented Mar 15, 2024

This is an initial prototype to test the collection of hardware counters using rocprofiler's device mode.

It introduces the pyrocprofiler Python module to call rocprofiler using a small header-only wrapper around the API for device mode (pyrocprofiler/device_session.hpp).

There is also a standalone Prometheus collector, built on the pyrocprofiler module, that works outside of omniwatch for initial testing.
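
As a rough illustration of what such a standalone collector has to produce, the sketch below renders per-GPU counter values in the Prometheus text exposition format using only the standard library. The `sample_counters` stub, the metric name `rocprofiler_counter`, the `card` and `counter` labels, and the counter names are placeholders chosen for this example; the real collector would read values through pyrocprofiler, whose Python interface is not shown in this thread.

```python
from typing import Dict, List


def sample_counters(gpu_id: int, counters: List[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a pyrocprofiler device-mode poll.

    Returns fixed values so the sketch runs without a GPU.
    """
    return {name: float(index + 1) for index, name in enumerate(counters)}


def render_metrics(gpu_ids: List[int], counters: List[str]) -> str:
    """Render samples in the Prometheus text exposition format."""
    lines = [
        "# HELP rocprofiler_counter Hardware counter value (assumed metric name)",
        "# TYPE rocprofiler_counter gauge",
    ]
    for gpu_id in gpu_ids:
        values = sample_counters(gpu_id, counters)
        for name in counters:
            # One sample line per (GPU, counter) pair, identified by labels.
            lines.append(
                f'rocprofiler_counter{{card="{gpu_id}",counter="{name}"}} {values[name]}'
            )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_metrics([0], ["GRBM_COUNT", "SQ_WAVES"]), end="")
```

Passing the GPU list explicitly means the same function works for a single device now and for all devices later.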

Pending tasks:

  • Integrate collector with omniwatch monitor
  • Read counters from omniwatch configuration file
  • Sample all GPUs (currently only samples GPU 0)
  • Some sort of testing infrastructure
  • Documentation, proper headers/copyright, etc.
  • Ensure rocprofiler sessions are properly destroyed (and understand what happens to sessions if not properly destroyed)
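
For the session cleanup item, one possible approach on the Python side is a context manager that always destroys the session on exit. This is only a sketch: `FakeDeviceSession` and its `start`, `poll`, and `destroy` methods are hypothetical stand-ins, not the actual pyrocprofiler or rocprofiler API.

```python
from contextlib import contextmanager
from typing import Iterator, List


class FakeDeviceSession:
    """Hypothetical stand-in for a pyrocprofiler device session."""

    def __init__(self, gpu_id: int, counters: List[str]) -> None:
        self.gpu_id = gpu_id
        self.counters = counters
        self.active = False

    def start(self) -> None:
        self.active = True

    def poll(self) -> List[float]:
        if not self.active:
            raise RuntimeError("session is not active")
        # Placeholder values; a real session would return counter readings.
        return [0.0 for _ in self.counters]

    def destroy(self) -> None:
        self.active = False


@contextmanager
def device_session(gpu_id: int, counters: List[str]) -> Iterator[FakeDeviceSession]:
    """Create a session and guarantee destroy() is called on exit."""
    session = FakeDeviceSession(gpu_id, counters)
    session.start()
    try:
        yield session
    finally:
        # Runs on normal exit and when the body raises.
        session.destroy()
```

This covers normal exit and exceptions, but not the process being killed outright, so the question of what happens to sessions that are never destroyed would still need an answer.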

More details about rocprofiler's device mode API: https://rocm.docs.amd.com/projects/rocprofiler/en/docs-6.0.0/doxygen/docBin/html/group__device__profiling.html

I used the following example to understand how to call the device mode API: https://github.com/ROCm/rocprofiler/blob/amd-master/samples/profiler/device_profiling_sample.cpp

@jordap jordap changed the base branch from main to apivariant March 15, 2024 00:38
@jordap jordap changed the base branch from apivariant to main March 23, 2024 04:42
@jordap jordap force-pushed the jorda/rocprofiler branch from d6ae126 to 342c5e3 Compare March 23, 2024 14:27
@jordap jordap added the enhancement New feature or request label Jun 11, 2024
@jordap jordap force-pushed the jorda/rocprofiler branch from 342c5e3 to 07b7aa6 Compare July 11, 2024 17:49
jordap (Collaborator, Author) commented Jul 11, 2024

I updated this branch to match the most recent changes in Omnistat.

HSA_TOOLS_LIB still has to be set manually to get the counters until rocprofiler is fixed (SWDEV-465681), and the build of the pyrocprofiler module still needs to be updated to match omnistat.
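
Until that fix lands, one way to apply the workaround from Python is to set the variable before anything initializes the HSA runtime. A minimal sketch: the helper name and the library path passed to it are placeholders (this thread does not state which library HSA_TOOLS_LIB must point at), and it assumes the variable is read when the runtime initializes, so the call has to happen before the pyrocprofiler module is imported.

```python
import os


def ensure_hsa_tools_lib(library_path: str) -> str:
    """Set HSA_TOOLS_LIB unless the user already exported it.

    Returns the value in effect afterwards. `library_path` is a
    caller-supplied placeholder, not a known-good default.
    """
    return os.environ.setdefault("HSA_TOOLS_LIB", library_path)
```

Using `setdefault` keeps any value the user exported in the shell, so the helper only fills in the variable when it is missing.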
