Profiling with py-spy revelations #14

Open
ArneTR opened this issue Nov 13, 2024 · 1 comment

ArneTR commented Nov 13, 2024

I profiled the powerletrics -i 1000 command with py-spy.

I did this twice: once with RAPL enabled, and once without RAPL where I segmented the file open calls for reading memory and cmdline into their own functions so they show up separately in the profile.

Without RAPL and with segmented file open

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 25.00%  25.00%   34.34s    34.34s   _shutdown (threading.py)
  5.00%   5.00%   13.21s    24.03s   get_data (powerletrics/powerletrics.py)
  0.00%   0.00%    3.23s     3.23s   next (bcc/table.py)
  0.00%   0.00%    1.63s     1.63s   __getitem__ (bcc/table.py)
  0.00%   0.00%    1.04s     1.04s   get_cmdline (powerletrics/powerletrics.py)
  0.00%   0.00%   0.980s    0.980s   __delitem__ (bcc/table.py)
  0.00%   0.00%   0.960s     2.92s   get_table (bcc/__init__.py)
  0.00%   0.00%   0.550s    0.820s   print_text (powerletrics/powerletrics.py)
  0.00%   0.00%   0.440s    0.470s   _assert_is_bytes (bcc/utils.py)
  0.00%   0.00%   0.420s    0.420s   __init__ (bcc/table.py)
  0.00%   0.00%   0.350s    0.460s   decode (json/decoder.py)
  0.00%   0.00%   0.230s    0.740s   Table (bcc/table.py)
  0.00%   0.00%   0.220s     1.13s   __contains__ (<frozen _collections_abc>)
  0.00%   0.00%   0.190s    0.190s   _decode_table_type (bcc/__init__.py)
  0.00%   0.00%   0.110s    0.110s   raw_decode (json/decoder.py)
  0.00%   0.00%   0.100s    0.560s   loads (json/__init__.py)
  0.00%   0.00%   0.100s    0.100s   <genexpr> (powerletrics/powerletrics.py)
  0.00%   0.00%   0.060s    0.090s   __eq__ (<frozen _collections_abc>)
  0.00%   0.00%   0.050s     3.94s   clear (bcc/table.py)
  0.00%   0.00%   0.030s    0.050s   energy_impact (powerletrics/powerletrics.py)
  0.00%   0.00%   0.030s    0.170s   cpu_utilization (powerletrics/powerletrics.py)
  0.00%   0.00%   0.030s    0.030s   __instancecheck__ (<frozen abc>)
  0.00%   0.00%   0.020s    0.050s   <lambda> (powerletrics/powerletrics.py)
  0.00%   0.00%   0.020s    0.020s   __bytes__ (bcc/utils.py)
  0.00%   0.00%   0.010s    0.010s   __init__ (bcc/utils.py)
  0.00%   0.00%   0.010s    0.010s   __iter__ (bcc/table.py)
  0.00%   0.00%   0.010s     3.24s   __next__ (bcc/table.py)
  0.00%   0.00%   0.000s    0.140s   get (<frozen _collections_abc>)
  0.00%   5.00%   0.000s    24.03s   run (threading.py)
  0.00%   5.00%   0.000s    24.03s   _bootstrap (threading.py)
  0.00%   0.00%   0.000s    0.010s   keys (bcc/table.py)
  0.00%   5.00%   0.000s    24.03s   _bootstrap_inner (threading.py)

What stands out is that threading either has a large overhead, or all of the eBPF work gets attributed to it.

py-spy definitely has an issue with its percentages: none of them make sense, and they do not add up to 100%.

I am therefore only looking at TotalTime.

In user space, get_data uses the most time. Surprisingly, the file open calls are not that costly.
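
For reference, here is a minimal sketch of what such a per-process cmdline read typically looks like (my assumption of roughly what get_cmdline does; the helper name read_cmdline is made up):

```python
def read_cmdline(pid: int) -> str:
    # /proc/<pid>/cmdline holds the NUL-separated argument vector; an empty
    # file usually means a kernel thread. This is a single small read from a
    # pseudo-filesystem, which fits the low cost of get_cmdline in the profile.
    try:
        with open(f"/proc/{pid}/cmdline", "rb") as f:
            return f.read().replace(b"\x00", b" ").decode(errors="replace").strip()
    except (FileNotFoundError, PermissionError):
        return ""
```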

Also interesting is that get_table takes a lot of time. Why? For a simple lookup / memory copy? How big are these tables? Can we maybe clear them more often?
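
To make the "clear them more often" idea concrete, here is a rough sketch of draining a bcc table into user space and clearing the kernel-side map right after each read. This is a minimal counting program under my own assumptions (map name counts, sched_switch tracepoint), not powerletrics' actual eBPF code:

```python
import time
from bcc import BPF

# Sketch only: a tiny per-PID counter, not the real powerletrics program.
b = BPF(text=r"""
BPF_HASH(counts, u32, u64);

TRACEPOINT_PROBE(sched, sched_switch) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);
    return 0;
}
""")

counts = b.get_table("counts")  # the Table wrapper that shows up in the profile

while True:
    time.sleep(1)
    # Copy the entries to user space, then clear the kernel-side map so it
    # stays small and iteration / get_table stays cheap in the next interval.
    snapshot = {k.value: v.value for k, v in counts.items()}
    counts.clear()
    print(f"{len(snapshot)} processes scheduled in the last interval")
```

If the maps grow over the whole runtime instead of being cleared per interval, iterating them (next, __getitem__, __delitem__ in bcc/table.py) would get slower every interval, which could explain part of the time seen above.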

The sorting also does not seem to be an issue.

So all in all: looks fine, apart from the wild eBPF functions and the threading overhead.

With RAPL data

  %Own   %Total  OwnTime  TotalTime  Function (filename)
 25.00%  25.00%   41.29s    41.29s   _shutdown (threading.py)
  9.00%   9.00%   16.04s    20.30s   get_data (powerletrics/powerletrics.py)
  7.00%   7.00%   12.54s    12.54s   rapl_metrics_provider_thread (powerletrics/powerletrics.py)
  0.00%   0.00%    1.20s     1.20s   next (bcc/table.py)
  0.00%   0.00%   0.850s    0.850s   __getitem__ (bcc/table.py)
  0.00%   0.00%   0.440s    0.610s   print_text (powerletrics/powerletrics.py)
  0.00%   0.00%   0.380s    0.380s   __delitem__ (bcc/table.py)
  0.00%   0.00%   0.330s     1.05s   get_table (bcc/__init__.py)
  0.00%   0.00%   0.180s    0.180s   __init__ (bcc/table.py)
  0.00%   0.00%   0.150s    0.150s   _assert_is_bytes (bcc/utils.py)
  0.00%   0.00%   0.130s    0.190s   decode (json/decoder.py)
  0.00%   0.00%   0.110s    0.350s   Table (bcc/table.py)
  0.00%   0.00%   0.090s    0.090s   <genexpr> (powerletrics/powerletrics.py)
  0.00%   0.00%   0.070s    0.620s   __contains__ (<frozen _collections_abc>)
  0.00%   0.00%   0.060s     1.55s   clear (bcc/table.py)
  0.00%   0.00%   0.060s    0.060s   raw_decode (json/decoder.py)
  0.00%   0.00%   0.050s     1.25s   __next__ (bcc/table.py)
  0.00%   0.00%   0.050s    0.060s   energy_impact (powerletrics/powerletrics.py)
  0.00%   0.00%   0.030s    0.060s   __eq__ (<frozen _collections_abc>)
  0.00%   0.00%   0.030s    0.030s   __instancecheck__ (<frozen abc>)
  0.00%   0.00%   0.020s    0.050s   cpu_utilization (powerletrics/powerletrics.py)
  0.00%   0.00%   0.020s    0.020s   _decode_table_type (bcc/__init__.py)
  0.00%   0.00%   0.010s    0.200s   loads (json/__init__.py)
  0.00%  16.00%   0.000s    32.84s   _bootstrap (threading.py)
  0.00%   0.00%   0.000s    0.120s   get (<frozen _collections_abc>)
  0.00%  16.00%   0.000s    32.84s   _bootstrap_inner (threading.py)
  0.00%  16.00%   0.000s    32.84s   run (threading.py)
  0.00%   0.00%   0.000s    0.020s   <lambda> (powerletrics/powerletrics.py)

The rapl_metrics_provider_thread costs a lot of CPU time. I believe something is wrong here ...
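
For comparison, a RAPL polling loop that reads the powercap counter and then sleeps should itself be nearly free in terms of CPU time. A sketch of what I would expect such a thread to roughly look like (the sysfs path and the 1 s interval are my assumptions, not powerletrics' actual code):

```python
import time

RAPL_ENERGY_FILE = "/sys/class/powercap/intel-rapl:0/energy_uj"  # package 0, assumed path

def rapl_poll_loop(interval_s: float = 1.0) -> None:
    last = None
    while True:
        with open(RAPL_ENERGY_FILE) as f:
            now = int(f.read())      # cumulative energy in microjoules
        if last is not None:
            delta_uj = now - last    # counter wrap-around is ignored in this sketch
            print(f"package power ~ {delta_uj / interval_s / 1e6:.2f} W")
        last = now
        time.sleep(interval_s)       # without a sleep like this the loop busy-spins,
                                     # which would show up exactly as high OwnTime
```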

ArneTR commented Nov 13, 2024

Just for completeness, here is how I produced these values:

  • Run powerletrics -i 1000
  • In another window, find the PID via ps -ax | grep powerletrics and use the one that does not have sudo in its cmdline (a scripted alternative is shown below)
  • Then do: sudo env "PATH=$PATH" py-spy top --pid PID_YOU_FOUND
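
A scripted alternative for the PID lookup step, in case it helps (this assumes psutil is installed; the matching is just the same "no sudo in the cmdline" rule as above):

```python
import psutil  # assumption: psutil is available in the environment

# Find the powerletrics process whose cmdline does not contain sudo,
# i.e. the actual Python process that py-spy should attach to.
for proc in psutil.process_iter(["pid", "cmdline"]):
    cmd = " ".join(proc.info["cmdline"] or [])
    if "powerletrics" in cmd and "sudo" not in cmd:
        print(proc.info["pid"], cmd)
```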
