Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tweaks for datadog compatibility #74

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from 7 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
10 changes: 5 additions & 5 deletions src/Allocs.jl
Original file line number Diff line number Diff line change
Expand Up @@ -43,7 +43,7 @@ function pprof(alloc_profile::Profile.Allocs.AllocResults = Profile.Allocs.fetch
# Allocs-specific arguments:
frame_for_type::Bool = true,
)
period = UInt64(0x1)
period = sum(alloc.size for alloc in alloc_profile.allocs, init=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 this seems weird.. Isn't this supposed to basically represent how much whatever does one sample represent?

So for a CPU profile, this would be like how much time in between samples? So for an allocation profile, it should be 1 allocation per sample, right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah I was misunderstanding this.

Looking at the proto, it says "The number of events between sampled occurrences": https://github.com/google/pprof/blob/131d412537eacd9973045c73835c3ac6ed696765/proto/profile.proto#L85-L86

So I guess it's basically supposed to be the sample interval? E.g. "every 1000th alloc" would be 1000?


@assert !isempty(basename(out)) "`out=` must specify a file path to write to. Got unexpected: '$out'"
if !endswith(out, ".pb.gz")
Expand Down Expand Up @@ -72,10 +72,10 @@ function pprof(alloc_profile::Profile.Allocs.AllocResults = Profile.Allocs.fetch
samples = Vector{Sample}()

sample_type = ValueType[
ValueType!("allocs", "count"), # Mandatory
ValueType!("size", "bytes")
ValueType!("alloc_objects", "count"), # Mandatory
ValueType!("alloc_space", "bytes")
]
period_type = ValueType!("heap", "bytes")
period_type = ValueType!("alloc_space", "bytes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh maybe the period_type should actually be (alloc_objects", "count")? then maybe you could keep the period = 0x01?

drop_frames = isnothing(drop_frames) ? 0 : enter!(drop_frames)
keep_frames = isnothing(keep_frames) ? 0 : enter!(keep_frames)

Expand Down Expand Up @@ -136,7 +136,7 @@ function pprof(alloc_profile::Profile.Allocs.AllocResults = Profile.Allocs.fetch

push!(
locations,
Location(;id = loc_id, line=[Line(function_id, line_number)])
Location(;id = loc_id, line=[Line(function_id, line_number > 0 ? line_number : 1)])
)

return loc_id
Expand Down
13 changes: 8 additions & 5 deletions src/PProf.jl
Original file line number Diff line number Diff line change
Expand Up @@ -140,8 +140,8 @@ function pprof(data::Union{Nothing, Vector{UInt}} = nothing,
samples = Vector{Sample}()

sample_type = [
ValueType!("events", "count"), # Mandatory
ValueType!("stack_depth", "count")
ValueType!("events", "count"), # Mandatory
ValueType!("cpu", "nanoseconds")
Comment on lines -143 to +144
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we still emit stack-depth values, or does even their presence break the datadog viewer?

I'm fine to drop them though; i don't think they really added much, and they're just there because Valentin and i were trying to understand how the sample_types worked, i think

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if they break the DD viewer… I'm not sure why you'd need them though, since each sample itself is a full stack, so the depth information is already there

]

period_type = ValueType!("cpu", "nanoseconds")
Expand All @@ -163,7 +163,7 @@ function pprof(data::Union{Nothing, Vector{UInt}} = nothing,
# End of sample
value = [
1, # events
length(location_id), # stack_depth
60000000 # CPU ns
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this mean?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm yeah, I think I found I needed to put some number here to get it to show up in DD, but this is just a bogus value. Not sure what it's supposed to be… Maybe it's supposed to correspond to the sample period? i.e. if you're taking a stack sample every 100ms, this is supposed to be 100ms (in ns)?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, yes, that's right. i think that is exactly what it's supposed to be. it's supposed to be the value matching the value type:

        ValueType!("cpu", "nanoseconds")

👍

🤔 i think you should be able to ask Profile what it's current sample rate is, so that this can be an accurate number.

help?> Profile.init()
  init(; n::Integer, delay::Real)

  Configure the delay between backtraces (measured in seconds), and the number n of instruction pointers that may be stored per thread. Each instruction
  pointer corresponds to a single line of code; backtraces generally consist of a long list of instruction pointers. Note that 6 spaces for instruction
  pointers per backtrace are used to store metadata and two NULL end markers. Current settings can be obtained by calling this function with no arguments,
  and each can be set independently using keywords or in the order (n, delay).

  │ Julia 1.8
  │
  │  As of Julia 1.8, this function allocates space for n instruction pointers per thread being profiled. Previously this was n total.

julia> Profile.init()
(10000000, 0.001)

julia> Profile.init()[2]
0.001

]
push!(samples, Sample(;location_id, value))
location_id = Vector{eltype(data)}()
Expand Down Expand Up @@ -200,7 +200,8 @@ function pprof(data::Union{Nothing, Vector{UInt}} = nothing,

# Use a unique function id for the frame:
func_id = method_instance_id(frame)
push!(location.line, Line(function_id = func_id, line = frame.line))
line_struct = Line(function_id = func_id, line = frame.line > 0 ? frame.line : 1)
push!(location.line, line_struct)

# Known function
func_id in seen_funcs && continue
Expand All @@ -216,7 +217,9 @@ function pprof(data::Union{Nothing, Vector{UInt}} = nothing,
file = string(meth.file)
io = IOBuffer()
Base.show_tuple_as_call(io, meth.name, linfo.specTypes)
full_name_with_args = _escape_name_for_pprof(String(take!(io)))
call_str = String(take!(io))
# add module name as well
full_name_with_args = _escape_name_for_pprof("$(meth.module).$call_str")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

adding the module name here should be the only user-visible change in this PR — it's pretty essential to get frames to show up right in DD since it essentially does

package, func = split(name, "."; limit=2)

e.g. for this one:
image

Without the module name it would parse as

  • package: DatadogProfileUploader.profile_and_upload(DatadogProfileUploader
  • func: DDConfig, typeof(myfunc))

If people don't want the behavior to change though, I could put this behind an option.

start_line = convert(Int64, meth.line)
else
# frame.linfo either nothing or CodeInfo, either way fallback
Expand Down
4 changes: 3 additions & 1 deletion src/flamegraphs.jl
Original file line number Diff line number Diff line change
Expand Up @@ -85,6 +85,7 @@ function pprof(fg::Node{NodeData},

sample_type = [
ValueType!("events", "count"), # Mandatory
ValueType!("cpu", "nanoseconds"),
]

period_type = ValueType!("cpu", "nanoseconds")
Expand Down Expand Up @@ -176,7 +177,8 @@ function pprof(fg::Node{NodeData},
end

value = [
length(span), # Number of samples in this frame.
1, # Number of samples in this frame.
60000000, # CPU ns (TODO: real number)
]
push!(samples, Sample(;location_id, value))

Expand Down