Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Document compatibility requirements for profiles #559

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
13 changes: 13 additions & 0 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -12,3 +12,16 @@ for general information about the project.

After making any changes to .proto files make sure to generate all
implementation by running `make gen-all`.

Copy link
Contributor

@florianl florianl May 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this section should also align with Compatibility With Original pprof in the data model.

The proposed changes would conflict as the original statement is:

It is not forward compatible, meaning that a pprof file generated by the new proto cannot be parsed by existing software. This is mainly due to the sharing of the call stacks between samples + new format for labels (more on these differences below).

I think, it is fine to change these original compatibility requirements, but it should be aligned and adapted correctly.

## Profiles Compatibility Requirements
Copy link
Member

@jack-berg jack-berg May 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the Java SIG we've been discussing whether our in-memory representation of the profiling data model should include the deprecated pprof fields. I argued that they should not, since doing so is confusing and error prone for users who wish to implement the data model: Which field do I need to provide when two represent the same information? What happens if there's a conflict - are exporters expected to perform validation?

Its simpler to say that our in-memory data model needs to be guaranteed to be able to encode information to the pprof format. In other words, its compatible with pprof rather than a perfect apples to apples reflection of it.

This is a more relaxed compatibility requirement that what has been proposed so far with the profiling in this repo. If applied here, it would allow the profiling proto representation to completely diverge from pprof so long as it could maintain compatibility - as long as it could be translated losslessly to pprof. This wouldn't weaken otel from a tooling standpoint, since as long as we guaranteed that compatiblity and maintained OTLP exporters as well as pprof serializers, users could plug otel profiling data into all the tools that support pprof.

However, it would mean that we have yet another standard, since pprof and the new pprof compatible otel standard would exist where previously only pprof existed. So the value in maintaining a stricter compatibility requirement where we have a profiling message type which is a strict extension of the equivalent pprof proto message type is that it provides a route to having only a single standard.

I think this is valuable.

But what would this actually mean in practice? All opentelemetry OTLP profile exporters would export data which is wrapped in additional messages such that it can't be directly plugged into pprof tools. It seems like we need a way for the pprof Profile message to have some value on its own in OpenTelemetry? For example, we could define a export rpc for the OTLP ProfilesService which accepts Profile in addition to the rpc which accepts ExportProfilesServiceRequest. This would really force the issue and ensure that existing pprof tooling can interact with opentelemetry in material way.


The Otel profile format in
Copy link
Member

@arminru arminru May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The Otel profile format in
The OTel profile format in

(nit as per https://github.com/open-telemetry/opentelemetry-specification/blob/v1.33.0/specification/README.md?plain=1#L72)

[profiles/v1experimental](opentelemetry/proto/profiles/v1experimental)
is derived from
[pprof](https://github.com/google/pprof/tree/main/proto) and is currently
in experimental state. We would like to continue maintaining compatibility with pprof.
All changes to profile proto should meet the following compatibility criteria:

- Every valid pprof profile is also a valid Otel profile AND,
- Every Otel profile is a valid pprof profile, after any new fields introduced in
Copy link
Member Author

@tigrannajaryan tigrannajaryan May 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can expand if this is not clear enough.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That second requirement is interesting. What does "valid" mean here? That it doesn't cause a decoding error? Or that it allows the pprof CLI tool to render flame graph? Arguably there isn't much value in being "valid" if it doesn't enable interoperability.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"doesn't cause a decoding error" is not good enough for me. All we would need to accomplish this is to just assign all new field numbers for Otel pprofextended messages and there would be no decoding errors since Protobuf decoder happily ignores unknown fields. This is obviously of no practical value.

What I am aiming for here is the following. If we extend profiles in Otel by new capabilities then those new capabilities obviously can't be understood by pprof. However ignoring those new additional capabilities should still result in a meaningful profile from pprof perspective.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should discuss the meaning of 'meaningful' in the next SIG meeting. Taking into account that the scope of pprofextended is larger than pprof, having every OTel profile be a valid and meaningful pprof profile may not be workable.

Quick examples that come to mind include missing symbols due to delayed symbolization (see the agenda for recent discussions around standardizing a symbol upload protocol) and deprecation / non-usage of fields that pprof expects to be present (e.g. Sample.location_index).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, let's discuss in the SIG.

This criterion came up in a meeting last week (suggested by @rsc). This is essentially a forward compatibility requirement. Due to how unknown fields are by default handled by Protobuf deserializers existing pprof tooling can read pprofextended with no code changes. There is certain value in this, although I don't know what practical use cases would benefit from this.

@aalexand also suggested an alternate criteria: "All Otel pprofextended profiles must be convertible to valid pprof profiles" (the precise meaning of "convertible" TBD). This is a less restrictive criteria that gives more flexibility to Otel profiles, but means if we want existing pprof tools to work with Otel profiles we either need a "converter" tool or need to teach pprof to understand the new Otel profiles directly.

There is probably other options that we can discuss.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@aalexand it would be great if you can also join the Profiling SIG on May 30 to discuss the options.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'll try to join.

The main question is basically whether OTel's pprofextended can become a source of truth for existing pprof clients without requiring them to migrate all at once (which is infeasible). For that to happen there needs to be an arbitrarily long period of time where the pprof's subset of pprofextended functionality is semantically equivalent to what pprof has. Today this is not the case - the most prominent example is the change from ID referencing to index referencing, since it's wire and semantics breaking.

An alternative is to simply consider pprofextended a fork of pprof where some sort of converter would exist from pprofextended to pprof format for pprof to be able to read pprofextended format. The converter does not have to be a part of pprof in this case, it could be anywhere. I suspect that Go runtime would probably keep using the current pprof format as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pprofextended already uses different names than pprof. Is this considered as conflict?
E.g. pprofextended Location.mapping_index vs pprof Location.mapping_id.

Copy link
Member Author

@tigrannajaryan tigrannajaryan May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An alternative is to simply consider pprofextended a fork of pprof where some sort of converter would exist from pprofextended to pprof format for pprof to be able to read pprofextended format. The converter does not have to be a part of pprof in this case, it could be anywhere. I suspect that Go runtime would probably keep using the current pprof format as well.

If we do this I think we loose the benefits of having one standard. In that case we may as well give Otel a complete freedom to designs its own format. I would like to avoid this.

pprofextended already uses different names than pprof. Is this considered as conflict?
E.g. pprofextended Location.mapping_index vs pprof Location.mapping_id.

We should consider 2 aspects of compatibility: wire compatibility and symbolic compatibility. Renaming fields does not impact wire encoding, so if we aim just for wire compatibility renames are fine. However I would argue that we should probably aim for symbolic compatibility and avoid renames, so that we make pprof tooling evolution to newer pprofextended versions easier.

In this particular case it may be nice to rename from mapping_id to mapping_index but I think names are similar enough that we can live with the old name too (assuming of course we also preserve the semantics).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most important change in this particular case is not just the name change, but the change in meaning:

mapping_id in pprof means "the positive value of the Mapping.id field of the referenced mapping or zero if this is a nil reference".

mapping_index in pprofextended means "the index of the mapping in the repeated Profile.mappings field" (plus rather undefined currently way to express nil references)

This semantics change is going to be fairly hard to deal with in terms of switching existing pprof clients to pprofextended.

Otel profile are discarded.
24 changes: 13 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,17 +36,19 @@ To generate the raw gRPC client libraries, use `make gen-${LANGUAGE}`. Currently
1.0.0 and newer releases from this repository may contain unstable (alpha or beta)
components as indicated by the Maturity table below.

| Component | Binary Protobuf Maturity | JSON Maturity |
| --------- |--------------- | ------------- |
| common/* | Stable | [Stable](docs/specification.md#json-protobuf-encoding) |
| resource/* | Stable | [Stable](docs/specification.md#json-protobuf-encoding) |
| metrics/\*<br>collector/metrics/* | Stable | [Stable](docs/specification.md#json-protobuf-encoding) |
| trace/\*<br>collector/trace/* | Stable | [Stable](docs/specification.md#json-protobuf-encoding) |
| logs/\*<br>collector/logs/* | Stable | [Stable](docs/specification.md#json-protobuf-encoding) |
| profiles/\*<br>collector/profiles/* | Experimental | [Experimental](docs/specification.md#json-protobuf-encoding) |

(See [maturity-matrix.yaml](https://github.com/open-telemetry/community/blob/47813530864b9fe5a5146f466a58bd2bb94edc72/maturity-matrix.yaml#L57)
for definition of maturity levels).
| Component | Binary Protobuf Maturity | JSON Maturity |
| --------- |--------------------------|------------------------------------------------------------------|
| common/* | Stable | [Stable](docs/specification.md#json-protobuf-encoding) |
| resource/* | Stable | [Stable](docs/specification.md#json-protobuf-encoding) |
| metrics/\*<br>collector/metrics/* | Stable | [Stable](docs/specification.md#json-protobuf-encoding) |
| trace/\*<br>collector/trace/* | Stable | [Stable](docs/specification.md#json-protobuf-encoding) |
| logs/\*<br>collector/logs/* | Stable | [Stable](docs/specification.md#json-protobuf-encoding) |
| profiles/\*<br>collector/profiles/* | Experimental ** | [Experimental](docs/specification.md#json-protobuf-encoding) ** |

** Additional requirements for profiles are [listed here](CONTRIBUTING.md#profiles-compatibility-requirements).

See [maturity-matrix.yaml](https://github.com/open-telemetry/community/blob/47813530864b9fe5a5146f466a58bd2bb94edc72/maturity-matrix.yaml#L57)
for definition of maturity levels.

## Stability Definition

Expand Down
Loading