-
There are a few ideas/questions in here. I'll try to address them all:
We have discussed this and I'm 100% on board. It would be simple to add a label to a spanmetric if the span's parent = nil. I'll ping @zalegrala and @ie-pham, who are working in this area, and also @kovrus, who is looking at otel spanmetrics. I will note, however, that in your case this will likely not help, since your root spans don't encompass the processing of the entire pipeline.
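To make the caveat about root spans concrete, here is a minimal sketch in plain Python. It is not Tempo code: the `Span` record, its field names, and the sample timings are all hypothetical. It treats a span with no parent as the root and compares the root span's duration with the wall-clock extent of the whole trace.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Span:
    """Hypothetical, simplified span record (not a Tempo or OTel type)."""
    span_id: str
    parent_id: Optional[str]  # None means "no parent", i.e. a root span
    start_ms: int
    end_ms: int


def is_root(span: Span) -> bool:
    # Mirrors the "span's parent = nil" check described above.
    return span.parent_id is None


def root_duration_ms(spans: Sequence[Span]) -> int:
    # Duration of the root span only.
    root = next(s for s in spans if is_root(s))
    return root.end_ms - root.start_ms


def end_to_end_ms(spans: Sequence[Span]) -> int:
    # Wall-clock extent of the trace: earliest start to latest end.
    return max(s.end_ms for s in spans) - min(s.start_ms for s in spans)


# Made-up trace: the producer finishes quickly, then the message sits in a
# queue for about 4 seconds before the consumer span starts.
TRACE = [
    Span("a", None, 0, 50),
    Span("b", "a", 4050, 4300),
]

print(root_duration_ms(TRACE), end_to_end_ms(TRACE))
```

With an asynchronous broker in between, the root span can end long before the last child does, so a metric labelled on root spans would report the smaller number here.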
We do intend to add support for the trace scope to #1989, which would allow you to search for traces whose duration exceeds specific thresholds, but this wouldn't allow for aggregate metrics (yet).
**Long Processing**
**Message Broker Delays**
**Holistic end to end metrics**
Let me know if you think any of this will help.
-
Hi,
-
There is currently no way to get this info from Tempo. The metrics generator could be improved to watch for these kinds of parent/child combinations and produce histograms, but this would require holding the parent information for quite a while (hours? days?), depending on the queue. Another option would be to record the delay in the child span as a custom attribute. This would allow direct searching and, soon, metrics via TraceQL.
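As a rough sketch of the custom-attribute option, the consumer could compute the time a message spent in the queue and attach it to its span. Everything named below is an assumption for illustration: the `x-enqueued-at-ms` header (which the producer would have to set at publish time), the `messaging.queue_delay_ms` attribute name, and the helper functions. It also assumes producer and consumer clocks are reasonably in sync.

```python
import time
from typing import Dict, Mapping, Optional

# Hypothetical names, chosen for this example only.
ENQUEUED_AT_HEADER = "x-enqueued-at-ms"       # set by the producer on publish
DELAY_ATTRIBUTE = "messaging.queue_delay_ms"  # attribute on the consumer span


def queue_delay_ms(headers: Mapping[str, object],
                   now_ms: Optional[int] = None) -> Optional[int]:
    """Return the time spent in the queue, or None if it cannot be computed."""
    raw = headers.get(ENQUEUED_AT_HEADER)
    if raw is None:
        return None
    try:
        enqueued_at = int(raw)
    except (TypeError, ValueError):
        return None
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    # Clamp at zero so small clock skew never yields a negative delay.
    return max(0, now_ms - enqueued_at)


def delay_attributes(headers: Mapping[str, object],
                     now_ms: Optional[int] = None) -> Dict[str, int]:
    """Build the attributes to attach to the consumer (child) span."""
    delay = queue_delay_ms(headers, now_ms)
    return {} if delay is None else {DELAY_ATTRIBUTE: delay}


# In a consumer instrumented with the OpenTelemetry Python API, each pair
# could then be attached with span.set_attribute(key, value).
print(delay_attributes({ENQUEUED_AT_HEADER: "1000"}, now_ms=4500))
```

With an attribute like this in place, a search along the lines of `{ span.messaging.queue_delay_ms > 60000 }` should find spans whose messages waited more than a minute, assuming the attribute name used above.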
-
Hello,
Do you plan to implement metrics on 'Root Span' duration? We already use spanmetrics, but we would love to see the whole duration, including inter-span delays (async delays, for example).
A bit more context on our use case might be useful:
We are tracing a Stream Pipeline, with different steps. We use an async message broker (RabbitMQ).
We'd like to know how long it takes to complete all the steps, as it can vary a lot (faster or slower processing, accumulation of messages in RabbitMQ queues that leads to delays, and so on). Ideally the data would be fresh (less than about 5 minutes old), so we can detect delays and react quickly.
Examples of the traces we are working with:
1/ Trace with long duration processing:
2/ Trace with long duration and message broker delays:
I haven't found a similar discussion on this topic. Here are some questions I was wondering about:
I'm now wondering whether future improvements to TraceQL could address this, maybe by generating metrics, like:
avg_over_time({ .env = "production" && span.http.status_code >= 200})
PS: We're really happy with the TraceQL feature; it answers many questions we had. Kudos for the product evolution!