API to retrieve trace spans in bulk for multiple trace ids #4491
I'm not opposed to this feature. To be done well we'd need to push the trace ids all the way to the queriers. Given the amount of data potentially retrieved this could put pressure on queriers, ingesters and the frontend as well. Other thoughts:
I'm not sure why it would increase pressure on queriers/ingesters/frontend? I'm not hindered by any knowledge of Tempo's internals, so my assumption was that, instead of doing many calls that each look for a single id, it should be less pressure in total to do one slightly slower call that looks for matches against a set of trace ids. For our case, we want to get two sets of around 50-100 traces each. If it were faster, that would make the analysis more responsive, and we might increase that number to make the averages more stable, but getting more than a few hundred is unlikely to reveal anything new.
Trace by ID lookup breaks the block GUID range up based on the configured number of query shards, so regardless of the start/end the same number of jobs is created. It's possible that increasing this number would show some performance benefit if you're running a larger cluster. Even if trace by ID isn't returning faster, Tempo is doing less work when a start/end is passed.
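For reference, that shard count is a query-frontend setting; a minimal sketch, assuming the `query_shards` option under `trace_by_id` (check the configuration reference for your Tempo version):

```yaml
# Sketch of the relevant query-frontend settings; key names assume
# current Tempo docs, verify against your version.
query_frontend:
  trace_by_id:
    query_shards: 100  # more shards = more parallel jobs per lookup
```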
Without details it's hard to say, but there may be some shenanigans you can do. Let's say you wanted to compare the dependency graphs of traces with a root span name of "foo"; this way you could detect if a new service was added or removed in the past week. This query:
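(The exact query didn't survive this capture; a sketch that matches the description, assuming TraceQL's `kind` and nested-set intrinsics, might look like the following.)

```
{ trace:rootName = "foo" && kind = server } | select(nestedSetLeft, nestedSetRight, nestedSetParent)
```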
It will return all server spans (entry points to a service) along with their nested set values. The nested set values can be used to rebuild the tree and reconstruct the call graph. This is the kind of query the Explore Traces app performs to build service graph or error trees.
It is quite costly to translate an entire trace (depending on its size) from the Parquet representation into proto to return to the client. The query pipeline naturally creates backpressure that batch querying would sidestep: by asking a querier to simultaneously unmarshal/marshal 100 traces, you would likely see elevated memory usage.
I am quite hindered by knowledge of Tempo's internals and I'd do the same thing :) I'd be open to a PR that returns traces in bulk, but it would be work that spans the entire query pipeline. If you (or anyone) would like to take this on I could detail where to get started.
Is your feature request related to a problem? Please describe.
We're building a system to analyse failures, where we need to take samples of sufficient size from failed and successful requests to find the common causes of failure.
As input we have a set of trace ids, which we currently retrieve one by one through the `GET /api/traces/<traceID>` endpoint. This takes some time, even with `start` and `end` parameters set. In our setup, older traces that have already moved to the backend storage take anywhere from 400 to 1000 ms. Since all traces are in the same time window, an endpoint that would allow us to get multiple traces at once could probably be much more efficient.
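For concreteness, a single lookup as we do it today; the `start`/`end` query parameters are epoch seconds, and the host and trace id below are placeholders:

```bash
# One trace per request; start/end hint Tempo at the time window
# so it can skip blocks outside it (all values are placeholders).
curl -s "http://tempo:3200/api/traces/2f3e0cee77ae5dc9c17ade3689eb2e54?start=1733990400&end=1733994000"
```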
Describe the solution you'd like
An endpoint that accepts a list of trace ids instead of a single trace id and returns all found traces in one response.
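A purely hypothetical request shape, to make the proposal concrete (no such endpoint exists in Tempo today; the path and field names are invented for illustration):

```
POST /api/traces/batch          <- hypothetical endpoint, not part of Tempo's API
Content-Type: application/json

{
  "traceIDs": ["0af7651916cd43dd8448eb211c80319c", "2f3e0cee77ae5dc9c17ade3689eb2e54"],
  "start": 1733990400,
  "end": 1733994000
}
```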
Describe alternatives you've considered
We're currently retrieving them in parallel, but this doesn't seem to scale beyond a speedup of around 4-5x.
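The pattern we use is roughly the following minimal Go sketch (host, trace ids, and the concurrency limit are placeholders; error handling is elided):

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"sync"
)

// fetchTraces retrieves each trace id via the trace-by-id endpoint,
// bounded to `workers` concurrent requests. Beyond ~4-5 workers the
// speedup levels off in our setup.
func fetchTraces(base string, ids []string, workers int) map[string][]byte {
	var mu sync.Mutex
	out := make(map[string][]byte, len(ids))
	sem := make(chan struct{}, workers) // semaphore limiting concurrency
	var wg sync.WaitGroup
	for _, id := range ids {
		wg.Add(1)
		sem <- struct{}{}
		go func(id string) {
			defer wg.Done()
			defer func() { <-sem }()
			resp, err := http.Get(base + "/api/traces/" + id)
			if err != nil {
				return // in real code: collect and report the error
			}
			defer resp.Body.Close()
			body, _ := io.ReadAll(resp.Body)
			mu.Lock()
			out[id] = body
			mu.Unlock()
		}(id)
	}
	wg.Wait()
	return out
}

func main() {
	ids := []string{"0af7651916cd43dd8448eb211c80319c"} // placeholder ids
	traces := fetchTraces("http://tempo:3200", ids, 5)
	fmt.Println("fetched", len(traces), "traces")
}
```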
Additional context