Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

traces_service_graph_request_messaging_system_seconds metrics is not produced, while enabled. #4492

Open
cbos opened this issue Dec 24, 2024 · 2 comments

Comments

@cbos
Copy link

cbos commented Dec 24, 2024

I am running Tempo 2.6.1.
My full Tempo config can be found here: https://github.com/cbos/observability-toolkit/blob/main/config/tempo/tempo-config.yaml

For the service graph processor I have I this config:

  processor:
    service_graphs:
      wait: 30s
      max_items: 10000
      workers: 10
      dimensions: []
      #enable_client_server_prefix: false
      peer_attributes:
        - peer.service
        - db.name
        - db.system
        - messaging.destination.name
      # Enables additional labels for services and virtual nodes. Default false
      enable_virtual_node_label: true
      # If enabled another histogram will be produced for interactions over messaging systems middlewares
      # If this feature is relevant over long time ranges (high latencies) - consider increasing
      # `wait` value for this processor. Default false
      # enable_messaging_system_latency_histogram: false
      enable_messaging_system_latency_histogram: true

All metrics are generated:
Image

But traces_service_graph_request_messaging_system_seconds is missing. That is introduced with #3453

My test setup is a modified setup of OpenTelemetry demo:
https://github.com/cbos/opentelemetry-demo/tree/jms-support

Producer
https://github.com/cbos/opentelemetry-demo/tree/jms-support/src/adservice
JMS system is ActiveMQ (messagequeue in the setup)
Consumer
https://github.com/cbos/opentelemetry-demo/tree/jms-support/src/analytics

Image So the client-server matching is done correctly and the connection_type is 'messaging_system'.

if p.Cfg.EnableMessagingSystemLatencyHistogram && e.ConnectionType == store.MessagingSystem {

I checked the logging of Tempo, but I don't see the warning "producerSpanEndTime must be smaller than consumerSpanStartTime. maybe the peers clocks are not synced"

Based on what I can see all conditions are met to create the metrics, but these are not produced.
What can I do to find the problem to get it fixed?

@joe-elliott
Copy link
Member

When we create a processor we overlay per tenant overrides on the default config and pass it in.

func (cfg *ProcessorConfig) copyWithOverrides(o metricsGeneratorOverrides, userID string) (ProcessorConfig, error) {

I'm wondering if we're overwriting the "true" in your config with false here. Most config options seem to have some kind of guard code to not override if its an empty value, but we just take whatever is in the overrides for this particular value.

copyCfg.ServiceGraphs.EnableMessagingSystemLatencyHistogram = o.MetricsGeneratorProcessorServiceGraphsEnableMessagingSystemLatencyHistogram(userID)

Perhaps try something like:

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics, local-blocks]
      processor:
        service_graphs:
          enable_messaging_system_latency_histogram: true

This is confusing. Even if this is not your issue, we need to optionally overwrite this field and not blindly do it.

@cbos
Copy link
Author

cbos commented Jan 6, 2025

@joe-elliott
Thanks for you reply!

I tested it with the additional override settings and it works now:
Image
The metrics are available now.

So it looks like the way the settings are handled is causing some problems at the moment.

cbos added a commit to cbos/observability-toolkit that referenced this issue Jan 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants