
(WIP) Improve logging and OpenMetrics docs #21089

Open · wants to merge 2 commits into master
Conversation

mosabua
Member

@mosabua mosabua commented Mar 14, 2024

Description

Additional context and related issues

Related to TCB 57

Release notes

(x) This is not user-visible or is docs only, and no release notes are required.

@cla-bot cla-bot bot added the cla-signed label Mar 14, 2024
@github-actions github-actions bot added the docs label Mar 14, 2024
@mosabua mosabua force-pushed the openmetrics branch 2 times, most recently from 5d58530 to 3f80d5e Compare March 19, 2024 18:27
@lozbrown
Contributor

The Slack channel recently educated me that for some metrics I need to pull metrics from all nodes, including workers, not just the coordinator. Prometheus supports Kubernetes service discovery as standard, and most implementations have a standard "prometheus-annotations" job configured; however, that job is generally unauthenticated.

I created two copies of that job, one for the coordinator and one for the workers, because in my case:

  • workers require the username (but not the password) of a user with system information permission
  • the coordinator requires both username and password, and these can only be passed over HTTPS, so the scrape must come in via the ingress (an ALB in my case)

@mattstep requested that I share some of the Prometheus configuration required.

Firstly, in the Helm values for Trino, add the annotations:

```yaml
coordinator:
  annotations:
    prometheus.io/trino_scrape: "true"
worker:
  annotations:
    prometheus.io/trino_scrape: "true"
```

Then, in the Prometheus configuration:

```yaml
    - job_name: trino-metrics-worker
      scrape_interval: 10s
      scrape_timeout: 10s
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_trino_scrape]
        action: keep # scrape only pods with the trino scrape annotation
        regex: true
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: keep # don't try to scrape non-Trino containers
        regex: trino-worker
      - action: hashmod
        modulus: $(SHARDS)
        source_labels:
        - __address__
        target_label: __tmp_hash
      - action: keep
        regex: $(SHARD)
        source_labels:
        - __tmp_hash
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: container
      metric_relabel_configs:
          - source_labels: [__name__]
            regex: ".+_FifteenMinute.+|.+_FiveMinute.+|.+IterativeOptimizer.+|.*io_airlift_http_client_type_HttpClient.+"
            action: drop # dropping some highly granular metrics
          - source_labels: [__meta_kubernetes_pod_name]
            regex: ".+"
            target_label: pod
            action: replace
          - source_labels: [__meta_kubernetes_pod_container_name]
            regex: ".+"
            target_label: container
            action: replace

      scheme: http
      tls_config:
        insecure_skip_verify: true
      basic_auth:
        username: myuser # replace with a user with system information permission
        # DO NOT ADD PASSWORD
    - job_name: trino-metrics-coordinator
      scrape_interval: 10s
      scrape_timeout: 10s
      kubernetes_sd_configs:
        - role: pod
      relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_trino_scrape]
        action: keep # scrape only pods with the trino scrape annotation
        regex: true
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: keep # don't try to scrape non-Trino containers
        regex: trino-coordinator
      - action: hashmod
        modulus: $(SHARDS)
        source_labels:
        - __address__
        target_label: __tmp_hash
      - action: keep
        regex: $(SHARD)
        source_labels:
        - __tmp_hash
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: pod
      - source_labels: [__meta_kubernetes_pod_container_name]
        action: replace
        target_label: container
      - action: replace # override the address with the https ingress address
        target_label: __address__
        replacement: {{ .Values.trinourl }}
      metric_relabel_configs:
          - source_labels: [__name__]
            regex: ".+_FifteenMinute.+|.+_FiveMinute.+|.+IterativeOptimizer.+|.*io_airlift_http_client_type_HttpClient.+"
            action: drop # dropping some highly granular metrics
          - source_labels: [__meta_kubernetes_pod_name]
            regex: ".+"
            target_label: pod
            action: replace
          - source_labels: [__meta_kubernetes_pod_container_name]
            regex: ".+"
            target_label: container
            action: replace

      scheme: https
      tls_config:
        insecure_skip_verify: true
      basic_auth:
        username: myuser # replace with a user with system information permission
        password_file: /some/password/file
```
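The drop rule above hinges on a single regex. One way to sanity-check it offline is to test candidate metric names against the same pattern; the sketch below assumes Prometheus's behavior of fully anchoring relabel regexes, and the metric names used are invented for illustration:

```python
import re

# The drop regex from the metric_relabel_configs above. Prometheus
# anchors relabel regexes as ^...$, so fullmatch is used here.
DROP_RE = re.compile(
    r".+_FifteenMinute.+|.+_FiveMinute.+|.+IterativeOptimizer.+"
    r"|.*io_airlift_http_client_type_HttpClient.+"
)

def is_dropped(metric_name: str) -> bool:
    """Return True if the metric would be dropped by the relabel rule."""
    return DROP_RE.fullmatch(metric_name) is not None

# Hypothetical metric names, for illustration only.
print(is_dropped("trino_QueryManager_FifteenMinute_Count"))   # True
print(is_dropped("trino_ClusterMemoryManager_TotalBytes"))    # False
```

This makes it easy to confirm a new pattern does not accidentally drop metrics you care about before reloading Prometheus.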


This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label Apr 11, 2024
@mattstep
Contributor

This looks good to me, and I think it would benefit from the more complex example from @lozbrown, which gives a more realistic configuration for a production use case.

@mosabua
Member Author

mosabua commented Apr 11, 2024

I have a whole bunch of further information and details from @lozbrown and others that I will add; I just have to get back to working on this.

@lozbrown
Contributor

I think it would be good to include a list of the things you, as experts, typically monitor in your dashboards.

Possibly including the PromQL for these.

@mosabua
Member Author

mosabua commented Apr 11, 2024

Agreed @lozbrown .. I will work the ones you supplied into this PR, and if @mattstep or others have more examples, they can be added too. We can also do more updates after this PR is merged.

@mosabua mosabua added stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. and removed stale labels Apr 11, 2024
@lozbrown
Contributor

@mosabua, can we move along some sort of first attempt here, on the basis that something is better than completely undocumented features?

@mosabua
Member Author

mosabua commented May 24, 2024

It's on my backlog, but I am flat out busy .. at this stage I will try to get a minimal PR merged first and expand later.

Trino also includes a [Prometheus connector](/connector/prometheus) that allows you to query
Prometheus data using SQL.

## Example use
Contributor

I really think that running a Prometheus server is out of scope for the Trino documentation.


```shell
curl -H X-Trino-User:foo localhost:8080/metrics
```
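The endpoint responds in the Prometheus text exposition format. A minimal sketch of parsing such a response follows; the sample payload and metric names are invented for illustration and will differ from real Trino output:

```python
# Hypothetical sample of Prometheus text-format output, for illustration.
SAMPLE = """\
# TYPE trino_metric_a gauge
trino_metric_a 12.0
# TYPE trino_metric_b counter
trino_metric_b 7
"""

def parse_metrics(text: str) -> dict[str, float]:
    """Parse unlabeled samples from Prometheus text format into a dict."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE metadata
        name, _, value = line.rpartition(" ")
        metrics[name] = float(value)
    return metrics

print(parse_metrics(SAMPLE))  # → {'trino_metric_a': 12.0, 'trino_metric_b': 7.0}
```

This handles only the simple unlabeled case; a real consumer would use Prometheus itself or an OpenMetrics client library rather than hand-rolled parsing.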
Contributor

Because this works for any cluster, the localhost should not be necessary (most users will never run Trino on their PC).

It's necessary to point out that users need the system-information read permission.

It's also necessary to point out that, for complete metrics, all workers must be scraped.

Labels: cla-signed · docs · stale-ignore
3 participants