
Ingesters OOM killed when average trace size grows despite rate limiting and discarding live traces #4424

Open

adhinneupane opened this issue Dec 6, 2024 · 1 comment

@adhinneupane
Describe the bug
Despite setting low rate limits and using max_traces_per_user, our ingesters get OOM killed when the average trace size grows above 100 KiB.

To Reproduce
Steps to reproduce the behavior:

  1. Start Tempo (2.5.0) in a k8s cluster with 3 ingesters at a 10 GiB memory limit each.
  2. Start xk6-client-tracing with the average trace size set to 100 KiB (see param.js below; a build/run sketch follows these steps).
  3. Run the load test with ~3k to 5k active live traces.
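For reference, a rough sketch of how the load generator is built and run, following the standard xk6 flow; the endpoint below is a placeholder for whatever OTLP receiver is exposed:

# Build a k6 binary that includes the xk6-client-tracing extension.
xk6 build --with github.com/grafana/xk6-client-tracing@latest

# Run the test, pointing the script's ENDPOINT env var at the OTLP endpoint.
./k6 run -e ENDPOINT=https://<your-otlp-endpoint>:443 param.js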

Expected behavior
Ingesters get OOM killed and restart.

Environment:

  • Infrastructure: [Kubernetes]
  • Deployment tool: [tanka]
  • Tempo version: 2.5.0
  • Number of distributors to ingesters: 3 : 3

Additional Context

We do not see this problem when the average trace size (p95) is below 50 KiB. Whenever the average trace size exceeds ~90 KiB, we cannot prevent OOM kills despite setting a low burst_size_bytes, rate_limit_bytes, and max_traces_per_user.

| OOM Kills | burst_size_bytes | rate_limit_bytes | Average Trace Size (Bytes) | Live Traces (30k) | Distributor bytes limit (burst + rate) | Distributor (N) x Ingester (N) | Ingester Memory (Max) | Rate Limit Strategy | Time Under Test | Average Trace Size × Live Traces (MiB) |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17 MiB | 14 MiB | 57000 | 15000 | 29 MiB | 3 x 3 | 80% | Global | 25m | 815.3915405 |
| 0 | 17 MiB | 14 MiB | 48000 | 18000 | 29 MiB | 3 x 3 | 70% | Global | 25m | 823.9746094 |
| 0 | 17 MiB | 14 MiB | 38000 | 25000 | 28 MiB | 3 x 3 | 60% | Global | 25m | 905.9906006 |
| 1 | 17 MiB | 14 MiB | 187000 | 2000 | 18 MiB | 3 x 3 | N/A | Global | < 10m | 356.6741943 |
| 1 | 17 MiB | 14 MiB | 219000 | 1200 | 18.9 MiB | 3 x 3 | N/A | Global | < 10m | 250.6256104 |
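For reference, a minimal sketch of the per-tenant limits referred to above, expressed as Tempo's flat overrides block; the field names follow Tempo's documented override options and the values are illustrative, mirroring the 17 MiB burst / 14 MiB rate rows in the table:

# Illustrative per-tenant limits (flat overrides format); values mirror the table above.
overrides:
  ingestion_rate_strategy: global        # "Global" rate limit strategy
  ingestion_rate_limit_bytes: 14680064   # 14 MiB
  ingestion_burst_size_bytes: 17825792   # 17 MiB
  max_traces_per_user: 30000             # cap on live traces per tenant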

param.js

import { sleep } from 'k6';
import tracing from 'k6/x/tracing';

export const options = {
    vus: 120,
    stages: [
    { duration: '2m', target: 120 },
    { duration: '10s', target: 120 },
    { duration: '2m', target: 120 },
    { duration: '10s', target: 120 },
    { duration: '2m', target: 120 },
    { duration: '10s', target: 120 },
    { duration: '2m', target: 120 },
    { duration: '10s', target: 120 },
    { duration: '2m', target: 120 },
    ]
};

const endpoint = __ENV.ENDPOINT || "https://<>:443"
const client = new tracing.Client({
    endpoint,
    exporter: tracing.EXPORTER_OTLP,
    tls: {
      insecure: true,
    }
});

export default function () {
    let pushSizeTraces = 50;   // traces per push
    let pushSizeSpans = 0;
    let t = [];
    for (let i = 0; i < pushSizeTraces; i++) {
        let c = 100;           // spans per trace
        pushSizeSpans += c;
        t.push({
            random_service_name: false,
            spans: {
                count: c,
                size: 900,     // ~900 B per span -> ~90 KB per trace
                random_name: true,
                fixed_attrs: {
                    "test": "test",
                },
            }
        });
    }

    let gen = new tracing.ParameterizedGenerator(t);
    let traces = gen.traces();
    sleep(5);
    console.log(traces);
    client.push(traces);
}

export function teardown() {
    client.shutdown();
}
@joe-elliott
Member

joe-elliott commented Dec 10, 2024

There are two things that drive memory usage in Tempo ingesters, compactors, and (depending on the query) queriers:

  1. Trace size
  2. Dictionary sizes in parquet

I'm not surprised you're seeing elevated memory usage as you bring up the trace size, but I am very surprised you are seeing such elevated usage at just ~100-200 KB. We run cells with tenants who push traces that are 50 MB+.

Some things to test:

  1. This is likely creating a very large dictionary, which is probably part of the memory issue. Let's try removing it (see the sketch after this list):

random_name: true

  2. Tempo 2.7 will have some nice ingester memory improvements and will also contain the metric tempo_ingester_live_trace_bytes, which will help you see, per tenant, who is consuming live trace memory.

  3. Another issue that we are looking at now is that an ingester that is CPU starved will experience lock contention, and the Go heap will balloon. This is harder to prove out, but it should stay in the back of our minds while we are diagnosing this. A memory profile would be helpful for seeing whether this is the issue. Honestly, a memory profile would be great all around and would help me very quickly diagnose the issue if you could provide one.

  4. The metric below shows roughly what Tempo thinks the bytes per trace is and would be useful to confirm what we believe the test is creating. We can show this metric per pod or per tenant to see if there's anything interesting.

sum(rate(tempo_ingester_bytes_received_total{}[1m])) / 
sum(rate(tempo_ingester_traces_created_total{}[1m]))

  5. Tempo has the ability to restrict max trace size, but the values are so low in your tests I don't think that's useful here.
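To make the first suggestion concrete, here is a sketch of the relevant span block from param.js with random span names turned off; everything else stays as in the original script:

// param.js fragment: same span shape as before, but with random_name disabled,
// since randomized span names likely inflate the parquet dictionaries.
t.push({
    random_service_name: false,
    spans: {
        count: c,
        size: 900,
        random_name: false,   // was true
        fixed_attrs: {
            "test": "test",
        },
    }
});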

Thanks for the detailed test and write-up. Hopefully we will be able to get to the bottom of these issues.
