
Capture low volume metrics in telemetry #25713

Open
praveen-influx opened this issue Dec 27, 2024 · 4 comments
praveen-influx commented Dec 27, 2024

Telemetry already exposes min, max, and avg for some of the metrics. The avg field can be used to calculate volume on the fly, but for certain metrics (e.g. query count) there may be only one or two events in an hour, and the average gets rounded down to 0 for the whole hour. This could be fixed either by using floats (at a given precision) for avg, or by sending the total calculated for the hour as a separate field (the preferred option).
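To illustrate the rounding problem, here is a minimal sketch with made-up numbers (two queries spread over a 60-minute window); the integer average collapses to 0 while a float average or a separate total keeps the signal:

```rust
fn main() {
    // Hypothetical: two queries observed across a 60-minute reporting window.
    let mut per_minute_counts = vec![0u64; 60];
    per_minute_counts[10] = 1; // one query in minute 10
    per_minute_counts[42] = 1; // one query in minute 42

    let total: u64 = per_minute_counts.iter().sum();

    // Integer average per minute rounds down to 0, losing the signal.
    let avg_int = total / per_minute_counts.len() as u64;

    // Either a float average or a separate total field preserves it.
    let avg_f64 = total as f64 / per_minute_counts.len() as f64;

    println!("total={total} avg_int={avg_int} avg={avg_f64:.4}");
    assert_eq!(avg_int, 0); // the whole hour reports as "no queries"
    assert_eq!(total, 2);   // the total still shows the activity
}
```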


pauldix commented Dec 27, 2024

These ones are a little weird because they're not gauges like memory usage or CPU. They're meant to represent rates (i.e. throughput).

These are a little strange because what we're trying to capture is the rate information in 1m intervals. However, we send a report every hour and we don't want to send all 60 recorded intervals in. The tricky bit is that the hour isn't exact, so a total isn't useful unless you have a start time and end time for the total reporting interval (so you can calculate a rate), or you have the last report with some total and time that you can compare to this report with a total and its own time.
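As a sketch of that second option (hypothetical timestamps and totals): a cumulative total only becomes a rate once you pair it with the previous report's total and time.

```rust
fn main() {
    // Hypothetical consecutive reports: (unix seconds, cumulative event total).
    let prev = (1_735_300_000_u64, 1_000_u64);
    let curr = (1_735_303_700_u64, 1_090_u64); // ~61m40s later, not exactly an hour

    let elapsed_secs = (curr.0 - prev.0) as f64; // 3700.0, the real interval length
    let delta = (curr.1 - prev.1) as f64;        // 90 events
    let rate_per_sec = delta / elapsed_secs;

    println!("rate = {rate_per_sec:.5} events/s");
    assert!((rate_per_sec - 90.0 / 3700.0).abs() < 1e-12);
}
```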

So what would be useful here? We'd like to know if there are big spikes or dips in throughput over the course of the reporting interval (1 hour) and we'd like to know the average throughput in seconds over the course of the hour.

Since many cases have fewer than 1 query (or write) per second, if we express the rate at that level, we'd want to capture it as a float.

To close this out: maybe we want four values expressed for each hour-long reporting interval:

  • min_count_minute (the count of queries/writes in the minute with the fewest over the reporting interval)
  • max_count_minute (the count of queries/writes in the minute with the most over the reporting interval)
  • total (the total number over the reporting interval)
  • rate_seconds (the per-second rate over the reporting interval; should be a float. If the interval is exactly 1 hour, this is total / 3600)
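The four proposed fields could be derived from the per-minute counts roughly like this (a sketch with made-up numbers; the field names follow the list above, and the interval is deliberately a bit longer than an hour to show why the rate shouldn't assume 3600 seconds):

```rust
fn main() {
    // Hypothetical per-minute counts: 3 queries in every 7th minute of a
    // reporting interval that ran slightly long (61 one-minute samples).
    let counts: Vec<u64> = (0..61).map(|i| if i % 7 == 0 { 3 } else { 0 }).collect();
    let interval_secs = 61.0 * 60.0; // actual wall-clock length, not an assumed 3600

    let min_count_minute = *counts.iter().min().unwrap();
    let max_count_minute = *counts.iter().max().unwrap();
    let total: u64 = counts.iter().sum();
    // A float rate, so sub-1-per-second workloads don't round to zero.
    let rate_seconds = total as f64 / interval_secs;

    println!("min={min_count_minute} max={max_count_minute} total={total} rate={rate_seconds:.5}/s");
    assert_eq!((min_count_minute, max_count_minute, total), (0, 3, 27));
}
```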


mona-influx commented Dec 27, 2024

This makes sense to me @pauldix. Maybe I'm misunderstanding, but aren't what you described as min_count_minute and max_count_minute what the _min_1m and _max_1m metrics are already capturing? This request came after I was exploring the data and noticed that fields like QUERY_REQUESTS_AVG_1M were mostly 0s, because the average of the per-minute values is a rounded whole number. So if a user is running only a few queries every few minutes, it can easily average out to zero.

The total value would get us what we need to report on query/write volume, and between the min/max/avg we could get at what you're interested in: looking for spikes/drops over the reporting interval by comparing min/max to total and avg. Understood that volume is inexact given the non-precise "hour" roll-up, but my assumption was that we'd want to watch total query/write volume as we launch and see it going up and to the right, something that could be lost if we only capture min/max/avg.


pauldix commented Dec 27, 2024

@mona-influx yes, the existing _min_1m and _max_1m capture the counts I was talking about.

@praveen-influx
Contributor Author

@pauldix - I think you'll be able to compare the average throughput spikes/dips as long as it's done at least 60 of them in a minute. Anything below that will come up as 0. If we need to compare low "volume" (<60 reads/writes per minute), then we can add the fields you mentioned.
