Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hum: batch process obs #371

Merged
merged 4 commits into from
Sep 4, 2024

Conversation

bingyuyap
Copy link
Contributor

@bingyuyap bingyuyap commented Sep 3, 2024

This PR batches observation for processing to reduce DB calls to bigtable, which is the main bottle neck of the performance for processing. This causes the buffer to max out utilisation and drop observations before we can even process them.

High level:

  1. whenever there is an observation, add to a batch.
  2. whenever batch size reaches batchLimit (for now its 100) or every 5 seconds, process the batch

High level of batch processing:

  1. If message does not exist in cache, grab from db and put it in cache. If not found in db assume we haven't seen it before and create a new one in cache.
  2. Grab all observations related and put in cache. Check current observation against in-cache observations for deduplication. If not exist, add to cache
  3. After every batch, flush these data to db and clear cache for next batch.

Currently on logs, without batching, a buffer size of 20,000 is at 99-100%. Which causes buffer to drop observations. With this change, the utilisation of the buffer is at 0-2% with a size of 1024.

Signed-off-by: bingyuyap <bingyu.yap.21@gmail.com>
hum: fix tests

hum: add tests for flushing

hum: clean up

hum: final clean up

Signed-off-by: bingyuyap <bingyu.yap.21@gmail.com>
@bingyuyap bingyuyap force-pushed the bing/hum-batch-process branch from c6ac8c1 to 28f6551 Compare September 4, 2024 04:13
@bingyuyap bingyuyap marked this pull request as ready for review September 4, 2024 04:23
Copy link
Contributor

@bruce-riley bruce-riley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not quite done yet, but here's what I've got so far.

fly/cmd/historical_uptime/main.go Outdated Show resolved Hide resolved
fly/cmd/historical_uptime/main.go Outdated Show resolved Hide resolved
fly/cmd/historical_uptime/main.go Show resolved Hide resolved
fly/pkg/bigtable/cache.go Outdated Show resolved Hide resolved
fly/pkg/historical_uptime/process_observation.go Outdated Show resolved Hide resolved
fly/pkg/historical_uptime/process_observation.go Outdated Show resolved Hide resolved
Signed-off-by: bingyuyap <bingyu.yap.21@gmail.com>
@bingyuyap bingyuyap force-pushed the bing/hum-batch-process branch from 2348a4f to 34511fa Compare September 4, 2024 17:22
@bingyuyap bingyuyap force-pushed the bing/hum-negative-obs-count branch from d522ad1 to 58a2a33 Compare September 4, 2024 17:28
@bingyuyap bingyuyap merged commit 212f1d3 into bing/hum-negative-obs-count Sep 4, 2024
3 checks passed
@bingyuyap bingyuyap deleted the bing/hum-batch-process branch September 4, 2024 17:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants