main.go: enable cleanup of old PVCs #38

Open: wants to merge 2 commits into `main`

80 changes: 77 additions & 3 deletions README.md
@@ -57,11 +57,85 @@ spec:
- --configmap-name=thanos-receive
- --configmap-generated-name=thanos-receive-generated
- --file-name=hashrings.json
image: quat.io/observatorium/thanos-receive-controller
image: quay.io/observatorium/thanos-receive-controller
name: thanos-receive-controller
EOF
```

Finally, deploy StatefulSets of Thanos receivers labeled with `controller.receive.thanos.io=thanos-receive-controller`.
The controller lists all of the StatefulSets with that label and matches the value of their `controller.receive.thanos.io/hashring` labels to the hashring names in the configuration file.
Finally, deploy StatefulSets of Thanos receivers labeled with `controller.receive.thanos.io=thanos-receive-controller`, and with the hashring name in the `controller.receive.thanos.io/hashring` label, e.g.:

```shell
cat <<'EOF' | kubectl apply -f -
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app.kubernetes.io/instance: hashring0
    app.kubernetes.io/name: thanos-receive
    controller.receive.thanos.io: thanos-receive-controller
    controller.receive.thanos.io/hashring: hashring0
  name: thanos-receive-hashring0
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/instance: hashring0
      app.kubernetes.io/name: thanos-receive
  serviceName: thanos-receive-hashring0
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: hashring0
        app.kubernetes.io/name: thanos-receive
    spec:
      containers:
      - args:
        - receive
        - --grpc-address=0.0.0.0:10901
        - --http-address=0.0.0.0:10902
        - --remote-write.address=0.0.0.0:19291
        - --tsdb.path=/var/thanos/receive
        - --label=replica="$(NAME)"
        - --label=receive="true"
        - --tsdb.retention=6h
        - --receive.hashrings-file=/var/lib/thanos-receive/hashrings.json
        - --receive.local-endpoint=$(NAME).thanos-receive-hashring0.$(NAMESPACE).svc.cluster.local:10901
        env:
        - name: NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        image: quay.io/thanos/thanos
        name: thanos-receive
        ports:
        - containerPort: 10901
          name: grpc
        - containerPort: 10902
          name: http
        - containerPort: 19291
          name: remote-write
      volumes:
      - configMap:
          name: observatorium-tenants-generated
        name: observatorium-tenants
EOF
```

The controller lists all of the StatefulSets with the `controller.receive.thanos.io=thanos-receive-controller` label and matches the value of their `controller.receive.thanos.io/hashring` labels to the hashring names in the configuration file.
The endpoints for each hashring will be populated automatically by the controller and the complete configuration file will be placed in a ConfigMap named `thanos-receive-generated`.
This configuration should be consumed as a ConfigMap volume by the Thanos receivers.
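
The matching step described above can be sketched in Go roughly as follows. This is only an illustration, not the controller's actual code: the `statefulSet` and `hashring` types and the `populate` function are hypothetical, the JSON field names are assumed to follow the Thanos hashrings file layout, and the endpoint format simply mirrors the `--receive.local-endpoint` flag from the example StatefulSet, assuming Pods are named `<statefulset>-<ordinal>`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// hashring models one entry of the hashrings file (field names are an assumption).
type hashring struct {
	Hashring  string   `json:"hashring"`
	Tenants   []string `json:"tenants,omitempty"`
	Endpoints []string `json:"endpoints"`
}

// statefulSet is a hypothetical, trimmed-down stand-in for a Kubernetes StatefulSet.
type statefulSet struct {
	Name        string
	Namespace   string
	ServiceName string
	Replicas    int
	Labels      map[string]string
}

const (
	controllerLabel = "controller.receive.thanos.io"
	hashringLabel   = "controller.receive.thanos.io/hashring"
	grpcPort        = 10901 // matches --grpc-address in the example above
)

// populate fills in the endpoints of every hashring from the StatefulSets
// that carry the controller label and a matching hashring label.
func populate(hashrings []hashring, sets []statefulSet, controllerName string) []hashring {
	out := make([]hashring, len(hashrings))
	copy(out, hashrings)
	for i := range out {
		out[i].Endpoints = []string{}
		for _, s := range sets {
			if s.Labels[controllerLabel] != controllerName || s.Labels[hashringLabel] != out[i].Hashring {
				continue
			}
			// One endpoint per replica, addressed through the headless Service.
			for r := 0; r < s.Replicas; r++ {
				out[i].Endpoints = append(out[i].Endpoints, fmt.Sprintf(
					"%s-%d.%s.%s.svc.cluster.local:%d",
					s.Name, r, s.ServiceName, s.Namespace, grpcPort))
			}
		}
	}
	return out
}

func main() {
	sets := []statefulSet{{
		Name:        "thanos-receive-hashring0",
		Namespace:   "default", // assumed namespace, for illustration only
		ServiceName: "thanos-receive-hashring0",
		Replicas:    3,
		Labels: map[string]string{
			controllerLabel: "thanos-receive-controller",
			hashringLabel:   "hashring0",
		},
	}}
	generated := populate([]hashring{{Hashring: "hashring0"}}, sets, "thanos-receive-controller")
	b, err := json.MarshalIndent(generated, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

StatefulSets whose hashring label does not match any name in the configuration file are simply ignored in this sketch.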

## Advanced

Thanos receivers can handle potentially private data for various tenants.
When a Thanos receiver Pod is deleted, or the StatefulSet is otherwise scaled down, PersistentVolumes holding this potentially sensitive data may be left in the cluster.
In order to ensure that the PersistentVolume used by a Thanos receiver can be safely reused, the Thanos Receive Controller will automatically launch a short-lived Job that mounts these PersistentVolumes and cleans them up.
The cleanup process consists of:
1. ensuring any leftover TSDB blocks are backed up to object storage by running a [thanos-replicate](https://github.com/observatorium/thanos-replicate/) container for each PersistentVolume;
   in order to run this container, the Thanos Receive Controller expects the Thanos receiver StatefulSet to be labeled with `controller.receive.thanos.io/objstore-secret`, pointing to a Secret containing the Thanos object storage configuration file, and `controller.receive.thanos.io/objstore-secret-key`, specifying which key in the Secret holds the file;
   additional environment variables for the `thanos-replicate` container, e.g. `AWS_SECRET_ACCESS_KEY`, can be provided by adding those variables as keys in a Secret and specifying that Secret's name in the `controller.receive.thanos.io/env-var-secret` label;
   if any of the replication processes fails to run, the cleanup process is aborted and retried after some backoff;
1. removing all data in the PersistentVolumes by running a container that mounts all of the PersistentVolumes and runs `rm -rf` on each mount.
15 changes: 15 additions & 0 deletions examples/manifests/role.yaml
@@ -19,6 +19,21 @@ rules:
  - get
  - create
  - update
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - list
  - watch
- apiGroups:
  - batch
  resources:
  - jobs
  verbs:
  - create
  - delete
  - get
- apiGroups:
  - apps
  resources:
12 changes: 12 additions & 0 deletions jsonnet/lib/thanos-receive-controller.libsonnet
@@ -48,6 +48,18 @@ local k = import 'ksonnet/ksonnet.beta.4/k.libsonnet';
]) +
rules.withVerbs(['list', 'watch', 'get', 'create', 'update']),
rules.new() +
rules.withApiGroups(['']) +
rules.withResources([
'pods',
]) +
rules.withVerbs(['list', 'watch']),
rules.new() +
rules.withApiGroups(['batch']) +
rules.withResources([
'jobs',
]) +
rules.withVerbs(['create', 'delete', 'get']),
rules.new() +
rules.withApiGroups(['apps']) +
rules.withResources([
'statefulsets',