remove MDSCacheUsageHigh prometheus alert
Customers are complaining about erroneous MDS cache usage alerts.
The Ceph team suggested that `ceph_mds_mem_rss` might not be the right
metric to capture this cache usage, so this alert needs to be looked at
again. For the time being, we can just remove this alert due to the
increasing number of customer cases around it.

Signed-off-by: Santosh <sapillai@redhat.com>
sp98 committed Jan 2, 2025
1 parent d62b2a2 commit 4c85bc9
Showing 1 changed file with 0 additions and 13 deletions.
metrics/deploy/prometheus-ocs-rules.yaml: 0 additions, 13 deletions
@@ -367,19 +367,6 @@ spec:
         severity: info
   - name: ceph-daemon-performance-alerts.rules
     rules:
-    - alert: MDSCacheUsageHigh
-      annotations:
-        description: MDS cache usage for the daemon {{ $labels.ceph_daemon }} has
-          exceeded above 95% of the requested value. Increase the memory request for
-          {{ $labels.ceph_daemon }} pod.
-        message: High MDS cache usage for the daemon {{ $labels.ceph_daemon }}.
-        runbook_url: https://github.com/openshift/runbooks/blob/master/alerts/openshift-container-storage-operator/CephMdsCacheUsageHigh.md
-        severity_level: error
-      expr: |
-        (ceph_mds_mem_rss * 1000) / on(ceph_daemon) group_left(job)(label_replace(kube_pod_container_resource_requests{container="mds", resource="memory"}, "ceph_daemon", "mds.$1", "pod", "rook-ceph-mds-(.*)-(.*)") * .5) > .95
-      for: 5m
-      labels:
-        severity: critical
     - alert: OSDCPULoadHigh
       annotations:
         description: CPU usage for osd on pod {{ $labels.pod }} has exceeded 80%.

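The removed `expr` reads as a ratio. It takes `ceph_mds_mem_rss` scaled by 1000 and divides it by half of the `mds` container's memory request. The alert fires once that ratio stays above 0.95 for 5 minutes.

The Python sketch below recomputes that ratio for a few sample values. It is illustrative only. The function names and the 8 GiB request are hypothetical. It also assumes, inferring only from the `* 1000` and `* .5` factors in the rule, that the metric is reported in kilobytes and that the cache budget was meant to be half the memory request.

```python
# Hypothetical re-implementation of the removed MDSCacheUsageHigh expression,
# for illustration only. Assumes ceph_mds_mem_rss is in kilobytes (hence the
# "* 1000" in the original expr) and that the cache budget is taken to be
# half of the mds container's memory request (the "* .5" term).


def mds_cache_ratio(mem_rss_kb: float, memory_request_bytes: float) -> float:
    """Return (rss * 1000) / (request * 0.5), mirroring the removed PromQL."""
    return (mem_rss_kb * 1000) / (memory_request_bytes * 0.5)


def would_fire(mem_rss_kb: float, memory_request_bytes: float,
               threshold: float = 0.95) -> bool:
    """True when the ratio exceeds the threshold.

    The rule's 5m "for" window is not modelled here.
    """
    return mds_cache_ratio(mem_rss_kb, memory_request_bytes) > threshold


if __name__ == "__main__":
    GIB = 1024 ** 3
    request = 8 * GIB  # hypothetical 8 GiB memory request for the mds container
    # RSS is the resident memory of the whole process, not only the cache,
    # so it can cross the threshold even when the cache is below its limit.
    for rss_kb in (3_000_000, 4_000_000, 5_000_000):
        print(rss_kb, round(mds_cache_ratio(rss_kb, request), 3),
              would_fire(rss_kb, request))
```

With an 8 GiB request, the sketch shows the alert triggering once RSS passes roughly 4.08 million kB. That threshold depends only on total process memory, which is in line with the commit's concern that `ceph_mds_mem_rss` may not track cache usage.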