remove MDSCacheUsageHigh prometheus alert #2938

sp98 · 2025-01-02T08:16:01Z

This is related to https://issues.redhat.com/browse/DFBUGS-368

Customers are complaining about erroneous MDS cache usage alerts. The Ceph team suggested that ceph_mds_mem_rss might not be the right metric to capture this cache usage, so this alert needs to be looked at again. For the time being, we can just remove this alert while looking for a better solution. The decision to remove it was discussed in the weekly meeting.

Details about the discussion with the Ceph team:

RSS is not the right metric for a cache warning. mds_co_bytes is the correct metric, but it is not exposed by Ceph.
The MDS pod memory of 8Gi set by ODF is low. (Not related to this issue, just a general observation.)

Possible Resolution:

The operator should be able to read the MDS cache warning from the Ceph health and raise a Prometheus alert.
If MDS pod memory usage breaches 5/10/15% of the allocated memory size (8Gi), raise a Prometheus alert.
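
The two possible resolutions above could look roughly like the sketch below. It is only an illustration, not what the operator does: it assumes the health report is JSON with a top-level `checks` map (similar to `ceph health detail --format json`), that an oversized MDS cache shows up under the `MDS_CACHE_OVERSIZED` check, and that the thresholds are fractions of the 8Gi limit. The function names `mds_cache_warning` and `breached_thresholds` are hypothetical.

```python
import json

# Assumption: Ceph reports an oversized MDS cache under this health check code.
MDS_CACHE_CHECK = "MDS_CACHE_OVERSIZED"

GIB = 1024 ** 3


def mds_cache_warning(health_json: str) -> bool:
    """Return True if the health report carries an unmuted MDS cache check.

    Assumes a JSON document with a top-level "checks" mapping keyed by
    health check code; the real report shape may differ.
    """
    health = json.loads(health_json)
    check = health.get("checks", {}).get(MDS_CACHE_CHECK)
    return check is not None and not check.get("muted", False)


def breached_thresholds(usage_bytes, limit_bytes=8 * GIB,
                        thresholds=(0.05, 0.10, 0.15)):
    """Return the thresholds (fractions of the limit) that usage has reached.

    The defaults mirror the 5/10/15% and 8Gi values from the discussion;
    both are parameters because the final values are not settled.
    """
    ratio = usage_bytes / limit_bytes
    return [t for t in thresholds if ratio >= t]
```

Either result could then drive an alert. Reading the health check keeps the decision with Ceph, which has access to the cache accounting that `ceph_mds_mem_rss` does not reflect.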

Customers are complaining about erroneous MDS cache usage alerts. The Ceph team suggested that `ceph_mds_mem_rss` might not be the right metric to capture this cache usage, so this alert needs to be looked at again. For the time being, we can just remove this alert due to the increasing number of customer cases around it.

Signed-off-by: Santosh <sapillai@redhat.com>

sp98 · 2025-01-07T05:19:01Z

/assign @aruniiird

openshift-ci · 2025-01-07T05:19:03Z

@sp98: GitHub didn't allow me to assign the following users: aruniiird.

Note that only red-hat-storage members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @aruniiird

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

aruniiird

LGTM
AFAIK, this is the only place from which the alert has to be removed (as far as this repo is concerned)

openshift-ci · 2025-01-07T05:29:51Z

@aruniiird: changing LGTM is restricted to collaborators

In response to this:

LGTM
AFAIK, this is the only place from which the alert has to be removed (as far as this repo is concerned)


malayparida2000

/lgtm
/cc @iamniting

openshift-ci · 2025-01-08T06:53:49Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aruniiird, iamniting, malayparida2000, sp98

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [iamniting]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

sp98 · 2025-01-08T10:56:17Z

/cherry-pick release-4.18

openshift-cherrypick-robot · 2025-01-08T10:57:03Z

@sp98: new pull request created: #2952

In response to this:

/cherry-pick release-4.18


aruniiird approved these changes Jan 7, 2025


malayparida2000 approved these changes Jan 8, 2025


openshift-ci bot requested a review from iamniting January 8, 2025 04:22

openshift-ci bot assigned malayparida2000 Jan 8, 2025

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 8, 2025

iamniting approved these changes Jan 8, 2025


openshift-ci bot assigned iamniting Jan 8, 2025

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 8, 2025

openshift-merge-bot bot merged commit e675057 into red-hat-storage:main Jan 8, 2025
11 checks passed

openshift-cherrypick-robot mentioned this pull request Jan 8, 2025

DFBUGS-368: [release-4.18] remove MDSCacheUsageHigh prometheus alert #2952

Merged
