remove MDSCacheUsageHigh prometheus alert #2938
Customers are complaining about erroneous MDS cache usage alerts. The Ceph team suggested that `ceph_mds_mem_rss` might not be the right metric to capture this cache usage, so this alert needs to be looked at again. For the time being, we can just remove this alert due to the increasing number of customer cases around it. Signed-off-by: Santosh <sapillai@redhat.com>
/assign @aruniiird
@sp98: GitHub didn't allow me to assign the following users: aruniiird. Note that only red-hat-storage members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
LGTM
AFAIK, this is the only place from which the alert has to be removed (as far as this repo is concerned)
@aruniiird: changing LGTM is restricted to collaborators In response to this:
/lgtm
/cc @iamniting
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: aruniiird, iamniting, malayparida2000, sp98 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Merged commit e675057 into red-hat-storage:main
/cherry-pick release-4.18
@sp98: new pull request created: #2952 In response to this:
This is related to https://issues.redhat.com/browse/DFBUGS-368
Customers are complaining about erroneous MDS cache usage alerts. The Ceph team suggested that `ceph_mds_mem_rss` might not be the right metric to capture this cache usage, so this alert needs to be looked at again. For the time being, we can just remove this alert while looking for a better solution. The decision to remove it was discussed in the weekly meeting.
Details about the discussion with the Ceph team:
`rss` is not the right metric for the cache warning.
`mds_co_bytes` is the correct metric, but it is not exposed by Ceph.
The MDS pod memory of 8Gi set by ODF is low. (Not related to this issue, just a general observation.)
Possible Resolution:
The operator should be able to read the MDS cache warning from the Ceph health status and raise a Prometheus alert.
If MDS pod memory usage breaches 5/10/15% of the allocated memory size (8Gi), raise a Prometheus alert.
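To make the two possible resolutions concrete, here is a minimal sketch of the checks involved. It is not the operator's implementation. The function names are hypothetical, the health check code `MDS_CACHE_OVERSIZED` and the `checks` layout of the JSON health report are assumptions about Ceph's output, and the 5/10/15% thresholds are read literally from the proposal above.

```python
import json

GIB = 1024 ** 3

# Assumption: the MDS cache warning appears in the Ceph health report
# under this check code.
MDS_CACHE_CHECK = "MDS_CACHE_OVERSIZED"


def mds_cache_warning_active(health_json: str) -> bool:
    """Return True if the health report lists the MDS cache check.

    Assumes a top-level "checks" object keyed by check code, as in the
    JSON form of a Ceph health report.
    """
    checks = json.loads(health_json).get("checks", {})
    return MDS_CACHE_CHECK in checks


def breached_thresholds(usage_bytes, allocated_bytes, percents=(5, 10, 15)):
    """Return the percentage thresholds that memory usage has exceeded."""
    if allocated_bytes <= 0:
        raise ValueError("allocated_bytes must be positive")
    used_pct = 100.0 * usage_bytes / allocated_bytes
    return [p for p in percents if used_pct > p]
```

With an 8Gi allocation, `breached_thresholds(1 * GIB, 8 * GIB)` reports the 5% and 10% thresholds, since 1Gi is 12.5% of the allocation. In practice either check would more likely be written as a Prometheus rule expression than as code in the operator.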