remove MDSCacheUsageHigh prometheus alert #2938

Merged

Contributor

@sp98 sp98 commented Jan 2, 2025

This is related to - https://issues.redhat.com/browse/DFBUGS-368

Customers are complaining about erroneous MDS cache usage alerts. The Ceph team suggested that ceph_mds_mem_rss might not be the right metric to capture this cache usage, so this alert needs to be looked at again. For the time being, we can just remove this alert while looking for a better solution. The decision to remove it was discussed in the weekly meeting.

Details about the discussion with the Ceph team:

RSS is not the right metric for a cache warning. mds_co_bytes is the correct metric, but it is not exposed by Ceph.
The MDS pod memory of 8Gi set by ODF is low. (Not related to this issue, just a general observation.)

Possible resolution:

The operator should be able to read the MDS cache warning from the Ceph health and raise a Prometheus alert.
If MDS pod memory usage breaches 5/10/15% of the allocated memory size (8Gi), raise a Prometheus alert.
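
For illustration, the first option (reading the MDS cache warning from the Ceph health) could be prototyped roughly as below. This is only a sketch and not what the operator does today. It assumes the JSON printed by `ceph health detail --format json` has a top-level `checks` map keyed by health check code, and that an oversized MDS cache shows up there as `MDS_CACHE_OVERSIZED`. The function name `mds_cache_warning` and the sample payload are hypothetical.

```python
import json
from typing import Optional

# Health check code Ceph reports when an MDS cannot trim its cache to the limit.
MDS_CACHE_CHECK = "MDS_CACHE_OVERSIZED"

# Hypothetical sample of the health JSON (assumed layout, trimmed for brevity).
SAMPLE = """
{
  "status": "HEALTH_WARN",
  "checks": {
    "MDS_CACHE_OVERSIZED": {
      "severity": "HEALTH_WARN",
      "summary": {"message": "1 MDSs report oversized cache", "count": 1}
    }
  }
}
"""


def mds_cache_warning(health_json: str) -> Optional[str]:
    """Return the MDS cache warning message, or None if the check is absent."""
    report = json.loads(health_json)
    check = report.get("checks", {}).get(MDS_CACHE_CHECK)
    if check is None:
        return None
    # Fall back to the check code if the summary message is missing.
    return check.get("summary", {}).get("message", MDS_CACHE_CHECK)


if __name__ == "__main__":
    print(mds_cache_warning(SAMPLE))
```

A caller would raise the alert only when the function returns a message, which keeps the alert tied to Ceph's own cache accounting rather than to pod RSS.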

Customers are complaining about erroneous MDS cache usage alerts.
The Ceph team suggested that `ceph_mds_mem_rss` might not be the right
metric to capture this cache usage, so this alert needs to be looked at
again. For the time being, we can just remove this alert due to the
increasing number of customer cases around it.

Signed-off-by: Santosh <sapillai@redhat.com>
Contributor Author

sp98 commented Jan 7, 2025

/assign @aruniiird

Contributor

openshift-ci bot commented Jan 7, 2025

@sp98: GitHub didn't allow me to assign the following users: aruniiird.

Note that only red-hat-storage members with read permissions, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign @aruniiird

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

Contributor

@aruniiird aruniiird left a comment


LGTM
AFAIK, this is the only place from which the alert has to be removed (as far as this repo is concerned)

Contributor

openshift-ci bot commented Jan 7, 2025

@aruniiird: changing LGTM is restricted to collaborators

In response to this:

LGTM
AFAIK, this is the only place from which the alert has to be removed (as far as this repo is concerned)


Contributor

@malayparida2000 malayparida2000 left a comment


/lgtm
/cc @iamniting

@openshift-ci openshift-ci bot requested a review from iamniting January 8, 2025 04:22
@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jan 8, 2025
Contributor

openshift-ci bot commented Jan 8, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aruniiird, iamniting, malayparida2000, sp98

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jan 8, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit e675057 into red-hat-storage:main Jan 8, 2025
11 checks passed
Contributor Author

sp98 commented Jan 8, 2025

/cherry-pick release-4.18

@openshift-cherrypick-robot

@sp98: new pull request created: #2952

In response to this:

/cherry-pick release-4.18


Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
5 participants