Grafana API Token is recreated each time OnCall page is being opened. #508

Closed
lstama opened this issue Sep 8, 2022 · 11 comments
Labels
bug (Something isn't working), part:helm/kubernetes/docker

Comments

@lstama

lstama commented Sep 8, 2022

Hello, I have a problem that prevents users without admin privileges from accessing Grafana OnCall.

When I (as an organization admin) open the OnCall page for the first time (or reload it), I am always greeted by this error page:
[screenshot: retry]

When I click the retry button, this happens:
[screenshot: as-admin]

Then I'm redirected to the normal OnCall page:
[screenshot: success]

So far it looks fine: I can still access OnCall in the end, and I can create alerts and integrations. Then my friend, who isn't an admin, wanted to view the page as an editor. I told him to do the same thing (click retry if the error page shows up). He did, but instead of seeing the same view as in the second picture, he got this:
[screenshot: as-non-admin]

It turns out the Grafana API token is recreated every time someone reloads the page (I checked the value in the DB). The plugin page also shows this error:
[screenshot: plugin-error]

What I already did:

  1. Restarted Grafana
  2. Recreated the Grafana OnCall one-time invite token
  3. Used both Server admin and Organization admin to set up the plugin

Also, these are some relevant logs from Grafana:

2022-09-07T11:03:57.602351353Z stdout F logger=context traceID=00000000000000000000000000000000 t=2022-09-07T11:03:57.602136283Z level=error msg="invalid API key" error="invalid API key" traceID=00000000000000000000000000000000

Oncall Engine:

2022-09-07T11:03:57.56885098Z stdout F 2022-09-07 11:03:57 source=engine:app google_trace_id=none logger=root inbound latency=0.012544 status=202 method=POST path=/api/internal/v1/plugin/sync content-length=0 slow=0 integration_type=N/A integration_token=N/A
2022-09-07T11:03:57.569311472Z stdout F 2022-09-07 11:03:57 source=engine:uwsgi status=202 method=POST path=/api/internal/v1/plugin/sync latency=0.013501 google_trace_id=- protocol=HTTP/1.1 resp_size=278 req_body_size=0
2022-09-07T11:03:58.267741236Z stdout F 2022-09-07 11:03:58 source=engine:app google_trace_id=none logger=root outbound latency=0.09500776696950197 status=200 method=GET url=https://grafana.my.org/api/org/users slow=0
2022-09-07T11:03:58.450652135Z stdout F 2022-09-07 11:03:58 source=engine:app google_trace_id=none logger=root outbound latency=0.16960133600514382 status=200 method=GET url=https://grafana.my.org/api/teams/search?perpage=1000000 slow=0
2022-09-07T11:03:58.458742653Z stdout F 2022-09-07 11:03:58 source=engine:app google_trace_id=none logger=root inbound latency=0.29701 status=204 method=POST path=/api/internal/v1/plugin/install content-length=0 slow=0 integration_type=N/A integration_token=N/A
2022-09-07T11:03:58.459229884Z stdout F 2022-09-07 11:03:58 source=engine:uwsgi status=204 method=POST path=/api/internal/v1/plugin/install latency=0.298131 google_trace_id=- protocol=HTTP/1.1 resp_size=168 req_body_size=0
2022-09-07T11:04:00.561275718Z stdout F 2022-09-07 11:04:00 source=engine:app google_trace_id=none logger=root inbound latency=0.007313 status=200 method=GET path=/api/internal/v1/plugin/sync content-length=0 slow=0 integration_type=N/A integration_token=N/A
2022-09-07T11:04:00.561751676Z stdout F 2022-09-07 11:04:00 source=engine:uwsgi status=200 method=GET path=/api/internal/v1/plugin/sync latency=0.008264 google_trace_id=- protocol=HTTP/1.1 resp_size=264 req_body_size=0

Celery

2022-09-07T11:03:21.330679494Z stderr F 2022-09-07 11:03:21,330 source=engine:celery task_id=aa674da7-355e-4763-a159-f63922251ada task_name=apps.slack.representatives.alert_group_representative.on_alert_group_update_log_report_async name=celery.app.trace level=INFO Task apps.slack.representatives.alert_group_representative.on_alert_group_update_log_report_async[aa674da7-355e-4763-a159-f63922251ada] succeeded in 0.01293463003821671s: None
2022-09-07T11:03:57.569954095Z stderr F 2022-09-07 11:03:57,569 source=engine:celery task_id=??? task_name=??? name=celery.worker.strategy level=INFO Task apps.grafana_plugin.tasks.sync.plugin_sync_organization_async[a29cac90-8e2b-4ef4-9443-6f22d2046646] received
2022-09-07T11:03:57.571366083Z stderr F 2022-09-07 11:03:57,571 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=apps.grafana_plugin.tasks.sync level=INFO Start sync Organization 1
2022-09-07T11:03:57.604990332Z stderr F 2022-09-07 11:03:57,604 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=apps.grafana_plugin.helpers.client level=WARNING Error connecting to api instance 401 Client Error: Unauthorized for url: https://grafana.my.org/api/org/users
2022-09-07T11:03:57.605336694Z stderr F 2022-09-07 11:03:57,604 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=root level=INFO outbound latency=0.02740099304355681 status=401 method=GET url=https://grafana.my.org/api/org/users slow=0
2022-09-07T11:03:57.607496313Z stderr F 2022-09-07 11:03:57,607 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=apps.grafana_plugin.tasks.sync level=INFO Finish sync Organization 1
2022-09-07T11:03:57.607639099Z stderr F 2022-09-07 11:03:57,607 source=engine:celery task_id=a29cac90-8e2b-4ef4-9443-6f22d2046646 task_name=apps.grafana_plugin.tasks.sync.plugin_sync_organization_async name=celery.app.trace level=INFO Task apps.grafana_plugin.tasks.sync.plugin_sync_organization_async[a29cac90-8e2b-4ef4-9443-6f22d2046646] succeeded in 0.03653913899324834s: None

We're using an existing Grafana instance as the OnCall frontend and deployed Grafana OnCall using the Helm chart in this repository. Alerting and the OnCall system itself work normally.
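
To check whether another deployment shows the same failure, it can help to count how often the two signatures from the logs above appear. This is a minimal sketch, assuming the logs have been saved or piped as plain text; the function name count_auth_failures is hypothetical, and the patterns are copied from the Grafana and Celery log lines in this comment.

```shell
#!/bin/sh
# Count log lines that carry either failure signature seen in this issue:
#   Grafana: msg="invalid API key"
#   Celery:  401 Client Error: Unauthorized
# Reads log text on stdin and prints the number of matching lines.
count_auth_failures() {
    # grep -c exits non-zero when nothing matches; "|| true" keeps the
    # function usable under "set -e" while still printing 0.
    grep -c -E 'msg="invalid API key"|401 Client Error: Unauthorized' || true
}
```

For example, the output of kubectl logs for the Celery pod can be piped into count_auth_failures; a non-zero count right after a page reload would be consistent with the token being replaced.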

@th30nlyw4y
Contributor

th30nlyw4y commented Sep 8, 2022

I've faced the same problem. Looks just like #316. I've reinstalled oncall deployment and now it's performing fine

@lstama
Author

lstama commented Sep 9, 2022

> I've faced the same problem. Looks just like #316. I've reinstalled oncall deployment and now it's performing fine

What do you mean by reinstalling the OnCall deployment? Is it just the Engine and Celery, or everything, including MariaDB, Redis, and RabbitMQ (starting from a fresh DB)?

@th30nlyw4y
Contributor

I meant redeploying the OnCall Helm chart (I have redis, mariadb, rabbitmq, celery and engine enabled for deployment). Also, I think it's better to delete the PVCs (you have to do that manually, as stated in the docs), because plugin init sometimes fails.
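
A rough sketch of such a redeploy follows. It only prints the commands so they can be reviewed first, because deleting the PVCs removes the stored data. The function name print_redeploy_plan, the release name, and the namespace are hypothetical, and the chart reference grafana/oncall assumes the chart is installed from the Grafana Helm repository; the PVC names have to be taken from the kubectl get pvc output.

```shell
#!/bin/sh
# Print (do not run) the commands for a clean redeploy of the OnCall chart.
# Arguments: $1 = Helm release name, $2 = Kubernetes namespace.
# Both are placeholders; deleting PVCs wipes the databases behind OnCall.
print_redeploy_plan() {
    release=$1
    namespace=$2
    echo "helm uninstall $release -n $namespace"
    echo "kubectl get pvc -n $namespace"
    # Repeat the next command for every PVC left behind by the release.
    echo "kubectl delete pvc <pvc-name> -n $namespace"
    echo "helm install $release grafana/oncall -n $namespace"
}
```

Calling print_redeploy_plan release-oncall default lists the four steps; any values file used for the original install would need to be passed to helm install again.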

@lstama
Author

lstama commented Sep 12, 2022

> I meant redeploying the OnCall Helm chart (I have redis, mariadb, rabbitmq, celery and engine enabled for deployment). Also, I think it's better to delete the PVCs (you have to do that manually, as stated in the docs), because plugin init sometimes fails.

Thanks, reinstalling works.

But now all my integrations and settings are wiped out, as I didn't know which DB tables were safe to back up and restore.

@th30nlyw4y
Contributor

> Thanks, reinstalling works.
>
> But now all my integrations and settings are wiped out, as I didn't know which DB tables were safe to back up and restore.

Yep, that's quite inconvenient. Hope this behavior will be fixed soon.

@juris
Contributor

juris commented Dec 5, 2022

Got the same issue with Grafana 9.2.6, OnCall 1.1.5 and Helm 1.0.12. After a couple of days the OnCall setup becomes unusable. Removing the OnCall API key from https://grafana/org/apikeys helps until the next page reload.
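
The same workaround can probably be scripted against Grafana's HTTP API instead of the UI. The sketch below assumes the API key endpoints under /api/auth/keys, which exist in Grafana 9.x but may be unavailable in newer releases that moved to service accounts. The URL, the credentials, and the helper names grafana_api, list_api_keys and delete_api_key are placeholders, not part of OnCall.

```shell
#!/bin/sh
# Placeholders: adjust the URL and credentials for your setup.
GRAFANA_URL=${GRAFANA_URL:-http://localhost:3000}
GRAFANA_AUTH=${GRAFANA_AUTH:-admin:admin}

# Call the Grafana HTTP API with curl, or only print the command
# (with credentials masked) when DRY_RUN=1.
grafana_api() {
    method=$1
    path=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "curl -s -u <credentials> -X $method $GRAFANA_URL$path"
    else
        curl -s -u "$GRAFANA_AUTH" -X "$method" "$GRAFANA_URL$path"
    fi
}

# List the API keys of the current organization as JSON (id, name, role).
list_api_keys() {
    grafana_api GET /api/auth/keys
}

# Delete one API key by the numeric id reported by list_api_keys.
delete_api_key() {
    grafana_api DELETE "/api/auth/keys/$1"
}
```

Running list_api_keys first and then delete_api_key with the id of the OnCall key mirrors the manual step; as described above, this only helps until the key is recreated.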

@duclm2609

I have the same issue as @juris

@PCbIX
Contributor

PCbIX commented Jan 12, 2023

I have the same issue. Helm deployment, only postgresql is external.
Grafana 9.3.2 ; OnCall 1.1.14 ; Ingress disabled (HAProxy)
After some time the plugin simply loses its API key.
In the Grafana logs I found:
logger=context t=2023-01-12T13:21:53.551966373Z level=error msg="invalid API key" error="invalid API key" traceID=
logger=data-proxy-log userId=2 orgId=1 uname=user path=/api/plugin-proxy/grafana-oncall-app/api/internal/v1/alertgroups/stats/ remote_addr=ip referer="https://fqdn/a/grafana-oncall-app/?page=incidents&status=0&status=1" t=2023-01-12T11:47:13.60359748Z level=error msg="Proxy request failed" err="dial tcp ip:8080: connect: connection refused"
The plugin configuration page says it cannot communicate with oncall-engine but doesn't provide a button to reset the configuration.
Sometimes just opening the main OnCall page starts the API key exchange, as people wrote above; sometimes only a redeploy helps.

@ifeneg

ifeneg commented Feb 8, 2023

I have the same issue as @juris

Matvey-Kuk added the bug label Feb 9, 2023
@Milamary

Milamary commented Apr 26, 2023

Have the same issue as @PCbIX:

  • Helm deployment, only postgresql is external
  • Grafana v8.5.3 ; OnCall 1.2.15 ; Ingress enabled

Each time I leave the Grafana OnCall page, the Grafana API token is supposed to be recreated, but it isn't, and I lose the connection to the Grafana OnCall plugin with this message:
'There was an issue while synchronizing data required for the plugin. Verify your OnCall backend setup (ie. that Celery workers are launched and properly configured)'

The first time, the workaround of reopening the main Grafana OnCall page to start the API key exchange worked: the notification about the API token creation popped up and I could access Grafana OnCall.
The second time it didn't work, and I'm stuck on the 'Initializing plugin' step with the message:
'There was an issue while synchronizing data required for the plugin. Verify your OnCall backend setup (ie. that Celery workers are launched and properly configured)'.

@mderynck
Contributor

mderynck commented Sep 6, 2024

Recently we made some changes to the way Grafana OnCall is initialized. Use 1.9.22; there were quite a few changes between 1.9.0 and 1.9.22 to get things working.

  • If you are running Grafana 11 or newer, you must have the externalServiceAccounts feature toggle enabled.
    This has already been enabled in the docker compose files and helm charts in the oncall repo.
  • If you are installing for the first time, plugin settings must be provided to the plugin using an API call. (Note: credentials and hostnames need to be adjusted for your configuration; stackId and orgId are expected to be the listed constants in a self-hosted configuration.)
curl -X POST 'http://admin:admin@localhost:3000/api/plugins/grafana-oncall-app/settings' -H "Content-Type: application/json" -d '{"enabled":true, "jsonData":{"stackId":5, "orgId":100, "onCallApiUrl":"http://engine:8080/", "grafanaUrl":"http://grafana:3000/"}}'
  • Once settings are configured use this API call to install:
curl -X POST 'http://admin:admin@localhost:3000/api/plugins/grafana-oncall-app/resources/plugin/install'

Grafana OnCall should now be ready to use.
For additional troubleshooting see here
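
The two API calls above could be wrapped in a small script so that hostnames and credentials are set in one place. This is a sketch only: the variable names GRAFANA_URL, ONCALL_API_URL and GRAFANA_INTERNAL_URL and both function names are made up for illustration, while the endpoints, the payload, and the stackId and orgId values are the ones given in this comment.

```shell
#!/bin/sh
# Placeholders: adjust credentials and hostnames for your configuration.
GRAFANA_URL=${GRAFANA_URL:-http://admin:admin@localhost:3000}
ONCALL_API_URL=${ONCALL_API_URL:-http://engine:8080/}
GRAFANA_INTERNAL_URL=${GRAFANA_INTERNAL_URL:-http://grafana:3000/}

# Build the plugin settings payload. stackId=5 and orgId=100 are the
# constants listed in the comment for a self-hosted configuration.
oncall_settings_payload() {
    printf '{"enabled":true, "jsonData":{"stackId":5, "orgId":100, "onCallApiUrl":"%s", "grafanaUrl":"%s"}}' \
        "$ONCALL_API_URL" "$GRAFANA_INTERNAL_URL"
}

# Step 1: send the settings. Step 2: trigger the install.
# --fail makes curl exit non-zero on HTTP errors, so && skips step 2
# when step 1 did not succeed.
configure_and_install_oncall() {
    curl --fail -sS -X POST "$GRAFANA_URL/api/plugins/grafana-oncall-app/settings" \
        -H "Content-Type: application/json" -d "$(oncall_settings_payload)" &&
    curl --fail -sS -X POST "$GRAFANA_URL/api/plugins/grafana-oncall-app/resources/plugin/install"
}
```

Calling configure_and_install_oncall runs both requests in order; printing oncall_settings_payload beforehand is a cheap way to check the JSON before sending it.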
