
Self-hosted runner stuck on "Waiting for a runner to pick up this job..." in multi-step jobs #3609

Open
joelnwalkley opened this issue Dec 4, 2024 · 34 comments
Labels
bug Something isn't working

Comments

@joelnwalkley

Describe the bug
Using a self-hosted runner.
When a GitHub Actions workflow has multiple steps, the first runs successfully, but the subsequent step gets stuck in "queued" status with the log message "Waiting for a runner to pick up this job..."

I am able to get the stuck jobs to start either by cancelling and re-running the job via the GitHub UI or by restarting the GitHub Runner service on our EC2 instance. In both cases the job is picked up immediately and runs successfully.

To Reproduce
Steps to reproduce the behavior:

  1. Use self-hosted GitHub Runner
  2. Use a multi-step action
  3. Initiate the action
  4. Observe that the 2nd step is stuck in queued status.

Expected behavior
The runner should pick up the 2nd (and following) steps.

Runner Version and Platform

v2.321.0, Windows x64

We noticed that our machine auto-updated with this version on November 27 and then our CI runs the following week started to have this problem.

OS of the machine running the runner?
Windows

What's not working?

"Waiting for a runner to pick up this job..." for up to hours.

Job Log Output

N/A: the runner never gets to producing job output; the job is stuck in the queue.

@joelnwalkley joelnwalkley added the bug Something isn't working label Dec 4, 2024
@daniel-v-nanok

I am also affected by this. I tried everything, including adding disableUpdate, but to no avail.
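
For context, a minimal sketch of what "adding disableUpdate" looks like when registering a self-hosted runner, assuming the standard actions/runner install layout; REPO_URL and TOKEN are placeholders, and as noted above this did not resolve the issue:

```bash
# From the runner install directory; REPO_URL and TOKEN are placeholders.
# Remove the existing registration, then re-register with auto-update disabled.
./config.sh remove --token "$TOKEN"
./config.sh --url "$REPO_URL" --token "$TOKEN" --disableupdate
# Reinstall and start the service helper (Linux).
sudo ./svc.sh install
sudo ./svc.sh start
```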

@richardgetz

richardgetz commented Dec 4, 2024

Same here. Manually stopping and starting the service makes it move on to the next job. Obviously this is not ideal.

v2.321.0 Linux Arm64

@TylerWilliamson

TylerWilliamson commented Dec 4, 2024

Also affected by this issue on multiple runners running v2.321.0 on Debian 12 / amd64. We are able to work around this issue by rebooting the runners.

@richardgetz

I tested downgrading to v2.320.0 and encountered the same issue.

@canmustu

canmustu commented Dec 5, 2024

Same. I tried everything I could. Nothing changed.

@amalhanaja

Same here. Is there any workaround for this issue?

@connor-27

I've tried almost everything I could do... but it's not working out well.

@rohitkhatri

I'm also facing the same issue; I have to stop and restart the service before the next stage starts.

@SoftradixAD

It's not working out well.

@HoaiTTT-Kozocom

I'm also facing the same issue; the next stage only starts after I stop and start the service again.

@kemalgoekhan

Windows and Linux both have the same issue.

@enginelesscc

Same issue here, and my workaround is to re-run and then immediately cancel some old action.
This "revives" the stuck jobs, but new jobs end up with the same problem again.

@dexpert

dexpert commented Dec 5, 2024

Same issue here.
Only fixed by restarting the runner for each step :(
systemctl restart actions.runner......
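
For reference, a sketch of that restart using the runner's bundled svc.sh helper, assuming the runner was installed as a service; the exact systemd unit name depends on the org/repo and runner name:

```bash
# From the runner install directory, e.g. /actions-runner (Linux):
sudo ./svc.sh stop
sudo ./svc.sh start
# Equivalent direct restart of the generated unit (name varies per runner):
# sudo systemctl restart actions.runner.<scope>.<runner-name>.service
```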

@NekiHrvoje

NekiHrvoje commented Dec 5, 2024

For those using the action machulav/ec2-github-runner, which does not use a systemd service: you need to kill the /actions-runner/run-helper.sh process and start it again from /actions-runner/bin with ./run-helper.sh run.
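
A rough sketch of that workaround, following the paths in the comment above; they may differ depending on how the instance image was built:

```bash
# Kill the stuck helper process, then start it again in the background.
# Paths follow the comment above and may vary per image.
pkill -f run-helper.sh
cd /actions-runner/bin
nohup ./run-helper.sh run > /tmp/run-helper.log 2>&1 &
```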

@netaviator

Same issue here. Tried reinstalling and downgrading the runner without success.

@lokesh755
Contributor

Could you provide a run URL that got stuck waiting for a runner? That'll help us debug.

@joelnwalkley
Author

Could you provide a run URL that got stuck waiting for a runner? That'll help us debug.

Unfortunately this is on a private repo; is there another way I can get you additional information? I could possibly inquire with my organization about giving you temporary read-only access.

@lokesh755
Contributor

Unfortunately this is on a private repo; is there another way I can get you additional information? I could possibly inquire with my organization about giving you temporary read-only access.

A private repo run URL is fine too.

@m-iwanicki

I'm having the same problem; this run got stuck for a whole day: https://github.com/Dasharo/meta-dts/actions/runs/12163220462. I had to restart the runner so the workflow would continue.
Weirdly, this one https://github.com/Dasharo/meta-dts/actions/runs/12180581391 got stuck waiting on "Run DTS tests", but cleanup started normally after the job failed.

@jgrasett

jgrasett commented Dec 5, 2024

We've been seeing the same thing here for 3 days now. First 2 jobs run, then waiting...

We are seeing this in multiple repositories, though some of them work fine. Nothing in our workflows has changed in 6 months.

Ubuntu 22.04 EC2 instances in AWS.
Using: machulav/ec2-github-runner@v2
Also tried: machulav/ec2-github-runner@v2.3.7, but it made no difference.

@canmustu

canmustu commented Dec 5, 2024

@lokesh755 same here for self-hosted private repos.

Edit: the run URL has been deleted from this post.

@lokesh755
Contributor

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

@canmustu

canmustu commented Dec 5, 2024

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Perfect. Thank you. It is working fine for now.

@Meigara-Juma

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Has anyone tried it and gotten it working normally again? I worked around it by changing multiple jobs into a single job in my workflow.

@canmustu

canmustu commented Dec 5, 2024

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Has anyone tried it and gotten it working normally again? I worked around it by changing multiple jobs into a single job in my workflow.

It is fixed in my workflows. No issues at all for me since @lokesh755's last message.

@netaviator

Can confirm! 👍🏿

@SoftradixAD

SoftradixAD commented Dec 6, 2024

I'm still facing this sometimes.

@rohitkhatri

Fixed for me as well.

@SoftradixAD, can you try stopping and starting the service again? That worked for me.

@SoftradixAD

Okay let me try, @rohitkhatri sir

@daniel-v-nanok

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Thank you, it is working correctly

@connor-27

Thanks for reporting the issue. We've identified the root cause, which appears to be linked to a feature flag that was enabled two days ago. We've temporarily disabled the feature flag, which should resolve the issue. If you continue to experience similar problems, please let us know.

Thank you! It is working well now.

@cgerace

cgerace commented Dec 9, 2024

I'm still experiencing this issue, is anyone else?

@NorseGaud

I still see https://github.com/orgs/community/discussions/146348. GitHub support has been unhelpful so far.

@heikkis

heikkis commented Dec 12, 2024

No issues on our side anymore after the feature flag was disabled (#3609 (comment)). We are running multi-step jobs with a dynamically provisioned self-hosted runner.
