Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Differentiate timeout vs idle timeout #1200

Draft
wants to merge 1 commit into
base: devel
Choose a base branch
from

Conversation

TheRealHaoLiu
Copy link
Member

@TheRealHaoLiu TheRealHaoLiu commented Feb 27, 2023

currently idle timeout and timeout result in the same status call back this result in the AWX not being able to parse the differences between the two failure condition and make it confusing to debug

@TheRealHaoLiu TheRealHaoLiu requested a review from a team as a code owner February 27, 2023 21:22
@github-actions github-actions bot added the needs_triage New item that needs to be triaged label Feb 27, 2023
@TheRealHaoLiu TheRealHaoLiu force-pushed the differentiate-timeout-and-idle-timeout branch 2 times, most recently from a50a401 to e5cf8da Compare February 28, 2023 01:32
ansible_runner/runner.py Outdated Show resolved Hide resolved
@@ -337,7 +337,7 @@ def _decode(x):
if self.config.idle_timeout and (time.time() - self.last_stdout_update) > self.config.idle_timeout:
self.kill_container()
Runner.handle_termination(child.pid, is_cancel=False)
self.timed_out = True
self.idle_timed_out = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will need a default like timed_out has on line 37. Otherwise if we don't hit this case, we'll crash when we check the variable below.

@TheRealHaoLiu TheRealHaoLiu marked this pull request as draft February 28, 2023 04:05
currently idle timeout and timeout result in the same status call back this result in the AWX not being able to parse the differences between the two failure condition and make it confusing to debg

Co-Authored-By: Gabriel Muniz <gmuniz@redhat.com>
@TheRealHaoLiu TheRealHaoLiu force-pushed the differentiate-timeout-and-idle-timeout branch from e5cf8da to d392c51 Compare February 28, 2023 18:56
Copy link
Contributor

@Shrews Shrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are missing checks in several places where currently self.timed_out is checked for cleanup, return codes, etc.

Also, we might need to consider if this is a breaking change because status value is now different, which is sort of an API change. Not sure this can be backported.

@Shrews Shrews removed the needs_triage New item that needs to be triaged label Mar 7, 2023
@AlanCoding
Copy link
Member

Really quick 2 cents here - I also think the API change of a new status would be disruptive and prone to break things. Not something we want in the error handling! If you look a layer deeper in the call stack, other options present themselves.

def status_callback(self, status):
self.status = status
status_data = {'status': status, 'runner_ident': str(self.config.ident)}
if status == 'starting':
status_data.update({'command': self.config.command, 'env': self.config.env, 'cwd': self.config.cwd})
for plugin in ansible_runner.plugins:
ansible_runner.plugins[plugin].status_handler(self.config, status_data)
if self.status_handler is not None:
self.status_handler(status_data, runner_config=self.config)

Instead of using a new status value for the call to status_callback, it would go much more smoothly to add more data to status_data in the status_handler. I think you could even safely abuse job_explanation, because this failure path is a known scenario and it shouldn't be conflicting with anything else setting that string. This could relatively nice present the timeout reason to the user.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants