container-remove event does not set exit code #19124
Comments
@jelly FYI
Don't start pod and container state updates at the same time. They do asynchronous state updates which sometimes step on each other's toes and reverse the real event order. This mitigates the problem with getting podman API call results in the wrong order (containers/podman#19124); that causes containers to sometimes appear in a non-current state.
I just talked to @vrothberg , many thanks! Notes: podman has a mode for logging more information in events, but it's opt-in (containers.conf), so nothing that c-podman can rely on. But it should be possible to greatly improve this with the existing API:
With that, there should be by and large only one /json inspect instead of umpteen, which should avoid the problem to a degree where it's not relevant any more. I'll try that in the next few days (well, some PTO ahead), and report back here. In the best case, I'll just close this. Thanks!
@martinpitt and I had a quick call to discuss the issue. We concluded that there is a way to significantly reduce the number of API calls emitted by Cockpit. At the time of writing, Cockpit inspects a container on (too) many container events. As mentioned in the issue description above, the inspections are done to get the current state of the container. We can reduce the inspects to only 1 by doing the following:
When starting Cockpit, all containers must be listed/inspected to get all the metadata and state. Thanks for your summary, @martinpitt :) I think we have a shared understanding.
These are internal transient states which don't need to reflect in the UI. They happen quickly in bursts, with a "permanent state" event following such as "create", "died", or "remove". This helps to reduce the API calls and thus mitigates out-of-order results; see containers/podman#19124
Can we close the issue? It sounds like we do not need podman changes.
I leave it open to @martinpitt. I expect the issue to be resolved with the proposed idea but I am OK to leave it open until we're sure.
I'm inching closer in cockpit-project/cockpit-podman#1324, so far it's looking good. I should finish this by tomorrow, and then most probably close this, unless something missing turns up.
These events change the state in a predictable way. Avoid the expensive updateContainer() call for these, to avoid multiple calls overlapping each other. See containers/podman#19124
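The approach described in the commit messages above (skip transient events, derive the state directly for predictable ones, and inspect only when nothing else works) might look roughly like the following sketch. It is written in Python for brevity rather than in cockpit-podman's JavaScript, and the event names in `TRANSIENT` and `PREDICTABLE`, the resulting state strings, and the `inspect` callback are all illustrative assumptions, not the project's actual lists or functions.

```python
# Hypothetical classification of events. The members of both tables are
# assumptions for illustration, not cockpit-podman's real lists.
TRANSIENT = {"exec", "exec_died", "mount", "unmount", "cleanup"}
PREDICTABLE = {
    "start": "running",
    "died": "exited",
    "pause": "paused",
    "unpause": "running",
}


def handle_event(containers, event, inspect):
    """Update `containers` (id -> properties dict) in place.

    Returns a label for the action taken, which makes the decision
    easy to observe in tests.
    """
    cid = event["Actor"]["ID"]
    action = event["Action"]

    if action in TRANSIENT:
        # Short-lived internal state; a "permanent" event will follow.
        return "ignored"
    if action == "remove":
        containers.pop(cid, None)
        return "removed"
    if action in PREDICTABLE and cid in containers:
        # The new state is implied by the event itself, so no API call.
        containers[cid]["State"] = PREDICTABLE[action]
        return "derived"

    # Unknown container or unpredictable event: fall back to one inspect.
    containers[cid] = inspect(cid)
    return "inspected"
```

With this shape, a burst such as create, mount, start, remove triggers a single inspect (for "create") instead of one per event.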
Agreed, that would be nice. However, that also means that any kind of runtime related property needs to be part of the event info then.
This is missing the BTW, as @vrothberg was mentioning log flooding: Every single event repeats the |
Yes, that will work.
Interesting! Should we make the exit code part of the attributes? @Luap99 WDYT?
With a burst of events these get called in parallel. But podman does not return them in the call order [1], which led to non-current state updates. [1] containers/podman#19124
Just to avoid misunderstanding -- please not literally |
Adding the exit code seems useful in general but why are you interested in the PID? That should not add any useful value, by the time you read the PID it may be already reused by a different process (well theoretically at least). Also what is |
Not particularly -- it's just something that we happen to use in our tests to have a proper indicator when a
I can't say, I'm afraid. It's part of the /containers/json properties. If it can happen for other events, it should surely be added there as well.
Retitling / narrowing the scope of this to |
These are internal transient states which don't need to reflect in the UI. They happen quickly in bursts, with a "permanent state" event following such as "create", "died", or "remove". This helps to reduce the API calls and thus mitigates out-of-order results; see containers/podman#19124 Also fix the alphabetical sorting of the remaining events.
Nevermind, the exit code is part of the event:
{"status":"died","id":"319da37bf7e6ee1b078b7d4e348710a67083b10b7ed7cf7e61e166bda92ab074","from":"docker.io/library/debian:sid","Type":"container","Action":"died","Actor":{"ID":"319da37bf7e6ee1b078b7d4e348710a67083b10b7ed7cf7e61e166bda92ab074","Attributes":{"containerExitCode":"3","image":"docker.io/library/debian:sid","name":"admiring_bhabha","podId":""}},"scope":"local","time":1689235736,"timeNano":1689235736846535953}
It doesn't stick around, though -- the next event is usually "remove", and then it gets reset to 0:
{"status":"remove","id":"319da37bf7e6ee1b078b7d4e348710a67083b10b7ed7cf7e61e166bda92ab074","from":"docker.io/library/debian:sid","Type":"container","Action":"remove","Actor":{"ID":"319da37bf7e6ee1b078b7d4e348710a67083b10b7ed7cf7e61e166bda92ab074","Attributes":{"containerExitCode":"0","image":"docker.io/library/debian:sid","name":"admiring_bhabha","podId":""}},"scope":"local","time":1689235736,"timeNano":1689235736905768245}
But good enough, I suppose.
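Given the behaviour shown in the two events quoted above, a client could remember the exit code from "died" and simply not overwrite it on "remove". The following is a minimal Python sketch of that idea. The function name `track_exit_codes` is hypothetical, the sample events are trimmed to the fields the sketch reads, and it assumes the event stream is delivered as one JSON object per line.

```python
import json


def track_exit_codes(event_lines):
    """Return {container_id: exit_code}, taken from "died" events only."""
    exit_codes = {}
    for line in event_lines:
        event = json.loads(line)
        if event.get("Type") != "container":
            continue
        if event["Action"] == "died":
            attrs = event["Actor"]["Attributes"]
            # The attribute is a string in the event payload.
            exit_codes[event["Actor"]["ID"]] = int(attrs["containerExitCode"])
        # A later "remove" carried containerExitCode "0" in the sample
        # above, so it is deliberately not used to overwrite the value.
    return exit_codes


CID = "319da37bf7e6ee1b078b7d4e348710a67083b10b7ed7cf7e61e166bda92ab074"


def sample_event(action, code):
    """Build a trimmed-down event with only the fields used above."""
    return json.dumps({
        "Type": "container",
        "Action": action,
        "Actor": {"ID": CID, "Attributes": {"containerExitCode": code}},
    })


if __name__ == "__main__":
    print(track_exit_codes([sample_event("died", "3"),
                            sample_event("remove", "0")]))
```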
I guess so
A friendly reminder that this issue had no activity for 30 days.
…om int to int ptr Added additional check for event type to be remove and set the correct exitcode. While it was getting difficult to maintain the omitempty notation for Event->ContainerExitCode, changing the type from int to int ptr gives us the ability to check for ContainerExitCode to be not nil and continue operations from there. closes containers#19124 Signed-off-by: Chetan Giradkar <cgiradka@redhat.com>
Feature request description
cockpit-podman is plagued with lots of race conditions and flaky tests. I have investigated many of them, but the remaining ones are due to a fundamental issue with the monitoring API.
The UI uses the libpod/events API, which notifies about high-level actions such as "start" or "died". However, this does not contain any (or at least most) of the properties that the UI needs to show, so in reaction to these, the UI does a /containers/json query for that container, which then responds with all the info that the UI needs.
The problem is that this is racy: the /containers/json call is necessarily async, and when events come in bursts, the calls overlap. But the replies from podman do not arrive in the same order. This is a log capture from a part of the test where it does a few container operations like stopping and restarting a container. I stripped out all the JSON data for clarity; the important bit is the ordering:
So if the container moves from "Running" → "Exited" → "Stopped" → "Restarting" → "Running", a jumbled response order can lead to swaps, and the final state reported in the UI is e.g. "Restarting" or "Exited". The latter happened in this run, where the screenshot says "Exited", but podman ps says "Up" (i.e. "Running"), as can be seen in the "----- user containers -----" dump in the log.
Suggest potential solution
My preferred solution would be to avoid having to call /containers/json after a "start" or "rename" event in the first place. That only leads to additional API traffic and thus more computational overhead on both the podman and the UI side, and is prone to these kinds of race conditions. D-Bus services like systemd or udisks generally solved this with the PropertiesChanged signal, i.e. there is a notification with the set of changed properties each time there is a change. These are naturally ordered correctly, and the watcher can tally them up to always have an accurate model of the state without having to do extra "get" calls.
For the podman API, this cannot just be squeezed into the existing "start" (or "remove", etc.) events, as the container properties can change more often, and also independently from the coarse-grained lifecycle events.
Perhaps this could introduce a new event type "changed" that gets fired whenever any property changes, and delivers the /containers/json info for the container(s) which changed. Both "your" (podman) and "my" (cockpit-podman) sides already have the code to generate/parse this information, so it would just mean some minor plumbing changes.
If this is expensive, it may also be adequate to explicitly opt into getting these notifications, although connecting to /events generally already means that the listener wants to know this kind of information.
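To illustrate the PropertiesChanged-style "tally" described above, here is a minimal Python sketch of a watcher that keeps a local model purely from ordered notifications. The "changed" event does not exist in podman; its shape here (a `type`, an `id`, and a `changed` dict of properties) is entirely hypothetical, as is the function name `apply_changes`.

```python
def apply_changes(model, events):
    """Fold an ordered stream of hypothetical notifications into `model`.

    `model` maps container id -> dict of properties. Each event is a
    dict with "type" ("changed" or "removed") and "id"; "changed"
    events also carry a "changed" dict with only the properties that
    differ from the previous state.
    """
    for event in events:
        cid = event["id"]
        if event["type"] == "removed":
            model.pop(cid, None)
        elif event["type"] == "changed":
            # Merging deltas in arrival order keeps the model current
            # without any follow-up "get" call.
            model.setdefault(cid, {}).update(event["changed"])
    return model


if __name__ == "__main__":
    stream = [
        {"type": "changed", "id": "c1", "changed": {"State": "running", "Name": "web"}},
        {"type": "changed", "id": "c1", "changed": {"State": "exited"}},
        {"type": "changed", "id": "c1", "changed": {"State": "running"}},
    ]
    print(apply_changes({}, stream))
```

Because the notifications arrive on a single ordered stream, the last delta always wins, which is the property the overlapping inspect calls lack.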
Have you considered any alternatives?
It may also be possible to change podman to not reply to requests out of order. I don't know how easy/hard that is with Go and coroutines. I know that it is very hard in JavaScript on the client side to reorder the replies.
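Reordering replies on the client is hard, but dropping stale ones is a simpler variant of the same idea. The sketch below, in Python with asyncio standing in for the JavaScript client, tags each request with a per-container sequence number and discards a reply if a newer one was already applied. The class `LatestWins` and the `fetch` callback are hypothetical. It also assumes that a later-issued request never returns older state than an earlier one, which the server does not strictly guarantee.

```python
import asyncio


class LatestWins:
    """Apply only replies that are not older than the last applied one."""

    def __init__(self):
        self.issued = {}   # container id -> last sequence number handed out
        self.applied = {}  # container id -> sequence number of applied reply
        self.state = {}    # container id -> last accepted reply

    async def refresh(self, cid, fetch):
        seq = self.issued.get(cid, 0) + 1
        self.issued[cid] = seq
        reply = await fetch(cid)
        if seq < self.applied.get(cid, 0):
            return False  # a newer reply already arrived; drop this one
        self.applied[cid] = seq
        self.state[cid] = reply
        return True


async def demo():
    guard = LatestWins()

    async def slow(cid):   # first request, answered late
        await asyncio.sleep(0.05)
        return "exited"

    async def fast(cid):   # second request, answered early
        await asyncio.sleep(0.01)
        return "running"

    results = await asyncio.gather(
        guard.refresh("c1", slow),
        guard.refresh("c1", fast),
    )
    return results, guard.state


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the demo the late "exited" reply is rejected, so the model ends on "running", matching the order in which the requests were issued.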
It might be easier on our side to completely serialize all API calls, but that would make the UI very slow especially if there are many containers. These are independent from each other, so serializing calls is not conceptually necessary.
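Since containers are independent of each other, a middle ground would be to serialize calls per container only. The following Python sketch shows one way to do that with one asyncio lock per container id. `PerContainerQueue` is a hypothetical name and asyncio stands in for the JavaScript client, so treat it as an illustration of the idea rather than a proposal for cockpit-podman's code.

```python
import asyncio
from collections import defaultdict


class PerContainerQueue:
    """Run calls for the same container one at a time, others in parallel."""

    def __init__(self):
        # Locks are created lazily on first use of a container id.
        self.locks = defaultdict(asyncio.Lock)

    async def run(self, cid, call):
        # asyncio.Lock wakes waiters in the order they started waiting,
        # so calls for one container complete in issue order.
        async with self.locks[cid]:
            return await call()


async def demo():
    queue = PerContainerQueue()
    log = []

    def make_call(name, delay):
        async def call():
            await asyncio.sleep(delay)
            log.append(name)
        return call

    await asyncio.gather(
        queue.run("a", make_call("a1", 0.05)),  # slow call on container a
        queue.run("a", make_call("a2", 0.0)),   # must wait for a1
        queue.run("b", make_call("b1", 0.01)),  # unaffected by container a
    )
    return log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Calls for container "b" are not delayed by the slow call on "a", while "a2" still completes after "a1".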
Additional context
No response