Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Unintended Action Override #490

Merged
merged 2 commits into from
Oct 17, 2024
Merged

Fix Unintended Action Override #490

merged 2 commits into from
Oct 17, 2024

Conversation

Depetrol
Copy link
Collaborator

@Depetrol Depetrol commented Oct 11, 2024

This PR fixes the unintended action override when two actions are triggered at about the same time in issue #489. The bug was in convert_C_action_to_py, the token value in the trigger is suspectable to concurrency overrides. The solution was to use the token value in the action. The companion PR in lingua-franca is lf-lang/lingua-franca#2423

@erlingrj
Copy link
Collaborator

Could you explain how the race condition occurs, i.e. what is the sequence of events causing it. Also what exactly is this change doing?

@erlingrj
Copy link
Collaborator

Well done on tracking it down BTW!

@Depetrol Depetrol changed the title Fix Unintended Action Override Bug Fix Unintended Action Override Oct 11, 2024
@Depetrol
Copy link
Collaborator Author

This was a subtle bug that took quite some time to pinpoint. I suspected that a lock was missing somewhere in the process of schedule - replace token - apply token to action process, but adding critical sections did not seem to solve the bug.

The origin of this concurrency bug for trigger->tmplt->token->value is in py_schedule. py_schedule calls _lf_initialize_token_with_value(trigger->tmplt, value, 1), which calls _lf_get_token(trigger->tmplt). In _lf_get_token(trigger->tmplt), the trigger->tmplt->token is replaced with a new token. Later, this trigger->tmplt->token is retrieved in convert_C_action_to_py and causes the bug.

Consider the following sequence of events: _lf_get_token in sets the trigger->tmplt->token->value to 100, another _lf_get_token in sets the trigger->tmplt->token->value to 101, convert_C_action_to_py gets the trigger->tmplt->token->value which is 101, causing the concurrency bug. However, adding a critical section for _lf_get_token will not solve the bug because the bug is in a larger scope.

The solution was to use action->token->value instead --- the action->token->value has been correctly maintained without this concurrency bug.

@Depetrol
Copy link
Collaborator Author

Also huge thanks to @edwardalee and @erlingrj for helping me look into this!

@Depetrol
Copy link
Collaborator Author

Not sure why the windows tests are failing -- this seems unrelated to the change in this PR since tests are also failing in the main branch.

Copy link
Contributor

@edwardalee edwardalee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, great catch! Indeed, using the template token is not safe, so this looks like the right fix to me!

Copy link
Collaborator

@erlingrj erlingrj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great find Shulu! I think we can safely merge this and address Windows issues separately

@Depetrol
Copy link
Collaborator Author

reactor-c requires that all tests pass before merging, so we might have to wait for lf-lang/lingua-franca#2424 to me merged.

@Depetrol
Copy link
Collaborator Author

There is another concurrency bug that is triggered probabilistically with the newly introduced test ConcurrentAction.lf. Running the test 20 times could trigger the bug quite consistently.

@Depetrol Depetrol force-pushed the fix-concurrency branch 2 times, most recently from 858885a to 0031be2 Compare October 17, 2024 03:00
@Depetrol Depetrol added this pull request to the merge queue Oct 17, 2024
Merged via the queue into main with commit c4764fb Oct 17, 2024
69 of 90 checks passed
@Depetrol
Copy link
Collaborator Author

Since the newly introduced test in lf-lang/lingua-franca#2423 has been marked as failing in lf-lang/lingua-franca#2422, the checks are passing and automatic merge was triggered. I'll start another PR for the other concurrency bug.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants