-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add test for cold start #9
Conversation
@@ -23,6 +23,6 @@ static public Variation selectVariation(String inputKey, int subjectShards, List | |||
} | |||
|
|||
static public double variationProbability(Variation variation, int subjectShards) { | |||
return (double)(variation.getShardRange().end - variation.getShardRange().start + 1) / subjectShards; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This math was wrong because the end of the range is exclusive.
This was caught by the unit test finding 0.3335 probability -- yay unit tests!
@@ -287,7 +287,7 @@ public void testBanditColdStartAction() { | |||
|
|||
// Attempt to get a bandit assignment | |||
Optional<String> stringAssignment = EppoClient.getInstance().getStringAssignment( | |||
"subject2", | |||
"subject1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub diff is a bit confused as the old "ColdStart" test was actually for the "Uninitialized" case. So that one has been adjusted, and this one added. Probably easiest to just look at the green and not look at the red part of this.
assertEquals(0.3333, capturedBanditLog.actionProbability, 0.0002); | ||
assertEquals("falcon cold start", capturedBanditLog.modelVersion); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
☝️ key part of this test: action was selected with 1/3 probability and logged as a cold start
} | ||
|
||
@Test | ||
public void testBanditUninitializedAction() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the old "ColdStart" test renamed.
verify(mockAssignmentLogger, times(1)).logAssignment(assignmentLogCaptor.capture()); | ||
AssignmentLogData capturedAssignmentLog = assignmentLogCaptor.getValue(); | ||
assertEquals("uninitialized-bandit-experiment-bandit", capturedAssignmentLog.experiment); | ||
assertEquals("uninitialized-bandit-experiment", capturedAssignmentLog.featureFlag); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed the bandit key to be very obvious what is happening
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approving to unblock, I'm on mobile, if you feel confident merging.
{ | ||
"name": "bandit", | ||
"value": "this-bandit-does-not-exist", | ||
"typedValue": "this-bandit-does-not-exist", | ||
"shardRange": { | ||
"start": 2000, | ||
"end": 10000 | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused what is being tested using this. IIUC, it's "a flag that will contain a bandit has been created, but the bandit itself hasn't been created". That feels like a weird edge case, especially with our creation flow.
A more important case is when the SDK attempts to get a bandit action for which no bandit exists, the flag that contains it doesn't exist, etc. It's just not in the RAC at all. Are we testing that case? (Please let me know if this question doesn't make sense – I might be misunderstanding the whole architecture.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We had that test already--I've been calling it an "uninitialized bandit". (source)
This newly added "cold start" test is making sure we handle the more likely case that a user creates a bandit in our UI so the bandit exists, but it hasn't been trained yet.
Eppo Internal: 🎟️ **Ticket: ** FF-1599 - Add test to Bandits Beta SDK for cold start (no coefficients) vs uninitialized (no entry at all)
When working on generating the bandit RAC (see this comment in
Eppo
PR #8785) we realized there are two different situations where a bandit would select a random action:If our UI/UX works as expected, the first one--an uninitialized bandit--shouldn't happen. However, we still want the SDK to handle the case should it arrive.
This PR renamed the test of an uninitialized bandit to better reflect what is happening, and adds a new test for a cold-start bandit.