I don't think this is an immediate concern, but once there is some momentum in the community, it could make sense to have an official process and recommendation for verifying improvements and regressions.

I'm thinking of something along these lines:

1. Automated testing against a static stash of images with known outcomes (see the sketch below)
2. A manual QA testing checklist with instructions for IRL swing testing, accompanied by non-interfering third-party LMs
3. Automated performance benchmarks
I imagine this could create a powerful feedback loop for community QA contribution if the process is laid out in an easy-to-follow formula.
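As a rough illustration of item 1, the regression suite could pin each fixture image to a golden expected-results file. This is a minimal sketch, assuming a hypothetical `analyze_swing` entry point and a `tests/fixtures/` layout of images paired with `.expected.json` files; the real names will depend on the project:

```python
# test_regression.py -- sketch of golden-file regression tests.
import json
from pathlib import Path

import pytest

from swing_analyzer import analyze_swing  # hypothetical project API

FIXTURES = Path("tests/fixtures")

# Collect every image that has a matching .expected.json golden file.
CASES = sorted(
    p for p in FIXTURES.glob("*.png")
    if p.with_suffix(".expected.json").exists()
)

@pytest.mark.parametrize("image_path", CASES, ids=lambda p: p.stem)
def test_known_outcome(image_path):
    expected = json.loads(image_path.with_suffix(".expected.json").read_text())
    actual = analyze_swing(image_path)

    # Compare numeric fields with a tolerance so small, intentional
    # algorithm tweaks don't flip every test red at once.
    for key, want in expected.items():
        got = actual[key]
        if isinstance(want, float):
            assert got == pytest.approx(want, rel=1e-3), key
        else:
            assert got == want, key
```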
Agreed. There's a start of the code to do this, but so much has changed lately that I need to rebuild all the tests with fresh images and expected-results data.
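For the rebuild, one option is a small script that regenerates the expected-results data from the current fixture images in one pass, so goldens can be refreshed whenever the analyzer changes intentionally. A sketch, under the same hypothetical `analyze_swing` API as above; reviewing the resulting diff before committing keeps the refresh auditable:

```python
# regenerate_goldens.py -- sketch of a one-shot golden refresher.
# Re-runs the analyzer over every fixture image and rewrites the
# .expected.json files next to them.
import json
from pathlib import Path

from swing_analyzer import analyze_swing  # hypothetical project API

for image_path in sorted(Path("tests/fixtures").glob("*.png")):
    result = analyze_swing(image_path)
    out = image_path.with_suffix(".expected.json")
    out.write_text(json.dumps(result, indent=2, sort_keys=True))
    print(f"wrote {out}")
```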
I can help set up some jobs: unit test validation, pre-build checks, etc. I just need to know whether we're using `ubuntu-latest` as the runner, and what permissions the jobs should run with (read-only?). I pay around $4 per seat for a team license, which includes 3,000 minutes of Actions runners.
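If `ubuntu-latest` with read-only permissions is the answer, the workflow could start as small as this. A sketch, assuming a Python test suite under `tests/` and a `requirements.txt`; it uses the standard `actions/checkout` and `actions/setup-python` actions:

```yaml
# .github/workflows/ci.yml -- minimal sketch of a read-only CI job.
name: CI

on:
  pull_request:
  push:
    branches: [main]

# Read-only token: the job can fetch the repo but not write to it.
permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt pytest
      - run: pytest tests/
```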