I don't think this is an immediate concern, but once there is some momentum in the community, it could make sense to have an official process and recommendation for verifying improvements and regressions.

I'm thinking of something along these lines:

1. Automated testing against a static stash of images with known outcomes (see the sketch below)
2. A manual QA testing checklist with instructions for IRL swing testing, accompanied by non-interfering third-party LMs
3. Automated performance benchmarks
I imagine this could create a powerful feedback loop for community QA contribution if the process is laid out in an easy-to-follow formula.
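As a rough illustration of item 1, the regression suite could pin each fixture image to a golden expected-results file. This is a minimal sketch, assuming a hypothetical `analyze_swing` entry point and a `tests/fixtures/` layout of images paired with `.expected.json` files; the real names will depend on the project:

```python
# test_regression.py -- sketch of golden-file regression tests.
import json
from pathlib import Path

import pytest

from swing_analyzer import analyze_swing  # hypothetical project API

FIXTURES = Path("tests/fixtures")

# Collect every image that has a matching .expected.json golden file.
CASES = sorted(
    p for p in FIXTURES.glob("*.png")
    if p.with_suffix(".expected.json").exists()
)

@pytest.mark.parametrize("image_path", CASES, ids=lambda p: p.stem)
def test_known_outcome(image_path):
    expected = json.loads(image_path.with_suffix(".expected.json").read_text())
    actual = analyze_swing(image_path)

    # Compare numeric fields with a tolerance so small, intentional
    # algorithm tweaks don't flip every test red at once.
    for key, want in expected.items():
        got = actual[key]
        if isinstance(want, float):
            assert got == pytest.approx(want, rel=1e-3), key
        else:
            assert got == want, key
```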
Agreed. There's a start of the code to do this, but so much has changed lately that I need to rebuild all the tests with fresh images and expected-results data.
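For the rebuild, one option is a small script that regenerates the expected-results data from the current fixture images in one pass, so goldens can be refreshed whenever the analyzer changes intentionally. A sketch, under the same hypothetical `analyze_swing` API as above; reviewing the resulting diff before committing keeps the refresh auditable:

```python
# regenerate_goldens.py -- sketch of a one-shot golden refresher.
# Re-runs the analyzer over every fixture image and rewrites the
# .expected.json files next to them.
import json
from pathlib import Path

from swing_analyzer import analyze_swing  # hypothetical project API

for image_path in sorted(Path("tests/fixtures").glob("*.png")):
    result = analyze_swing(image_path)
    out = image_path.with_suffix(".expected.json")
    out.write_text(json.dumps(result, indent=2, sort_keys=True))
    print(f"wrote {out}")
```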
I can help set up some jobs: unit test validation, pre-build checks, etc. I just need to know whether we're using `ubuntu-latest` as the runner, and what permissions the jobs should run with (read-only?). I pay around $4 per seat for a team license, which includes 3,000 minutes of Actions runners.
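If `ubuntu-latest` with read-only permissions is the answer, the workflow could start as small as this. A sketch, assuming a Python test suite under `tests/` and a `requirements.txt`; it uses the standard `actions/checkout` and `actions/setup-python` actions:

```yaml
# .github/workflows/ci.yml -- minimal sketch of a read-only CI job.
name: CI

on:
  pull_request:
  push:
    branches: [main]

# Read-only token: the job can fetch the repo but not write to it.
permissions:
  contents: read

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt pytest
      - run: pytest tests/
```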