ci: use pl self-hosted runners for test #1381

galargh · 2023-08-22T14:12:36Z

This PR puts test and coverage jobs on PL self-hosted runners.

On c5.2xlarge machines, the workflow finishes in 14m 26s. On c5.4xlarge it's 13m 22s. If you want, I could get you a rough breakdown of the difference in $.

This PR doesn't look into how we could start using caching again to speed up builds. We'll look into that too, as soon as we have some free time on our hands (one can hope).

For comparison, the most recent run on master took 28m 46s.

codecov-commenter · 2023-08-22T15:05:24Z

Codecov Report

Merging #1381 (a536001) into master (8076305) will increase coverage by 0.04%.
The diff coverage is n/a.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1381      +/-   ##
==========================================
+ Coverage   91.06%   91.11%   +0.04%     
==========================================
  Files         145      145              
  Lines       27397    27397              
==========================================
+ Hits        24950    24963      +13     
+ Misses       2447     2434      -13

see 5 files with indirect coverage changes

anorth

I don't understand what the fromJSON syntax is doing, but the result here looks great. Thank you!

galargh · 2023-09-04T09:26:20Z

I don't understand what the fromJSON syntax is doing

Oh yeah, that's far from obvious. We could potentially simplify it here, too. Let me break it down.

runs-on: ${{ fromJSON(github.repository == 'filecoin-project/builtin-actors' && '["self-hosted", "linux", "x64", "4xlarge"]' || '"ubuntu-latest"') }}

The above means that if this code is evaluated in the context of this repository (as opposed to a fork), we turn the string ["self-hosted", "linux", "x64", "4xlarge"] into an array (of labels) that runs-on can accept. Otherwise, we turn the JSON string "ubuntu-latest" into the plain string ubuntu-latest, which runs-on can accept too.

An alternative form of writing this, perhaps a clearer one, would be something like this.

runs-on: ${{ github.repository == 'filecoin-project/builtin-actors' && fromJSON('["self-hosted", "linux", "x64", "4xlarge"]') || 'ubuntu-latest' }}

This shows the real reason why we're using fromJSON in the first place, i.e. GitHub expressions do support operations on arrays, but they have no syntax for defining static arrays 🤷

The reason why I used the first version is that I copied it from somewhere where the default (the part after ||) is not a static string but a value sourced from configuration variables (which can either be an array or a string).
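For readers who find the expression syntax opaque, here is a rough Python model of the two forms above. It is only a sketch: it assumes that `&&` and `||` in GitHub expressions return one of their operands (much like Python's `and`/`or`) and that `fromJSON` behaves like `json.loads` for these inputs. The helper names and the fork name are hypothetical, made up for illustration.

```python
import json

UPSTREAM = "filecoin-project/builtin-actors"
LABELS_JSON = '["self-hosted", "linux", "x64", "4xlarge"]'


def and_or(condition, if_true, if_false):
    """Model of `condition && if_true || if_false` (assumed operand-returning)."""
    return (condition and if_true) or if_false


def runs_on_first_form(repository):
    # Pick a JSON *string* first, then parse whichever one was picked.
    raw = and_or(repository == UPSTREAM, LABELS_JSON, '"ubuntu-latest"')
    return json.loads(raw)


def runs_on_second_form(repository):
    # Parse only the array literal; the fallback is already a plain string.
    return and_or(repository == UPSTREAM, json.loads(LABELS_JSON), "ubuntu-latest")


if __name__ == "__main__":
    # "example-user/builtin-actors" stands in for an arbitrary fork.
    for repo in (UPSTREAM, "example-user/builtin-actors"):
        print(repo, runs_on_first_form(repo), runs_on_second_form(repo))
```

Both forms yield the label array for this repository and the plain string for anything else; they differ only in where the JSON parsing happens.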

Stebalien · 2023-09-06T19:23:01Z

I would be interested in a rough cost difference. When we first reported this issue, testing was taking an hour. At the moment, the "important" test jobs (build, test, clippy) run in about 20m, which is "ok" but not "great".

Looking at these runs, the "14m" (c5.2xlarge) is more like "10m" given that we can merge once tests finish (we don't have to wait for coverage). So that's much better.

But... given that we're already down to ~20m, I want to make sure this isn't going to cost too much.

anorth · 2023-10-19T18:43:49Z

If you want, I could get you a rough breakdown of the difference in $

@galargh I saw elsewhere you suggested the difference was small. Could you quantify that a bit better for us?

galargh · 2023-10-20T14:45:57Z

If you want, I could get you a rough breakdown of the difference in $

@galargh I saw elsewhere you suggested the difference was small. Could you quantify that a bit better for us?

Yes, sorry for the delay. I'll get back to you with the numbers either today or Monday.

galargh · 2023-10-25T13:06:32Z

Hi, sorry for the delay again, here are the stats I put together.

In the last 30 days, we had:

153 executions of Continuous Integration / test job; each took 20 minutes on average (Grafana View)
149 executions of Continuous Integration / coverage job; each took 25 minutes 24 seconds on average (Grafana View)

If we transition to 4xlarge self-hosted runners, we can expect:

Continuous Integration / test job to take 7 minutes 40 seconds, which should incur a monthly cost of $26.81 (AWS Estimate)
Continuous Integration / coverage job to take 12 minutes 20 seconds, which should incur a monthly cost of $45.38 (AWS Estimate)

If we transition to 2xlarge self-hosted runners, we can expect:

Continuous Integration / test job to take 9 minutes 25 seconds, which should incur a monthly cost of $21.30 (AWS Estimate)
Continuous Integration / coverage job to take 13 minutes 25 seconds, which should incur a monthly cost of $33.23 (AWS Estimate)

All in all, if we go with 4xlarge, we're looking at a monthly cost of $72.19, which would save us 53 hours 50 minutes of runtime a month overall. With 2xlarge, it's $53.53 for 46 hours 6 minutes' worth of savings. Of course, we don't have to stick to the same type of runner for each job either. If you wanted to, we could also explore other runner options (these are the ones we currently have, but adding new ones is not a problem).

Assumptions made when creating the estimates:

a self-hosted runner can take up to 2 minutes to be brought up
there is 15% shared infrastructure cost
Continuous Integration / test doesn't do significant outbound data transfer
Continuous Integration / coverage uploads ~200M of coverage reports to GitHub archive and codecov.io
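As a sanity check, the 4xlarge totals can be recomputed from the per-job figures quoted in this comment. The sketch below assumes the full 2 minutes of runner start-up is added to every self-hosted run; that is one reading of the first assumption, not something the estimate states outright. The names `JOBS_4XLARGE`, `monthly_savings` and `monthly_cost` are illustrative only.

```python
from datetime import timedelta

# Assumed worst case: every self-hosted run waits the full 2 minutes.
STARTUP = timedelta(minutes=2)

# job name -> (runs in 30 days, current average, 4xlarge estimate, monthly cost in $)
JOBS_4XLARGE = {
    "test": (153, timedelta(minutes=20), timedelta(minutes=7, seconds=40), 26.81),
    "coverage": (149, timedelta(minutes=25, seconds=24), timedelta(minutes=12, seconds=20), 45.38),
}


def monthly_savings(jobs, startup=STARTUP):
    """Total runtime saved per month across all jobs."""
    total = timedelta()
    for runs, current, self_hosted, _cost in jobs.values():
        total += runs * (current - (self_hosted + startup))
    return total


def monthly_cost(jobs):
    """Sum of the per-job monthly cost estimates, rounded to cents."""
    return round(sum(cost for *_rest, cost in jobs.values()), 2)


if __name__ == "__main__":
    saved = monthly_savings(JOBS_4XLARGE)
    hours, remainder = divmod(int(saved.total_seconds()), 3600)
    print(f"saved: {hours}h {remainder // 60}m {remainder % 60}s")  # saved: 53h 49m 56s
    print(f"cost: ${monthly_cost(JOBS_4XLARGE):.2f}")               # cost: $72.19
```

Under that assumption the result rounds to the 53 hours 50 minutes and $72.19 quoted above for 4xlarge.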

cc @laurentsenta (you were interested in what self-hosted runner comparison reports look like)

Stebalien · 2023-10-25T15:23:03Z

This definitely seems worth it.
It sounds like the 2xlarge runners are sufficient.
IMO, we should just drop test coverage, or make it manual. It's not particularly reliable anyways and not worth the cost.

Thoughts @anorth?

anorth · 2023-10-25T18:23:23Z

Yes
I'm inclined to spend the extra $ for lower latency
Yes I'm ok going to manual triggering for test coverage. It's a bit awkward to find in GH UI, but I agree it's of limited utility at the moment

anorth · 2023-10-25T18:39:46Z

I've moved coverage out in #1465. I think that means this PR is good to go if we use 4xlarge.

galargh · 2023-10-26T17:51:06Z

That's awesome to hear :) I've rebased the PR now. Let me know if you want to change to 2x runners instead. Personally, I think I'd stay on 4x too since it's only $5 and we cut over a minute off of each build.

Stebalien · 2023-10-26T18:39:22Z

Nah, $5 a month is worth it.

Stebalien · 2023-10-26T18:40:15Z

@anorth I'll leave this to you to merge.

ci: use pl self-hosted runners for test

938f9b1

galargh added 3 commits September 2, 2023 16:38

ci: install rust on self-hosted

d8ecf2a

ci: run coverage on self-hosted

a3ef136

ci: use bigger self-hosted runners

a878060

galargh requested review from anorth and Stebalien September 3, 2023 10:33

galargh marked this pull request as ready for review September 3, 2023 10:34

Merge branch 'master' into galargh-patch-1

a536001

anorth approved these changes Sep 3, 2023


anorth mentioned this pull request Oct 25, 2023

Move coverage CI action to manual trigger #1465

Merged

Merge branch 'master' into galargh-patch-1

0fde68b

Stebalien approved these changes Oct 26, 2023


anorth added this pull request to the merge queue Oct 26, 2023

Merged via the queue into master with commit 0f202bb Oct 26, 2023
12 checks passed

anorth deleted the galargh-patch-1 branch October 26, 2023 20:03
