-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support for GNU Make jobserver #94
base: master
Are you sure you want to change the base?
Conversation
This looks really good, thanks for working on it.
Can you elaborate? What happens if
Can we install an |
Old GNU make uses After some testing and a cursory reading of the source code, it seems that, other than that, it's the same.
Good idea, just implemented it and tokens are returned after |
@michaelforney are you able to take a look? thank you! |
@thesamesam There are several outstanding comments that still need to be resolved. |
What remains to be done here? Sorry for the drive-by comment, but I've been looking for a ninja alternative with this feature to use with our package manager, which allows doing a build of an entire tree of packages under a single jobserver. If I could help out a bit, I'd be happy to. One note, this line: https://github.com/michaelforney/samurai/pull/94/files#diff-99b168f07bbb2d0e68bfe3cd8378543a6becd5989922c5283e499d1f69879b42R324 On a related note, would you like to have jobserver server support here as well? |
See the unresolved comments. In particular, the one about polling. There should be a single poll in the event loop. I pushed my current working tree to the jobserver branch. I don't remember the state, but if someone else wants to help out with this, that'd be a good place to start. Edit: on second thought, maybe not. It seems that my local changes don't even compile. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, turns out that the comments I thought I made were never submitted. My apologies.
samu.c
Outdated
arg = strtok(env, " "); | ||
/* first word might contain dry run */ | ||
if (arg && strchr(arg, 'n')) | ||
buildopts.dryrun = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain this? As far as I can tell, the first word is supposed to be the single-character options, right? I see that by default it is just w
. However, if I run make --no-print-directory
, it is empty with just a blank space, so I think we might accidentally skip over the jobserver flags.
Another question: how is samu running in the first place if make
is in dry run mode?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That does seem to be what the first word is for, though the protocol documentation only goes as far as saying "check the first word for the letter n
". Thanks for catching the oversight.
As for the second question, GNU Make (version 4.4.1 my system) invokes jobserver clients even in dry run mode.
samu.c
Outdated
/* assert valid fds */ | ||
check[0].fd = read_end; | ||
check[1].fd = write_end; | ||
if (poll(check, 2, 0) == -1 || check[0].revents & POLLNVAL || check[1].revents & POLLNVAL) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do a validity check like this. If the fds are invalid that's the caller's problem, and I'd like to minimize the POSIX usage as much as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation recommends doing the check. As it's there mostly to address a PEBKAC (user forgot a +
in the Makefile), I do understand if you want to drop it.
One question I have, should a jobserver client request a token for every job it starts and release it when its done? Or is it fine if samurai takes as many jobs as it can use, and release them only when it has extra tokens that it can't use? I don't think it would make a difference, except maybe collective ordering of the jobs run. |
If it's going to immediately reuse those tokens then I think it's fine. |
build.c
Outdated
@@ -610,6 +672,13 @@ build(void) | |||
next = i; | |||
if (jobs[i].failed) | |||
++numfail; | |||
/* must return the GNU/tokens once a job is done */ | |||
if (buildopts.gmakepipe[1] > 0 && gmakelatest != gmaketokens) { | |||
if (write(buildopts.gmakepipe[1], --gmakelatest, 1) < 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if it matters, but this might not be the same job token that we acquired to start this job. Maybe we should track the job token in the job struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is necessary, it's not like make
knows how exactly the job slots are used by the client. In fact, despite documentation saying "don’t assume that all tokens are the same character", in my testing they usually are.
samu.c
Outdated
errno = EBADF; | ||
} else if (strncmp(flag, "fifo:", 5) == 0) { | ||
rfd = open(flag + 5, O_RDONLY); | ||
wfd = open(flag + 5, O_WRONLY); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should just open it once as O_RDWR
and set both rfd
and wfd
to the same value.
Also, we should print a message if this happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Print a message whenever a FIFO is used?
build.c
Outdated
@@ -319,6 +327,15 @@ jobstart(struct job *j, struct edge *e) | |||
warn("posix_spawn_file_actions_addclose:"); | |||
goto err3; | |||
} | |||
if (isjobserverclient()) { | |||
/* do not allow children to steal GNU/tokens */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As @trws asked, do we need to prevent children from interacting with the job server?
If we do, I think it'd be better to set FD_CLOEXEC
on the fds than closing them here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember why I thought of that in the first place (nowhere in the documentation is this recommended).
After testing for a bit, it seems GNU Make uses the same --jobserver-auth
parameters for every client, so it's fine for samu
to share the jobserver with its own clients.
fix return type for reachedload avoid needlessly changing errno allow children to take tokens use only one fd with 'fifo:' authentication type fix |=/&= mixup
My implementation was completely different and I have now refreshed it. It used a separate thread to read from the GNU Make pipe, and a local pipe between the thread and |
The main issue with this implementation is that if you have
then samu will take all 26 tokens and block other users of the jobserver pipe. Instead, tokens must be taken only when jobs are started. |
@bonzini I haven't looked at the implementation at all. But, based on the previous review comment:
I had been under the assumption that in the case you describe, samu would relinquish 23 tokens as soon as it detects it is blocked on a/b/c and can't start d. |
The only write system call that I see in the
so I don't think it does, though of course I might be wrong. |
} | ||
} | ||
if (fds[0].revents & POLLIN && isjobserverclient() && nstarted < ntotal && !reachedload()) { | ||
gmakeread = read(buildopts.gmakepipe[0], gmakelatest, buildopts.maxjobs - numjobs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a race condition here:
poll
returns POLLIN- another process steals the token that
poll
detected read
blocks- one or more children of samurai finish
- the whole build cannot reuse the token(s) currently owned by samurai
The solution used by GNU Make is complex and relies on trapping SIGCHLD and dup-ing the file descriptor. It's complicated and it's why I just went for reading the jobserver file descriptor in a separate thread instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not even sure GNU Make's solution is 100% robust without an extra (void)alarm(1);
, or only starting a new process as an absolute last resort.
I really have to rethink this implementation.
I'm starting to think of an in-between approach of having the job spawn a helper program that waits for a token, feeds it back to samurai through a second pipe (so it's stored in the associated Once I have more time I might rework this PR into this design, but I'd like for feedback. |
That seems similar to what I am doing, just with a process instead of a thread. You could also use fork() to do the wait, instead of using a helper program. However, there would probably be more complications in returning the tokens on a call to As an aside, it' s ok not to return the tokens on SIGINT/SIGTERM. I tried using a Makefile like
Then invoking
So it's not a huge deal. But if you want to do it, a better way is to enhance samurai so that it traps SIGTERM (#106), killing all the children in response. The basic idea would be as simple as
where the killall variable is set by the signal handler. By terminating all jobs cleanly, the tokens would also be returned. |
Quick and dirty implementation of #71, following the GNU Make 4.3 manual (Section 13.1.
TODO:
-l
and-j
are ignored when jobserver options are active, though CMake team's proposed patch for Ninja instead ignores jobserver options whenever -j is given in the command line. Should we follow them?fifo:
protocol mentioned on Support make jobserver protocol #71.fatal
and installing signal handlers.