Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

uucore: add common splice-write functionality #6964

Merged
merged 1 commit into from
Dec 19, 2024

Conversation

DaringCuteSeal
Copy link
Contributor

Splice is a Linux-specific syscall that allows direct data copying from one file descriptor to another without user-space buffer. This can be used by several utilities to perform optimized copy under Linux and Android.

/// The `bool` in the result value indicates if we need to fall back to normal
/// copying or not. False means we don't have to.
#[inline]
pub fn write_fast_using_splice<R: Read + AsFd + AsRawFd, S: AsRawFd + AsFd>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be possible to add unit tests in this file ? thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yeah

#[cfg(any(target_os = "linux", target_os = "android"))]
use crate::pipes::{pipe, splice, splice_exact};

const SPLICE_SIZE: usize = 1024 * 128;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please document these "magic numbers"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

damn not the author of this 😄 hmm

Copy link
Contributor

@sylvestre sylvestre Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from here src/uu/cat/src/splice.rs ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. it's taken from there

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, leave it like this then :)

I will have a look

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure sure


/// Move exactly `num_bytes` bytes from `read_fd` to `write_fd`.
///
/// Panics if not enough bytes can be read.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure panic is the right thing to do here.
we probably want to return an error

pub fn write_fast_using_splice<R: Read + AsFd + AsRawFd, S: AsRawFd + AsFd>(
handle: &R,
write_fd: &S,
) -> nix::Result<(usize, bool)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

@DaringCuteSeal
Copy link
Contributor Author

huh, i just realized that std::io::copy already uses splice when available, as documented by Rust:

On Linux (including Android), this function uses copy_file_range(2), sendfile(2) or splice(2) syscalls to move data directly between file descriptors if possible.

it is, however, subject to change. should we keep the reimplementation?

@sylvestre
Copy link
Contributor

Interesting! did you do some benchmarking ?

@DaringCuteSeal
Copy link
Contributor Author

nope, not yet. that's sounds interesting to do, i will see which one performs better

@DaringCuteSeal
Copy link
Contributor Author

DaringCuteSeal commented Dec 18, 2024

it seems like the reimplementation using splice wins here, at least on my linux machine

i did some tests on install with hyperfine to write from /dev/zero to a file:

dd if=/dev/zero of=/dev/stdout bs=1M count=1000 | ./install /dev/stdin "output_file"

Reimplementation, 100 runs:

Command Mean [s] Min [s] Max [s] Relative
bash ./test_reimp.sh 1.324 ± 0.500 0.828 3.253 1.00

using io::copy, 100 runs:

Command Mean [s] Min [s] Max [s] Relative
bash ./test_iocopy.sh 1.560 ± 0.467 1.173 3.427 1.00

could be related to the chunk size

@sylvestre
Copy link
Contributor

you should use hyperfine for benchmarking :)

@DaringCuteSeal
Copy link
Contributor Author

Oh yeah i did, or.. i'm supposed to just pass the commands to hyperfine directly?

@DaringCuteSeal
Copy link
Contributor Author

tried more

reimplementation, 1G chunk, /dev/zero to file (100 runs):

Command Mean [s] Min [s] Max [s] Relative
/bin/dd if=/dev/zero of=/dev/stdout bs=1G count=1 | ./install /dev/stdin "output_file" 1.413 ± 0.550 0.842 3.417 1.00

io::copy, 1G chunks, /dev/zero to file (100 runs)

Command Mean [s] Min [s] Max [s] Relative
/bin/dd if=/dev/zero of=/dev/stdout bs=1G count=1 | ./install-copy /dev/stdin "output_file" 1.663 ± 0.452 1.198 3.619 1.00

reimplementation, 1G chunk, file to file (100 runs):

Command Mean [s] Min [s] Max [s] Relative
/bin/dd if=test of=/dev/stdout bs=1G count=1 | ./install /dev/stdin "output" 1.146 ± 0.291 0.819 2.394 1.00

io::copy, 1G chunk size, file to file (100 runs):

Command Mean [s] Min [s] Max [s] Relative
/bin/dd if=test of=/dev/stdout bs=1G count=1 | ./install-copy /dev/stdin "output" 1.634 ± 0.527 1.173 4.430 1.00

i took a look at the implementation of io::copy, seems like they used 0x7ffff000 (2,147,479,552 bytes) as the chunk size which is already the maximum.

set that to my splice and..

Command Mean [s] Min [s] Max [s] Relative
/bin/dd if=test of=/dev/stdout bs=1G count=1 | ./install /dev/stdin "output" 1.207 ± 0.465 0.809 3.685 1.00

it's a biiit slower. But still faster than using io::copy.

i traced the syscalls made for both and both seem to be using splice similarly. i have no idea why our splice copy is faster, i may wanna dig more.

@DaringCuteSeal DaringCuteSeal force-pushed the splice-uucore branch 3 times, most recently from 2bf5075 to 0d92ff6 Compare December 18, 2024 12:11
Copy link

GNU testsuite comparison:

Skip an intermittent issue tests/timeout/timeout (fails in this run but passes in the 'main' branch)

@sylvestre
Copy link
Contributor

could you please properly rebase your patches? the diff is showing unrelated changes
thanks

@DaringCuteSeal
Copy link
Contributor Author

yeah yeah idk why i messed up the branch

    Splice is a Linux-specific syscall that allows direct data copying from
    one file descriptor to another without user-space buffer. As of now, this
    is used by `cp`, `cat`, and `install` when compiled for Linux and Android.
@DaringCuteSeal
Copy link
Contributor Author

ok there

Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/timeout/timeout (passes in this run but fails in the 'main' branch)

@sylvestre sylvestre merged commit 4341ae3 into uutils:main Dec 19, 2024
62 checks passed
@sylvestre
Copy link
Contributor

thanks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants