ChezScheme startup time #263
You are comparing the Python and Ruby interpreters with the Chez Scheme compiler. Interpreters generally have faster startup times. Try it with the Chez Scheme interpreter (petite) instead.
It is worth noting what happens when you start Chez Scheme: loading the boot files consists of uncompressing, reading the fasl-format binary, and executing the expressions in the boot file, which set up the top-level environment that Chez Scheme programs (including the REPL) execute from. So that is what it is doing during that first ~0.10s to ~0.20s. Could it be faster? Possibly. We could break the boot file into smaller pieces.
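If you want to watch that phase yourself, here is a rough sketch. It assumes a stock installation; the --verbose and -b flags are listed in scheme --help, but the boot-file path below is only a placeholder for wherever your distribution installs it:

% echo '(exit)' | scheme -q --verbose
# traces the boot-file search and shows which files are located and opened
% echo '(exit)' | time scheme -q -b /usr/lib/csv9.5/ta6le/petite.boot
# boots from a single, smaller boot file instead of the default petite.boot + scheme.boot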
I was playing around with this, and it seems like the majority of the startup time is spent inflating the boot files. I added some logging around parts of the load process, and using already inflated boot files I reduced the load time a fair bit:
I wonder if more efficient use of zlib is possible?
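One way to check how much of the wall time really goes to decompression, sketched under the assumption that perf is installed and the scheme binary has usable symbols; if zlib is the bottleneck, the inflate routines should show up near the top of the report:

% perf record -g -- scheme -q < /dev/null
% perf report --stdio | head -n 30
# look for inflate / zlib frames among the hottest stacks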
Very informative! Seems like the decision about whether to optimize for space or for startup speed should be left up to the installation, though. The effect of file size on startup speed is going to vary widely depending on whether the files are being read from an SSD, a spinning disk, a slow network share, etc. Maybe there should be an option to leave the boot files uncompressed.
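As a sketch of what that could look like from the Scheme side: make-boot-file and the compile-compressed parameter are documented, but whether compile-compressed also governs boot-file output, whether an existing .boot file is acceptable as input, and the install path below are all assumptions on my part:

% scheme -q
(compile-compressed #f)                          ; ask for uncompressed fasl output
(make-boot-file "petite-uncompressed.boot" '()   ; hypothetical: repack the stock boot file
  "/usr/lib/csv9.5/ta6le/petite.boot")
(exit)
% echo '(exit)' | time scheme -q -b ./petite-uncompressed.boot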
@gwatt Did you manage to publish the non-compressed boot file handling somewhere? @akeep Otherwise, for me too startup time is very important, because my intended use case for Scheme (and thus Chez) is to replace small scripts. During my own benchmarks, just loading Chez itself (before any user code runs) takes a noticeable amount of time.

Granted, once you actually do stuff, Python (and other interpreted languages) may take some extra time to parse and compile the code, so perhaps they will even out in terms of startup latency. But still, in the case of Chez there is no parsing and compiling step (when loading an already compiled file).

(For the record, on my ~5 year old Lenovo T450 it takes around ~50ms, which is noticeable. Now imagine a build tool that invokes the same script multiple times in a loop; those 50ms easily become 5 seconds...)

Also granted, perhaps Chez optimizes for long-running processes, and thus low-latency startup is not a priority. (For example Java, and I assume most JVM-based languages, have very high startup latency. On the other hand, I've seen that even Go doesn't optimize for startup latency, and once one has enough dependencies that run initialization code, it can become quite sluggish...)
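To make the build-script use case concrete, a minimal sketch (the file name and contents are mine; --script is the documented way to run a file as a program with either binary):

% cat hello.ss
(import (chezscheme))
(display "hello from a tiny build-step script")
(newline)
% time petite --script hello.ss
% time scheme --script hello.ss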
When compiled, startup time becomes better. Here it is with a full Chez build:

% cat exit.scm
(import (chezscheme))
(exit)
% ruse-exe --optimize-level=3 exit.scm
% time ./a.out

real	0m0,048s
user	0m0,020s
sys	0m0,028s

Maybe with a petite scheme binary it is even faster, but that is not supported by ruse-exe as of yet. ruse-exe is available at https://github.com/scheme-ruse/ruse-exe/

I guess you already know this, but if startup time is a problem, you can daemonize the program and call the daemon. Hope this helps.
Disregard my previous message. If there ever was a big-ish startup time, there is no such thing at this time:

% echo '(exit)' | time scheme -q
0.03user 0.01system 0:00.04elapsed 100%CPU (0avgtext+0avgdata 49292maxresident)k
0inputs+0outputs (0major+11881minor)pagefaults 0swaps
% echo '(exit)' | time petite -q
0.02user 0.00system 0:00.02elapsed 100%CPU (0avgtext+0avgdata 29536maxresident)k
0inputs+0outputs (0major+6867minor)pagefaults 0swaps

That is done with the latest stable release.
Well, at least on my side things are different:

(I use OpenSUSE Tumbleweed; I've installed Chez Scheme from https://software.opensuse.org/package/chezscheme.) Here are a few tests (granted, these are on a "cold" CPU, thus not significant):

Here are a few more tests using hyperfine (I use a custom patch that removes the shell invocation, i.e. the intermediary shell process):
I've also tried using
I've also tried using the
For comparison I've tried
(I don't have RPM packages for Gauche and Chibi for OpenSUSE.) Then Python2 and Python3:
Then Ruby2.7 and Ruby3.0:
As said, I don't believe the "empty startup latency" is a proper measure, because it doesn't actually do anything. For example, in Python, once you start loading some libraries (including ones that are part of the standard distribution), things get significantly slower. However, the "empty startup latency" is the minimum latency you'll get, and no amount of optimization can make things faster than that... (For example, in Python, with some meta-programming tricks, I've managed to lazy-load libraries and thus reduce startup time for short scripts.)
(Here I'm talking about my example use case of Scheme scripts being called from within a build system that invokes many of these small scripts multiple times, each time with slightly different arguments.) Yes, using a daemon would be an option, but this would defeat the purpose of using a build system in the first place. My hope would have been that once one creates a compiled program, the startup latency would mostly disappear.
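One direction to experiment with for that hope, using the documented compile-file and make-boot-file procedures; the file names are mine (hello.ss is the toy script from the earlier sketch), and I have not measured whether this actually beats plain --script on a given installation:

% echo '(compile-file "hello.ss")' | scheme -q
# produces hello.so next to the source
% echo '(make-boot-file "hello.boot" (list "petite") "hello.so")' | scheme -q
# packs the compiled code into a boot file layered on petite
% time petite -q -b hello.boot < /dev/null
# the app code runs as part of the boot sequence (petite.boot should be located via the usual search path)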
(Maybe a script calling another script, all the way down, is the problem.)
I am not very familiar with Chez Scheme directly, but I dug into this a bit in the context of Racket (v8.6) on Linux. It seems like the fasl loading results in a large number of direct read/lseek syscalls. For example, running the following, where simple.rkt is just a trivial file:

strace -c racket simple.rkt
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
81.41 0.020731 10 1890 read
14.51 0.003696 1 3367 lseek
1.92 0.000488 1 339 75 newfstatat
0.73 0.000186 1 96 openat
0.64 0.000164 2 78 mmap
0.37 0.000094 0 96 close
0.20 0.000051 5 9 munmap
0.07 0.000019 0 50 clock_gettime
0.07 0.000018 2 9 8 readlink
0.07 0.000017 1 15 epoll_wait
0.01 0.000002 2 1 futex
0.00 0.000000 0 9 mprotect
0.00 0.000000 0 3 brk
0.00 0.000000 0 22 rt_sigaction
0.00 0.000000 0 1 rt_sigprocmask
0.00 0.000000 0 1 ioctl
0.00 0.000000 0 4 pread64
0.00 0.000000 0 1 1 access
0.00 0.000000 0 1 execve
0.00 0.000000 0 2 fcntl
0.00 0.000000 0 1 getcwd
0.00 0.000000 0 2 1 arch_prctl
0.00 0.000000 0 1 epoll_create
0.00 0.000000 0 1 set_tid_address
0.00 0.000000 0 1 set_robust_list
0.00 0.000000 0 1 pipe2
0.00 0.000000 0 1 prlimit64
0.00 0.000000 0 1 getrandom
0.00 0.000000 0 1 rseq
------ ----------- ----------- --------- --------- ----------------
100.00    0.025466           4      6004        85 total

> hyperfine -w 100 -r 100 'racket simple.rkt'
Benchmark 1: racket simple.rkt
Time (mean ± σ): 88.1 ms ± 13.1 ms [User: 59.2 ms, System: 28.8 ms]
Range (min … max):    75.2 ms … 111.9 ms    100 runs

So about 20 ms out of 88 are spent in >1000 read calls.
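To dig one level deeper than the summary, a sketch using stock strace options (-T prints the time spent in each syscall, -e limits tracing to the suspect calls; strace writes to stderr, hence the redirect):

% strace -c -e trace=read,lseek racket simple.rkt
# summary counts for just the two hot syscalls
% strace -T -e trace=read racket simple.rkt 2>&1 | tail -n 20
# per-call read latencies near the end of startup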
Some notes on startup times currently, starting with x86_64 Ubuntu 22.04:
On an M2 MacBook running macOS Ventura:
The biggest difference here compared to the original post is the switch from zlib to LZ4 as the default compression for boot files; LZ4 is much faster to decompress. When even faster startup is needed, there's now a "vfasl" format that can be used for boot files (this is on macOS):

If startup time is important enough to use vfasl, you probably want to disable compression, too:
If loading only "petite":
Using just "petite" with uncompressed vfasl files:
@nikhilm notes the large number of small read and lseek calls above.

#769 improves buffering to reduce the number of those calls.

This improvement has very little effect on startup time, but it moves suspicious behavior out of the way for anyone who wants to investigate more.
I have a local Racket source checkout. How would I build a version with uncompressed vfasl boot files?
@nikhilm On most platforms, uncompressed vfasl is the default mode for Racket boot files. You can configure it at build time.
Chez Scheme is a real pleasure to use; I'm especially fond of the expression editor on the command line for interactive coding. But I have one question: I understand that Chez is known for being quite efficient, so why does it take so long to start up?

Is this possibly a bug?