Pre-built release binaries? #379

Open
nightlark opened this issue Nov 2, 2021 · 26 comments

Comments

@nightlark
Contributor

Would there be interest in having a CI workflow that builds (statically linked) Windows/macOS/Linux binaries for re2c/re2go and uploads them when a GitHub release is made?

The re2c binary seems like the main way of using re2c, so having a central place to grab the latest release binaries for common platforms would be nice, instead of waiting on package-manager update cycles (older Ubuntu releases being stuck on old versions of re2c, the Chocolatey package for Windows being a few years out of date, etc.).

@nightlark nightlark changed the title Pre-built release binaries Pre-built release binaries? Nov 2, 2021
@skvadrik
Owner

skvadrik commented Nov 2, 2021

Agreed, that would be useful. I think @sergeyklay wanted to do that as well.

@sergeyklay
Collaborator

I apologize for completely forgetting about this. My bad. I'll try to sort it out ASAP.

@skvadrik
Owner

skvadrik commented Nov 2, 2021

No worries @sergeyklay. If you haven't started yet, maybe @nightlark was planning to work on this? Just trying to avoid duplicated effort.

@nightlark
Contributor Author

Yea I was thinking of starting on this, but I didn't find existing discussion about it on GitHub.

@sergeyklay if you already started something I can see about finishing it if you'd like. Otherwise, I can probably get a first pass at a workflow ready around this upcoming weekend.

@sergeyklay
Collaborator

Unfortunately, I won't be able to find free time in the next 4-5 days. And no, unfortunately I haven't even started.

That's what I think:

  • Every CI run should produce binaries
  • We should store these binaries as GitHub Actions artifacts
  • On a release event, we should fetch the latest binaries from the artifacts and attach them to the release
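As a rough illustration of the "attach to a release" step, a CI job could package each platform's binary as an archive plus a SHA-256 checksum before uploading it. This is a minimal sketch, not something re2c currently has: the function name `package_binary` and the `re2c-<version>-<platform>.tar.gz` naming scheme are hypothetical, and it assumes GNU `tar` and `sha256sum` (i.e. a Linux runner).

```bash
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Hypothetical helper: package one built binary for upload as a release asset.
# Usage: package_binary <binary> <version> <platform> <outdir>
# Produces <outdir>/re2c-<version>-<platform>.tar.gz plus a matching .sha256 file,
# and prints the archive path.
package_binary() {
  local binary="$1" version="$2" platform="$3" outdir="$4"
  local name="re2c-${version}-${platform}"
  mkdir -p "$outdir"
  # Store only the file name inside the archive, not the build directory path.
  tar -czf "$outdir/$name.tar.gz" -C "$(dirname "$binary")" "$(basename "$binary")"
  # Write the checksum relative to outdir so `sha256sum -c` works from there.
  (cd "$outdir" && sha256sum "$name.tar.gz" > "$name.tar.gz.sha256")
  printf '%s\n' "$outdir/$name.tar.gz"
}
```

For example, `package_binary build/re2c 4.0 linux-x86_64 dist` followed by `gh release upload <tag> dist/*` would attach both files to an existing release; the tag and paths here are placeholders.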

@skvadrik thoughts?

@skvadrik
Owner

skvadrik commented Nov 2, 2021

I'm a bit worried that storing binaries on every CI run will waste too much space. We have retention-days: 30, but those binaries still won't be used. Maybe upload them only on release?

Aside from that, sounds good.

@nightlark
Contributor Author

nightlark commented Nov 2, 2021

An option to store the artifacts from a CI run could be added as a separate feature, and/or a nightly/weekly run of the release binary workflow could minimize the space used (I doubt re2c would run into artifact storage limits either way).

Would it be desirable to follow the bootstrapping process to regenerate the .re files before building the final release binaries? Or would building from e.g. a "distribution-ready" source tarball for the release be enough?

@skvadrik
Copy link
Owner

skvadrik commented Nov 2, 2021

Would it be desirable to follow the bootstrapping process to regenerate the .re files before building the final release binaries?

Yes, I think we should build a minimal stage-1 and then a full stage-2. This is the way the current CI works. Here's where the Linux "fast" and "full" release builds are configured: https://github.com/skvadrik/re2c/blob/master/CMakePresets.json#L118-L133. And here they are used: https://github.com/skvadrik/re2c/blob/master/.github/workflows/ci.yml#L107-L126.

@sergeyklay
Collaborator

@skvadrik Is the current configuration enough? Should we prepare a special CMake preset with all possible optimizations for production builds? A quick reminder: the current presets were designed only to meet CI needs.

@sergeyklay
Collaborator

Btw, reusable workflows have been available in public beta since October 5, 2021: https://docs.github.com/en/actions/learn-github-actions/reusing-workflows

This lets us reuse entire workflows as if they were actions.

@skvadrik
Owner

skvadrik commented Nov 2, 2021

Is the current configuration enough?

Almost. We need to enable re2go and disable docs:

  CMAKE_BUILD_TYPE="Release"
  CMAKE_CXX_COMPILER="g++"
  CMAKE_C_COMPILER="gcc"
  CMAKE_INSTALL_PREFIX:PATH="/home/runner/work/re2c/re2c/install"
  RE2C_BUILD_LIBS:BOOL="TRUE"
-  RE2C_BUILD_RE2GO:BOOL="FALSE"
+  RE2C_BUILD_RE2GO:BOOL="TRUE"
  RE2C_FOR_BUILD="/home/runner/work/re2c/re2c/install/bin/re2c"
-  RE2C_REBUILD_DOCS:BOOL="TRUE"
+  RE2C_REBUILD_DOCS:BOOL="FALSE"
  RE2C_REBUILD_LEXERS:BOOL="TRUE"

Enabling re2go in ci.yml is also fine, so we may want a preset inheritance chain "fast" <- "release" <- "full", where "full" just adds -DRE2C_REBUILD_DOCS:BOOL="TRUE".
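For illustration only, the same layering can be expressed as plain cmake command-line options built up in bash arrays, each level copying the previous one and appending what it adds. This is a sketch, not the project's CMakePresets.json: the array names are hypothetical, the contents of the "fast" level are an assumption taken from the option list above, and it relies on cmake applying repeated -D settings in order, so the later value for the same variable wins.

```bash
#!/usr/bin/env bash
# Illustrative shell analogue of the preset chain "fast" <- "release" <- "full".

# "fast": assumed base options (taken from the cache listing above).
fast_opts=(
  -DCMAKE_BUILD_TYPE=Release
  -DRE2C_BUILD_LIBS:BOOL=TRUE
  -DRE2C_REBUILD_LEXERS:BOOL=TRUE
)
# "release": everything from "fast", plus re2go enabled and docs disabled.
release_opts=("${fast_opts[@]}" -DRE2C_BUILD_RE2GO:BOOL=TRUE -DRE2C_REBUILD_DOCS:BOOL=FALSE)
# "full": everything from "release", with docs re-enabled by a later -D setting.
full_opts=("${release_opts[@]}" -DRE2C_REBUILD_DOCS:BOOL=TRUE)

# Example: cmake -S . -B build-full "${full_opts[@]}"
```

The design point is the same as with preset inheritance: "full" only states its difference from "release", so the two cannot silently drift apart.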

@NickStrupat

Is the idea to list the pre-built binaries in the releases? That would be ideal for my use case.

@skvadrik
Owner

Is the idea to list the pre-built binaries in the releases?

Yes, the idea is to build statically linked binaries for every release on different platforms, and do that via GitHub Actions. (For clarity, I haven't done any work on this so far; the recent 3.0 release has no binaries.)

@PolarGoose
Contributor

As a workaround, I have created a repository to produce statically linked x64 executables of re2c for Windows:
https://github.com/PolarGoose/re2c-for-Windows

@pmetzger
Contributor

I think this would be very useful for Windows, but is of much less interest on platforms like macOS (where MacPorts or Homebrew easily handle it for the user) or on most Linux platforms.

@PolarGoose
Contributor

PolarGoose commented Nov 23, 2024

For Linux it might be useful as well. It is possible to build a statically linked executable that will work on any distro. Then you don't need to depend on the version from the package manager.
I have created a script. You need to place it inside the <re2c-git-repo>/build folder.

#!/usr/bin/env bash

set -o xtrace
set -o errexit
set -o nounset
set -o pipefail

readonly currentScriptDir="$(dirname "$(realpath -s "${BASH_SOURCE[0]}")")"
readonly gitRepoRoot="$currentScriptDir/.."
readonly buildDir="$gitRepoRoot/build_static_Linux_binary"

rm -rf "$buildDir"
mkdir "$buildDir"

cmake \
  -S "$gitRepoRoot" \
  -B "$buildDir" \
  -G Ninja \
  -D CMAKE_BUILD_TYPE=Release \
  -D CMAKE_EXE_LINKER_FLAGS=" \
    -static \
    -static-libgcc \
    -static-libstdc++" \
  -D RE2C_BUILD_RE2D=0 \
  -D RE2C_BUILD_RE2GO=0 \
  -D RE2C_BUILD_RE2HS=0 \
  -D RE2C_BUILD_RE2JAVA=0 \
  -D RE2C_BUILD_RE2JS=0 \
  -D RE2C_BUILD_RE2OCAML=0 \
  -D RE2C_BUILD_RE2PY=0 \
  -D RE2C_BUILD_RE2RUST=0 \
  -D RE2C_BUILD_RE2V=0 \
  -D RE2C_BUILD_RE2ZIG=0 \
  -D RE2C_BUILD_TESTS=0
cmake --build "$buildDir"
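To confirm that the resulting executable really is static, one can inspect it with the `file` utility, which describes static executables as "statically linked" (or "static-pie linked") and dynamic ones as "dynamically linked, interpreter ..."; `ldd` reporting "not a dynamic executable" is another quick check. A minimal sketch with hypothetical helper names; the output path of the re2c binary inside the build directory is an assumption.

```bash
#!/usr/bin/env bash
# Succeeds if a `file -b` description reports a statically linked executable.
describes_static_binary() {
  [[ "$1" == *"statically linked"* || "$1" == *"static-pie linked"* ]]
}

# Succeeds if the given executable is statically linked, according to `file`.
is_static_binary() {
  describes_static_binary "$(file -b "$1")"
}

# Example (assumed output path): is_static_binary "$buildDir/re2c" && echo "static"
```

Splitting the string check from the `file` call keeps the matching logic testable without a real binary.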

@skvadrik
Owner

For Linux it might be useful as well. It is possible to build a statically linked executable that will work on any distro. Then you don't need to depend on the version from the package manager. I have created a script. You need to place it inside the <re2c-git-repo>/build folder.

Nice, but why do we need musl? Won't -static -static-libgcc -static-libstdc++ work the same way with whatever the system libc is (on the host system that is used to build the static binary)? Apart from that, we can certainly add a build script for building a static portable binary.

@PolarGoose
Contributor

Nice, but why do we need musl? Won't -static -static-libgcc -static-libstdc++

Good question. I have checked once again. I think I made a mistake. Musl is not needed, indeed. Thank you.

-static -static-libgcc -static-libstdc++

Yes, it is enough. I have corrected the script.

@nightlark
Contributor Author

nightlark commented Nov 24, 2024

I just got pre-built binary wheels for re2c working with cibuildwheel and uploaded them to PyPI: pip install re2c or pipx install re2c (probably better, since pipx handles virtual environments for users). Maybe a bit unconventional, but a surprising number of non-Python tools are packaged and distributed as wheels on PyPI.

Platforms supported:

  • Windows - x86, x86_64, and ARM64 (not tested)
  • macOS - Universal2 (ARM + x86_64)
  • Linux - i686, x86_64, armv7l, aarch64, ppc64le, s390x (musl 1.2 variants available for all, glibc 2.12 for x86/x86_64, glibc 2.31 for armv7l, and glibc 2.17 for all architectures)
    • Statically linking glibc isn't necessary to get a binary that works on most systems. glibc is good at maintaining backward compatibility, so the trick is to build on a system with an ancient glibc; any newer glibc on a user's system will then meet the binary's minimum version requirement.
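To see which glibc a dynamically linked binary actually requires, one can list its dynamic symbols with `objdump -T` (GNU binutils) and take the newest GLIBC_x.y version tag among them. A minimal sketch; `max_glibc_version` is a hypothetical helper name, and the versions are compared numerically field by field so that 2.31 sorts after 2.4.

```bash
#!/usr/bin/env bash
# Read `objdump -T <binary>` output on stdin and print the newest GLIBC_x.y[.z]
# symbol version it references, i.e. the oldest glibc the binary can run on.
# Tags without a numeric version (e.g. GLIBC_PRIVATE) are ignored.
max_glibc_version() {
  grep -o 'GLIBC_[0-9][0-9.]*' \
    | sed 's/^GLIBC_//' \
    | sort -t. -k1,1n -k2,2n -k3,3n \
    | tail -n 1
}

# Example: objdump -T ./re2c | max_glibc_version
```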

PyPI package: https://pypi.org/project/re2c/
Code and workflows for packaging: https://github.com/nightlark/re2c-python-distributions

@nightlark
Contributor Author

nightlark commented Nov 24, 2024

The PyPI source package also works for building a copy of re2c, so if there isn’t a pre-built binary wheel for a platform the pip install will still work if a suitable compiler is found. I just tested this on an assortment of compile farm systems with riscv64 and sparc processors, and some of the Solaris and OpenBSD systems.

@skvadrik
Owner

Thanks @nightlark! I'm surprised this is allowed (packaging non-Python software via PyPI).

I think it's better for the prebuilt binaries to come from the official repo. If they are built on GitHub Actions CI, and the script that builds them is in the same repo, everyone can access the logs and see how they were built.

@PolarGoose If your script is to be used for building official prebuilt binaries, it should follow the 2-stage bootstrap process (first, build re2c using the bootstrap files, then rebuild it using the re2c binary built in the previous step), like in this script:

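# (excerpt: body of a loop that runs once per make implementation, i.e. make and bmake,
# from a build subdirectory; $make_prog is the current make program; loop header omitted)
# stage 1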
mkdir stage1
../configure --prefix "$(pwd)/stage1" \
&& $make_prog \
&& $make_prog install
# stage 2
# 'make' implies 'make docs'; running both in parallel may cause data races
# configure without --enable-debug, this is the release binary
../configure \
--enable-docs \
--enable-libs \
--enable-lexers RE2C_FOR_BUILD="$(pwd)/stage1/bin/re2c" \
&& $make_prog bootstrap -j"$(nproc)" \
&& $make_prog distcheck -j"$(nproc)"
cd ..
done

@nightlark
Contributor Author

Yea, their reasoning seems to be that you never know what someone might want to integrate with other parts of the Python ecosystem. Makes it really easy to get things like clang-format and the zig compiler on almost any platform.

For releases, how are you building the tar.xz and tar.lz archives? Do they already include files generated from a stage 1 bootstrap?

Would the CMake equivalent of the bootstrapping process be:

  • Do a CMake build using a -release-ootree-fast preset to generate the stage1 re2c binary
  • Do another CMake build setting RE2C_FOR_BUILD to the stage1 re2c binary, and enabling the RE2C_REBUILD_DOCS, RE2C_REBUILD_LEXERS, and RE2C_BUILD_LIBS options (rebuilding the parser and syntax files isn’t necessary?)

@skvadrik
Copy link
Owner

For releases, how are you building the tar.xz and tar.lz archives? Do they already include files generated from a stage 1 bootstrap?

I run https://github.com/skvadrik/re2c/blob/master/release.sh on my local machine. It does a few things and calls https://github.com/skvadrik/re2c/blob/master/build/__distcheck.sh, which does the 2-stage build process (the code I linked in my previous comment). So, by the time release.sh finishes successfully I have release tarballs built for me.

This process relies on make distcheck, which is currently only available with the Autotools build system. It performs many useful checks that won't be trivial to reproduce with CMake: building a tarball, then unpacking it in a temporary directory and building again (which makes sure nothing is missing from the tarball), making the sources read-only, etc. It also builds the release tarballs (see https://www.gnu.org/software/automake/manual/html_node/Checking-the-Distribution.html for details).

The custom script wrapper adds the 2-stage process and repeats it with make (which is GNU Make on my system) and bmake (the one from FreeBSD, which lacks a lot of GNU Make functionality). This way I know that different make implementations work.

I've been relying on this process for many releases, accumulating more checks along the way, and it has found quite a few last-minute issues (like files missing from the release tarball, attempts to overwrite source files, POSIX-incompatible makefile rules, etc.).

I understand that we need CMake for Windows, but I would prefer to use the existing process for Linux and *BSD, or else add a distcheck target for CMake that would check the release tarball as thoroughly as the Autotools rule.

Would the CMake equivalent of the bootstrapping process be:

  • Do a CMake build using a -release-ootree-fast preset to generate the stage1 re2c binary
  • Do another CMake build setting RE2C_FOR_BUILD to the stage1 re2c binary, and enabling the RE2C_REBUILD_DOCS, RE2C_REBUILD_LEXERS, and RE2C_BUILD_LIBS options (rebuilding the parser and syntax files isn’t necessary?)

It's roughly the same, but it's missing many useful checks and details I explained above.

We need to set RE2C_REBUILD_SYNTAX as well. If the syntax files in include/syntax have been modified, but bootstrap files haven't been regenerated, enabling this option for make distcheck will regenerate them, package updated bootstrap files (and the updates will be committed by release script) and run the tests to detect any errors. Otherwise we might package incorrect syntax files that have diverged from the ones embedded in the binary. I forgot to add it to distcheck script, but I have done so now: 61c88fb.
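A minimal sketch of the CMake two-stage build discussed above, including RE2C_REBUILD_SYNTAX. It assumes a Linux/macOS host, the cache option names quoted in this thread, and that `cmake --install` places the stage-1 binary in `<prefix>/bin/re2c` (as in the install path shown earlier); the function names and the `DRY_RUN` switch are hypothetical. It does not replace `make distcheck`: none of the tarball, read-only-source, or multiple-make checks described above are performed.

```bash
#!/usr/bin/env bash
# Sketch of a two-stage (bootstrap) CMake build of re2c.
set -o errexit -o nounset -o pipefail

# With DRY_RUN=1, print each command instead of running it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: bootstrap_re2c <source-dir> <work-dir>
bootstrap_re2c() {
  local src="$1" work="$2"
  local stage1_prefix="$work/stage1-install"

  # Stage 1: build re2c from the pre-generated files in bootstrap/.
  run cmake -S "$src" -B "$work/stage1" -DCMAKE_BUILD_TYPE=Release
  run cmake --build "$work/stage1"
  run cmake --install "$work/stage1" --prefix "$stage1_prefix"

  # Stage 2: rebuild, letting the stage-1 binary regenerate lexers and syntax files.
  run cmake -S "$src" -B "$work/stage2" -DCMAKE_BUILD_TYPE=Release \
    -DRE2C_FOR_BUILD="$stage1_prefix/bin/re2c" \
    -DRE2C_REBUILD_LEXERS:BOOL=TRUE \
    -DRE2C_REBUILD_SYNTAX:BOOL=TRUE
  run cmake --build "$work/stage2"
}

# Example: DRY_RUN=1 bootstrap_re2c . /tmp/re2c-build
```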

@PolarGoose
Contributor

@PolarGoose If your script is to be used for building official prebuilt binaries, it should follow the 2-stage bootstrap process (first, build re2c using the bootstrap files, then rebuild it using the re2c binary built in the previous step), like in this script:

My script produces a working binary. Why do we need a 2-stage process?

@skvadrik
Owner

My script produces a working binary. Why do we need a 2-stage process?

A two-stage (or multi-stage) bootstrap process is always used to build self-hosting compilers, and re2c is a self-hosting compiler, since part of its code is written in re2c. re2c solves the bootstrap problem by providing pre-compiled files in the bootstrap/ subdirectory. When any of the *.re files change, the corresponding bootstrap files need to be updated. However, they can only be updated if a working re2c executable already exists, so this must be the executable built at stage 1 (from the old bootstrap files). Stage 2 is necessary because the bootstrap files cannot be updated during stage 1 (the re2c executable has not been built yet at that point). Note that it must not be just any re2c executable (e.g. the one that happens to be installed on the system), because that would tie the newly built re2c to the old re2c with whatever bugs it might have, and the build process must be reproducible regardless of the host system.

You might be confused since you are not trying to change any *.re files, so the bootstrap files should already be up to date. But the fact is, one of the previous commits might have updated *.re files but failed to update the corresponding bootstrap files (by mistake).

This is all very standard build practice for maintainers of self-hosting compilers, and it is a requirement on most Linux distros.
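One way a CI job could catch the mistake described above (a commit that changed *.re files without updating the bootstrap files) is to run the stage-2 build and then check that bootstrap/ has no uncommitted changes. A minimal sketch, assuming the stage-2 build writes regenerated files back into bootstrap/ in a git checkout of the source tree; the function name is hypothetical, and `git diff --quiet` only compares tracked files against the index, so newly created untracked files would not be reported.

```bash
#!/usr/bin/env bash
# Succeeds if the tracked files under bootstrap/ in the given git checkout
# match the index, i.e. regeneration (e.g. by a stage-2 build) left them unchanged.
bootstrap_is_up_to_date() {
  local repo="$1"
  git -C "$repo" diff --quiet -- bootstrap/
}

# Example, after a stage-2 build of the checkout in "$PWD":
#   bootstrap_is_up_to_date "$PWD" || { git diff --stat -- bootstrap/; exit 1; }
```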

@PolarGoose
Contributor

@skvadrik,

Thank you for the explanation.

Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants