Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ci: update repo structure and CI/CD for v2 #178

Merged
merged 35 commits into from
Oct 17, 2023

Conversation

MartinBernstorff
Copy link
Contributor

@MartinBernstorff MartinBernstorff commented Oct 12, 2023

General considerations

I've made a bunch of decisions below, and some of them are based on taste.

By far the most important criteria are:

  • High developer enjoyment
  • High productivity

So definitely raise any concerns you have!

I also like to geek out on this, so would love to have a talk about it ;-)

Static type checking

Pyright. Primary advantage over mypy is that mypy infers any type that is not hinted as "Any", whereas Pyright defines its own "Unknown" type. For more, see https://github.com/microsoft/pyright/blob/main/docs/mypy-comparison.md

I've set it to the "strict" setting, which is (unsurprisingly) quite strict. This meant the existing code had ~400 errors.

"Strict" matches my preference, but it is definitely a strong requirement for developers. I'm super open to opinions here.

Archiving

Instead of trying to fix all of the errors, I spoke with Lasse Hansen, and we agreed to move it to an "archive" folder, and then we can extract functionality into the src folder as we need it.

Dev env

Since we're writing an application, rather than a library with many users, we can take a bunch of liberties that makes maintenance easier. For example:

  • Only one Python version (3.10 for now)
  • Pin all dependencies to one version
  • Specify the entire development environment in a dev container. This ensures there is only one source of truth.

I've specified how to spin up the dev environment in the readme. This also installs relevant VSCode extensions.

It also allows us to use the dev container for CI, ensuring complete alignment between local and CI environments. It currently takes 1,5 minutes to build, but that'll be cached when we push to main the first time.

Classical installation is still possible; all you have to set up are the requirements-files and the package itself.

Formatting

Ruff and black. Nothing wild here.

Tests

Pytest. Again, nothing wild. I'll argue for including codecov as well; not because we should follow it religiously, but because it often flags code that I forgot to test sufficiently.

Dependency management

KISS: requirements.txt files.

Split out into a few to get more efficient caching when building the docker image.

@KennethEnevoldsen, @saattrupdan, could you ping Rasmus and Peter as well?

Changes needed:

  • Add an Install entrypoint
  •  Add a "open in dev container" button
  • Migrate dependencies to pyproject.toml

@MartinBernstorff MartinBernstorff force-pushed the mb/update_repo_structure_for_v2 branch from 589fdda to 2521fd1 Compare October 12, 2023 12:10
Copy link
Collaborator

@saattrupdan saattrupdan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small things, otherwise looks fine.

Also, I cannot add Rasmus and Peter as reviewers since I'm not part of chcaa. Are you able to make us admins?

I'm going on a holiday for a week today, so can't answer any follow-ups until I'm back.

archive_v1/requirements.txt Outdated Show resolved Hide resolved
archive_v1/requirements.txt Outdated Show resolved Hide resolved
archive_v1/requirements.txt Outdated Show resolved Hide resolved
@MartinBernstorff
Copy link
Contributor Author

@KennethEnevoldsen Are we up for trying out codecov? :-)

Copy link
Contributor

@KennethEnevoldsen KennethEnevoldsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it looks good, but I would add more documentation on how to e.g. run the make commands.

@saattrupdan does everyone use vs-code?

requirements.txt Outdated Show resolved Hide resolved
archive_v1/requirements.txt Show resolved Hide resolved
@MartinBernstorff
Copy link
Contributor Author

Devcontainers are supported by nvim as well: https://github.com/esensar/nvim-dev-container

Or through a CLI: https://code.visualstudio.com/docs/devcontainers/devcontainer-cli

@KennethEnevoldsen
Copy link
Contributor

@MartinBernstorff I just want to make sure that the documentation of the setup is clear regardless of the IDE.

@MartinBernstorff
Copy link
Contributor Author

@MartinBernstorff I just want to make sure that the documentation of the setup is clear regardless of the IDE.

Agreed. We can update based on what the others say re: requirements.txt vs. pyproject.toml. Ideally, we should have only one entrypoint for a full dev env setup, e.g. dev container that uses make install or similar.

@rlrs
Copy link
Collaborator

rlrs commented Oct 15, 2023

One perspective from my side, is that most things we make here will have to run on some sort of HPC infrastructure. In order to have access to a decent software stack, we will have to use either a container or the module system. On LUMI it's definitely recommended to use a container. The one I have currently based everything on is https://hub.docker.com/r/rocm/pytorch/ with the tag rocm5.4.2_ubuntu20.04_py3.8_pytorch_2.0.0_preview.

Since all the packages we're going to compile/install will depend on the software stack in the container, we should probably make this explicit in this repo. It for example means that the Python version is going to depend on the container, so it should be tied to that rather than some other place in the repo?

With that said, my current approach is to create a virtualenv on the host that then gets activated in the container, so you can upgrade packages without rebuilding the container, but quite a few packages nonetheless have to be compiled against a specific ROCm stack, so the venv is fully tied to the container and can only be used in it.

@KennethEnevoldsen
Copy link
Contributor

@rlrs hvorfor ikke rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1?

@MartinBernstorff
Copy link
Contributor Author

Great points @rlrs! Just a few clarifying questions:

we will have to use either a container or the module system

Unsure what you're pointing towards when saying "module system" here. Also, is this an either/or?

It for example means that the Python version is going to depend on the container, so it should be tied to that rather than some other place in the repo?

Completely agree. One source of truth ftw. If the others agree, I'll remove as many of the python versions specified in the pyproject.toml as possible.

With that said, my current approach is to create a virtualenv on the host that then gets activated in the container, so you can upgrade packages without rebuilding the container

You can also connect to the container and update the dependencies within it. This removes the need for a virtualenv, and ensures that the deps are on the same filesystem. Especially on macOS/Windows, it should give quite a performance boost compared to bind-mounts :-)

quite a few packages nonetheless have to be compiled against a specific ROCm stack

Another argument for installing deps within the container.

If we end up needing a more complicated container stack, e.g. one for ROCm and one for CUDA, this setup should support it. But let's cross that bridge when we get to it?

@rlrs
Copy link
Collaborator

rlrs commented Oct 16, 2023

@rlrs hvorfor ikke rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1?

Didn't exist when I started, and I had trouble with the newest one at that point (with 5.6). Can look into it again at some point, but I mostly used 5.4.2 because I knew people had that working. Not sure it makes much difference for now, anyways. Also torch 2.0.1 is broken(!) in that it doesn't support some basic ops, so we anyways need to upgrade to 2.1. I'm also unsure if some other packages I've been using support python 3.10, have not looked into it.

Unsure what you're pointing towards when saying "module system" here. Also, is this an either/or?

LUMI has a module system which allows you to build/load custom software packages. But I definitely think containers are the nicer way to do it.

You can also connect to the container and update the dependencies within it. This removes the need for a virtualenv, and ensures that the deps are on the same filesystem. Especially on macOS/Windows, it should give quite a performance boost compared to bind-mounts :-)

I don't think Singularity (the container system on LUMI) is quite that flexible. It's not like Docker and you even need root for some of the features, which we cannot have. But maybe it's possible and I just don't know how. If you are referring to "persistent overlays", I think I tried that but it requires root.

Copy link
Contributor

@peterbjorgensen peterbjorgensen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fine. I am not sure containers run well on multi-node-slurm systems, but @rlrs will now better.

@MartinBernstorff
Copy link
Contributor Author

I don't think Singularity (the container system on LUMI) is quite that flexible. It's not like Docker and you even need root for some of the features, which we cannot have. But maybe it's possible and I just don't know how. If you are referring to "persistent overlays", I think I tried that but it requires root.

Ah, I see! I'm not into the whole LUMI-terminology.

Sounds like getting started with a simple dev-container, and then branching out to separate containers if required, is the way to go. We can always use a multi-stage build Dockerfile to make them composable.

@rlrs
Copy link
Collaborator

rlrs commented Oct 17, 2023

Singularity can't even run Docker containers - it's another format. But let's get this merged and we can look at it later, seems like a separate issue that we don't need to solve now.

@MartinBernstorff MartinBernstorff merged commit 7604a23 into main Oct 17, 2023
1 check passed
@MartinBernstorff MartinBernstorff deleted the mb/update_repo_structure_for_v2 branch October 17, 2023 11:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants