ci: update repo structure and CI/CD for v2 #178
Conversation
Force-pushed the mb/update_repo_structure_for_v2 branch from 589fdda to 2521fd1.
Some small things, otherwise looks fine.
Also, I cannot add Rasmus and Peter as reviewers since I'm not part of chcaa. Are you able to make us admins?
I'm going on a holiday for a week today, so can't answer any follow-ups until I'm back.
@KennethEnevoldsen Are we up for trying out codecov? :-)
I think it looks good, but I would add more documentation on how to e.g. run the make commands.
@saattrupdan does everyone use vs-code?
Devcontainers are supported by nvim as well (https://github.com/esensar/nvim-dev-container), or through a CLI: https://code.visualstudio.com/docs/devcontainers/devcontainer-cli
@MartinBernstorff I just want to make sure that the documentation of the setup is clear regardless of the IDE.
Agreed. We can update based on what the others say re: requirements.txt vs. pyproject.toml. Ideally, we should have only one entrypoint for a full dev env setup, e.g. a dev container.
One perspective from my side: most things we make here will have to run on some sort of HPC infrastructure. In order to have access to a decent software stack, we will have to use either a container or the module system. On LUMI it's definitely recommended to use a container. The one I have currently based everything on is https://hub.docker.com/r/rocm/pytorch/ with a ROCm 5.4.2 tag.

Since all the packages we're going to compile/install will depend on the software stack in the container, we should probably make this explicit in this repo. For example, it means that the Python version is going to depend on the container, so it should be tied to that rather than some other place in the repo?

With that said, my current approach is to create a virtualenv on the host that then gets activated in the container, so packages can be upgraded without rebuilding the container. Quite a few packages nonetheless have to be compiled against a specific ROCm stack, so the venv is fully tied to the container and can only be used in it.
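A minimal sketch of that host-venv-in-container workflow, assuming a Singularity image built from the rocm/pytorch base; the image name, paths, and training script are hypothetical, not what the repo actually uses:

```sh
# Create the venv with the container's Python so it links against the
# container's ROCm stack (image name and paths are hypothetical).
singularity exec --bind /project rocm-pytorch.sif \
    python3 -m venv /project/venvs/dfm

# Packages can now be upgraded without rebuilding the container:
singularity exec --bind /project rocm-pytorch.sif \
    /project/venvs/dfm/bin/pip install -r requirements.txt

# Jobs activate the venv inside the container; it won't work outside of it.
singularity exec --bind /project rocm-pytorch.sif \
    bash -c "source /project/venvs/dfm/bin/activate && python train.py"
```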
@rlrs why not rocm5.7_ubuntu22.04_py3.10_pytorch_2.0.1?
Great points @rlrs! Just a few clarifying questions:
I'm not sure what you're referring to with "module system" here. Also, is this an either/or?
Completely agree. One source of truth ftw. If the others agree, I'll remove as many of the python versions specified in the pyproject.toml as possible.
You can also connect to the container and update the dependencies within it. This removes the need for a virtualenv, and ensures that the deps are on the same filesystem. Especially on macOS/Windows, it should give quite a performance boost compared to bind-mounts :-)
Another argument for installing deps within the container. If we end up needing a more complicated container stack, e.g. one for ROCm and one for CUDA, this setup should support it. But let's cross that bridge when we get to it?
It didn't exist when I started, and I had trouble with the newest one at that point (with 5.6). I can look into it again at some point, but I mostly used 5.4.2 because I knew people had that working. Not sure it makes much difference for now, anyway. Also, torch 2.0.1 is broken(!) in that it doesn't support some basic ops, so we need to upgrade to 2.1 anyway. I'm also unsure whether some other packages I've been using support Python 3.10; I haven't looked into it.
LUMI has a module system which allows you to build/load custom software packages. But I definitely think containers are the nicer way to do it.
I don't think Singularity (the container system on LUMI) is quite that flexible. It's not like Docker, and you even need root for some of the features, which we can't have. But maybe it's possible and I just don't know how. If you are referring to "persistent overlays", I think I tried that, but it requires root.
Looks fine. I am not sure containers run well on multi-node Slurm systems, but @rlrs will know better.
Ah, I see! I'm not into the whole LUMI terminology. Sounds like getting started with a simple dev container, and then branching out to separate containers if required, is the way to go. We can always use a multi-stage Dockerfile to make them composable.
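As a rough illustration of the multi-stage idea (a sketch only: the stage layout, base images, and build args are placeholders, not something already in the repo):

```dockerfile
# Build either variant from the same file, e.g.:
#   docker build --target rocm --build-arg ROCM_IMAGE=rocm/pytorch:<tag> .
ARG ROCM_IMAGE=rocm/pytorch:latest
ARG CUDA_IMAGE=pytorch/pytorch:latest

# Shared stage: just the project source, no hardware-specific deps.
FROM python:3.10-slim AS source
WORKDIR /app
COPY . .

# ROCm variant: deps are installed against the ROCm stack in the base image.
FROM ${ROCM_IMAGE} AS rocm
COPY --from=source /app /app
RUN pip install --no-cache-dir -r /app/requirements.txt

# CUDA variant: same pattern, different base image.
FROM ${CUDA_IMAGE} AS cuda
COPY --from=source /app /app
RUN pip install --no-cache-dir -r /app/requirements.txt
```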
Singularity can't even run Docker containers; it's another format. But let's get this merged and we can look at it later. It seems like a separate issue that we don't need to solve now.
Merged centre-for-humanities-computing/danish-foundation-models into mb/update_repo_structure_for_v2.
General considerations
I've made a bunch of decisions below, and some of them are based on taste.
By far the most important criteria are:
So definitely raise any concerns you have!
I also like to geek out on this, so would love to have a talk about it ;-)
Static type checking
Pyright. Its primary advantage over mypy is that mypy infers "Any" for any type that is not hinted, whereas Pyright tracks these as its own "Unknown" type. For more, see https://github.com/microsoft/pyright/blob/main/docs/mypy-comparison.md
I've set it to the "strict" setting, which is (unsurprisingly) quite strict. This meant the existing code had ~400 errors.
"Strict" matches my preference, but it is definitely a strong requirement for developers. I'm super open to opinions here.
Archiving
Instead of trying to fix all of the errors, I spoke with Lasse Hansen, and we agreed to move the existing code to an "archive" folder; we can then extract functionality into the src folder as we need it.
Dev env
Since we're writing an application, rather than a library with many users, we can take a bunch of liberties that make maintenance easier. For example:
I've specified how to spin up the dev environment in the readme; the dev container also installs relevant VSCode extensions.
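For context, a dev container that installs editor extensions is configured roughly like this (a minimal devcontainer.json sketch; the image, extension list, and install command are assumptions, not the repo's actual file):

```jsonc
{
  "name": "danish-foundation-models",
  // Hypothetical base image; could equally point at a Dockerfile in the repo.
  "image": "mcr.microsoft.com/devcontainers/python:3.10",
  "customizations": {
    "vscode": {
      // Extensions installed automatically when the container is opened.
      "extensions": ["ms-python.python", "charliermarsh.ruff"]
    }
  },
  // Hypothetical make target that installs the requirements files.
  "postCreateCommand": "make install"
}
```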
It also allows us to use the dev container for CI, ensuring complete alignment between local and CI environments. It currently takes 1.5 minutes to build, but that'll be cached the first time we push to main.
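Reusing the dev container in CI can be done with the devcontainers/ci action, roughly like this (a sketch; the image name, registry, and test command are assumptions, and the action inputs should be checked against its docs):

```yaml
name: tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Builds (or pulls the cached) dev container and runs the command inside it.
      - uses: devcontainers/ci@v0.3
        with:
          imageName: ghcr.io/centre-for-humanities-computing/dfm-devcontainer  # hypothetical
          cacheFrom: ghcr.io/centre-for-humanities-computing/dfm-devcontainer  # hypothetical
          push: never
          runCmd: make test  # hypothetical make target
```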
Classical installation is still possible; all you have to set up are the requirements files and the package itself.
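Assuming the requirements files described above and an editable install of the package (file names are assumptions), that boils down to something like:

```sh
pip install -r requirements.txt -r requirements-dev.txt
pip install --editable .
```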
Formatting
Ruff and black. Nothing wild here.
Tests
Pytest. Again, nothing wild. I'll argue for including codecov as well, not because we should follow it religiously, but because it often flags code that I forgot to test sufficiently.
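If we do adopt it, a lenient codecov.yml along these lines would keep it informational rather than blocking (a sketch; the thresholds are placeholders):

```yaml
coverage:
  status:
    project:
      default:
        target: auto      # compare against the base branch
        threshold: 2%     # allow small drops without failing the check
    patch:
      default:
        informational: true  # report on new code, but never block the PR
```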
Dependency management
KISS: requirements.txt files.
They're split into a few files to get more efficient caching when building the Docker image.
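The caching win comes from copying the requirements files before the rest of the source, roughly like this (file names and base image are assumptions):

```dockerfile
FROM python:3.10-slim
WORKDIR /app

# This layer is only rebuilt when the pinned dependencies change.
COPY requirements.txt requirements-dev.txt ./
RUN pip install --no-cache-dir -r requirements.txt -r requirements-dev.txt

# Copying the source last means code changes don't invalidate the layer above.
COPY . .
RUN pip install --no-cache-dir --no-deps --editable .
```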
@KennethEnevoldsen, @saattrupdan, could you ping Rasmus and Peter as well?
Changes needed:
- Install
- entrypoint