LUMI scripts - Mosaic/llm-foundry #190

rlrs · 2023-11-16T12:31:16Z

(Continued) Pretraining setup for LUMI. These scripts now target a mosaicml/llm-foundry stack. Everything should work.

rlrs · 2023-11-16T12:33:04Z

And yes, I know that there's a Hugging Face token in there. It's invalid, but we need to find a better way to manage that.
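
One common way to keep a token out of the repository is to read it from an environment variable at run time. The following is only a minimal sketch of that idea, not what this PR does: `get_hf_token` is a hypothetical helper, and it assumes the token is exported as `HF_TOKEN` (or the older `HUGGING_FACE_HUB_TOKEN`) in the job environment.

```python
import os


def get_hf_token(env=None):
    """Return the Hugging Face token from the environment, or raise if unset.

    Hypothetical helper for illustration; not part of this PR.
    """
    env = os.environ if env is None else env
    # Check the newer variable name first, then the older one.
    for name in ("HF_TOKEN", "HUGGING_FACE_HUB_TOKEN"):
        token = env.get(name, "").strip()
        if token:
            return token
    raise RuntimeError(
        "No Hugging Face token found; export HF_TOKEN before launching the job."
    )
```

With something like this, the scripts would only reference the variable name, and the token value itself would live outside version control.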

github-actions · 2023-12-20T13:05:29Z

This PR is stale because it has been open 1+ days with no activity. Feel free to either 1) remove the stale label or 2) comment. If nothing happens, this will be closed in 7 days.

rlrs · 2023-12-20T13:40:42Z

Please don't close my draft PR because it's not active over xmas 😅

rlrs · 2023-12-20T14:29:12Z

Okay, so this whole thing works now. There are a few unused scripts containing some of my other attempts at setting things up on LUMI - I should perhaps move these somewhere else to keep things clean.

The important files are:

make_venv.sh - creates the Python venv that all the nodes use to run the pretraining code.
continue_mistral_mosaic.sh - this is the SLURM sbatch script that describes how many nodes to run on etc., and launches the following script in the correct Singularity container on each node.
mosaic_in_container.sh - this script is run inside the container on each node; it simply sets up a few things before running the given training command.
continue-mistral-7b.yaml - configuration file that describes which model to train, which hyperparams, which data, which evals etc.

Additionally, I've added two submodules (so you have to clone with --recurse-submodules) since these are core dependencies that we need to keep track of, and perhaps we should pin them to a certain commit instead of the head of a branch.
Everything else is unused in the current setup.
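
Since a clone without --recurse-submodules leaves the submodule directories empty, a quick check before building the venv can save some confusion. This is a rough sketch, assuming the documented prefix characters of `git submodule status` (a leading `-` for an uninitialized submodule, `+` for a checked-out commit that differs from the one recorded in the index, `U` for merge conflicts). The function names are hypothetical, and paths containing spaces are not handled.

```python
import subprocess

# Prefix characters documented for `git submodule status`.
STATES = {
    " ": "ok",
    "-": "not initialized",
    "+": "commit differs from index",
    "U": "merge conflict",
}


def parse_submodule_status(text):
    """Parse `git submodule status` output into (path, sha, state) tuples."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        state = STATES.get(line[0], "unknown")
        # After the prefix character: "<sha> <path> [(<describe>)]".
        fields = line[1:].split()
        sha, path = fields[0], fields[1]
        entries.append((path, sha, state))
    return entries


def find_problem_submodules(repo_dir="."):
    """Run git in repo_dir and return the submodules whose state is not 'ok'."""
    result = subprocess.run(
        ["git", "submodule", "status"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
        text=True,
    )
    return [entry for entry in parse_submodule_status(result.stdout) if entry[2] != "ok"]
```

If `find_problem_submodules()` reports anything as not initialized, running `git submodule update --init --recursive` in the repository should fetch the missing submodules.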

KennethEnevoldsen

Looks good but have a few questions. Will try to make it run on LUMI as well (after Christmas) and that might lead to more questions, but that does not need to hold the PR back.

src/dfm/projects/production/model_training/README.md

src/dfm/projects/production/model_training/scripts/continue_mistral.py

src/dfm/projects/production/model_training/scripts/continue_mistral.sh

src/dfm/projects/production/model_training/scripts/pretrain_llama2.sh

src/dfm/projects/production/model_training/scripts/yamls/continue-mistral-7b.yaml

KennethEnevoldsen · 2023-12-21T10:12:46Z

> Okay, so this whole thing works now. There are a few unused scripts containing some of my other attempts at setting things up on LUMI - I should perhaps move these somewhere else to keep things clean.

Feel free to keep anything that is still being worked on, but anything that could be recovered from the history (if needed) might as well be deleted.

> perhaps we should pin them to a certain commit instead of the head of a branch.

Definitely pin them.

github-actions · 2023-12-26T13:05:59Z

This PR is stale because it has been open 1+ days with no activity. Feel free to either 1) remove the stale label or 2) comment. If nothing happens, this will be closed in 7 days.

KennethEnevoldsen · 2023-12-29T10:08:40Z

@rlrs I will remove the stale label (this gives it another 7 days), as I assume you might be on vacation.

github-actions · 2024-01-02T13:06:03Z

This PR is stale because it has been open 1+ days with no activity. Feel free to either 1) remove the stale label or 2) comment. If nothing happens, this will be closed in 7 days.

KennethEnevoldsen

Looks good!

scripts/data/convert_dataset_json.py

github-actions · 2024-01-09T13:09:34Z

This PR is stale because it has been open 1+ days with no activity. Feel free to either 1) remove the stale label or 2) comment. If nothing happens, this will be closed in 7 days.

rlrs added 5 commits October 20, 2023 13:52

initial scripts for lumi training

e73f3e8

add requirements for HF 5.7 env

166da6a

add LUMI readme

7d239be

Merge remote-tracking branch 'origin/main' into lumi

75526b9

working LUMI scripts

124e67a

working scripts for multinode training

02c886c

github-actions bot added the Stale label Dec 20, 2023

rlrs removed the Stale label Dec 20, 2023

update submodules and venv

844e2ae

rlrs marked this pull request as ready for review December 20, 2023 14:34

rlrs requested a review from KennethEnevoldsen December 20, 2023 14:36

Update README.md

fe0bb46

KennethEnevoldsen reviewed Dec 21, 2023

github-actions bot added the Stale label Dec 26, 2023

KennethEnevoldsen removed the Stale label Dec 29, 2023

github-actions bot added the Stale label Jan 2, 2024

rlrs added 3 commits January 2, 2024 14:16

add modified speedy data tokenization script

c14a9e5

refactor, generate splits

62d9c79

update readme

609824a

github-actions bot removed the Stale label Jan 3, 2024

rlrs added 2 commits January 3, 2024 20:57

fix a dataset conversion error

fac0119

remove unused scripts

59ff133

rlrs added 3 commits January 4, 2024 14:27

move scripts

a026d6e

update scripts

68022d2

update lumi readme

17f2ac2

rlrs requested review from peter-sk and KennethEnevoldsen January 4, 2024 13:50

rlrs added 3 commits January 4, 2024 14:51

update continue script

53236d3

composer/

030cfc5

Merge branch 'main' into lumi

d60d128

KennethEnevoldsen approved these changes Jan 4, 2024

scripts/data/convert_dataset_json.py (outdated, resolved)

github-actions bot added the Stale label Jan 9, 2024

rlrs added 2 commits January 15, 2024 13:55

Merge branch 'main' into lumi

1d32dbd

Update and rename convert_dataset_json.py to jsonl_to_mds.py

1f25b55

rlrs enabled auto-merge January 15, 2024 13:05

rlrs merged commit 457f847 into main Jan 15, 2024
1 check passed

rlrs deleted the lumi branch January 15, 2024 13:06
