Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

machines.py: KeyError in user_config[machine_name] #305

Open
Thinkpiet opened this issue Nov 6, 2024 · 1 comment
Open

machines.py: KeyError in user_config[machine_name] #305

Thinkpiet opened this issue Nov 6, 2024 · 1 comment

Comments

@Thinkpiet
Copy link

Hi,

with a fresh FabSim3 install, if I execute fabsim hsuper stat then I get this error:

Traceback (most recent call last):
  File "/home/piet/repos/FabSim3/fabsim/bin/fabsim", line 46, in <module>
    sys.exit(fabsim_main.main())
             ^^^^^^^^^^^^^^^^^^
  File "/home/piet/repos/FabSim3/fabsim/base/fabsim_main.py", line 156, in main
    load_machine(env.host)
  File "<@beartype(fabsim.deploy.machines.load_machine) at 0x7ad61790b920>", line 32, in load_machine
  File "/home/piet/repos/FabSim3/fabsim/deploy/machines.py", line 139, in load_machine
    env.modules.update(user_config[machine_name].get("modules", {}))
                       ~~~~~~~~~~~^^^^^^^^^^^^^^
KeyError: 'hsuper'

This makes sense because I did not configure the machine in my fabsim/deploy/machines_user.yml. However, since the machine is listed in fabsim/deploy/machines.yml and also is in the output of fabsim -l machines, I think that this exception should be caught and that there should be a reasonable error message, instead of FabSim just crashing here. For example, for fabsim doesnotexist stat I get this instead

ValueError: The requested remote machine "doesnotexist" did not listed in the machines.yml file, so it can not be used as a target remote host.

The available remote machines are : 
dict_keys(['localhost', 'cartesius', 'prometheus', 'cirrus', 'supermuc2', 'ohm', 'Kathleen', 'myriad', 'bluejoule', 'bluewonder2', 'oppenheimer', 'qcg', 'eagle_vecma', 'eagle_hidalgo', 'eagle_seavea', 'supermuc_vecma', 'training_hidalgo', 'archer2', 'marenostrum4', 'piz_daint', 'monsoonfab', 'hsuper'])

which is a lot better. Thus, I would expect a similar error message here, that asks me to update fabsim/deploy/machines_user.yml.

Best, Piet

Platform details (optional)

  • OS: Arch Linux
  • FabSim3 commit 20c62f5
  • Python version 3.12.6
@mzrghorbani
Copy link
Collaborator

Hi team,

I’ve updated the load_machine function to handle cases where a machine is defined in machines.yml but lacks configuration in machines_user.yml. Below is the modified function with comments explaining each part of the code.

def load_machine(machine_name: str) -> None:
    """
    Load the machine-specific configurations.
    Completes additional paths and interpolates them, via
    `complete_environment`.
    """
    # Check if the machine exists in config before proceeding
    if machine_name not in config:
        raise ValueError(
            f'The requested remote machine "{machine_name}" is not listed in `machines.yml`, '
            "so it cannot be used as a target remote host.\n"
            f"Available remote machines are: {list(config.keys())}"
        )

    # If the machine exists in `config`, proceed with loading
    if "import" in config[machine_name]:
        # Config for this machine is based on another
        env.update(config[config[machine_name]["import"]])

        if config[machine_name]["import"] in user_config:
            env.update(user_config[config[machine_name]["import"]])

    env.update(config[machine_name])

    # Check if the machine exists in `user_config`
    if machine_name in user_config:
        env.update(user_config[machine_name])
    else:
        # Raise a descriptive error if it's missing in `machines_user.yml`
        raise ValueError(
            f'The machine "{machine_name}" is listed in `machines.yml` but is not configured in `machines_user.yml`. '
            "Please add it there to use it as a target remote host."
        )

    env.machine_name = machine_name

    # Construct modules environment: update, not replace when overrides are done.
    env.modules = config["default"]["modules"]
    if "import" in config[machine_name]:
        env.modules.update(config[config[machine_name]["import"]].modules)
    env.modules.update(config[machine_name].get("modules", {}))

    if "import" in config[machine_name]:
        if config[machine_name]["import"] in user_config:
            env.modules.update(
                user_config[config[machine_name]["import"]].modules
            )

    # Add modules from user_config if available
    if machine_name in user_config:
        env.modules.update(user_config[machine_name].get("modules", {}))

    complete_environment()

Summary of Changes
Machine Existence Check in config: Before loading configurations, the function checks if the specified machine (machine_name) exists in config. If not, it raises a ValueError and lists available machines.

Error Message for Missing user_config Entry: If the machine exists in machines.yml but is missing in machines_user.yml, the function raises a ValueError instructing the user to add the machine configuration to machines_user.yml.

Modular Environment Construction: The function updates the env.modules settings without replacing overrides, ensuring any imported configurations are properly included.

I’ve tested these changes locally, and they effectively prevent crashes from missing machine configurations.

Would appreciate any feedback or additional testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants