
Configuration file #10

Closed
barneydobson opened this issue Jan 18, 2024 · 12 comments · Fixed by #83
@barneydobson (Collaborator) commented Jan 18, 2024

The intended use would be to have an initial setup config and a second config that is intended to be executed many times. The main module (swmmanywhere) will have to create a file structure that works with this. I envisage (reflected below) that there will be two folders: one for the preprocessing execution (first config file) and one for the model creation (second config). There will probably also still be a national_downloads folder.

Run (once per project) a slow-to-execute config (i.e., running downloaders and particularly slow graphfcns):

bbox:
  - minx
  - miny
  - maxx
  - maxy
project_name: my_project
run:
  base_directory_to_search: ''  # empty because this is the first time to run for a project
  base_directory_to_save_to: 'base_project_dir'
  downloaders:
  starting_graph:
    - 'street.json'
    - 'river.json'
  list_of_functions:
    - graphfcn_A
    - graphfcn_B
    - graphfcn_C

Run (thousands of times, e.g., for a sensitivity analysis):

project_name: my_project
downloaders: False
run:
  base_directory_to_search: 'folder in which output of pre_process is placed'
  base_directory_to_save_to: 'another folder'
  starting_graph: graphfcn_C.json
  list_of_functions:
    - graphfcn_D
    - graphfcn_E
    - graphfcn_F
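
For illustration only, here is a minimal sketch of how the first config above might be loaded and its function list iterated over. It assumes PyYAML is available; the file name is hypothetical:

import yaml
from pathlib import Path

def load_config(path: Path) -> dict:
    """Load a run configuration from a YAML file."""
    with path.open() as f:
        return yaml.safe_load(f)

config = load_config(Path("preprocess_config.yml"))
for fcn_name in config["run"]["list_of_functions"]:
    print(fcn_name)  # graphfcn_A, graphfcn_B, graphfcn_C
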
@barneydobson (Collaborator, Author)

Re: config files, can we keep the discussion here, please.

@cheginit
#19 (comment)

@dalonsoa
#19 (comment)

OK, I understand YAML, so I would be keen to use that unless either of you has a strong suggestion to do otherwise. The above setup is just one proposal, but I think it broadly makes sense to have a flexible setup that can iterate over a variable number of pre-processing steps before creating a graph that serves as the starting point for the config that will be run many, many times.

@cheginit (Collaborator)

I usually create my own config reader module with custom data types (dataclasses) and methods for efficient and lazy iteration over config files when there are many of them. This can be achieved with custom __getitem__ and __iter__ methods for the reader module. I think, first, we need to figure out all the user-specified parameters and come up with a data schema for efficient storage and access. So, if there are many of these parameters, I recommend creating a table of all these parameters, their definitions, units, their “themes”, and other metadata that can help us categorize them.
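
As a rough, purely illustrative sketch (class names, file layout, and fields are hypothetical), such a dataclass-based reader with lazy iteration via __getitem__ and __iter__ could look like:

from dataclasses import dataclass
from pathlib import Path

import yaml

@dataclass
class RunConfig:
    project_name: str
    list_of_functions: list[str]

class ConfigReader:
    """Lazily read a directory of YAML config files."""

    def __init__(self, config_dir: Path):
        self.paths = sorted(config_dir.glob("*.yml"))

    def __getitem__(self, i: int) -> RunConfig:
        # A config file is only parsed when it is actually requested.
        data = yaml.safe_load(self.paths[i].read_text())
        return RunConfig(**data)

    def __iter__(self):
        return (self[i] for i in range(len(self.paths)))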

@barneydobson (Collaborator, Author) commented Jan 25, 2024

https://github.com/ImperialCollegeLondon/SWMManywhere_old/blob/main/swmmanywhere/defs/parameter_defaults.yml Here is what I had been using for user parameters - I've tried to be exhaustive, but I'm sure that in polishing I would come up with more. In the config file a user could then supply values for any parameters that should not be left at their defaults.

Edit:
Oh, you probably mean parameters as in the config setup parameters (e.g., bbox) rather than the model parameters. I will have a think.

@dalonsoa (Collaborator) commented Jan 26, 2024

We might be mixing input parameters in the config file with model parameters, indeed. They are related, but not exactly the same.

Having said that, and following on from some of @cheginit's comments, the defaults you shared, @barneydobson, very much call for pydantic models. One example would be:

from pydantic import BaseModel, Field

class OutletDerivation(BaseModel):
    max_river_length: float = Field(default=30.0, ge=5, le=100, description="Distance to split rivers into segments (m).")
    river_buffer_distance: float = Field(default=150.0, ge=50, le=300, description="Buffer distance to link rivers to streets (m).")
    outlet_length: float = Field(default=40.0, ge=10, le=100, description="Length to discourage street drainage into river buffers (m).")

And everything (validation, setting defaults and documentation) comes automatically whenever this object is initialised.
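
For example (a quick sketch of the behaviour pydantic gives for free):

params = OutletDerivation()                        # all defaults
print(params.max_river_length)                     # 30.0
params = OutletDerivation(max_river_length=50.0)   # within bounds, accepted
OutletDerivation(max_river_length=500.0)           # raises ValidationError (le=100)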

@barneydobson (Collaborator, Author) commented Jan 26, 2024

OK, well, I understand what you've written there, @dalonsoa - but I don't think I understand how this is actually going to work.

A few questions:

  • Does the module with these classes become the definitive location for storing all parameters?
  • Do we think that model setup can also be handled like this? I guess that makes sense for something like the list of graphfcns. For example, I could have an InitialSetup class that holds the standard instructions for that (e.g., run downloaders, etc.), and an IterativeSetup (or some better name) that holds the latter parts? I guess it may seem like overkill for things like base_dir, but maybe it is fine.
  • I guess we add a set function for when a user wants to provide a new value? Then maybe, when a get function is called, we use the user-provided value if there is one and otherwise fall back to the default?

Edit:

  • And if a user wants to provide a file with a bunch of new values... do we still use a YAML file or something?

@barneydobson (Collaborator, Author)

Further - I guess we only need to pass the model parameters (and not the config parameters) to the graph functions. These were set up assuming that the parameters and addresses would be stored as dicts - but this seems to change that somehow?

From #8

@register_graphfcn
def generic_function(G, a, b, **kwargs):
    # create G_ from G, a, b
    return G_

for function in list_of_functions:
    G = function(G, **parameters, **addresses)

@dalonsoa (Collaborator)

It can still work like that. It could look something like this (most likely an oversimplification; I'm just using it as an illustration):

def validate_parameters(config: dict) -> dict:
    """Validate parameter configuration and fill with defaults when not present."""
    parameters = {}
    parameters["outlet_derivation"] = OutletDerivation(**config.get("outlet_derivation", {}))
    parameters["subcatchment_derivation"] = SubcatchmentDerivation(...)
    ...
    return parameters

# Then where relevant, after loading one or more configuration files into the dict
parameters = validate_parameters(config)

@register_graphfcn
def generic_function(G: graph, a: OutletDerivation, b: SubcatchmentDerivation, **kwargs) -> graph:

    # create G_ from G, a, b
    return G_

for function in list_of_functions:
    G = function(G, **parameters, **addresses)

This has the advantage that the parameters are now well-defined objects with types, not a collection of entries in a dictionary, so it is easier for the type checkers to figure out whether things are correct, for you to receive suggestions while coding, etc.

@barneydobson (Collaborator, Author) commented Jan 26, 2024

OK, that is very helpful. I don't understand how we're going to handle addresses yet - presumably this is linked to #9.

But a functional mockup based on what you've got there would be as follows:

# This would be in graph_utilities.py
graphfcns = {}

def register_graphfcn(func):
    # Add the function to the registry
    graphfcns[func.__name__] = func
    return func

@register_graphfcn
def double_directed(G, river_buffer_distance, **kwargs):
    print(river_buffer_distance)
    return G

@register_graphfcn
def split_long_edges(G, max_river_length, **kwargs):
    print(max_river_length)
    return G

# This would be in parameters.py
from pydantic import BaseModel, Field
class OutletDerivation(BaseModel):
    max_river_length: float = Field(default=30.0, ge=5, le=100, description="Distance to split rivers into segments (m).")
    river_buffer_distance: float = Field(default=150.0, ge=50, le=300, description="Buffer distance to link rivers to streets (m).")
    outlet_length: float = Field(default=40.0, ge=10, le=100, description="Length to discourage street drainage into river buffers (m).")

class SubcatchmentDerivation(BaseModel):
    subcatchment_buffer_distance: float = Field(default=150.0, ge=50, le=300, description="Buffer distance to link subcatchments to streets (m).")
    outlet_length: float = Field(default=40.0, ge=10, le=100, description="Length to discourage street drainage into subcatchment buffers (m).")

def validate_parameters(config: dict) -> dict:
    """Validate parameter configuration and fill with defaults when not present."""
    parameters = {}
    parameters["outlet_derivation"] = OutletDerivation(**config.get("outlet_derivation", {}))
    parameters["subcatchment_derivation"] = SubcatchmentDerivation(**config.get("subcatchment_derivation", {}))
    parameters_only = {}
    for value in parameters.values():
        for key_, value_ in dict(value).items():
            parameters_only[key_] = value_
    return parameters_only


# This kind of behaviour would be in generate_network.py
import networkx as nx

custom_parameters = {'outlet_derivation': {'max_river_length': 50.0}}
params = validate_parameters(custom_parameters)

G = nx.Graph()
function_list = ['double_directed', 'split_long_edges']
for function in function_list:
    assert function in graphfcns, f"Function {function} not registered in graphfcns."
    G = graphfcns[function](G, **params)

Note the 'unpacking' of parameters out of their subcategory. Happy to discuss whether this is a good idea or not - it is just that this is how it works in the existing code.

@dalonsoa (Collaborator)

You can directly use the following to unpack the fields:

    for value in parameters.values():
        parameters_only.update(value.model_dump())

Having said that, I think it is a pity not to use the fully formed objects directly, even if a particular graph function only uses one of the fields of that higher-level object. It will give some reassurance, thanks to the type checkers, that you're doing the right thing. The functions would look something like this:

@register_graphfcn
def double_directed(G, outlet_derivation: OutletDerivation, **kwargs):
    print(outlet_derivation.river_buffer_distance)
    return G

@register_graphfcn
def split_long_edges(G, outlet_derivation: OutletDerivation, **kwargs):
    print(outlet_derivation.max_river_length)
    return G

@barneydobson (Collaborator, Author)

Note from #31 - the 'default loadout' of graphfcn order could live in a default config file or a default base model class. Many pros and cons are discussed in the linked PR.
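
For the base model option, a minimal sketch (the class and field names here are hypothetical) could be:

from pydantic import BaseModel, Field

class RunSettings(BaseModel):
    graphfcn_list: list[str] = Field(
        default=["double_directed", "split_long_edges"],
        description="Default order in which graph functions are applied.",
    )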

@cheginit (Collaborator)

Since you're going with pydantic, which I think is a good decision, I also have a suggestion as an alternative. There's a package called Param that is similar to pydantic but more tailored toward validating model-like parameters. I think it's worth exploring, since it requires less code than pydantic, doesn't have any dependencies, and has some neat built-in functionality for numerical parameters.
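
For comparison, an untested sketch of what the earlier OutletDerivation example might look like with Param:

import param

class OutletDerivation(param.Parameterized):
    max_river_length = param.Number(
        30.0, bounds=(5, 100),
        doc="Distance to split rivers into segments (m).")
    river_buffer_distance = param.Number(
        150.0, bounds=(50, 300),
        doc="Buffer distance to link rivers to streets (m).")
    outlet_length = param.Number(
        40.0, bounds=(10, 100),
        doc="Length to discourage street drainage into river buffers (m).")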

@barneydobson (Collaborator, Author)

Also linked with this are:
