Configuration file #10
Comments
Re config files: can we keep the discussion here, please? I understand YAML, so I would be keen to use that unless either of you strongly prefers something else. The above setup is just one proposal, but in broad terms does it make sense to have a flexible setup that can iterate over a variable number of pre-processing steps before creating a graph, which is then the starting point for the config that will be run many, many times? |
I usually create my own config reader module with custom data types (dataclasses) and methods for efficient and lazy iteration over config files when there are many of them. This can be achieved with custom |
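A minimal sketch of what such a reader could look like, assuming one config file per run in a single folder. `RunConfig`, its fields and `iter_configs` are hypothetical names, not existing code, and JSON is used only to keep the example free of dependencies; a YAML loader such as `yaml.safe_load` could be passed as `load` instead.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Callable, Iterator


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container for the contents of one configuration file."""

    name: str
    graphfcn_list: list[str] = field(default_factory=list)
    parameters: dict[str, Any] = field(default_factory=dict)


def iter_configs(
    folder: Path,
    pattern: str = "*.json",
    load: Callable[[str], dict] = json.loads,
) -> Iterator[RunConfig]:
    """Yield one RunConfig at a time, so each file is read only when needed."""
    for path in sorted(folder.glob(pattern)):
        raw = load(path.read_text())
        yield RunConfig(name=path.stem, **raw)
```

Because `iter_configs` is a generator, looping over thousands of config files never holds more than one parsed file in memory at a time.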
https://github.com/ImperialCollegeLondon/SWMManywhere_old/blob/main/swmmanywhere/defs/parameter_defaults.yml Here is what I had been using for user parameters. I've tried to be exhaustive, but I'm sure that in polishing I would come up with more parameters. In the config file, a user could then supply only the parameters that differ from their default values. Edit: |
We might indeed be mixing input parameters in the config file with model parameters. They are related, but not exactly the same. Having said that, and following on from some of @cheginit's comments, the defaults you shared, @barneydobson, very much call for pydantic:

from pydantic import BaseModel, Field

class OutletDerivation(BaseModel):
    max_river_length: float = Field(default=30.0, ge=5, le=100, description="Distance to split rivers into segments (m).")
    river_buffer_distance: float = Field(default=150.0, ge=50, le=300, description="Buffer distance to link rivers to streets (m).")
    outlet_length: float = Field(default=40.0, ge=10, le=100, description="Length to discourage street drainage into river buffers (m).")

And everything (validation, setting defaults and documentation) comes automatically whenever this object is initialised. |
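To illustrate what "comes automatically" means in practice, here is a short sketch that reuses the `OutletDerivation` model, trimmed to a single field for brevity. It assumes pydantic is installed; `try_value` is a helper made up for this example.

```python
from pydantic import BaseModel, Field, ValidationError


class OutletDerivation(BaseModel):
    max_river_length: float = Field(
        default=30.0, ge=5, le=100,
        description="Distance to split rivers into segments (m).",
    )


def try_value(value: float) -> bool:
    """Return True if `value` passes the ge/le bounds, False otherwise."""
    try:
        OutletDerivation(max_river_length=value)
    except ValidationError:
        return False
    return True


# No arguments: the default from Field(...) is used.
defaults = OutletDerivation()

# A user override that lies within the allowed bounds.
custom = OutletDerivation(max_river_length=50.0)
```

A value outside the `ge`/`le` range, such as 500, raises a `ValidationError` at construction time, so a bad config entry is caught before any graph function runs.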
OK, I understand what you've written there, @dalonsoa, but I don't think I understand how this is actually going to work. A few questions:
Edit:
|
Further - I guess we only need to pass the model parameters (and not the config parameters) to the graph functions. These were set up on the assumption that the parameters and addresses would be stored as dicts, but this seems to change that somehow? From #8
|
It can still work like that. It could look something like this (most likely an oversimplification, I'm just using it as an illustration):

import networkx as nx

def validate_parameters(config: dict) -> dict:
    """Validate the parameter configuration and fill in defaults when not present."""
    parameters = {}
    parameters["outlet_derivation"] = OutletDerivation(**config.get("outlet_derivation", {}))
    parameters["subcatchment_derivation"] = SubcatchmentDerivation(...)
    ...
    return parameters

# Then, where relevant, after loading one or more configuration files into the dict
parameters = validate_parameters(config)

# Argument names match the keys of `parameters`, so the ** unpacking below works
@register_graphfcn
def generic_function(G: nx.Graph, outlet_derivation: OutletDerivation, subcatchment_derivation: SubcatchmentDerivation, **kwargs) -> nx.Graph:
    # create G_ from G, outlet_derivation and subcatchment_derivation
    return G_

for function in list_of_functions:
    G = function(G, **parameters, **addresses)

This has the advantage that the parameters are now well-defined objects with types, not a collection of entries in a dictionary, so it is easier for type checkers to figure out whether things are correct, for you to receive suggestions while coding, etc. |
OK, that is very helpful. I don't understand how we're going to do addresses yet - presumably this is linked to #9. But a functional mockup based on what you've got there would be as follows:

# This would be in graph_utilities.py
graphfcns = {}

def register_graphfcn(func):
    # Add the function to the registry
    graphfcns[func.__name__] = func
    return func

@register_graphfcn
def double_directed(G, river_buffer_distance, **kwargs):
    print(river_buffer_distance)
    return G

@register_graphfcn
def split_long_edges(G, max_river_length, **kwargs):
    print(max_river_length)
    return G

# This would be in parameters.py
from pydantic import BaseModel, Field

class OutletDerivation(BaseModel):
    max_river_length: float = Field(default=30.0, ge=5, le=100, description="Distance to split rivers into segments (m).")
    river_buffer_distance: float = Field(default=150.0, ge=50, le=300, description="Buffer distance to link rivers to streets (m).")
    outlet_length: float = Field(default=40.0, ge=10, le=100, description="Length to discourage street drainage into river buffers (m).")

class SubcatchmentDerivation(BaseModel):
    subcatchment_buffer_distance: float = Field(default=150.0, ge=50, le=300, description="Buffer distance to link subcatchments to streets (m).")
    outlet_length: float = Field(default=40.0, ge=10, le=100, description="Length to discourage street drainage into subcatchment buffers (m).")

def validate_parameters(config: dict) -> dict:
    """Validate the parameter configuration and fill in defaults when not present."""
    parameters = {}
    parameters["outlet_derivation"] = OutletDerivation(**config.get("outlet_derivation", {}))
    parameters["subcatchment_derivation"] = SubcatchmentDerivation(**config.get("subcatchment_derivation", {}))
    # Flatten the validated objects into a single {field_name: value} dict
    parameters_only = {}
    for value in parameters.values():
        for key_, value_ in dict(value).items():
            parameters_only[key_] = value_
    return parameters_only

# This kind of behaviour would be in generate_network.py
custom_parameters = {'outlet_derivation': {'max_river_length': 50.0}}
params = validate_parameters(custom_parameters)

import networkx as nx
G = nx.Graph()
function_list = ['double_directed', 'split_long_edges']
for function in function_list:
    assert function in graphfcns.keys(), f"Function {function} not registered in graphfcns."
    G = graphfcns[function](G, **params)

Note the 'unpacking' of parameters from their subcategory. Happy to discuss whether this is a good idea or not - it is just how it works in the existing code. |
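One thing worth checking before settling on the unpacking: `outlet_length` appears in both `OutletDerivation` and `SubcatchmentDerivation`, so flattening the groups into one dict keeps only the last value written. The sketch below uses plain dicts in place of the pydantic objects to show the effect and one possible guard. `flatten_parameters` is a hypothetical helper, and the value 25.0 is chosen only to make the overwrite visible.

```python
def flatten_parameters(groups: dict[str, dict], strict: bool = True) -> dict:
    """Merge parameter groups into one flat dict.

    With strict=True, a field name defined in more than one group raises
    an error instead of being silently overwritten.
    """
    flat: dict = {}
    owner: dict[str, str] = {}  # remembers which group each key came from
    for group_name, fields in groups.items():
        for key, value in fields.items():
            if strict and key in flat:
                raise ValueError(
                    f"'{key}' is defined in both '{owner[key]}' and '{group_name}'"
                )
            flat[key] = value
            owner[key] = group_name
    return flat


groups = {
    "outlet_derivation": {"max_river_length": 30.0, "outlet_length": 40.0},
    "subcatchment_derivation": {
        "subcatchment_buffer_distance": 150.0,
        "outlet_length": 25.0,  # illustrative value, differs from the group above
    },
}
```

With `strict=False` the second `outlet_length` replaces the first, which is the behaviour of the mockup above; with `strict=True` the clash is reported so the fields can be renamed or kept in their groups.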
You can use the following directly to unpack the fields:

for value in parameters.values():
    parameters_only.update(value.model_dump())

Having said that, I think it is a pity not to use the fully formed objects directly, even if a particular graph function only uses one of the fields of that higher-level object. Thanks to the type checkers, it will give some reassurance that you're doing the right thing. The functions would look something like this:

@register_graphfcn
def double_directed(G, outlet_derivation: OutletDerivation, **kwargs):
    print(outlet_derivation.river_buffer_distance)
    return G

@register_graphfcn
def split_long_edges(G, outlet_derivation: OutletDerivation, **kwargs):
    print(outlet_derivation.max_river_length)
    return G
|
note from #31 - the 'default loadout' of |
Since you're going with |
Also linked with this are:
|
The intended use would be to have an initial setup config and one that is intended to be executed many times. The main file (swmmanywhere) will have to create a file structure that works with this. I would envisage (reflected below) that there will be two folders: one for the preprocessing execution (first config file), and one for the model creation (second config). There will probably still be a national_downloads folder as well.
Run (once for a project) a slow-to-execute config (i.e., running downloaders and particularly slow graphfcns)
Run (thousands of times, e.g., for sensitivity analysis) the model creation config
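A rough sketch of how the main file could lay out those folders, assuming one preprocessing folder per project and one numbered folder per model run. Only `national_downloads` is named in the description; `make_project_folders`, the other folder names and the idea of sharing `national_downloads` across projects are placeholders for discussion.

```python
from pathlib import Path


def make_project_folders(base_dir: Path, project: str, model_number: int) -> dict[str, Path]:
    """Create (if missing) and return the folders used by the two configs."""
    project_dir = base_dir / project
    folders = {
        # assumed to be shared across projects and filled by the downloaders
        "national_downloads": base_dir / "national_downloads",
        # written once per project by the slow preprocessing config
        "preprocessing": project_dir / "preprocessing",
        # written on every execution of the model creation config
        "model": project_dir / f"model_{model_number}",
    }
    for folder in folders.values():
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling it repeatedly with increasing `model_number` values reuses the existing preprocessing folder and only adds a new model folder, which matches the run-once versus run-many split.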