Incorrect computation of renewable profiles when alternative clustering is enabled #1272

choiHenry · 2024-12-31T07:23:41Z

Checklist

I am using the current main branch or the latest release. Please indicate.
I am running on an up-to-date pypsa-earth environment. Update via conda env update -f envs/environment.yaml.

Describe the Bug

Renewable profiles, especially potential is incorrectly computed when alternative clustering is enabled.

I found why the error occurred. And also corrected. But I want to know how to modify the whole pipeline correctly.

Why incorrect computation occurs?

build_renewable_profiles rule computes potential by

potential = capacity_per_sqkm * availability.sum("bus") * area

Here, availability is calculated by

availability = cutout.availabilitymatrix(regions, excluder, **kwargs)

Again, region is read from a path designated in Snakefile which is produced from rule build_bus_regions.

        regions=lambda w: (
            "resources/" + RDIR + "bus_regions/regions_onshore.geojson"
            if w.technology in ("onwind", "solar", "hydro", "csp")
            else "resources/" + RDIR + "bus_regions/regions_offshore.geojson"
        ),

I want to point out that onshore region file made by build_bus_regions is pretty much overlapped.

Where does overlapping come from?

If alternative clustering is turned on, rule build_bus_regions calculates onshore geometries(series) by

        if snakemake.params.alternative_clustering:
            onshore_geometry, shape_id = get_gadm_shape(
                onshore_locs,
                gadm_country,
                geo_crs,
                metric_crs,
            )

Here, investigating get_gadm_shape, gadm_sel includes overlapping geometries if the number of bus regions is less than the number of administrative regions(in my case n_bus = 461, n_regions=17).

def get_gadm_shape(
    onshore_buses, gadm_shapes, geo_crs="EPSG:4326", metric_crs="EPSG:3857"
):
    geo_regions = gpd.GeoDataFrame(
        onshore_buses[["x", "y"]],
        geometry=gpd.points_from_xy(onshore_buses["x"], onshore_buses["y"]),
        crs=geo_crs,
    ).to_crs(metric_crs)

    join_geos = gpd.sjoin_nearest(
        geo_regions, gadm_shapes.to_crs(metric_crs), how="left"
    )

    # when duplicates, keep only the first entry
    join_geos = join_geos[~join_geos.index.duplicated()]

    gadm_sel = gadm_shapes.loc[join_geos["index_right"].values]

    return gadm_sel.geometry.values, gadm_sel.index.values

Possible solutions for renewable profiles

This might not be a good solution but it works for just calculating potentials.
Just modifying line 518 of "build_renewable_profiles.py" regions = regions.set_index("name").rename_axis("bus") by

if snakemake.params.alternative_clustering:
    regions = regions.drop_duplicates("shape_id").reset_index().drop(columns='bus').rename_axis('bus')
else:
    regions = regions.set_index("name").rename_axis("bus")

will properly calculates potentials

What I want to ask to community

I think this solution seems not good enough. It just can calculate potentials but it would be more fundamentally solve the problem if the rule build_bus_regions modified. I think the whole gadm could be splitted to incorporate bus regions properly. Is this right way to solve the problem. Is there any possible risks for this solution?

Thank you.

The text was updated successfully, but these errors were encountered:

ekatef · 2024-12-31T11:10:53Z

Hello @choiHenry, thanks a lot for investigating it! I confirm that such an issue is reproducible and agree that it absolutely makes sense to elaborate a good architecture for a solution to fix these issues. @yerbol-akhmetov has also investigated it recently, and it may be a good idea to discuss.

yerbol-akhmetov · 2025-01-05T16:21:31Z

Hi, @choiHenry, @ekatef, @davide-f. As far as I know, the issue arises because in the build_bus_regions rule, gadm shapes are assigned for each buses. Generally, the number of buses is larger than gadm regions (for level 1). As a result, gadm region (geometry) is assigned for each bus within this region. So we have unwanted increase of the land area, right?
I have a question. I know clustering of buses are accomplished later. But why we do not start from number of buses equal to number of gadm regions, if alternative clustering is selected. So currently the issue is that we need to split the gadm region into each bus within it right?
I suspect using gadm level 2 for geometry will not help too, right?

yerbol-akhmetov · 2025-01-05T16:25:16Z

Hello @choiHenry, thanks a lot for investigating it! I confirm that such an issue is reproducible and agree that it absolutely makes sense to elaborate a good architecture for a solution to fix these issues. @yerbol-akhmetov has also investigated it recently, and it may be a good idea to discuss.

I suspect it affects not only renewable profiles, but also demand profile and sector demands.

davide-f · 2025-01-05T23:37:50Z

Yeah, @yerbol-akhmetov, you got the issues indeed.
Solutions are quite architectural from what I can see. I'd propose to track them in #486

choiHenry · 2025-01-06T01:18:40Z

Hi @yerbol-akhmetov and @davide-f. I totally agree with you. So I think to "start from number of buses equal to number of gadm regions, if alternative clustering is selected" as you said is a good way to go. build_bus_regions rule propagates to many other rules.
So if there is a overlapping gadms for bus regions, as is the current build_bus_regions rule when alternative_clustering is turned on, the bus regions causes some incorrect computations. I think this is why issues like #1248 arise.
I have the exact same way of thinking solutions as you @yerbol-akhmetov, so I suggest to add just two lines to make 1:1 correspondence between bus regions and administrative regions.

And thank you very much for sharing materials to track @davide-f. I will check and respond.

yerbol-akhmetov · 2025-01-06T14:49:39Z

Hi @yerbol-akhmetov and @davide-f. I totally agree with you. So I think to "start from number of buses equal to number of gadm regions, if alternative clustering is selected" as you said is a good way to go. build_bus_regions rule propagates to many other rules. So if there is a overlapping gadms for bus regions, as is the current build_bus_regions rule when alternative_clustering is turned on, the bus regions causes some incorrect computations. I think this is why issues like #1248 arise. I have the exact same way of thinking solutions as you @yerbol-akhmetov, so I suggest to add just two lines to make 1:1 correspondence between bus regions and administrative regions.
And thank you very much for sharing materials to track @davide-f. I will check and respond.

Hi, @choiHenry. I have tried to drop duplicates as you have suggested. I have noticed the potential of RES is now at the same level as regular clustering. It is even slightly lower (~1-2%), while in alternative clustering (AC) potential was around 4 times more for Nigeria.

Voronoi
Alternative clustering
Alternative clustering with drop duplicates

I have checked potential in elec.nc network. It seems a viable option. However, I do not know exactly the implication of dropping duplicates and how it affects the following rules. Do you have any idea how it can affect or whether this method has flaws? @davide-f @ekatef

davide-f · 2025-01-06T15:31:08Z

Many thanks @choiHenry and @yerbol-akhmetov for the exploration!

What about the total demand instead? can you check that? The regions onshore have implications into the demand estimation too.
If acceptable by most and results with/without alternative clustering look similar, I'm fine with merging, provided CI passes and results are consistent.

It is not a stable long-term solution, as it can lead to implications into the simplify_network and alike.
In particular, that solution implicitly assumes to place the demand to only a single random node within each gadm shape, which is (hopefully) merged into cluster_network to form a meshed network.
The random node that is representive of the whole gadm shape may be isolated, which could worsen the situation. The stability of the code is not perfect; It would be better to select among the nodes that are not isolated, if available otherwise ok for isolated ones.
The non-isolated nodes can be identified by those that are connected to at least another node using:

n.determine_network_topology()
non_isolated_buses = n.buses.duplicated(subset=["sub_network"], keep=False)

Personally, I'd be in favor of a simple fix on this as the architectural one is likely heavier with longer review time.

yerbol-akhmetov · 2025-01-06T17:38:44Z

Many thanks @choiHenry and @yerbol-akhmetov for the exploration!

What about the total demand instead? can you check that? The regions onshore have implications into the demand estimation too. If acceptable by most and results with/without alternative clustering look similar, I'm fine with merging, provided CI passes and results are consistent.

It is not a stable long-term solution, as it can lead to implications into the simplify_network and alike. In particular, that solution implicitly assumes to place the demand to only a single random node within each gadm shape, which is (hopefully) merged into cluster_network to form a meshed network. The random node that is representive of the whole gadm shape may be isolated, which could worsen the situation. The stability of the code is not perfect; It would be better to select among the nodes that are not isolated, if available otherwise ok for isolated ones. The non-isolated nodes can be identified by those that are connected to at least another node using:
n.determine_network_topology()
non_isolated_buses = n.buses.duplicated(subset=["sub_network"], keep=False)
Personally, I'd be in favor of a simple fix on this as the architectural one is likely heavier with longer review time.

Thanks for the feedback, @davide-f. I have checked the demand for Voronoi, Alternative clustering, and Alternative clustering with drop duplicates for Nigeria. The results are below in MWh. The numbers are almost identical.

I have tried to look into isolated buses. In elec.nc, we had 203 buses for Nigeria. There were only 1 isolated bus (bus "78" is isolated one). For onshore_regions, we had only 158 being matched, because we only have 158 low voltage buses with n.buses.substation_lv True.

With this drop duplicates, it was luck that bus "78" was dropped, because it was in the final onshore_regions dataframe.

There I agree that it would be good to prevent situation when shape is assigned to isolated bus. So I tried excluding isolated buses before gadm shape assigning. It worked as intended. It is worth checking for another country, but the approach is straightforward. I have shown the changes below. Would you like to open a PR if the approach is agreed @choiHenry, as it was your original idea of removing duplicates?

choiHenry · 2025-01-07T06:15:46Z

Many thanks to @davide-f, @yerbol-akhmetov, and @ekatef.

Remaining isolated buses after drop

It is not a stable long-term solution, as it can lead to implications into the simplify_network and alike.
In particular, that solution implicitly assumes to place the demand to only a single random node within each gadm shape, which is (hopefully) merged into cluster_network to form a meshed network.
The random node that is representive of the whole gadm shape may be isolated, which could worsen the situation. The stability of the code is not perfect; It would be better to select among the nodes that are not isolated, if available otherwise ok for isolated ones.

I am really happy to discuss this issue. I didn't recognize before at all. Out of 583 buses in South Korea, I identified 6 as isolated.

n.determine_network_topology()
non_isolated_buses = n.buses.duplicated(subset=["sub_network"], keep=False)
len(n.buses) - len(n.buses[non_isolated_buses])

And I found that luckily, when dropped-out, no buses are isolated.

isoloated_buses = list(n.buses[~non_isolated_buses].index)
onshore_regions.drop_duplicates("shape_id").index.isin(isoloated_buses)

This might not be a mere luck since onshore_regions is based on onshore_locs which are derived by

onshore_locs = n.buses.loc[c_b & n.buses.substation_lv, ["x", "y"]]

Here, substation_lv, which is True if the bus has multiple voltages, drops many isolated(though not all) buses. In my case this dropped 4 out of 6 isolated buses.

Modified Code
I runned the code suggested above:

    if snakemake.params.alternative_clustering:
        # determine isolated buses
        n.determine_network_topology()
        non_isolated_buses = n.buses.duplicated(subset=["sub_network"], keep=False)
        isolated_buses = n.buses[~non_isolated_buses].index
        # drop isolated buses
        onshore_regions = onshore_regions[~onshore_regions.name.isin(isolated_buses)]
        # drop duplicates based on shape_id
        onshore_regions = onshore_regions.drop_duplicates('shape_id')

Actually the demand was virtually same. So this could resolve the concerns about isolated buses included in the onshore bus regions and propagates to many other rules.

Additional concerns to the suggested code
I find nested dropping in the above code. We remove buses that are isolated based on the topology analysis first and remove duplicate entries for administrative regions based on shape_id to ensure unique representation.
When dropped in this way, some administrative regions might have no remaining buses in the onshore_regions. (In case when this region contains only isolated buses or no buses are assigned to this region?) It would be better to check if the number of administrative regions are equal to that of remaining buses after drops.

I guess adding detection process could improve code stability.

    if snakemake.params.alternative_clustering:
        # determine isolated buses
        n.determine_network_topology()
        non_isolated_buses = n.buses.duplicated(subset=["sub_network"], keep=False)
        isolated_buses = n.buses[~non_isolated_buses].index
        # drop isolated buses
        onshore_regions = onshore_regions[~onshore_regions.name.isin(isolated_buses)]
        # drop duplicates based on shape_id
        onshore_regions = onshore_regions.drop_duplicates('shape_id')

        if len(onshore_regions) < len(gadm_country):
            logger.error(
                f"The number of remaining of buses are less than the number of administrative clusters suggested!"
            )
            raise ValueError("Insufficient buses to match administrative clusters!")

Finally I also have a concern that this way might not be a long-term solution. When gadm is replaced(#1258) with others, this code might have to be retouched. Thanks again.

ekatef · 2025-01-07T23:42:00Z

Hello! Great work @choiHenry @yerbol-akhmetov @davide-f 🙂

My feeling is that we have done great progress. After checking the population distribution (pop_layout_elec_s_{}_{}.csv) and heating demands, I can confirm that both the overall values and their spatial distribution look much more reasonable. In particular, the overall population for US is close to 300 mlns which is basically the expected estimation instead of unrealistically ~900 mlns previously.

However, @choiHenry I think you may be that we may lose some administrative regions. An example for US, when three different administrative regions are being merged into a single bus region:

Not sure about the particular reason of this merge. Looking into the grid topology, the network is well interconnected and there are only a few buses which shouldn't be a problem.

As a comment on the long-term strategy, #1258 should not create any obstacles for the proposed solution. I'd rather expect that we may need to revise it to implement a more general approach as outlined in #486. Though, I completely agree with @davide-f that this "architectural" solution is likely to be heavier, while we need a quick fix to improve functionality of the model. So, the work on that definitely make sense 🙂

choiHenry · 2025-01-08T01:00:52Z

Thank you for reporting a really important issue @ekatef. I think I need to check this myself. I will run a scenario for "USA" and response.

ekatef · 2025-01-08T23:53:20Z

Thank you for reporting a really important issue @ekatef. I think I need to check this myself. I will run a scenario for "USA" and response.

@choiHenry thank you so much for your willingness to help. I have investigated a bit the issue, and shall post to outputs in #1248. The major output is that the fix you suggested has definitely improved the situation, and your PR on than would be very much appreciated. Here are some technical hints which could be probably helpful. Let us know please if you have any technical questions on how to deal with github and git.

You have contributed significantly in finding a solution, and it would be really great to have your contribution incorporated into the repo.

choiHenry · 2025-01-09T02:41:25Z

Thank you @ekatef. With your encouragement, I pull-requested #1287. I am currently working on a repo machine for the country "US" to check the issue you reported and investigated in #1248. I believe to investigate propagated results of the PR could contribute to the architecturally stable solutions for building bus regions for customized administrative regions.

yerbol-akhmetov · 2025-01-09T03:59:39Z

Hello! Great work @choiHenry @yerbol-akhmetov @davide-f 🙂

My feeling is that we have done great progress. After checking the population distribution (pop_layout_elec_s_{}_{}.csv) and heating demands, I can confirm that both the overall values and their spatial distribution look much more reasonable. In particular, the overall population for US is close to 300 mlns which is basically the expected estimation instead of unrealistically ~900 mlns previously.

However, @choiHenry I think you may be that we may lose some administrative regions. An example for US, when three different administrative regions are being merged into a single bus region:
Not sure about the particular reason of this merge. Looking into the grid topology, the network is well interconnected and there are only a few buses which shouldn't be a problem.
As a comment on the long-term strategy, #1258 should not create any obstacles for the proposed solution. I'd rather expect that we may need to revise it to implement a more general approach as outlined in #486. Though, I completely agree with @davide-f that this "architectural" solution is likely to be heavier, while we need a quick fix to improve functionality of the model. So, the work on that definitely make sense 🙂

Thanks @ekatef for checking the issue it details. I have looked into regions_onshore_elec_s_10.geojson file for US with the latest proposed change. I have seen no merge of the regions for all states. I can share with you my geojson file if you wish.

choiHenry added the bug Something isn't working label Dec 31, 2024

choiHenry mentioned this issue Dec 31, 2024

Incorrect computation of population layouts if alternative clustering is enabled #1248

Open

2 tasks

davide-f mentioned this issue Jan 3, 2025

Flexibility to cluster the network to a specified shapefile instead of only GADM regions #486

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Incorrect computation of renewable profiles when alternative clustering is enabled #1272

Incorrect computation of renewable profiles when alternative clustering is enabled #1272

choiHenry commented Dec 31, 2024

ekatef commented Dec 31, 2024

yerbol-akhmetov commented Jan 5, 2025

yerbol-akhmetov commented Jan 5, 2025

davide-f commented Jan 5, 2025 •

edited by ekatef

Loading

choiHenry commented Jan 6, 2025

yerbol-akhmetov commented Jan 6, 2025

davide-f commented Jan 6, 2025 •

edited

Loading

yerbol-akhmetov commented Jan 6, 2025 •

edited

Loading

choiHenry commented Jan 7, 2025 •

edited

Loading

ekatef commented Jan 7, 2025

choiHenry commented Jan 8, 2025

ekatef commented Jan 8, 2025

choiHenry commented Jan 9, 2025 •

edited

Loading

yerbol-akhmetov commented Jan 9, 2025 •

edited

Loading

Incorrect computation of renewable profiles when alternative clustering is enabled #1272

Incorrect computation of renewable profiles when alternative clustering is enabled #1272

Comments

choiHenry commented Dec 31, 2024

Checklist

Describe the Bug

ekatef commented Dec 31, 2024

yerbol-akhmetov commented Jan 5, 2025

yerbol-akhmetov commented Jan 5, 2025

davide-f commented Jan 5, 2025 • edited by ekatef Loading

choiHenry commented Jan 6, 2025

yerbol-akhmetov commented Jan 6, 2025

davide-f commented Jan 6, 2025 • edited Loading

yerbol-akhmetov commented Jan 6, 2025 • edited Loading

choiHenry commented Jan 7, 2025 • edited Loading

ekatef commented Jan 7, 2025

choiHenry commented Jan 8, 2025

ekatef commented Jan 8, 2025

choiHenry commented Jan 9, 2025 • edited Loading

yerbol-akhmetov commented Jan 9, 2025 • edited Loading

davide-f commented Jan 5, 2025 •

edited by ekatef

Loading

davide-f commented Jan 6, 2025 •

edited

Loading

yerbol-akhmetov commented Jan 6, 2025 •

edited

Loading

choiHenry commented Jan 7, 2025 •

edited

Loading

choiHenry commented Jan 9, 2025 •

edited

Loading

yerbol-akhmetov commented Jan 9, 2025 •

edited

Loading