Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect computation of renewable profiles when alternative clustering is enabled #1272

Open
2 tasks done
choiHenry opened this issue Dec 31, 2024 · 14 comments
Open
2 tasks done
Labels
bug Something isn't working

Comments

@choiHenry
Copy link

Checklist

  • I am using the current main branch or the latest release. Please indicate.
  • I am running on an up-to-date pypsa-earth environment. Update via conda env update -f envs/environment.yaml.

Describe the Bug

Renewable profiles, especially potential is incorrectly computed when alternative clustering is enabled.

I found why the error occurred. And also corrected. But I want to know how to modify the whole pipeline correctly.

  1. Why incorrect computation occurs?

build_renewable_profiles rule computes potential by

potential = capacity_per_sqkm * availability.sum("bus") * area

Here, availability is calculated by

availability = cutout.availabilitymatrix(regions, excluder, **kwargs)

Again, region is read from a path designated in Snakefile which is produced from rule build_bus_regions.

        regions=lambda w: (
            "resources/" + RDIR + "bus_regions/regions_onshore.geojson"
            if w.technology in ("onwind", "solar", "hydro", "csp")
            else "resources/" + RDIR + "bus_regions/regions_offshore.geojson"
        ),

I want to point out that onshore region file made by build_bus_regions is pretty much overlapped.

  1. Where does overlapping come from?

If alternative clustering is turned on, rule build_bus_regions calculates onshore geometries(series) by

        if snakemake.params.alternative_clustering:
            onshore_geometry, shape_id = get_gadm_shape(
                onshore_locs,
                gadm_country,
                geo_crs,
                metric_crs,
            )

Here, investigating get_gadm_shape, gadm_sel includes overlapping geometries if the number of bus regions is less than the number of administrative regions(in my case n_bus = 461, n_regions=17).

def get_gadm_shape(
    onshore_buses, gadm_shapes, geo_crs="EPSG:4326", metric_crs="EPSG:3857"
):
    geo_regions = gpd.GeoDataFrame(
        onshore_buses[["x", "y"]],
        geometry=gpd.points_from_xy(onshore_buses["x"], onshore_buses["y"]),
        crs=geo_crs,
    ).to_crs(metric_crs)

    join_geos = gpd.sjoin_nearest(
        geo_regions, gadm_shapes.to_crs(metric_crs), how="left"
    )

    # when duplicates, keep only the first entry
    join_geos = join_geos[~join_geos.index.duplicated()]

    gadm_sel = gadm_shapes.loc[join_geos["index_right"].values]

    return gadm_sel.geometry.values, gadm_sel.index.values
  1. Possible solutions for renewable profiles

This might not be a good solution but it works for just calculating potentials.
Just modifying line 518 of "build_renewable_profiles.py" regions = regions.set_index("name").rename_axis("bus") by

if snakemake.params.alternative_clustering:
    regions = regions.drop_duplicates("shape_id").reset_index().drop(columns='bus').rename_axis('bus')
else:
    regions = regions.set_index("name").rename_axis("bus")

will properly calculates potentials

  1. What I want to ask to community

I think this solution seems not good enough. It just can calculate potentials but it would be more fundamentally solve the problem if the rule build_bus_regions modified. I think the whole gadm could be splitted to incorporate bus regions properly. Is this right way to solve the problem. Is there any possible risks for this solution?

Thank you.

@ekatef
Copy link
Member

ekatef commented Dec 31, 2024

Hello @choiHenry, thanks a lot for investigating it! I confirm that such an issue is reproducible and agree that it absolutely makes sense to elaborate a good architecture for a solution to fix these issues. @yerbol-akhmetov has also investigated it recently, and it may be a good idea to discuss.

@yerbol-akhmetov
Copy link
Collaborator

Hi, @choiHenry, @ekatef, @davide-f. As far as I know, the issue arises because in the build_bus_regions rule, gadm shapes are assigned for each buses. Generally, the number of buses is larger than gadm regions (for level 1). As a result, gadm region (geometry) is assigned for each bus within this region. So we have unwanted increase of the land area, right?
I have a question. I know clustering of buses are accomplished later. But why we do not start from number of buses equal to number of gadm regions, if alternative clustering is selected. So currently the issue is that we need to split the gadm region into each bus within it right?
I suspect using gadm level 2 for geometry will not help too, right?

@yerbol-akhmetov
Copy link
Collaborator

Hello @choiHenry, thanks a lot for investigating it! I confirm that such an issue is reproducible and agree that it absolutely makes sense to elaborate a good architecture for a solution to fix these issues. @yerbol-akhmetov has also investigated it recently, and it may be a good idea to discuss.

I suspect it affects not only renewable profiles, but also demand profile and sector demands.

@davide-f
Copy link
Member

davide-f commented Jan 5, 2025

Yeah, @yerbol-akhmetov, you got the issues indeed.
Solutions are quite architectural from what I can see. I'd propose to track them in #486

@choiHenry
Copy link
Author

Hi @yerbol-akhmetov and @davide-f. I totally agree with you. So I think to "start from number of buses equal to number of gadm regions, if alternative clustering is selected" as you said is a good way to go. build_bus_regions rule propagates to many other rules.
So if there is a overlapping gadms for bus regions, as is the current build_bus_regions rule when alternative_clustering is turned on, the bus regions causes some incorrect computations. I think this is why issues like #1248 arise.
I have the exact same way of thinking solutions as you @yerbol-akhmetov, so I suggest to add just two lines to make 1:1 correspondence between bus regions and administrative regions.

Screenshot 2025-01-06 at 10 17 18 AM

And thank you very much for sharing materials to track @davide-f. I will check and respond.

@yerbol-akhmetov
Copy link
Collaborator

Hi @yerbol-akhmetov and @davide-f. I totally agree with you. So I think to "start from number of buses equal to number of gadm regions, if alternative clustering is selected" as you said is a good way to go. build_bus_regions rule propagates to many other rules. So if there is a overlapping gadms for bus regions, as is the current build_bus_regions rule when alternative_clustering is turned on, the bus regions causes some incorrect computations. I think this is why issues like #1248 arise. I have the exact same way of thinking solutions as you @yerbol-akhmetov, so I suggest to add just two lines to make 1:1 correspondence between bus regions and administrative regions.

Screenshot 2025-01-06 at 10 17 18 AM And thank you very much for sharing materials to track @davide-f. I will check and respond.

Hi, @choiHenry. I have tried to drop duplicates as you have suggested. I have noticed the potential of RES is now at the same level as regular clustering. It is even slightly lower (~1-2%), while in alternative clustering (AC) potential was around 4 times more for Nigeria.

  1. Voronoi
    image
  2. Alternative clustering
    image
  3. Alternative clustering with drop duplicates
    image

I have checked potential in elec.nc network. It seems a viable option. However, I do not know exactly the implication of dropping duplicates and how it affects the following rules. Do you have any idea how it can affect or whether this method has flaws? @davide-f @ekatef

@davide-f
Copy link
Member

davide-f commented Jan 6, 2025

Many thanks @choiHenry and @yerbol-akhmetov for the exploration!

What about the total demand instead? can you check that? The regions onshore have implications into the demand estimation too.
If acceptable by most and results with/without alternative clustering look similar, I'm fine with merging, provided CI passes and results are consistent.

It is not a stable long-term solution, as it can lead to implications into the simplify_network and alike.
In particular, that solution implicitly assumes to place the demand to only a single random node within each gadm shape, which is (hopefully) merged into cluster_network to form a meshed network.
The random node that is representive of the whole gadm shape may be isolated, which could worsen the situation. The stability of the code is not perfect; It would be better to select among the nodes that are not isolated, if available otherwise ok for isolated ones.
The non-isolated nodes can be identified by those that are connected to at least another node using:

n.determine_network_topology()
non_isolated_buses = n.buses.duplicated(subset=["sub_network"], keep=False)

Personally, I'd be in favor of a simple fix on this as the architectural one is likely heavier with longer review time.

@yerbol-akhmetov
Copy link
Collaborator

yerbol-akhmetov commented Jan 6, 2025

Many thanks @choiHenry and @yerbol-akhmetov for the exploration!

What about the total demand instead? can you check that? The regions onshore have implications into the demand estimation too. If acceptable by most and results with/without alternative clustering look similar, I'm fine with merging, provided CI passes and results are consistent.

It is not a stable long-term solution, as it can lead to implications into the simplify_network and alike. In particular, that solution implicitly assumes to place the demand to only a single random node within each gadm shape, which is (hopefully) merged into cluster_network to form a meshed network. The random node that is representive of the whole gadm shape may be isolated, which could worsen the situation. The stability of the code is not perfect; It would be better to select among the nodes that are not isolated, if available otherwise ok for isolated ones. The non-isolated nodes can be identified by those that are connected to at least another node using:

n.determine_network_topology()
non_isolated_buses = n.buses.duplicated(subset=["sub_network"], keep=False)

Personally, I'd be in favor of a simple fix on this as the architectural one is likely heavier with longer review time.

Thanks for the feedback, @davide-f. I have checked the demand for Voronoi, Alternative clustering, and Alternative clustering with drop duplicates for Nigeria. The results are below in MWh. The numbers are almost identical.
image

I have tried to look into isolated buses. In elec.nc, we had 203 buses for Nigeria. There were only 1 isolated bus (bus "78" is isolated one). For onshore_regions, we had only 158 being matched, because we only have 158 low voltage buses with n.buses.substation_lv True.
image
image

With this drop duplicates, it was luck that bus "78" was dropped, because it was in the final onshore_regions dataframe.
image
There I agree that it would be good to prevent situation when shape is assigned to isolated bus. So I tried excluding isolated buses before gadm shape assigning. It worked as intended. It is worth checking for another country, but the approach is straightforward. I have shown the changes below. Would you like to open a PR if the approach is agreed @choiHenry, as it was your original idea of removing duplicates?
image

@choiHenry
Copy link
Author

choiHenry commented Jan 7, 2025

Many thanks to @davide-f, @yerbol-akhmetov, and @ekatef.

  1. Remaining isolated buses after drop

It is not a stable long-term solution, as it can lead to implications into the simplify_network and alike.
In particular, that solution implicitly assumes to place the demand to only a single random node within each gadm shape, which is (hopefully) merged into cluster_network to form a meshed network.
The random node that is representive of the whole gadm shape may be isolated, which could worsen the situation. The stability of the code is not perfect; It would be better to select among the nodes that are not isolated, if available otherwise ok for isolated ones.

I am really happy to discuss this issue. I didn't recognize before at all. Out of 583 buses in South Korea, I identified 6 as isolated.

n.determine_network_topology()
non_isolated_buses = n.buses.duplicated(subset=["sub_network"], keep=False)
len(n.buses) - len(n.buses[non_isolated_buses])

And I found that luckily, when dropped-out, no buses are isolated.

isoloated_buses = list(n.buses[~non_isolated_buses].index)
onshore_regions.drop_duplicates("shape_id").index.isin(isoloated_buses)

This might not be a mere luck since onshore_regions is based on onshore_locs which are derived by

onshore_locs = n.buses.loc[c_b & n.buses.substation_lv, ["x", "y"]]

Here, substation_lv, which is True if the bus has multiple voltages, drops many isolated(though not all) buses. In my case this dropped 4 out of 6 isolated buses.

  1. Modified Code
    I runned the code suggested above:
    if snakemake.params.alternative_clustering:
        # determine isolated buses
        n.determine_network_topology()
        non_isolated_buses = n.buses.duplicated(subset=["sub_network"], keep=False)
        isolated_buses = n.buses[~non_isolated_buses].index
        # drop isolated buses
        onshore_regions = onshore_regions[~onshore_regions.name.isin(isolated_buses)]
        # drop duplicates based on shape_id
        onshore_regions = onshore_regions.drop_duplicates('shape_id')

Actually the demand was virtually same. So this could resolve the concerns about isolated buses included in the onshore bus regions and propagates to many other rules.

Screenshot 2025-01-07 at 3 02 02 PM
  1. Additional concerns to the suggested code
    I find nested dropping in the above code. We remove buses that are isolated based on the topology analysis first and remove duplicate entries for administrative regions based on shape_id to ensure unique representation.
    When dropped in this way, some administrative regions might have no remaining buses in the onshore_regions. (In case when this region contains only isolated buses or no buses are assigned to this region?) It would be better to check if the number of administrative regions are equal to that of remaining buses after drops.

I guess adding detection process could improve code stability.

    if snakemake.params.alternative_clustering:
        # determine isolated buses
        n.determine_network_topology()
        non_isolated_buses = n.buses.duplicated(subset=["sub_network"], keep=False)
        isolated_buses = n.buses[~non_isolated_buses].index
        # drop isolated buses
        onshore_regions = onshore_regions[~onshore_regions.name.isin(isolated_buses)]
        # drop duplicates based on shape_id
        onshore_regions = onshore_regions.drop_duplicates('shape_id')

        if len(onshore_regions) < len(gadm_country):
            logger.error(
                f"The number of remaining of buses are less than the number of administrative clusters suggested!"
            )
            raise ValueError("Insufficient buses to match administrative clusters!")

Finally I also have a concern that this way might not be a long-term solution. When gadm is replaced(#1258) with others, this code might have to be retouched. Thanks again.

@ekatef
Copy link
Member

ekatef commented Jan 7, 2025

Hello! Great work @choiHenry @yerbol-akhmetov @davide-f 🙂

My feeling is that we have done great progress. After checking the population distribution (pop_layout_elec_s_{}_{}.csv) and heating demands, I can confirm that both the overall values and their spatial distribution look much more reasonable. In particular, the overall population for US is close to 300 mlns which is basically the expected estimation instead of unrealistically ~900 mlns previously.

However, @choiHenry I think you may be that we may lose some administrative regions. An example for US, when three different administrative regions are being merged into a single bus region:

image

Not sure about the particular reason of this merge. Looking into the grid topology, the network is well interconnected and there are only a few buses which shouldn't be a problem.

As a comment on the long-term strategy, #1258 should not create any obstacles for the proposed solution. I'd rather expect that we may need to revise it to implement a more general approach as outlined in #486. Though, I completely agree with @davide-f that this "architectural" solution is likely to be heavier, while we need a quick fix to improve functionality of the model. So, the work on that definitely make sense 🙂

@choiHenry
Copy link
Author

Thank you for reporting a really important issue @ekatef. I think I need to check this myself. I will run a scenario for "USA" and response.

@ekatef
Copy link
Member

ekatef commented Jan 8, 2025

Thank you for reporting a really important issue @ekatef. I think I need to check this myself. I will run a scenario for "USA" and response.

@choiHenry thank you so much for your willingness to help. I have investigated a bit the issue, and shall post to outputs in #1248. The major output is that the fix you suggested has definitely improved the situation, and your PR on than would be very much appreciated. Here are some technical hints which could be probably helpful. Let us know please if you have any technical questions on how to deal with github and git.

You have contributed significantly in finding a solution, and it would be really great to have your contribution incorporated into the repo.

@choiHenry
Copy link
Author

choiHenry commented Jan 9, 2025

Thank you @ekatef. With your encouragement, I pull-requested #1287. I am currently working on a repo machine for the country "US" to check the issue you reported and investigated in #1248. I believe to investigate propagated results of the PR could contribute to the architecturally stable solutions for building bus regions for customized administrative regions.

@yerbol-akhmetov
Copy link
Collaborator

yerbol-akhmetov commented Jan 9, 2025

Hello! Great work @choiHenry @yerbol-akhmetov @davide-f 🙂

My feeling is that we have done great progress. After checking the population distribution (pop_layout_elec_s_{}_{}.csv) and heating demands, I can confirm that both the overall values and their spatial distribution look much more reasonable. In particular, the overall population for US is close to 300 mlns which is basically the expected estimation instead of unrealistically ~900 mlns previously.

However, @choiHenry I think you may be that we may lose some administrative regions. An example for US, when three different administrative regions are being merged into a single bus region:

image Not sure about the particular reason of this merge. Looking into the grid topology, the network is well interconnected and there are only a few buses which shouldn't be a problem.

As a comment on the long-term strategy, #1258 should not create any obstacles for the proposed solution. I'd rather expect that we may need to revise it to implement a more general approach as outlined in #486. Though, I completely agree with @davide-f that this "architectural" solution is likely to be heavier, while we need a quick fix to improve functionality of the model. So, the work on that definitely make sense 🙂

Thanks @ekatef for checking the issue it details. I have looked into regions_onshore_elec_s_10.geojson file for US with the latest proposed change. I have seen no merge of the regions for all states. I can share with you my geojson file if you wish.

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants