Add dust emissions #140
An initial draft of the files is here, which includes a nice briefing on how to use the data. My initial comments on this are:
|
@jfkok making sure you get a notification and can find this |
Hi Zeb, thanks for your very helpful input. Responses follow below:
Ah okay, that makes sense. So I'd need to create one file for the global scaling factor (currently in this file) and seven additional files for the regional scaling factors for the seven regions (currently in this file), is that correct? And then do I still keep the historical and the (three) future scaling factors in separate files? So for 4x8 = 32 files total?
Yes, the reason is that the observational reconstruction ends in the year 2000 because it is based on sedimentary records of dust deposition (like ice cores), so there's much less data for the last two decades. However, my group is working on using satellite data to extend the reconstruction to 2023 and I expect that to be ready sometime next year.
Thanks for pointing this out. The regions do have well-defined coordinates though (the link you included mentioned "complex boundaries that cannot practically be specified using longitude and latitude boundary coordinates"). Would defining those boundaries in each file be sufficient, or do I need to do something else?
Yes, that's exactly right. It's only the year and the region of application (global versus one of seven major dust aerosol source regions) that changes. Thanks so much! Jasper |
Sorry, bad explanation from me. I would suggest one file for global and one file for regional (you can put all seven regions in one file; just use the 'region' dimension, or whatever the CF conventions call it, to differentiate them). Then yes, one file for historical and one file for each scenario. So you'll end up with 4 x 2 = 8 files. Just to try and be a bit clearer:
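The suggested 4 x 2 layout can be sketched as follows. This is only an illustration: the scenario labels here are placeholders, not the official scenario names.

```python
# Sketch of the file layout: one global + one regional file per time period,
# for four time periods (historical plus three scenarios).
# "scenario-a/b/c" are placeholder labels, not the real scenario names.
time_ranges = ["historical", "scenario-a", "scenario-b", "scenario-c"]
scopes = ["global", "regional"]

filenames = [
    f"dustscalefactor_{scope}_{time_range}.nc"
    for time_range in time_ranges
    for scope in scopes
]
print(len(filenames))  # 4 x 2 = 8 files
```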
Got it re the historical vs. scenario split. That's fine. If we want these files to be used for DECK simulations, we'll have to do a bit of thinking. If they're just for a specific MIP experiment, they can stay as they are.
Ah ok nice. Defining those boundary files would definitely be sufficient. (If it were me, I would just give the regions names first, make sure I can write the files, then go back and add the boundary definition second, because that boundary definition could be fiddly, but you may be able to skip straight to writing the boundary conditions!) |
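The two-step approach suggested above (name the regions first, add boundaries second) might be bookkept like this. The region names and corner coordinates here are invented placeholders, not the actual seven dust source regions.

```python
# Hypothetical two-step bookkeeping: step 1, write files with named regions
# and empty boundaries; step 2, go back and attach (lon, lat) boundary vertices.
# Names and coordinates are placeholders only.
regions = ["region-1", "region-2", "region-3"]

# Step 1: named regions, boundaries deferred
boundaries = {name: [] for name in regions}

# Step 2: fill in boundary vertices once the files write correctly
boundaries["region-1"] = [(-20.0, 0.0), (40.0, 0.0), (40.0, 40.0), (-20.0, 40.0)]
```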
Thanks, that's helpful. I'm working on implementing these changes and obtaining corrected files. In doing so, I realized that making the variable name the same for all files also means that the variable names are identical for the four different global files (1 historical and 3 future scenarios) and for the four different regional files. So how would I distinguish the files if I can't put the scenario in the variable name?
Thanks! Jasper |
Excellent question. The answer is, at the moment, you put it in the "source_id" bit of the file name. So, for example, your filenames would become (where I've also dropped off the noleap prefix that isn't part of the DRS):
As you can tell, this doesn't make that much sense and is easy to miss, which is why we're having the discussion in #64. That may lead to a renaming in future, but for now the above is what to go for. |
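A sketch of what carrying the scenario in the source_id looks like in practice, using the filename template seen elsewhere in this thread. The "scenario-x" label is a placeholder, not an actual scenario name.

```python
# The scenario is encoded in the source_id token of the filename
# rather than in the variable name. "scenario-x" is a placeholder.
template = "dustscalefactor_input4MIPs_emissions_AerChemMIP_{source_id}_gn_{time_range}.nc"

source_id = "UCLA-scenario-x-1-0-1"
filename = template.format(source_id=source_id, time_range="2015-2100")

# source_id is the fifth underscore-separated token of the filename
assert filename.split("_")[4] == source_id
```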
Thanks so much! I've implemented all your comments (I think) and uploaded the updated files here. Let me know if you think any further changes are needed. One thing to note is that I added the coordinates of the region boundaries as a "boundary coordinates" attribute of the "region" variable. Let me know in case I should be doing something differently. |
Nice, thanks.
Underscores in the source ID have to be changed to hyphens i.e.:
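A minimal sketch of that source-ID fix, assuming a hypothetical raw ID with underscores:

```python
# Underscores in the source ID must become hyphens
# (underscores are reserved as field separators in the filename)
raw_source_id = "UCLA_1_0_1"  # hypothetical unsanitised ID
source_id = raw_source_id.replace("_", "-")
assert source_id == "UCLA-1-0-1"
```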
Then I would suggest trying to run them through the validator and write them in the DRS: https://input4mips-validation.readthedocs.io/en/latest/how-to-guides/how-to-write-a-single-file-in-the-drs/
Not sure, I haven't done this particular step before. If you run the files through the validator, the CF-checker will flag anything really wrong. In a couple of weeks I can pull the files down and have a look myself (meeting next week will take up time before then). |
Thanks Zeb, I corrected the file names. I tried running the validator after installing it as an application per the instructions here. However, the command `import input4mips_validation.io` triggers an error message, pasted below. Do you know if there is an easy solution for this? Thanks! |
Hmm that's not very good. Let's dive down the rabbit hole here: climate-resource/input4mips_validation#78 |
@jfkok are there any updated files or should I just use this link (https://drive.google.com/drive/u/0/folders/1Xr1A4oqPj35h43MFV1864IXSvYSVtYKs) again? |
I've just updated the files to the latest version. So yes, that link is the correct one.
|
Looking good overall. I had to make some minor tweaks to the files to get them to pass validation; that was pretty straightforward. The Python code I used is below (it runs in the same environment as input4mips-validation), but you can probably do the same with Matlab easily. Looking at the changes, I assume they'll need to be applied to all files.

Python code for tweaks:

```python
import netCDF4

ds = netCDF4.Dataset(
    "dustscalefactor_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1_gn_185001-200012.nc",
    "a",
)

# Capitalisation matters according to the spec
# (doesn't make sense to me, but here we are)
ds["dustscalefactor"].delncattr("Units")
# Apparently dimensionless is represented by "1"
# (again, wouldn't be my choice of how to do this)
ds["dustscalefactor"].setncattr("units", "1")
ds["dustscalefactor"].setncattr("long_name", "dustscalefactor")

# Capitalisation matters according to the spec
# (doesn't make sense to me, but here we are)
ds["region"].delncattr("Units")
# Apparently dimensionless is represented by "1"
# (again, wouldn't be my choice of how to do this)
ds["region"].setncattr("units", "1")
# Whitespace isn't allowed in attribute names
# (not sure why, it's just the spec)
ds["region"].setncattr(
    "boundary_coordinates", ds["region"].getncattr("Boundary coordinates")
)
ds["region"].delncattr("Boundary coordinates")
ds["region"].setncattr("long_name", "region")

# We use 'yr' rather than 'annual'
ds.setncattr("frequency", "yr")

# License info
ds.setncattr("license_id", "CC BY 4.0")
ds.setncattr(
    "license",
    (
        "The input4MIPs data linked to this entry is licensed "
        "under a Creative Commons Attribution 4.0 International "
        "(https://creativecommons.org/licenses/by/4.0/). "
        "Consult https://pcmdi.llnl.gov/CMIP6/TermsOfUse "
        "for terms of use governing CMIP6Plus output, "
        "including citation requirements and proper acknowledgment. "
        "The data producers and data providers make no warranty, either express "
        "or implied, including, but not limited to, warranties of merchantability "
        "and fitness for a particular purpose. "
        "All liabilities arising from the supply of the information "
        "(including any liability arising in negligence) "
        "are excluded to the fullest extent permitted by law."
    ),
)

# If you know this, helpful.
# If not, can just leave it vague like this
ds.setncattr("nominal_resolution", "250 km")

ds.close()
```

The only other minor change is that, because you have annual data, the end of the filename should be "YYYY-YYYY" not "YYYYMM-YYYYMM" as you have now, i.e. for your historical files, "185001-200012.nc" becomes "1850-2000.nc". If you're able to make those tweaks, we should be on the home straight. |
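That time-range renaming can be scripted rather than done by hand. A minimal sketch (the regex simply drops the two month digits from each end of the range):

```python
import re

# Annual ("yr") data uses "YYYY-YYYY" in the filename, not "YYYYMM-YYYYMM":
# strip the month digits from both ends of the time-range segment.
name = "dustscalefactor_input4MIPs_emissions_AerChemMIP_UCLA-1-0-1_gn_185001-200012.nc"
fixed = re.sub(r"_(\d{4})\d{2}-(\d{4})\d{2}\.nc$", r"_\1-\2.nc", name)
```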
The other thing we need to do is register UCLA as an institute. I've started that here if you're able to take a look: PCMDI/mip-cmor-tables#84 Thanks! |
Thanks Zeb, that was very helpful. I've addressed all these issues and the
file now passes the validation, including the cf checker!
Does that conclude the process or are there other steps we need to take?
Thanks!
Jasper
|
Oh cool, I thought the cf-checker wasn't running on your machine, great to hear it's working!
Well, if you want to publish the data on ESGF there are a couple more steps. If you're not worried about that, we're done (although I'm also not sure that you needed to do any of these steps if you didn't want to publish on ESGF in the first place). |
Assuming you do want to publish on ESGF, the next steps are:
|
Yes, I'd definitely like to publish this on ESGF as an input4MIPs data set.
- check that this is the right institute to associate with your data: PCMDI/mip-cmor-tables#84
That's correct, thanks for adding UCLA
- I'll pull your data down onto our server
- Paul and I will then push it through the publication queue
- Good to go
Perfect. I've verified that the data on the Google Drive (https://drive.google.com/drive/u/0/folders/1Xr1A4oqPj35h43MFV1864IXSvYSVtYKs) is correct. Let me know if you need anything else from me.
Thanks for all your help!
Jasper
|
Ok perfect. There is one last step that would be very helpful: can you please run the script below? (It should be clear what it's doing; any questions, fire away. As above, it will run in your normal input4mips-validation environment.) Once you've done that, ping me and @durack1 and we can get the files in the publication queue.

Python script to run:

```python
import shutil
from pathlib import Path

import iris
import netCDF4

from input4mips_validation.cvs import load_cvs
from input4mips_validation.dataset import Input4MIPsDataset
from input4mips_validation.inference.from_data import infer_time_start_time_end
from input4mips_validation.logging import setup_logging
from input4mips_validation.upload_ftp import upload_ftp
from input4mips_validation.validation.file import get_validate_file_result
from input4mips_validation.xarray_helpers.iris import ds_from_iris_cubes

setup_logging(enable=True, logging_level="INFO_FILE")


def create_fixed_tmp_file(file: Path, fixed_folder: Path) -> Path:
    fixed_file = fixed_folder / file.name
    shutil.copy(file, fixed_file)

    source_id = file.name.split("_")[4]

    ds = netCDF4.Dataset(fixed_file, "a")
    ds.setncattr("source_id", source_id)
    ds.setncattr("realm", "atmos")
    if "gn" in fixed_file.name:
        ds["region"].setncattr("long_name", "region")

    ds.close()

    return fixed_file


def validate_file_then_rewrite_file_in_drs(
    file: Path, output_root: Path, cv_source: str
) -> None:
    get_validate_file_result(
        file,
        cv_source=cv_source,
    ).raise_if_errors()

    cvs = load_cvs(cv_source=cv_source)

    ds = ds_from_iris_cubes(
        iris.load(file),
    )
    ds.attrs["nominal_resolution"] = "250 km"
    ds.attrs["target_mip"] = "AerChemMIP2"  # To confirm with Jasper

    time_start, time_end = infer_time_start_time_end(
        ds=ds,
        frequency_metadata_key="frequency",
        no_time_axis_frequency="fx",
        time_dimension="time",
    )

    full_file_path = cvs.DRS.get_file_path(
        root_data_dir=output_root,
        available_attributes=ds.attrs,
        time_start=time_start,
        time_end=time_end,
    )

    if full_file_path.exists():
        raise FileExistsError(full_file_path)

    full_file_path.parent.mkdir(parents=True, exist_ok=True)

    if full_file_path.name != file.name:
        Input4MIPsDataset.from_ds(ds, cvs=cvs).write(
            root_data_dir=output_root,
        )
    else:
        shutil.copy(file, full_file_path)

    print(f"File written according to the DRS in {full_file_path}")


def main() -> None:
    # Obviously point these wherever makes sense for you
    DATA_FOLDER = Path("dust-files")
    TMP_REWRITE_FOLDER = Path("dust-files-pre-validation-fixes")
    OUTPUT_ROOT = Path("dust-rewritten")
    # You shouldn't need to change this
    CV_SOURCE = "gh:main"

    TMP_REWRITE_FOLDER.mkdir(exist_ok=True, parents=True)

    files_to_upload = DATA_FOLDER.rglob("*.nc")
    for file in files_to_upload:
        tmp_file = create_fixed_tmp_file(file, TMP_REWRITE_FOLDER)
        validate_file_then_rewrite_file_in_drs(tmp_file, OUTPUT_ROOT, CV_SOURCE)

    cvs = load_cvs(cv_source=CV_SOURCE)
    upload_ftp(
        tree_root=OUTPUT_ROOT,
        ftp_dir_rel_to_root="UCLA-1-0-1-upload-1",
        password="jfkok@ucla.edu",
        cvs=cvs,
        username="anonymous",
        ftp_server="ftp.llnl.gov",
        ftp_dir_root="/incoming",
        # You can try making this 8, but the FTP server seems to not enjoy parallel uploads
        n_threads=1,
        dry_run=False,
        continue_on_error=False,
    )


if __name__ == "__main__":
    main()
```
 |
Happy 2025, @durack1, @znichollscr! I ran the Python script and copied the output below. It seems to have run correctly, as far as I can tell. |
Hmm I don't think so, no files were uploaded. Did you update |
If yes, change

```python
files_to_upload = DATA_FOLDER.rglob("*.nc")
```

to

```python
# rglob returns a generator, which is truthy even when empty,
# so materialise it as a list before asserting
files_to_upload = list(DATA_FOLDER.rglob("*.nc"))
assert files_to_upload, f"No re-written files found in {DATA_FOLDER}, please check"
```

because something isn't working... |
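The pitfall here is worth spelling out: `Path.rglob` returns a generator, which is truthy even when it yields nothing, so an assertion on the bare generator can never catch an empty (e.g. mis-pointed) data folder. A quick stdlib demonstration:

```python
import tempfile
from pathlib import Path

# An empty temporary directory stands in for a mis-pointed DATA_FOLDER
with tempfile.TemporaryDirectory() as tmp:
    found = Path(tmp).rglob("*.nc")
    assert found  # passes even though the directory is empty: generators are truthy
    files = list(Path(tmp).rglob("*.nc"))  # the list form reveals there is nothing
```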
Oops sorry, I missed that part.
I updated the directories to point to the right places and ran it again,
seemingly successfully (see screenshot). I assume it worked now and you can
see the uploaded files?
[image: image.png]
|
Hmmm I can't see the screenshot and we don't seem to have anything on the server. Can you try sending the screenshot again please? |
Super weird, is there anything in the folder pointed to by |
(For what it's worth, the timestamps on the "uploading files" message and the "success" message are the same, so it's clear that no uploading is actually happening) |
@jfkok has done some great work pulling together dust emissions. This issue is for tracking their inclusion in input4MIPs.