Runtime error in Cloud-J module #2648
Thanks for writing, @foesterstroem. This looks like an input read error. If you are using version 14.5.0, make sure that you are reading data from the CHEM_INPUTS/CLOUD_J/v2024-09/ folder, as these files are needed for Cloud-J v8+. Also, would you be able to attach your input files and log files to this issue? We can take a look. Also tagging @lizziel, our local Cloud-J expert.
Hi @foesterstroem, that print should only occur if Cloud-J debug prints are enabled, since it is inside a debug-only if block. To be clear, the problem is the formatting of the write statement (not an input read error), and I wonder if there is a bug there. You must be running with verbose on, since that write statement is otherwise not called. It is trying to print all J-values for a single grid cell (the cell horizontal indexes are defined as (20,20) prior to the main Cloud-J call in file cldj_interface_mod.F90 in GEOS-Chem). The J-values are declared as real, so it is odd that it is looking for an integer. I will look into whether this is a bug. For now, commenting it out should not have any impact. It might be an indicator of an underlying issue, although I doubt it. Please do post your geoschem_config.yml and log file here. You can add a .txt extension and drag and drop the files into the comment box. This will help me try to reproduce the issue.
Hi, thanks for getting back to me. I have been running with verbose on, yes. I am currently re-trying the benchmark run with the do statement uncommented and verbose off. An FYI: I successfully completed a benchmark run yesterday/today using version 14.4.3, with verbose on. I am attaching just part of the log file: GC-log-sections.txt. The file is too big to attach in its entirety. The run went for ~12.5 model days before failing with a floating-point exception (in some section of the aerosol calculation):

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x14b33ac38b7f in ???
#1 0x1073231 in mach_hetp_calco7
at /home/freja/GCClassic_14_5_0/Rundirs/gc_4x5_merra2_fullchem_benchmark/CodeDir/src/HETP/src/Core/hetp_mod.F90:5220
#2 0x108de29 in __hetp_mod_MOD_mach_hetp_main_15cases
at /home/freja/GCClassic_14_5_0/Rundirs/gc_4x5_merra2_fullchem_benchmark/CodeDir/src/HETP/src/Core/hetp_mod.F90:408
#3 0x5f5bbb in __aerosol_thermodynamics_mod_MOD_do_ate._omp_fn.0
at /home/freja/GCClassic_14_5_0/Rundirs/gc_4x5_merra2_fullchem_benchmark/CodeDir/src/GEOS-Chem/GeosCore/aerosol_thermodynamics_mod.F90:799
#4 0x14b33b65357d in gomp_thread_start
at /tmp/freja/spack-stage/spack-stage-gcc-12.2.0-5boslqnpqh4pkgcys6q5bbmrzfhts2qh/spack-src/libgomp/team.c:129
#5 0x14b33afb81ce in ???
#6 0x14b33ac23e72 in ???
#7 0xffffffffffffffff in ???
/bin/bash: line 71: 4041629 Floating point exception (core dumped) ./gcclassic >> $log
Hi @foesterstroem, I did a fullchem run yesterday with verbose on. It was successful, but I was surprised to see that there are actually small differences in output between verbose on and verbose off in the GC-Classic benchmark simulation. I do not recommend doing any production runs with verbose on until we figure out why there are differences. Having verbose on also slows down the model, so we do not recommend using it for production runs anyway. Is there a reason you need it on for a 12+ day run?
Thanks @foesterstroem and @lizziel. I have a hunch why this may happen. The variable gmax is declared here:

real(dp) :: omehi, omebe, y1, y2, y3, x3, dx, c1, c2, c2a, c3, gmax, ya, yb, xa, xb

But it is only assigned inside this if block:

if (.not. soln) then
   gmax = 0.1_dp
   gmax = max(gmax, gama(1))
   gmax = max(gmax, gama(2))
   gmax = max(gmax, gama(3))
   gmax = max(gmax, gama(4))
   gmax = max(gmax, gama(5))
   gmax = max(gmax, gama(6))
   gmax = max(gmax, gama(7))
   gmax = max(gmax, gama(8))
   gmax = max(gmax, gama(9))
   gmax = max(gmax, gama(10))
   gmax = max(gmax, gama(11))
   gmax = max(gmax, gama(12))
   gmax = max(gmax, gama(13))
   gmax = max(gmax, gama(14))
   gmax = max(gmax, gama(15))
   gmax = max(gmax, gama(16))
   gmax = max(gmax, gama(17))
   gmax = max(gmax, gama(18))
   gmax = max(gmax, gama(19))
   gmax = max(gmax, gama(20))
   gmax = max(gmax, gama(21))
   gmax = max(gmax, gama(22))
   gmax = max(gmax, gama(23))
end if

Note that gmax wouldn't be defined unless the (.not. soln) branch is taken. Later on, gmax is tested again in this block:

! ## Reinitialize activity coefficients if gmax > 100.0_dp
if (gmax > 100.0_dp .and. (.not. soln)) then
   gama  = 0.1_dp
   gamin = 1.0e10_dp
   gamou = 1.0e10_dp
   calou = .true.
   frst  = .true.
end if

So what could be happening is that when soln is .true., the test gmax > 100.0_dp reads an undefined value. Fortran does not guarantee short-circuit evaluation of .and., so that comparison can still be evaluated, and an undefined (possibly denormal or invalid) gmax could trigger the floating-point exception.

What I think will fix this is if we set gmax in the initialization section at the top of the routine:

! ### Initialize variables ###
so4    = so4_i
nh4    = nh4_i
no3    = no3_i
na     = na_i
cl     = cl_i
ca     = ca_i
pk     = k_i
mg     = mg_i
aw     = rh
t      = temp
hso4   = 0.0_dp
gnh3   = 0.0_dp
ghno3  = 0.0_dp
ghcl   = 0.0_dp
h      = 0.0_dp
lwn    = tiny
so4_t  = 0.0_dp
nh4_t  = 0.0_dp
no3_t  = 0.0_dp
na_t   = 0.0_dp
cl_t   = 0.0_dp
ca_t   = 0.0_dp
pk_t   = 0.0_dp
mg_t   = 0.0_dp
caso4  = 0.0_dp
so4fr  = 0.0_dp
na2so4 = 0.0_dp
k2so4  = 0.0_dp
mgso4  = 0.0_dp
noroot = .false.
frk    = 0.0_dp
frmg   = 0.0_dp
frca   = 0.0_dp
frna   = 0.0_dp
soln   = .false.
calou  = .true.
gama   = 0.1_dp
gamin  = 1.0e10_dp
gamou  = 0.1_dp
earlye = .false.

We could just add a

gmax = 0.0_dp

to that list, just to make sure that gmax will always have a defined, non-denormal value when the if statement above is evaluated.
That makes sense for that error. I think most compilers don't pick up on an undefined variable like that unless you compile with debug flags. @foesterstroem, did you compile with debug on? I only ask because you should recompile with it off before doing a long run; debug mode greatly slows down the model.
Hi @lizziel, thank you. I had left verbose on after running a debug version, and I saw the initial Cloud-J error when running in both debug and normal mode. For the second error with HETP, that particular run was compiled in normal mode, not debug mode. @yantosca, after the holiday break I will attempt a fresh compile of the benchmark run to see if I still have the HETP/gmax issue, and to check whether adding the code you suggest fixes it.
Your name
Freja Østerstrøm
Your affiliation
Aarhus University, Denmark
What happened? What did you expect to happen?
I have recently set up the environment on our local cluster to run GCClassic 14.5.0 and successfully performed dryruns to download the input data.
When running a 4x5 benchmark run, I get a runtime error in the Cloud-J module:
The error appears in the model output/print in the log file:
I get the same error in debugging mode, but no additional information (I do have an issue with my debugger installation, so that could be unrelated to this issue).
What are the steps to reproduce the bug?
In the cldj_fjx_sub_mod.F90 file, lines 778-780 contain a do statement that prints the J-values as in the output above. A comment there says the information is not fed back to the model. I have tried commenting out these three lines in the source code, and this makes the model run.
I am worried that something may go wrong down the line when commenting this information out. Or is something actually wrong in the model run that I am not seeing?
The species in this list of J-values whose value is not printed is MVKN, which seems to have a very small absorption cross section (CHEM_INPUTS/FAST_JX/v2024-05/FJX_spec.dat), so I am not sure whether this is an issue with J-values approaching 0.
Please attach any relevant configuration and log files.
No response
What GEOS-Chem version were you using?
14.5.0
What environment were you running GEOS-Chem on?
Local cluster
What compiler and version were you using?
gcc 12.2.0
Will you be addressing this bug yourself?
Yes, but I will need some help
In what configuration were you running GEOS-Chem?
GCClassic
What simulation were you running?
Full chemistry
As what resolution were you running GEOS-Chem?
4x5
What meteorology fields did you use?
MERRA-2
Additional information
No response