v3.0.0b0 #78

Merged
merged 16 commits into from
Jul 4, 2024

Conversation

njzjz
Member

@njzjz njzjz commented Jul 3, 2024

Checklist

  • Used a personal fork of the feedstock to propose changes
  • Bumped the build number (if the version is unchanged)
  • Reset the build number to 0 (if the version changed)
  • Re-rendered with the latest conda-smithy (Use the phrase @conda-forge-admin, please rerender in a comment in this PR for automated rerendering)
  • Ensured the license file is being packaged.

njzjz added 2 commits July 3, 2024 15:33
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I wanted to let you know that I linted all conda-recipes in your PR (recipe) and found some lint.

Here's what I've got...

For recipe:

  • Failed to even lint the recipe, probably because of a conda-smithy bug 😢. This likely indicates a problem in your meta.yaml, though. To get a traceback to help figure out what's going on, install conda-smithy and run conda smithy recipe-lint . from the recipe directory.

@njzjz
Member Author

njzjz commented Jul 3, 2024

@conda-forge-admin, please rerender

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
@njzjz
Member Author

njzjz commented Jul 3, 2024

@conda-forge-admin, please rerender

@conda-forge-webservices
Contributor

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

Contributor

github-actions bot commented Jul 3, 2024

Hi! This is the friendly automated conda-forge-webservice.

I tried to rerender for you but ran into some issues. Please check the output logs of the latest webservices GitHub Actions workflow run for errors. You can also ping conda-forge/core for further assistance, or you can try rerendering locally.

This message was generated by GitHub actions workflow run https://github.com/conda-forge/deepmd-kit-feedstock/actions/runs/9783853775.

recipe/meta.yaml
Member Author

The second one is a segfault with openmpi + absl:

2024-07-03T20:00:29.2814841Z + export OMPI_MCA_plm=isolated OMPI_MCA_btl_vader_single_copy_mechanism=none OMPI_MCA_rmaps_base_oversubscribe=yes OMPI_MCA_plm_ssh_agent=false
2024-07-03T20:00:29.2815656Z + OMPI_MCA_plm=isolated
2024-07-03T20:00:29.2815930Z + OMPI_MCA_btl_vader_single_copy_mechanism=none
2024-07-03T20:00:29.2816146Z + OMPI_MCA_rmaps_base_oversubscribe=yes
2024-07-03T20:00:29.2816458Z + OMPI_MCA_plm_ssh_agent=false
2024-07-03T20:00:29.2816990Z + mpiexec -n 1 lmp_mpi -in in.lammps
2024-07-03T20:00:29.4367999Z [1f1c0d740e3f:04385] mca_base_component_repository_open: unable to open mca_btl_openib: librdmacm.so.1: cannot open shared object file: No such file or directory (ignored)
2024-07-03T20:00:29.4559532Z LAMMPS (2 Aug 2023)
2024-07-03T20:00:29.4568122Z OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
2024-07-03T20:00:29.4575016Z   using 1 OpenMP thread(s) per MPI task
2024-07-03T20:00:30.0446896Z [1f1c0d740e3f:04385] *** Process received signal ***
2024-07-03T20:00:30.0448123Z [1f1c0d740e3f:04385] Signal: Segmentation fault (11)
2024-07-03T20:00:30.0453336Z [1f1c0d740e3f:04385] Signal code: Address not mapped (1)
2024-07-03T20:00:30.0454407Z [1f1c0d740e3f:04385] Failing at address: 0x8
2024-07-03T20:00:30.0459737Z [1f1c0d740e3f:04385] [ 0] /lib64/libc.so.6(+0x36400)[0x7f43ab885400]
2024-07-03T20:00:30.0462368Z [1f1c0d740e3f:04385] [ 1] /home/conda/feedstock_root/build_artifacts/deepmd-kit_1720035897476/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/bin/../lib/./python3.9/site-packages/tensorflow/../../../libabsl_flags_reflection.so.2401.0.0(_ZN4absl12lts_2024011614flags_internal12FlagRegistry12RegisterFlagERNS0_15CommandLineFlagEPKc+0x99)[0x7f43a1318e09]
2024-07-03T20:00:30.0464226Z [1f1c0d740e3f:04385] [ 2] /home/conda/feedstock_root/build_artifacts/deepmd-kit_1720035897476/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/bin/../lib/./python3.9/site-packages/tensorflow/../../../libabsl_flags_reflection.so.2401.0.0(_ZN4absl12lts_2024011614flags_internal23RegisterCommandLineFlagERNS0_15CommandLineFlagEPKc+0x21)[0x7f43a131a5c1]
2024-07-03T20:00:30.0465530Z [1f1c0d740e3f:04385] [ 3] /home/conda/feedstock_root/build_artifacts/deepmd-kit_1720035897476/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/bin/../lib/./python3.9/site-packages/tensorflow/../../../libabsl_log_flags.so.2401.0.0(+0x3079)[0x7f43a1338079]
2024-07-03T20:00:30.0466184Z [1f1c0d740e3f:04385] [ 4] /lib64/ld-linux-x86-64.so.2(+0xf9c3)[0x7f43b13a99c3]
2024-07-03T20:00:30.0466604Z [1f1c0d740e3f:04385] [ 5] /lib64/ld-linux-x86-64.so.2(+0x1459e)[0x7f43b13ae59e]
2024-07-03T20:00:30.0467010Z [1f1c0d740e3f:04385] [ 6] /lib64/ld-linux-x86-64.so.2(+0xf7d4)[0x7f43b13a97d4]
2024-07-03T20:00:30.0467376Z [1f1c0d740e3f:04385] [ 7] /lib64/ld-linux-x86-64.so.2(+0x13b8b)[0x7f43b13adb8b]
2024-07-03T20:00:30.0467893Z [1f1c0d740e3f:04385] [ 8] /lib64/libdl.so.2(+0xfab)[0x7f43a99fcfab]
2024-07-03T20:00:30.0468252Z [1f1c0d740e3f:04385] [ 9] /lib64/ld-linux-x86-64.so.2(+0xf7d4)[0x7f43b13a97d4]
2024-07-03T20:00:30.0468559Z [1f1c0d740e3f:04385] [10] /lib64/libdl.so.2(+0x15ad)[0x7f43a99fd5ad]
2024-07-03T20:00:30.0468815Z [1f1c0d740e3f:04385] [11] /lib64/libdl.so.2(dlopen+0x31)[0x7f43a99fd041]
2024-07-03T20:00:30.0469716Z [1f1c0d740e3f:04385] [12] /home/conda/feedstock_root/build_artifacts/deepmd-kit_1720035897476/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/bin/../lib/liblammps.so.0(_ZN9LAMMPS_NS11plugin_loadEPKcPNS_6LAMMPSE+0xa6)[0x7f43acd40df6]
2024-07-03T20:00:30.0470980Z [1f1c0d740e3f:04385] [13] /home/conda/feedstock_root/build_artifacts/deepmd-kit_1720035897476/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/bin/../lib/liblammps.so.0(_ZN9LAMMPS_NS16plugin_auto_loadEPNS_6LAMMPSE+0x1a6)[0x7f43acd41446]
2024-07-03T20:00:30.0472135Z [1f1c0d740e3f:04385] [14] /home/conda/feedstock_root/build_artifacts/deepmd-kit_1720035897476/_test_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_place/bin/../lib/liblammps.so.0(_ZN9LAMMPS_NS6LAMMPSC2EiPPcP19ompi_communicator_t+0xefd)[0x7f43ac5fedfd]
2024-07-03T20:00:30.0472657Z [1f1c0d740e3f:04385] [15] lmp_mpi(+0x2217)[0x556b1653b217]
2024-07-03T20:00:30.0472911Z [1f1c0d740e3f:04385] [16] /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f43ab871555]
2024-07-03T20:00:30.0473171Z [1f1c0d740e3f:04385] [17] lmp_mpi(+0x2298)[0x556b1653b298]
2024-07-03T20:00:30.0473388Z [1f1c0d740e3f:04385] *** End of error message ***
2024-07-03T20:00:30.3112890Z --------------------------------------------------------------------------
2024-07-03T20:00:30.3113960Z Primary job  terminated normally, but 1 process returned
2024-07-03T20:00:30.3114930Z a non-zero exit code. Per user-direction, the job has been aborted.
2024-07-03T20:00:30.3115645Z --------------------------------------------------------------------------
2024-07-03T20:00:32.5233816Z --------------------------------------------------------------------------
2024-07-03T20:00:32.5235155Z mpiexec noticed that process rank 0 with PID 0 on node 1f1c0d740e3f exited on signal 11 (Segmentation fault).
2024-07-03T20:00:32.5235753Z --------------------------------------------------------------------------
2024-07-03T20:00:33.7626910Z WARNING: Tests failed for deepmd-kit-3.0.0b0-cpu_py39hfac8ecd_mpi_openmpi_0.conda - moving package to /home/conda/feedstock_root/build_artifacts/broken
2024-07-03T20:00:33.8070705Z TESTS FAILED: deepmd-kit-3.0.0b0-cpu_py39hfac8ecd_mpi_openmpi_0.conda

It is unclear whether this can be fixed or whether the test should just be skipped.

Member Author

It seems to me that mpich also has the segfault.

Member Author

Reproduced on the local machine.
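
For anyone who wants to re-run the failing test step outside CI, here is a minimal Python sketch. The environment variables and the `mpiexec` command line are copied from the CI log above. Everything else is an assumption: the helper names `build_env` and `run_lammps_test` are hypothetical, and the sketch presumes an activated conda environment that already contains the built deepmd-kit package, `lmp_mpi`, and an `in.lammps` input file in the working directory.

```python
import os
import subprocess

# Open MPI settings exported by the test script in the CI log.
OMPI_ENV = {
    "OMPI_MCA_plm": "isolated",
    "OMPI_MCA_btl_vader_single_copy_mechanism": "none",
    "OMPI_MCA_rmaps_base_oversubscribe": "yes",
    "OMPI_MCA_plm_ssh_agent": "false",
}

# Test command taken verbatim from the CI log.
COMMAND = ["mpiexec", "-n", "1", "lmp_mpi", "-in", "in.lammps"]


def build_env(base=None):
    """Return a copy of the base environment with the Open MPI settings applied."""
    env = dict(os.environ if base is None else base)
    env.update(OMPI_ENV)
    return env


def run_lammps_test(cwd="."):
    """Run the test command in `cwd` and return its exit code.

    Hypothetical helper: requires mpiexec, lmp_mpi and in.lammps to be
    available, which is an assumption about the local setup.
    """
    result = subprocess.run(COMMAND, env=build_env(), cwd=cwd)
    return result.returncode
```

A non-zero value from `run_lammps_test()` would indicate that the test step failed locally as well.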

Member Author

(gdb) where
#0  0x0000155541a60e09 in absl::lts_20240116::flags_internal::FlagRegistry::RegisterFlag(absl::lts_20240116::CommandLineFlag&, char const*) ()
   from /home/jz748/anaconda3/envs/test-deepmd-build/bin/../lib/./python3.11/site-packages/tensorflow/../../../libabsl_flags_reflection.so.2401.0.0
#1  0x0000155541a625c1 in absl::lts_20240116::flags_internal::RegisterCommandLineFlag(absl::lts_20240116::CommandLineFlag&, char const*) ()
   from /home/jz748/anaconda3/envs/test-deepmd-build/bin/../lib/./python3.11/site-packages/tensorflow/../../../libabsl_flags_reflection.so.2401.0.0
#2  0x0000155541a80079 in _GLOBAL__sub_I_flags.cc ()
   from /home/jz748/anaconda3/envs/test-deepmd-build/bin/../lib/./python3.11/site-packages/tensorflow/../../../libabsl_log_flags.so.2401.0.0
#3  0x0000155555525237 in call_init (env=0x55555567dbf0, argv=0x7fffffffad58, argc=3, l=<optimized out>) at dl-init.c:74
#4  call_init (l=<optimized out>, argc=3, argv=0x7fffffffad58, env=0x55555567dbf0) at dl-init.c:26
#5  0x000015555552532d in _dl_init (main_map=0x555555780eb0, argc=3, argv=0x7fffffffad58, env=0x55555567dbf0) at dl-init.c:121
#6  0x00001555555215c2 in __GI__dl_catch_exception (exception=exception@entry=0x0, operate=operate@entry=0x15555552bf50 <call_dl_init>,
    args=args@entry=0x7fffffffa290) at dl-catch.c:211
#7  0x000015555552beec in dl_open_worker (a=a@entry=0x7fffffffa440) at dl-open.c:827
#8  0x0000155555521523 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffa420,
    operate=operate@entry=0x15555552be50 <dl_open_worker>, args=args@entry=0x7fffffffa440) at dl-catch.c:237
#9  0x000015555552c2e4 in _dl_open (file=0x555555780cc0 "/home/jz748/anaconda3/envs/test-deepmd-build/lib/deepmd_lmp/dpplugin.so",
    mode=<optimized out>, caller_dlopen=0x155550f40916 <LAMMPS_NS::plugin_load(char const*, LAMMPS_NS::LAMMPS*)+166>, nsid=<optimized out>,
    argc=3, argv=0x7fffffffad58, env=0x55555567dbf0) at dl-open.c:903
#10 0x000015554fcc7714 in dlopen_doit () from /lib64/libc.so.6
#11 0x0000155555521523 in __GI__dl_catch_exception (exception=exception@entry=0x7fffffffa630, operate=0x15554fcc76b0 <dlopen_doit>,
    args=0x7fffffffa6f0) at dl-catch.c:237
#12 0x0000155555521679 in _dl_catch_error (objname=0x7fffffffa698, errstring=0x7fffffffa6a0, mallocedp=0x7fffffffa697, operate=<optimized out>,
    args=<optimized out>) at dl-catch.c:256
#13 0x000015554fcc71f3 in _dlerror_run () from /lib64/libc.so.6
#14 0x000015554fcc77cf in dlopen@GLIBC_2.2.5 () from /lib64/libc.so.6
#15 0x0000155550f40916 in LAMMPS_NS::plugin_load(char const*, LAMMPS_NS::LAMMPS*) ()
   from /home/jz748/anaconda3/envs/test-deepmd-build/bin/../lib/liblammps.so.0
#16 0x0000155550f40f66 in LAMMPS_NS::plugin_auto_load(LAMMPS_NS::LAMMPS*) ()
   from /home/jz748/anaconda3/envs/test-deepmd-build/bin/../lib/liblammps.so.0
#17 0x00001555507fe6ed in LAMMPS_NS::LAMMPS::LAMMPS(int, char**, ompi_communicator_t*) ()
   from /home/jz748/anaconda3/envs/test-deepmd-build/bin/../lib/liblammps.so.0
#18 0x0000555555556217 in main ()
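
The mangled symbols in the CI backtrace and the readable names in the gdb trace above refer to the same frames. The following Python sketch shows one way to check that correspondence. It is an illustration only: `parse_frame` and `nested_name` are hypothetical helpers, and `nested_name` decodes just the leading length-prefixed identifiers of an Itanium C++ ABI nested name (`_ZN...`), ignoring argument types. A full demangler such as `c++filt` handles far more cases.

```python
import os
import re

# One frame of the Open MPI signal-handler backtrace looks like:
#   [host:pid] [ 1] /path/to/lib.so(symbol+0x99)[0xaddress]
# Capture the frame index, the object path and the (possibly empty) symbol.
FRAME_RE = re.compile(r"\[\s*(\d+)\]\s+([^\s(]+)\(([^)+]*)")


def nested_name(mangled):
    """Decode the qualified name from an Itanium-mangled nested name.

    Only the leading <length><identifier> components after "_ZN" are read;
    anything else is returned unchanged.
    """
    if not mangled.startswith("_ZN"):
        return mangled
    parts, pos = [], 3
    while pos < len(mangled) and mangled[pos].isdigit():
        end = pos
        while end < len(mangled) and mangled[end].isdigit():
            end += 1
        length = int(mangled[pos:end])
        parts.append(mangled[end:end + length])
        pos = end + length
    return "::".join(parts)


def parse_frame(line):
    """Return (index, library basename, readable symbol) or None."""
    match = FRAME_RE.search(line)
    if match is None:
        return None
    index, path, symbol = match.groups()
    return int(index), os.path.basename(path), nested_name(symbol)
```

Applied to frame 1 of the CI log, `parse_frame` yields `libabsl_flags_reflection.so.2401.0.0` and `absl::lts_20240116::flags_internal::FlagRegistry::RegisterFlag`, which is gdb frame #0. Frame 13 of the CI log decodes to `LAMMPS_NS::plugin_load`, which is gdb frame #15.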

Member Author

As a workaround, I pinned tensorflow to 2.15 and pytorch to 2.1, and submitted conda-forge/abseil-cpp-feedstock#79.
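
To confirm that a test environment actually picked up the workaround, a small Python sketch like the one below could compare installed versions against the pinned release series. The exact pin strings used in `recipe/meta.yaml` are not shown in this thread, so the `PINS` table, the helper names, and the reading of a pin as "any release in that series" are all assumptions made for illustration.

```python
# Assumed release series, based on the comment above (not copied from the recipe).
PINS = {"tensorflow": "2.15", "pytorch": "2.1"}


def matches_series(version, series):
    """Return True if `version` belongs to the release series `series`.

    Components are compared one by one, so "2.10.0" does not match
    the series "2.1" even though the strings share a prefix.
    """
    series_parts = series.split(".")
    return version.split(".")[:len(series_parts)] == series_parts


def check_environment(installed):
    """Map each pinned package found in `installed` to whether it matches its pin.

    `installed` is a hypothetical {package name: version string} mapping,
    for example collected from `conda list` output.
    """
    return {
        name: matches_series(installed[name], series)
        for name, series in PINS.items()
        if name in installed
    }
```

For example, `check_environment({"tensorflow": "2.16.1", "pytorch": "2.1.2"})` would flag tensorflow as not matching the assumed pin and pytorch as matching.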

recipe/meta.yaml
njzjz added 7 commits July 3, 2024 16:29
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
@njzjz
Member Author

njzjz commented Jul 4, 2024

@conda-forge-admin please rerender

conda-forge-webservices[bot] and others added 2 commits July 4, 2024 09:38
@njzjz
Member Author

njzjz commented Jul 4, 2024

@conda-forge-admin please rerender

conda-forge-webservices[bot] and others added 2 commits July 4, 2024 09:50
@njzjz
Member Author

njzjz commented Jul 4, 2024

@conda-forge-admin please rerender

@njzjz njzjz merged commit 85ab4ed into conda-forge:rc Jul 4, 2024
36 checks passed