Compile Python 2.7.2 with WASM #58

Open
yzhang71 opened this issue Nov 26, 2024 · 40 comments

@yzhang71
Contributor

We are currently attempting to compile Python version 2.7.2 with our Lind-WASM. We selected version 2.7.2 because it is the version used by Lind-NaCl, allowing for a fair comparison. I will also use this issue to track the compilation problems and document their solutions.

@yzhang71
Contributor Author

In file included from ../Modules/python.c:3:
In file included from ../Include/Python.h:58:
../Include/pyport.h:243:13: error: "This platform's pyconfig.h needs to define PY_FORMAT_LONG_LONG"
error "This platform's pyconfig.h needs to define PY_FORMAT_LONG_LONG"
^
1 error generated.
make: *** [Makefile:555: Modules/python.o] Error 1

@rennergade
Contributor

Where did you pull the source code from?

@yzhang71
Contributor Author

The error persists because we are performing a cross-compilation, and PY_FORMAT_LONG_LONG is not correctly defined for the platform. To resolve this issue in a robust way, we need to define PY_FORMAT_LONG_LONG directly in pyconfig.h.

Solution:

Add the following line to /Python-2.7.2/build/pyconfig.h:

#define PY_FORMAT_LONG_LONG "ll"

@yzhang71
Contributor Author

I got the source code from here: https://www.python.org/downloads/release/python-272/

@rennergade
Contributor

A lot of clues for things such as the one above should be in here: https://github.com/Lind-Project/lind_project/tree/main/tests/applications/python

Python was the hardest app to build because its build process uses itself to package everything, and since we couldn't switch between native and Lind during the build, we had to compile a bunch of things from scratch. It would be cool to figure it out seamlessly this time, but that may not be possible.

@Yaxuan-w
Member

This issue would be a good reference from my view

@yzhang71
Contributor Author

Parser/pgen ../Grammar/Grammar ../Include/graminit.h ../Python/graminit.c
Parser/pgen: 12: Syntax error: "(" unexpected
make: *** [Makefile:562: Parser/pgen.stamp] Error 2

@yzhang71
Contributor Author

The error occurred because the pgen we generated was a WASM version. However, the default script attempts to run pgen directly, without using a WASM runtime.

Solution

Inspired by the Lind-NaCl compilation process, I followed these steps to resolve the issue:

  1. Compiled a native version of Python using Clang.
  2. Used the native pgen to generate the necessary files:
    • ../Include/graminit.h
    • ../Python/graminit.c
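
The "Syntax error" above is what a shell prints when it is handed a binary it cannot execute natively. A quick way to confirm that a build tool such as pgen was emitted as a WebAssembly module is to check its first four bytes for the wasm magic number. This is a minimal host-side sketch in Python 3; the helper name and the stand-in file are hypothetical, and a real check would point at Parser/pgen.

```python
import os
import tempfile

WASM_MAGIC = b"\x00asm"  # first four bytes of every WebAssembly binary module


def is_wasm_module(path):
    """Return True if the file at `path` starts with the wasm magic number."""
    with open(path, "rb") as f:
        return f.read(4) == WASM_MAGIC


# Stand-in file for the demo (hypothetical; replace with Parser/pgen in a real tree).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(WASM_MAGIC + b"\x01\x00\x00\x00")  # magic followed by version 1
    fake_pgen = tmp.name

pgen_is_wasm = is_wasm_module(fake_pgen)
os.remove(fake_pgen)
print(pgen_is_wasm)
```

If the check is true, the tool has to be run through a wasm runtime or replaced by a native build, as in the steps above.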

@yzhang71
Contributor Author

../Modules/posixmodule.c:5873:21: error: use of undeclared identifier 'MAX_GROUPS'
gid_t grouplist[MAX_GROUPS];
^
../Modules/posixmodule.c:5880:15: error: use of undeclared identifier 'MAX_GROUPS'
if (len > MAX_GROUPS) {

@yzhang71
Contributor Author

The error occurs because MAX_GROUPS is not defined for our platform during compilation. To fix this issue, we can manually define MAX_GROUPS in the pyconfig.h file.

Solution

Add the following line to the pyconfig.h file in your build directory:

#define MAX_GROUPS 64

Explanation

MAX_GROUPS specifies the maximum number of groups a user can belong to.

@yzhang71
Contributor Author

../Modules/posixmodule.c:1800:11: error: call to undeclared function 'chflags'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
res = chflags(path, flags);
^

@yzhang71
Contributor Author

The error occurs because the function chflags is not declared or supported on our platform.

Solution: Disable chflags

Modify the build to exclude chflags if it is not supported on your platform. In the pyconfig.h file located in your build directory, ensure that HAVE_CHFLAGS is undefined:

#undef HAVE_CHFLAGS
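
The pyconfig.h edits reported so far (PY_FORMAT_LONG_LONG, MAX_GROUPS, HAVE_CHFLAGS) can be applied in one repeatable step rather than by hand. The following is only a sketch in Python 3: the function name and the comment marker are hypothetical, and it assumes that appending the lines at the end of pyconfig.h is acceptable for this build. The demo runs against a throwaway file rather than the real build/pyconfig.h.

```python
import os
import tempfile

# The three overrides reported in this thread so far.
OVERRIDES = [
    '#define PY_FORMAT_LONG_LONG "ll"',
    "#define MAX_GROUPS 64",
    "#undef HAVE_CHFLAGS",
]


def apply_overrides(pyconfig_path, overrides=OVERRIDES):
    """Append each override line to pyconfig.h unless it is already present."""
    with open(pyconfig_path) as f:
        existing = f.read().splitlines()
    missing = [line for line in overrides if line not in existing]
    if missing:
        with open(pyconfig_path, "a") as f:
            f.write("\n/* Lind-WASM cross-compilation overrides */\n")
            f.write("\n".join(missing) + "\n")
    return missing


# Demo against a throwaway file standing in for build/pyconfig.h.
with tempfile.NamedTemporaryFile("w", suffix=".h", delete=False) as tmp:
    tmp.write("#define HAVE_CHFLAGS 1\n")
    demo_path = tmp.name

first_run = apply_overrides(demo_path)
second_run = apply_overrides(demo_path)  # idempotent: nothing left to add
with open(demo_path) as f:
    demo_text = f.read()
os.remove(demo_path)
print(first_run)
print(second_run)
```

Because the lines are appended, the #undef takes effect after any earlier #define of the same macro in the file.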

@yzhang71
Contributor Author

	Modules/python.o \
	libpython2.7.a    -lm  

wasm-ld: error: unable to find library -lm
clang-16: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [Makefile:416: python] Error 1

@yzhang71
Contributor Author

This error happens because the linker (wasm-ld) cannot locate the libm library, which provides mathematical functions.

Solution

To resolve the issue, create a static libm.a library in the sysroot location for WASM:
Run the following command to generate libm.a:

/clang+llvm-16.0.4-x86_64-linux-gnu-ubuntu-22.04/bin/llvm-ar crs "/lind-wasm/glibc/sysroot/lib/wasm32-wasi/libm.a"

Explanation

  • The -lm flag instructs the linker to link against the math library (libm).
  • For WASM, a compatible libm.a must exist in the sysroot's library path (/lib/wasm32-wasi/).
  • Running llvm-ar crs with no object files creates an empty libm.a. This satisfies the linker's lookup for -lm but provides no symbols, so it only works if the math functions are not needed or are resolved from another library.

@qianxichen233
Contributor

A little more progress on this:

  1. Last time I worked on this (last month, I guess?), the issue I hit was related to malloc. It looks like this went away after the mmap implementation was finalized.
  2. The next issue was that python entered an infinite loop somewhere in its initialization. Luckily this part does not involve asyncify, so I could gdb into the location and found that python was trying to call the readlink syscall, which we do not have. I disabled readlink in pyconfig.h so that python uses an alternative routine that we support.
  3. After dealing with that, python ran a little further. This time we got some error messages generated by python:
    Could not find platform independent libraries <prefix>
    Could not find platform dependent libraries <exec_prefix>
    Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
    TypeError: zipimporter() takes exactly 0 arguments (1 given)
    So it looks like python is trying to load something during initialization that we do not have. I guess we have some clues from the previous experience of compiling python under lind-nacl?

@rennergade
Contributor

I'm not sure about the zipimporter error but the PYTHONHOME/PYTHONPATH variables are env variables that need to be set. We did encounter this in lind-nacl. I think @Yaxuan-w knows what values they should be set to?

@qianxichen233
Contributor

I looked into the path that python is trying to load from. It turns out it is trying to load a file called os.py under usr/local/lib/python2.7. I found os.py inside the python source tree, so I copied it to that path in the lind filesystem, and the first error message is gone.
For the second error message, I found that python is looking for a folder called lib-dynload under usr/local/lib/python2.7. From what I have read, this folder is where the dynamically loaded modules live. I am not sure whether we need dynamic loading fully working to support it, but I created an empty lib-dynload folder and the second error message is also gone.
The third error message took me some time to debug, but I finally figured out that it is another locale issue: the isalpha function is not working as expected and always returns false. So my previous locale fix is still not comprehensive, and there may be other locale-related functions that are not working.

@Yaxuan-w
Member

Yaxuan-w commented Jan 6, 2025

I've discussed the readlink issue with Qianxi in part 2, and in the newest PR, I've added support for both readlink and readlinkat.

Regarding the environment setup mentioned in part 3, here's the configuration I previously used:

export LD_LIBRARY_PATH=/usr/local/python-gcc4/lib:/usr/local/pgsql/lib:$LD_LIBRARY_PATH
export PYTHONHOME="/usr/local/python-gcc4"
export PYTHONPATH="/usr/local/python-gcc4/lib/:/usr/local/python-gcc4/lib/python2.7/site-packages"
export PATH=/usr/local/python-gcc4/bin/:$PATH

This setup is used for running:

PYTHONHOME specifies the installation prefix under which python looks for its standard libraries.
PYTHONPATH lists the additional paths python searches for libraries.

This is the script I used to compile native-python: https://github.com/Lind-Project/lind_project/blob/alice-py-gcc4/tests/applications/python-native-gcc4/python-gcc4/bootstrap_native.sh

From the error message Qianxi shared, part of the issue seems to be related to settings (e.g., os.py) and could potentially be resolved by adjusting the environment variables as shown above.

From what I recall, the specific library files within lib-dynload (e.g., for math, datetime, etc.) are typically required at runtime for executing python code, and I don't remember us using PYTHONHOME/PYTHONPATH during compilation. During the compilation process, I did not encounter any need to explicitly declare the lib-dynload path. In my setup, I completed the entire compilation process first, and only afterward created the paths for config, site-packages, and lib-dynload.

I'd prefer debugging from the perspective of reconfiguring the dependency env settings or figuring what's the current compilation stage (only a personal suggestion)

@qianxichen233
Contributor

The locale issue was because I forgot to compile crt1.c after merging the fix-locale branch, so isalpha is working now.
With that resolved, the next error reported by python was "ImportError: No module named site". So it looks like python is trying to import a file called site.py under lib-dynload. I found site.py under the Lib folder of the python source code, so I copied it to lib-dynload and this error went away.

The next issue came from wasmtime, which reported wasm trap: indirect call type mismatch. The error was triggered inside the PyCFunction_Call function in methodobject.c. This function retrieves the PyCFunction from a PyObject and invokes the function pointer according to the argument format. The case that triggered the error was METH_NOARGS, where python invokes the function using (*meth)(self, NULL). After inspecting, I found that the function it tries to invoke is the dictionary's keys function, which takes only the dictionary itself and no second argument, yet python calls it with an extra NULL argument. In C this works in practice on native targets (strictly speaking it is undefined behavior), but it is explicitly disallowed in wasm, which reports a type mismatch error. I am still not 100% sure this is what is happening, but if it is, we might have some trouble resolving this issue.

@rennergade
Contributor

There are some other python builds with wasm floating around. It may be worth checking them out to see if they've changed anything.

One that I came across is this which doesn't seem to be changing the source: https://github.com/emmatyping/python-wasm

Are you building python with --host=wasm32-unknown-wasi?

@qianxichen233
Contributor

yes, I am using --host=wasm32-unknown-wasi --target=wasm32-unknown-wasi

@qianxichen233
Contributor

qianxichen233 commented Jan 6, 2025

I looked into the code and I think I have verified that this is exactly what is happening:
The definition of PyCFunction is typedef PyObject *(*PyCFunction)(PyObject *, PyObject *);. When constructing the PyMethodDef object, the code uses {"keys", (PyCFunction)dict_keys, METH_NOARGS, keys__doc__}, where dict_keys is defined as static PyObject *dict_keys(PyDictObject *mp). Python is forcibly casting the function type here, which causes wasm to report the function type mismatch error.
I also verified this by changing the signature of dict_keys to static PyObject *dict_keys(PyDictObject *mp, char* unused), and the issue went away.

The python-wasm build uses python 3 instead of python 2; I am not sure if that is why they did not face this issue.
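
The mismatch described in this comment can be pictured with a toy model. The sketch below is Python 3, not wasm: call_indirect here is a hypothetical stand-in that only compares parameter counts, whereas a real runtime compares the full function type. It shows why the one-parameter dict_keys traps when called the METH_NOARGS way with (self, NULL), and why padding the signature avoids the trap.

```python
class Trap(Exception):
    """Stands in for a wasm runtime trap."""


def call_indirect(table, index, expected_arity, *args):
    """Toy call_indirect: check the callee's declared arity before calling."""
    func, declared_arity = table[index]
    if declared_arity != expected_arity:
        raise Trap("indirect call type mismatch")
    return func(*args)


def dict_keys(mp):
    # Declared with one parameter, like the original dict_keys(PyDictObject *mp).
    return sorted(mp)


def dict_keys_padded(mp, unused):
    # Workaround from the comment above: accept and ignore a second argument.
    return sorted(mp)


# Function table entries: (callable, declared parameter count).
table = [(dict_keys, 1), (dict_keys_padded, 2)]
d = {"b": 2, "a": 1}

# The METH_NOARGS path calls (*meth)(self, NULL), i.e. with two arguments.
try:
    call_indirect(table, 0, 2, d, None)
    outcome = "ok"
except Trap as exc:
    outcome = str(exc)

padded_result = call_indirect(table, 1, 2, d, None)
print(outcome)
print(padded_result)
```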

@rennergade
Contributor

This blog post seems to be referencing what you're talking about:

https://blog.pyodide.org/posts/function-pointer-cast-handling/

@qianxichen233
Contributor

That's cool, I'll take a look at it later

btw, after I bypassed dict_keys issue by modifying the dict_keys signature, python is able to execute the following script successfully using python -S test.py

result = 1 + 1
print "The result of 1 + 1 is:", result

@rennergade
Contributor

That's awesome! Almost there!

@qianxichen233
Contributor

qianxichen233 commented Jan 8, 2025

I tried running wasm-opt --fpcast-emu over the compiled wasm file with the dict_keys signature modification reverted. This works, and the function type mismatch error is gone. I also checked the wasm file size after applying --fpcast-emu: it only increased from 21M to 22M, which is a surprisingly small increase. From the article, though, code expansion may not be the biggest overhead of --fpcast-emu: the emulated function pointers decrease runtime performance and occupy a lot of runtime stack space, so the stack is consumed much faster when the call stack is deep.

I have several thoughts about this. If I understand correctly, --fpcast-emu turns every indirect call (call_indirect) into an emulated function call that always takes 60 arguments, so that there is enough room for any extra arguments passed to the function. I think the 60 is where most of the space consumption comes from, since each function call wastes nearly 60 arguments on the stack. So I have two optimization strategies:

  1. At least for python, I believe we do not need 60 arguments. The function that triggers the type mismatch takes two arguments at most, so making the emulated call take only 2 or 3 arguments instead of 60 should save a huge amount of space and make the overhead minimal.
  2. I believe we also do not need to apply the transformation to every call_indirect in the wasm file. In python, I have so far observed only one place where the type mismatch can happen: the indirect function call inside PyCFunction_Call in methodobject.c. If we can limit the scope of the --fpcast-emu transformation to PyCFunction_Call, that would also save resources.

These two strategies are not direct, general optimizations of --fpcast-emu, since --fpcast-emu itself has to be comprehensive and cover all the edge cases. But in the case of lind we have the freedom to adjust the parameters: we can have customized build parameters for each piece of software we build under lind, which should give us the most optimized build of that software.
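
To put rough numbers on the first strategy, here is a back-of-the-envelope sketch in Python 3. It assumes each emulated parameter is passed as a 64-bit value and uses an arbitrary call depth of 1000; both are assumptions for illustration, not measurements, and how much of this a given engine really keeps on its stack may differ.

```python
BYTES_PER_PARAM = 8  # assumption: each emulated parameter is a 64-bit value


def param_bytes(num_params, depth=1):
    """Bytes spent on parameters alone for `depth` nested emulated calls."""
    return num_params * BYTES_PER_PARAM * depth


full = param_bytes(60, depth=1000)    # 60 parameters per indirect call
trimmed = param_bytes(3, depth=1000)  # 3 parameters per indirect call
ratio = full // trimmed
print(full, trimmed, ratio)
```

Under these assumptions, trimming the parameter count from 60 to 3 cuts the parameter space per call by a factor of 20.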

@qianxichen233
Contributor

qianxichen233 commented Jan 8, 2025

Previously I was able to run a simple python script like 1+1 with the -S option, which skips python's pre-loading of some modules. We still have some issues when running python without -S. The first is the error memory fault at wasm address 0x100000027 in linear memory of size 0x100000000. I looked into the code that triggered the error: a global variable, _PyThreadState_Current, which is a pointer, gets set to 0xffffffff for some reason. I used the watch command in gdb to trace changes to the variable and found that this is very likely a stack overflow. The default wasm stack is only 65536 bytes, and python overflows the stack boundary by a wide margin, even overwriting global variables in the global data area. So we should be able to solve this by simply increasing the stack size. It took me quite a while to figure out which option sets the stack size, since the wasm-ld documentation did not list one. In the end ChatGPT told me that the correct option is -Wl,-z,stack-size=value (though it only gave the correct answer the second time I asked). I have now set the stack size to 1MB, and this issue went away.

Next, I am getting another stack overflow error (call stack exhausted). This time it is caused by infinite recursion: the __log function calls __ieee754_log, and __ieee754_log calls __log in turn. I think we are not handling the glibc math library correctly, which makes these functions recurse forever.
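
The numbers in the memory fault can be checked with a little arithmetic. This Python 3 sketch shows that the faulting address is the corrupted pointer value plus 0x28, which is consistent with (though it does not prove) a field access through _PyThreadState_Current after it was overwritten. It also builds the linker flag mentioned above for a 1 MiB stack; the helper name is hypothetical.

```python
WASM_PAGE = 65536            # one wasm page, the default stack size mentioned above

fault_addr = 0x100000027     # address from the wasmtime error
memory_size = 0x100000000    # linear memory size from the same error (4 GiB)
corrupted_ptr = 0xFFFFFFFF   # value observed in _PyThreadState_Current

# Distance between the corrupted pointer and the faulting address.
field_offset = fault_addr - corrupted_ptr
past_end = fault_addr >= memory_size


def stack_size_flag(size_bytes):
    """Build the clang driver flag that forwards -z stack-size to the linker."""
    return "-Wl,-z,stack-size=%d" % size_bytes


one_mib = 16 * WASM_PAGE
flag = stack_size_flag(one_mib)
print(hex(field_offset), past_end, flag)
```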

@rennergade
Contributor

Great progress with this! Maybe open up a new issue for the math lib recursion and assign it to @robinyuan1002

@qianxichen233
Contributor

qianxichen233 commented Jan 9, 2025

I temporarily bypassed the log function issue by making it return 0. I guess these math functions should not interfere with just booting up python, so this should be fine for now.

Next I am getting a weird issue from wasmtime, which reports call stack exhausted again. This time the call stack isn't really long. I also printed the stack pointer, and there should be more than enough stack space when the error is triggered. What's stranger is that when I run python without the Asyncify transformation, the issue disappears. I believe python does not use anything related to Asyncify, like pthread/multi-processing, during boot, so the code should behave exactly the same whether or not Asyncify is applied. But it does not, so something is going on here.

Btw, when running python without performing Asyncify transformation, we now got a python traceback error:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/lib-dynload/site.py", line 563, in <module>
    main()
  File "/usr/local/lib/python2.7/lib-dynload/site.py", line 545, in main
    known_paths = addusersitepackages(known_paths)
  File "/usr/local/lib/python2.7/lib-dynload/site.py", line 278, in addusersitepackages
    user_site = getusersitepackages()
  File "/usr/local/lib/python2.7/lib-dynload/site.py", line 253, in getusersitepackages
    user_base = getuserbase() # this will also set USER_BASE
  File "/usr/local/lib/python2.7/lib-dynload/site.py", line 243, in getuserbase
    USER_BASE = get_config_var('userbase')
  File "/usr/local/lib/python2.7/lib-dynload/sysconfig.py", line 520, in get_config_var
    return get_config_vars().get(name)
  File "/usr/local/lib/python2.7/lib-dynload/sysconfig.py", line 419, in get_config_vars
    _init_posix(_CONFIG_VARS)
  File "/usr/local/lib/python2.7/lib-dynload/sysconfig.py", line 287, in _init_posix
    raise IOError(msg)
IOError: invalid Python installation: unable to open /usr/local/lib/python2.7/config/Makefile (No such file or directory)

This is pretty much an expected error, since we indeed do not have this file yet. I wonder how lind-nacl deals with this: does python under lind-nacl have this file, and how is it generated? @Yaxuan-w

@qianxichen233
Contributor

qianxichen233 commented Jan 9, 2025

For the first call stack exhausted issue, I figured out that Asyncify somehow expanded the local variables of PyEval_EvalFrameEx far too much. I was horrified to see that the first line of the PyEval_EvalFrameEx function in wat format is (local i32 i32 i32 i32 i32 ....., so long that vscode cannot fully display it and shows show more (67.0KB) at the end. So this might be why the Asyncified code overflows the stack: there are too many local variables. I fixed this by passing -O2 to wasm-opt when doing the Asyncify transformation, which makes the code smaller, and the call stack exhausted error is gone. The Asyncified version of python now also reaches the config/Makefile not found issue.

I am still a little confused about why the unoptimized version yields the call stack exhausted error. I thought all of these local variables were stored in linear memory, in which case they should not overflow the stack at all, because I am giving the stack 1GB of space here. The fact that this still produces the error shows that I misunderstood something. It looks like variables declared with local are not stored in linear memory right away; there appears to be another call stack, outside linear memory, that holds them (which also makes me wonder whether I can change the size of that call stack).

@Yaxuan-w
Member

Yaxuan-w commented Jan 9, 2025

There's a Makefile in the python source code, located at the top level of the source directory. Just copy that file into the desired location, e.g.: cp /home/lind-wasm/python/Makefile /usr/local/lib/python2.7/config/Makefile

@rennergade
Contributor

@qianxichen233 I've seen references to the "shadow stack" (sounds menacing!) like in this explanation: https://news.ycombinator.com/item?id=24220630

@JustinCappos
Member

JustinCappos commented Jan 9, 2025 via email

@qianxichen233
Contributor

We now get this error. Looks like it has to do with uid. Do we have any clue for this?

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/lib-dynload/site.py", line 563, in <module>
    main()
  File "/usr/local/lib/python2.7/lib-dynload/site.py", line 545, in main
    known_paths = addusersitepackages(known_paths)
  File "/usr/local/lib/python2.7/lib-dynload/site.py", line 278, in addusersitepackages
    user_site = getusersitepackages()
  File "/usr/local/lib/python2.7/lib-dynload/site.py", line 253, in getusersitepackages
    user_base = getuserbase() # this will also set USER_BASE
  File "/usr/local/lib/python2.7/lib-dynload/site.py", line 243, in getuserbase
    USER_BASE = get_config_var('userbase')
  File "/usr/local/lib/python2.7/lib-dynload/sysconfig.py", line 520, in get_config_var
    return get_config_vars().get(name)
  File "/usr/local/lib/python2.7/lib-dynload/sysconfig.py", line 424, in get_config_vars
    _CONFIG_VARS['userbase'] = _getuserbase()
  File "/usr/local/lib/python2.7/lib-dynload/sysconfig.py", line 182, in _getuserbase
    return env_base if env_base else joinuser("~", ".local")
  File "/usr/local/lib/python2.7/lib-dynload/sysconfig.py", line 169, in joinuser
    return os.path.expanduser(os.path.join(*args))
  File "/usr/local/lib/python2.7/lib-dynload/posixpath.py", line 260, in expanduser
    userhome = pwd.getpwuid(os.getuid()).pw_dir
KeyError: 'getpwuid(): uid not found: 0'

@qianxichen233
Contributor

Looks like it's because python tries to use /etc/passwd. Are we supposed to have this file in lind?

@JustinCappos
Member

JustinCappos commented Jan 10, 2025 via email

@rennergade
Contributor

This is the script we used before to load a bunch of config files including /etc/passwd: https://github.com/Lind-Project/lind_project/blob/main/src/scripts/base/load_confs.sh

@qianxichen233
Contributor

I tried to create /etc/passwd, but it does not seem to work. I used another approach instead: setting the HOME variable to /root so that python does not query /etc/passwd. This works, and python can now fully launch without the -S option. I tested a few python scripts, including a simple one that does some calculations and prints the result and a more advanced one that uses branches, loops, functions, etc., and they work. (The site modules that python loads during initialization are all written in python, so reaching the point where python executes a user program already means these site modules ran successfully.)
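
The HOME workaround follows from how expanduser resolves "~": it consults the HOME environment variable first and only falls back to the pwd database (and hence /etc/passwd) when HOME is unset. The sketch below reproduces the joinuser("~", ".local") step from the traceback in Python 3 on the host; the helper name is hypothetical, and the Python 2.7 code in the thread takes the same HOME-first branch.

```python
import os
import posixpath


def user_base_with_home(home):
    """Expand ~/.local as sysconfig's joinuser does, with HOME forced to `home`."""
    saved = os.environ.get("HOME")
    os.environ["HOME"] = home
    try:
        # With HOME set, expanduser never reaches pwd.getpwuid().
        return posixpath.expanduser(posixpath.join("~", ".local"))
    finally:
        # Restore the caller's environment.
        if saved is None:
            del os.environ["HOME"]
        else:
            os.environ["HOME"] = saved


user_base = user_base_with_home("/root")
print(user_base)
```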

@rennergade
Contributor

That's amazing. This is a big step!

Creating /etc/passwd in relation to the LINDROOT path doesn't work? That seems strange.

@Yaxuan-w
Member

We create /etc/passwd manually in Lind by copying the file from /home/lind/lind_project/src/scripts/includes/passwd
