
Some segfaults #5

Open
dagmoller opened this issue Jul 26, 2021 · 39 comments

@dagmoller

Hi,

I'm seeing some segfaults here (nginx version: nginx/1.20.1).
Please help me gather some debug info for you.

@gabihodoroaga
Owner

Hi,

The only way to see why it crashes is to get the core dump files:

https://docs.nginx.com/nginx/admin-guide/monitoring/debugging/#enabling-core-dumps-in-nginx-configuration

First create the dump directory, then add these directives to your main nginx.conf:

mkdir -p /tmp/cores

worker_rlimit_core  500M;
working_directory   /tmp/cores/;

Then run gdb to obtain a backtrace as usual, e.g.

gdb /path/to/nginx /tmp/nginx.core.PID
backtrace full

It will help if you can share your current nginx configuration (only the relevant parts) and some load statistics.

Thanks
g.

@dagmoller
Author

dagmoller commented Aug 17, 2021

Hi,

I don't know if this helps. The load is very low. I have a lot of reverse proxies with NTLM enabled on this instance.

Core was generated by `nginx: worker process                   '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:435
435         ngx_queue_insert_head(&conf->free, &item->queue);
Missing separate debuginfos, use: yum debuginfo-install nginx-1.20.1-1.el8.ngx.x86_64
(gdb) backtrace full
#0  ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:435
        conf = 0x5556a19cedd0
        item = 0x5556a19ceec0
        n = <optimized out>
        buf = ""
        c = 0x5556a19cf910
#1  0x000055569ec373e0 in ngx_event_process_posted ()
No symbol table info available.
#2  0x000055569ec3edf9 in ngx_worker_process_cycle ()
No symbol table info available.
#3  0x000055569ec3d41f in ngx_spawn_process ()
No symbol table info available.
#4  0x000055569ec3fbc7 in ngx_master_process_cycle ()
No symbol table info available.
#5  0x000055569ec15c78 in main ()
No symbol table info available.

@gabihodoroaga
Owner

Hi,

Thanks. This definitely helps.
I will try to reproduce it and then fix it.

g.

@dbalaev

dbalaev commented Sep 27, 2022

I think your code contains a logical error on the following lines:

static void ngx_http_upstream_client_conn_cleanup(void *data) {
    ngx_http_upstream_ntlm_cache_t *item = data;

    ngx_log_debug2(
        NGX_LOG_DEBUG_HTTP, ngx_cycle->log, 0,
        "ntlm client connection closed %p, droping peer connection %p",
        item->client_connection, item->peer_connection);

    // Check if the item was removed previously from the queue (backend drop)
    if (item->peer_connection != NULL) {

        item->peer_connection->read->timedout = 1;
        ngx_post_event(item->peer_connection->read, &ngx_posted_events);

        ngx_queue_remove(&item->queue);
        ngx_queue_insert_head(&item->conf->free, &item->queue);
    }
}

If the connection has already been deleted, calling ngx_log_debug2 results in a segfault, because the item array might already have been destroyed.

@gabihodoroaga
Owner

@dbalaev Thanks for pointing this out.
Do you have a way to reproduce the segfault?
This will help me a lot.
Thanks

@dbalaev

dbalaev commented Sep 29, 2022

Build: CentOS 7 + nginx 1.23.1-1

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal 11, Segmentation fault.
#0 0x000056228acbeac6 in ngx_http_upstream_handler (ev=0x56228cc13460) at src/http/ngx_http_upstream.c:1278
1278 ngx_http_set_log_request(c->log, r);
(gdb) bt
#0 0x000056228acbeac6 in ngx_http_upstream_handler (ev=0x56228cc13460) at src/http/ngx_http_upstream.c:1278
#1 0x000056228ac8bb80 in ngx_event_process_posted (cycle=cycle@entry=0x56228c9ccb50, posted=0x56228afa3930 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2 0x000056228ac8b72c in ngx_process_events_and_timers (cycle=cycle@entry=0x56228c9ccb50) at src/event/ngx_event.c:263
#3 0x000056228ac93461 in ngx_worker_process_cycle (cycle=cycle@entry=0x56228c9ccb50, data=data@entry=0x0) at src/os/unix/ngx_process_cycle.c:721
#4 0x000056228ac91ccb in ngx_spawn_process (cycle=cycle@entry=0x56228c9ccb50, proc=proc@entry=0x56228ac933e0 <ngx_worker_process_cycle>, data=data@entry=0x0,
name=name@entry=0x56228ad3dc27 "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5 0x000056228ac93850 in ngx_start_worker_processes (cycle=cycle@entry=0x56228c9ccb50, n=1, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#6 0x000056228ac940b3 in ngx_master_process_cycle (cycle=cycle@entry=0x56228c9ccb50) at src/os/unix/ngx_process_cycle.c:130
#7 0x000056228ac6a85d in main (argc=&lt;optimized out&gt;, argv=&lt;optimized out&gt;) at src/core/nginx.c:383
(gdb) f 0
#0 0x000056228acbeac6 in ngx_http_upstream_handler (ev=0x56228cc13460) at src/http/ngx_http_upstream.c:1278
1278 ngx_http_set_log_request(c->log, r);

@klp18

klp18 commented Oct 7, 2022

Same segfault on Ubuntu 22 with nginx/1.22.0.
I can reproduce the problem: it occurs every 1-2 minutes on my server, which has 10-20 people working through it.

https://gist.github.com/klp18/a9eb7118e951528e3c9da65851a18d45

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000564b3c391d44 in ngx_http_upstream_handler (ev=0x564b3de72ad0) at src/http/ngx_http_upstream.c:1282
1282 ngx_http_set_log_request(c->log, r);
(gdb) bt
#0 0x0000564b3c391d44 in ngx_http_upstream_handler (ev=0x564b3de72ad0) at src/http/ngx_http_upstream.c:1282
#1 0x0000564b3c358a2e in ngx_event_process_posted (cycle=cycle@entry=0x564b3dc93490, posted=0x564b3c46cd10 <ngx_posted_events>)
at src/event/ngx_event_posted.c:34
#2 0x0000564b3c3584e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b3dc93490) at src/event/ngx_event.c:263
#3 0x0000564b3c362141 in ngx_worker_process_cycle (cycle=0x564b3dc93490, data=&lt;optimized out&gt;) at src/os/unix/ngx_process_cycle.c:721
#4 0x0000564b3c36057d in ngx_spawn_process (cycle=cycle@entry=0x564b3dc93490, proc=0x564b3c36202a <ngx_worker_process_cycle>, data=0x0,
name=0x564b3c416612 "worker process", respawn=respawn@entry=0) at src/os/unix/ngx_process.c:199
#5 0x0000564b3c362dba in ngx_reap_children (cycle=0x564b3dc93490) at src/os/unix/ngx_process_cycle.c:598
#6 ngx_master_process_cycle (cycle=cycle@entry=0x564b3dc93490) at src/os/unix/ngx_process_cycle.c:174
#7 0x0000564b3c33680d in main (argc=&lt;optimized out&gt;, argv=&lt;optimized out&gt;) at src/core/nginx.c:383
(gdb) exit

@gabihodoroaga
Owner

Thanks @klp18 for reporting this.
Can you share your nginx configuration (with any sensitive information replaced)?
Thank you,
g.

@klp18

klp18 commented Oct 7, 2022

@justarandomsysadmin

@gabihodoroaga thanks for this module! Did you manage to reproduce the segfaults? It's also happening on my instance (nginx/1.18.0) and shutting down worker processes every few minutes (often multiple times per minute).

Let me know if you need further information or configuration files for comparison.

@tmagnien

@klp18 would you mind sharing a binary core dump file? I'm trying to reproduce the issue in the lab with no luck yet. The backtrace is useful, but I'd like to go further into the investigation.

@klp18

klp18 commented Oct 27, 2022

@tmagnien Here are a few dumps.

@tmagnien

Thanks @klp18! I will have a look

@tmagnien

@klp18 Can you provide me with the output of the "nginx -V" command, please?

@klp18

klp18 commented Oct 31, 2022

nginx version: nginx/1.22.0
built by gcc 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
built with OpenSSL 3.0.2 15 Mar 2022
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-compat --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_sub_module --with-mail --with-stream --with-stream_ssl_module --with-mail_ssl_module --add-dynamic-module=../nginx-ntlm-module

@tmagnien

tmagnien commented Nov 2, 2022

@klp18 I may have found a fix but can't reproduce the issue locally. Would you mind trying the following patch and seeing if it fixes the issue?

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..90ed158 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -380,6 +380,8 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {
     if (item->peer_connection != NULL) {

         item->peer_connection->read->timedout = 1;
+       // If connection is already closed, log will be empty and will generate a segfault
+       item->peer_connection->log = ngx_cycle->log;
         ngx_post_event(item->peer_connection->read,&ngx_posted_events);

         ngx_queue_remove(&item->queue);

@klp18

klp18 commented Nov 2, 2022

@tmagnien
The same crash, with both the same and some new backtraces.
New dumps:

#0  0x000055c199a40bb7 in ngx_http_request_handler (ev=0x55c19a453430) at src/http/ngx_http_request.c:2394
2394        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x000055c199a40bb7 in ngx_http_request_handler (ev=0x55c19a453430) at src/http/ngx_http_request.c:2394
#1  0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2  0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3  0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=proc@entry=0x55c199a2402a <ngx_worker_process_cycle>, data=data@entry=0x3,
    name=name@entry=0x55c199ad861a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5  0x000055c199a23859 in ngx_start_worker_processes (cycle=cycle@entry=0x55c19a273490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#6  0x000055c199a24a6e in ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:130
#7  0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

#0  0x00007f8bd70ee990 in ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:437
437         ngx_queue_insert_head(&conf->free, &item->queue);
(gdb) bt
#0  0x00007f8bd70ee990 in ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:437
#1  0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2  0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3  0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=0x55c199a2402a <ngx_worker_process_cycle>, data=0x2,
    name=0x55c199ad861a "worker process", respawn=respawn@entry=2) at src/os/unix/ngx_process.c:199
#5  0x000055c199a24dba in ngx_reap_children (cycle=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:598
#6  ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:174
#7  0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

#0  0x000055c199a53d44 in ngx_http_upstream_handler (ev=0x55c19a452b30) at src/http/ngx_http_upstream.c:1282
1282        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x000055c199a53d44 in ngx_http_upstream_handler (ev=0x55c19a452b30) at src/http/ngx_http_upstream.c:1282
#1  0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2  0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3  0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=0x55c199a2402a <ngx_worker_process_cycle>, data=0x1,
    name=0x55c199ad861a "worker process", respawn=respawn@entry=1) at src/os/unix/ngx_process.c:199
#5  0x000055c199a24dba in ngx_reap_children (cycle=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:598
#6  ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:174
#7  0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

@tmagnien

tmagnien commented Nov 9, 2022

@klp18 Sorry to bother you again, but I can't reproduce it, so would you mind replacing the previous patch with this one:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..6066980 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -379,8 +379,9 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {
     // Check if the item was removed previously from the queue (backend drop)
     if (item->peer_connection != NULL) {

-        item->peer_connection->read->timedout = 1;
-        ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+        ngx_http_upstream_ntlm_close(item->peer_connection);
+        /*item->peer_connection->read->timedout = 1;
+        ngx_post_event(item->peer_connection->read,&ngx_posted_events);*/

         ngx_queue_remove(&item->queue);
         ngx_queue_insert_head(&item->conf->free, &item->queue);

If the segfault still occurs, it would also help me to have nginx debug logs. Thanks a lot!

@klp18

klp18 commented Nov 9, 2022

@tmagnien
New dumps. New backtrace:

#0  0x000055f6ecf3f300 in ngx_destroy_pool (pool=0x55f6eee5ed70) at src/core/ngx_palloc.c:69
69              ngx_log_debug1(NGX_LOG_DEBUG_ALLOC, pool->log, 0, "free: %p", l->alloc);
(gdb) bt
#0  0x000055f6ecf3f300 in ngx_destroy_pool (pool=0x55f6eee5ed70) at src/core/ngx_palloc.c:69
#1  0x00007f2fe95ec767 in ngx_http_upstream_ntlm_close (c=0x55f6eede0720) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:456
#2  0x00007f2fe95ec7b9 in ngx_http_upstream_client_conn_cleanup (data=0x55f6eecc7670) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:382
#3  0x000055f6ecf3f2ac in ngx_destroy_pool (pool=pool@entry=0x55f6eeea3610) at src/core/ngx_palloc.c:57
#4  0x000055f6ecf84942 in ngx_http_close_connection (c=c@entry=0x55f6eede0db0) at src/http/ngx_http_request.c:3787
#5  0x000055f6ecf859a2 in ngx_http_close_request (r=r@entry=0x55f6eeef4080, rc=rc@entry=0) at src/http/ngx_http_request.c:3646
#6  0x000055f6ecf85c8a in ngx_http_terminate_handler (r=0x55f6eeef4080) at src/http/ngx_http_request.c:2723
#7  0x000055f6ecf85529 in ngx_http_run_posted_requests (c=c@entry=0x55f6eede0db0) at src/http/ngx_http_request.c:2450
#8  0x000055f6ecf85bf2 in ngx_http_request_handler (ev=0x55f6eee0ce90) at src/http/ngx_http_request.c:2418
#9  0x000055f6ecf6b5d4 in ngx_epoll_process_events (cycle=0x55f6eec2d490, timer=<optimized out>, flags=<optimized out>)
    at src/event/modules/ngx_epoll_module.c:901
#10 0x000055f6ecf5f4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x55f6eec2d490) at src/event/ngx_event.c:248
#11 0x000055f6ecf69141 in ngx_worker_process_cycle (cycle=0x55f6eec2d490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#12 0x000055f6ecf6757d in ngx_spawn_process (cycle=cycle@entry=0x55f6eec2d490, proc=proc@entry=0x55f6ecf6902a <ngx_worker_process_cycle>,
    data=data@entry=0x0, name=name@entry=0x55f6ed01d61a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#13 0x000055f6ecf68859 in ngx_start_worker_processes (cycle=cycle@entry=0x55f6eec2d490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#14 0x000055f6ecf69a6e in ngx_master_process_cycle (cycle=cycle@entry=0x55f6eec2d490) at src/os/unix/ngx_process_cycle.c:130
#15 0x000055f6ecf3d80d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

@tmagnien

Hi again @klp18. Can you try the following patch, please:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..88ff463 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -378,9 +378,7 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {

     // Check if the item was removed previously from the queue (backend drop)
     if (item->peer_connection != NULL) {
-
-        item->peer_connection->read->timedout = 1;
-        ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+       ngx_http_upstream_ntlm_close(item->peer_connection);

         ngx_queue_remove(&item->queue);
         ngx_queue_insert_head(&item->conf->free, &item->queue);
@@ -451,6 +449,11 @@ static void ngx_http_upstream_ntlm_close(ngx_connection_t *c) {

 #endif

+    // If connection is already closed, don't do anything (pool is already gone)
+    if (c->fd == (ngx_socket_t) -1) {
+        return;
+    }
+
     if (c->pool) {
         ngx_destroy_pool(c->pool);
         c->pool = NULL;

And again, sorry for those multiple tries, but I still don't have any way to reproduce the issue, so I'm working blind.

@klp18

klp18 commented Nov 15, 2022

I have no problem doing such tests, so ask away.
I wish I had enough time right now to try to figure out the problem myself.

New dumps.

Here's another thing I noticed.
The previous dumps were accompanied by the following entries in the nginx log file:

2022/11/15 14:10:11 [error] 129235#129235: *11975 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.4.141, server:
2022/11/15 14:11:15 [error] 129237#129237: *11997 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.4.88, server:

and the new ones with the following:

2022/11/15 18:22:31 [alert] 134292#134292: *439 connection already closed while reading upstream, client: 192.168.4.193, server:
2022/11/15 18:23:07 [alert] 134292#134292: *437 connection already closed while reading upstream, client: 192.168.4.18, server:

new back trace:

[New LWP 134509]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process                           '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000564b6e5e053d in ngx_http_run_posted_requests (c=c@entry=0x564b6ec79a00) at src/http/ngx_http_request.c:2435
2435            pr = r->main->posted_requests;
(gdb) bt
#0  0x0000564b6e5e053d in ngx_http_run_posted_requests (c=c@entry=0x564b6ec79a00) at src/http/ngx_http_request.c:2435
#1  0x0000564b6e5f3d7d in ngx_http_upstream_handler (ev=0x564b6eca6650) at src/http/ngx_http_upstream.c:1299
#2  0x0000564b6e5c65d4 in ngx_epoll_process_events (cycle=0x564b6eac7490, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#3  0x0000564b6e5ba4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b6eac7490) at src/event/ngx_event.c:248
#4  0x0000564b6e5c4141 in ngx_worker_process_cycle (cycle=0x564b6eac7490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#5  0x0000564b6e5c257d in ngx_spawn_process (cycle=cycle@entry=0x564b6eac7490, proc=0x564b6e5c402a <ngx_worker_process_cycle>, data=0x0,
    name=0x564b6e67861a "worker process", respawn=respawn@entry=0) at src/os/unix/ngx_process.c:199
#6  0x0000564b6e5c4dba in ngx_reap_children (cycle=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:598
#7  ngx_master_process_cycle (cycle=cycle@entry=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:174
#8  0x0000564b6e59880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

and another one:

[New LWP 134292]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process                           '.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140549354153792, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fd432585476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fd43256b7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fd4325cc6f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fd43271eb8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007fd4325e3d7c in malloc_printerr (str=str@entry=0x7fd432721710 "free(): double free detected in tcache 2") at ./malloc/malloc.c:5664
#7  0x00007fd4325e612b in _int_free (av=0x7fd43275cc80 <main_arena>, p=0x564b6ecddfc0, have_lock=0) at ./malloc/malloc.c:4473
#8  0x00007fd4325e84d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x0000564b6e59a38d in ngx_destroy_pool (pool=0x564b6edb3190) at src/core/ngx_palloc.c:85
#10 0x0000564b6e5f37fd in ngx_http_upstream_finalize_request (r=r@entry=0x564b6ed6e490, u=u@entry=0x564b6ed93600, rc=rc@entry=499)
    at src/http/ngx_http_upstream.c:4513
#11 0x0000564b6e5f3f65 in ngx_http_upstream_check_broken_connection (r=0x564b6ed6e490, ev=0x564b6eca6890) at src/http/ngx_http_upstream.c:1433
#12 0x0000564b6e5f41c8 in ngx_http_upstream_rd_check_broken_connection (r=<optimized out>) at src/http/ngx_http_upstream.c:1306
#13 0x0000564b6e5e0c59 in ngx_http_request_handler (ev=0x564b6eca6890) at src/http/ngx_http_request.c:2415
#14 0x0000564b6e5c65d4 in ngx_epoll_process_events (cycle=0x564b6eac7490, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#15 0x0000564b6e5ba4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b6eac7490) at src/event/ngx_event.c:248
#16 0x0000564b6e5c4141 in ngx_worker_process_cycle (cycle=0x564b6eac7490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#17 0x0000564b6e5c257d in ngx_spawn_process (cycle=cycle@entry=0x564b6eac7490, proc=proc@entry=0x564b6e5c402a <ngx_worker_process_cycle>, data=data@entry=0x2,
    name=name@entry=0x564b6e67861a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#18 0x0000564b6e5c3859 in ngx_start_worker_processes (cycle=cycle@entry=0x564b6eac7490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#19 0x0000564b6e5c4a6e in ngx_master_process_cycle (cycle=cycle@entry=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:130
#20 0x0000564b6e59880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

@tmagnien

tmagnien commented Nov 16, 2022

Thanks a lot @klp18. Do you think you could provide a full nginx debug log? This would help me understand the sequence of events.

And in the meantime, please try this patch:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..9de776f 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -378,9 +378,7 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {

     // Check if the item was removed previously from the queue (backend drop)
     if (item->peer_connection != NULL) {
-
-        item->peer_connection->read->timedout = 1;
-        ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+        ngx_add_timer(item->peer_connection->read, 1);

         ngx_queue_remove(&item->queue);
         ngx_queue_insert_head(&item->conf->free, &item->queue);
@@ -451,6 +449,11 @@ static void ngx_http_upstream_ntlm_close(ngx_connection_t *c) {

 #endif

+    // If connection is already closed, don't do anything (pool is already gone)
+    if (c->fd == (ngx_socket_t) -1) {
+        return;
+    }
+
     if (c->pool) {
         ngx_destroy_pool(c->pool);
         c->pool = NULL;

@klp18

klp18 commented Nov 16, 2022

New dumps with debug logs, with the new patch applied.

@tmagnien

Thanks a lot @klp18, that was helpful. Can you try the following patch:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..6b392de 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     ngx_http_upstream_t *u;
     ngx_pool_cleanup_t *cln;
     ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+    unsigned invalid = 0;

     /* cache valid connections */

@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
         c->read->error || c->read->timedout || c->write->error ||
         c->write->timedout) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->keepalive) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->request_body_sent) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_terminate || ngx_exiting) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+        invalid = 1;
         goto invalid;
     }

     if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+        invalid = 1;
         goto invalid;
     }

@@ -365,6 +372,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     }

 invalid:
+    if (invalid) {
+        // remove the client connection drop down handler if present
+        for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
+    }
+
     hndp->original_free_peer(pc, hndp->data, state);
 }

@klp18

klp18 commented Nov 18, 2022

Crashes are much less frequent now.
New dumps and debug logs:

@tmagnien

@klp18 I think I've spotted the root cause; can you please try this patch:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..981dcb0 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     ngx_http_upstream_t *u;
     ngx_pool_cleanup_t *cln;
     ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+    unsigned invalid = 0;

     /* cache valid connections */

@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
         c->read->error || c->read->timedout || c->write->error ||
         c->write->timedout) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->keepalive) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->request_body_sent) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_terminate || ngx_exiting) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+        invalid = 1;
         goto invalid;
     }

     if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+        invalid = 1;
         goto invalid;
     }

@@ -327,6 +334,8 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
         if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
             cleanup_item = cln->data;
+            // reassign cleanup data in case it has moved
+            cln->data = item;
             break;
         }
     }
@@ -365,6 +374,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     }

 invalid:
+    if (invalid) {
+        // remove the client connection drop down handler if present
+        for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
+    }
+
     hndp->original_free_peer(pc, hndp->data, state);
 }

@klp18

klp18 commented Nov 23, 2022

@tmagnien
Not a single crash in three hours.
I'll post an update tomorrow.

@tmagnien

@klp18 great news! If you confirm tomorrow, I'll create a PR with the patch.

@klp18

klp18 commented Nov 24, 2022

@tmagnien Four crashes happened overnight.
One of them (180266) contains ngx_http_upstream_ntlm_close_handler in the backtrace.
I turned off the debug logs yesterday.
I can turn them back on and wait for a new crash.

@tmagnien

@klp18 Thanks for the core dumps, and yes, if you can enable debug logging it will help a lot.

@klp18

klp18 commented Nov 24, 2022

@tmagnien
crash dump with debug logs

@tmagnien

@klp18 Updated patch to handle the last (I hope) edge case:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..d68c2b3 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     ngx_http_upstream_t *u;
     ngx_pool_cleanup_t *cln;
     ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+    unsigned invalid = 0;

     /* cache valid connections */

@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
         c->read->error || c->read->timedout || c->write->error ||
         c->write->timedout) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->keepalive) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->request_body_sent) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_terminate || ngx_exiting) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+        invalid = 1;
         goto invalid;
     }

     if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+        invalid = 1;
         goto invalid;
     }

@@ -306,6 +313,14 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,

         item = ngx_queue_data(q, ngx_http_upstream_ntlm_cache_t, queue);
         ngx_http_upstream_ntlm_close(item->peer_connection);
+        // remove the client connection drop down handler if present
+        for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
         item->peer_connection = NULL;
     } else {
         q = ngx_queue_head(&hndp->conf->free);
@@ -327,6 +342,8 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
         if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
             cleanup_item = cln->data;
+            // reassign cleanup data in case it has moved
+            cln->data = item;
             break;
         }
     }
@@ -365,6 +382,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     }

 invalid:
+    if (invalid) {
+        // remove the client connection drop down handler if present
+        for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
+    }
+
     hndp->original_free_peer(pc, hndp->data, state);
 }

@klp18
Copy link

klp18 commented Nov 29, 2022

There was one crash in the last 16 hours, though.
I will now restart nginx with debug logs and try to catch it again.

#0  0x0000558ee69a7d44 in ngx_http_upstream_handler (ev=0x558ee8ad1250) at src/http/ngx_http_upstream.c:1282
1282        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x0000558ee69a7d44 in ngx_http_upstream_handler (ev=0x558ee8ad1250) at src/http/ngx_http_upstream.c:1282
#1  0x0000558ee696ea2e in ngx_event_process_posted (cycle=cycle@entry=0x558ee88f1490, posted=0x558ee6a82d10 <ngx_posted_events>)
    at src/event/ngx_event_posted.c:34
#2  0x0000558ee696e4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x558ee88f1490) at src/event/ngx_event.c:263
#3  0x0000558ee6978141 in ngx_worker_process_cycle (cycle=0x558ee88f1490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x0000558ee697657d in ngx_spawn_process (cycle=cycle@entry=0x558ee88f1490, proc=proc@entry=0x558ee697802a <ngx_worker_process_cycle>,
    data=data@entry=0x0, name=name@entry=0x558ee6a2c61a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5  0x0000558ee6977859 in ngx_start_worker_processes (cycle=cycle@entry=0x558ee88f1490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#6  0x0000558ee6978a6e in ngx_master_process_cycle (cycle=cycle@entry=0x558ee88f1490) at src/os/unix/ngx_process_cycle.c:130
#7  0x0000558ee694c80d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

@klp18
Copy link

klp18 commented Nov 30, 2022

@tmagnien
Copy link

@klp18 Thanks! Just one question: from the logs I feel there's an internal proxy configuration in your setup, something like client -> nginx -> nginx -> server, is this right?

@klp18
Copy link

klp18 commented Nov 30, 2022

@tmagnien
Good question.
Internal clients go through only one nginx instance with the ntlm module.
External clients go through an additional "proxy", TMG 2010 (which I plan to get rid of if I can get nginx to work stably).
So those requests look like client -> router (NAT1) -> tmg2010 (NAT2) -> nginx -> server.

Publishing through TMG 2010 has its own specific features.
For example, when it proxies an HTTP/HTTPS request, it can preserve the client's original IP address at the TCP level (using the built-in NAT, I guess).

So it turns out that I don't have an extra proxy in front of nginx, but a double NAT.

@tmagnien
Copy link

tmagnien commented Feb 7, 2023

Hi @klp18. I finally have some time to work on this again. Would it be possible for you to run the same tests with keepalive disabled? I mean "keepalive_timeout 0;" in the nginx config file.
It would help me confirm that this is where the issue resides.
Thanks a lot!

@klp18
Copy link

klp18 commented Feb 7, 2023

Hello @tmagnien,
when I set keepalive_timeout to zero, authentication stops working.
Outlook constantly asks for the password, and I see endless 401 HTTP status codes in the nginx log.
And since crashes are quite rare (a few times a day), I can't leave nginx running in this state for long.
It's also unlikely to crash if clients can't authenticate.
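For context on why keepalive cannot simply be switched off here: NTLM authenticates the TCP connection itself, so the negotiate/challenge/authenticate exchange must travel over one persistent upstream connection, and the `ntlm` directive exists precisely to pin that upstream connection to the client connection. A configuration along the lines of the module's README (the upstream name `ntlm_backend` and addresses are illustrative):

```nginx
upstream ntlm_backend {
    server 127.0.0.1:8080;
    # Bind each client connection to a dedicated upstream connection
    # so the NTLM handshake survives across requests.
    ntlm;
}

server {
    listen 80;

    location / {
        proxy_pass http://ntlm_backend;
        proxy_http_version 1.1;          # persistent upstream connections need HTTP/1.1
        proxy_set_header Connection "";  # don't forward "Connection: close" upstream
    }
}
```

With `keepalive_timeout 0;` the client-side connection is torn down after every request, which discards the bound upstream connection mid-handshake; that would be consistent with the endless 401s and password prompts reported above.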

@tmagnien
Copy link

tmagnien commented Feb 7, 2023 via email
