
Some segfaults #5

Open
dagmoller opened this issue Jul 26, 2021 · 39 comments

@dagmoller

Hi,

I'm seeing some segfaults here (nginx version: nginx/1.20.1).
Please help me gather some debug info for you.

@gabihodoroaga
Owner

Hi,

The only way to see why it crashes is to get the core dump files:

https://docs.nginx.com/nginx/admin-guide/monitoring/debugging/#enabling-core-dumps-in-nginx-configuration

First create the dump directory, then add these directives to your main nginx.conf:

mkdir -p /tmp/cores

worker_rlimit_core  500M;
working_directory   /tmp/cores/;

Then run gdb to obtain a backtrace as usual, e.g.

gdb /path/to/nginx /tmp/nginx.core.PID
backtrace full

It will help if you can share your current nginx configuration (only the relevant parts) and some load statistics.

Thanks
g.

@dagmoller
Author

dagmoller commented Aug 17, 2021

Hi,

I don't know if this helps. The load is very low. I have a lot of reverse proxies with NTLM enabled on this instance.

Core was generated by `nginx: worker process                   '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:435
435         ngx_queue_insert_head(&conf->free, &item->queue);
Missing separate debuginfos, use: yum debuginfo-install nginx-1.20.1-1.el8.ngx.x86_64
(gdb) backtrace full
#0  ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:435
        conf = 0x5556a19cedd0
        item = 0x5556a19ceec0
        n = <optimized out>
        buf = ""
        c = 0x5556a19cf910
#1  0x000055569ec373e0 in ngx_event_process_posted ()
No symbol table info available.
#2  0x000055569ec3edf9 in ngx_worker_process_cycle ()
No symbol table info available.
#3  0x000055569ec3d41f in ngx_spawn_process ()
No symbol table info available.
#4  0x000055569ec3fbc7 in ngx_master_process_cycle ()
No symbol table info available.
#5  0x000055569ec15c78 in main ()
No symbol table info available.

@gabihodoroaga
Owner

Hi,

Thanks. This definitely helps.
I will try to reproduce it and then fix it.

g.

@dbalaev

dbalaev commented Sep 27, 2022

I think your code contains a logical error on the following lines:

static void ngx_http_upstream_client_conn_cleanup(void *data) {
    ngx_http_upstream_ntlm_cache_t *item = data;

    ngx_log_debug2(
        NGX_LOG_DEBUG_HTTP, ngx_cycle->log, 0,
        "ntlm client connection closed %p, droping peer connection %p",
        item->client_connection, item->peer_connection);

    // Check if the item was removed previously from the queue (backend drop)
    if (item->peer_connection != NULL) {

        item->peer_connection->read->timedout = 1;
        ngx_post_event(item->peer_connection->read, &ngx_posted_events);

        ngx_queue_remove(&item->queue);
        ngx_queue_insert_head(&item->conf->free, &item->queue);
    }
}

If the connection has already been deleted, calling ngx_log_debug2 results in a segfault, because the item array might already have been destroyed.

@gabihodoroaga
Owner

@dbalaev Thanks for pointing this out.
Do you have a way to reproduce the segfault?
This will help me a lot.
Thanks

@dbalaev

dbalaev commented Sep 29, 2022

Build: CentOS 7 + nginx 1.23.1-1

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal 11, Segmentation fault.
#0 0x000056228acbeac6 in ngx_http_upstream_handler (ev=0x56228cc13460) at src/http/ngx_http_upstream.c:1278
1278 ngx_http_set_log_request(c->log, r);
(gdb) bt
#0 0x000056228acbeac6 in ngx_http_upstream_handler (ev=0x56228cc13460) at src/http/ngx_http_upstream.c:1278
#1 0x000056228ac8bb80 in ngx_event_process_posted (cycle=cycle@entry=0x56228c9ccb50, posted=0x56228afa3930 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2 0x000056228ac8b72c in ngx_process_events_and_timers (cycle=cycle@entry=0x56228c9ccb50) at src/event/ngx_event.c:263
#3 0x000056228ac93461 in ngx_worker_process_cycle (cycle=cycle@entry=0x56228c9ccb50, data=data@entry=0x0) at src/os/unix/ngx_process_cycle.c:721
#4 0x000056228ac91ccb in ngx_spawn_process (cycle=cycle@entry=0x56228c9ccb50, proc=proc@entry=0x56228ac933e0 <ngx_worker_process_cycle>, data=data@entry=0x0,
name=name@entry=0x56228ad3dc27 "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5 0x000056228ac93850 in ngx_start_worker_processes (cycle=cycle@entry=0x56228c9ccb50, n=1, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#6 0x000056228ac940b3 in ngx_master_process_cycle (cycle=cycle@entry=0x56228c9ccb50) at src/os/unix/ngx_process_cycle.c:130
#7 0x000056228ac6a85d in main (argc=&lt;optimized out&gt;, argv=&lt;optimized out&gt;) at src/core/nginx.c:383
(gdb) f 0
#0 0x000056228acbeac6 in ngx_http_upstream_handler (ev=0x56228cc13460) at src/http/ngx_http_upstream.c:1278
1278 ngx_http_set_log_request(c->log, r);

@klp18

klp18 commented Oct 7, 2022

Same segfault on Ubuntu 22 with nginx/1.22.0.
I can reproduce the problem: it occurs every 1-2 minutes on my server, which has 10-20 people working through it.

https://gist.github.com/klp18/a9eb7118e951528e3c9da65851a18d45

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000564b3c391d44 in ngx_http_upstream_handler (ev=0x564b3de72ad0) at src/http/ngx_http_upstream.c:1282
1282 ngx_http_set_log_request(c->log, r);
(gdb) bt
#0 0x0000564b3c391d44 in ngx_http_upstream_handler (ev=0x564b3de72ad0) at src/http/ngx_http_upstream.c:1282
#1 0x0000564b3c358a2e in ngx_event_process_posted (cycle=cycle@entry=0x564b3dc93490, posted=0x564b3c46cd10 <ngx_posted_events>)
at src/event/ngx_event_posted.c:34
#2 0x0000564b3c3584e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b3dc93490) at src/event/ngx_event.c:263
#3 0x0000564b3c362141 in ngx_worker_process_cycle (cycle=0x564b3dc93490, data=&lt;optimized out&gt;) at src/os/unix/ngx_process_cycle.c:721
#4 0x0000564b3c36057d in ngx_spawn_process (cycle=cycle@entry=0x564b3dc93490, proc=0x564b3c36202a <ngx_worker_process_cycle>, data=0x0,
name=0x564b3c416612 "worker process", respawn=respawn@entry=0) at src/os/unix/ngx_process.c:199
#5 0x0000564b3c362dba in ngx_reap_children (cycle=0x564b3dc93490) at src/os/unix/ngx_process_cycle.c:598
#6 ngx_master_process_cycle (cycle=cycle@entry=0x564b3dc93490) at src/os/unix/ngx_process_cycle.c:174
#7 0x0000564b3c33680d in main (argc=&lt;optimized out&gt;, argv=&lt;optimized out&gt;) at src/core/nginx.c:383
(gdb) exit

@gabihodoroaga
Owner

Thanks @klp18 for reporting this.
Can you share your nginx configuration (with any sensitive information replaced)?
Thank you,
g.

@klp18

klp18 commented Oct 7, 2022

@justarandomsysadmin

@gabihodoroaga thanks for this module! Did you manage to reproduce the segfaults? It's also happening on my instance (nginx/1.18.0) and shutting down worker processes every few minutes (often multiple times per minute).

Let me know if you need further information or configuration files for comparison.

@tmagnien

@klp18 would you mind sharing a binary core dump file? I'm trying to reproduce the issue in the lab with no luck yet. The backtrace is useful, but I'd like to go further into the investigation.

@klp18

klp18 commented Oct 27, 2022

@tmagnien Here are a few dumps.

@tmagnien

Thanks @klp18! I will have a look

@tmagnien

@klp18 Can you provide me with the output of the "nginx -V" command, please?

@klp18

klp18 commented Oct 31, 2022

nginx version: nginx/1.22.0
built by gcc 11.2.0 (Ubuntu 11.2.0-19ubuntu1)
built with OpenSSL 3.0.2 15 Mar 2022
TLS SNI support enabled
configure arguments: --prefix=/etc/nginx --sbin-path=/usr/sbin/nginx --conf-path=/etc/nginx/nginx.conf --error-log-path=/var/log/nginx/error.log --http-log-path=/var/log/nginx/access.log --pid-path=/var/run/nginx.pid --lock-path=/var/run/nginx.lock --modules-path=/usr/lib/nginx/modules --http-client-body-temp-path=/var/lib/nginx/body --http-fastcgi-temp-path=/var/lib/nginx/fastcgi --http-proxy-temp-path=/var/lib/nginx/proxy --http-scgi-temp-path=/var/lib/nginx/scgi --http-uwsgi-temp-path=/var/lib/nginx/uwsgi --with-compat --with-debug --with-pcre-jit --with-http_ssl_module --with-http_stub_status_module --with-http_realip_module --with-http_auth_request_module --with-http_v2_module --with-http_dav_module --with-http_slice_module --with-threads --with-http_addition_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_sub_module --with-mail --with-stream --with-stream_ssl_module --with-mail_ssl_module --add-dynamic-module=../nginx-ntlm-module

@tmagnien

tmagnien commented Nov 2, 2022

@klp18 I may have found a fix but can't reproduce the issue locally. Would you mind trying the following patch and seeing if it fixes the issue?

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..90ed158 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -380,6 +380,8 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {
     if (item->peer_connection != NULL) {

         item->peer_connection->read->timedout = 1;
+       // If connection is already closed, log will be empty and will generate a segfault
+       item->peer_connection->log = ngx_cycle->log;
         ngx_post_event(item->peer_connection->read,&ngx_posted_events);

         ngx_queue_remove(&item->queue);

@klp18

klp18 commented Nov 2, 2022

@tmagnien
The same crash, with both the same and some new backtraces.
New dumps:

#0  0x000055c199a40bb7 in ngx_http_request_handler (ev=0x55c19a453430) at src/http/ngx_http_request.c:2394
2394        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x000055c199a40bb7 in ngx_http_request_handler (ev=0x55c19a453430) at src/http/ngx_http_request.c:2394
#1  0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2  0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3  0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=proc@entry=0x55c199a2402a <ngx_worker_process_cycle>, data=data@entry=0x3,
    name=name@entry=0x55c199ad861a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5  0x000055c199a23859 in ngx_start_worker_processes (cycle=cycle@entry=0x55c19a273490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#6  0x000055c199a24a6e in ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:130
#7  0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

#0  0x00007f8bd70ee990 in ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:437
437         ngx_queue_insert_head(&conf->free, &item->queue);
(gdb) bt
#0  0x00007f8bd70ee990 in ngx_http_upstream_ntlm_close_handler (ev=<optimized out>) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:437
#1  0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2  0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3  0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=0x55c199a2402a <ngx_worker_process_cycle>, data=0x2,
    name=0x55c199ad861a "worker process", respawn=respawn@entry=2) at src/os/unix/ngx_process.c:199
#5  0x000055c199a24dba in ngx_reap_children (cycle=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:598
#6  ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:174
#7  0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

#0  0x000055c199a53d44 in ngx_http_upstream_handler (ev=0x55c19a452b30) at src/http/ngx_http_upstream.c:1282
1282        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x000055c199a53d44 in ngx_http_upstream_handler (ev=0x55c19a452b30) at src/http/ngx_http_upstream.c:1282
#1  0x000055c199a1aa2e in ngx_event_process_posted (cycle=cycle@entry=0x55c19a273490, posted=0x55c199b2ed10 <ngx_posted_events>) at src/event/ngx_event_posted.c:34
#2  0x000055c199a1a4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x55c19a273490) at src/event/ngx_event.c:263
#3  0x000055c199a24141 in ngx_worker_process_cycle (cycle=0x55c19a273490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x000055c199a2257d in ngx_spawn_process (cycle=cycle@entry=0x55c19a273490, proc=0x55c199a2402a <ngx_worker_process_cycle>, data=0x1,
    name=0x55c199ad861a "worker process", respawn=respawn@entry=1) at src/os/unix/ngx_process.c:199
#5  0x000055c199a24dba in ngx_reap_children (cycle=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:598
#6  ngx_master_process_cycle (cycle=cycle@entry=0x55c19a273490) at src/os/unix/ngx_process_cycle.c:174
#7  0x000055c1999f880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

@tmagnien

tmagnien commented Nov 9, 2022

@klp18 Sorry to bother you again, but I can't reproduce it, so would you mind replacing the previous patch with this one:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..6066980 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -379,8 +379,9 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {
     // Check if the item was removed previously from the queue (backend drop)
     if (item->peer_connection != NULL) {

-        item->peer_connection->read->timedout = 1;
-        ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+        ngx_http_upstream_ntlm_close(item->peer_connection);
+        /*item->peer_connection->read->timedout = 1;
+        ngx_post_event(item->peer_connection->read,&ngx_posted_events);*/

         ngx_queue_remove(&item->queue);
         ngx_queue_insert_head(&item->conf->free, &item->queue);

If the segfault still occurs, it would also help me to have nginx debug logs. Thanks a lot!

@klp18

klp18 commented Nov 9, 2022

@tmagnien
New dumps. New backtrace:

#0  0x000055f6ecf3f300 in ngx_destroy_pool (pool=0x55f6eee5ed70) at src/core/ngx_palloc.c:69
69              ngx_log_debug1(NGX_LOG_DEBUG_ALLOC, pool->log, 0, "free: %p", l->alloc);
(gdb) bt
#0  0x000055f6ecf3f300 in ngx_destroy_pool (pool=0x55f6eee5ed70) at src/core/ngx_palloc.c:69
#1  0x00007f2fe95ec767 in ngx_http_upstream_ntlm_close (c=0x55f6eede0720) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:456
#2  0x00007f2fe95ec7b9 in ngx_http_upstream_client_conn_cleanup (data=0x55f6eecc7670) at ../nginx-ntlm-module/ngx_http_upstream_ntlm_module.c:382
#3  0x000055f6ecf3f2ac in ngx_destroy_pool (pool=pool@entry=0x55f6eeea3610) at src/core/ngx_palloc.c:57
#4  0x000055f6ecf84942 in ngx_http_close_connection (c=c@entry=0x55f6eede0db0) at src/http/ngx_http_request.c:3787
#5  0x000055f6ecf859a2 in ngx_http_close_request (r=r@entry=0x55f6eeef4080, rc=rc@entry=0) at src/http/ngx_http_request.c:3646
#6  0x000055f6ecf85c8a in ngx_http_terminate_handler (r=0x55f6eeef4080) at src/http/ngx_http_request.c:2723
#7  0x000055f6ecf85529 in ngx_http_run_posted_requests (c=c@entry=0x55f6eede0db0) at src/http/ngx_http_request.c:2450
#8  0x000055f6ecf85bf2 in ngx_http_request_handler (ev=0x55f6eee0ce90) at src/http/ngx_http_request.c:2418
#9  0x000055f6ecf6b5d4 in ngx_epoll_process_events (cycle=0x55f6eec2d490, timer=<optimized out>, flags=<optimized out>)
    at src/event/modules/ngx_epoll_module.c:901
#10 0x000055f6ecf5f4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x55f6eec2d490) at src/event/ngx_event.c:248
#11 0x000055f6ecf69141 in ngx_worker_process_cycle (cycle=0x55f6eec2d490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#12 0x000055f6ecf6757d in ngx_spawn_process (cycle=cycle@entry=0x55f6eec2d490, proc=proc@entry=0x55f6ecf6902a <ngx_worker_process_cycle>,
    data=data@entry=0x0, name=name@entry=0x55f6ed01d61a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#13 0x000055f6ecf68859 in ngx_start_worker_processes (cycle=cycle@entry=0x55f6eec2d490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#14 0x000055f6ecf69a6e in ngx_master_process_cycle (cycle=cycle@entry=0x55f6eec2d490) at src/os/unix/ngx_process_cycle.c:130
#15 0x000055f6ecf3d80d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

@tmagnien

Hi again @klp18. Can you try the following patch, please:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..88ff463 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -378,9 +378,7 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {

     // Check if the item was removed previously from the queue (backend drop)
     if (item->peer_connection != NULL) {
-
-        item->peer_connection->read->timedout = 1;
-        ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+       ngx_http_upstream_ntlm_close(item->peer_connection);

         ngx_queue_remove(&item->queue);
         ngx_queue_insert_head(&item->conf->free, &item->queue);
@@ -451,6 +449,11 @@ static void ngx_http_upstream_ntlm_close(ngx_connection_t *c) {

 #endif

+    // If connection is already closed, don't do anything (pool is already gone)
+    if (c->fd == (ngx_socket_t) -1) {
+        return;
+    }
+
     if (c->pool) {
         ngx_destroy_pool(c->pool);
         c->pool = NULL;

And again, sorry for those multiple tries, but I still don't have any way to reproduce the issue, so I'm working blind.

@klp18

klp18 commented Nov 15, 2022

I have no problem doing such tests, so ask away.
I wish I had enough time right now to try to figure out the problem myself.

New dumps.

Here's another thing I noticed.
The previous dumps were accompanied by the following entries in the nginx log file:

2022/11/15 14:10:11 [error] 129235#129235: *11975 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.4.141, server:
2022/11/15 14:11:15 [error] 129237#129237: *11997 upstream timed out (110: Connection timed out) while reading upstream, client: 192.168.4.88, server:

and the new ones with the following:

2022/11/15 18:22:31 [alert] 134292#134292: *439 connection already closed while reading upstream, client: 192.168.4.193, server:
2022/11/15 18:23:07 [alert] 134292#134292: *437 connection already closed while reading upstream, client: 192.168.4.18, server:

new back trace:

[New LWP 134509]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process                           '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000564b6e5e053d in ngx_http_run_posted_requests (c=c@entry=0x564b6ec79a00) at src/http/ngx_http_request.c:2435
2435            pr = r->main->posted_requests;
(gdb) bt
#0  0x0000564b6e5e053d in ngx_http_run_posted_requests (c=c@entry=0x564b6ec79a00) at src/http/ngx_http_request.c:2435
#1  0x0000564b6e5f3d7d in ngx_http_upstream_handler (ev=0x564b6eca6650) at src/http/ngx_http_upstream.c:1299
#2  0x0000564b6e5c65d4 in ngx_epoll_process_events (cycle=0x564b6eac7490, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#3  0x0000564b6e5ba4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b6eac7490) at src/event/ngx_event.c:248
#4  0x0000564b6e5c4141 in ngx_worker_process_cycle (cycle=0x564b6eac7490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#5  0x0000564b6e5c257d in ngx_spawn_process (cycle=cycle@entry=0x564b6eac7490, proc=0x564b6e5c402a <ngx_worker_process_cycle>, data=0x0,
    name=0x564b6e67861a "worker process", respawn=respawn@entry=0) at src/os/unix/ngx_process.c:199
#6  0x0000564b6e5c4dba in ngx_reap_children (cycle=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:598
#7  ngx_master_process_cycle (cycle=cycle@entry=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:174
#8  0x0000564b6e59880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

and another one:

[New LWP 134292]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `nginx: worker process                           '.
Program terminated with signal SIGABRT, Aborted.
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=140549354153792) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=140549354153792, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x00007fd432585476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00007fd43256b7f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00007fd4325cc6f6 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7fd43271eb8c "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#6  0x00007fd4325e3d7c in malloc_printerr (str=str@entry=0x7fd432721710 "free(): double free detected in tcache 2") at ./malloc/malloc.c:5664
#7  0x00007fd4325e612b in _int_free (av=0x7fd43275cc80 <main_arena>, p=0x564b6ecddfc0, have_lock=0) at ./malloc/malloc.c:4473
#8  0x00007fd4325e84d3 in __GI___libc_free (mem=<optimized out>) at ./malloc/malloc.c:3391
#9  0x0000564b6e59a38d in ngx_destroy_pool (pool=0x564b6edb3190) at src/core/ngx_palloc.c:85
#10 0x0000564b6e5f37fd in ngx_http_upstream_finalize_request (r=r@entry=0x564b6ed6e490, u=u@entry=0x564b6ed93600, rc=rc@entry=499)
    at src/http/ngx_http_upstream.c:4513
#11 0x0000564b6e5f3f65 in ngx_http_upstream_check_broken_connection (r=0x564b6ed6e490, ev=0x564b6eca6890) at src/http/ngx_http_upstream.c:1433
#12 0x0000564b6e5f41c8 in ngx_http_upstream_rd_check_broken_connection (r=<optimized out>) at src/http/ngx_http_upstream.c:1306
#13 0x0000564b6e5e0c59 in ngx_http_request_handler (ev=0x564b6eca6890) at src/http/ngx_http_request.c:2415
#14 0x0000564b6e5c65d4 in ngx_epoll_process_events (cycle=0x564b6eac7490, timer=<optimized out>, flags=<optimized out>) at src/event/modules/ngx_epoll_module.c:901
#15 0x0000564b6e5ba4a1 in ngx_process_events_and_timers (cycle=cycle@entry=0x564b6eac7490) at src/event/ngx_event.c:248
#16 0x0000564b6e5c4141 in ngx_worker_process_cycle (cycle=0x564b6eac7490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#17 0x0000564b6e5c257d in ngx_spawn_process (cycle=cycle@entry=0x564b6eac7490, proc=proc@entry=0x564b6e5c402a <ngx_worker_process_cycle>, data=data@entry=0x2,
    name=name@entry=0x564b6e67861a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#18 0x0000564b6e5c3859 in ngx_start_worker_processes (cycle=cycle@entry=0x564b6eac7490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#19 0x0000564b6e5c4a6e in ngx_master_process_cycle (cycle=cycle@entry=0x564b6eac7490) at src/os/unix/ngx_process_cycle.c:130
#20 0x0000564b6e59880d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

@tmagnien

tmagnien commented Nov 16, 2022

Thanks a lot @klp18. Do you think you could provide a full nginx debug log? This would help me understand the sequence of events.

And in the meantime, please try this patch:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..9de776f 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -378,9 +378,7 @@ static void ngx_http_upstream_client_conn_cleanup(void *data) {

     // Check if the item was removed previously from the queue (backend drop)
     if (item->peer_connection != NULL) {
-
-        item->peer_connection->read->timedout = 1;
-        ngx_post_event(item->peer_connection->read,&ngx_posted_events);
+        ngx_add_timer(item->peer_connection->read, 1);

         ngx_queue_remove(&item->queue);
         ngx_queue_insert_head(&item->conf->free, &item->queue);
@@ -451,6 +449,11 @@ static void ngx_http_upstream_ntlm_close(ngx_connection_t *c) {

 #endif

+    // If connection is already closed, don't do anything (pool is already gone)
+    if (c->fd == (ngx_socket_t) -1) {
+        return;
+    }
+
     if (c->pool) {
         ngx_destroy_pool(c->pool);
         c->pool = NULL;

@klp18

klp18 commented Nov 16, 2022

New dumps with debug logs, with the new patch applied.

@tmagnien

Thanks a lot @klp18, that was helpful. Can you try the following patch:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..6b392de 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     ngx_http_upstream_t *u;
     ngx_pool_cleanup_t *cln;
     ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+    unsigned invalid = 0;

     /* cache valid connections */

@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
         c->read->error || c->read->timedout || c->write->error ||
         c->write->timedout) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->keepalive) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->request_body_sent) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_terminate || ngx_exiting) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+        invalid = 1;
         goto invalid;
     }

     if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+        invalid = 1;
         goto invalid;
     }

@@ -365,6 +372,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     }

 invalid:
+    if (invalid) {
+        // remove the client connection drop down handler if present
+        for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
+    }
+
     hndp->original_free_peer(pc, hndp->data, state);
 }

@klp18

klp18 commented Nov 18, 2022

Crashes are much less frequent now.
New dumps and debug logs:

@tmagnien

@klp18 I think I've spotted the root cause; can you please try this patch:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..981dcb0 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     ngx_http_upstream_t *u;
     ngx_pool_cleanup_t *cln;
     ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+    unsigned invalid = 0;

     /* cache valid connections */

@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
         c->read->error || c->read->timedout || c->write->error ||
         c->write->timedout) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->keepalive) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->request_body_sent) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_terminate || ngx_exiting) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+        invalid = 1;
         goto invalid;
     }

     if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+        invalid = 1;
         goto invalid;
     }

@@ -327,6 +334,8 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
         if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
             cleanup_item = cln->data;
+            // reassign cleanup data in case it has moved
+            cln->data = item;
             break;
         }
     }
@@ -365,6 +374,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     }

 invalid:
+    if (invalid) {
+        // remove the client connection drop down handler if present
+        for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
+    }
+
     hndp->original_free_peer(pc, hndp->data, state);
 }

@klp18

klp18 commented Nov 23, 2022

@tmagnien
Not a single crash in three hours.
I'll post an update tomorrow.

@tmagnien

@klp18 great news! If you confirm tomorrow, I'll create a PR with the patch.

@klp18

klp18 commented Nov 24, 2022

@tmagnien Four crashes happened overnight.
One of them (180266) contains ngx_http_upstream_ntlm_close_handler in the backtrace.
I turned off the debug logs yesterday.
I can turn them back on and wait for a new crash.

@tmagnien

@klp18 Thanks for the core dumps, and yes, if you can enable debug logging it will help a lot.

@klp18

klp18 commented Nov 24, 2022

@tmagnien
crash dump with debug logs

@tmagnien

@klp18 Updated patch to handle the last (I hope) edge case:

diff --git a/ngx_http_upstream_ntlm_module.c b/ngx_http_upstream_ntlm_module.c
index 8d1179d..d68c2b3 100644
--- a/ngx_http_upstream_ntlm_module.c
+++ b/ngx_http_upstream_ntlm_module.c
@@ -268,6 +268,7 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     ngx_http_upstream_t *u;
     ngx_pool_cleanup_t *cln;
     ngx_http_upstream_ntlm_cache_t *cleanup_item = NULL;
+    unsigned invalid = 0;

     /* cache valid connections */

@@ -277,26 +278,32 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     if (state & NGX_PEER_FAILED || c == NULL || c->read->eof ||
         c->read->error || c->read->timedout || c->write->error ||
         c->write->timedout) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->keepalive) {
+        invalid = 1;
         goto invalid;
     }

     if (!u->request_body_sent) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_terminate || ngx_exiting) {
+        invalid = 1;
         goto invalid;
     }

     if (ngx_handle_read_event(c->read, 0) != NGX_OK) {
+        invalid = 1;
         goto invalid;
     }

     if (hndp->ntlm_init == 0 && hndp->cached == 0) {
+        invalid = 1;
         goto invalid;
     }

@@ -306,6 +313,14 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,

         item = ngx_queue_data(q, ngx_http_upstream_ntlm_cache_t, queue);
         ngx_http_upstream_ntlm_close(item->peer_connection);
+        // remove the client connection drop down handler if present
+        for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
         item->peer_connection = NULL;
     } else {
         q = ngx_queue_head(&hndp->conf->free);
@@ -327,6 +342,8 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     for (cln = item->client_connection->pool->cleanup; cln; cln = cln->next) {
         if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
             cleanup_item = cln->data;
+            // reassign cleanup data in case it has moved
+            cln->data = item;
             break;
         }
     }
@@ -365,6 +382,17 @@ static void ngx_http_upstream_free_ntlm_peer(ngx_peer_connection_t *pc,
     }

 invalid:
+    if (invalid) {
+        // remove the client connection drop down handler if present
+        for (cln = hndp->client_connection->pool->cleanup; cln; cln = cln->next) {
+            if (cln->handler == ngx_http_upstream_client_conn_cleanup) {
+                ngx_log_debug0(NGX_LOG_DEBUG_HTTP, pc->log, 0,
+                    "ntlm free peer removing cleanup");
+                cln->handler = NULL;
+            }
+        }
+    }
+
     hndp->original_free_peer(pc, hndp->data, state);
 }

@klp18
Copy link

klp18 commented Nov 29, 2022

There was one crash in the last 16 hours, though.
I will now restart nginx with debug logs and try to catch it again.

#0  0x0000558ee69a7d44 in ngx_http_upstream_handler (ev=0x558ee8ad1250) at src/http/ngx_http_upstream.c:1282
1282        ngx_http_set_log_request(c->log, r);
(gdb) bt
#0  0x0000558ee69a7d44 in ngx_http_upstream_handler (ev=0x558ee8ad1250) at src/http/ngx_http_upstream.c:1282
#1  0x0000558ee696ea2e in ngx_event_process_posted (cycle=cycle@entry=0x558ee88f1490, posted=0x558ee6a82d10 <ngx_posted_events>)
    at src/event/ngx_event_posted.c:34
#2  0x0000558ee696e4e6 in ngx_process_events_and_timers (cycle=cycle@entry=0x558ee88f1490) at src/event/ngx_event.c:263
#3  0x0000558ee6978141 in ngx_worker_process_cycle (cycle=0x558ee88f1490, data=<optimized out>) at src/os/unix/ngx_process_cycle.c:721
#4  0x0000558ee697657d in ngx_spawn_process (cycle=cycle@entry=0x558ee88f1490, proc=proc@entry=0x558ee697802a <ngx_worker_process_cycle>,
    data=data@entry=0x0, name=name@entry=0x558ee6a2c61a "worker process", respawn=respawn@entry=-3) at src/os/unix/ngx_process.c:199
#5  0x0000558ee6977859 in ngx_start_worker_processes (cycle=cycle@entry=0x558ee88f1490, n=4, type=type@entry=-3) at src/os/unix/ngx_process_cycle.c:344
#6  0x0000558ee6978a6e in ngx_master_process_cycle (cycle=cycle@entry=0x558ee88f1490) at src/os/unix/ngx_process_cycle.c:130
#7  0x0000558ee694c80d in main (argc=<optimized out>, argv=<optimized out>) at src/core/nginx.c:383

@klp18
Copy link

klp18 commented Nov 30, 2022

@tmagnien
Copy link

@klp18 Thanks! Just one question: from the logs I feel there's an internal proxy configuration in your setup, something like client -> nginx -> nginx -> server, is this right?

@klp18
Copy link

klp18 commented Nov 30, 2022

@tmagnien
Good question.
Internal clients go through only one nginx instance with the ntlm module.
External clients go through an additional "proxy", TMG 2010 (which I plan to get rid of if I can get nginx to work stably).
So those requests look like client -> router (NAT1) -> tmg2010 (NAT2) -> nginx -> server.

Publishing through TMG 2010 has its own specific features.
For example, when it proxies an HTTP/HTTPS request, it can preserve the client's original IP address at the TCP level (using the built-in NAT, I guess).

So it turns out that I don't have an extra proxy in front of nginx, but a double NAT.

@tmagnien
Copy link

tmagnien commented Feb 7, 2023

Hi @klp18. I finally have some time to work on this again. Would it be possible for you to run the same tests with keepalive disabled? I mean "keepalive_timeout 0;" in the nginx config file.
It would help me confirm that this is where the issue resides.
Thanks a lot!

@klp18
Copy link

klp18 commented Feb 7, 2023

Hello @tmagnien,
when I set keepalive_timeout to zero, authentication stops working.
Outlook constantly asks for the password, and I see endless 401 HTTP status codes in the nginx log.
And since crashes are quite rare (a few times a day), I can't leave nginx running in this state for long.
It's also unlikely to crash if clients can't authenticate.
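For context on why keepalive cannot simply be switched off here: NTLM authenticates the TCP connection itself, so the negotiate/challenge/authenticate exchange must travel over one persistent upstream connection, and the `ntlm` directive exists precisely to pin that upstream connection to the client connection. A configuration along the lines of the module's README (the upstream name `ntlm_backend` and addresses are illustrative):

```nginx
upstream ntlm_backend {
    server 127.0.0.1:8080;
    # Bind each client connection to a dedicated upstream connection
    # so the NTLM handshake survives across requests.
    ntlm;
}

server {
    listen 80;

    location / {
        proxy_pass http://ntlm_backend;
        proxy_http_version 1.1;          # persistent upstream connections need HTTP/1.1
        proxy_set_header Connection "";  # don't forward "Connection: close" upstream
    }
}
```

With `keepalive_timeout 0;` the client-side connection is torn down after every request, which discards the bound upstream connection mid-handshake; that would be consistent with the endless 401s and password prompts reported above.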

@tmagnien
Copy link

tmagnien commented Feb 7, 2023 via email
