Notes regarding nfqueue performance #1104
In order to improve the packets per second analyzed, we can add more nfqueue queues. There are 2 ways of doing it:

Notes regarding the first option:

Note 0:
Note 1:
Note 2: Besides, if we allow customizing the number of queues, we could launch multiple daemons on different queues.
Note 3: If you monitor the file
https://www.spinics.net/lists/netfilter/msg57702.html
https://www.spinics.net/lists/netfilter/msg52257.html
https://lwn.net/Articles/544379/
Note 4: Since nftables (at least) 0.7, you can create more flexible ways of balancing packets. If we wanted to distribute the traffic among all the created queues, we could do:
https://wiki.nftables.org/wiki-nftables/index.php/Math_operations
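For illustration only (this is not necessarily the exact rule from the original post: the table and chain names are made up, and the exact queue statement syntax should be double-checked against the installed nft version), spreading queued packets over four queues could look roughly like this:

```
# Per-flow balancing: packets of the same flow always land on the same queue.
nft add rule inet mangle output queue num 0-3 fanout

# Per-packet round-robin via a numgen expression (Math operations), which also
# spreads a single flow (e.g. one iperf stream) across all the queues.
nft add rule inet mangle output queue to numgen inc mod 4
```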
These notes mainly affect UDP network traffic, because we intercept almost every outgoing packet. They also affect TCP, but to a lesser extent, because we only intercept the first TCP SYN packet. I'll write about it in a separate comment.
Please, if you find any inconsistencies or notice anything inaccurate written here, feel free to comment so we can improve the understanding of the inner workings of this interception method.
Environment:
iperf -u -i 1 -e -p 10000 -b 1000M -c 192.168.1.105
(1 unique/repetitive connection with same dstip+dstport)
Note 0:
By default, opensnitch uses 1 nfqueue queue to receive intercepted packets from the kernel. For TCP we only intercept the first TCP SYN packet, but for UDP we receive almost all outgoing packets.
The max throughput for UDP traffic is ~120Mb/s (~100Mb/s with action 'allow').
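To put that ceiling into per-packet terms, here is a rough back-of-the-envelope calculation (it assumes iperf's default UDP datagram size of ~1470 bytes, which is an assumption about this setup, not something measured here):

```go
package main

import "fmt"

func main() {
	// Illustrative numbers only: the ~120Mb/s observed ceiling from Note 0
	// and iperf's default ~1470-byte UDP datagrams (assumed, not measured).
	const datagramBytes = 1470.0
	const throughputMbps = 120.0

	bytesPerSec := throughputMbps * 1e6 / 8
	pktsPerSec := bytesPerSec / datagramBytes
	budgetPerPktUs := 1e6 / pktsPerSec

	// Prints roughly: ~10204 packets/s -> ~98 µs available per verdict.
	fmt.Printf("~%.0f packets/s -> ~%.0f µs available per verdict\n", pktsPerSec, budgetPerPktUs)
}
```

So at ~120Mb/s the daemon has on the order of 100 µs to issue each verdict; anything slower and packets start piling up in the kernel queue.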
Note 1:
Connections received from the kernel via nfqueue are read sequentially here:
opensnitch/daemon/netfilter/queue.h
Line 107 in 15fcf67
If we can't issue a verdict fast enough, the packets are queued in the kernel. The status of the queues can be seen here:
/proc/net/netfilter/nfnetlink_queue
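A minimal sketch of how that file can be watched from Go (the column layout used here, i.e. queue number, peer portid, packets waiting, copy mode, copy range, kernel drops, userspace drops, id sequence, is taken from the kernel's nfnetlink_queue proc handler; double-check it against your kernel version):

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

func main() {
	for {
		data, err := os.ReadFile("/proc/net/netfilter/nfnetlink_queue")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
			f := strings.Fields(line)
			if len(f) < 7 {
				continue
			}
			// f[2] grows when verdicts are issued too slowly; f[5]/f[6] count
			// packets dropped by the kernel / not handled by userspace.
			fmt.Printf("queue=%s waiting=%s kernel_dropped=%s user_dropped=%s\n",
				f[0], f[2], f[5], f[6])
		}
		time.Sleep(time.Second)
	}
}
```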
Note 2:
The pool of workers has little (if any) effect on the number of packets per second we can handle:
opensnitch/daemon/main.go
Line 206 in 15fcf67
It helps to avoid timeouts reading packets from the channel of packets:
opensnitch/daemon/netfilter/queue.go
Line 226 in 15fcf67
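This matches what a generic worker-pool dispatcher does: extra goroutines cannot make the single, sequential nfqueue read loop consume packets any faster, they only keep the producer from blocking (or timing out) when handing packets over. A standalone sketch of that pattern (packet, onPacket and the sizes are illustrative, not OpenSnitch's actual types or values):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type packet struct{ id int }

// onPacket stands in for the per-packet work (rule matching, verdict, ...).
func onPacket(p packet) { time.Sleep(50 * time.Microsecond) }

func main() {
	jobs := make(chan packet, 128)
	var wg sync.WaitGroup

	// Pool of workers: they only drain the channel, they do not speed up the
	// producer below, which models the sequential read from the nfqueue socket.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				onPacket(p)
			}
		}()
	}

	for i := 0; i < 10000; i++ {
		select {
		case jobs <- packet{id: i}:
		case <-time.After(time.Millisecond):
			// The situation the timeout in queue.go guards against: all
			// workers busy and the channel buffer full.
			fmt.Println("timeout queueing packet", i)
		}
	}
	close(jobs)
	wg.Wait()
}
```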
Note 3:
If we just accept packets in go_callback(), the bandwidth is ~400Mb/s.
opensnitch/daemon/netfilter/queue.go
Line 191 in 15fcf67
If we deny packets in go_callback(), the bandwidth is 1Gb/s (I'd take it with a grain of salt... but it's what I saw).
If the default verdict (applyDefaultAction()) is issued in the for-loop at the bottom of main.go, thus sending the packets through the channel, the bandwidth drops to ~200Mb/s.
opensnitch/daemon/main.go
Line 601 in 15fcf67
If, in the for-loop at the bottom of main.go, we send the packet directly to onPacket() instead of to the pool of workers, the max throughput increases to 120-130Mb/s (at the risk of having timeouts reading from the channel of Packets).
opensnitch/daemon/main.go
Line 601 in 15fcf67
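The gap between issuing the verdict inside the callback (~400Mb/s) and issuing it after the packet has travelled through the channel and the main loop (~200Mb/s) is consistent with the extra per-packet hop. A simplified, self-contained comparison of the two paths (verdict() is a placeholder, not the daemon's real code):

```go
package main

import (
	"fmt"
	"time"
)

const packets = 100000

// verdict stands in for setting NF_ACCEPT/NF_DROP on a queued packet.
func verdict() {}

func main() {
	// Path A: verdict issued directly in the nfqueue callback.
	start := time.Now()
	for i := 0; i < packets; i++ {
		verdict()
	}
	direct := time.Since(start)

	// Path B: the callback pushes the packet through a channel and a separate
	// loop issues the verdict, adding a scheduling hop for every packet.
	ch := make(chan int, 128)
	done := make(chan struct{})
	go func() {
		for range ch {
			verdict()
		}
		close(done)
	}()
	start = time.Now()
	for i := 0; i < packets; i++ {
		ch <- i
	}
	close(ch)
	<-done
	viaChannel := time.Since(start)

	fmt.Println("verdict in callback:   ", direct)
	fmt.Println("verdict after channel: ", viaChannel)
}
```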
Note 4:
This is pretty much the same experience reported for other software that uses nfqueue with 1 queue (snort, peerguardian, ntopng...):
ntop/ntopng#822
https://oisf-users.openinfosecfoundation.narkive.com/k3syvc8I/tuning-suricata-inline-ips-performance