Notes regarding nfqueue performance #1104
In order to improve the packets per second analyzed, we can add more nfqueue queues. There are 2 ways of doing it:

Notes regarding the first option:

Note 0:
Note 1:
Note 2: Besides, if we allow customizing the number of queues, we could launch multiple daemons on different queues.
Note 3: If you monitor the file
https://www.spinics.net/lists/netfilter/msg57702.html
https://www.spinics.net/lists/netfilter/msg52257.html
https://lwn.net/Articles/544379/
Note 4: Since nftables (at least) 0.7, you can create more flexible ways of balancing packets. If we wanted to distribute the traffic among all the created queues, we could do:
https://wiki.nftables.org/wiki-nftables/index.php/Math_operations
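For illustration only (this is not necessarily the exact rule from the original post: the table and chain names are made up, and the exact queue statement syntax should be double-checked against the installed nft version), spreading queued packets over four queues could look roughly like this:

```
# Per-flow balancing: packets of the same flow always land on the same queue.
nft add rule inet mangle output queue num 0-3 fanout

# Per-packet round-robin via a numgen expression (Math operations), which also
# spreads a single flow (e.g. one iperf stream) across all the queues.
nft add rule inet mangle output queue to numgen inc mod 4
```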
These notes mainly affect UDP network traffic, because we intercept almost every outgoing packet. They also affect TCP, but to a lesser extent, because we only intercept the first TCP SYN packet. I'll write about it in a separate comment.
Please, if you find any inconsistencies or notice anything inaccurate written here, feel free to comment so we can improve the understanding of the inner workings of this interception method.
Environment:
iperf -u -i 1 -e -p 10000 -b 1000M -c 192.168.1.105
(1 unique/repetitive connection with same dstip+dstport)
Note 0:
By default, opensnitch uses 1 nfqueue queue to receive intercepted packets from the kernel. For TCP we only intercept the first TCP SYN packet, but for UDP we receive almost all outgoing packets.
The max throughput for UDP traffic is ~120Mb/s (~100Mb/s with action 'allow').
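To put that ceiling into per-packet terms, here is a rough back-of-the-envelope calculation (it assumes iperf's default UDP datagram size of ~1470 bytes, which is an assumption about this setup, not something measured here):

```go
package main

import "fmt"

func main() {
	// Illustrative numbers only: the ~120Mb/s observed ceiling from Note 0
	// and iperf's default ~1470-byte UDP datagrams (assumed, not measured).
	const datagramBytes = 1470.0
	const throughputMbps = 120.0

	bytesPerSec := throughputMbps * 1e6 / 8
	pktsPerSec := bytesPerSec / datagramBytes
	budgetPerPktUs := 1e6 / pktsPerSec

	// Prints roughly: ~10204 packets/s -> ~98 µs available per verdict.
	fmt.Printf("~%.0f packets/s -> ~%.0f µs available per verdict\n", pktsPerSec, budgetPerPktUs)
}
```

So at ~120Mb/s the daemon has on the order of 100 µs to issue each verdict; anything slower and packets start piling up in the kernel queue.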
Note 1:
Connections received from the kernel via nfqueue are read sequentially here:
opensnitch/daemon/netfilter/queue.h
Line 107 in 15fcf67
If we can't issue a verdict fast enough, the packets are queued in the kernel. The status of the queues can be seen here:
/proc/net/netfilter/nfnetlink_queue
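A minimal sketch of how that file can be watched from Go (the column layout used here, i.e. queue number, peer portid, packets waiting, copy mode, copy range, kernel drops, userspace drops, id sequence, is taken from the kernel's nfnetlink_queue proc handler; double-check it against your kernel version):

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"time"
)

func main() {
	for {
		data, err := os.ReadFile("/proc/net/netfilter/nfnetlink_queue")
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			return
		}
		for _, line := range strings.Split(strings.TrimSpace(string(data)), "\n") {
			f := strings.Fields(line)
			if len(f) < 7 {
				continue
			}
			// f[2] grows when verdicts are issued too slowly; f[5]/f[6] count
			// packets dropped by the kernel / not handled by userspace.
			fmt.Printf("queue=%s waiting=%s kernel_dropped=%s user_dropped=%s\n",
				f[0], f[2], f[5], f[6])
		}
		time.Sleep(time.Second)
	}
}
```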
Note 2:
The pool of workers has little (if any) effect on the number of packets per second we can handle:
opensnitch/daemon/main.go
Line 206 in 15fcf67
It helps to avoid timeouts reading packets from the channel of packets:
opensnitch/daemon/netfilter/queue.go
Line 226 in 15fcf67
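This matches what a generic worker-pool dispatcher does: extra goroutines cannot make the single, sequential nfqueue read loop consume packets any faster, they only keep the producer from blocking (or timing out) when handing packets over. A standalone sketch of that pattern (packet, onPacket and the sizes are illustrative, not OpenSnitch's actual types or values):

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type packet struct{ id int }

// onPacket stands in for the per-packet work (rule matching, verdict, ...).
func onPacket(p packet) { time.Sleep(50 * time.Microsecond) }

func main() {
	jobs := make(chan packet, 128)
	var wg sync.WaitGroup

	// Pool of workers: they only drain the channel, they do not speed up the
	// producer below, which models the sequential read from the nfqueue socket.
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range jobs {
				onPacket(p)
			}
		}()
	}

	for i := 0; i < 10000; i++ {
		select {
		case jobs <- packet{id: i}:
		case <-time.After(time.Millisecond):
			// The situation the timeout in queue.go guards against: all
			// workers busy and the channel buffer full.
			fmt.Println("timeout queueing packet", i)
		}
	}
	close(jobs)
	wg.Wait()
}
```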
Note 3:
If we just accept packets in go_callback(), the bandwidth is ~400Mb/s.
opensnitch/daemon/netfilter/queue.go
Line 191 in 15fcf67
If we deny packets in go_callback(), the bandwidth is 1Gb/s (I'd take it with a grain of salt... but it's what I saw).
If the default verdict (applyDefaultAction()) is issued in the for-loop at the bottom of main.go, thus sending the packets through the channel, the bandwidth drops to ~200Mb/s.
opensnitch/daemon/main.go
Line 601 in 15fcf67
If, in the for-loop at the bottom of main.go, we send the packet directly to onPacket() instead of to the pool of workers, the max throughput increases to 120-130Mb/s (at the risk of having timeouts reading from the channel of Packets).
opensnitch/daemon/main.go
Line 601 in 15fcf67
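The gap between issuing the verdict inside the callback (~400Mb/s) and issuing it after the packet has travelled through the channel and the main loop (~200Mb/s) is consistent with the extra per-packet hop. A simplified, self-contained comparison of the two paths (verdict() is a placeholder, not the daemon's real code):

```go
package main

import (
	"fmt"
	"time"
)

const packets = 100000

// verdict stands in for setting NF_ACCEPT/NF_DROP on a queued packet.
func verdict() {}

func main() {
	// Path A: verdict issued directly in the nfqueue callback.
	start := time.Now()
	for i := 0; i < packets; i++ {
		verdict()
	}
	direct := time.Since(start)

	// Path B: the callback pushes the packet through a channel and a separate
	// loop issues the verdict, adding a scheduling hop for every packet.
	ch := make(chan int, 128)
	done := make(chan struct{})
	go func() {
		for range ch {
			verdict()
		}
		close(done)
	}()
	start = time.Now()
	for i := 0; i < packets; i++ {
		ch <- i
	}
	close(ch)
	<-done
	viaChannel := time.Since(start)

	fmt.Println("verdict in callback:   ", direct)
	fmt.Println("verdict after channel: ", viaChannel)
}
```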
Note 4:
This is pretty much the same experience reported for other software that uses nfqueue with 1 queue (snort, peerguardian, ntopng...):
ntop/ntopng#822
https://oisf-users.openinfosecfoundation.narkive.com/k3syvc8I/tuning-suricata-inline-ips-performance