Differences

This shows you the differences between two versions of the page.

--- 100g_tuning [2023/07/27 21:24]
root created
+++ 100g_tuning [2023/07/28 17:15] (current)
root
@@ Line 1: / Line 1: @@
-==100Gbit Ethernet Tuning==
+`==100Gbit Ethernet Tuning==
-A summary of internet resources.  Background: A 100 Gb/s Ethernet adapter will not get anywhere near 100 Gb/s without tuning.
+A summary of internet resources.  Background: A 100 Gb/s Ethernet adapter will not get anywhere near 100 Gb/s without tuning.  A new installation over a 200-mile link shows 12 GB/s.
 [[https://fasterdata.es.net/host-tuning/linux/100g-tuning/ | fasterdata.es.net]]
@@ Line 10: / Line 10: @@
 [[https://access.redhat.com/solutions/168483 | Redhat-1]]
-[[https://access.redhat.com/discussions/684183 | Redhat-2 ]]
+[[https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_virtualization/configuring-virtual-machine-network-connections_configuring-and-managing-virtualization | Redhat-2 ]]
-[[https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_virtualization/configuring-virtual-machine-network-connections_configuring-and-managing-virtualization | Redhat 3]]
+[[https://access.redhat.com/solutions/3713681 | Redhat-3]]
 [[https://docs.nvidia.com/networking/display/winof2v320/Performance+Tuning | NVidia ]]
@@ Line 20: / Line 20: @@
 [[ https://www.perfsonar.net/index.html | PerfSonar ]]
+[[ https://cdrdv2-public.intel.com/334019/xl710-x710-performance-tuning-linux-guide.pdf | Intel ]]
+=== What PerfSonar does automatically ===
+<code>
+net.core.rmem_max = 536870912
+net.core.wmem_max = 536870912
+net.ipv4.conf.all.arp_announce = 2
+net.ipv4.conf.all.arp_filter = 1
+net.ipv4.conf.all.arp_ignore = 1
+net.ipv4.conf.default.arp_filter = 1
+net.ipv4.tcp_congestion_control = htcp
+net.ipv4.tcp_mtu_probing = 1
+net.ipv4.tcp_no_metrics_save = 1
+net.ipv4.tcp_rmem = 4096 87380 268435456
+net.ipv4.tcp_wmem = 4096 65536 268435456
+ifconfig ens16f1 mtu 9000
+</code>
+==Redhat 1==
+<code>
+net.ipv4.tcp_window_scaling = 1
+net.ipv4.tcp_rmem = 8192 Y 4194304
+net.ipv4.tcp_wmem = 8192 Y 4194304
+The middle value Y is the default buffer size. This is the most important value. You might wish to start at 524288 (512 KiB) and move up from there. Tune in small multiples of your Bandwidth Delay Product (BDP): try BDP x1, then BDP x1.25, then BDP x1.5, and so on. Once speeds start to increase, refine in smaller steps, for example BDP x2.5, then BDP x2.6, and so on. It is unlikely you will need a value larger than BDP x5.
+</code>
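The BDP multiples mentioned in the Red Hat note can be worked out with a few lines of shell arithmetic. This is only a sketch: the function names `bdp_bytes` and `scaled_bdp` are made up for illustration, and the 10 ms round-trip time is an assumed figure, not a measurement from this installation.

```shell
#!/bin/sh
# bdp_bytes GBITS RTT_MS: bandwidth-delay product in bytes.
# bytes = (Gbit/s * 1e9 / 8) * (RTT in ms / 1000) = Gbit/s * RTT_ms * 125000
bdp_bytes() {
    echo $(( $1 * $2 * 125000 ))
}

# scaled_bdp GBITS RTT_MS PERCENT: BDP times a multiplier given in percent
# (125 means BDP x1.25); integer arithmetic only.
scaled_bdp() {
    echo $(( $(bdp_bytes "$1" "$2") * $3 / 100 ))
}

# Example: 100 Gbit/s link with an ASSUMED 10 ms round-trip time.
for pct in 100 125 150; do
    echo "BDP x${pct}%: $(scaled_bdp 100 10 "$pct") bytes"
done
```

With these assumed inputs the BDP is 125000000 bytes, far above the 4194304 maximum shown in the Red Hat example, which is consistent with the much larger maxima used in the other sections of this page.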
+==NVidia==
+This guide is for Windows, so the settings below need translating to their Linux equivalents.
+<code>
+Disable SACK
+Enable Fast Datagram UDP
+Set RSS=enabled
+Set closest NUMA
+Set receive buffers=512
+Set send buffers=2048
+IPV4 Checksum offload enabled
+TCP/UDP Checksum offload enabled
+IPV6 TCP/UDP Checksum offload enabled
+Large Send Option offload enabled
+</code>
+==BBR==
+Enable TCP/BBR congestion control. See BBR and Redhat-3 above.
+echo "net.ipv4.tcp_congestion_control = bbr" >> /etc/sysctl.conf
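A line appended to /etc/sysctl.conf only takes effect at the next boot or after `sysctl -p`, and it only helps if the kernel actually offers bbr. The sketch below checks the kernel's list of available congestion control algorithms before relying on the setting. The helper name `has_cc` is hypothetical; the /proc path is the standard Linux one.

```shell
#!/bin/sh
# has_cc NAME LIST: succeed if NAME appears as a whole word in the
# space-separated LIST of congestion control algorithms.
has_cc() {
    case " $2 " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

avail_file=/proc/sys/net/ipv4/tcp_available_congestion_control
if [ -r "$avail_file" ]; then
    if has_cc bbr "$(cat "$avail_file")"; then
        echo "bbr is available"
    else
        # The module is often just not loaded yet.
        echo "bbr not listed; the tcp_bbr module may need loading (modprobe tcp_bbr)"
    fi
fi
```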
+==Stanford==
+<code>
+firewall-cmd --zone=public --add-port=61617/tcp --permanent
+firewall-cmd --zone=public --add-port=8090/tcp --permanent
+firewall-cmd --zone=public --add-port=8096/tcp --permanent
+firewall-cmd --zone=public --add-port=4823/tcp --permanent
+firewall-cmd --zone=public --add-port=6001-6200/tcp --permanent
+firewall-cmd --zone=public --add-port=6001-6200/udp --permanent
+firewall-cmd --zone=public --add-port=5001-5900/udp --permanent
+firewall-cmd --zone=public --add-port=5001-5900/tcp --permanent
+firewall-cmd --zone=public --add-port=861/tcp --permanent
+firewall-cmd --zone=public --add-port=8760-9960/udp --permanent
+firewall-cmd --zone=public --add-port=33434-33634/udp --permanent
+This allows the fdt.jar tool to use its default port
+firewall-cmd --zone=public --add-port=54321/tcp --permanent
+Make these permanent and reload the rule database
+firewall-cmd --reload
+Let's find the CPU that the 100g NIC is associated with.
+cat /sys/class/net/<100g-NIC-name>/device/numa_node
+The usual response is either '0' or '1', meaning the NIC is associated with either CPU '0' or CPU '1'. (A '-1' probably means it is a single-CPU system.) Let's assume it returned '1'.
+Knowing that, run this command:
+lscpu
+Most modern CPUs can run at different clock frequencies and often do so to save energy. In our case we want the CPU to run as fast as possible. First let's see what speed each CPU core is running at and what the maximum speed could be. Just run this command:
+grep -E '^model name|^cpu MHz' /proc/cpuinfo
+You'll probably see that the cores aren't running near their spec speed, most often because the governor is set to 'powersave'.
+This simple command sets all the cores to 'performance' instead:
+sudo cpupower frequency-set --governor performance
+(Adding or changing these values for high-speed NICs is uncontroversial.)
+# increase TCP max buffer size settable using setsockopt()
+# allow testing with 256MB buffers
+net.core.rmem_max = 268435456
+net.core.wmem_max = 268435456
+# increase Linux autotuning TCP buffer limits
+# min, default, and max number of bytes to use
+# allow auto-tuning up to 128MB buffers
+net.ipv4.tcp_rmem = 4096 87380 134217728
+net.ipv4.tcp_wmem = 4096 65536 134217728
+# recommended to increase this for CentOS6 with 10G NICS or higher
+net.core.netdev_max_backlog = 250000
+# don't cache ssthresh from previous connection
+net.ipv4.tcp_no_metrics_save = 1
+# Explicitly set htcp as the congestion control: cubic buggy in older 2.6 kernels
+net.ipv4.tcp_congestion_control = htcp
+# If you are using Jumbo Frames, also set this
+net.ipv4.tcp_mtu_probing = 1
+# recommended for CentOS7/Debian8 hosts
+net.core.default_qdisc = fq
+</code>
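After running `cpupower frequency-set --governor performance`, it can be worth confirming that every core really switched governor. The following is a minimal sketch that counts cores still on another governor by reading the standard cpufreq sysfs files. The function name `count_non_performance` is made up for illustration, and the optional directory argument exists only so the function can be pointed at a test tree.

```shell
#!/bin/sh
# count_non_performance [SYSFS_CPU_DIR]: print the number of cores whose
# scaling governor is something other than 'performance'.
count_non_performance() {
    dir=${1:-/sys/devices/system/cpu}
    n=0
    for f in "$dir"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip if cpufreq is not exposed (for example inside some VMs).
        [ -r "$f" ] || continue
        [ "$(cat "$f")" = "performance" ] || n=$(( n + 1 ))
    done
    echo "$n"
}

echo "cores not in performance mode: $(count_non_performance)"
```

A result of 0 means either that all cores use the performance governor or that the system exposes no cpufreq files at all, so the count should be read together with the `grep` output above.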
+==Intel==
+<code>
+To configure IRQ affinity, stop irqbalance and then either use the set_irq_affinity script from the
+i40e source package or pin queues manually.
+Disable user-space IRQ balancer to enable queue pinning:
+• systemctl disable irqbalance
+• systemctl stop irqbalance
+Using the set_irq_affinity script from the i40e source package (recommended):
+• To use all cores:
+[path-to-i40epackage]/scripts/set_irq_affinity -x all ethX
+• To use only cores on the local NUMA socket:
+[path-to-i40epackage]/scripts/set_irq_affinity -x local ethX
+• You can also select a range of cores. Avoid using cpu0 because it runs timer tasks.
+[path-to-i40epackage]/scripts/set_irq_affinity 1-2 ethX
+Manually:
+• Find the processors attached to each node using:
+numactl --hardware
+lscpu
+• Find the bit masks for each of the processors:
+• Assuming cores 0-11 for node 0: [1,2,4,8,10,20,40,80,100,200,400,800]
+• Find the IRQs assigned to the port being assigned:
+grep ethX /proc/interrupts and note the IRQ values
+For example, 181-192 for the 12 vectors loaded.
+• Echo the SMP affinity value into the corresponding IRQ entry. Note that this needs to be done for
+each IRQ entry:
+echo 1 > /proc/irq/181/smp_affinity
+echo 2 > /proc/irq/182/smp_affinity
+echo 4 > /proc/irq/183/smp_affinity
+</code>
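The hexadecimal bit masks in the Intel list follow directly from the core number: core N corresponds to the value 1 shifted left by N bits. The sketch below reproduces the masks for cores 0-11 and prints, without executing, the matching echo commands for the example IRQ range 181-192. The function names `cpu_mask` and `print_affinity_cmds` are hypothetical, and the one-IRQ-per-core mapping is just the pattern shown in the guide. It is only meant for small core numbers.

```shell
#!/bin/sh
# cpu_mask N: hexadecimal affinity mask with only bit N set (core N).
cpu_mask() {
    printf '%x\n' $(( 1 << $1 ))
}

# print_affinity_cmds FIRST_IRQ COUNT: print (do not run) the commands that
# would pin IRQs FIRST_IRQ.. one-to-one onto cores 0..COUNT-1.
print_affinity_cmds() {
    first_irq=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "echo $(cpu_mask "$i") > /proc/irq/$(( first_irq + i ))/smp_affinity"
        i=$(( i + 1 ))
    done
}

# Example values taken from the guide: 12 vectors starting at IRQ 181.
print_affinity_cmds 181 12
```

Printing the commands first makes it possible to review them before running them as root with irqbalance stopped.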
+==ethtool==
+From the VM host:
+<code>
+[root@fiona ~]# ethtool -k ens4np0
+Features for ens4np0:
+rx-checksumming: on
+tx-checksumming: on
+	tx-checksum-ipv4: off [fixed]
+	tx-checksum-ip-generic: on
+	tx-checksum-ipv6: off [fixed]
+	tx-checksum-fcoe-crc: off [fixed]
+	tx-checksum-sctp: off [fixed]
+scatter-gather: on
+	tx-scatter-gather: on
+	tx-scatter-gather-fraglist: off [fixed]
+tcp-segmentation-offload: on
+	tx-tcp-segmentation: on
+	tx-tcp-ecn-segmentation: off [fixed]
+	tx-tcp-mangleid-segmentation: off
+	tx-tcp6-segmentation: on
+generic-segmentation-offload: on
+generic-receive-offload: on
+large-receive-offload: off [fixed]
+rx-vlan-offload: on
+tx-vlan-offload: on
+ntuple-filters: off
+receive-hashing: on
+highdma: on [fixed]
+rx-vlan-filter: on
+vlan-challenged: off [fixed]
+tx-lockless: off [fixed]
+netns-local: off [fixed]
+tx-gso-robust: off [fixed]
+tx-fcoe-segmentation: off [fixed]
+tx-gre-segmentation: on
+tx-gre-csum-segmentation: on
+tx-ipxip4-segmentation: on
+tx-ipxip6-segmentation: on
+tx-udp_tnl-segmentation: on
+tx-udp_tnl-csum-segmentation: on
+tx-gso-partial: on
+tx-tunnel-remcsum-segmentation: off [fixed]
+tx-sctp-segmentation: off [fixed]
+tx-esp-segmentation: off [fixed]
+tx-udp-segmentation: on
+tx-gso-list: off [fixed]
+rx-udp-gro-forwarding: off
+rx-gro-list: off
+tls-hw-rx-offload: off [fixed]
+fcoe-mtu: off [fixed]
+tx-nocache-copy: off
+loopback: off [fixed]
+rx-fcs: off
+rx-all: off
+tx-vlan-stag-hw-insert: on
+rx-vlan-stag-hw-parse: off [fixed]
+rx-vlan-stag-filter: on [fixed]
+l2-fwd-offload: off [fixed]
+hw-tc-offload: off
+esp-hw-offload: off [fixed]
+esp-tx-csum-hw-offload: off [fixed]
+rx-udp_tunnel-port-offload: on
+tls-hw-tx-offload: off [fixed]
+rx-gro-hw: off [fixed]
+tls-hw-record: off [fixed]
+</code>
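In a long `ethtool -k` listing, the entries of practical interest are usually the features that are off but not marked `[fixed]`, since those are the ones that can still be changed. A small filter along these lines can pull them out. The function name `tunable_off` is made up for illustration, and the sample input is a few lines copied from the listing above.

```shell
#!/bin/sh
# tunable_off: read `ethtool -k` output on stdin and print the names of
# features that are "off" without the [fixed] marker.
tunable_off() {
    awk -F': ' '$2 == "off" { gsub(/^[ \t]+/, "", $1); print $1 }'
}

# Demo on a few lines from the listing above.
tunable_off <<'EOF'
Features for ens4np0:
rx-checksumming: on
large-receive-offload: off [fixed]
ntuple-filters: off
rx-gro-list: off
EOF
```

On a live system the same filter would be used as `ethtool -k ens4np0 | tunable_off`.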

Arkansas High Performance Computing Center [hpcwiki]