I'll try to see if I have time to test the veth netmap driver again and report back with a more detailed issue. I'm not sure whether the netmap veth driver supports multi-queue operation (we couldn't manage to get it to work), but in my opinion it should. Virtualization can happen via SR-IOV (netmap has an ixgbevf driver but not i40evf), via virtio_net (netmap has a driver for this, but we haven't tested it because we don't use virtio_net), or via containers linked with veth (netmap has a driver for this too, but according to our tests it doesn't always work, and when it does, the performance is actually poorer than with the standard emulated veth driver). What we really want here is virtualization, and good performance with it.

We tested the veth netmap driver twice: last summer it worked with reduced performance; this year it didn't work at all. This is with large 1460-byte TCP segments. But I agree that the veth issue should be investigated more. Just to provide some figures: with large frames, emulated veth is way below 10 Gbps, whereas native i40e is over 20 Gbps with a single thread, and native i40e supports multi-threaded operation, meaning a 40 Gbps link can be saturated (three threads ensure full link saturation).

The use case is a TCP SYN proxy deployed in a containerized environment with veth. The kernel has a rudimentary SYN proxy in netfilter, and it's really fast due to seeing 64 KB virtual frames. We have developed our own SYN proxy, much better than the kernel SYN proxy, operating entirely in userspace using netmap. We assume the 64 KB virtual frames are the main reason for the kernel proxy's performance, but multiqueue operation can also help the kernel SYN proxy. Netmap, on the other hand, is really fast on physical NICs, but fails to work fast in this veth deployment mode. We are seeing an order of magnitude performance difference between the netmap version and the kernel version, the kernel version being faster.

I set

```
echo 65536 > /sys/module/netmap/parameters/buf_size
echo 64 > /sys/module/netmap/parameters/generic_ringsize
```

and modified my application to not care about TCP checksums. However, that does not seem to be enough, as sk_buffs carry ancillary data for the status of checksums and segmentation offload (GSO). Netmap does not access this ancillary data, and therefore, if you forward packets between veth1 and veth2 in an iperfClient-veth0-veth1-netmapfwd-veth2-veth3-iperfServer setup, the server does not accept the SYN packet sent by the client (for this setup, network namespaces are required, or else traffic takes a shortcut via the loopback interface).
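For reference, here is roughly what the netmapfwd component of that setup looks like. This is a minimal sketch using netmap's classic nm_open()/nm_nextpkt()/nm_inject() helpers, not our actual proxy code; the veth1/veth2 names come from the topology above, and nm_inject() copies each frame, which is fine for illustration:

```
/*
 * Minimal bidirectional forwarder between veth1 and veth2.
 * Sketch only: single-threaded, copying, no error recovery.
 */
#include <poll.h>
#include <stdio.h>
#define NETMAP_WITH_LIBS
#include <net/netmap_user.h>

/* Drain every packet pending on src and copy it to dst. */
static void forward(struct nm_desc *src, struct nm_desc *dst)
{
	struct nm_pkthdr h;
	u_char *buf;

	while ((buf = nm_nextpkt(src, &h)) != NULL)
		nm_inject(dst, buf, h.len);
}

int main(void)
{
	struct nm_desc *a = nm_open("netmap:veth1", NULL, 0, NULL);
	struct nm_desc *b = nm_open("netmap:veth2", NULL, 0, NULL);
	struct pollfd fds[2];

	if (a == NULL || b == NULL) {
		fprintf(stderr, "nm_open failed\n");
		return 1;
	}
	fds[0].fd = a->fd; fds[0].events = POLLIN;
	fds[1].fd = b->fd; fds[1].events = POLLIN;

	for (;;) {
		poll(fds, 2, -1);
		if (fds[0].revents & POLLIN)
			forward(a, b);
		if (fds[1].revents & POLLIN)
			forward(b, a);
	}
}
```

Note that nothing in this loop can see or preserve checksum or GSO state: the netmap ring carries only the raw frame bytes, which is exactly the problem described above.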
I'm not entirely sure how netmap support for TCP offloads should be implemented. It should at least work in the emulated netmap mode, which presumably interacts with sk_buffs. It should probably, ideally, also work in native mode when special drivers are present. On Intel i40e 40 Gbps NICs we can, with a quite old CPU, saturate the 40 Gbps interface with 3 threads and 3 queues, so TCP offloads are not very helpful in this case.

I managed to run netmap with TCP checksum offloads with the following change:

```
diff --git a/LINUX/netmap_linux.c b/LINUX/netmap_linux.c
	/* ... which correspond to an empty buffer with exactly
	 * ifp->needed_headroom bytes between head and data. */
	m->data = m->head + ifp->needed_headroom;
```

but when TCP checksum offloads were turned off, the change caused the machine to crash. The TCP checksum offload change didn't improve performance; I believe that for improved performance, both checksum offloads and segmentation offloads are needed. I haven't tried running with segmentation offloads yet.

In any case, if people are willing to help me understand the netmap codebase and tell me what kind of modifications are required, I am willing to assist in the implementation and testing of TCP offload support for netmap.
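To make the "ancillary data" concrete, here is the kind of offload metadata I believe would have to be captured on receive and restored on transmit for this to work. The field names are real ones from the kernel's <linux/skbuff.h>; the nm_offload_md struct and the save/restore helpers are hypothetical, sketched from my reading of the kernel headers rather than from any existing netmap code:

```
#include <linux/skbuff.h>

/* Hypothetical per-packet offload metadata that a netmap ring would
 * have to carry alongside the frame bytes. */
struct nm_offload_md {
	__u8	ip_summed;	/* CHECKSUM_PARTIAL, CHECKSUM_UNNECESSARY, ... */
	__u16	csum_start;	/* where checksumming starts (CHECKSUM_PARTIAL) */
	__u16	csum_offset;	/* where the computed checksum is stored */
	__u16	gso_size;	/* payload size per segment, 0 if no GSO */
	__u32	gso_type;	/* SKB_GSO_TCPV4 and friends */
};

/* Capture the offload state of an incoming skb... */
static void nm_offload_save(const struct sk_buff *skb, struct nm_offload_md *md)
{
	md->ip_summed = skb->ip_summed;
	md->csum_start = skb->csum_start;
	md->csum_offset = skb->csum_offset;
	md->gso_size = skb_shinfo(skb)->gso_size;
	md->gso_type = skb_shinfo(skb)->gso_type;
}

/* ...and restore it onto the outgoing skb built from the netmap slot. */
static void nm_offload_restore(struct sk_buff *skb, const struct nm_offload_md *md)
{
	skb->ip_summed = md->ip_summed;
	skb->csum_start = md->csum_start;
	skb->csum_offset = md->csum_offset;
	skb_shinfo(skb)->gso_size = md->gso_size;
	skb_shinfo(skb)->gso_type = md->gso_type;
}
```

Whether this state would live in the slot flags, in a separate metadata ring, or somewhere else entirely is exactly the kind of design question I'd need help with.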