Sign Up
Log In
Log In
or
Sign Up
Places
All Projects
Status Monitor
Collapse sidebar
SUSE:SLE-15-SP4:Update
dpdk.32560
0001-kni-allow-configuring-thread-granularity.p...
Overview
Repositories
Revisions
Requests
Users
Attributes
Meta
File 0001-kni-allow-configuring-thread-granularity.patch of Package dpdk.32560
From 5569dd7d90b8bfb08facd2125ff55fefd8e61626 Mon Sep 17 00:00:00 2001 From: Tudor Cornea <tudor.cornea@gmail.com> Date: Thu, 20 Jan 2022 14:41:34 +0200 Subject: [PATCH] kni: allow configuring thread granularity The Kni kthreads seem to be re-scheduled at a granularity of roughly 1 millisecond right now, which seems to be insufficient for performing tests involving a lot of control plane traffic. Even if KNI_KTHREAD_RESCHEDULE_INTERVAL is set to 5 microseconds, it seems that the existing code cannot reschedule at the desired granularily, due to precision constraints of schedule_timeout_interruptible(). In our use case, we leverage the Linux Kernel for control plane, and it is not uncommon to have 60K - 100K pps for some signaling protocols. Since we are not in atomic context, the usleep_range() function seems to be more appropriate for being able to introduce smaller controlled delays, in the range of 5-10 microseconds. Upon reading the existing code, it would seem that this was the original intent. Adding sub-millisecond delays, seems unfeasible with a call to schedule_timeout_interruptible(). KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */ schedule_timeout_interruptible( usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL)); Below, we attempted a brief comparison between the existing implementation, which uses schedule_timeout_interruptible() and usleep_range(). We attempt to measure the CPU usage, and RTT between two Kni interfaces, which are created on top of vmxnet3 adapters, connected by a vSwitch. insmod rte_kni.ko kthread_mode=single carrier=on schedule_timeout_interruptible(usecs_to_jiffies(5)) kni_single CPU Usage: 2-4 % [root@localhost ~]# ping 1.1.1.2 -I eth1 PING 1.1.1.2 (1.1.1.2) from 1.1.1.1 eth1: 56(84) bytes of data. 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.70 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.00 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.99 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.985 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.00 ms usleep_range(5, 10) kni_single CPU usage: 50% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.338 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.150 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.123 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.139 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.159 ms usleep_range(20, 50) kni_single CPU usage: 24% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.202 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.170 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.171 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.248 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.185 ms usleep_range(50, 100) kni_single CPU usage: 13% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.537 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.257 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.231 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.143 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.200 ms usleep_range(100, 200) kni_single CPU usage: 7% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.716 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.167 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.459 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=0.455 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=0.252 ms usleep_range(1000, 1100) kni_single CPU usage: 2% 64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=2.22 ms 64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=4 ttl=64 time=1.17 ms 64 bytes from 1.1.1.2: icmp_seq=5 ttl=64 time=1.15 ms Upon testing, usleep_range(1000, 1100) seems roughly equivalent in latency and cpu usage to the variant with schedule_timeout_interruptible(), while usleep_range(100, 200) seems to give a decent tradeoff between latency and cpu usage, while allowing users to tweak the limits for improved precision if they have such use cases. Disabling RTE_KNI_PREEMPT_DEFAULT, interestingly seems to lead to a softlockup on my kernel. Kernel panic - not syncing: softlockup: hung tasks CPU: 0 PID: 1226 Comm: kni_single Tainted: G W O 3.10 #1 <IRQ> [<ffffffff814f84de>] dump_stack+0x19/0x1b [<ffffffff814f7891>] panic+0xcd/0x1e0 [<ffffffff810993b0>] watchdog_timer_fn+0x160/0x160 [<ffffffff810644b2>] __run_hrtimer.isra.4+0x42/0xd0 [<ffffffff81064b57>] hrtimer_interrupt+0xe7/0x1f0 [<ffffffff8102cd57>] smp_apic_timer_interrupt+0x67/0xa0 [<ffffffff8150321d>] apic_timer_interrupt+0x6d/0x80 This patch also attempts to remove this option. References: [1] https://www.kernel.org/doc/Documentation/timers/timers-howto.txt Signed-off-by: Tudor Cornea <tudor.cornea@gmail.com> Acked-by: Padraig Connolly <Padraig.J.Connolly@intel.com> Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com> Acked-by: Thomas Abraham <tabraham@suse.com> [tabraham@suse.com: backport - adjust defaults to avoid changing performance for current users] --- config/rte_config.h | 3 -- .../prog_guide/kernel_nic_interface.rst | 33 +++++++++++++++++++ kernel/linux/kni/kni_dev.h | 2 +- kernel/linux/kni/kni_misc.c | 32 +++++++++++++----- 4 files changed, 58 insertions(+), 12 deletions(-) Index: dpdk-stable-19.11.10/config/rte_config.h =================================================================== --- dpdk-stable-19.11.10.orig/config/rte_config.h +++ dpdk-stable-19.11.10/config/rte_config.h @@ -98,9 +98,6 @@ #define RTE_LIBRTE_PMD_CRYPTO_SCHEDULER 1 #endif -/* KNI defines */ -#define RTE_KNI_PREEMPT_DEFAULT 1 - /****** driver defines ********/ /* Packet prefetching in PMDs */ Index: dpdk-stable-19.11.10/doc/guides/prog_guide/kernel_nic_interface.rst =================================================================== --- dpdk-stable-19.11.10.orig/doc/guides/prog_guide/kernel_nic_interface.rst +++ dpdk-stable-19.11.10/doc/guides/prog_guide/kernel_nic_interface.rst @@ -56,6 +56,10 @@ can be specified when the module is load off Interfaces will be created with carrier state set to off. on Interfaces will be created with carrier state set to on. (charp) + parm: min_scheduling_interval: KNI thread min scheduling interval (default=1000 microseconds) + (long) + parm: max_scheduling_interval: KNI thread max scheduling interval (default=1100 microseconds) + (long) Loading the ``rte_kni`` kernel module without any optional parameters is the typical way a DPDK application gets packets into and out of the kernel @@ -174,6 +178,35 @@ To set the default carrier state to *off If the ``carrier`` parameter is not specified, the default carrier state of KNI interfaces will be set to *off*. +KNI Kthread Scheduling +~~~~~~~~~~~~~~~~~~~~~~ + +The ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters +control the rescheduling interval of the KNI kthreads. + +This might be useful if we have use cases in which we require improved +latency or performance for control plane traffic. + +The implementation is backed by Linux High Precision Timers, and uses ``usleep_range``. +Hence, it will have the same granularity constraints as this Linux subsystem. + +For Linux High Precision Timers, you can check the following resource: `Kernel Timers <http://www.kernel.org/doc/Documentation/timers/timers-howto.txt>`_ + +To set the ``min_scheduling_interval`` to a value of 100 microseconds: + +.. code-block:: console + + # insmod <build_dir>/kernel/linux/kni/rte_kni.ko min_scheduling_interval=100 + +To set the ``max_scheduling_interval`` to a value of 200 microseconds: + +.. code-block:: console + + # insmod <build_dir>/kernel/linux/kni/rte_kni.ko max_scheduling_interval=200 + +If the ``min_scheduling_interval`` and ``max_scheduling_interval`` parameters are +not specified, the default interval limits will be set to *1000* and *1100* respectively. + KNI Creation and Deletion ------------------------- Index: dpdk-stable-19.11.10/kernel/linux/kni/kni_dev.h =================================================================== --- dpdk-stable-19.11.10.orig/kernel/linux/kni/kni_dev.h +++ dpdk-stable-19.11.10/kernel/linux/kni/kni_dev.h @@ -27,7 +27,7 @@ #include <linux/list.h> #include <rte_kni_common.h> -#define KNI_KTHREAD_RESCHEDULE_INTERVAL 5 /* us */ +#define KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL 1000000 /* us */ #define MBUF_BURST_SZ 32 Index: dpdk-stable-19.11.10/kernel/linux/kni/kni_misc.c =================================================================== --- dpdk-stable-19.11.10.orig/kernel/linux/kni/kni_misc.c +++ dpdk-stable-19.11.10/kernel/linux/kni/kni_misc.c @@ -41,6 +41,10 @@ static uint32_t multiple_kthread_on; static char *carrier; uint32_t kni_dflt_carrier; +/* KNI thread scheduling interval */ +static long min_scheduling_interval = 1000; /* us */ +static long max_scheduling_interval = 1100; /* us */ + #define KNI_DEV_IN_USE_BIT_NUM 0 /* Bit number for device in use */ static int kni_net_id; @@ -128,11 +132,8 @@ kni_thread_single(void *data) } } up_read(&knet->kni_list_lock); -#ifdef RTE_KNI_PREEMPT_DEFAULT /* reschedule out for a while */ - schedule_timeout_interruptible( - usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL)); -#endif + usleep_range(min_scheduling_interval, max_scheduling_interval); } return 0; @@ -149,10 +150,7 @@ kni_thread_multiple(void *param) kni_net_rx(dev); kni_net_poll_resp(dev); } -#ifdef RTE_KNI_PREEMPT_DEFAULT - schedule_timeout_interruptible( - usecs_to_jiffies(KNI_KTHREAD_RESCHEDULE_INTERVAL)); -#endif + usleep_range(min_scheduling_interval, max_scheduling_interval); } return 0; @@ -593,6 +591,14 @@ kni_init(void) else pr_debug("Default carrier state set to on.\n"); + if (min_scheduling_interval < 0 || max_scheduling_interval < 0 || + min_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL || + max_scheduling_interval > KNI_KTHREAD_MAX_RESCHEDULE_INTERVAL || + min_scheduling_interval >= max_scheduling_interval) { + pr_err("Invalid parameters for scheduling interval\n"); + return -EINVAL; + } + #ifdef HAVE_SIMPLIFIED_PERNET_OPERATIONS rc = register_pernet_subsys(&kni_net_ops); #else @@ -659,3 +665,13 @@ MODULE_PARM_DESC(carrier, "\t\ton Interfaces will be created with carrier state set to on.\n" "\t\t" ); + +module_param(min_scheduling_interval, long, 0644); +MODULE_PARM_DESC(min_scheduling_interval, +"KNI thread min scheduling interval (default=1000 microseconds)" +); + +module_param(max_scheduling_interval, long, 0644); +MODULE_PARM_DESC(max_scheduling_interval, +"KNI thread max scheduling interval (default=1100 microseconds)" +);
Locations
Projects
Search
Status Monitor
Help
OpenBuildService.org
Documentation
API Documentation
Code of Conduct
Contact
Support
@OBShq
Terms
openSUSE Build Service is sponsored by
The Open Build Service is an
openSUSE project
.
Sign Up
Log In
Places
Places
All Projects
Status Monitor