napi.rst - OpenGrok cross reference for /linux-6.14.4/Documentation/networking/napi.rst

Lines Matching +full:rx +full:- +full:queues +full:- +full:to +full:- +full:use
1 .. SPDX-License-Identifier: (GPL-2.0-only OR BSD-2-Clause)
14 The host then schedules a NAPI instance to process the events.
19 but there is an option to use :ref:`separate kernel threads<threaded>`
23 of event (packet Rx and Tx) processing.
30 of the NAPI instance while the method is the driver-specific event
37 -----------
40 from the system. The instances are attached to the netdevice passed
46 to not be invoked. napi_disable() waits for ownership of the NAPI
47 instance to be released.
50 concurrent use of datapath APIs but an incorrect sequence of control API
55 ------------
59 (see :ref:`drv_sched` for more info). A successful call to napi_schedule()
63 called to process the events/packets. The method takes a ``budget``
64 argument - drivers can process completions for any number of Tx
65 packets but should only process up to ``budget`` number of
66 Rx packets. Rx processing is usually much more expensive.
68 In other words, for Rx processing the ``budget`` argument limits how many
69 packets the driver can process in a single poll. Rx-specific APIs like page
76    The ``budget`` argument may be 0 if core tries to only process
77    skb Tx completions and no Rx or XDP packets.
80 has outstanding work to do (e.g. ``budget`` was exhausted)
83 need to be scheduled).
93    must be handled carefully. There is no way to report this
94    (rare) condition to the stack, so the driver must either
95    not call napi_complete_done() and wait to be called again,
96    or return ``budget - 1``.
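The return-value rules above can be modelled in plain C. This is a user-space sketch only, not kernel code: ``model_poll_return()`` and its ``completed`` flag (standing in for a napi_complete_done() call that returned true) are hypothetical names introduced for illustration.

```c
#include <stdbool.h>

/*
 * User-space model of the value a NAPI poll method returns.
 * Hypothetical helper for illustration, not a kernel API.
 *
 * work_done: number of Rx packets processed in this poll
 * budget:    limit passed in by the core (may be 0)
 * completed: true if the driver called napi_complete_done() and the
 *            call returned true
 */
static int model_poll_return(int work_done, int budget, bool completed)
{
	if (budget && completed) {
		/* After completing, never report the full budget. */
		return work_done < budget - 1 ? work_done : budget - 1;
	}

	/*
	 * Not completed: report the work actually done. A value equal
	 * to budget tells the core the instance still has work pending.
	 */
	return work_done;
}
```

With a budget of 64, a poll that processed all 64 packets and then completed would report 63, while a poll that processed 64 packets without completing would report 64.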
101 -------------
109 As mentioned in the :ref:`drv_ctrl` section - napi_disable() and subsequent
110 calls to the poll method only wait for the ownership of the instance
111 to be released, not for the poll method to exit. This means that
118 --------------------------
121 the NAPI instance - until NAPI polling finishes any further
124 Drivers which have to mask the interrupts explicitly (as opposed
125 to IRQ being auto-masked by the device) should use the napi_schedule_prep()
128 .. code-block:: c
130   if (napi_schedule_prep(&v->napi)) {
131       mydrv_mask_rxtx_irq(v->idx);
132       /* schedule after masking to avoid races */
133       __napi_schedule(&v->napi);
134   }
136 IRQ should only be unmasked after a successful call to napi_complete_done():
138 .. code-block:: c
140   if (budget && napi_complete_done(&v->napi, work_done)) {
141     mydrv_unmask_rxtx_irq(v->idx);
142     return min(work_done, budget - 1);
143   }
146 of guarantees given by being invoked in IRQ context (no need to
147 mask interrupts). napi_schedule_irqoff() will fall back to napi_schedule() if
150 Instance to queue mapping
151 -------------------------
155 mapped to queues and interrupts. NAPI is primarily a polling/processing
156 abstraction without specific user-facing semantics. That said, most networking
159 NAPI instances most often correspond 1:1:1 to interrupts and queue pairs
160 (queue pair is a set of a single Rx and single Tx queue).
162 In less common cases a NAPI instance may be used for multiple queues
163 or Rx and Tx queues can be serviced by separate NAPI instances on a single
168 each channel can be either ``rx``, ``tx`` or ``combined``. It's not clear
169 what constitutes a channel; the recommended interpretation is to understand
170 a channel as an IRQ/NAPI which services queues of a given type. For example,
171 a configuration of 1 ``rx``, 1 ``tx`` and 1 ``combined`` channel is expected
172 to utilize 3 interrupts, 2 Rx and 2 Tx queues.
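The arithmetic behind that example can be written out as a short C sketch. It assumes the recommended interpretation above (one IRQ/NAPI per channel, with a ``combined`` channel servicing one Rx and one Tx queue); the struct and function names are hypothetical and are not part of the ethtool API.

```c
/* Channel counts as a user would configure them (hypothetical type). */
struct channel_cfg {
	unsigned int rx;
	unsigned int tx;
	unsigned int combined;
};

/* Resources implied by a channel configuration (hypothetical type). */
struct channel_usage {
	unsigned int irqs;
	unsigned int rx_queues;
	unsigned int tx_queues;
};

static struct channel_usage count_channel_usage(struct channel_cfg cfg)
{
	struct channel_usage u;

	/* Every channel, whatever its type, is one IRQ/NAPI. */
	u.irqs = cfg.rx + cfg.tx + cfg.combined;
	/* A combined channel contributes one queue in each direction. */
	u.rx_queues = cfg.rx + cfg.combined;
	u.tx_queues = cfg.tx + cfg.combined;
	return u;
}
```

For 1 ``rx``, 1 ``tx`` and 1 ``combined`` channel this gives 3 interrupts, 2 Rx queues and 2 Tx queues, matching the text.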
178 are only visible to the user through the ``SO_INCOMING_NAPI_ID`` socket option.
179 It's not currently possible to query IDs used by a given device.
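A minimal user-space sketch of reading the ID with ``SO_INCOMING_NAPI_ID`` follows. The helper name ``get_incoming_napi_id()`` is hypothetical, and the fallback value 56 is an assumption that holds only for architectures using the asm-generic socket option numbers.

```c
#include <sys/socket.h>

#ifndef SO_INCOMING_NAPI_ID
/* Assumption: asm-generic value; some architectures use a different one. */
#define SO_INCOMING_NAPI_ID 56
#endif

/*
 * Fetch the NAPI ID recorded for the socket fd (hypothetical helper).
 * Returns 0 and stores the ID in *napi_id on success, or -1 with errno
 * set, for example when the kernel does not support the option.
 */
static int get_incoming_napi_id(int fd, unsigned int *napi_id)
{
	socklen_t len = sizeof(*napi_id);

	return getsockopt(fd, SOL_SOCKET, SO_INCOMING_NAPI_ID, napi_id, &len);
}
```

A socket that has not yet received any packet would typically report an ID of 0, so the value is most useful when read after data has arrived, for example on a freshly accepted connection.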
182 -----------------------
185 In most scenarios batching happens due to IRQ coalescing which is done
188 NAPI can be configured to arm a repoll timer instead of unmasking
191 is reused to control the delay of the timer, while
193 before NAPI gives up and goes back to using hardware IRQs.
195 The above parameters can also be set on a per-NAPI basis using netlink via
196 netdev-genl. When used with netlink and configured on a per-NAPI basis, the
197 parameters mentioned above use hyphens instead of underscores:
198 ``gro-flush-timeout`` and ``napi-defer-hard-irqs``.
200 Per-NAPI configuration can be done programmatically in a user application
206 .. code-block:: bash
208   $ kernel-source/tools/net/ynl/pyynl/cli.py \
209             --spec Documentation/netlink/specs/netdev.yaml \
210             --do napi-set \
211             --json='{"id": 345,
212                      "defer-hard-irqs": 111,
213                      "gro-flush-timeout": 11111}'
215 Similarly, the parameter ``irq-suspend-timeout`` can be set using netlink
216 via netdev-genl. There is no global sysfs parameter for this value.
218 ``irq-suspend-timeout`` is used to determine how long an application can
220 which can be set on a per-epoll context basis with ``EPIOCSPARAMS`` ioctl.
225 ------------
227 Busy polling allows a user process to check for incoming packets before
237 epoll-based busy polling
238 ------------------------
240 It is possible to trigger packet processing directly from calls to
241 ``epoll_wait``. In order to use this feature, a user application must ensure
242 all file descriptors which are added to an epoll context have the same NAPI ID.
246 distribute that file descriptor to a worker thread. The worker thread would add
247 the file descriptor to its epoll context. This would ensure each worker thread
251 be inserted to distribute incoming connections to threads such that each thread
252 is only given incoming connections with the same NAPI ID. Care must be taken to
255 In order to enable busy polling, there are two choices:
257 1. ``/proc/sys/net/core/busy_poll`` can be set with a time in microseconds to busy
258    loop waiting for events. This is a system-wide setting and will cause all
259    epoll-based applications to busy poll when they call ``epoll_wait``. This may
260    not be desirable as many applications may not need to busy poll.
263    file descriptor to set (``EPIOCSPARAMS``) or get (``EPIOCGPARAMS``) ``struct
266 .. code-block:: c
273       /* pad the struct to a multiple of 64 bits */
278 ---------------
280 While busy polling is supposed to be used by low latency applications,
283 Very high request-per-second applications (especially routing/forwarding
285 want to be interrupted until they finish processing a request or a batch
288 Such applications can pledge to the kernel that they will perform a busy
291 socket option. To avoid system misbehavior the pledge is revoked
292 if ``gro_flush_timeout`` passes without any busy poll call. For epoll-based
294 epoll_params`` can be set to 1 and the ``EPIOCSPARAMS`` ioctl can be issued to
300 with the ``SO_BUSY_POLL_BUDGET`` socket option. For epoll-based busy polling
301 applications, the ``busy_poll_budget`` field can be adjusted to the desired value
305 It is important to note that choosing a large value for ``gro_flush_timeout``
306 will defer IRQs to allow for better batch processing, but will induce latency
309 attempting to busy poll by device IRQs and softirq processing. This value
310 should be chosen carefully with these tradeoffs in mind. epoll-based busy
311 polling applications may be able to mitigate how much user processing happens
314 Users may want to consider an alternate approach, IRQ suspension, to help deal
318 --------------
323 While application calls to epoll_wait successfully retrieve events, the kernel will
329 This allows users to balance CPU consumption with network processing
332 To use this mechanism:
334   1. The per-NAPI config parameter ``irq-suspend-timeout`` should be set to the
337      serves as a safety mechanism to restart the driver's IRQ processing if
339      the amount of time the user application needs to process data from its
340      call to epoll_wait, noting that applications can control how much data
343   2. The sysfs parameter or per-NAPI config parameters ``gro_flush_timeout``
344      and ``napi_defer_hard_irqs`` can be set to low values. They will be used
345      to defer IRQs after busy poll has found no data.
347   3. The ``prefer_busy_poll`` flag must be set to true. This can be done using
350   4. The application uses epoll as described above to trigger NAPI packet
353 As mentioned above, as long as subsequent calls to epoll_wait return events to
354 userland, the ``irq-suspend-timeout`` is deferred and IRQs are disabled. This
355 allows the application to process data without interference.
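Step 3 of the list above (setting ``prefer_busy_poll`` through the ``EPIOCSPARAMS`` ioctl) can be sketched in user-space C as shown here. To stay self-contained, and to avoid clashing with C libraries that already declare ``struct epoll_params``, the sketch uses a local mirror of the structure and of the ioctl number; ``struct napi_epoll_params``, ``NAPI_EPIOCSPARAMS`` and ``enable_prefer_busy_poll()`` are hypothetical names, and the layout and ioctl encoding are assumed to match the kernel's ``linux/eventpoll.h``.

```c
#include <stdint.h>
#include <sys/ioctl.h>

/* Local mirror of the kernel's struct epoll_params (assumed layout). */
struct napi_epoll_params {
	uint32_t busy_poll_usecs;
	uint16_t busy_poll_budget;
	uint8_t prefer_busy_poll;
	uint8_t pad;	/* pads the struct to 64 bits; left as zero */
};

/* Assumed to equal EPIOCSPARAMS: _IOW(0x8A, 0x01, struct epoll_params). */
#define NAPI_EPIOCSPARAMS _IOW(0x8A, 0x01, struct napi_epoll_params)

/*
 * Configure busy polling on one epoll context (hypothetical helper).
 * Returns 0 on success, or -1 with errno set, for example on kernels
 * that do not implement the ioctl.
 */
static int enable_prefer_busy_poll(int epfd, uint32_t usecs, uint16_t budget)
{
	struct napi_epoll_params params = {
		.busy_poll_usecs = usecs,
		.busy_poll_budget = budget,
		.prefer_busy_poll = 1,
	};

	return ioctl(epfd, NAPI_EPIOCSPARAMS, &params);
}
```

An application would call this once on the descriptor returned by ``epoll_create1()``, before entering its ``epoll_wait`` loop. The microsecond and budget values are workload-dependent and are left to the caller.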
357 Once a call to epoll_wait results in no events being found, IRQ suspension is
361 It is expected that ``irq-suspend-timeout`` will be set to a value much larger
362 than ``gro_flush_timeout`` as ``irq-suspend-timeout`` should suspend IRQs for
365 While it is not strictly necessary to use ``napi_defer_hard_irqs`` and
366 ``gro_flush_timeout`` to use IRQ suspension, their use is strongly
369 IRQ suspension causes the system to alternate between polling mode and
370 irq-driven packet delivery. During busy periods, ``irq-suspend-timeout``
378 1) hardirq -> softirq -> napi poll; basic interrupt delivery
379 2) timer -> softirq -> napi poll; deferred irq processing
380 3) epoll -> busy-poll -> napi poll; busy looping
388 During busy periods, ``irq-suspend-timeout`` is used as the timer in Loop 2,
395 the recommended usage, because otherwise setting ``irq-suspend-timeout``
401 -------------
407 thread (called ``napi/${ifc-name}-${napi-id}``).
409 It is recommended to pin each kernel thread to a single CPU, the same
415 Threaded NAPI is controlled by writing 0/1 to the ``threaded`` file in
420 .. [#] NAPI was originally referred to as New API in Linux 2.4.