.. SPDX-License-Identifier: GPL-2.0

=================================
NVMe PCI Endpoint Function Target
=================================

:Author: Damien Le Moal <[email protected]>

The NVMe PCI endpoint function target driver implements an NVMe PCIe controller
using an NVMe fabrics target controller configured with the PCI transport type.

Overview
========

The NVMe PCI endpoint function target driver allows exposing an NVMe target
controller over a PCIe link, thus implementing an NVMe PCIe device similar to a
regular M.2 SSD. The target controller is created in the same manner as when
using NVMe over fabrics: the controller represents the interface to an NVMe
subsystem using a port. The port transfer type must be configured to be
"pci". The subsystem can be configured to have namespaces backed by regular
files or block devices, or can use NVMe passthrough to expose to the PCI host
an existing physical NVMe device or an NVMe fabrics host controller (e.g. an
NVMe TCP host controller).

The NVMe PCI endpoint function target driver relies as much as possible on the
NVMe target core code to parse and execute NVMe commands submitted by the PCIe
host. However, using the PCI endpoint framework API and DMA API, the driver is
also responsible for managing all data transfers over the PCIe link. This
implies that the NVMe PCI endpoint function target driver implements the
management of several NVMe data structures as well as some NVMe command
parsing.
1) The driver manages the retrieval of NVMe commands from submission queues
   using DMA if supported, or MMIO otherwise. Each command retrieved is then
   executed using a work item to maximize performance with the parallel
   execution of multiple commands on different CPUs. The driver uses a work
   item to constantly poll the doorbell of all submission queues to detect
   command submissions from the PCIe host.

2) The driver transfers completion queue entries of completed commands to the
   PCIe host using MMIO copies of the entries into the host completion queue.
   After posting completion entries in a completion queue, the driver uses the
   PCI endpoint framework API to raise an interrupt to the host to signal the
   command completions.

3) For any command that has a data buffer, the NVMe PCI endpoint target driver
   parses the command PRP or SGL lists to create a list of PCI address
   segments representing the mapping of the command data buffer on the host.
   The command data buffer is transferred over the PCIe link using this list of
   PCI address segments using DMA, if supported. If DMA is not supported, MMIO
   is used, which results in poor performance. For write commands, the command
   data buffer is transferred from the host into a local memory buffer before
   executing the command using the target core code. For read commands, a local
   memory buffer is allocated to execute the command and the content of that
   buffer is transferred to the host once the command completes.

Controller Capabilities
-----------------------

The NVMe capabilities exposed to the PCIe host through the BAR 0 registers
are almost identical to the capabilities of the NVMe target controller
implemented by the target core code. There are some exceptions.

1) The NVMe PCI endpoint target driver always sets the controller capability
   CQR bit to request "Contiguous Queues Required". This is to facilitate the
   mapping of a queue PCI address range to the local CPU address space.

2) The doorbell stride (DSTRD) is always set to 4B.

3) Since the PCI endpoint framework does not provide a way to handle PCI level
   resets, the controller capability NSSRS bit (NVM Subsystem Reset Supported)
   is always cleared.

4) The Boot Partition Support (BPS), Persistent Memory Region Supported (PMRS)
   and Controller Memory Buffer Supported (CMBS) capabilities are never
   reported.
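
On the PCIe host, the resulting controller capability register values can be
inspected with the *nvme* command line utility (an illustrative check; the
device name depends on the host)::

        # nvme show-regs /dev/nvme0 -H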

Supported Features
------------------

The NVMe PCI endpoint target driver implements support for both PRPs and SGLs.
The driver also implements IRQ vector coalescing and submission queue
arbitration burst.

The maximum number of queues and the maximum data transfer size (MDTS) are
configurable through configfs before starting the controller. To avoid issues
with excessive local memory usage for executing commands, MDTS defaults to 512
KB and is limited to a maximum of 2 MB (arbitrary limit).
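
For example, MDTS can be increased to 1 MB by writing to the *mdts_kb*
attribute of the function's *nvme* subdirectory (see the Endpoint Bindings
section below) before starting the controller, here using the function name
from the User Guide example::

        # cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
        # echo 1024 > nvmepf.0/nvme/mdts_kb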

Minimum Number of PCI Address Mapping Windows Required
------------------------------------------------------

Most PCI endpoint controllers provide a limited number of mapping windows for
mapping a PCI address range to local CPU memory addresses. The NVMe PCI
endpoint target controller uses mapping windows for the following.

1) One memory window for raising MSI or MSI-X interrupts
2) One memory window for MMIO transfers
3) One memory window for each completion queue

Given the highly asynchronous nature of the NVMe PCI endpoint target driver
operation, the memory windows as described above will generally not be used
simultaneously, but that may happen. So a safe maximum number of completion
queues that can be supported is equal to the total number of memory mapping
windows of the PCI endpoint controller minus two. E.g. for an endpoint PCI
controller with 32 outbound memory windows available, up to 30 completion
queues can be safely operated without any risk of getting PCI address mapping
errors due to the lack of memory windows.

Maximum Number of Queue Pairs
-----------------------------

Upon binding of the NVMe PCI endpoint target driver to the PCI endpoint
controller, BAR 0 is allocated with enough space to accommodate the admin queue
and multiple I/O queues. The maximum number of I/O queue pairs that can be
supported is limited by several factors.

1) The NVMe target core code limits the maximum number of I/O queues to the
   number of online CPUs.
2) The total number of queue pairs, including the admin queue, cannot exceed
   the number of MSI-X or MSI vectors available.
3) The total number of completion queues must not exceed the total number of
   PCI mapping windows minus 2 (see above).

The NVMe endpoint function driver allows configuring the maximum number of
queue pairs through configfs.
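
For example (an illustrative sketch reusing the subsystem and function names
from the User Guide below), the number of I/O queue pairs can be limited to 4
through the target subsystem *attr_qid_max* attribute, with enough MSI-X
vectors configured on the endpoint function to cover the admin queue and all
I/O queues::

        # echo 4 > /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn/attr_qid_max
        # cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
        # echo 32 > nvmepf.0/msix_interrupts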

Limitations and NVMe Specification Non-Compliance
-------------------------------------------------

Similar to the NVMe target core code, the NVMe PCI endpoint target driver does
not support multiple submission queues using the same completion queue. All
submission queues must specify a unique completion queue.


User Guide
==========

This section describes the hardware requirements and how to set up an NVMe
PCI endpoint target device.

Kernel Requirements
-------------------

The kernel must be compiled with the configuration options CONFIG_PCI_ENDPOINT,
CONFIG_PCI_ENDPOINT_CONFIGFS, and CONFIG_NVME_TARGET_PCI_EPF enabled.
CONFIG_PCI, CONFIG_BLK_DEV_NVME and CONFIG_NVME_TARGET must also be enabled
(obviously).
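
These options can be checked against the running kernel configuration, for
example as follows (an illustrative check; the location of the kernel
configuration file depends on the distribution)::

        # grep -E "CONFIG_(PCI_ENDPOINT|NVME_TARGET_PCI_EPF)=" /boot/config-$(uname -r)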

In addition to this, at least one PCI endpoint controller driver should be
available for the endpoint hardware used.

To facilitate testing, enabling the null-blk driver (CONFIG_BLK_DEV_NULL_BLK)
is also recommended. With this, a simple setup using a null_blk block device
as a subsystem namespace can be used.

Hardware Requirements
---------------------

To use the NVMe PCI endpoint target driver, at least one endpoint controller
device is required.

To find the list of endpoint controller devices in the system::

        # ls /sys/class/pci_epc/
        a40000000.pcie-ep

If PCI_ENDPOINT_CONFIGFS is enabled::

        # ls /sys/kernel/config/pci_ep/controllers
        a40000000.pcie-ep

The endpoint board must of course also be connected to a host with a PCI cable
with RX-TX signal swapped. If the host PCI slot used does not have
plug-and-play capabilities, the host should be powered off when the NVMe PCI
endpoint device is configured.

NVMe Endpoint Device
--------------------

Creating an NVMe endpoint device is a two-step process. First, an NVMe target
subsystem and port must be defined. Second, the NVMe PCI endpoint device must
be set up and bound to the subsystem and port created.

Creating an NVMe Subsystem and Port
-----------------------------------

Details about how to configure an NVMe target subsystem and port are outside
the scope of this document. The following only provides a simple example of a
port and subsystem with a single namespace backed by a null_blk device.

First, make sure that configfs is enabled::

        # mount -t configfs none /sys/kernel/config

Next, create a null_blk device (default settings give a 250 GB device without
memory backing). The block device created will be /dev/nullb0 by default::

        # modprobe null_blk
        # ls /dev/nullb0
        /dev/nullb0
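
Optionally, a memory-backed null_blk device can be used instead so that data
written by the host is retained across reads (an illustrative alternative; the
module parameters used here are those of the null_blk driver)::

        # modprobe null_blk nr_devices=1 memory_backed=1 gb=8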

The NVMe PCI endpoint function target driver must be loaded::

        # modprobe nvmet_pci_epf
        # lsmod | grep nvmet
        nvmet_pci_epf          32768  0
        nvmet                 118784  1 nvmet_pci_epf
        nvme_core             131072  2 nvmet_pci_epf,nvmet

Now, create a subsystem and a port that we will use to create a PCI target
controller when setting up the NVMe PCI endpoint target device. In this
example, the subsystem is created with a maximum of 4 I/O queue pairs::

        # cd /sys/kernel/config/nvmet/subsystems
        # mkdir nvmepf.0.nqn
        # echo -n "Linux-pci-epf" > nvmepf.0.nqn/attr_model
        # echo "0x1b96" > nvmepf.0.nqn/attr_vendor_id
        # echo "0x1b96" > nvmepf.0.nqn/attr_subsys_vendor_id
        # echo 1 > nvmepf.0.nqn/attr_allow_any_host
        # echo 4 > nvmepf.0.nqn/attr_qid_max

Next, create and enable the subsystem namespace using the null_blk block
device::

        # mkdir nvmepf.0.nqn/namespaces/1
        # echo -n "/dev/nullb0" > nvmepf.0.nqn/namespaces/1/device_path
        # echo 1 > "nvmepf.0.nqn/namespaces/1/enable"

Finally, create the target port and link it to the subsystem::

        # cd /sys/kernel/config/nvmet/ports
        # mkdir 1
        # echo -n "pci" > 1/addr_trtype
        # ln -s /sys/kernel/config/nvmet/subsystems/nvmepf.0.nqn \
                /sys/kernel/config/nvmet/ports/1/subsystems/nvmepf.0.nqn
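
The port to subsystem link can be verified as follows (an optional,
illustrative check)::

        # ls /sys/kernel/config/nvmet/ports/1/subsystems/
        nvmepf.0.nqn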

Creating an NVMe PCI Endpoint Device
------------------------------------

With the NVMe target subsystem and port ready for use, the NVMe PCI endpoint
device can now be created and enabled. The NVMe PCI endpoint target driver
should already be loaded (that is done automatically when the port is created)::

        # ls /sys/kernel/config/pci_ep/functions
        nvmet_pci_epf

Next, create function 0::

        # cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
        # mkdir nvmepf.0
        # ls nvmepf.0/
        baseclass_code    msix_interrupts   secondary
        cache_line_size   nvme              subclass_code
        deviceid          primary           subsys_id
        interrupt_pin     progif_code       subsys_vendor_id
        msi_interrupts    revid             vendorid

Configure the function using any device ID (the vendor ID for the device will
be automatically set to the same value as the NVMe target subsystem vendor
ID)::

        # cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
        # echo 0xBEEF > nvmepf.0/deviceid
        # echo 32 > nvmepf.0/msix_interrupts

If the PCI endpoint controller used does not support MSI-X, MSI can be
configured instead::

        # echo 32 > nvmepf.0/msi_interrupts

Next, let's bind our endpoint device with the target subsystem and port that we
created::

        # echo 1 > nvmepf.0/nvme/portid
        # echo "nvmepf.0.nqn" > nvmepf.0/nvme/subsysnqn

The endpoint function can then be bound to the endpoint controller and the
controller started::

        # cd /sys/kernel/config/pci_ep
        # ln -s functions/nvmet_pci_epf/nvmepf.0 controllers/a40000000.pcie-ep/
        # echo 1 > controllers/a40000000.pcie-ep/start

On the endpoint machine, kernel messages will show information as the NVMe
target device and endpoint device are created and connected.

.. code-block:: text

        null_blk: disk nullb0 created
        null_blk: module loaded
        nvmet: adding nsid 1 to subsystem nvmepf.0.nqn
        nvmet_pci_epf nvmet_pci_epf.0: PCI endpoint controller supports MSI-X, 32 vectors
        nvmet: Created nvm controller 1 for subsystem nvmepf.0.nqn for NQN nqn.2014-08.org.nvmexpress:uuid:2ab90791-2246-4fbb-961d-4c3d5a5a0176.
        nvmet_pci_epf nvmet_pci_epf.0: New PCI ctrl "nvmepf.0.nqn", 4 I/O queues, mdts 524288 B

PCI Root-Complex Host
---------------------

Booting the PCI host will result in the initialization of the PCIe link (this
may be signaled by the PCI endpoint driver with a kernel message). A kernel
message on the endpoint will also signal when the host NVMe driver enables the
device controller::

        nvmet_pci_epf nvmet_pci_epf.0: Enabling controller

On the host side, the NVMe PCI endpoint function target device is discoverable
as a PCI device, with the vendor ID and device ID as configured::

        # lspci -n
        0000:01:00.0 0108: 1b96:beef

And this device will be recognized as an NVMe device with a single namespace::

        # lsblk
        NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
        nvme0n1     259:0    0   250G  0 disk

The NVMe endpoint block device can then be used as any other regular NVMe
namespace block device. The *nvme* command line utility can be used to get more
detailed information about the endpoint device::

        # nvme id-ctrl /dev/nvme0
        NVME Identify Controller:
        vid       : 0x1b96
        ssvid     : 0x1b96
        sn        : 94993c85650ef7bcd625
        mn        : Linux-pci-epf
        fr        : 6.13.0-r
        rab       : 6
        ieee      : 000000
        cmic      : 0xb
        mdts      : 7
        cntlid    : 0x1
        ver       : 0x20100
        ...
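
As a quick sanity check (an illustrative example; the block device name
depends on the host), data can be read directly from the namespace with *dd*::

        # dd if=/dev/nvme0n1 of=/dev/null bs=1M count=1024 iflag=direct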


Endpoint Bindings
=================

The NVMe PCI endpoint target driver uses the PCI endpoint configfs device
attributes as follows.

================   ===========================================================
vendorid           Ignored (the vendor id of the NVMe target subsystem is used)
deviceid           Anything is OK (e.g. PCI_ANY_ID)
revid              Do not care
progif_code        Must be 0x02 (NVM Express)
baseclass_code     Must be 0x01 (PCI_BASE_CLASS_STORAGE)
subclass_code      Must be 0x08 (Non-Volatile Memory controller)
cache_line_size    Do not care
subsys_vendor_id   Ignored (the subsystem vendor id of the NVMe target
                   subsystem is used)
subsys_id          Anything is OK (e.g. PCI_ANY_ID)
msi_interrupts     At least equal to the number of queue pairs desired
msix_interrupts    At least equal to the number of queue pairs desired
interrupt_pin      Interrupt PIN to use if MSI and MSI-X are not supported
================   ===========================================================
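
If the class code attributes of the function do not already hold these values,
they can be set through configfs before binding the function to the endpoint
controller, for example (an illustrative sketch using the function created in
the User Guide)::

        # cd /sys/kernel/config/pci_ep/functions/nvmet_pci_epf
        # echo 0x01 > nvmepf.0/baseclass_code
        # echo 0x08 > nvmepf.0/subclass_code
        # echo 0x02 > nvmepf.0/progif_code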

The NVMe PCI endpoint target function also has some specific configurable
fields defined in the *nvme* subdirectory of the function directory. These
fields are as follows.

================   ===========================================================
mdts_kb            Maximum data transfer size in KiB (default: 512)
portid             The ID of the target port to use
subsysnqn          The NQN of the target subsystem to use
================   ===========================================================
