====================================
Coherent Accelerator Interface (CXL)
====================================

Introduction
============

    The coherent accelerator interface is designed to allow the
    coherent connection of accelerators (FPGAs and other devices) to a
    POWER system. These devices need to adhere to the Coherent
    Accelerator Interface Architecture (CAIA).

    IBM refers to this as the Coherent Accelerator Processor Interface
    or CAPI. In the kernel it's referred to by the name CXL to avoid
    confusion with the ISDN CAPI subsystem.

    Coherent in this context means that the accelerator and CPUs can
    both access system memory directly and with the same effective
    addresses.

    **This driver is deprecated and will be removed in a future release.**

Hardware overview
=================

    ::

         POWER8/9             FPGA
       +----------+        +---------+
       |          |        |         |
       |   CPU    |        |   AFU   |
       |          |        |         |
       |          |        |         |
       |          |        |         |
       +----------+        +---------+
       |   PHB    |        |         |
       |   +------+        |   PSL   |
       |   | CAPP |<------>|         |
       +---+------+  PCIE  +---------+

    The POWER8/9 chip has a Coherently Attached Processor Proxy (CAPP)
    unit which is part of the PCIe Host Bridge (PHB). Linux manages
    the CAPP through calls into OPAL and doesn't program it directly.

    The FPGA (or coherently attached device) consists of two parts:
    the POWER Service Layer (PSL) and the Accelerator Function Unit
    (AFU). The AFU is used to implement specific functionality behind
    the PSL. The PSL, among other things, provides memory address
    translation services to allow each AFU direct access to userspace
    memory.

    The AFU is the core part of the accelerator (e.g. the compression
    or crypto function). The kernel has no knowledge of the function
    of the AFU. Only userspace interacts directly with the AFU.

    The PSL provides the translation and interrupt services that the
    AFU needs. This is what the kernel interacts with. For example, if
    the AFU needs to read a particular effective address, it sends
    that address to the PSL. The PSL then translates it, fetches the
    data from memory and returns it to the AFU. If the PSL has a
    translation miss, it interrupts the kernel and the kernel services
    the fault. The context in which this fault is serviced depends on
    who owns that accelerator function.

    - POWER8 and PSL Version 8 are compliant with CAIA Version 1.0.
    - POWER9 and PSL Version 9 are compliant with CAIA Version 2.0.

    PSL Version 9 provides new features such as:

    * Interaction with the nest MMU on the POWER9 chip.
    * Native DMA support.
    * Sending ASB_Notify messages for host thread wakeup.
    * Atomic operations.

    Cards with a PSL9 won't work on a POWER8 system and cards with a
    PSL8 won't work on a POWER9 system.

AFU Modes
=========

    There are two programming modes supported by the AFU: dedicated
    and AFU directed. An AFU may support one or both modes.

    When using dedicated mode only one MMU context is supported. In
    this mode, only one userspace process can use the accelerator at
    a time.

    When using AFU directed mode, up to 16K simultaneous contexts can
    be supported. This means up to 16K simultaneous userspace
    applications may use the accelerator (although specific AFUs may
    support fewer). In this mode, the AFU sends a 16-bit context ID
    with each of its requests. This tells the PSL which context is
    associated with each operation. If the PSL can't translate an
    operation, the ID can also be accessed by the kernel so it can
    determine the userspace context associated with an operation.


MMIO space
==========

    A portion of the accelerator MMIO space can be directly mapped
    from the AFU to userspace. Either the whole space can be mapped or
    just a per context portion. The hardware is self describing, hence
    the kernel can determine the offset and size of the per context
    portion.


Interrupts
==========

    AFUs may generate interrupts that are destined for userspace. These
    are received by the kernel as hardware interrupts and passed on to
    userspace via the read syscall documented below.

    Data storage faults and error interrupts are handled by the kernel
    driver.


Work Element Descriptor (WED)
=============================

    The WED is a 64-bit parameter passed to the AFU when a context is
    started. Its format is up to the AFU, hence the kernel has no
    knowledge of what it represents. Typically it will be the
    effective address of a work queue or status block where the AFU
    and userspace can share control and status information.
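
    As a rough illustration of the typical case, the sketch below turns
    the address of a shared control block into a WED value. Both
    ``struct work_queue`` and ``wed_from_queue()`` are hypothetical names
    invented for this example; the real layout is whatever the AFU
    defines.

```c
#include <stdint.h>

/* Hypothetical control block shared by userspace and the AFU.
 * The real layout is defined by the AFU, not by the kernel. */
struct work_queue {
	uint64_t head;    /* next entry the AFU will consume */
	uint64_t tail;    /* next entry userspace will fill */
	uint64_t status;  /* completion/status word written by the AFU */
};

/* The WED is simply the 64-bit effective address of the block. */
static uint64_t wed_from_queue(const struct work_queue *q)
{
	return (uint64_t)(uintptr_t)q;
}
```

    Because the AFU uses the same effective addresses as the process, no
    address conversion is needed beyond the pointer-to-integer cast.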


User API
========

1. AFU character devices
^^^^^^^^^^^^^^^^^^^^^^^^

    For AFUs operating in AFU directed mode, two character device
    files will be created. /dev/cxl/afu0.0m will correspond to a
    master context and /dev/cxl/afu0.0s will correspond to a slave
    context. Master contexts have access to the full MMIO space an
    AFU provides. Slave contexts have access to only the per process
    MMIO space an AFU provides.

    For AFUs operating in dedicated process mode, the driver will
    only create a single character device per AFU called
    /dev/cxl/afu0.0d. This will have access to the entire MMIO space
    that the AFU provides (like master contexts in AFU directed).
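
    The naming scheme above can be sketched as a small helper that
    builds a device path. ``cxl_dev_path()`` is a hypothetical function,
    and reading the two numbers in the name as a card index and an AFU
    index is an assumption made for this example.

```c
#include <stdio.h>

/* Build "/dev/cxl/afu<card>.<afu><kind>", where kind is 'm' (master),
 * 's' (slave) or 'd' (dedicated). Treating the two numbers as card and
 * AFU indices is an assumption of this sketch.
 * Returns 0 on success, -1 if kind is unknown or buf is too small. */
static int cxl_dev_path(char *buf, size_t len, unsigned card,
			unsigned afu, char kind)
{
	int n;

	if (kind != 'm' && kind != 's' && kind != 'd')
		return -1;
	n = snprintf(buf, len, "/dev/cxl/afu%u.%u%c", card, afu, kind);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```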

    The types described below are defined in include/uapi/misc/cxl.h.

    The following file operations are supported on both slave and
    master devices.

    A userspace library libcxl is available here:

        https://github.com/ibm-capi/libcxl

    This provides a C interface to this kernel API.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API.

    A dedicated mode AFU only has one context and only allows the
    device to be opened once.

    An AFU directed mode AFU can have many contexts; the device can be
    opened once for each context that is available.

    When all available contexts are allocated the open call will fail
    and return -ENOSPC.

    Note:
          IRQs need to be allocated for each context, which may limit
          the number of contexts that can be created, and therefore
          how many times the device can be opened. The POWER8 CAPP
          supports 2040 IRQs and 3 are used by the kernel, so 2037 are
          left. If 1 IRQ is needed per context, then only 2037
          contexts can be allocated. If 4 IRQs are needed per context,
          then only 2037/4 = 509 contexts can be allocated.
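
    The arithmetic in the note can be written out as a short helper.
    This is only a sketch using the POWER8 figures quoted above; the
    macro and function names are made up for the example, and an AFU
    may impose a lower limit of its own.

```c
/* Figures quoted in the note above for the POWER8 CAPP. */
#define CAPP_TOTAL_IRQS   2040u
#define KERNEL_IRQS       3u

/* Upper bound on contexts when each one needs irqs_per_ctx IRQs.
 * Returns 0 for the invalid request of 0 IRQs per context. */
static unsigned max_contexts(unsigned irqs_per_ctx)
{
	if (irqs_per_ctx == 0)
		return 0;
	return (CAPP_TOTAL_IRQS - KERNEL_IRQS) / irqs_per_ctx;
}
```

    Integer division rounds down, which matches the 2037/4 = 509 result
    given in the note.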


ioctl
-----

    CXL_IOCTL_START_WORK:
        Starts the AFU context and associates it with the current
        process. Once this ioctl is successfully executed, all memory
        mapped into this process is accessible to this AFU context
        using the same effective addresses. No additional calls are
        required to map/unmap memory. The AFU memory context will be
        updated as userspace allocates and frees memory. This ioctl
        returns once the AFU context is started.

        Takes a pointer to a struct cxl_ioctl_start_work:

            ::

                struct cxl_ioctl_start_work {
                        __u64 flags;
                        __u64 work_element_descriptor;
                        __u64 amr;
                        __s16 num_interrupts;
                        __s16 reserved1;
                        __s32 reserved2;
                        __u64 reserved3;
                        __u64 reserved4;
                        __u64 reserved5;
                        __u64 reserved6;
                };

            flags:
                Indicates which optional fields in the structure are
                valid.

            work_element_descriptor:
                The Work Element Descriptor (WED) is a 64-bit argument
                defined by the AFU. Typically this is an effective
                address pointing to an AFU specific structure
                describing what work to perform.

            amr:
                Authority Mask Register (AMR), same as the powerpc
                AMR. This field is only used by the kernel when the
                corresponding CXL_START_WORK_AMR value is specified in
                flags. If not specified the kernel will use a default
                value of 0.

            num_interrupts:
                Number of userspace interrupts to request. This field
                is only used by the kernel when the corresponding
                CXL_START_WORK_NUM_IRQS value is specified in flags.
                If not specified the minimum number required by the
                AFU will be allocated. The min and max number can be
                obtained from sysfs.

            reserved fields:
                For ABI padding and future extensions.

    CXL_IOCTL_GET_PROCESS_ELEMENT:
        Get the current context id, also known as the process element.
        The value is returned from the kernel as a __u32.
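
    To make the role of the flags field concrete, here is a minimal
    sketch of filling a START_WORK request. It uses a local mirror of
    the structure shown above so that it compiles without the kernel
    header. ``struct start_work_sketch``, ``SKETCH_START_WORK_NUM_IRQS``
    and ``fill_start_work()`` are names invented for this example, and
    the flag value is a placeholder; real code should take
    CXL_START_WORK_NUM_IRQS and the structure from
    include/uapi/misc/cxl.h.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the struct cxl_ioctl_start_work layout shown above.
 * Real code should use the kernel's definition instead. */
struct start_work_sketch {
	uint64_t flags;
	uint64_t work_element_descriptor;
	uint64_t amr;
	int16_t  num_interrupts;
	int16_t  reserved1;
	int32_t  reserved2;
	uint64_t reserved3;
	uint64_t reserved4;
	uint64_t reserved5;
	uint64_t reserved6;
};

/* Placeholder standing in for CXL_START_WORK_NUM_IRQS; the real value
 * comes from the kernel header. */
#define SKETCH_START_WORK_NUM_IRQS  0x2ULL

/* Fill a request: the WED is always set; num_interrupts is marked
 * valid only when the caller asks for a specific count (>= 0). */
static void fill_start_work(struct start_work_sketch *w, uint64_t wed,
			    int num_irqs)
{
	memset(w, 0, sizeof(*w));	/* reserved fields stay zero */
	w->work_element_descriptor = wed;
	if (num_irqs >= 0) {
		w->flags |= SKETCH_START_WORK_NUM_IRQS;
		w->num_interrupts = (int16_t)num_irqs;
	}
}
```

    A pointer to the filled structure would then be passed to the
    CXL_IOCTL_START_WORK ioctl on an open AFU file descriptor. Leaving
    the flag clear lets the kernel pick the minimum number of
    interrupts, as described for num_interrupts.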


mmap
----

    An AFU may have an MMIO space to facilitate communication with the
    AFU. If it does, the MMIO space can be accessed via mmap. The size
    and contents of this area are specific to the particular AFU. The
    size can be discovered via sysfs.

    In AFU directed mode, master contexts are allowed to map all of
    the MMIO space and slave contexts are allowed to only map the per
    process MMIO space associated with the context. In dedicated
    process mode the entire MMIO space can always be mapped.

    This mmap call must be done after the START_WORK ioctl.

    Care should be taken when accessing MMIO space. Only 32-bit and
    64-bit accesses are supported by POWER8. Also, the AFU will be
    designed with a specific endianness, so all MMIO accesses should
    take endianness into account (the endian(3) helpers such as
    le64toh() and be64toh() are recommended). These endian issues
    apply equally to shared memory queues the WED may describe.


read
----

    Reads events from the AFU. Blocks if no events are pending
    (unless O_NONBLOCK is supplied). Returns -EIO in the case of an
    unrecoverable error or if the card is removed.

    read() will always return an integral number of events.

    The buffer passed to read() must be at least 4K bytes.

    The result of the read will be a buffer of one or more events.
    Each event is of type struct cxl_event and varies in size::

            struct cxl_event {
                    struct cxl_event_header header;
                    union {
                            struct cxl_event_afu_interrupt irq;
                            struct cxl_event_data_storage fault;
                            struct cxl_event_afu_error afu_error;
                    };
            };

    The struct cxl_event_header is defined as

        ::

            struct cxl_event_header {
                    __u16 type;
                    __u16 size;
                    __u16 process_element;
                    __u16 reserved1;
            };

        type:
            This defines the type of event. The type determines how
            the rest of the event is structured. These types are
            described below and defined by enum cxl_event_type.

        size:
            This is the size of the event in bytes including the
            struct cxl_event_header. The start of the next event can
            be found at this offset from the start of the current
            event.

        process_element:
            Context ID of the event.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_AFU_INTERRUPT then the event
    structure is defined as

        ::

            struct cxl_event_afu_interrupt {
                    __u16 flags;
                    __u16 irq; /* Raised AFU interrupt number */
                    __u32 reserved1;
            };

        flags:
            These flags indicate which optional fields are present
            in this struct. Currently all fields are mandatory.

        irq:
            The IRQ number sent by the AFU.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_DATA_STORAGE then the event
    structure is defined as

        ::

            struct cxl_event_data_storage {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 addr;
                    __u64 dsisr;
                    __u64 reserved3;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        addr:
            The address that the AFU unsuccessfully attempted to
            access. Valid accesses will be handled transparently by the
            kernel but invalid accesses will generate this event.

        dsisr:
            This field gives information on the type of fault. It is a
            copy of the DSISR from the PSL hardware when the address
            fault occurred. The form of the DSISR is as defined in the
            CAIA.

        reserved fields:
            For future extensions.

    If the event type is CXL_EVENT_AFU_ERROR then the event structure
    is defined as

        ::

            struct cxl_event_afu_error {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 error;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        error:
            Error status from the AFU. Defined by the AFU.

        reserved fields:
            For future extensions and padding.
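
    The event layout described above can be walked by following the
    size field of each header. The sketch below only counts events and
    validates sizes; ``struct event_header_sketch`` is a local mirror of
    struct cxl_event_header and ``count_events()`` is a hypothetical
    helper, so real code should use the kernel header and dispatch on
    the type field instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_event_header as shown above. */
struct event_header_sketch {
	uint16_t type;
	uint16_t size;
	uint16_t process_element;
	uint16_t reserved1;
};

/* Walk a buffer returned by read() and count the events in it.
 * Returns -1 if a header is truncated or reports an impossible size. */
static int count_events(const unsigned char *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (off < len) {
		struct event_header_sketch hdr;

		if (len - off < sizeof(hdr))
			return -1;
		/* Copy the header out to avoid unaligned accesses. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		n++;
		off += hdr.size;	/* next event starts size bytes on */
	}
	return n;
}
```

    The size checks matter because a size of zero would otherwise loop
    forever and an oversized value would read past the buffer.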


2. Card character device (PowerVM guest only)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    In a PowerVM guest, an extra character device is created for the
    card. The device is only used to write (flash) a new image on the
    FPGA accelerator. Once the image is written and verified, the
    device tree is updated and the card is reset to reload the updated
    image.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API. The device can only be opened once.

ioctl
-----

CXL_IOCTL_DOWNLOAD_IMAGE / CXL_IOCTL_VALIDATE_IMAGE:
    Starts and controls flashing a new FPGA image. Partial
    reconfiguration is not supported (yet), so the image must contain
    a copy of the PSL and AFU(s). Since an image can be quite large,
    the caller may have to iterate, splitting the image into smaller
    chunks.

    Takes a pointer to a struct cxl_adapter_image::

        struct cxl_adapter_image {
            __u64 flags;
            __u64 data;
            __u64 len_data;
            __u64 len_image;
            __u64 reserved1;
            __u64 reserved2;
            __u64 reserved3;
            __u64 reserved4;
        };

    flags:
        These flags indicate which optional fields are present in
        this struct. Currently all fields are mandatory.

    data:
        Pointer to a buffer with part of the image to write to the
        card.

    len_data:
        Size of the buffer pointed to by data.

    len_image:
        Full size of the image.
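
    The chunked iteration mentioned above might look roughly like the
    following sketch, which only prepares the request for each chunk.
    ``struct adapter_image_sketch`` mirrors the structure shown above,
    while ``describe_chunk()`` and the chunk size chosen by the caller
    are hypothetical. Which of the two ioctls is issued for each chunk
    is not covered here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_adapter_image as shown above. */
struct adapter_image_sketch {
	uint64_t flags;
	uint64_t data;
	uint64_t len_data;
	uint64_t len_image;
	uint64_t reserved1;
	uint64_t reserved2;
	uint64_t reserved3;
	uint64_t reserved4;
};

/* Describe the chunk starting at byte offset off of an image of
 * image_len bytes, using at most chunk_max bytes per request.
 * Returns the number of bytes covered, or 0 when nothing is left. */
static size_t describe_chunk(struct adapter_image_sketch *req,
			     const unsigned char *image, size_t image_len,
			     size_t off, size_t chunk_max)
{
	size_t n;

	if (off >= image_len || chunk_max == 0)
		return 0;
	n = image_len - off;
	if (n > chunk_max)
		n = chunk_max;

	memset(req, 0, sizeof(*req));	/* flags and reserved stay zero */
	req->data = (uint64_t)(uintptr_t)(image + off);
	req->len_data = n;
	req->len_image = image_len;	/* always the full image size */
	return n;
}
```

    A caller would advance the offset by the returned length and stop
    when the helper returns 0.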


Sysfs Class
===========

    A cxl sysfs class is added under /sys/class/cxl to facilitate
    enumeration and tuning of the accelerators. Its layout is
    described in Documentation/ABI/obsolete/sysfs-class-cxl

    
Udev rules
==========

    The following udev rules could be used to create a symlink to the
    most logical chardev to use in any programming mode (afuX.Yd for
    dedicated, afuX.Ys for AFU directed), since the API is virtually
    identical for each::

        SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b"
        SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \
                          KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b"