====================================
Coherent Accelerator Interface (CXL)
====================================

Introduction
============

    The coherent accelerator interface is designed to allow the
    coherent connection of accelerators (FPGAs and other devices) to a
    POWER system. These devices need to adhere to the Coherent
    Accelerator Interface Architecture (CAIA).

    IBM refers to this as the Coherent Accelerator Processor Interface
    or CAPI. In the kernel it's referred to by the name CXL to avoid
    confusion with the ISDN CAPI subsystem.

    Coherent in this context means that the accelerator and CPUs can
    both access system memory directly and with the same effective
    addresses.

    **This driver is deprecated and will be removed in a future release.**

Hardware overview
=================

    ::

         POWER8/9             FPGA
       +----------+        +---------+
       |          |        |         |
       |   CPU    |        |   AFU   |
       |          |        |         |
       |          |        |         |
       |          |        |         |
       +----------+        +---------+
       |   PHB    |        |         |
       |   +------+        |   PSL   |
       |   | CAPP |<------>|         |
       +---+------+  PCIE  +---------+

    The POWER8/9 chip has a Coherently Attached Processor Proxy (CAPP)
    unit which is part of the PCIe Host Bridge (PHB). Linux manages it
    through calls into OPAL and does not program the CAPP directly.

    The FPGA (or coherently attached device) consists of two parts:
    the POWER Service Layer (PSL) and the Accelerator Function Unit
    (AFU). The AFU is used to implement specific functionality behind
    the PSL. The PSL, among other things, provides memory address
    translation services to allow each AFU direct access to userspace
    memory.

    The AFU is the core part of the accelerator (e.g. the compression
    or crypto function). The kernel has no knowledge of the function
    of the AFU. Only userspace interacts directly with the AFU.

    The PSL provides the translation and interrupt services that the
    AFU needs. This is what the kernel interacts with.
    For example, if the AFU needs to read a particular effective
    address, it sends that address to the PSL, which translates it,
    fetches the data from memory and returns it to the AFU. If the PSL
    has a translation miss, it interrupts the kernel and the kernel
    services the fault. The context in which this fault is serviced
    depends on who owns that acceleration function.

    - POWER8 and PSL Version 8 are compliant with CAIA Version 1.0.
    - POWER9 and PSL Version 9 are compliant with CAIA Version 2.0.

    PSL Version 9 adds new features, including:

    * Interaction with the nest MMU on the P9 chip.
    * Native DMA support.
    * ASB_Notify messages for host thread wakeup.
    * Atomic operations.

    Cards with a PSL9 won't work on a POWER8 system and cards with a
    PSL8 won't work on a POWER9 system.

AFU Modes
=========

    The AFU supports two programming modes: dedicated and AFU
    directed. An AFU may support one or both modes.

    When using dedicated mode only one MMU context is supported. In
    this mode, only one userspace process can use the accelerator at
    a time.

    When using AFU directed mode, up to 16K simultaneous contexts can
    be supported. This means up to 16K simultaneous userspace
    applications may use the accelerator (although specific AFUs may
    support fewer). In this mode, the AFU sends a 16 bit context ID
    with each of its requests. This tells the PSL which context is
    associated with each operation. If the PSL can't translate an
    operation, the ID can also be accessed by the kernel so it can
    determine the userspace context associated with an operation.


MMIO space
==========

    A portion of the accelerator MMIO space can be directly mapped
    from the AFU to userspace. Either the whole space can be mapped or
    just a per context portion.
    The hardware is self describing, so the kernel can determine the
    offset and size of the per context portion.


Interrupts
==========

    AFUs may generate interrupts that are destined for userspace. These
    are received by the kernel as hardware interrupts and passed on to
    userspace through the read syscall documented below.

    Data storage faults and error interrupts are handled by the kernel
    driver.


Work Element Descriptor (WED)
=============================

    The WED is a 64-bit parameter passed to the AFU when a context is
    started. Its format is up to the AFU, so the kernel has no
    knowledge of what it represents. Typically it will be the
    effective address of a work queue or status block where the AFU
    and userspace can share control and status information.


User API
========

1. AFU character devices
^^^^^^^^^^^^^^^^^^^^^^^^

    For AFUs operating in AFU directed mode, two character device
    files will be created. /dev/cxl/afu0.0m will correspond to a
    master context and /dev/cxl/afu0.0s will correspond to a slave
    context. Master contexts have access to the full MMIO space an
    AFU provides. Slave contexts have access to only the per process
    MMIO space an AFU provides.

    For AFUs operating in dedicated process mode, the driver will
    only create a single character device per AFU called
    /dev/cxl/afu0.0d. This will have access to the entire MMIO space
    that the AFU provides (like master contexts in AFU directed).

    The types described below are defined in include/uapi/misc/cxl.h

    The following file operations are supported on both slave and
    master devices.

    A userspace library libcxl is available here:

	https://github.com/ibm-capi/libcxl

    This provides a C interface to this kernel API.
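
    The device naming scheme above can be sketched as a small helper.
    This is only an illustration: afu_device_path() and enum
    afu_node_kind are hypothetical names that exist neither in the
    kernel API nor in libcxl, and the sketch assumes that the two
    numbers in afuX.Y are the card index and the AFU index.

```c
#include <stdio.h>

/* Hypothetical helper types; not part of the kernel API or libcxl. */
enum afu_node_kind {
	AFU_NODE_MASTER    = 'm',	/* AFU directed, full MMIO space */
	AFU_NODE_SLAVE     = 's',	/* AFU directed, per process MMIO space */
	AFU_NODE_DEDICATED = 'd',	/* dedicated process mode */
};

/*
 * Format "/dev/cxl/afu<card>.<afu><kind>" into buf.
 * Assumption: the first number is the card index, the second the AFU index.
 * Returns 0 on success, -1 if kind is unknown or the name does not fit.
 */
static int afu_device_path(char *buf, size_t len, unsigned int card,
			   unsigned int afu, enum afu_node_kind kind)
{
	int n;

	if (kind != AFU_NODE_MASTER && kind != AFU_NODE_SLAVE &&
	    kind != AFU_NODE_DEDICATED)
		return -1;

	n = snprintf(buf, len, "/dev/cxl/afu%u.%u%c", card, afu, (char)kind);
	if (n < 0 || (size_t)n >= len)
		return -1;	/* output error or truncated name */

	return 0;
}
```

    A caller would pass the resulting path to open(); whether a given
    node exists depends on the programming mode of the AFU.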

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API.

    A dedicated mode AFU only has one context and only allows the
    device to be opened once.

    An AFU directed mode AFU can have many contexts; the device can be
    opened once for each context that is available.

    When all available contexts are allocated the open call will fail
    and return -ENOSPC.

    Note:
	  IRQs need to be allocated for each context, which may limit
          the number of contexts that can be created, and therefore
          how many times the device can be opened. The POWER8 CAPP
          supports 2040 IRQs and 3 are used by the kernel, so 2037 are
          left. If 1 IRQ is needed per context, then only 2037
          contexts can be allocated. If 4 IRQs are needed per context,
          then only 2037/4 = 509 contexts can be allocated.


ioctl
-----

    CXL_IOCTL_START_WORK:
        Starts the AFU context and associates it with the current
        process. Once this ioctl is successfully executed, all memory
        mapped into this process is accessible to this AFU context
        using the same effective addresses. No additional calls are
        required to map/unmap memory. The AFU memory context will be
        updated as userspace allocates and frees memory. This ioctl
        returns once the AFU context is started.

        Takes a pointer to a struct cxl_ioctl_start_work

            ::

                struct cxl_ioctl_start_work {
                        __u64 flags;
                        __u64 work_element_descriptor;
                        __u64 amr;
                        __s16 num_interrupts;
                        __s16 reserved1;
                        __s32 reserved2;
                        __u64 reserved3;
                        __u64 reserved4;
                        __u64 reserved5;
                        __u64 reserved6;
                };

            flags:
                Indicates which optional fields in the structure are
                valid.

            work_element_descriptor:
                The Work Element Descriptor (WED) is a 64-bit argument
                defined by the AFU.
                Typically this is an effective address pointing to an
                AFU specific structure describing what work to
                perform.

            amr:
                Authority Mask Register (AMR), same as the powerpc
                AMR. This field is only used by the kernel when the
                corresponding CXL_START_WORK_AMR value is specified in
                flags. If not specified the kernel will use a default
                value of 0.

            num_interrupts:
                Number of userspace interrupts to request. This field
                is only used by the kernel when the corresponding
                CXL_START_WORK_NUM_IRQS value is specified in flags.
                If not specified the minimum number required by the
                AFU will be allocated. The min and max number can be
                obtained from sysfs.

            reserved fields:
                For ABI padding and future extensions

    CXL_IOCTL_GET_PROCESS_ELEMENT:
        Get the current context ID, also known as the process element.
        The value is returned from the kernel as a __u32.


mmap
----

    An AFU may have an MMIO space to facilitate communication with the
    AFU. If it does, the MMIO space can be accessed via mmap. The size
    and contents of this area are specific to the particular AFU. The
    size can be discovered via sysfs.

    In AFU directed mode, master contexts are allowed to map all of
    the MMIO space and slave contexts are allowed to only map the per
    process MMIO space associated with the context. In dedicated
    process mode the entire MMIO space can always be mapped.

    This mmap call must be done after the START_WORK ioctl.

    Care should be taken when accessing MMIO space. Only 32-bit and
    64-bit accesses are supported by POWER8. Also, the AFU will be
    designed with a specific endianness, so all MMIO accesses should
    consider endianness (the endian(3) variants such as le64toh() and
    be64toh() are recommended). These endian issues equally apply to
    shared memory queues the WED may describe.


read
----

    Reads events from the AFU.
    Blocks if no events are pending (unless O_NONBLOCK is supplied).
    Returns -EIO in the case of an unrecoverable error or if the card
    is removed.

    read() will always return an integral number of events.

    The buffer passed to read() must be at least 4K bytes.

    The result of the read will be a buffer of one or more events,
    each event is of type struct cxl_event, of varying size::

            struct cxl_event {
                    struct cxl_event_header header;
                    union {
                            struct cxl_event_afu_interrupt irq;
                            struct cxl_event_data_storage fault;
                            struct cxl_event_afu_error afu_error;
                    };
            };

    The struct cxl_event_header is defined as

        ::

            struct cxl_event_header {
                    __u16 type;
                    __u16 size;
                    __u16 process_element;
                    __u16 reserved1;
            };

        type:
            This defines the type of event. The type determines how
            the rest of the event is structured. These types are
            described below and defined by enum cxl_event_type.

        size:
            This is the size of the event in bytes including the
            struct cxl_event_header. The start of the next event can
            be found at this offset from the start of the current
            event.

        process_element:
            Context ID of the event.

        reserved field:
            For future extensions and padding.

    If the event type is CXL_EVENT_AFU_INTERRUPT then the event
    structure is defined as

        ::

            struct cxl_event_afu_interrupt {
                    __u16 flags;
                    __u16 irq; /* Raised AFU interrupt number */
                    __u32 reserved1;
            };

        flags:
            These flags indicate which optional fields are present
            in this struct. Currently all fields are mandatory.

        irq:
            The IRQ number sent by the AFU.

        reserved field:
            For future extensions and padding.
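
    Because each event carries its own size in the header, a buffer
    returned by read() can be walked one event at a time. The
    following is a minimal sketch of that walk, not the driver's or
    libcxl's implementation. struct event_header is a local mirror of
    struct cxl_event_header as listed above (real code would include
    the uapi header instead), and walk_events() and event_cb are
    hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_event_header as listed in this document. */
struct event_header {
	uint16_t type;
	uint16_t size;		/* whole event, header included */
	uint16_t process_element;
	uint16_t reserved1;
};

/* Hypothetical callback invoked once per event. */
typedef void (*event_cb)(const struct event_header *hdr,
			 const unsigned char *payload, size_t payload_len,
			 void *ctx);

/*
 * Walk len bytes returned by read(). Returns the number of events
 * visited, or -1 if a header is truncated or its size field is
 * inconsistent with the remaining data.
 */
static int walk_events(const unsigned char *buf, size_t len,
		       event_cb cb, void *ctx)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct event_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		/* Copy the header out to avoid unaligned accesses. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		if (cb)
			cb(&hdr, buf + off + sizeof(hdr),
			   hdr.size - sizeof(hdr), ctx);
		off += hdr.size;	/* next event starts here */
		count++;
	}

	return count;
}
```

    The callback would typically switch on hdr->type and interpret
    the payload as the matching event structure.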

    If the event type is CXL_EVENT_DATA_STORAGE then the event
    structure is defined as

        ::

            struct cxl_event_data_storage {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 addr;
                    __u64 dsisr;
                    __u64 reserved3;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        addr:
            The address that the AFU unsuccessfully attempted to
            access. Valid accesses will be handled transparently by the
            kernel but invalid accesses will generate this event.

        dsisr:
            This field gives information on the type of fault. It is a
            copy of the DSISR from the PSL hardware when the address
            fault occurred. The form of the DSISR is as defined in the
            CAIA.

        reserved fields:
            For future extensions

    If the event type is CXL_EVENT_AFU_ERROR then the event structure
    is defined as

        ::

            struct cxl_event_afu_error {
                    __u16 flags;
                    __u16 reserved1;
                    __u32 reserved2;
                    __u64 error;
            };

        flags:
            These flags indicate which optional fields are present in
            this struct. Currently all fields are mandatory.

        error:
            Error status from the AFU. Defined by the AFU.

        reserved fields:
            For future extensions and padding


2. Card character device (PowerVM guest only)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    In a PowerVM guest, an extra character device is created for the
    card. The device is only used to write (flash) a new image on the
    FPGA accelerator. Once the image is written and verified, the
    device tree is updated and the card is reset to reload the updated
    image.

open
----

    Opens the device and allocates a file descriptor to be used with
    the rest of the API. The device can only be opened once.

ioctl
-----

CXL_IOCTL_DOWNLOAD_IMAGE / CXL_IOCTL_VALIDATE_IMAGE:
    Starts and controls flashing a new FPGA image. Partial
    reconfiguration is not supported (yet), so the image must contain
    a copy of the PSL and AFU(s). Since an image can be quite large,
    the caller may have to iterate, splitting the image into smaller
    chunks.

    Takes a pointer to a struct cxl_adapter_image::

        struct cxl_adapter_image {
            __u64 flags;
            __u64 data;
            __u64 len_data;
            __u64 len_image;
            __u64 reserved1;
            __u64 reserved2;
            __u64 reserved3;
            __u64 reserved4;
        };

    flags:
        These flags indicate which optional fields are present in
        this struct. Currently all fields are mandatory.

    data:
        Pointer to a buffer with part of the image to write to the
        card.

    len_data:
        Size of the buffer pointed to by data.

    len_image:
        Full size of the image.


Sysfs Class
===========

    A cxl sysfs class is added under /sys/class/cxl to facilitate
    enumeration and tuning of the accelerators. Its layout is
    described in Documentation/ABI/obsolete/sysfs-class-cxl


Udev rules
==========

    The following udev rules could be used to create a symlink to the
    most logical chardev to use in any programming mode (afuX.Yd for
    dedicated, afuX.Ys for AFU directed), since the API is virtually
    identical for each::

	SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b"
	SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \
	                  KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b"