# Introduction

This directory contains SystemZ deflate hardware acceleration support.
It can be enabled using the following build commands:

    $ ./configure --with-dfltcc-deflate --with-dfltcc-inflate
    $ make

or

    $ cmake -DWITH_DFLTCC_DEFLATE=1 -DWITH_DFLTCC_INFLATE=1 .
    $ make

When built like this, zlib-ng compresses using hardware at level 1 and
using software at all other levels. Decompression always happens in
hardware, except in the cases listed under Limitations. To enable
hardware compression for levels 1-6, which include the default level 6
and thus make hardware compression the default, add
`-DDFLTCC_LEVEL_MASK=0x7e` to CFLAGS when building zlib-ng.

SystemZ deflate hardware acceleration is available on
[IBM z15](https://www.ibm.com/products/z15) and newer machines under the
name ["Integrated Accelerator for zEnterprise Data Compression"](https://www.ibm.com/support/z-content-solutions/compression/).
The programming interface to it is a machine instruction called DEFLATE
CONVERSION CALL (DFLTCC). It is documented in Chapter 26 of
[Principles of Operation](https://publibfp.dhe.ibm.com/epubs/pdf/a227832c.pdf).
Both the code and the rest of this document refer to this feature simply
as "DFLTCC".

# Performance

Performance figures are published
[on this wiki page](https://github.com/iii-i/zlib-ng/wiki/Performance-with-dfltcc-patch-applied-and-dfltcc-support-built-on-dfltcc-enabled-machine).
The compression speed-up can be as high as 110x and the decompression
speed-up as high as 15x.

# Limitations

Two DFLTCC compression calls with identical inputs are not guaranteed to
produce identical outputs. Therefore, care should be taken when using
hardware compression where reproducible results are desired. In
particular, the zlib-ng-specific `zng_deflateSetParams()` call allows
setting the `Z_DEFLATE_REPRODUCIBLE` parameter, which disables DFLTCC
support for a particular stream.
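
Two settings affect whether a stream may be compressed in hardware: the build-time level mask from the Introduction and the per-stream reproducibility setting described above. The following self-contained sketch shows one way to combine them. It is hypothetical. `sketch_use_dfltcc()` and the `SKETCH_*` constants are not zlib-ng identifiers. The bit-per-level reading of the mask is inferred from `0x7e` selecting levels 1-6. The real checks in `arch/s390` may also consider other stream parameters.

```c
#include <stdbool.h>

/* Hypothetical masks, assuming bit N enables hardware compression for level N. */
#define SKETCH_MASK_LEVEL_1    0x02u /* level 1 only, as in the default build */
#define SKETCH_MASK_LEVELS_1_6 0x7eu /* as with -DDFLTCC_LEVEL_MASK=0x7e */

/*
 * Hypothetical policy check, not zlib-ng's actual code: returns true if a
 * stream with the given (already normalized, 0-9) compression level and
 * reproducibility setting would be handed to DFLTCC.
 */
static bool sketch_use_dfltcc(int level, bool reproducible, unsigned mask)
{
    if (level < 0 || level > 9)
        return false; /* outside the 0-9 range of normalized zlib levels */
    if (reproducible)
        return false; /* DFLTCC output may differ between identical calls */
    return ((1u << level) & mask) != 0;
}
```

With `SKETCH_MASK_LEVELS_1_6`, level 6 (the level that `Z_DEFAULT_COMPRESSION` resolves to) goes to hardware unless reproducible output was requested.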

DFLTCC does not support every zlib-ng feature. In particular, the
following are not supported:

* `inflate(Z_BLOCK)` and `inflate(Z_TREES)`
* `inflateMark()`
* `inflatePrime()`
* `inflateSyncPoint()`

When used, these functions either switch to software or, when this is
not possible, fail gracefully.

# Code structure

All SystemZ-specific code lives in the `arch/s390` directory and is
integrated with the rest of zlib-ng using hook macros.

## Hook macros

DFLTCC takes as arguments a parameter block, an input buffer, an output
buffer and a window. The `ZALLOC_DEFLATE_STATE()`, `ZALLOC_INFLATE_STATE()`,
`ZFREE_STATE()`, `ZCOPY_DEFLATE_STATE()`, `ZCOPY_INFLATE_STATE()`,
`ZALLOC_WINDOW()` and `TRY_FREE_WINDOW()` macros encapsulate allocation
details for the parameter block (which is allocated alongside the zlib-ng
state) and the window (which must be page-aligned).

While the software and hardware window formats match for inflate, this
is not the case for deflate. Therefore, `deflateSetDictionary()` and
`deflateGetDictionary()` need special handling, which is triggered using
the `DEFLATE_SET_DICTIONARY_HOOK()` and `DEFLATE_GET_DICTIONARY_HOOK()`
macros.

`deflateResetKeep()` and `inflateResetKeep()` update the DFLTCC
parameter block using the `DEFLATE_RESET_KEEP_HOOK()` and
`INFLATE_RESET_KEEP_HOOK()` macros.

The `INFLATE_PRIME_HOOK()`, `INFLATE_MARK_HOOK()` and
`INFLATE_SYNC_POINT_HOOK()` macros make the respective unsupported
calls fail gracefully.

`DEFLATE_PARAMS_HOOK()` implements switching between hardware and
software compression mid-stream using `deflateParams()`. Switching
normally entails flushing the current block, which might not be possible
in low-memory situations. `deflateParams()` uses the `DEFLATE_DONE()`
hook to detect and gracefully handle such situations.

The algorithm implemented in hardware has a different compression ratio
than the one implemented in software.
The `DEFLATE_BOUND_ADJUST_COMPLEN()` and `DEFLATE_NEED_CONSERVATIVE_BOUND()`
macros make `deflateBound()` return correct results for the hardware
implementation.

Actual compression and decompression are handled by the `DEFLATE_HOOK()`
and `INFLATE_TYPEDO_HOOK()` macros. Since inflation with DFLTCC manages
the window on its own, calling `updatewindow()` is suppressed using the
`INFLATE_NEED_UPDATEWINDOW()` macro.

In addition to compression, DFLTCC computes CRC-32 and Adler-32
checksums. Therefore, whenever it is used, software checksumming is
suppressed using the `DEFLATE_NEED_CHECKSUM()` and
`INFLATE_NEED_CHECKSUM()` macros.

While software always produces reproducible compression results, this
is not the case for DFLTCC. Therefore, zlib-ng users are given the
ability to specify whether or not reproducible compression results
are required. While it is always possible to specify this setting
before compression begins, it is not always possible to do so in the
middle of a deflate stream; the exact conditions for that are
determined by the `DEFLATE_CAN_SET_REPRODUCIBLE()` macro.

## SystemZ-specific code

When zlib-ng is built with DFLTCC, the hooks described above expand to
calls to functions implemented in the `arch/s390/dfltcc_*` files. These
functions can be grouped into three broad categories:

* Base DFLTCC support, e.g. wrapping the machine instruction
  (`dfltcc()`) and allocating aligned memory (`dfltcc_alloc_state()`).
* Translating between software and hardware data formats, e.g.
  `dfltcc_deflate_set_dictionary()`.
* Translating between software and hardware state machines, e.g.
  `dfltcc_deflate()` and `dfltcc_inflate()`.

The functions from the first two categories are fairly simple; however,
various quirks in both the software and hardware state machines make the
functions from the third category quite complicated.
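
The hook pattern described above can be illustrated with a small, self-contained sketch. Every name in it is a hypothetical stand-in, not zlib-ng code: `TOY_DEFLATE_NEED_CHECKSUM()`, `toy_stream`, `toy_dfltcc_need_checksum()` and `toy_deflate_step()`. The sketch assumes the architecture-specific override is visible before the generic default, for example because its header is included first. It models software checksumming being suppressed while hardware handles the stream, which is the role the text gives to `DEFLATE_NEED_CHECKSUM()`.

```c
#include <stdbool.h>

/* Hypothetical per-stream state; not zlib-ng's deflate_state. */
typedef struct {
    bool hw_active;          /* true if the (simulated) accelerator handles this stream */
    unsigned checksum_calls; /* number of software checksum updates performed */
} toy_stream;

/* "Architecture-specific" part: overrides the hook when built with the
 * accelerator. Hardware computes the checksum itself, so software
 * checksumming is only needed when hardware is not handling the stream. */
static bool toy_dfltcc_need_checksum(const toy_stream *s)
{
    return !s->hw_active;
}
#define TOY_DEFLATE_NEED_CHECKSUM(s) toy_dfltcc_need_checksum(s)

/* "Generic" part: supplies a default only if no override was defined. */
#ifndef TOY_DEFLATE_NEED_CHECKSUM
#  define TOY_DEFLATE_NEED_CHECKSUM(s) true
#endif

/* Generic code calls the hook without knowing which implementation is behind it. */
static void toy_deflate_step(toy_stream *s)
{
    if (TOY_DEFLATE_NEED_CHECKSUM(s))
        s->checksum_calls++; /* stands in for a software CRC-32/Adler-32 update */
}
```

Without the override, `TOY_DEFLATE_NEED_CHECKSUM()` becomes the constant `true`. A build without the accelerator therefore pays no runtime cost for the hook.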

### `dfltcc_deflate()` function

This function is called by `deflate()` and has the following
responsibilities:

* Checking whether DFLTCC can be used with the current stream. If this
  is not the case, it returns `0`, making `deflate()` use some other
  function to compress in software. Otherwise, it returns `1`.
* Block management and Huffman table generation. DFLTCC ends blocks only
  when explicitly instructed to do so by the software. Furthermore,
  whether to use fixed or dynamic Huffman tables must also be determined
  by the software. Since looking at the data in order to gather
  statistics would negate the performance benefits, the following
  approach is used: the first `DFLTCC_FIRST_FHT_BLOCK_SIZE` bytes are
  placed into a fixed block, and each subsequent chunk of
  `DFLTCC_BLOCK_SIZE` bytes is placed into its own dynamic block.
* Writing EOBS (the end-of-block symbol). The Block Closing Control bit
  in the parameter block instructs DFLTCC to write EOBS; however,
  certain conditions must be met: the input data length must be
  non-zero or the Continuation Flag must be set. In simpler terms,
  DFLTCC silently refuses to write EOBS if that is the only thing it is
  asked to do. Since the code has to be able to emit EOBS in software
  anyway, Block Closing Control is never used, which avoids tricky
  corner cases. Instead, whether to write EOBS is controlled by the
  `soft_bcc` variable.
* Triggering block post-processing. Depending on the flush mode,
  `deflate()` must perform various additional actions when a block or a
  stream ends. `dfltcc_deflate()` informs `deflate()` about this using
  the `block_state *result` parameter.
* Converting software state fields into hardware parameter block fields,
  and vice versa. For example, `wrap` corresponds to Check Value Type,
  and `bi_valid` to Sub-Byte Boundary.
  Certain fields cannot be translated and must persist untouched in the
  parameter block between calls, for example, the Continuation Flag or
  the Continuation State Buffer.
* Handling flush modes and low-memory situations. These aspects are
  quite intertwined and pervasive. The general idea is that the code
  must not do anything in software, whether explicitly (e.g. by calling
  `send_eobs()`) or implicitly (by returning to `deflate()` with certain
  return and `*result` values), when the Continuation Flag is set.
* Ending streams. When a new block is started and the flush mode is
  `Z_FINISH`, the Block Header Final bit in the parameter block is used
  to mark this block as final. However, sometimes an empty final block
  is needed, and, just like with EOBS, DFLTCC silently refuses to
  produce one. The general idea of the DFLTCC implementation is to rely
  on the existing code as much as possible. Here, this is done by
  pretending that DFLTCC is not supported, which makes `deflate()` call
  a software compression function that writes an empty final block.
  Whether this is required is controlled by the `need_empty_block`
  variable.
* Error handling. This simply means converting the
  Operation-Ending-Supplemental Code to a string. Errors can only happen
  due to things like memory corruption, and therefore they do not affect
  the `deflate()` return code.

### `dfltcc_inflate()` function

This function is called by `inflate()` from the `TYPEDO` state (that is,
when all the metadata has been parsed and the stream is positioned at the
type bits of a deflate block header) and is responsible for the following:

* Falling back to software when the flush mode is `Z_BLOCK` or `Z_TREES`.
  Unfortunately, there is no way to ask DFLTCC to stop decompressing on a
  block or tree boundary.
* `inflate()` decompression loop management.
  This is controlled using the return value, which can be either
  `DFLTCC_INFLATE_BREAK` or `DFLTCC_INFLATE_CONTINUE`.
* Converting software state fields into hardware parameter block fields,
  and vice versa. For example, `whave` corresponds to History Length,
  and `wnext` to History Offset.
* Ending streams. This instructs `inflate()` to return `Z_STREAM_END`
  and is controlled by the `last` state field.
* Error handling. As with deflate, error handling comprises converting
  the Operation-Ending-Supplemental Code to a string. Unlike with
  deflate, errors may happen due to bad inputs; therefore, they are
  propagated to `inflate()` by setting the `mode` field to `MEM` or
  `BAD`.

# Testing

Given the complexity of the DFLTCC machine instruction, it is not clear
whether QEMU TCG will ever support it. At the time of writing, one has to
have access to an IBM z15+ VM or LPAR in order to test DFLTCC support.
Since DFLTCC is a non-privileged instruction, neither special VM/LPAR
configuration nor root access is required.

zlib-ng CI uses an IBM-provided z15 self-hosted builder for DFLTCC
testing. There are no IBM Z builds of the GitHub Actions runner, and
stable qemu-user has problems with .NET apps, so the builder runs the
x86_64 version of the runner with qemu-user built from the master branch.

## Configuring the builder.

### Install prerequisites.

```
$ sudo dnf install docker
```

### Add services.

```
$ sudo cp self-hosted-builder/*.service /etc/systemd/system/
$ sudo systemctl daemon-reload
```

### Create a config file.

```
$ sudo tee /etc/actions-runner
repo=<owner>/<name>
access_token=<ghp_***>
```

The access token should have the `repo` scope; consult
https://docs.github.com/en/rest/reference/actions#create-a-registration-token-for-a-repository
for details.

### Autostart the x86_64 emulation support.

```
$ sudo systemctl enable --now qemu-user-static
```

### Autostart the runner.

```
$ sudo systemctl enable --now actions-runner
```

## Rebuilding the image

To update the `iiilinuxibmcom/actions-runner` image, e.g. to get the
latest OS security fixes, use the following commands:

```
$ sudo docker build \
      --pull \
      -f self-hosted-builder/actions-runner.Dockerfile \
      -t iiilinuxibmcom/actions-runner \
      self-hosted-builder
$ sudo systemctl restart actions-runner
```

## Removing persistent data

The `actions-runner` service stores various data, such as runner
registration information, work directories and logs, in the
`actions-runner` volume. To remove it and start from scratch, e.g. when
switching the runner to a different repository, use the following
commands:

```
$ sudo systemctl stop actions-runner
$ sudo docker rm -f actions-runner
$ sudo docker volume rm actions-runner
```