Changes for 1.5.0 'Sonic':
--------------------------

1.5.0 is a major release of dav1d, that:
 - WARNING: we removed some of the SSE2 optimizations, so if you care about
   systems without SSSE3, you should be careful when updating!
 - Add Arm OpenBSD run-time CPU feature detection
 - Optimize index offset calculations for decode_coefs
 - picture: copy HDR10+ and T35 metadata only to visible frames
 - SSSE3 new optimizations for 6-tap (8bit and hbd)
 - AArch64/SVE: Add HBD subpel filters using 128-bit SVE2
 - AArch64: Add USMMLA implementation for 6-tap H/HV
 - AArch64: Optimize Armv8.0 NEON for HBD horizontal filters and 6-tap filters
 - Power9: Optimized ITX up to 16x4
 - Loongarch: numerous optimizations
 - RISC-V optimizations for pal, cdef_filter, ipred, mc_blend, mc_bdir, itx
 - Allow playing videos in full-screen mode in dav1dplay


Changes for 1.4.3 'Road Runner':
--------------------------------

1.4.3 is a small release focused on security issues
 - AArch64: Fix potential out of bounds access in DotProd H/HV filters
 - cli: Prevent buffer over-read


Changes for 1.4.2 'Road Runner':
--------------------------------

1.4.2 is a small release of dav1d, improving notably ARM, AVX-512 and PowerPC
 - AVX2 optimizations for 8-tap and new variants for 6-tap
 - AVX-512 optimizations for 8-tap and new variants for 6-tap
 - Improve entropy decoding on ARM64
 - New ARM64 optimizations for convolutions based on DotProd extension
 - New ARM64 optimizations for convolutions based on i8mm extension
 - New ARM64 optimizations for subpel and prep filters for i8mm
 - Misc improvements on existing ARM64 optimizations, notably for put/prep
 - New PowerPC9 optimizations for loopfilter
 - Support for macOS kperf API for benchmarking


Changes for 1.4.1 'Road Runner':
--------------------------------

1.4.1 is a small release of dav1d, improving notably ARM and RISC-V speed

- Optimizations for 6tap filters for NEON (ARM)
- More RISC-V optimizations for itx (4x8, 8x4, 4x16, 16x4, 8x16, 16x8)
- Reduction of binary size on ARM64, ARM32 and RISC-V
- Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter
- Msac optimizations


Changes for 1.4.0 'Road Runner':
--------------------------------

1.4.0 is a medium release of dav1d, focusing on new architecture support and optimizations

- AVX-512 optimizations for z1, z2, z3 in 8bit and high-bitdepth
- New architecture supported: loongarch
- Loongarch optimizations for 8bit
- New architecture supported: RISC-V
- RISC-V optimizations for itx
- Misc improvements in threading and in reducing binary size
- Fix potential integer overflow with extremely large frame sizes (CVE-2024-1580)


Changes for 1.3.0 'Tundra Peregrine Falcon (Calidus)':
------------------------------------------------------

1.3.0 is a medium release of dav1d, focusing on new APIs and memory usage reduction.

- Reduce memory usage in numerous places
- ABI break in Dav1dSequenceHeader, Dav1dFrameHeader, Dav1dContentLightLevel structures
- New API function to check the API version: dav1d_version_api()
- Rewrite of the SGR functions for ARM64 to be faster
- NEON implementation of save_tmvs for ARM32 and ARM64
- x86 palette DSP for pal_idx_finish function


Changes for 1.2.1 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.1 is a small release of dav1d, adding more SIMD and fixes

- Fix a threading race on task_thread.init_done
- NEON z2 8bpc and high bit-depth optimizations
- SSSE3 z2 high bit-depth optimizations
- Fix a desynced luma/chroma planes issue with Film Grain
- Reduce memory consumption
- Improve dav1d_parse_sequence_header() speed
- OBU: Improve header parsing and fix potential overflows
- OBU: Improve ITU-T T.35 parsing speed
- Misc buildsystems, CI and headers fixes


Changes for 1.2.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.0 is a small release of dav1d, adding more SIMD and fixes

- Improvements on attachments of props and T.35 entries on output pictures
- NEON z1/z3 high bit-depth optimizations and improvements for 8bpc
- SSSE3 z2/z3 8bpc and SSSE3 z1/z3 high bit-depth optimizations
- refmvs.save_tmvs optimizations in SSSE3/AVX2/AVX-512
- AVX-512 optimizations for high bit-depth itx (16x64, 32x64, 64x16, 64x32, 64x64)
- AVX2 optimizations for 12bpc for 16x32, 32x16, 32x32 itx


Changes for 1.1.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.1.0 is an important release of dav1d, fixing numerous bugs, and adding SIMD

- New function dav1d_get_frame_delay to query the decoder frame delay
- Numerous fixes for strict conformity to the specs and samples
- NEON and AVX-512 misc fixes and improvements
- Partial AVX2 12bpc transform implementations
- AVX-512 high bit-depth cdef_filter, loopfilter, itx
- NEON z1/z3 optimization for 8bpc
- SSSE3 z1 optimization for 8bpc

                                                "From VideoLAN with love"


Changes for 1.0.0 'Peregrine Falcon':
-------------------------------------

1.0.0 is a major release of dav1d, adding important features and bug fixes.

It notably changes, in an important way, the way threading works, by adding
automatic thread management.

It also adds support for AVX-512 acceleration, and adds speedups to existing x86
code (from SSE2 to AVX2).

1.0.0 adds a new grain API to ease acceleration on the GPU, and adds an API call
to get information about which frame failed to decode, in error cases.

Finally, 1.0.0 fixes numerous small bugs that were reported since the beginning
of the project to have a proper release.

                                       .''.
         .''.      .        *''*    :_\/_:     .
        :_\/_:   _\(/_  .:.*_\/_*   : /\ :  .'.:.'.
    .''.: /\ :   ./)\   ':'* /\ * :  '..'.  -=:o:=-
   :_\/_:'.:::.    ' *''*    * '.\'/.' _\(/_'.':'.'
   : /\ : :::::     *_\/_*     -= o =-  /)\    '  *
    '..'  ':::'     * /\ *     .'/.\'.   '
        *            *..*         :
          *                       :
          *         1.0.0



Changes for 0.9.2 'Golden Eagle':
---------------------------------

0.9.2 is a small update of dav1d on the 0.9.x branch:
 - x86: SSE4 optimizations of inverse transforms for 10bit for all sizes
 - x86: mc.resize optimizations with AVX2/SSSE3 for 10/12b
 - x86: SSSE3 optimizations for cdef_filter in 10/12b and mc_w_mask_422/444 in 8b
 - ARM NEON optimizations for FilmGrain Gen_grain functions
 - Optimizations for splat_mv in SSE2/AVX2 and NEON
 - x86: SGR improvements for SSSE3 CPUs
 - x86: AVX2 optimizations for cfl_ac


Changes for 0.9.1 'Golden Eagle':
---------------------------------

0.9.1 is a middle-size revision of dav1d, adding notably 10b acceleration for SSSE3:
 - 10/12b SSSE3 optimizations for mc (avg, w_avg, mask, w_mask, emu_edge),
   prep/put_bilin, prep/put_8tap, ipred (dc/h/v, paeth, smooth, pal, filter), wiener,
   sgr (10b), warp8x8, deblock, film_grain, cfl_ac/pred for 32bit and 64bit x86 processors
 - Film grain NEON for fguv 10/12b, fgy/fguv 8b and fgy/fguv 10/12 arm32
 - Fixes for filmgrain on ARM
 - itx 10bit optimizations for 4x4/x8/x16, 8x4/x8/x16 for SSE4
 - Misc improvements on SSE2, SSE4


Changes for 0.9.0 'Golden Eagle':
---------------------------------

0.9.0 is a major version of dav1d, adding notably 10b acceleration on x64.

Details:
 - x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide
   a large boost for high-bitdepth decoding on modern x86 computers and servers.
 - ARM64 NEON implementation of FilmGrain (4:2:0/4:2:2/4:4:4 8bit)
 - New API to signal events happening during the decoding process


Changes for 0.8.2 'Eurasian Hobby':
-----------------------------------

0.8.2 is a middle-size update of the 0.8.0 branch:
 - ARM32 optimizations for ipred and itx in 10/12bits,
   completing the 10b/12b work on ARM64 and ARM32
 - Give the post-filters their own threads
 - ARM64: rewrite the wiener functions
 - Speed up coefficient decoding, 0.5%-3% global decoding gain
 - x86 optimizations for CDEF_filter and wiener in 10/12bit
 - x86: rewrite the SGR AVX2 asm
 - x86: improve msac speed on SSE2+ machines
 - ARM32: improve speed of ipred and warp
 - ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
 - ARM32/64: improve speed of looprestoration
 - Add seeking, pausing to the player
 - Update the player for rendering of 10b/12b
 - Misc speed improvements and fixes on all platforms
 - Add a xxh3 muxer in the dav1d application


Changes for 0.8.1 'Eurasian Hobby':
-----------------------------------

0.8.1 is a minor update on 0.8.0:
 - Keep references to buffers valid after dav1d_close(). Fixes a regression
   caused by the picture buffer pool added in 0.8.0.
 - ARM32 optimizations for 10bit bitdepth for SGR
 - ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
 - ARM64 optimizations for 10bit bitdepth for SGR
 - x86 optimizations for wiener in SSE2/SSSE3/AVX2


Changes for 0.8.0 'Eurasian Hobby':
-----------------------------------

0.8.0 is a major update for dav1d:
 - Improve the performance by using a picture buffer pool;
   The improvements can reach 10% in some cases on Windows.
 - Support for Apple ARM Silicon
 - ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
 - ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
   put/prep 8tap/bilin, wiener and CDEF filters
 - ARM64 optimizations for cfl_ac 444 for all bitdepths
 - x86 optimizations for MC 8-tap, mc_scaled in AVX2
 - x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3


Changes for 0.7.1 'Frigatebird':
------------------------------

0.7.1 is a minor update on 0.7.0:
 - ARM32 NEON optimizations for itxfm, which can give up to 28% speedup, and MSAC
 - SSE2 optimizations for prep_bilin and prep_8tap
 - AVX2 optimizations for MC scaled
 - Fix a clamping issue in motion vector projection
 - Fix an issue on some specific Haswell CPUs on ipred_z AVX2 functions
 - Improvements on the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
------------------------------

0.7.0 is a major release for dav1d:
 - Faster refmv implementation, gaining up to 12% speed while using 25% less RAM (single thread)
 - 10b/12b ARM64 optimizations are mostly complete:
   - ipred (paeth, smooth, dc, pal, filter, cfl)
   - itxfm (only 10b)
 - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
 - AVX2 for cfl4:4:4
 - AVX-512 CDEF filter
 - ARM64 8b improvements for cfl_ac and itxfm
 - ARM64 implementation for emu_edge in 8b/10b/12b
 - ARM32 implementation for emu_edge in 8b
 - Improvements on the dav1dplay utility player to support 10 bit,
   non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for the 10/12bit depth:
   - mc_avg, mc_w_avg, mc_mask
   - mc_put/mc_prep 8tap/bilin
   - mc_warp_8x8
   - mc_w_mask
   - mc_blend
   - wiener
   - SGR
   - loopfilter
   - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably because of clipping
 - Improvements on ARM64 on msac, cdef, mc_blend_v and looprestoration optimizations
 - Improvements on AVX2 optimizations for cdef_filter
 - Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve the speed by reducing the L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speeds and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU T.35 metadata
 - Speed improvements on blend_ on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32-bit systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - The initial PPC SIMD code, cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
------------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
------------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
-----------------------------

 - Large improvement on MSAC decoding with SSE, bringing 4-6% speed increase
   The impact is important on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Fixes for minor crashes, improvements and build changes


Changes for 0.2.1 'Antelope':
----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32 and 64bits
 - More AVX2 assembly, reaching almost completion
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bitdepths: 8, 10 and 12 bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for AVX2 64bits processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors