Changes for 1.5.0 'Sonic':
--------------------------

1.5.0 is a major release of dav1d that:
 - WARNING: we removed some of the SSE2 optimizations, so if you care about
            systems without SSSE3, you should be careful when updating!
 - Add Arm OpenBSD run-time CPU feature detection
 - Optimize index offset calculations for decode_coefs
 - picture: copy HDR10+ and T.35 metadata only to visible frames
 - New SSSE3 optimizations for 6-tap (8bit and HBD)
 - AArch64/SVE: Add HBD subpel filters using 128-bit SVE2
 - AArch64: Add USMMLA implementation for 6-tap H/HV
 - AArch64: Optimize Armv8.0 NEON for HBD horizontal filters and 6-tap filters
 - Power9: Optimized ITX up to 16x4
 - Loongarch: numerous optimizations
 - RISC-V optimizations for pal, cdef_filter, ipred, mc_blend, mc_bdir, itx
 - Allow playing videos in full-screen mode in dav1dplay


Changes for 1.4.3 'Road Runner':
--------------------------------

1.4.3 is a small release focused on security issues
 - AArch64: Fix potential out of bounds access in DotProd H/HV filters
 - cli: Prevent buffer over-read


Changes for 1.4.2 'Road Runner':
--------------------------------

1.4.2 is a small release of dav1d, notably improving ARM, AVX-512 and PowerPC performance
 - AVX2 optimizations for 8-tap and new variants for 6-tap
 - AVX-512 optimizations for 8-tap and new variants for 6-tap
 - Improve entropy decoding on ARM64
 - New ARM64 optimizations for convolutions based on DotProd extension
 - New ARM64 optimizations for convolutions based on i8mm extension
 - New ARM64 optimizations for subpel and prep filters for i8mm
 - Misc improvements on existing ARM64 optimizations, notably for put/prep
 - New PowerPC9 optimizations for loopfilter
 - Support for macOS kperf API for benchmarking


Changes for 1.4.1 'Road Runner':
--------------------------------

1.4.1 is a small release of dav1d, notably improving ARM and RISC-V speed

- Optimizations for 6-tap filters for NEON (ARM)
- More RISC-V optimizations for itx (4x8, 8x4, 4x16, 16x4, 8x16, 16x8)
- Reduction of binary size on ARM64, ARM32 and RISC-V
- Fix out-of-bounds read in 8bpc SSE2/SSSE3 wiener_filter
- MSAC optimizations


Changes for 1.4.0 'Road Runner':
--------------------------------

1.4.0 is a medium release of dav1d, focusing on new architecture support and optimizations

- AVX-512 optimizations for z1, z2, z3 in 8bit and high-bitdepth
- New architecture supported: loongarch
- Loongarch optimizations for 8bit
- New architecture supported: RISC-V
- RISC-V optimizations for itx
- Misc improvements in threading and in reducing binary size
- Fix potential integer overflow with extremely large frame sizes (CVE-2024-1580)


Changes for 1.3.0 'Tundra Peregrine Falcon (Calidus)':
------------------------------------------------------

1.3.0 is a medium release of dav1d, focusing on new APIs and memory usage reduction.

- Reduce memory usage in numerous places
- ABI break in Dav1dSequenceHeader, Dav1dFrameHeader, Dav1dContentLightLevel structures
- New API function to check the API version: dav1d_version_api()
- Rewrite of the SGR functions for ARM64 to be faster
- NEON implementation of save_tmvs for ARM32 and ARM64
- x86 palette DSP for pal_idx_finish function


Changes for 1.2.1 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.1 is a small release of dav1d, adding more SIMD optimizations and fixes

- Fix a threading race on task_thread.init_done
- NEON z2 8bpc and high bit-depth optimizations
- SSSE3 z2 high bit-depth optimizations
- Fix a desynced luma/chroma planes issue with Film Grain
- Reduce memory consumption
- Improve dav1d_parse_sequence_header() speed
- OBU: Improve header parsing and fix potential overflows
- OBU: Improve ITU-T T.35 parsing speed
- Misc build system, CI and header fixes


Changes for 1.2.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.2.0 is a small release of dav1d, adding more SIMD optimizations and fixes

- Improvements to the attachment of props and T.35 entries to output pictures
- NEON z1/z3 high bit-depth optimizations and improvements for 8bpc
- SSSE3 z2/z3 8bpc and SSSE3 z1/z3 high bit-depth optimizations
- refmvs.save_tmvs optimizations in SSSE3/AVX2/AVX-512
- AVX-512 optimizations for high bit-depth itx (16x64, 32x64, 64x16, 64x32, 64x64)
- AVX2 optimizations for 12bpc for 16x32, 32x16, 32x32 itx


Changes for 1.1.0 'Arctic Peregrine Falcon':
--------------------------------------------

1.1.0 is an important release of dav1d, fixing numerous bugs and adding SIMD optimizations

- New function dav1d_get_frame_delay to query the decoder frame delay
- Numerous fixes for strict conformance to the specs and samples
- NEON and AVX-512 misc fixes and improvements
- Partial AVX2 12bpc transform implementations
- AVX-512 high bit-depth cdef_filter, loopfilter, itx
- NEON z1/z3 optimization for 8bpc
- SSSE3 z1 optimization for 8bpc

 "From VideoLAN with love"


Changes for 1.0.0 'Peregrine Falcon':
-------------------------------------

1.0.0 is a major release of dav1d, adding important features and bug fixes.

It notably changes the way threading works in an important way, by adding
automatic thread management.

It also adds support for AVX-512 acceleration, and adds speedups to existing x86
code (from SSE2 to AVX2).

1.0.0 adds a new grain API to ease acceleration on the GPU, and adds an API call
to get information about which frame failed to decode, in error cases.

Finally, 1.0.0 fixes numerous small bugs that had been reported since the
beginning of the project, in order to have a proper release.

                                     .''.
         .''.      .        *''*    :_\/_:     .
        :_\/_:   _\(/_  .:.*_\/_*   : /\ :  .'.:.'.
    .''.: /\ :   ./)\   ':'* /\ * :  '..'.  -=:o:=-
   :_\/_:'.:::.    ' *''*    * '.\'/.' _\(/_'.':'.'
   : /\ : :::::     *_\/_*     -= o =-  /)\    '  *
    '..'  ':::'     * /\ *     .'/.\'.   '
        *            *..*         :
          *                       :
          *         1.0.0



Changes for 0.9.2 'Golden Eagle':
---------------------------------

0.9.2 is a small update of dav1d on the 0.9.x branch:
 - x86: SSE4 optimizations of inverse transforms for 10bit for all sizes
 - x86: mc.resize optimizations with AVX2/SSSE3 for 10/12b
 - x86: SSSE3 optimizations for cdef_filter in 10/12b and mc_w_mask_422/444 in 8b
 - ARM NEON optimizations for FilmGrain Gen_grain functions
 - Optimizations for splat_mv in SSE2/AVX2 and NEON
 - x86: SGR improvements for SSSE3 CPUs
 - x86: AVX2 optimizations for cfl_ac


Changes for 0.9.1 'Golden Eagle':
---------------------------------

0.9.1 is a middle-size revision of dav1d, notably adding 10b acceleration for SSSE3:
 - 10/12b SSSE3 optimizations for mc (avg, w_avg, mask, w_mask, emu_edge),
   prep/put_bilin, prep/put_8tap, ipred (dc/h/v, paeth, smooth, pal, filter), wiener,
   sgr (10b), warp8x8, deblock, film_grain, cfl_ac/pred for 32bit and 64bit x86 processors
 - Film grain NEON for fguv 10/12b, fgy/fguv 8b and fgy/fguv 10/12b ARM32
 - Fixes for filmgrain on ARM
 - itx 10bit optimizations for 4x4/x8/x16, 8x4/x8/x16 for SSE4
 - Misc improvements on SSE2, SSE4


Changes for 0.9.0 'Golden Eagle':
---------------------------------

0.9.0 is a major version of dav1d, notably adding 10b acceleration on x64.

Details:
 - x86 (64bit) AVX2 implementation of most 10b/12b functions, which should provide
   a large boost for high-bitdepth decoding on modern x86 computers and servers.
 - ARM64 NEON implementation of FilmGrain (4:2:0/4:2:2/4:4:4 8bit)
 - New API to signal events happening during the decoding process


Changes for 0.8.2 'Eurasian Hobby':
-----------------------------------

0.8.2 is a middle-size update of the 0.8.0 branch:
 - ARM32 optimizations for ipred and itx in 10/12bits,
   completing the 10b/12b work on ARM64 and ARM32
 - Give the post-filters their own threads
 - ARM64: rewrite the wiener functions
 - Speed up coefficient decoding, 0.5%-3% global decoding gain
 - x86 optimizations for CDEF_filter and wiener in 10/12bit
 - x86: rewrite the SGR AVX2 asm
 - x86: improve msac speed on SSE2+ machines
 - ARM32: improve speed of ipred and warp
 - ARM64: improve speed of ipred, cdef_dir, cdef_filter, warp_motion and itx16
 - ARM32/64: improve speed of looprestoration
 - Add seeking and pausing to the player
 - Update the player for rendering of 10b/12b
 - Misc speed improvements and fixes on all platforms
 - Add an xxh3 muxer in the dav1d application


Changes for 0.8.1 'Eurasian Hobby':
-----------------------------------

0.8.1 is a minor update on 0.8.0:
 - Keep references to buffers valid after dav1d_close(). Fixes a regression
   caused by the picture buffer pool added in 0.8.0.
 - ARM32 optimizations for 10bit bitdepth for SGR
 - ARM32 optimizations for 16bit bitdepth for blend/w_mask/emu_edge
 - ARM64 optimizations for 10bit bitdepth for SGR
 - x86 optimizations for wiener in SSE2/SSSE3/AVX2


Changes for 0.8.0 'Eurasian Hobby':
-----------------------------------

0.8.0 is a major update for dav1d:
 - Improve the performance by using a picture buffer pool;
   the improvements can reach 10% in some cases on Windows.
 - Support for Apple ARM Silicon
 - ARM32 optimizations for 8bit bitdepth for ipred paeth, smooth, cfl
 - ARM32 optimizations for 10/12/16bit bitdepth for mc_avg/mask/w_avg,
   put/prep 8tap/bilin, wiener and CDEF filters
 - ARM64 optimizations for cfl_ac 444 for all bitdepths
 - x86 optimizations for MC 8-tap, mc_scaled in AVX2
 - x86 optimizations for CDEF in SSE and {put/prep}_{8tap/bilin} in SSSE3


Changes for 0.7.1 'Frigatebird':
--------------------------------

0.7.1 is a minor update on 0.7.0:
 - ARM32 NEON optimizations for itxfm, which can give up to 28% speedup, and MSAC
 - SSE2 optimizations for prep_bilin and prep_8tap
 - AVX2 optimizations for MC scaled
 - Fix a clamping issue in motion vector projection
 - Fix an issue on some specific Haswell CPU on ipred_z AVX2 functions
 - Improvements on the dav1dplay utility player to support resizing


Changes for 0.7.0 'Frigatebird':
--------------------------------

0.7.0 is a major release for dav1d:
 - Faster refmv implementation, gaining up to 12% speed while using 25% less RAM (single thread)
 - 10b/12b ARM64 optimizations are mostly complete:
   - ipred (paeth, smooth, dc, pal, filter, cfl)
   - itxfm (only 10b)
 - AVX2/SSSE3 for non-4:2:0 film grain and for mc.resize
 - AVX2 for cfl 4:4:4
 - AVX-512 CDEF filter
 - ARM64 8b improvements for cfl_ac and itxfm
 - ARM64 implementation for emu_edge in 8b/10b/12b
 - ARM32 implementation for emu_edge in 8b
 - Improvements on the dav1dplay utility player to support 10 bit,
   non-4:2:0 pixel formats and film grain on the GPU


Changes for 0.6.0 'Gyrfalcon':
------------------------------

0.6.0 is a major release for dav1d:
 - New ARM64 optimizations for the 10/12bit depth:
    - mc_avg, mc_w_avg, mc_mask
    - mc_put/mc_prep 8tap/bilin
    - mc_warp_8x8
    - mc_w_mask
    - mc_blend
    - wiener
    - SGR
    - loopfilter
    - cdef
 - New AVX-512 optimizations for prep_bilin, prep_8tap, cdef_filter, mc_avg/w_avg/mask
 - New SSSE3 optimizations for film grain
 - New AVX2 optimizations for msac_adapt16
 - Fix rare mismatches against the reference decoder, notably because of clipping
 - Improvements on ARM64 on msac, cdef, mc_blend_v and looprestoration optimizations
 - Improvements on AVX2 optimizations for cdef_filter
 - Improvements in the C version for itxfm, cdef_filter


Changes for 0.5.2 'Asiatic Cheetah':
------------------------------------

0.5.2 is a small release improving speed for ARM32 and adding minor features:
 - ARM32 optimizations for loopfilter, ipred_dc|h|v
 - Add section-5 raw OBU demuxer
 - Improve the speed by reducing the L2 cache collisions
 - Fix minor issues


Changes for 0.5.1 'Asiatic Cheetah':
------------------------------------

0.5.1 is a small release improving speeds and fixing minor issues
compared to 0.5.0:
 - SSE2 optimizations for CDEF, wiener and warp_affine
 - NEON optimizations for SGR on ARM32
 - Fix mismatch issue in x86 asm in inverse identity transforms
 - Fix build issue in ARM64 assembly if debug info was enabled
 - Add a workaround for Xcode 11 -fstack-check bug


Changes for 0.5.0 'Asiatic Cheetah':
------------------------------------

0.5.0 is a medium release fixing regressions and minor issues,
and improving speed significantly:
 - Export ITU T.35 metadata
 - Speed improvements on blend_ on ARM
 - Speed improvements on decode_coef and MSAC
 - NEON optimizations for blend*, w_mask_, ipred functions for ARM64
 - NEON optimizations for CDEF and warp on ARM32
 - SSE2 optimizations for MSAC hi_tok decoding
 - SSSE3 optimizations for deblocking loopfilters and warp_affine
 - AVX2 optimizations for film grain and ipred_z2
 - SSE4 optimizations for warp_affine
 - VSX optimizations for wiener
 - Fix inverse transform overflows in x86 and NEON asm
 - Fix integer overflows with large frames
 - Improve film grain generation to match reference code
 - Improve compatibility with older binutils for ARM
 - More advanced Player example in tools


Changes for 0.4.0 'Cheetah':
----------------------------

 - Fix playback with unknown OBUs
 - Add an option to limit the maximum frame size
 - SSE2 and ARM64 optimizations for MSAC
 - Improve speed on 32-bit systems
 - Optimization in obmc blend
 - Reduce RAM usage significantly
 - The initial PPC SIMD code, cdef_filter
 - NEON optimizations for blend functions on ARM
 - NEON optimizations for w_mask functions on ARM
 - NEON optimizations for inverse transforms on ARM64
 - VSX optimizations for CDEF filter
 - Improve handling of malloc failures
 - Simple Player example in tools


Changes for 0.3.1 'Sailfish':
-----------------------------

 - Fix a buffer overflow in frame-threading mode on SSSE3 CPUs
 - Reduce binary size, notably on Windows
 - SSSE3 optimizations for ipred_filter
 - ARM optimizations for MSAC


Changes for 0.3.0 'Sailfish':
-----------------------------

This is the final release for the numerous speed improvements of 0.3.0-rc.
It mostly:
 - Fixes an annoying crash on SSSE3 that happened in the itx functions


Changes for 0.2.2 (0.3.0-rc) 'Antelope':
----------------------------------------

 - Large improvement on MSAC decoding with SSE, bringing a 4-6% speed increase.
   The impact is significant on SSSE3, SSE4 and AVX2 CPUs
 - SSSE3 optimizations for all block sizes in itx
 - SSSE3 optimizations for ipred_paeth and ipred_cfl (420, 422 and 444)
 - Speed improvements on CDEF for SSE4 CPUs
 - NEON optimizations for SGR and loop filter
 - Minor crashes, improvements and build changes


Changes for 0.2.1 'Antelope':
-----------------------------

 - SSSE3 optimization for cdef_dir
 - AVX2 improvements of the existing CDEF optimizations
 - NEON improvements of the existing CDEF and wiener optimizations
 - Clarification about the numbering/versioning scheme


Changes for 0.2.0 'Antelope':
-----------------------------

 - ARM64 and ARM optimizations using NEON instructions
 - SSSE3 optimizations for both 32 and 64bits
 - More AVX2 assembly, nearing completion
 - Fix installation of includes
 - Rewrite inverse transforms to avoid overflows
 - Snap packaging for Linux
 - Updated API (ABI and API break)
 - Fixes for un-decodable samples


Changes for 0.1.0 'Gazelle':
----------------------------

Initial release of dav1d, the fast and small AV1 decoder.
 - Support for all features of the AV1 bitstream
 - Support for all bit depths: 8, 10 and 12 bits
 - Support for all chroma subsamplings 4:2:0, 4:2:2, 4:4:4 *and* grayscale
 - Full acceleration for 64-bit AVX2 processors, making it the fastest decoder
 - Partial acceleration for SSSE3 processors
 - Partial acceleration for NEON processors