Lines Matching full:aco

67 - aco: Missing 8-bit uadd_carry
772 - aco: Pre-split result of bvh64_intersect_ray_amd.
1206 - aco: use std::vector::reserve() more often
1207 - aco/live_var_analysis: implement faster merging of live_out sets for some cases
1208 - aco/optimizer: disallow can_eliminate_and_exec() with s_not
1209 - aco/optimizer: do can_eliminate_and_exec() optimization later
1210 - aco/optimizer: optimize s_and(exec, s_and(x, y)) more aggressively
1211 - aco/optimizer: change inverse_comparison in-place
1214 - aco: simplify operands_offset calculation in create_instruction()
1215 - aco: implement custom memory resource
1216 - aco: use monotonic_buffer_resource for instructions
1217 - aco: implement allocator_traits for monotonic_allocator<T>
1218 - aco/opt_value_numbering: use monotonic_allocator for unordered_map
1219 - aco/spill: Fix spilling of Phi operands
1220 - aco/ra: only rename fixed Operands if the copy-location matches
1221 - aco: change thread_local memory resource to pointer
1660 - aco: drop unused variable
2011 - aco: Check that we don't override exec_val operands during branching sequence optimization.
2012 - aco/assembler: Fix v_cmpx with SDWA.
2013 - aco: Fix optimizing branching sequence with s_and_saveexec.
2014 - aco/assembler: Fix v_cmpx pre GFX10.
2015 - aco: Use v_cmpx pre GFX10.
2016 - aco: Force tex operand to have the correct sub dword size before packing.
2019 - aco: Implement storage image A16.
2020 - aco: Combine 16bit undef and constants instead of using s_pack.
2025 - aco: Fix image instructions with lod when 2d_view_of_3d is enabled on GFX9.
2027 - aco: Use plain VOPC for vcmpx when possible.
2030 - aco: Use v_fmaak/v_fmamk if two operands are the same literal.
2031 - aco: Unswizzle v_pk_fma_f16 literals to produce more v_pk_fmac_f16.
2034 - aco: Use s_pack_ll for s_bfe operand on GFX9+.
2037 - aco: Implement [ui]find_msb_rev.
2040 - radv,aco: Lower uclz in NIR.
2043 - aco: fmaak/fmamk can't use SDWA.
2044 - aco: Don't use opsel for p_insert.
2046 - aco: Implement signed idot instructions on GFX11.
2049 - aco: Use opsel for the third operand.
2050 - aco: Use s_pack_ll_b32_b16 for scalar zero extend.
4334 - ac/nir/ngg,ac/llvm,aco: save nogs ngg culling one lds dword
4398 - aco: fix LdsBranchVmemWARHazard with 2+ branch chains
4399 - aco: set has_VMEM,has_DS=false after a branch
4400 - aco: only add vscnt wait when visiting VMEM/DS
4401 - aco: improve VcmpxPermlaneHazard workaround
4402 - aco: fix hash statistic
4404 - aco: fix consecutive exec writes when finding exec_copy instruction
4405 - aco: rename is_cmp to is_fp_cmp
4406 - aco: fix assembly of vopc_sdwa writing exec
4407 - aco: fix re-write of uses of exec_val's lo/hi half
4408 - aco: test branch opcode if removing it in try_optimize_branching_sequence
4409 - aco: remove val_and_copy_adjacent
4410 - aco: improve vcc check for instructions between exec_val and exec_copy
4411 - aco: test for one and_savexec opcode in try_optimize_branching_sequence
4412 - aco: fix long-jump version of discard early exit
4415 - aco: fix 16-bit VS inputs
4417 - aco: don't expand vec3 VS input load to vec4 on GFX6
4418 - aco: allow direct_fetch=true for vec4 VS input loads
4420 - aco: add SCC clobber in build_cube_select
4422 - radv: enable ac_nir_lower_resinfo for ACO
4423 - aco: remove dead code for querying image size/samples/levels
4433 - radv,aco: use pipe_format for static vertex input state
4434 - radv,aco: use pipe_format for dynamic vertex input state
4437 - radv,aco: implement 64-bit vertex inputs
4439 - aco/ra: handle empty def_reg interval in get_regs_for_copies
4440 - aco/ra: remove bounds parameter from get_regs_for_copies()
4441 - aco/ra: rework fixed operands
4449 - aco: DCE ra_ctx::defs_done
4450 - aco: rename Interp_instruction to VINTRP_instruction
4451 - aco: add reg() helper to assembler
4452 - aco: fix assembly of MUBUF-to-LDS loads
4453 - aco: add GFX11 opcode numbers
4454 - aco/gfx11: don't use more than 1 NSA dword
4455 - aco: update assembler for GFX11
4456 - aco: limit GFX11 to 128 VGPRs for now
4457 - aco: add LDSDIR instruction format
4458 - aco: add VINTERP instruction format
4459 - aco: omit read-only memory_sync_info when printing
4460 - aco/tests: add GFX11 assembly tests
4461 - aco: mostly implement FS input loads on GFX11
4462 - aco: fix VMEMtoScalarWriteHazard s_waitcnt mitigation
4463 - aco: improve VMEMtoScalarWriteHazard s_waitcnt mitigation
4464 - aco: use some helpers in GFX10 hazard workarounds
4465 - aco: improve printing of sgpr_null
4466 - aco: improve printing of s_waitcnt_depctr
4467 - aco: add VMEMtoScalarWriteHazard tests
4468 - aco/gfx11: swap ds_cmpst_* data operands
4469 - aco: improve wait_imm unpack
4470 - aco/gfx11: fix s_waitcnt printing
4471 - aco: update sendmsg enum from LLVM
4472 - aco/gfx11: deallocate VGPRs at the end of the shader
4473 - aco/gfx11: update form_hard_clauses
4474 - aco: limit hard clauses to 63 instructions
4475 - aco: fix assembler.gfx11.vinterp test
4476 - aco: add search_backwards helper
4477 - aco/gfx11: workaround VcmpxPermlaneHazard
4478 - aco/gfx11: workaround LdsDirectVALUHazard
4479 - aco/gfx11: workaround LdsDirectVMEMHazard
4480 - aco/gfx11: workaround VALUTransUseHazard
4481 - aco/gfx11: workaround VALUPartialForwardingHazard
4482 - aco/gfx11: workaround VALUMaskWriteHazard
4483 - aco: add ACO_DEBUG=force-waitdeps
4487 - aco/gfx11: optimize LS/HS load_local_invocation_index
4488 - aco: swap v_perm_b32 operands
4493 - aco: add storage_gds
4494 - aco: insert waitcnt before/after ds_ordered_count
4495 - nir,ac/nir,aco,radv: replace has_input_*_amd with more general intrinsics
4496 - aco: don't split swizzled store_buffer_amd on GFX9+
4500 - radv,aco: don't use lower_to_fragment_fetch_amd on GFX11+
4501 - aco: fix typo in branch lowering
4502 - aco/gfx11: perform FS input loads in WQM
4503 - aco/gfx11: fix FS input loads in quad-divergent control flow
4508 - aco/gfx11: increase gfx1100/gfx1101 physical vgprs
4513 - aco: ensure MRT0 is written with dual source blending
4790 - aco: fix wrong size for 1D images and A16 on GFX9
4825 - aco: remove unused isel_context::tcs_num_patches
4868 - aco: prevent a division by zero when patch control points is dynamic
4892 - radv,aco: lower barycentric_at_sample in NIR
4895 - radv,aco: do not compact MRTs if the pipeline uses a PS epilog
4954 - aco: fix tcs_wave_id unpacking on GFX11
4972 - aco,radv/llvm: do not export parameters on GFX11
4978 - aco: add support for s_sendmsg_rtn_b{32,64}
4979 - aco: split the sendmsg enumeration into sendmsg_rtn
4980 - aco: add support for device clock on GFX11
4989 - aco: create a new builder variant for ds_add_rtn
4990 - aco: implement NIR intrinsics for NGG streamout
4991 - aco: remove invalid assertions for NGG streamout
5018 - aco: fix p_interp_gfx11 to not overwrite SCC
5019 - aco: fix missing SCC for p_interp_gfx11 in emit_interp_mov_instr()
5020 - aco: add p_dual_src_export_gfx11 for dual source blending on GFX11
5021 - aco: fix dual source blending on GFX11
5022 - aco: fix FS inputs loads in WQM with 16-bit
5031 - aco: fix emitting DEALLOC_VGPRS in the discard block
5311 - aco: Optimize branching sequence during SSA elimination.
5312 - aco: Remove branch instruction when exec is constant non-zero.
5315 - aco: Add faster code path to store_lds for consecutive write mask.
5316 - aco: Fix invalidated reference in branching sequence optimization.
5317 - aco: Check for instructions that inhibit the branching sequence optimization.
5318 - aco/optimizer_postRA: Don't try to optimize dead instructions.
5319 - aco: Support s_cselect_b64 in SCC no-compare optimization.
5320 - aco: Improve SCC nocompare optimization when SCC is clobbered.
5321 - aco: Fix p_init_scratch for task shaders.
5343 - aco/tests: Add post-RA optimizer testcase for partially overwritten VCC.
5344 - aco/tests: Add post-RA DPP test cases with control flow.
5345 - aco/tests: Add post-RA SCC no-compare tests cases with control flow.
5346 - aco/optimizer_postRA: Mark a register overwritten when predecessors disagree.
5347 - aco/optimizer_postRA: Don't assume all operand registers were written by same instr.
5348 - aco/optimizer_postRA: Fix logical control flow handling.
5349 - aco/optimizer_postRA: Clarify terminology.
5350 - aco: Change inverse-comparison optimization to work with s_not
5354 - aco: Fix build error with std::max on GCC 12
5357 - aco: Allow explicitly removing jumps on GFX10+ when beneficial.
5360 - nir, ac, aco: Add ACCESS intrinsic index to load/store_buffer_amd.
5361 - aco: Cleanup load_vmem_mubuf and store_vmem_mubuf functions.
5362 - nir, ac, aco: Add index src to load_buffer_amd/store_buffer_amd.
5363 - aco: Optimize MUBUF 0 offset when idxen is also being used.
5364 - aco/optimizer_postRA: Use unique_ptr + array for instruction indices.
5365 - aco/optimizer_postRA: Speed up reset_block() with predecessors.
5366 - aco/optimizer_postRA: Properly handle vccz/execz/scc in reset_block.
5367 - aco/optimizer_postRA: Delete dead instructions more efficiently.
5368 - aco: Move is_dead to aco_ir.h to allow it to get inlined.
5369 - aco: Add ACO_DEBUG=novalidateir option.
5520 - aco: Use unreachable instead assert(false)
5581 - aco: Convert to use u8 literal for Unicode character to fixes msvc warning
5592 - aco: Fixes compiling error about char8_t with c++20
5662 - aco: Do not define NOMINMAX as it's already defined in pre_args now