Lines Matching full:aco
90 - No Man's Sky GPU hang on Radeon ACO
110 - ACO error with GCN 1 GPU
117 - subgroupBallotFindMSB() broken in RADV/ACO 20.3.4
154 - RADV/ACO - DCC causing garbled output on RX570
173 - Rage 2: Visual corruption on in-game menu with ACO.
174 - ACO doesn't correctly render map in Borderlands 3 vs. LLVM on 5700 XT
186 - [RADV][ACO] Overwatch game crash: amd/compiler/aco_insert_exec_mask.cpp: Failed Assertion
203 - [RADV/ACO] Death Stranding cause a GPU hung (*ERROR* Waiting for fences timed out!)
1343 - aco: fix VOP3P assembly, VN and validation
1344 - aco/RA: fix subdword operands on VOP3P instructions
1345 - aco: allow constants/literals on every src position for VOP3P
1346 - aco: allow SGPRs on every src position for VOP3P
1347 - aco: change usesModifiers() considering opsel_hi on packed instructions
1348 - aco: create helpers to emit vop3p instructions
1349 - aco: emit packed 16bit instructions
1351 - aco: simplify multiply-add combining
1352 - aco: optimize packed mul+add to v_pk_fma_f16
1353 - aco: optimize packed clamp
1354 - aco: optimize packed fneg
1355 - aco: optimize v_pk_fma_f16 -> v_pk_fmac_f16 on GFX10
1356 - aco: propagate swizzles when optimizing packed clamp & fma
1357 - aco: remove divergent branches which only jump over very few instructions
1358 - aco/optimizer: don't copy-prop logical phis
1359 - aco/optimizer: don't propagate subdword temps of different size
1360 - aco: generalize subdword constant copy lowering
1361 - aco/validate: validate that p_create_vector operands are aligned unless they are subdword operands
1362 - aco/validate: ensure that Operand and Definition size matches for parallelcopies
1363 - aco/validate: relax subdword restrictions
1364 - aco: propagate temporaries into PSEUDO instructions if it can take it
1365 - aco/optimizer: expand subdword vectors with SGPRs on all generations
1366 - aco/optimizer: convert extract_vector with index 0 into parallelcopies if possible
1368 - aco: fix VCC hint on boolean subgroup operations
1369 - aco: fix nir_intrinsic_ballot with wave32
1370 - aco: fix shared VGPR allocation on RDNA2
1371 - aco: change gpr_alloc_granule to full alignment
1372 - aco: refactor GPR limit calculation
1373 - aco: don't decrease the vgpr_limit when encountering bpermute
1374 - aco: also consider VCC in get_reg_specified()
1375 - aco: check get_reg_specified() on register hints
1376 - aco: don't abort() if disassembly fails
1377 - aco: use VCC as regular SGPR pair on GFX10
1378 - aco: don't create unnecessary exec phi on merge blocks
1379 - aco: handle non-temp phi definitions and operands
1380 - aco: make all exec accesses non-temporaries
1381 - aco: remove dead code for the handling of exec temporaries
1382 - aco: fix assertion in insert_exec_mask pass
1385 - aco: remove special handling of load_helper_invocation
1386 - aco: don't rematerialize exec
1387 - aco: value number VOPC instructions with different exec masks
1388 - aco/value_numbering: use can_eliminate() function to avoid unnecessary hashmap lookups
1389 - aco/optimizer: set VCC hint on new v_cmp_* definitions
1390 - aco/ra: allow VCC on SMEM sbase operand on GFX10+
1392 - aco/ra: split affinity creation into separate function
1393 - aco/ra: split register_file initialization into separate function
1394 - aco/ra: refactor SSA repairing during register allocation
1395 - aco/ra: iterate backwards when coalescing phis
1396 - aco/ra: allow m0 in get_reg_specified()
1397 - aco/ra: remove exec handling for phis
1398 - aco/spill: refactor spill decision taking
1399 - aco/spill: reload spilled exec masks directly to exec
1400 - aco/spill: spill phi constants and exec directly to VGPR
1401 - aco/spill: don't count phis as variable access
1402 - aco/spill: refactor some more spill decision taking
1403 - aco/spill: refactor live-in registerDemand calculation
1404 - aco/spill: use correct next_use_distances at loop header
1405 - aco: lower p_spill with constants correctly
1406 - aco: fix kill flags on phi operands
1407 - aco: add new reindex_ssa() pass
1408 - aco/cssa: rewrite lower_to_cssa pass
1409 - aco/cssa: don't create parallelcopies for constants and exec
3757 - aco/tests: Use _exit in child process
4701 - aco: fix convert_to_SDWA() check in add_subdword_definition()
4702 - aco: add test for incorrect convert_to_SDWA() check
4704 - aco: fix num_waves on GFX10+
4705 - aco: have emit_wqm() take Builder instead of isel_context
4706 - aco: add emit_mimg() helper
4707 - aco: move VADDR to the end of the operand list
4708 - aco: use non-sequential addressing
4709 - aco: only require texture coordinates to be in WQM if NSA is used
4710 - aco: add affinity for non-sequential MIMG operands
4711 - radv,aco: don't use MUBUF for multi-channel loads on GFX8 with robustness2
4713 - radv,aco: use deref_buffer_array_length
4717 - aco: don't consider a phi trivial if same's register doesn't match the def
4718 - aco: remove Format::{VOP3A,VOP3B}
4719 - aco: add instruction cast and format-check methods
4720 - aco: use instruction cast methods
4721 - aco: use format-check methods
4722 - aco: return references in instruction cast methods
4723 - aco: fix WQM for texture instructions with args before the coordinates
4726 - aco: disable a*1.0 optimization if the instruction is precise
4728 - aco: optimize a*0.0
4729 - aco: optimize out a*1.0 if it's used as a float
4732 - aco: don't affect isPrecise() after applying output modifiers
4735 - aco: implement non-uniform get_ssbo_size
4739 - aco: add fallback algorithm in get_reg()
4740 - aco: always set exec_live=false
4741 - aco: optimize AC_FETCH_FORMAT_SNORM alpha adjust
4742 - aco: do not flag all blocks WQM to ensure we enter all nested loops in WQM
4743 - aco: rewrite setting of Exact_Branch
4744 - aco: remove loop to flag loop blocks as WQM
4745 - aco: fix adjust_vertex_fetch_alpha
4752 - aco: fix waves calculation for wave32
4753 - aco: add Program::wgp_mode
4754 - radv,aco: add radv_nir_compiler_options::wgp_mode
4755 - aco: consider that GFX10.3 allocates LDS in 1024 byte blocks
4756 - aco: add DeviceInfo
4757 - aco: fix transition_to_{WQM,Exact} if exec.back() is not in exec
4759 - radv,aco: allow unaligned LDS access on GFX9+
4760 - aco/lower_phis: fix all_preds_uniform with continue_or_break
4764 - aco: calculate all p_as_uniform and v_readfirstlane_b32 sources in WQM
4765 - aco: use p_as_uniform for get_sampler_desc and convert_pointer_to_64_bit
4771 - Revert "radv,aco: allow unaligned LDS access on GFX9+"
4772 - aco: add missing usable_read2 check
4776 - radv,aco: remove aco_compiler_statistics
4778 - aco: set compr for fp16 exports
4780 - aco: simplify loop_nest_depth tracking in isel
4781 - aco: track divergent and uniform branch depth
4782 - aco: move wait_imm to aco_ir.h
4783 - aco: lower p_constaddr into separate instructions earlier
4784 - aco: add instruction classes
4785 - aco: add latency and inverse throughput statistics
4786 - aco: add print option to print program without temporary IDs
4787 - aco: add ACO_DEBUG=perfinfo
4788 - aco: remove vmem/smem score statistics
4789 - aco: fix NSA MIMG followed by MUBUF/MTBUF
4790 - aco/tests: add test for NSAToVMEMBug
4791 - aco: fix NSA following writelane
4792 - aco/tests: add test for waNsaCannotFollowWritelane
4794 - aco: implement 64-bit VGPR {u,i}find_msb
4795 - aco: use uadd32_sat() helper for nir_op_uadd_sat
4796 - aco: use a single instruction for uadd32_sat() on GFX8
4797 - aco: implement image_deref_samples
4798 - aco: add aco_print_program() flag to print kill flags
4799 - aco: add aco_print_program() flags to print live_out and register demand
4801 - aco: add ACO_DEBUG=liveinfo
4805 - aco: don't optimize min(a*1.0, ...) to min(a, ...) on GFX8
4806 - aco: use -1.0*x and 1.0*|x| for fneg/fabs
4807 - aco/tests: add tests for denormal-aware propagation
4809 - aco/tests: fix isel.sparse.clause for LLVM 12+
4814 - aco: fix integer tg4 workaround with unnormalized coordinates
4816 - aco: ensure loops nested in a WQM loop are in WQM
4821 - aco: fix 16-bit u2f32
4822 - aco: fix 16-bit f2{u8,i8} on GFX6/7
4828 - aco/ra: use original names when renaming loop carried phi operands
4829 - aco/ra: remove live-in temporary from live_out_per_block when moving it
4832 - aco: set TRUNC_COORD=0 for nir_texop_tg4
4835 - aco: don't update register demand during RA validation
4836 - aco: allow SDWA sels smaller than the operand size
5059 - radv,aco: fix shifting input VGPRs for the LS VGPR init bug on GFX9
5201 - aco: fix get_sampler_desc() for image loads
5202 - aco: implement a workaround for the image load DCC hw bug on GFX10.3
5242 - aco: fix opquantize2f16 on GFX6-7
5353 - aco: Fix LDS statistics of tess control shaders.
5355 - aco: Disallow LSHS temp-only I/O when VS output is written indirectly.
5357 - aco: Use ASSERTED to avoid unused variable warning.
5373 - aco: Implement new buffer load/store intrinsics.
5374 - aco: Implement the new tessellation I/O related NIR intrinsics.
5375 - aco: Implement new Geometry Shader intrinsics.
5394 - aco: Delete superfluous tess and ESGS I/O code.
5395 - aco: Fix constant address offset calculation for ds_read2 instructions.
5397 - aco: Optimize workgroup exclusive scan to better avoid bank conflicts.
5398 - aco: Align NGG scratch size to 16 so a single ds_read can always read it.
5399 - aco: Remove useless s_setprio near gs_alloc_req.
5400 - aco: Use s_setprio 3 at the beginning of every VS and TES.
5401 - aco: Extract ngg_nogs_export_prim_id to a separate function.
5402 - aco: Set block_kind_export_end in create_vs/fs_exports.
5403 - aco: Emit fewer branches for NGG VS/TES with late primitive export.
5404 - aco: Add a simple heuristic to decide early or late primitive export.
5405 - aco: Mark VCC clobbered for iadd8 and iadd16 reductions on GFX6-7.
5429 - aco/ra: Update register use bounds before recursing in get_regs_for_copies
5430 - aco/ra: Introduce PhysRegInterval helper class
5431 - aco/ra: Conservatively refactor existing code to use PhysRegInterval
5432 - aco/ra: Remove always-false conditions
5433 - aco/ra: Add iterator interface for PhysRegInterval
5434 - aco/ra: Use std::find_if(_not) to clean up get_reg_simple
5435 - aco/ra: Use std::all_of to simplify a loop
5436 - aco/ra: Conservatively refactor get_reg_specified to use PhysRegInterval
5437 - aco/ra: Move commonly repeated code to a helper function
5438 - aco/ra: Add helpers to test for intersection/containment of reg intervals
5439 - aco/ra: Use std::all_of to simplify a loop
5440 - aco/ra: Remove unused function parameter
5441 - aco/ra: Use PhysReg for member functions of PhysRegInterval
5442 - aco/ra: Use PhysReg when indexing into RegisterFile's containers
5443 - aco/ra: Use PhysRegInterval for collect_vars parameters
5444 - aco/ra: Use PhysRegInterval for count_zero
5445 - aco/ra: Fix print_regs using the wrong constant to check for blocked slots
5446 - aco/ra: Fix build with print_regs enabled
5447 - aco/ra: Remove preprocessor guards for print_regs
5448 - aco/ra: Add helper to get a PhysRegInterval for the register demand
5449 - aco: Fix vector::reserve() being called with the wrong size
5453 - aco/ra: Avoid unnecessary copying of std::vectors
5454 - aco/isel: Don't emit unsupported i16<->f16 conversion opcodes on GFX6/7
5455 - aco/isel: Fix i64/u64->float32 conversion for large inputs
5456 - aco/isel: Don't request sign extension when truncating signed integers
5457 - aco/isel: Add documentation and asserts for convert_int
5458 - aco/isel: Fix large inputs being truncated in int32->f16 conversions
5459 - aco/isel: Add documentation for (u)int64->f16 conversion
5462 - aco/spill: Fix improper handling of exec phis
5497 - aco: Initialize ds_state.front.writeMask.