Searched defs:numel_per_thread (Results 1 – 2 of 2) sorted by relevance

/aosp_15_r20/external/pytorch/torch/csrc/distributed/c10d/
CUDASymmetricMemoryOps.cu
61 const size_t numel_per_thread = alignment / element_size; in init_elementwise_launch_config() local
85 constexpr size_t numel_per_thread = alignment / sizeof(T); in multimem_all_reduce_kernel() local
178 constexpr size_t numel_per_thread = alignment / sizeof(T); in multimem_one_shot_all_reduce_kernel() local
/aosp_15_r20/external/pytorch/aten/src/ATen/cuda/
ApplyGridUtils.cuh
25 …uint64_t numel_per_thread = static_cast<uint64_t>(max_threads_per_block) * static_cast<uint64_t>(s… in getApplyGrid() local