==============================================
LLVM Atomic Instructions and Concurrency Guide
==============================================

.. contents::
   :local:

Introduction
============

LLVM supports instructions which are well-defined in the presence of threads and
asynchronous signals.

The atomic instructions are designed specifically to provide readable IR and
optimized code generation for the following:

* The C++11 ``<atomic>`` header. (`C++11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg14/>`_.)

* Proper semantics for Java-style memory, for both ``volatile`` and regular
  shared variables. (`Java Specification
  <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_)

* gcc-compatible ``__sync_*`` builtins.
  (`Description
  <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_)

* Other scenarios with atomic semantics, including ``static`` variables with
  non-trivial constructors in C++.

Atomic and volatile in the IR are orthogonal; "volatile" is the C/C++ volatile,
which ensures that every volatile load and store happens and is performed in the
stated order. A couple of examples: if a SequentiallyConsistent store is
immediately followed by another SequentiallyConsistent store to the same
address, the first store can be erased. This transformation is not allowed for a
pair of volatile stores. On the other hand, a non-volatile non-atomic load can
be moved across a volatile load freely, but not an Acquire load.

This document is intended as a guide for anyone writing a frontend for LLVM or
working on optimization passes for LLVM, describing how to deal with
instructions with special semantics in the presence of concurrency. It is not
intended to be a precise guide to the semantics; the details can get extremely
complicated and unreadable, and are not usually necessary.
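The store-erasure rule above can be written down in C11 for illustration. The
sketch below uses the standard ``<stdatomic.h>`` header; the function names are
invented for this example.

```c
#include <assert.h>
#include <stdatomic.h>

volatile int vcell;   /* volatile: every store must be emitted, in order */
atomic_int acell;     /* C11 atomic; plain atomic_store is seq_cst */

/* Neither store here may be erased: volatile means both happen. */
void volatile_pair(void) {
  vcell = 1;
  vcell = 2;
}

/* Per the rule above, the optimizer may erase the first store: the two
   SequentiallyConsistent stores are adjacent and hit the same address. */
void atomic_pair(void) {
  atomic_store(&acell, 1);
  atomic_store(&acell, 2);
}
```

Either way, the final value observed after each function is 2; the difference
is only in which intermediate stores the optimizer must preserve.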
.. _Optimization outside atomic:

Optimization outside atomic
===========================

The basic ``'load'`` and ``'store'`` allow a variety of optimizations, but can
lead to undefined results in a concurrent environment; see `NotAtomic`_. This
section specifically covers the one optimizer restriction which applies in
concurrent environments; it gets an extended description because any
optimization dealing with stores needs to be aware of it.

From the optimizer's point of view, the rule is that if there are not any
instructions with atomic ordering involved, concurrency does not matter, with
one exception: if a variable might be visible to another thread or signal
handler, a store cannot be inserted along a path where it might not execute
otherwise. Take the following example:
.. code-block:: c

  /* C code, for readability; run through clang -O2 -S -emit-llvm to get
     equivalent IR */
  int x;
  void f(int* a) {
    for (int i = 0; i < 100; i++) {
      if (a[i])
        x += 1;
    }
  }

The following is equivalent in non-concurrent situations:

.. code-block:: c

  int x;
  void f(int* a) {
    int xtemp = x;
    for (int i = 0; i < 100; i++) {
      if (a[i])
        xtemp += 1;
    }
    x = xtemp;
  }

However, LLVM is not allowed to transform the former to the latter: it could
indirectly introduce undefined behavior if another thread can access ``x`` at
the same time. (This example is particularly of interest because before the
concurrency model was implemented, LLVM would perform this transformation.)
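For contrast, a rewrite that remains legal tracks whether the original code
would have stored, and only then writes ``x``, so no store is introduced on
paths that had none. This is a sketch; ``f_safe`` is a hypothetical name.

```c
int x;

/* Legal counterpart of the rewrite above: the store to x happens only on
   executions where the original function would also have stored. */
void f_safe(int* a) {
  int xtemp = x;
  int stored = 0;
  for (int i = 0; i < 100; i++) {
    if (a[i]) {
      xtemp += 1;
      stored = 1;
    }
  }
  if (stored)
    x = xtemp;  /* no store is introduced on the all-zero path */
}
```

Collapsing many stores into one is fine; the key is that an execution which
originally performed zero stores still performs zero stores.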

Note that speculative loads are allowed; a load which is part of a race returns
``undef``, but does not have undefined behavior.

Atomic instructions
===================

For cases where simple loads and stores are not sufficient, LLVM provides
various atomic instructions. The exact guarantees provided depend on the
ordering; see `Atomic orderings`_.

``load atomic`` and ``store atomic`` provide the same basic functionality as
non-atomic loads and stores, but provide additional guarantees in situations
where threads and signals are involved.

``cmpxchg`` and ``atomicrmw`` are essentially like an atomic load followed by an
atomic store (where the store is conditional for ``cmpxchg``), but no other
memory operation can happen on any thread between the load and store.

A ``fence`` provides Acquire and/or Release ordering which is not part of
another operation; it is normally used along with Monotonic memory operations.
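These instructions correspond roughly to C11 ``<stdatomic.h>`` operations; a
sketch of the mapping (the helper names here are made up for illustration):

```c
#include <stdatomic.h>
#include <stdbool.h>

/* atomicrmw add: a read-modify-write with no intervening access */
int counter_bump(atomic_int *c) {
  return atomic_fetch_add(c, 1);          /* returns the old value */
}

/* cmpxchg: store 'desired' only if the current value is 'expected' */
bool try_claim(atomic_int *c, int expected, int desired) {
  return atomic_compare_exchange_strong(c, &expected, desired);
}

/* a standalone fence, paired with a relaxed (Monotonic) load */
int load_then_acquire_fence(atomic_int *c) {
  int v = atomic_load_explicit(c, memory_order_relaxed);
  atomic_thread_fence(memory_order_acquire);
  return v;
}
```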
A Monotonic load followed by an Acquire fence is roughly equivalent to an
Acquire load, and a Monotonic store following a Release fence is roughly
equivalent to a Release store. SequentiallyConsistent fences behave as both
an Acquire and a Release fence, and offer some additional complicated
guarantees; see the C++11 standard for details.

Frontends generating atomic instructions generally need to be aware of the
target to some degree; atomic instructions are guaranteed to be lock-free, and
therefore an instruction which is wider than the target natively supports can be
impossible to generate.

.. _Atomic orderings:

Atomic orderings
================

In order to achieve a balance between performance and necessary guarantees,
there are six levels of atomicity. They are listed in order of strength; each
level includes all the guarantees of the previous level except for
Acquire/Release. (See also `LangRef Ordering <LangRef.html#ordering>`_.)
.. _NotAtomic:

NotAtomic
---------

NotAtomic is the obvious: a load or store which is not atomic. (This isn't
really a level of atomicity, but is listed here for comparison.) This is
essentially a regular load or store. If there is a race on a given memory
location, loads from that location return undef.

Relevant standard
  This is intended to match shared variables in C/C++, and to be used in any
  other context where memory access is necessary and a race is impossible. (The
  precise definition is in `LangRef Memory Model <LangRef.html#memmodel>`_.)

Notes for frontends
  The rule is essentially that all memory accessed with basic loads and stores
  by multiple threads should be protected by a lock or other synchronization;
  otherwise, you are likely to run into undefined behavior. If your frontend is
  for a "safe" language like Java, use Unordered to load and store any shared
  variable. Note that NotAtomic volatile loads and stores are not properly
  atomic; do not try to use them as a substitute.
  (Per the C/C++ standards, volatile does provide some limited guarantees
  around asynchronous signals, but atomics are generally a better solution.)

Notes for optimizers
  Introducing loads to shared variables along a codepath where they would not
  otherwise exist is allowed; introducing stores to shared variables is not. See
  `Optimization outside atomic`_.

Notes for code generation
  The one interesting restriction here is that it is not allowed to write to
  bytes outside of the bytes relevant to a store. This is mostly relevant to
  unaligned stores: it is not allowed in general to convert an unaligned store
  into two aligned stores of the same width as the unaligned store. Backends are
  also expected to generate an i8 store as an i8 store, and not an instruction
  which writes to surrounding bytes. (If you are writing a backend for an
  architecture which cannot satisfy these restrictions and cares about
  concurrency, please send an email to llvm-dev.)
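The hazard this restriction prevents can be simulated sequentially. The sketch
below (all names hypothetical) models a miscompiled byte store that
read-modify-writes a neighboring byte as well, split into two steps so the
unlucky interleaving can be driven by hand:

```c
#include <stdint.h>

/* Two adjacent bytes; imagine each is owned by a different thread. */
uint8_t pair[2];

/* Correct codegen: an i8 store touches only its own byte. */
void store_byte(int idx, uint8_t v) {
  pair[idx] = v;
}

/* Illegal codegen, modeled in two steps: storing byte 0 by reading and
   rewriting BOTH bytes. */
static uint8_t saved1;
void widened_store_begin(void) {
  saved1 = pair[1];              /* reads the neighboring byte too */
}
void widened_store_commit(uint8_t v) {
  pair[0] = v;
  pair[1] = saved1;              /* clobbers any update made in between */
}
```

If another thread stores to ``pair[1]`` between the begin and commit steps,
that store is silently lost, which is exactly why widening is forbidden.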

Unordered
---------

Unordered is the lowest level of atomicity. It essentially guarantees that races
produce somewhat sane results instead of having undefined behavior. It also
guarantees the operation to be lock-free, so it does not depend on the data
being part of a special atomic structure or depend on a separate per-process
global lock. Note that code generation will fail for unsupported atomic
operations; if you need such an operation, use explicit locking.

Relevant standard
  This is intended to match the Java memory model for shared variables.

Notes for frontends
  This cannot be used for synchronization, but is useful for Java and other
  "safe" languages which need to guarantee that the generated code never
  exhibits undefined behavior. Note that this guarantee is cheap on common
  platforms for loads of a native width, but can be expensive or unavailable for
  wider operations, like a 64-bit store on ARM.
  (A frontend for Java or other "safe" languages would normally split a 64-bit
  store on ARM into two 32-bit unordered stores.)

Notes for optimizers
  In terms of the optimizer, this prohibits any transformation that transforms a
  single load into multiple loads, transforms a store into multiple stores,
  narrows a store, or stores a value which would not be stored otherwise. Some
  examples of unsafe optimizations are narrowing an assignment into a bitfield,
  rematerializing a load, and turning loads and stores into a memcpy
  call. Reordering unordered operations is safe, though, and optimizers should
  take advantage of that because unordered operations are common in languages
  that need them.

Notes for code generation
  These operations are required to be atomic in the sense that if you use
  unordered loads and unordered stores, a load cannot see a value which was
  never stored.
  A normal load or store instruction is usually sufficient, but note that an
  unordered load or store cannot be split into multiple instructions (or an
  instruction which does multiple memory operations, like ``LDRD`` on ARM
  without LPAE, or not naturally-aligned ``LDRD`` on LPAE ARM).

Monotonic
---------

Monotonic is the weakest level of atomicity that can be used in synchronization
primitives, although it does not provide any general synchronization. It
essentially guarantees that if you take all the operations affecting a specific
address, a consistent ordering exists.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
  standards for the exact definition.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution. The
  guarantees in terms of synchronization are very weak, so make sure these are
  only used in a pattern which you know is correct.
  Generally, these would either be used for atomic operations which do not
  protect other memory (like an atomic counter), or along with a ``fence``.

Notes for optimizers
  In terms of the optimizer, this can be treated as a read+write on the relevant
  memory location (and alias analysis will take advantage of that). In addition,
  it is legal to reorder non-atomic and Unordered loads around Monotonic
  loads. CSE/DSE and a few other optimizations are allowed, but Monotonic
  operations are unlikely to be used in ways which would make those
  optimizations useful.

Notes for code generation
  Code generation is essentially the same as that for unordered for loads and
  stores. No fences are required. ``cmpxchg`` and ``atomicrmw`` are required
  to appear as a single operation.

Acquire
-------

Acquire provides a barrier of the sort necessary to acquire a lock to access
other memory with normal loads and stores.
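This lock-style pairing can be sketched with C11 release/acquire operations.
The ``publish``/``consume`` functions below are invented for illustration; in a
real program they would run on different threads.

```c
#include <stdatomic.h>

int payload;                 /* ordinary, non-atomic data */
atomic_int ready;            /* guards payload */

/* Writer: fill in the data, then release-store the flag. */
void publish(int v) {
  payload = v;
  atomic_store_explicit(&ready, 1, memory_order_release);
}

/* Reader: acquire-load the flag; if it is set, the release/acquire pair
   guarantees the payload write is visible. */
int consume(int *out) {
  if (atomic_load_explicit(&ready, memory_order_acquire)) {
    *out = payload;
    return 1;
  }
  return 0;
}
```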

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
  used for C++11/C11 ``memory_order_consume``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. It is
  also possible to move stores from before an Acquire load or read-modify-write
  operation to after it, and move non-Acquire loads from before an Acquire
  operation to after it.

Notes for code generation
  Architectures with weak memory ordering (essentially everything relevant today
  except x86 and SPARC) require some sort of fence to maintain the Acquire
  semantics. The precise fences required vary widely by architecture, but for
  a simple implementation, most architectures provide a barrier which is strong
  enough for everything (``dmb`` on ARM, ``sync`` on PowerPC, etc.).
  Putting such a fence after the equivalent Monotonic operation is sufficient to
  maintain Acquire semantics for a memory operation.

Release
-------

Release is similar to Acquire, but with a barrier of the sort necessary to
release a lock.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_release``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Release only provides a semantic guarantee when paired with an Acquire
  operation.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call. It is
  also possible to move loads from after a Release store or read-modify-write
  operation to before it, and move non-Release stores from after a Release
  operation to before it.

Notes for code generation
  See the section on Acquire; a fence before the relevant operation is usually
  sufficient for Release. Note that a store-store fence is not sufficient to
  implement Release semantics; store-store fences are generally not exposed to
  IR because they are extremely difficult to use correctly.

AcquireRelease
--------------

AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
barrier (for fences and operations which both read and write memory).

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acq_rel``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation, and vice versa.

Notes for optimizers
  In general, optimizers should treat this like a nothrow call; the possible
  optimizations are usually not interesting.

Notes for code generation
  This operation has Acquire and Release semantics; see the sections on Acquire
  and Release.

SequentiallyConsistent
----------------------

SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads
and Release semantics for stores. Additionally, it guarantees that a total
ordering exists between all SequentiallyConsistent operations.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
  the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.

Notes for frontends
  If a frontend is exposing atomic operations, these are much easier to reason
  about for the programmer than other kinds of operations, and using them is
  generally a practical performance tradeoff.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call.
  For SequentiallyConsistent loads and stores, the same reorderings are allowed
  as for Acquire loads and Release stores, except that SequentiallyConsistent
  operations may not be reordered.

Notes for code generation
  SequentiallyConsistent loads minimally require the same barriers as Acquire
  operations and SequentiallyConsistent stores require Release
  barriers. Additionally, the code generator must enforce ordering between
  SequentiallyConsistent stores followed by SequentiallyConsistent loads. This
  is usually done by emitting either a full fence before the loads or a full
  fence after the stores; which is preferred varies by architecture.

Atomics and IR optimization
===========================

Predicates for optimizer writers to query:

* ``isSimple()``: A load or store which is not volatile or atomic. This is
  what, for example, memcpyopt would check for operations it might transform.

* ``isUnordered()``: A load or store which is not volatile and at most
  Unordered.
  This would be checked, for example, by LICM before hoisting an operation.

* ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicates, but note
  that they return true for any operation which is volatile or at least
  Monotonic.

* ``isStrongerThan`` / ``isAtLeastOrStrongerThan``: These are predicates on
  orderings. They can be useful for passes that are aware of atomics, for
  example to do DSE across a single atomic access, but not across a
  release-acquire pair (see MemoryDependencyAnalysis for an example of this).

* Alias analysis: Note that AA will return ModRef for anything Acquire or
  Release, and for the address accessed by any Monotonic operation.

To support optimizing around atomic operations, make sure you are using the
right predicates; everything should work if that is done. If your pass should
optimize some atomic operations (Unordered operations in particular), make sure
it doesn't replace an atomic load or store with a non-atomic operation.
Some examples of how optimizations interact with various kinds of atomic
operations:

* ``memcpyopt``: An atomic operation cannot be optimized into part of a
  memcpy/memset, including unordered loads/stores. It can pull operations
  across some atomic operations.

* LICM: Unordered loads/stores can be moved out of a loop. It just treats
  monotonic operations like a read+write to a memory location, and anything
  stricter than that like a nothrow call.

* DSE: Unordered stores can be DSE'ed like normal stores. Monotonic stores can
  be DSE'ed in some cases, but it's tricky to reason about, and not especially
  important. It is possible in some cases for DSE to operate across a stronger
  atomic operation, but that is fairly tricky. DSE delegates this reasoning to
  MemoryDependencyAnalysis (which is also used by other passes like GVN).

* Folding a load: Any atomic load from a constant global can be constant-folded,
  because it cannot be observed. Similar reasoning allows scalar replacement of
  aggregates (SROA) with atomic loads and stores.
Atomics and Codegen
===================

Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
On architectures which use barrier instructions for all atomic ordering (like
ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if
``setInsertFencesForAtomic()`` was used.

The MachineMemOperand for all atomic operations is currently marked as volatile;
this is not correct in the IR sense of volatile, but CodeGen handles anything
marked volatile very conservatively. This should get fixed at some point.

One very important property of the atomic operations is that if your backend
supports any inline lock-free atomic operations of a given size, you should
support *ALL* operations of that size in a lock-free manner.

When the target implements atomic ``cmpxchg`` or LL/SC instructions (as most do)
this is trivial: all the other operations can be implemented on top of those
primitives. However, on many older CPUs (e.g. ARMv5, SparcV8, Intel 80386) there
are atomic load and store instructions, but no ``cmpxchg`` or LL/SC. As it is
invalid to implement ``atomic load`` using the native instruction, but
``cmpxchg`` using a library call to a function that uses a mutex, ``atomic
load`` must *also* expand to a library call on such architectures, so that it
can remain atomic with regard to a simultaneous ``cmpxchg``, by using the same
mutex.

AtomicExpandPass can help with that: it will expand all atomic operations to the
proper ``__atomic_*`` libcalls for any size above the maximum set by
``setMaxAtomicSizeInBitsSupported`` (which defaults to 0).

On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
fences generate an ``MFENCE``, other fences do not cause any code to be
generated. ``cmpxchg`` uses the ``LOCK CMPXCHG`` instruction. ``atomicrmw xchg``
uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all
other ``atomicrmw`` operations generate a loop with ``LOCK CMPXCHG``. Depending
on the users of the result, some ``atomicrmw`` operations can be translated into
operations like ``LOCK AND``, but that does not work in general.
On ARM (before v8), MIPS, and many other RISC architectures, Acquire, Release,
and SequentiallyConsistent semantics require barrier instructions for every such
operation. Loads and stores generate normal instructions. ``cmpxchg`` and
``atomicrmw`` can be represented using a loop with LL/SC-style instructions
which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
on ARM, etc.).

It is often easiest for backends to use AtomicExpandPass to lower some of the
atomic constructs. Here are some lowerings it can do:

* cmpxchg -> loop with load-linked/store-conditional
  by overriding ``shouldExpandAtomicCmpXchgInIR()``, ``emitLoadLinked()``, and
  ``emitStoreConditional()``
* large loads/stores -> ll-sc/cmpxchg
  by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()``
* strong atomic accesses -> monotonic accesses + fences by overriding
  ``shouldInsertFencesForAtomic()``, ``emitLeadingFence()``, and
  ``emitTrailingFence()``
* atomic rmw -> loop with cmpxchg or load-linked/store-conditional
  by overriding ``expandAtomicRMWInIR()``
* expansion to ``__atomic_*`` libcalls for unsupported sizes

For an example of all of these, look at the ARM backend.

Libcalls: __atomic_*
====================

There are two kinds of atomic library calls that are generated by LLVM. Please
note that both sets of library functions somewhat confusingly share the names of
builtin functions defined by clang. Despite this, the library functions are
not directly related to the builtins: it is *not* the case that ``__atomic_*``
builtins lower to ``__atomic_*`` library calls and ``__sync_*`` builtins lower
to ``__sync_*`` library calls.

The first set of library functions are named ``__atomic_*``. This set has been
"standardized" by GCC, and is described below. (See also `GCC's documentation
<https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary>`_.)

LLVM's AtomicExpandPass will translate atomic operations on data sizes above
``MaxAtomicSizeInBitsSupported`` into calls to these functions.
There are four generic functions, which can be called with data of any size or
alignment::

  void __atomic_load(size_t size, void *ptr, void *ret, int ordering)
  void __atomic_store(size_t size, void *ptr, void *val, int ordering)
  void __atomic_exchange(size_t size, void *ptr, void *val, void *ret, int ordering)
  bool __atomic_compare_exchange(size_t size, void *ptr, void *expected, void *desired, int success_order, int failure_order)

There are also size-specialized versions of the above functions, which can only
be used with *naturally-aligned* pointers of the appropriate size. In the
signatures below, "N" is one of 1, 2, 4, 8, and 16, and "iN" is the appropriate
integer type of that size; if no such integer type exists, the specialization
cannot be used::

  iN __atomic_load_N(iN *ptr, int ordering)
  void __atomic_store_N(iN *ptr, iN val, int ordering)
  iN __atomic_exchange_N(iN *ptr, iN val, int ordering)
  bool __atomic_compare_exchange_N(iN *ptr, iN *expected, iN desired, int success_order, int failure_order)

Finally there are some read-modify-write functions, which are only available in
the size-specific variants (any other sizes use a ``__atomic_compare_exchange``
loop)::

  iN __atomic_fetch_add_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_sub_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_and_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_or_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_xor_N(iN *ptr, iN val, int ordering)
  iN __atomic_fetch_nand_N(iN *ptr, iN val, int ordering)

This set of library functions has some interesting implementation requirements
to take note of:

- They support all sizes and alignments -- including those which cannot be
  implemented natively on any existing hardware. Therefore, they will certainly
  use mutexes for some sizes/alignments.

- As a consequence, they cannot be shipped in a statically linked
  compiler-support library, as they have state which must be shared amongst all
  DSOs loaded in the program. They must be provided in a shared library used by
  all objects.

- The set of atomic sizes supported lock-free must be a superset of the sizes
  any compiler can emit. That is: if a new compiler introduces support for
  inline lock-free atomics of size N, the ``__atomic_*`` functions must also
  have a lock-free implementation for size N. This is a requirement so that
  code produced by an old compiler (which will have called the ``__atomic_*``
  function) interoperates with code produced by the new compiler (which will
  use the native atomic instruction).
Note that it's possible to write an entirely target-independent implementation
of these library functions by using the compiler atomic builtins themselves to
implement the operations on naturally-aligned pointers of supported sizes, and a
generic mutex implementation otherwise.

Libcalls: __sync_*
==================

Some targets or OS/target combinations can support lock-free atomics, but for
various reasons, it is not practical to emit the instructions inline.

There are two typical examples of this.

Some CPUs support multiple instruction sets which can be switched back and forth
on function-call boundaries. For example, MIPS supports the MIPS16 ISA, which
has a smaller instruction encoding than the usual MIPS32 ISA. ARM, similarly,
has the Thumb ISA. In MIPS16 and earlier versions of Thumb, the atomic
instructions are not encodable. However, those instructions are available via a
function call to a function with the longer encoding.
Additionally, a few OS/target pairs provide kernel-supported lock-free
atomics. ARM/Linux is an example of this: the kernel `provides
<https://www.kernel.org/doc/Documentation/arm/kernel_user_helpers.txt>`_ a
function which on older CPUs contains a "magically-restartable" atomic sequence
(which looks atomic so long as there's only one CPU), and contains actual atomic
instructions on newer multicore models. This sort of functionality can typically
be provided on any architecture, if all CPUs which are missing atomic
compare-and-swap support are uniprocessor (no SMP). This is almost always the
case. The only common architecture without that property is SPARC -- SPARCV8 SMP
systems were common, yet it doesn't support any sort of compare-and-swap
operation.

In either of these cases, the Target in LLVM can claim support for atomics of an
appropriate size, and then implement some subset of the operations via libcalls
to a ``__sync_*`` function. Such functions *must* not use locks in their
implementation, because unlike the ``__atomic_*`` routines used by
AtomicExpandPass, these may be mixed-and-matched with native instructions by the
target lowering.

Further, these routines do not need to be shared, as they are stateless. So,
there is no issue with having multiple copies included in one binary. Thus,
typically these routines are implemented by the statically-linked compiler
runtime support library.

LLVM will emit a call to an appropriate ``__sync_*`` routine if the target
ISelLowering code has set the corresponding ``ATOMIC_CMPXCHG``, ``ATOMIC_SWAP``,
or ``ATOMIC_LOAD_*`` operation to "Expand", and if it has opted into the
availability of those library functions via a call to ``initSyncLibcalls()``.
The full set of functions that may be called by LLVM is (for ``N`` being 1, 2,
4, 8, or 16)::

  iN __sync_val_compare_and_swap_N(iN *ptr, iN expected, iN desired)
  iN __sync_lock_test_and_set_N(iN *ptr, iN val)
  iN __sync_fetch_and_add_N(iN *ptr, iN val)
  iN __sync_fetch_and_sub_N(iN *ptr, iN val)
  iN __sync_fetch_and_and_N(iN *ptr, iN val)
  iN __sync_fetch_and_or_N(iN *ptr, iN val)
  iN __sync_fetch_and_xor_N(iN *ptr, iN val)
  iN __sync_fetch_and_nand_N(iN *ptr, iN val)
  iN __sync_fetch_and_max_N(iN *ptr, iN val)
  iN __sync_fetch_and_umax_N(iN *ptr, iN val)
  iN __sync_fetch_and_min_N(iN *ptr, iN val)
  iN __sync_fetch_and_umin_N(iN *ptr, iN val)

This list doesn't include any function for atomic load or store; all known
architectures support atomic loads and stores directly (possibly by emitting a
fence on either side of a normal load or store).

There's also, somewhat separately, the possibility to lower ``ATOMIC_FENCE`` to
``__sync_synchronize()``. This may or may not happen, independent of all the
above, controlled purely by ``setOperationAction(ISD::ATOMIC_FENCE, ...)``.