xref: /aosp_15_r20/external/llvm/docs/Atomics.rst (revision 9880d6810fe72a1726cb53787c6711e909410d58)
==============================================
LLVM Atomic Instructions and Concurrency Guide
==============================================

.. contents::
   :local:

Introduction
============

LLVM supports instructions which are well-defined in the presence of threads and
asynchronous signals.

The atomic instructions are designed specifically to provide readable IR and
optimized code generation for the following:

* The C++11 ``<atomic>`` header.  (`C++11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg21/>`_.) (`C11 draft available here
  <http://www.open-std.org/jtc1/sc22/wg14/>`_.)

* Proper semantics for Java-style memory, for both ``volatile`` and regular
  shared variables. (`Java Specification
  <http://docs.oracle.com/javase/specs/jls/se8/html/jls-17.html>`_)

* gcc-compatible ``__sync_*`` builtins. (`Description
  <https://gcc.gnu.org/onlinedocs/gcc/_005f_005fsync-Builtins.html>`_)

* Other scenarios with atomic semantics, including ``static`` variables with
  non-trivial constructors in C++.
Atomic and volatile in the IR are orthogonal; "volatile" is the C/C++ volatile,
which ensures that every volatile load and store happens and is performed in the
stated order.  A couple of examples: if a SequentiallyConsistent store is
immediately followed by another SequentiallyConsistent store to the same
address, the first store can be erased. This transformation is not allowed for a
pair of volatile stores. On the other hand, a non-volatile non-atomic load can
be moved across a volatile load freely, but not an Acquire load.
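The same distinction can be seen at the C++ source level. The following is a
minimal sketch (the function and variable names are ours, purely for
illustration): the compiler may merge the adjacent atomic stores to
``counter``, but must emit both volatile stores to ``hw_reg``, in order.

```cpp
#include <atomic>

int volatile_vs_atomic() {
    std::atomic<int> counter{0};
    volatile int hw_reg = 0;  // stand-in for, e.g., a memory-mapped register

    // Two back-to-back SequentiallyConsistent stores to the same address:
    // the first may legally be erased by the optimizer.
    counter.store(1);
    counter.store(2);

    // Two volatile stores: both must be performed, in the stated order.
    hw_reg = 1;
    hw_reg = 2;

    return counter.load() + hw_reg;  // 2 + 2 == 4 either way
}
```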

This document provides a guide for anyone writing a frontend for LLVM, or
working on optimization passes for LLVM, who needs to deal with instructions
that have special semantics in the presence of concurrency.  It is not intended
to be a precise guide to the semantics; the details can get extremely
complicated and unreadable, and are not usually necessary.

.. _Optimization outside atomic:

Optimization outside atomic
===========================

The basic ``'load'`` and ``'store'`` allow a variety of optimizations, but can
lead to undefined results in a concurrent environment; see `NotAtomic`_. This
section specifically goes into the one optimizer restriction which applies in
concurrent environments, which gets a bit more of an extended description
because any optimization dealing with stores needs to be aware of it.

From the optimizer's point of view, the rule is that if there are not any
instructions with atomic ordering involved, concurrency does not matter, with
one exception: if a variable might be visible to another thread or signal
handler, a store cannot be inserted along a path where it might not execute
otherwise.  Take the following example:

.. code-block:: c

  /* C code, for readability; run through clang -O2 -S -emit-llvm to get
     equivalent IR */
  int x;
  void f(int* a) {
    for (int i = 0; i < 100; i++) {
      if (a[i])
        x += 1;
    }
  }

The following is equivalent in non-concurrent situations:

.. code-block:: c

  int x;
  void f(int* a) {
    int xtemp = x;
    for (int i = 0; i < 100; i++) {
      if (a[i])
        xtemp += 1;
    }
    x = xtemp;
  }

However, LLVM is not allowed to transform the former to the latter: it could
indirectly introduce undefined behavior if another thread can access ``x`` at
the same time. (This example is particularly of interest because before the
concurrency model was implemented, LLVM would perform this transformation.)

Note that speculative loads are allowed; a load which is part of a race returns
``undef``, but does not have undefined behavior.

Atomic instructions
===================

For cases where simple loads and stores are not sufficient, LLVM provides
various atomic instructions. The exact guarantees provided depend on the
ordering; see `Atomic orderings`_.

``load atomic`` and ``store atomic`` provide the same basic functionality as
non-atomic loads and stores, but provide additional guarantees in situations
where threads and signals are involved.

``cmpxchg`` and ``atomicrmw`` are essentially like an atomic load followed by an
atomic store (where the store is conditional for ``cmpxchg``), but no other
memory operation can happen on any thread between the load and store.
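In C++11 terms, ``cmpxchg`` corresponds to ``compare_exchange_strong`` and
``atomicrmw`` to operations like ``fetch_add``. A minimal sketch (the function
names here are illustrative, not part of any API):

```cpp
#include <atomic>

// cmpxchg analogue: compare-and-swap; the store happens only if the current
// value matches `expected`.
int cas_demo() {
    std::atomic<int> slot{0};
    int expected = 0;
    slot.compare_exchange_strong(expected, 7);  // succeeds: 0 -> 7
    expected = 0;
    slot.compare_exchange_strong(expected, 9);  // fails: slot holds 7, not 0
    return slot.load();                         // 7
}

// atomicrmw analogue: an atomic read-modify-write; fetch_add returns the
// value the location held before the addition.
int rmw_demo() {
    std::atomic<int> counter{5};
    int old = counter.fetch_add(1);             // old == 5, counter == 6
    return old * 10 + counter.load();           // 56
}
```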

A ``fence`` provides Acquire and/or Release ordering which is not part of
another operation; it is normally used along with Monotonic memory operations.
A Monotonic load followed by an Acquire fence is roughly equivalent to an
Acquire load, and a Monotonic store following a Release fence is roughly
equivalent to a Release store. SequentiallyConsistent fences behave as both
an Acquire and a Release fence, and also offer some additional complicated
guarantees; see the C++11 standard for details.
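This fence-plus-Monotonic pattern can be sketched at the C++11 level, where
``std::atomic_thread_fence`` plays the role of ``fence`` and relaxed
operations play the role of Monotonic accesses (a hedged sketch; the function
name is ours):

```cpp
#include <atomic>
#include <thread>

// Message passing built from relaxed (Monotonic) accesses plus standalone
// fences, instead of Acquire/Release operations.
int fence_demo() {
    std::atomic<int> data{0};
    std::atomic<bool> ready{false};
    std::thread producer([&] {
        data.store(42, std::memory_order_relaxed);
        std::atomic_thread_fence(std::memory_order_release); // fence + relaxed
        ready.store(true, std::memory_order_relaxed);        //   store ~= Release store
    });
    while (!ready.load(std::memory_order_relaxed)) { }       // relaxed load +
    std::atomic_thread_fence(std::memory_order_acquire);     //   fence ~= Acquire load
    int seen = data.load(std::memory_order_relaxed);         // guaranteed to see 42
    producer.join();
    return seen;
}
```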

Frontends generating atomic instructions generally need to be aware of the
target to some degree; atomic instructions are guaranteed to be lock-free, and
therefore an instruction which is wider than the target natively supports can be
impossible to generate.

.. _Atomic orderings:

Atomic orderings
================

In order to achieve a balance between performance and necessary guarantees,
there are six levels of atomicity. They are listed in order of strength; each
level includes all the guarantees of the previous level except for
Acquire/Release. (See also `LangRef Ordering <LangRef.html#ordering>`_.)

.. _NotAtomic:

NotAtomic
---------

NotAtomic is the obvious: a load or store which is not atomic. (This isn't
really a level of atomicity, but is listed here for comparison.) This is
essentially a regular load or store. If there is a race on a given memory
location, loads from that location return undef.

Relevant standard
  This is intended to match shared variables in C/C++, and to be used in any
  other context where memory access is necessary, and a race is impossible. (The
  precise definition is in `LangRef Memory Model <LangRef.html#memmodel>`_.)

Notes for frontends
  The rule is essentially that all memory accessed with basic loads and stores
  by multiple threads should be protected by a lock or other synchronization;
  otherwise, you are likely to run into undefined behavior. If your frontend is
  for a "safe" language like Java, use Unordered to load and store any shared
  variable.  Note that NotAtomic volatile loads and stores are not properly
  atomic; do not try to use them as a substitute. (Per the C/C++ standards,
  volatile does provide some limited guarantees around asynchronous signals, but
  atomics are generally a better solution.)

Notes for optimizers
  Introducing loads to shared variables along a codepath where they would not
  otherwise exist is allowed; introducing stores to shared variables is not. See
  `Optimization outside atomic`_.

Notes for code generation
  The one interesting restriction here is that it is not allowed to write to
  bytes outside of the bytes relevant to a store.  This is mostly relevant to
  unaligned stores: it is not allowed in general to convert an unaligned store
  into two aligned stores of the same width as the unaligned store. Backends are
  also expected to generate an i8 store as an i8 store, and not an instruction
  which writes to surrounding bytes.  (If you are writing a backend for an
  architecture which cannot satisfy these restrictions and cares about
  concurrency, please send an email to llvm-dev.)

Unordered
---------

Unordered is the lowest level of atomicity. It essentially guarantees that races
produce somewhat sane results instead of having undefined behavior.  It also
guarantees the operation to be lock-free, so it does not depend on the data
being part of a special atomic structure or depend on a separate per-process
global lock.  Note that code generation will fail for unsupported atomic
operations; if you need such an operation, use explicit locking.

Relevant standard
  This is intended to match the Java memory model for shared variables.

Notes for frontends
  This cannot be used for synchronization, but is useful for Java and other
  "safe" languages which need to guarantee that the generated code never
  exhibits undefined behavior. Note that this guarantee is cheap on common
  platforms for accesses of a native width, but can be expensive or unavailable
  for wider accesses, like a 64-bit store on ARM. (A frontend for Java or other
  "safe" languages would normally split a 64-bit store on ARM into two 32-bit
  unordered stores.)

Notes for optimizers
  In terms of the optimizer, this prohibits any transformation that transforms a
  single load into multiple loads, transforms a store into multiple stores,
  narrows a store, or stores a value which would not be stored otherwise.  Some
  examples of unsafe optimizations are narrowing an assignment into a bitfield,
  rematerializing a load, and turning loads and stores into a memcpy
  call. Reordering unordered operations is safe, though, and optimizers should
  take advantage of that because unordered operations are common in languages
  that need them.

Notes for code generation
  These operations are required to be atomic in the sense that if you use
  unordered loads and unordered stores, a load cannot see a value which was
  never stored.  A normal load or store instruction is usually sufficient, but
  note that an unordered load or store cannot be split into multiple
  instructions (or an instruction which does multiple memory operations, like
  ``LDRD`` on ARM without LPAE, or not naturally-aligned ``LDRD`` on LPAE ARM).

Monotonic
---------

Monotonic is the weakest level of atomicity that can be used in synchronization
primitives, although it does not provide any general synchronization. It
essentially guarantees that if you take all the operations affecting a specific
address, a consistent ordering exists.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_relaxed``; see those
  standards for the exact definition.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.  The
  guarantees in terms of synchronization are very weak, so make sure these are
  only used in a pattern which you know is correct.  Generally, these would
  either be used for atomic operations which do not protect other memory (like
  an atomic counter), or along with a ``fence``.

Notes for optimizers
  In terms of the optimizer, this can be treated as a read+write on the relevant
  memory location (and alias analysis will take advantage of that). In addition,
  it is legal to reorder non-atomic and Unordered loads around Monotonic
  loads. CSE/DSE and a few other optimizations are allowed, but Monotonic
  operations are unlikely to be used in ways which would make those
  optimizations useful.

Notes for code generation
  Code generation is essentially the same as that for unordered for loads and
  stores.  No fences are required.  ``cmpxchg`` and ``atomicrmw`` are required
  to appear as a single operation.
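The atomic-counter pattern mentioned in the frontend notes can be sketched
with C++11 relaxed operations, which correspond to Monotonic accesses (the
function name is ours): each increment is atomic and none are lost, but the
counter imposes no ordering on other memory.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// A Monotonic (memory_order_relaxed) counter incremented from many threads.
int relaxed_counter(int nthreads, int iters) {
    std::atomic<int> n{0};
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t)
        threads.emplace_back([&] {
            for (int i = 0; i < iters; ++i)
                n.fetch_add(1, std::memory_order_relaxed);  // atomic, unordered
        });
    for (auto& th : threads) th.join();
    return n.load(std::memory_order_relaxed);  // no increments are lost
}
```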

Acquire
-------

Acquire provides a barrier of the sort necessary to acquire a lock to access
other memory with normal loads and stores.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acquire``. It should also be
  used for C++11/C11 ``memory_order_consume``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call.  It is
  also possible to move stores from before an Acquire load or read-modify-write
  operation to after it, and move non-Acquire loads from before an Acquire
  operation to after it.

Notes for code generation
  Architectures with weak memory ordering (essentially everything relevant today
  except x86 and SPARC) require some sort of fence to maintain the Acquire
  semantics.  The precise fences required vary widely by architecture, but for
  a simple implementation, most architectures provide a barrier which is strong
  enough for everything (``dmb`` on ARM, ``sync`` on PowerPC, etc.).  Putting
  such a fence after the equivalent Monotonic operation is sufficient to
  maintain Acquire semantics for a memory operation.

Release
-------

Release is similar to Acquire, but with a barrier of the sort necessary to
release a lock.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_release``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Release only provides a semantic guarantee when paired with an Acquire
  operation.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call.  It is
  also possible to move loads from after a Release store or read-modify-write
  operation to before it, and move non-Release stores from after a Release
  operation to before it.

Notes for code generation
  See the section on Acquire; a fence before the relevant operation is usually
  sufficient for Release. Note that a store-store fence is not sufficient to
  implement Release semantics; store-store fences are generally not exposed to
  IR because they are extremely difficult to use correctly.
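The Acquire/Release pairing described in the last two sections can be sketched
at the C++11 level (an illustrative sketch, not LLVM API): once the consumer's
Acquire load observes the Release store to the flag, it is guaranteed to also
observe the plain store to the payload.

```cpp
#include <atomic>
#include <thread>

int handoff_demo() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> flag{false};
    std::thread writer([&] {
        payload = 123;                                // plain store
        flag.store(true, std::memory_order_release);  // publishes payload
    });
    while (!flag.load(std::memory_order_acquire)) { } // pairs with the release
    int seen = payload;                               // guaranteed to be 123
    writer.join();
    return seen;
}
```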

AcquireRelease
--------------

AcquireRelease (``acq_rel`` in IR) provides both an Acquire and a Release
barrier (for fences and operations which both read and write memory).

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_acq_rel``.

Notes for frontends
  If you are writing a frontend which uses this directly, use with caution.
  Acquire only provides a semantic guarantee when paired with a Release
  operation, and vice versa.

Notes for optimizers
  In general, optimizers should treat this like a nothrow call; the possible
  optimizations are usually not interesting.

Notes for code generation
  This operation has Acquire and Release semantics; see the sections on Acquire
  and Release.

SequentiallyConsistent
----------------------

SequentiallyConsistent (``seq_cst`` in IR) provides Acquire semantics for loads
and Release semantics for stores. Additionally, it guarantees that a total
ordering exists between all SequentiallyConsistent operations.

Relevant standard
  This corresponds to the C++11/C11 ``memory_order_seq_cst``, Java volatile, and
  the gcc-compatible ``__sync_*`` builtins which do not specify otherwise.

Notes for frontends
  If a frontend is exposing atomic operations, these are much easier to reason
  about for the programmer than other kinds of operations, and using them is
  generally a practical performance tradeoff.

Notes for optimizers
  Optimizers not aware of atomics can treat this like a nothrow call.  For
  SequentiallyConsistent loads and stores, the same reorderings are allowed as
  for Acquire loads and Release stores, except that SequentiallyConsistent
  operations may not be reordered.

Notes for code generation
  SequentiallyConsistent loads minimally require the same barriers as Acquire
  operations and SequentiallyConsistent stores require Release
  barriers. Additionally, the code generator must enforce ordering between
  SequentiallyConsistent stores followed by SequentiallyConsistent loads. This
  is usually done by emitting either a full fence before the loads or a full
  fence after the stores; which is preferred varies by architecture.
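The extra store-load ordering is exactly what the classic store-buffering
(Dekker-style) litmus test relies on. A C++11-level sketch (the function name
is ours): because a single total order over the ``seq_cst`` operations exists,
at least one thread must observe the other's store, so ``r1`` and ``r2``
cannot both be zero.

```cpp
#include <atomic>
#include <thread>

bool seq_cst_demo() {
    std::atomic<int> x{0}, y{0};
    int r1 = -1, r2 = -1;
    // store/load default to memory_order_seq_cst.
    std::thread a([&] { x.store(1); r1 = y.load(); });
    std::thread b([&] { y.store(1); r2 = x.load(); });
    a.join();
    b.join();
    return !(r1 == 0 && r2 == 0);  // guaranteed true under seq_cst
}
```

With weaker orderings (e.g. Acquire/Release), the ``r1 == 0 && r2 == 0``
outcome is permitted on real hardware, which is why a full fence is needed
between a SequentiallyConsistent store and a following load.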

Atomics and IR optimization
===========================

Predicates for optimizer writers to query:

* ``isSimple()``: A load or store which is not volatile or atomic.  This is
  what, for example, memcpyopt would check for operations it might transform.

* ``isUnordered()``: A load or store which is not volatile and at most
  Unordered. This would be checked, for example, by LICM before hoisting an
  operation.

* ``mayReadFromMemory()``/``mayWriteToMemory()``: Existing predicates, but note
  that they return true for any operation which is volatile or at least
  Monotonic.

* ``isStrongerThan`` / ``isAtLeastOrStrongerThan``: These are predicates on
  orderings. They can be useful for passes that are aware of atomics, for
  example to do DSE across a single atomic access, but not across a
  release-acquire pair (see MemoryDependencyAnalysis for an example of this).

* Alias analysis: Note that AA will return ModRef for anything Acquire or
  Release, and for the address accessed by any Monotonic operation.

To support optimizing around atomic operations, make sure you are using the
right predicates; everything should work if that is done.  If your pass should
optimize some atomic operations (Unordered operations in particular), make sure
it doesn't replace an atomic load or store with a non-atomic operation.

Some examples of how optimizations interact with various kinds of atomic
operations:

* ``memcpyopt``: An atomic operation cannot be optimized into part of a
  memcpy/memset; this includes unordered loads/stores.  It can pull operations
  across some atomic operations.

* LICM: Unordered loads/stores can be moved out of a loop.  It just treats
  monotonic operations like a read+write to a memory location, and anything
  stricter than that like a nothrow call.

* DSE: Unordered stores can be DSE'ed like normal stores.  Monotonic stores can
  be DSE'ed in some cases, but it's tricky to reason about, and not especially
  important. It is possible in some cases for DSE to operate across a stronger
  atomic operation, but it is fairly tricky. DSE delegates this reasoning to
  MemoryDependencyAnalysis (which is also used by other passes like GVN).

* Folding a load: Any atomic load from a constant global can be constant-folded,
  because it cannot be observed.  Similar reasoning allows SROA to operate on
  atomic loads and stores.

Atomics and Codegen
===================

Atomic operations are represented in the SelectionDAG with ``ATOMIC_*`` opcodes.
On architectures which use barrier instructions for all atomic ordering (like
ARM), appropriate fences can be emitted by the AtomicExpand Codegen pass if
``setInsertFencesForAtomic()`` was used.

The MachineMemOperand for all atomic operations is currently marked as volatile;
this is not correct in the IR sense of volatile, but CodeGen handles anything
marked volatile very conservatively.  This should get fixed at some point.

One very important property of the atomic operations is that if your backend
supports any inline lock-free atomic operations of a given size, you should
support *ALL* operations of that size in a lock-free manner.

When the target implements atomic ``cmpxchg`` or LL/SC instructions (as most do)
this is trivial: all the other operations can be implemented on top of those
primitives. However, on many older CPUs (e.g. ARMv5, SparcV8, Intel 80386) there
are atomic load and store instructions, but no ``cmpxchg`` or LL/SC. As it is
invalid to implement ``atomic load`` using the native instruction, but
``cmpxchg`` using a library call to a function that uses a mutex, ``atomic
load`` must *also* expand to a library call on such architectures, so that it
can remain atomic with regard to a simultaneous ``cmpxchg``, by using the same
mutex.

AtomicExpandPass can help with that: it will expand all atomic operations to the
proper ``__atomic_*`` libcalls for any size above the maximum set by
``setMaxAtomicSizeInBitsSupported`` (which defaults to 0).

On x86, all atomic loads generate a ``MOV``. SequentiallyConsistent stores
generate an ``XCHG``, other stores generate a ``MOV``. SequentiallyConsistent
fences generate an ``MFENCE``, other fences do not cause any code to be
generated.  ``cmpxchg`` uses the ``LOCK CMPXCHG`` instruction.  ``atomicrmw xchg``
uses ``XCHG``, ``atomicrmw add`` and ``atomicrmw sub`` use ``XADD``, and all
other ``atomicrmw`` operations generate a loop with ``LOCK CMPXCHG``.  Depending
on the users of the result, some ``atomicrmw`` operations can be translated into
operations like ``LOCK AND``, but that does not work in general.

On ARM (before v8), MIPS, and many other RISC architectures, Acquire, Release,
and SequentiallyConsistent semantics require barrier instructions for every such
operation. Loads and stores generate normal instructions.  ``cmpxchg`` and
``atomicrmw`` can be represented using a loop with LL/SC-style instructions
which take some sort of exclusive lock on a cache line (``LDREX`` and ``STREX``
on ARM, etc.).

It is often easiest for backends to use AtomicExpandPass to lower some of the
atomic constructs. Here are some lowerings it can do:

* cmpxchg -> loop with load-linked/store-conditional
  by overriding ``shouldExpandAtomicCmpXchgInIR()``, ``emitLoadLinked()``,
  ``emitStoreConditional()``
* large loads/stores -> ll-sc/cmpxchg
  by overriding ``shouldExpandAtomicStoreInIR()``/``shouldExpandAtomicLoadInIR()``
* strong atomic accesses -> monotonic accesses + fences by overriding
  ``shouldInsertFencesForAtomic()``, ``emitLeadingFence()``, and
  ``emitTrailingFence()``
* atomic rmw -> loop with cmpxchg or load-linked/store-conditional
  by overriding ``expandAtomicRMWInIR()``
* expansion to ``__atomic_*`` libcalls for unsupported sizes

For an example of all of these, look at the ARM backend.

Libcalls: __atomic_*
====================

There are two kinds of atomic library calls that are generated by LLVM. Please
note that both sets of library functions somewhat confusingly share the names of
builtin functions defined by clang. Despite this, the library functions are
not directly related to the builtins: it is *not* the case that ``__atomic_*``
builtins lower to ``__atomic_*`` library calls and ``__sync_*`` builtins lower
to ``__sync_*`` library calls.

The first set of library functions are named ``__atomic_*``. This set has been
"standardized" by GCC, and is described below. (See also `GCC's documentation
<https://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary>`_)

LLVM's AtomicExpandPass will translate atomic operations on data sizes above
``MaxAtomicSizeInBitsSupported`` into calls to these functions.

There are four generic functions, which can be called with data of any size or
alignment::

   void __atomic_load(size_t size, void *ptr, void *ret, int ordering)
   void __atomic_store(size_t size, void *ptr, void *val, int ordering)
   void __atomic_exchange(size_t size, void *ptr, void *val, void *ret, int ordering)
   bool __atomic_compare_exchange(size_t size, void *ptr, void *expected, void *desired, int success_order, int failure_order)

There are also size-specialized versions of the above functions, which can only
be used with *naturally-aligned* pointers of the appropriate size. In the
signatures below, "N" is one of 1, 2, 4, 8, and 16, and "iN" is the appropriate
integer type of that size; if no such integer type exists, the specialization
cannot be used::

   iN __atomic_load_N(iN *ptr, int ordering)
   void __atomic_store_N(iN *ptr, iN val, int ordering)
   iN __atomic_exchange_N(iN *ptr, iN val, int ordering)
   bool __atomic_compare_exchange_N(iN *ptr, iN *expected, iN desired, int success_order, int failure_order)

Finally there are some read-modify-write functions, which are only available in
the size-specific variants (any other sizes use a ``__atomic_compare_exchange``
loop)::

   iN __atomic_fetch_add_N(iN *ptr, iN val, int ordering)
   iN __atomic_fetch_sub_N(iN *ptr, iN val, int ordering)
   iN __atomic_fetch_and_N(iN *ptr, iN val, int ordering)
   iN __atomic_fetch_or_N(iN *ptr, iN val, int ordering)
   iN __atomic_fetch_xor_N(iN *ptr, iN val, int ordering)
   iN __atomic_fetch_nand_N(iN *ptr, iN val, int ordering)

This set of library functions has some interesting implementation requirements
to take note of:

- They support all sizes and alignments -- including those which cannot be
  implemented natively on any existing hardware. Therefore, they will certainly
  use mutexes for some sizes/alignments.

- As a consequence, they cannot be shipped in a statically linked
  compiler-support library, as they have state which must be shared amongst all
  DSOs loaded in the program. They must be provided in a shared library used by
  all objects.

- The set of atomic sizes supported lock-free must be a superset of the sizes
  any compiler can emit. That is: if a new compiler introduces support for
  inline lock-free atomics of size N, the ``__atomic_*`` functions must also have a
  lock-free implementation for size N. This is a requirement so that code
  produced by an old compiler (which will have called the ``__atomic_*`` function)
  interoperates with code produced by the new compiler (which will use the
  native atomic instruction).

Note that it's possible to write an entirely target-independent implementation
of these library functions by using the compiler atomic builtins themselves to
implement the operations on naturally-aligned pointers of supported sizes, and a
generic mutex implementation otherwise.

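A hedged sketch of that strategy follows. The function name
``generic_atomic_load`` is hypothetical, a single global lock stands in for
the hashed lock table a real implementation would use, and every ordering is
conservatively treated as sequentially consistent:

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fast path via the builtins for naturally-aligned power-of-two sizes;
 * mutex fallback for everything else, as described above. */
static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;

void generic_atomic_load(size_t size, void *ptr, void *ret, int order) {
    (void)order;  /* sketch: treat every ordering as seq_cst */
    switch (size) {
    case 4:
        if (((uintptr_t)ptr % 4) == 0) {
            *(uint32_t *)ret = __atomic_load_n((uint32_t *)ptr,
                                               __ATOMIC_SEQ_CST);
            return;
        }
        break;
    case 8:
        if (((uintptr_t)ptr % 8) == 0) {
            *(uint64_t *)ret = __atomic_load_n((uint64_t *)ptr,
                                               __ATOMIC_SEQ_CST);
            return;
        }
        break;
    }
    /* Slow path: any size/alignment, made atomic by the shared lock
     * (which every other slow-path operation must also take). */
    pthread_mutex_lock(&fallback_lock);
    memcpy(ret, ptr, size);
    pthread_mutex_unlock(&fallback_lock);
}
```

The shared-lock requirement in the second bullet above is visible here:
correctness depends on all operations on a given object agreeing on the lock.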
Libcalls: __sync_*
==================

Some targets or OS/target combinations can support lock-free atomics, but for
various reasons, it is not practical to emit the instructions inline.

There are two typical examples of this.

Some CPUs support multiple instruction sets which can be switched back and forth
on function-call boundaries. For example, MIPS supports the MIPS16 ISA, which
has a smaller instruction encoding than the usual MIPS32 ISA. ARM, similarly,
has the Thumb ISA. In MIPS16 and earlier versions of Thumb, the atomic
instructions are not encodable. However, those instructions are available via a
function call to a function with the longer encoding.

Additionally, a few OS/target pairs provide kernel-supported lock-free
atomics. ARM/Linux is an example of this: the kernel `provides
<https://www.kernel.org/doc/Documentation/arm/kernel_user_helpers.txt>`_ a
function which on older CPUs contains a "magically-restartable" atomic sequence
(which looks atomic so long as there's only one CPU), and contains actual atomic
instructions on newer multicore models. This sort of functionality can typically
be provided on any architecture, if all CPUs which are missing atomic
compare-and-swap support are uniprocessor (no SMP). This is almost always the
case. The only common architecture without that property is SPARC -- SPARCV8 SMP
systems were common, yet SPARCV8 doesn't support any sort of compare-and-swap
operation.

In either of these cases, the Target in LLVM can claim support for atomics of an
appropriate size, and then implement some subset of the operations via libcalls
to a ``__sync_*`` function. Such functions *must* not use locks in their
implementation, because unlike the ``__atomic_*`` routines used by
AtomicExpandPass, these may be mixed-and-matched with native instructions by the
target lowering.

Further, these routines do not need to be shared, as they are stateless. So,
there is no issue with having multiple copies included in one binary. Thus,
typically these routines are implemented by the statically-linked compiler
runtime support library.

LLVM will emit a call to an appropriate ``__sync_*`` routine if the target
ISelLowering code has set the corresponding ``ATOMIC_CMPXCHG``, ``ATOMIC_SWAP``,
or ``ATOMIC_LOAD_*`` operation to "Expand", and if it has opted into the
availability of those library functions via a call to ``initSyncLibcalls()``.

The full set of functions that may be called by LLVM is (for ``N`` being 1, 2,
4, 8, or 16)::

  iN __sync_val_compare_and_swap_N(iN *ptr, iN expected, iN desired)
  iN __sync_lock_test_and_set_N(iN *ptr, iN val)
  iN __sync_fetch_and_add_N(iN *ptr, iN val)
  iN __sync_fetch_and_sub_N(iN *ptr, iN val)
  iN __sync_fetch_and_and_N(iN *ptr, iN val)
  iN __sync_fetch_and_or_N(iN *ptr, iN val)
  iN __sync_fetch_and_xor_N(iN *ptr, iN val)
  iN __sync_fetch_and_nand_N(iN *ptr, iN val)
  iN __sync_fetch_and_max_N(iN *ptr, iN val)
  iN __sync_fetch_and_umax_N(iN *ptr, iN val)
  iN __sync_fetch_and_min_N(iN *ptr, iN val)
  iN __sync_fetch_and_umin_N(iN *ptr, iN val)

This list doesn't include any function for atomic load or store; all known
architectures support atomic loads and stores directly (possibly by emitting a
fence on either side of a normal load or store).

There's also, somewhat separately, the possibility to lower ``ATOMIC_FENCE`` to
``__sync_synchronize()``. This may happen or not happen independently of all the
above, controlled purely by ``setOperationAction(ISD::ATOMIC_FENCE, ...)``.