ATen "native" functions are the modern mechanism for adding operators and
functions to ATen.  Native functions
are declared in `native_functions.yaml` and have implementations defined
in one of the `cpp` files in this directory.

Like all ATen methods/functions, native functions are made available
from both ATen's C++ and Python APIs.  In C++, they are made available
both as methods on `Tensor` (`t.mymeth()`) and as functions in the ATen
namespace (`at::myfunc()`).  In PyTorch, they are made available as
methods on `Variable` or as functions on `torch._C._FunctionBase`.
(It is the user's responsibility to re-export these functions in
a more user-facing module.)

The rest of this document describes how to implement an ATen function.

## Registering a function in `native_functions.yaml`

Every native function must have an entry in
`native_functions.yaml`.  The format can be summarized as:

```
- func: func_name(ArgType arg0[=default], ArgType arg1[=default], ...) -> Return
  variants: function, method
  dispatch:
    CPU: func_cpu
    CUDA: func_cuda
```

Each component is described in more detail below:

### `func`

```
- func: func_name[.overload_name](ArgType arg0[=default], ArgType arg1[=default], ...) -> Return
```

The `func` entry is a string describing the name of the function and its type
signature.

**Argument types.** These types are permissible as ArgType:

- `Tensor`.  A `Tensor` argument translates into a C++ argument of type `const Tensor&`
  (except when the argument is "inplace"; in this case, it is simply `Tensor&`).
  A trailing `?`, as in `Tensor?`, indicates that the tensor argument is optional
  and may be omitted by passing `std::nullopt`.  When a function takes multiple
  `Tensor` arguments, these tensors are assumed to be the same type (e.g.,
  if one argument is a `FloatTensor`, all other arguments are checked
  to be `FloatTensor`s).
  `Tensor` or `Tensor?` must sometimes be annotated to indicate aliasing and mutability.
  In general, annotations can be defined via the following situations:
  - `Tensor(a)` - `a` is a set of Tensors that may alias to the same data. The set could have a size of one.
  - `Tensor(a!)` - members of `a` may be written to, thus mutating the underlying data.
  - `Tensor(a! -> a|b)` - Tensor is in set `a`, written to, and after the write is in set `a` AND `b`.
  For more details on when and why this needs to happen, please see the section on annotations.
- `Tensor[]`.  A `Tensor[]` argument translates into a C++ argument of type `ArrayRef<Tensor>`
  (a.k.a. `TensorList`).
- `int[]`.  `int[]` accepts an optional length specifier, e.g., `int[2]`, which
  has no effect in C++ but extends our Python bindings to accept a bare number, which will be
  expanded into an appropriately sized list by repeating the number.
- `int`. Think about this like a Python int. This is translated into a C++ argument of type `int64_t`.
- `float`. Think about this like a Python `float`. It is translated into a C++ argument of type `double`.
- `bool`
- `str`.  It is translated into a C++ argument of non-owning type `c10::string_view`.
- `Scalar`. `Scalar` supports binding to any numerical type from Python, including integral types,
  floating point types, and zero dimensional tensors. `int` and `float` bind to the corresponding Python
  numerical types. However, you probably don't want to use `Scalar`;
  `float` and `int` argument types should suffice for most algorithms
  (you should only use `Scalar` if the operator truly may accept either
  type).
- `Generator?`, the state for a random number generator.
- `bool[N]` (where N is `1-4`).
- `*` is a special sentinel argument, which doesn't translate into an actual
  argument, but indicates that in the Python bindings, any subsequent arguments
  must be specified as keyword arguments (and cannot be provided positionally).
- `?` is a trailing question mark that annotates an argument to be an optional type. Grep for
  `optional` to find some example usages. In general, most functions will not need to use
  this, but there are some cases where we want to use optionals for different types:
    - You want to pass a `None` to an ATen function/method from Python and handle the
      None type on the C++ side. For example, `clamp(Tensor self, Scalar? min=None, Scalar? max=None)`
      can take `None` for its `min` and `max` parameters, but does not dispatch to different
      backends if one of the parameters is `None`. An optional type can accept a `None` value
      (`nullopt` in C++) from Python and use the [C++ optional class](https://en.cppreference.com/w/cpp/utility/optional) to interact with the parameters.
    - You want a default value, which is fine in Python, but would cause ambiguity in C++.
      For example, `norm(Tensor self, Scalar p=2, int dim, bool keepdim=False)` would
      cause ambiguity in C++ since its default args must be adjacent (`p` could not
      have a default value when `dim` does not). Therefore, we need to make `p` an
      optional Scalar, and treat it as 2 when `p` is not passed in (nullopt).
    - You want a value to default to the same value as another argument (this cannot be
      expressed in C++ default arguments).
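
As a concrete (hypothetical) illustration, the following entry combines several of the argument types above; the operator name and its arguments are made up for this example:

```
# `Tensor?` accepts None (std::nullopt in C++), `int[2]` accepts a bare int that is
# expanded to [stride, stride], and everything after `*` is keyword-only in Python.
- func: my_example_op(Tensor self, Tensor? weight=None, int[2] stride=1, *, float eps=1e-05) -> Tensor
```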

Functions with no tensor inputs are called *factory functions*, and
are handled specially by code generation.  If your function is behaving
differently than another example, check first and see if one is a
factory while another is not. In some rare cases, a factory function might have a
tensor argument. In this case, mark it with `category_override: factory`
explicitly.

**Argument names.** Argument names are meaningful; downstream binding code may make use of the specific
argument name you provide, and a rename of an argument name is considered a BC-breaking
change (e.g., you will probably need to update `tools/autograd/derivatives.yaml` at
least, and it may affect Python keyword arguments). For more details please see the section on `variants`.

As a convention we use 'out' to indicate an output argument. This aligns with the
Python bindings. Even if a function might not be used in the Python bindings, we
still advise following this convention. Check the generated code when making a change
to make sure you're not breaking the API when renaming an argument of an
existing function.

**Defaults.** Any suffix of arguments can have a default value defined;
these default values translate into C++/Python default values which
are applied when those positional arguments are not specified.

Here are the supported default values:

* Numbers (e.g., `0` or `5.0`) for `int`, `float` and `int[]`
  with an explicit length (e.g., `int[2]`)--in the case of `int[]`
  a number is replicated to fill the length (e.g., `int[2] x=2`
  is equivalent to `int[2] x=[2,2]`).
* Lists of numbers (e.g., `[0, 0]`) for `IntList`.
* Booleans (e.g., `True`) for `bool`.
* Empty initializer lists (e.g., `[]`) for `Tensor` (this implicitly changes
  a `Tensor` argument to accept undefined tensors).
* `None` for pointer types (e.g., `Generator?`).

**Returns.** The following are permissible on Return:

Non-tuple return:
```
ReturnType [retarg0]
```

Tuple return:
```
(ReturnType [retarg0], ReturnType [retarg1], ...)
```

The following are permissible on ReturnType:
- `Tensor` and `Tensor[]`, which translate into the C++ types `Tensor` and `std::vector<Tensor>`,
  respectively (unless the operation is in-place, in which case the return type
  is `Tensor&`).
- A tuple of any number of `Tensor`, e.g., `(Tensor, Tensor)`, translating into
  the C++ `std::tuple<Tensor, Tensor>`.

If you need a type that is not in this list, it may be possible to extend ATen's
code generation to support it.  ATen's philosophy on types to support is that it supports
only simple, universal types, as well as a handful of fundamental Tensor structures
(e.g., `Tensor` and `Generator?`), because these types can be easily ported to any language
bound to ATen (in practice, C++ and Python.)

Return also supports specifying (optional) return argument names. These serve
two functions:

- They let you easily write derivatives in terms of return arguments in
  `tools/autograd/derivatives.yaml`.

- They correspond to the named fields by which the output can be referred to from
  Python.  (This means that changing a return argument name is
  BC-breaking, be careful!)

Note that argument type modifiers such as defaults and optional are not currently supported on Return.
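
For example, an entry along the lines of the existing `max.dim` declaration names its outputs, so Python callers can access the named fields of the result and `derivatives.yaml` can refer to `values` and `indices` directly (signature abridged; consult `native_functions.yaml` for the authoritative form):

```
- func: max.dim(Tensor self, int dim, bool keepdim=False) -> (Tensor values, Tensor indices)
```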


**Overloads.** You can register multiple functions with the same name and different
function signatures if you give them unique overload names. An overload name
is specified after the function name, separated by a dot.

Overload names do not have to be globally unique, but must be unique in the set
of all overloads for the same function. Overload names cannot be changed for
backwards compatibility reasons. Please try to make overload names semantically
meaningful. An overload name that just enumerates all the argument types isn't
helpful. In many cases, a semantic name is clear from what the overload is doing
differently. As a fallback, you can use the name or type of the first differing
argument as an overload name.

If you add a new overload to an existing function, please leave the existing
overload names as they are (for backwards compatibility), but give the new
overload a new, unique name.  Although overload names are not directly
used by the Python or C++ APIs, they are public API surface for external
backends (who register to specific overload names) and deployed mobile
models (which use overload names as part of the serialization format.)

Not specifying an overload name is equivalent to specifying an empty overload
name. If you add a new function with multiple overloads, give them unique
overload names; at most one overload is allowed to have an empty overload name.
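
For instance, the `add` overloads are distinguished by the type of their second argument (signatures abbreviated here; see `native_functions.yaml` for the exact entries):

```
- func: add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor
- func: add.Scalar(Tensor self, Scalar other, Scalar alpha=1) -> Tensor
```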


The declarations also support the following attributes.

**Namespaces.** Users can register operators in namespaces other than `aten` by simply putting a custom namespace before the function name. Currently, nested namespaces are not supported in function names. If not specified, all functions are registered in the `aten` namespace.

For example, to register `my_op` in the `custom` namespace, we can write:
```
- func: custom::my_op(Tensor(a) self, ...) -> Tensor(a)
  variants: function, method
  dispatch:
    CPU: my_op_cpu
    CUDA: my_op_cuda
```

Note that the one-off `TORCH_LIBRARY` API can achieve the same goal of registering an operator in a custom namespace. Compared with that API, using a custom namespace in `native_functions.yaml` is useful in cases where the function does not really belong to ATen but is widely used, and it is preferable to have a shared place to register it.

### `variants`

```
variants: function, method
```

Controls whether a Tensor method (`t.foo()`) or a namespace function (`at::foo()`) is
generated as a result of this declaration.  If the declaration is a method,
you must have an argument `Tensor self` at some position in the method;
in the method variant this argument will be elided from the argument
list.  For example, given the declaration `where(BoolTensor cond, Tensor self, Tensor other)`,
this generates the function `at::where(cond, self, other)` and the method
`self.where(cond, other)`.
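
Expressed as a schematic `native_functions.yaml` entry (based on the declaration discussed above; the real `where` entries in `native_functions.yaml` differ), this would be:

```
- func: where(BoolTensor cond, Tensor self, Tensor other) -> Tensor
  variants: function, method
```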

By default, ATen generates only the function variant for a native function.
When should you also generate a method variant? Tensor operations as methods
are appropriate for "core" Tensor operations (e.g., add, sub, etc.), but not for
more complicated neural network layers (e.g., `conv2d`) and internal functions
designed specifically for binding (e.g., `cudnn_convolution`).

As we progress along our schema unification of the `func` schema with the JIT
signature schema, we must introduce features that allow us to increase compliance.
One of these features is Tensor annotations. As of now, we use naming conventions
to indicate whether an argument of a function is going to be mutated and returned.

### `annotations`

There are two typical situations in which we mutate the memory of an argument in the Python
frontend:
a) for an in-place operation such as `self.abs_()`
b) for a function with an output keyword argument such as `torch.abs(input, out=None)`.

In order to provide implementations for these Python functions, the legacy schema
requires C++ implementations for three situations: `abs(Tensor self) -> Tensor`,
`abs_(Tensor self) -> Tensor` and `abs_out(Tensor out, Tensor self) -> Tensor`.

Now, as we move towards the unification, we start to use a different syntax to represent
this by using annotations. In the end we still translate to the legacy schema for the downstream
consumers such as the C++ code generation, but this will soon change.

If two Tensors carry the same annotation, they both *may* represent the same memory.
A write annotation, as indicated by an exclamation mark, indicates that they both *may*
also be written to.

Let's revisit the previous native function declarations and see the conventions of adding annotations.
  - `abs(Tensor self) -> Tensor` stays the same as it will always allocate new memory.
  - `abs_(Tensor(a!) self) -> Tensor(a!)`
    `self` may be written to and returned. Further, the annotation indicates that the return value
    may alias the input. This indicates an inplace function and by convention ends in a single '\_'.
  - `abs(Tensor self, *, Tensor(a!) out) -> Tensor(a!)`
    In the Python frontend `out` can be passed as a keyword argument and may be written to.
    In this case it indicates the schema for a function that must accept `out` as this does not
    provide a default argument. The idea behind representing this as an optional argument is to
    document the intended usage. This maps to the legacy `abs_out(Tensor out, Tensor self) -> Tensor`.
    As with the legacy `_out` function, you must call the argument `Tensor out` or `Tensor out0`,
    `Tensor out1` in the context of multiple arguments.
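
Put together, these three variants would appear in `native_functions.yaml` roughly as follows (an abridged sketch; the real `abs` entries carry additional fields such as `dispatch`):

```
- func: abs(Tensor self) -> Tensor
  variants: function, method

- func: abs_(Tensor(a!) self) -> Tensor(a!)
  variants: function, method

- func: abs.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
```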

There is also another situation in which we use annotations, namely views.
  - `transpose(Tensor(a) self, int dim0, int dim1) -> Tensor(a)`
    An alias to the memory represented by `self` may also be returned; however, it is not mutated.

When Tensor views are contained in a Tensor list, we need to represent that the output list
contains Tensors that alias the input.
  - `func: chunk(Tensor(a -> *) self, int chunks, int dim=0) -> Tensor(a)[]`
We assume lists contain memory which aliases the heap, so in order to correctly set up the aliasing
relationship between the output and input, we annotate that the input Tensor enters the wildcard set `(a -> *)`.
For more details, see the JIT [README](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md#aliasing-and-mutation-annotations-in-functionschema).

We have some asserts that check whether a developer uses these annotations correctly and that trigger
if they don't. For example, any out function must use the `(a!)` annotation as described above.
If this causes a lot of confusion, please add @cpuhrsch to your PR.

### `dispatch`

```
dispatch:
    CPU: func_cpu
    CUDA: func_cuda
```

This specifies the actual name of the function you want to dispatch to, so you
can dispatch to different functions depending on which backend the passed tensors
belong to.  Note that custom namespaces are supported in these names; this is useful when the listed native function lives in a namespace other than the default `at::native`. Currently we support nested namespaces up to two levels deep. For example:
```
dispatch:
    CPU: custom::ns::func_cpu
```
The example above indicates that the native function can be found in the `custom::ns::native` namespace (the trailing `::native` is appended automatically).

If the dispatch table is omitted, we assume a default dispatch
table:

```
# overload is ignored
func: func.overload(...) -> ...
dispatch:
    CompositeImplicitAutograd: func

# overload is ignored, but out functions get suffixed with _out in their name
# (NB: no out functions in PyTorch today actually support autograd, but if they
# did, you could call them here and autograd would be inferred)
func: func.out_overload(...) -> ...
dispatch:
    CompositeImplicitAutograd: func_out
```

If two backends have the same dispatch function, you can write `CPU, CUDA: func`
to reuse the same function name in both cases.
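
For example:

```
dispatch:
    CPU, CUDA: func
```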

Available backend options can be found by searching `dispatch_keys` in
[codegen](https://github.com/pytorch/pytorch/blob/master/torchgen/gen.py).
There are also three special "generic" backends:

  - `CompositeExplicitAutograd` (previously known as `DefaultBackend`):
    implementations of kernels that work for all backends, but require an
    explicit definition of the backward function in `derivatives.yaml` to support autograd.
    The most typical use of this key is for delegating functions; i.e.,
    functions that do a very small amount of work and then delegate to another
    operator to do the actual heavy lifting.  Under the hood, registering a
    kernel to `CompositeExplicitAutograd` is equivalent to registering that
    kernel to every backend (e.g., `CPU, CUDA`). Note: kernels which call
    DispatchStub should NOT be registered as CompositeExplicitAutograd, as
    DispatchStub only works for `CPU, CUDA`.

  - `CompositeExplicitAutogradNonFunctional`:
    Similar to CompositeExplicitAutograd, but this key should be used if:
    (1) Your kernel is written for a non-aliasing operator.
    (2) *and* it calls internally into an aliasing operator.
    An example of this is select_backward, which is non-aliasing, but decomposes into select.
    We would like to distinguish between "ordinary" CompositeExplicitAutograd kernels
    and these kernels, because some backends would not like
    to decompose a non-aliasing op into an aliasing op.
    LazyTensor + XLA are the two current examples of this - since they operate on a functional IR,
    they would prefer to directly implement a non-aliasing operator with their own kernel,
    instead of using a decomposition that results in more aliasing operators.

  - `CompositeImplicitAutograd` (previously known as `Math`): implementations of
    kernels that work for all backends, and can also implicitly support autograd,
    because all of the operations they call support autograd.  Direct use of
    this key should be rare: if you provide no dispatch table, we default to
    registering your kernel as `CompositeImplicitAutograd`.  Explicitly adding
    this key to an existing dispatch table may be useful if you have specialized
    CPU and CUDA implementations, but you might want to provide a fallback
    lowering for external backends that may not have a specialized
    implementation.

Functions registered to composite backends should work for any backend, if the
nested functions they call work for those backends.

For example, suppose `my_op` can be implemented in the following way:

```
at::Tensor my_op(const Tensor& self, const Tensor& other) {
  return self + 2 * other;
}
```

If the inference kernels and derivative formulas for the operators `+` and `*` are already known to the system,
you can just register `my_op` to `CompositeImplicitAutograd` and both inference and autograd will just work.
Although it seems we only write down the inference formula here, the PyTorch autograd system will correctly
set up the backward for `my_op` using the chain rule and the derivatives of the `+` and `*` operators.
In other words, `d_out/d_self = 1; d_out/d_other = 2` can be derived automatically from
the `my_op` inference kernel. Of course, if we don't have a derivative formula defined for either `+` or `*`,
the backward of `my_op` can no longer be derived automatically.

Whether to use implicit or explicit autograd for your kernel can be decided by the following steps:
1. If you can, always start with a `CompositeImplicitAutograd` kernel that's composable from existing operators.
2. If you don't want to use the derived gradient formula from the `CompositeImplicitAutograd` kernel for autograd, either to
   get better performance or better numerical stability, you should register the kernel with `CompositeExplicitAutograd`
   so that it's only used in inference.
   Later for autograd, depending on whether your autograd kernel works for all backends or not,
   you can put it under the alias `Autograd` or under specific keys like `AutogradCPU`.
3. If you prefer to write backend-specific kernels, use reserved dispatch keys for your backend instead,
   e.g. `CPU/AutogradCPU`.

**Important**: because a `CompositeImplicitAutograd` kernel is implicitly registered for ops with no `dispatch:` section,
when you add a backend-specific kernel (and hence a `dispatch:` section) to one of these, you **must** also
add a `CompositeImplicitAutograd:` entry that names the old kernel implementation (it's named after the op, with `_<overload name>`
added if applicable), so that it's still available for other backends to use.
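
As a sketch, using a hypothetical `my_op` whose kernel was previously registered implicitly (the names `my_op` and `my_op_cuda` are made up for illustration):

```
# Before: no dispatch section, so the kernel is implicitly CompositeImplicitAutograd.
- func: my_op(Tensor self) -> Tensor

# After adding a specialized CUDA kernel, keep the old kernel reachable for other backends:
- func: my_op(Tensor self) -> Tensor
  dispatch:
    CompositeImplicitAutograd: my_op
    CUDA: my_op_cuda
```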

If you implemented a native function in C++ and want to find out which dispatch keyword
should be used in native_functions.yaml, please [follow the steps in dispatch keywords](#choosing-the-right-dispatch-keyword).

### Composite Compliance

Definition: a "composite function" is an operator registered as
CompositeImplicitAutograd or a (Python or C++) function that consists of PyTorch
operations. Examples of the latter include backward formulas and forward-mode AD formulas.

Composite functions defined in the PyTorch library MUST work for most, if not
all, backends/subclasses. This means that we impose a set of constraints that make it more
difficult to write composite functions inside PyTorch library code than it is for users
writing PyTorch code.

If you wish to do something that is banned (you may wish to do this for perf
reasons), please write a backwards formula for your function so it is no longer
CompositeImplicitAutograd, or hide parts of the function in a new aten operator that is not CompositeImplicitAutograd.

Composite functions may not:
- call `resize_` or moral equivalents. These are tricky to handle for
many backends, like vmap and meta.
- call `out=` operations. These are impossible to handle for vmap and can cause
dispatch-to-python objects to lose their subclassing.
- change the metadata of a Tensor without performing dispatches. Examples of these
operations are directly accessing the TensorImpl API to modify the
sizes/strides/metadata of a Tensor.
- in the same vein as the last point, `data_ptr` access or `item` access are not
allowed. These operations do not go through the dispatcher.
- `copy_` is a marginal case. If you're able to rewrite your operation without
`copy_` you should definitely do so; this should be trivial if you're not copy-ing
into a view. Otherwise, it is fine to leave the code as-is.

We have CompositeImplicitAutograd compliance tests in `test/test_ops.py`. These
tests aren't perfect (it's pretty difficult to check for all of the above), so if
something looks wrong, please shout.

### `device_guard`

```
device_guard: False
```

By default, ATen code generation will generate a DeviceGuard invocation,
which will ensure that kernel code will run with the current device set
to match the device of the first Tensor argument (or first tensor of
the first Tensor[] argument, if the function takes a list of tensors).
For the most part, this means kernel authors do not have to worry about
setting devices.

However, in some cases, setting the device is unnecessary, because,
e.g., you call a function that already manages device guard setting, or
your function simply does not interact with any devices. In
that case, code generation of the device guard can be disabled by adding
`device_guard: False` to your function definition.

### `device_check`

```
device_check: NoCheck
```

By default, ATen code generation will generate a device check,
which will ensure that all the tensor parameters passed to the kernel are
on the same device.

However, in some cases, checking the device is unnecessary, because,
e.g., you call a function that is allowed to work on multiple devices.
In that case, code generation of the device check can be disabled by adding
`device_check: NoCheck` to your function definition.

### `manual_kernel_registration`

```
manual_kernel_registration: True
```

With this flag set, we will not generate code to automatically register the C++ operator implementation
to TypeDefault (catchAll dispatch key) with the dispatcher.
It doesn't make sense to have both a `dispatch` section and `manual_kernel_registration: True` for the same op.
You can find the manual registrations in torch/csrc/autograd/VariableTypeManual.cpp.
Currently, ops that have this field set to True should match `MANUAL_CATCHALL` in tools/autograd/gen_variable_type.py
(it can be a superset of `MANUAL_CATCHALL`, but we don't have a use case for that).
This field should only be used rarely.

### `use_const_ref_for_mutable_tensors`

```
use_const_ref_for_mutable_tensors: True
```

With this flag set, we will generate arguments for Tensors whose underlying data may change as
`const Tensor&` (or similar), just like we would for other Tensors. Previously, we generated these
as `Tensor &`, which 1) allowed changing which `TensorImpl` the `Tensor` itself referred to and 2)
was not necessary to allow the underlying data to change. (This was like using `T * const` when we
wanted `const T*`.)

### `autogen`

```
- func: my_op_(Tensor(a!) self) -> Tensor(a!)
...
  autogen: my_op, my_op.out
```

The `autogen` keyword specifies which native functions the codegen system should generate
implementations for.
* For an in-place variant of a native function (op name ends with an `_`), we will generate a functional
variant and an out= variant.
* If a functional variant is given, we generate an out= variant.
* We don't support `autogen` for view ops, ops that bypass the dispatcher, or composite ops.

We also generate kernels for generated ops, which merely copy and return the result from the base ops.
These generated kernels can be found in `<gen-out>/aten/src/ATen/CompositeViewCopyKernels.cpp`.

Also notice that for new operators being added to `native_functions.yaml`, if they satisfy the requirements
mentioned above, they should include the `autogen` keyword, since functionalization depends on it. We will
enforce this in codegen.


## Writing an implementation in C++

Implementations of native functions go in an appropriate C++ file in the
`native/` directory (they are organized roughly by topic, but there is no
semantic meaning to their organization aside from the `cuda` directory,
which is the only place the build system knows how to build `cu` files.)
To write a native function, you only need to write a C++
implementation (no header necessary) with a matching signature to
the generated header from the ATen metadata.  There are many
simple native functions; take a look at some of them to see what to do.

Although writing an ATen function is mostly writing the algorithm you want
to implement, there are some less obvious details you should also consider.

### Will your function be automatically differentiable?

If you are writing a pair of functions `foo` and `foo_backward`, with
the intent that `foo_backward` implements the derivative of `foo`, then
your implementation of `foo` is probably not automatically differentiable:
it might make use of functions like `data_ptr()` or it dispatches differently
depending on whether it's operating on CPU or CUDA tensors.  Once you write these two functions,
you will have to write an entry correlating them together in
`tools/autograd/derivatives.yaml`.
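
Such an entry, sketched for a hypothetical `foo`/`foo_backward` pair, looks roughly like this (see `tools/autograd/derivatives.yaml` itself for the exact conventions):

```
- name: foo(Tensor self, Tensor other) -> Tensor
  self: foo_backward(grad, self, other)
```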

However, in some situations, you can write a function in ATen and it
will be automatically differentiated! This can be the case if the function implementation
only calls other operations which are themselves differentiable.  In this
case, you don't have to write an entry in `tools/autograd/derivatives.yaml`.

### Choosing the right dispatch keyword

After writing a native function in C++, it's important to think about which dispatch keyword
to use in native_functions.yaml as it gives the dispatcher information about backend and autograd support
of the implementation.

Here are the steps to follow to decide the right dispatch keyword:

1. Think about inference: does your kernel work for all backends?

    - No: you're likely providing different kernels for different backends, e.g.
      backend-dependent logic is used in the implementation or it's implemented through DispatchStub.
      DispatchStub only supports a backend if you explicitly provide a kernel through `REGISTER_DISPATCH`.
      Typically it only supports a few in-tree backends like CPU, CUDA, QuantizedCPU, etc., but not
      out-of-tree backends like XLA.
      Write a dispatch section, enumerate all supported backends and point them to the implementations.
      ```
      dispatch:
        CPU: kernel_cpu
        CUDA: kernel_cuda
        QuantizedCPU: kernel_quantized_cpu
      ```

      You're done. Now this op will be called in `CPU/CUDA/QuantizedCPU` backend inference!

      Note: to support training, you're required to write a formula in
      derivatives.yaml since your backend implementations don't support autograd.

    - Yes: you're likely calling other `at::` ops in the implementation. Go to step 2.

2. Think about training: does your kernel support autograd? [check autograd support](#will-your-function-be-automatically-differentiable)
    - Yes: in other words, you're providing a `CompositeImplicitAutograd` kernel which supports both inference and autograd.
      To use autograd support for training, simply skip adding a dispatch
      section and you're done. This will allow this op to be correctly
      registered for both inference and training.

    - Yes, but you still want to provide a numerically stable gradient formula instead of using autograd. In that case, write
      ```
      dispatch:
        CompositeExplicitAutograd: kernel
      ```

      You're done. This op will be called in inference for all backends.

      Note: to support training you're required to add an autograd formula,
      or it'll error out in the backward pass when called with a Tensor that has requires_grad=True.

    - No: ops in this category are mainly using `_out` boilerplate where the out version doesn't have a derivative
      formula defined. For example:
      ```
      Tensor& sign_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, sign_stub); }
      Tensor sign(const Tensor& self) { return unary_op_impl(self, at::sign_out); }
      Tensor& sign_(Tensor& self) { return unary_op_impl_(self, at::sign_out); }
      ```

      `sign_out` uses DispatchStub so the supported backends are enumerated in its dispatch section.
      For `sign` and `sign_`, write
      ```
      dispatch:
        CompositeExplicitAutograd: kernel
      ```

      You're done. This op will be called in inference for all backends.

      Note: to support training you're required to add an autograd formula for `sign`,
      or it'll error out in the backward pass when called with a Tensor that has requires_grad=True.

      Note: the current plan on record for ops using this boilerplate is to replace `at::` with `at::native` in
      the implementations and add a dispatch section with device keywords instead.
3. Validate that the computed dispatch table matches what you want. You can use `PythonDispatcher` provided in
[torch/_python_dispatcher.py](https://github.com/pytorch/pytorch/blob/master/torch/_python_dispatcher.py).
It shows, for a given operator, what the computed dispatch table looks like after your registrations.

    ```
    dispatcher = PythonDispatcher()
    dispatcher.register(["CPU", "XLA", "AutogradCPU", "CompositeImplicitAutograd"])
    print(dispatcher.dispatchTable()) # Tells you exactly which kernel is used for each backend.
    ```

4. TODO: AutogradCPUOrCUDA

Note that in native_functions.yaml you can mix using backend keywords and alias keywords above for one op:
  - direct registration to a backend always has higher precedence than an alias
  - DO NOT provide multiple alias keywords to the same op: alias keywords have precedence `CompositeExplicitAutograd > CompositeImplicitAutograd`,
    e.g. adding both `CompositeImplicitAutograd` and `CompositeExplicitAutograd` kernels for one op will completely ignore the `CompositeImplicitAutograd` kernel for
    both inference and training. Thus this will trigger an error when native_functions.yaml is parsed.

### Will this function be exposed to python? What are the namespaces?

We don't generate python bindings for all functions. There are certain patterns in function
names that we skip in python binding generation, e.g. `*_backward`. Check
`tools/autograd/gen_python_functions.py` for the latest rules.

The generated bindings are either exposed as methods on python_variable or as functions on
the torch._C._nn (marked with `python_module: nn`),
torch._C._fft (marked with `python_module: fft`),
torch._C._linalg (marked with `python_module: linalg`),
torch._C._sparse (marked with `python_module: sparse`),
torch._C._special (marked with `python_module: special`),
or torch._C._nested (marked with `python_module: nested`) objects.
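
Concretely, the target module is selected per entry with the `python_module` field; a hypothetical entry bound into `torch._C._linalg` would look like:

```
- func: my_linalg_helper(Tensor self) -> Tensor
  python_module: linalg
```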

### Undefined tensor conventions

By default, `Tensor` arguments to ATen functions are always defined, unless
you explicitly specify that an undefined tensor is permissible by writing
`Tensor?` or `Tensor? x=[]`; the latter is needed when you have to assign
a default value in C++ (e.g. in the middle of other parameters with default values).

The rules for returning undefined Tensors are a bit more subtle, but there
is only one case you have to remember:

* If the function in question is a backward function which accepts a
  `std::array<bool,N> output_mask` argument, you MUST return an undefined
  `Tensor` at every tuple position `i` for which `output_mask[i]` is false (see the
  sketch at the end of this section), otherwise

* You MUST NOT return an undefined tensor.

The most common situations where you might be tempted to return undefined tensors
are when:

- You have a forward function that may return a buffer if training is enabled, but does not
  return the buffer in inference mode.  In this case, just return an appropriately
  typed zero-size tensor.

- You have a backward function where the gradient for an input is zero.  In this case, you
  are expected to create a zero-filled tensor of appropriate size to return for this input.
  To get the shape, it may be helpful to take a `TensorGeometry` of the input to use.
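
To make the `output_mask` rule above concrete, here is a hypothetical backward declaration (the operator name and argument list are made up for illustration):

```
# Hypothetical entry: returns gradients for (input, weight, bias).  The kernel must
# return an undefined Tensor at tuple position i whenever output_mask[i] is false,
# and a defined (possibly zero-filled) Tensor otherwise.
- func: my_op_backward(Tensor grad_output, Tensor self, Tensor weight, bool[3] output_mask) -> (Tensor, Tensor, Tensor)
```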

### Debugging tips

If you build ATen and get a linker error, that probably means you copy-pasted
the C++ definition of your function incorrectly.  Double check your `Tensor`
arguments, and make sure you wrote `const Tensor&` in your signature.
665