1ATen "native" functions are the modern mechanism for adding operators and 2functions to ATen. Native functions 3are declared in `native_functions.yaml` and have implementations defined 4in one of the `cpp` files in this directory. 5 6Like all ATen methods/functions, native functions are made available 7from both ATen's C++ and Python APIs. In C++, they are made available 8either as methods on `Tensor` (`t.mymeth()`) and functions in the ATen 9namespace (`at::myfunc()`). In PyTorch, they are made available as 10methods on `Variable` or as functions on `torch._C._FunctionBase`. 11(It is the user's responsibility to re-export these functions in 12a more user-facing module.) 13 14The rest of this document describes how to implement an ATen function. 15 16## Registering a function in `native_functions.yaml` 17 18Every native function must have an entry in 19`native_functions.yaml`. The format can be summarized as: 20 21``` 22- func: func_name(ArgType arg0[=default], ArgType arg1[=default], ...) -> Return 23 variants: function, method 24 dispatch: 25 CPU: func_cpu 26 CUDA: func_cuda 27``` 28 29Each component is described in more detail below: 30 31### `func` 32 33``` 34- func: func_name[.overload_name](ArgType arg0[=default], ArgType arg1[=default], ...) -> Return 35``` 36 37The `func` entry is a string describing the name of the function and its type 38signature. 39 40**Argument types.** These types are permissible as ArgType: 41 42- `Tensor`. A `Tensor` argument translates into a C++ argument of type `const Tensor&` 43 (except when the argument is "inplace"; in this case, it is simply `Tensor&`). 44 A trailing `?`, as in `Tensor?`, indicates that the tensor argument is optional 45 and may be omitted by passing std::nullopt. When a function takes multiple 46 `Tensor` arguments, these tensors are assumed to be the same type (e.g., 47 if one argument is a `FloatTensor`, all other arguments are checked 48 to be `FloatTensor`s). 49 `Tensor` or `Tensor?` must sometimes be annotated to indicate aliasing and mutability. 50 In general annotations can be defined via the following situations: 51 - `Tensor(a)` - `a` is a set of Tensors that may alias to the same data. The set could have a size of one. 52 - `Tensor(a!)` - members of `a` may be written to thus mutating the underlying data. 53 - `Tensor(a! -> a|b)` - Tensor is in set `a`, written to, and after the write is in set `a` AND `b`. 54 For more details on when and why this needs to happen, please see the section on annotations. 55- `Tensor[]`. A `Tensor[]` argument translates into a C++ argument of type `ArrayRef<Tensor>` 56 (a.k.a. `TensorList`) 57- `int[]`. `int[]` accepts an optional length specifier, e.g., `int[2]`, which 58 has no effect in C++ but extends our Python bindings to accept a bare number, which will be 59 expanded into an appropriately sized list by repeating the number. 60- `int`. Think about this like a Python int. This is translated into a C++ argument of type `int64_t`. 61- `float`. Think about this like a Python `float`. It is translated into a C++ argument of type `double`. 62- `bool` 63- `str`. It is translated into a C++ argument of non-owning type `c10::string_view` 64- `Scalar`. `Scalar` supports binding to any numerical types from Python, including integral types, 65 floating point types, and zero dimensional tensors. `int` and `float` bind to the corresponding Python 66 numerical types. 
- `Scalar`. `Scalar` supports binding to any numerical types from Python, including integral types,
  floating point types, and zero dimensional tensors. `int` and `float` bind to the corresponding Python
  numerical types. However, you probably don't want to use `Scalar`;
  `float` and `int` argument types should suffice for most algorithms
  (you should only use `Scalar` if the operator truly may accept either
  type).
- `Generator?`, the state for a random number generator.
- `bool[N]` (where N is `1-4`).
- `*` is a special sentinel argument, which doesn't translate into an actual
  argument, but indicates that in the Python bindings, any subsequent arguments
  must be specified as keyword arguments (and cannot be provided positionally).
- `?` is a trailing question mark that annotates an argument to be an optional type. Grep for
  `optional` to find some example usages. In general, most functions will not need to use
  this, but there are some cases where we want to use optional for the different types:
  - You want to pass a `None` to an ATen function/method from Python and handle the
    None type on the C++ side. For example, `clamp(Tensor self, Scalar? min=None, Scalar? max=None)`
    can take `None` for its `min` and `max` parameters, but does not dispatch to different
    backends if one of the parameters is `None`. An optional type can accept a `None` type
    (`nullopt` in C++) from Python and use the [C++ Optional class](https://en.cppreference.com/w/cpp/utility/optional)
    to interact with the parameters (see the sketch after this list).
  - You want a default value, which is fine in Python, but would cause ambiguity in C++.
    For example, `norm(Tensor self, Scalar p=2, int dim, bool keepdim=False)` would
    cause ambiguity in C++ since its default args must be adjacent (`p` could not
    have a default value when `dim` does not). Therefore, we need to make `p` an
    optional Scalar, and make `p=2` when `p` is not passed in (nullopt).
  - You want a value to default to the same value as another argument (this cannot be
    expressed in C++ default arguments).
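
As a rough illustration of the first case above, an optional `Scalar` argument surfaces on the
C++ side as a `std::optional<Scalar>`; the exact generated signature comes from the codegen, and
the kernel below is only a hedged sketch with a hypothetical name:

```
// Hypothetical kernel sketch: how a `Scalar? min=None` argument might be consumed in C++.
#include <ATen/ATen.h>

at::Tensor my_clamp_min(const at::Tensor& self, const std::optional<at::Scalar>& min) {
  if (!min.has_value()) {
    // Python passed None (nullopt in C++): leave the input unchanged.
    return self;
  }
  return at::clamp_min(self, *min);
}
```
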
Functions with no tensor inputs are called *factory functions*, and
are handled specially by code generation. If your function is behaving
differently than another example, check first and see if one is a
factory while another is not. In some rare cases, a factory function might have a
tensor argument. In this case mark it with `category_override: factory`
explicitly.

**Argument names.** Argument names are meaningful; downstream binding code may make use of the specific
argument name you provide, and a rename of an argument name is considered a BC-breaking
change (e.g., you will probably need to update `tools/autograd/derivatives.yaml` at
least, and it may affect Python keyword arguments). For more details please see the section on `variants`.

As a convention we use 'out' to indicate an output argument. This aligns with the
Python bindings. Even if a function might not be used in the Python bindings, we
still advise following this convention. Check the generated code when making a change
to make sure you're not breaking the API when renaming an argument name of an
existing function.

**Defaults.** Any suffix of arguments can have a default value defined;
these default values translate into C++/Python default values which
are applied when those positional arguments are not specified.

Here are the supported default values:

* Numbers (e.g., `0` or `5.0`) for `int`, `float` and `int[]`
  with an explicit length (e.g., `int[2]`)--in the case of `int[]`
  a number is replicated to fill the length (e.g., `int[2] x=2`
  is equivalent to `int[2] x=[2,2]`).
* Lists of numbers (e.g., `[0, 0]`) for `IntList`.
* Booleans (e.g., `True`) for `bool`.
* Empty initializer lists (e.g., `[]`) for `Tensor` (this implicitly changes
  a `Tensor` argument to accept undefined tensors).
* `None` for pointer types (e.g., `Generator?`).

**Returns.** The following are permissible on Return:

Non-tuple return:
```
ReturnType [retarg0]
```

Tuple return:
```
(ReturnType [retarg0], ReturnType [retarg1], ...)
```

The following are permissible on ReturnType:
- `Tensor` and `Tensor[]`, which translate into the C++ types `Tensor` and `std::vector<Tensor>`,
  respectively (unless the operation is in-place, in which case the return type
  is `Tensor&`).
- A tuple of any number of `Tensor`, e.g., `(Tensor, Tensor)`, translating into
  the C++ `std::tuple<Tensor, Tensor>`.

If you need a type that is not listed in this list, it may be possible to extend ATen's
code generation to support it. ATen's philosophy on types to support is that it supports
only simple, universal types, as well as a handful of fundamental Tensor structures
(e.g., `Tensor` and `Generator?`), because these types can be easily ported to any language
bound to ATen (in practice, C++ and Python.)

Return also supports specifying (optional) return argument names. These serve
two functions:

- They let you easily write derivatives in terms of return arguments in
  `tools/autograd/derivatives.yaml`

- They correspond to the named field the output can be referred to from
  Python. (This means that changing a return argument name is
  BC-breaking, be careful!)

Note that argument type modifiers such as defaults and optional are not currently supported on Return.


**Overloads.** You can register multiple functions with the same name and different
function signatures if you give them unique overload names. An overload name
is specified after the function name, separated by a dot.

Overload names do not have to be globally unique, but must be unique in the set
of all overloads for the same function. Overload names cannot be changed for
backwards compatibility reasons. Please try to make overload names semantically
meaningful. An overload name that just enumerates all the argument types isn't
helpful. In many cases, a semantic name is clear from what the overload is doing
differently. As a fallback, you can use the name or type of the first differing
argument as an overload name.

If you add a new overload to an existing function, please leave the existing
overload names as they are (for backwards compatibility), but give the new
overload a new, unique name. Although overload names are not directly
used by the Python or C++ APIs, they are public API surface for external
backends (who register to specific overload names) and deployed mobile
models (which use overload names as part of the serialization format.)

Not specifying an overload name is equivalent to specifying an empty overload
name. If you add a new function with multiple overloads, give them unique
overload names; at most one overload is allowed to have an empty overload name.
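
For instance, a pair of overloads distinguished by the type of the differing argument might be
declared like this (a hedged sketch with a hypothetical operator name):

```
- func: my_add.Tensor(Tensor self, Tensor other) -> Tensor

- func: my_add.Scalar(Tensor self, Scalar other) -> Tensor
```
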
The declarations also support the following attributes.

**Namespaces.** Users can register operators in namespaces other than `aten` by simply putting a custom
namespace before the function name. Currently, nested namespaces are not supported in function names.
If not specified, all functions will be registered in the `aten` namespace.

For example, suppose we are registering `my_op` into the `custom` namespace; we can have:
```
- func: custom::my_op(Tensor(a) self, ...) -> Tensor(a)
  variants: function, method
  dispatch:
    CPU: my_op_cpu
    CUDA: my_op_cuda
```

Note that we have a one-off `TORCH_LIBRARY` API to achieve the same goal of registering an operator
in a custom namespace. Compared with that API, having a custom namespace in `native_functions.yaml`
is useful in cases where the function does not really belong to ATen but is also widely used and it
is preferred to have a shared place to register it.

### `variants`

```
variants: function, method
```

Controls whether a Tensor method (`t.foo()`) or a namespace function (`at::foo()`) is
generated as a result of this declaration. If the declaration is a method,
you must have an argument `Tensor self` at some position in the method;
in the method variant this argument will be elided from the argument
list. For example, given the declaration `where(BoolTensor cond, Tensor self, Tensor other)`,
this generates the function `at::where(cond, self, other)` and the method
`self.where(cond, other)`.

By default, ATen generates only the function variant for a native function.
When should you also generate a method variant? Tensor operations as methods
are appropriate for "core" Tensor operations (e.g., add, sub, etc.), but not for
more complicated neural network layers (e.g., `conv2d`) and internal functions
designed specifically for binding (e.g., `cudnn_convolution`).
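
To make the mapping concrete, here is a minimal sketch (hypothetical operator name) declaring
both variants:

```
- func: my_scale(Tensor self, Scalar alpha) -> Tensor
  variants: function, method
```

With this entry, the codegen exposes both `at::my_scale(t, 2)` and `t.my_scale(2)` in C++;
dropping `method` from `variants` would leave only the namespace function.
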
As we progress along our schema unification of the `func` schema with the JIT
signature schema, we must introduce features that allow us to increase compliance.
One of these features is Tensor annotations. As of now we use naming conventions
to indicate whether an argument of a function is going to be mutated and returned.

### `annotations`

There are two typical situations in which we mutate the memory of an argument in the Python
frontend:
a) For an inplace operation such as `self.abs_()`
b) For a function with an output keyword argument such as `torch.abs(input, out=None)`.

In order to provide implementations for these Python functions, the legacy schema
requires C++ implementations for three situations: `abs(Tensor self) -> Tensor`,
`abs_(Tensor self) -> Tensor` and `abs_out(Tensor out, Tensor self) -> Tensor`.

Now, as we move towards the unification, we start to use a different syntax to represent
this by using annotations. In the end we still translate to the legacy schema for the downstream
consumers such as the C++ code generation, but this will soon change.

If two Tensors carry the same annotation, they both *may* represent the same memory.
A write annotation, as indicated by an exclamation mark, indicates that they both *may*
also be written to.

Let's revisit the previous native function declarations and see the conventions for adding annotations.
 - `abs(Tensor self) -> Tensor` stays the same as it will always allocate new memory.
 - `abs_(Tensor(a!) self) -> Tensor(a!)`
   `self` may be written to and returned. Further, the annotation indicates that the return value
   may alias the input. This indicates an inplace function and by convention ends in a single '\_'.
 - `abs(Tensor self, *, Tensor(a!) out) -> Tensor(a!)`
   In the Python frontend `out` can be passed as a keyword argument and may be written to.
   In this case it indicates the schema for a function that must accept `out` as this does not
   provide a default argument. The idea behind representing this as an optional argument is to
   document the intended usage. This maps to the legacy `abs_out(Tensor out, Tensor self) -> Tensor`.
   As with the legacy `_out` function you must call the argument `Tensor out` or `Tensor out0`,
   `Tensor out1` in the context of multiple arguments.

There is also another situation in which we use annotations, namely views.
 - `transpose(Tensor(a) self, int dim0, int dim1) -> Tensor(a)`
   An alias to the memory represented by `self` may also be returned, however it is not mutated.

When Tensor views are contained in a Tensor list, we need to represent that the output list
contains Tensors that alias the input.
 - `func: chunk(Tensor(a -> *) self, int chunks, int dim=0) -> Tensor(a)[]`
We assume lists contain memory which aliases the heap, so in order to correctly set up the aliasing
relationship between the output and input, we annotate that the input Tensor enters the wildcard set `(a -> *)`.
For more details, see the JIT [README](https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/OVERVIEW.md#aliasing-and-mutation-annotations-in-functionschema).

We have some asserts to check whether a developer uses these annotations correctly and throw asserts
if they don't. For example, any out function must use the `(a!)` annotation as described above.
If this causes a lot of confusion please add @cpuhrsch to your PR.
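
Putting the pieces so far together, a full `native_functions.yaml` entry for a hypothetical out=
overload could look roughly like this (a sketch, not an existing operator; the kernel names are
placeholders):

```
- func: my_abs.out(Tensor self, *, Tensor(a!) out) -> Tensor(a!)
  dispatch:
    CPU: my_abs_out_cpu
    CUDA: my_abs_out_cuda
```

Here `out` sits after the `*` sentinel, so Python callers must pass it by keyword, and the `(a!)`
annotation on both the argument and the return records that the returned Tensor aliases (and
mutates) `out`.
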
### `dispatch`

```
dispatch:
    CPU: func_cpu
    CUDA: func_cuda
```

This specifies the actual name of the function you want to dispatch to, so you
can dispatch to different functions depending on which backend the passed tensors
belong to. Notice that custom namespaces are supported in these names; this is useful when the
native function listed lives in a namespace other than the default `at::native`. Currently we
support nested namespaces with a maximum depth of 2. For example:
```
dispatch:
    CPU: custom::ns::func_cpu
```
The example above hints that the native function can be found under the `custom::ns::native`
namespace (the trailing `::native` is added automatically).

If the dispatch table is omitted, we assume a default dispatch
table:

```
# overload is ignored
func: func.overload(...) -> ...
dispatch:
    CompositeImplicitAutograd: func

# overload is ignored, but out functions get suffixed with _out in their name
# (NB: no out functions in PyTorch today actually support autograd, but if they
# did, you could call them here and autograd would be inferred)
func: func.out_overload(...) -> ...
dispatch:
    CompositeImplicitAutograd: func_out
```

If two backends have the same dispatch function, you can write `CPU, CUDA: func`
to reuse the same function name in both cases.

Available backend options can be found by searching `dispatch_keys` in
[codegen](https://github.com/pytorch/pytorch/blob/master/torchgen/gen.py).
There are also three special "generic" backends:

  - `CompositeExplicitAutograd` (previously known as `DefaultBackend`):
    implementations of kernels that work for all backends, but require an
    explicit definition of a backward function in `derivatives.yaml` to support autograd.
    The most typical use of this key is for delegating functions; i.e.,
    functions that do a very small amount of work and then delegate to another
    operator to do the actual heavy lifting. Under the hood, registering a
    kernel to `CompositeExplicitAutograd` is equivalent to registering that
    kernel to every backend (e.g., `CPU, CUDA`). (Note: kernels which call
    DispatchStub should NOT be registered as CompositeExplicitAutograd, as
    DispatchStub only works for `CPU, CUDA`.)

  - `CompositeExplicitAutogradNonFunctional`:
    Similar to CompositeExplicitAutograd, but this key should be used if:
    (1) Your kernel is written for a non-aliasing operator.
    (2) *and* it calls internally into an aliasing operator.
    An example of this is select_backward, which is non-aliasing, but decomposes into select.
    We would like to distinguish between "ordinary" CompositeExplicitAutograd kernels
    and these kernels, because some backends would not like
    to decompose a non-aliasing op into an aliasing op.
    LazyTensor + XLA are the two current examples of this - since they operate on a functional IR,
    they would prefer to directly implement a non-aliasing operator with their own kernel,
    instead of using a decomposition that results in more aliasing operators.

  - `CompositeImplicitAutograd` (previously known as `Math`): implementations of
    kernels that work for all backends, and also can implicitly support autograd,
    because all of the operations they call support autograd. Direct use of
    this key should be rare: if you provide no dispatch table, we default to
    registering your kernel as `CompositeImplicitAutograd`. Explicitly adding
    this key to an existing dispatch table may be useful if you have specialized
    CPU and CUDA implementations, but you might want to provide a fallback
    lowering for external backends that may not have a specialized
    implementation.

Functions registered to composite backends should work for any backend, if the
nested functions they call work for those backends.

For example, suppose `my_op` can be implemented in the following way:

```
at::Tensor my_op(const Tensor& self, const Tensor& other) {
  return self + 2 * other;
}
```

If we already know inference kernels and derivative formulas for the operators `+` and `*` in our system,
you can just register `my_op` to `CompositeImplicitAutograd` and both inference & autograd will just work.
Although it seems we only write down the inference formula here, the PyTorch autograd system will correctly
set up the backward for `my_op` using the chain rule and the derivatives of the `+` & `*` operators.
In other words `d_out/d_self = 1; d_out/d_other = 2` can be derived automatically from
the `my_op` inference kernel. Of course, if we don't have a derivative formula defined for either `+` or `*`,
the backward of `my_op` can no longer be derived automatically.
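
In `native_functions.yaml` that could amount to something like the following sketch (a hypothetical
entry; since `CompositeImplicitAutograd` is also the default when the dispatch table is omitted, the
explicit key is shown here only for clarity):

```
- func: my_op(Tensor self, Tensor other) -> Tensor
  dispatch:
    CompositeImplicitAutograd: my_op
```
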
Whether to use implicit or explicit autograd for your kernel can be decided by the following steps:
1. If you can, always start with a `CompositeImplicitAutograd` kernel that's composable from existing operators.
2. If you don't want to use the derived gradient formula from the `CompositeImplicitAutograd` kernel for autograd, either to
   get better performance or better numerical stability, you should register the kernel with `CompositeExplicitAutograd`
   so that it's only used in inference.
   Later, for autograd, depending on whether your autograd kernel works for all backends or not,
   you can put it in the alias `Autograd` or specific keys like `AutogradCPU`.
3. If you prefer to write backend-specific kernels, use reserved dispatch keys for your backend instead,
   e.g. `CPU/AutogradCPU`.

**Important**: because a `CompositeImplicitAutograd` kernel is implicitly registered for ops with no `dispatch:` section,
when you add a backend-specific kernel (and hence a `dispatch:` section) to one of these, you **must** also
add a `CompositeImplicitAutograd:` entry that names the old kernel implementation (it's named after the op, with `_<overload name>`
added if applicable), so that it's still available for other backends to use.

If you implemented a native function in C++ and want to find out which dispatch keyword
should be used in native_functions.yaml, please [follow the steps in dispatch keywords](#choosing-the-right-dispatch-keyword).

### Composite Compliance

Definition: a "composite function" is an operator registered as
CompositeImplicitAutograd or a (Python or C++) function that consists of PyTorch
operations. Examples of the latter include backward formulas and forward-mode AD formulas.

Composite functions defined in the PyTorch library MUST work for most, if not
all, backends/subclasses. This means that we impose a set of constraints that make it more
difficult to write composite functions inside PyTorch library code than for users
writing PyTorch code.

If you wish to do something that is banned (you may wish to do this for perf
reasons), please write a backwards formula for your function so it is no longer
CompositeImplicitAutograd, or hide the offending parts of the function in a new
aten operator that is not CompositeImplicitAutograd.

Composite functions may not:
- call `resize_` or moral equivalents. These are tricky to handle for
many backends, like vmap and meta.
- call `out=` operations. These are impossible to handle for vmap and can cause
dispatch-to-python objects to lose their subclassing.
- Change the metadata of a Tensor without performing dispatches. Examples of these
operations are directly accessing the TensorImpl API to modify the
sizes/strides/metadata of a Tensor.
- In the same vein as the last point, `data_ptr` access or `item` access are not
allowed. These operations do not go through the dispatcher.
- `copy_` is a marginal case. If you're able to rewrite your operation without
`copy_` you should definitely do so; this should be trivial if you're not copy-ing
into a view. Otherwise, it is fine to leave the code as-is.

We have CompositeImplicitAutograd compliance tests in `test/test_ops.py`. These
tests aren't perfect (it's pretty difficult to check for all of the above) so if
something looks wrong please shout.

### `device_guard`

```
device_guard: False
```

By default, ATen code generation will generate a DeviceGuard invocation,
which will ensure that kernel code will run with the current device set
to match the device of the first Tensor argument (or first tensor of
the first Tensor[] argument, if the function takes a list of tensors).
For the most part, this means kernel authors do not have to worry about
setting devices.

However, in some cases, setting the device is unnecessary, because,
e.g., you call a function that already manages device guard setting, or
your function simply does not interact with any devices. In
that case, code generation of the device guard can be disabled by adding
`device_guard: False` to your function definition.

### `device_check`

```
device_check: NoCheck
```

By default, ATen code generation will generate a device check,
which will ensure all the tensor parameters passed to the kernel are
on the same device.

However, in some cases, checking the device is unnecessary, because,
e.g., you call a function that is allowed to work on multiple devices.
In that case, code generation of the device check can be disabled by adding
`device_check: NoCheck` to your function definition.

### `manual_kernel_registration`

```
manual_kernel_registration: True
```

With this flag set, we will not generate code to automatically register the C++ operator implementation
to TypeDefault (catchAll dispatch key) with the dispatcher.
It doesn't make sense to have both a `dispatch` section and `manual_kernel_registration: True` for the same op.
You can find the manual registrations in torch/csrc/autograd/VariableTypeManual.cpp.
Currently, ops that have this field set to True should match `MANUAL_CATCHALL` in tools/autograd/gen_variable_type.py
(it can be a superset of `MANUAL_CATCHALL`, but we don't have a use case for it).
This field should only be used rarely.

### `use_const_ref_for_mutable_tensors`

```
use_const_ref_for_mutable_tensors: True
```

With this flag set, we will generate arguments for Tensors whose underlying data may change as
`const Tensor&` (or similar), just like we would for other Tensors. Previously, we generated these
as `Tensor &`, which 1) allowed changing which `TensorImpl` the `Tensor` itself referred to and 2)
was not necessary to allow the underlying data to change. (This was like using `T * const` when we
wanted `const T*`.)

### `autogen`

```
- func: my_op_(Tensor(a!) self) -> Tensor(a!)
...
  autogen: my_op, my_op.out
```

The `autogen` keyword is used to specify which native functions the codegen system should generate
implementations for.
* For an in-place variant of a native function (op name ends with an `_`), we will generate a functional
variant and an out= variant.
* If a functional variant is given, we generate an out= variant.
* We don't support `autogen` for view ops, ops that bypass the dispatcher, or composite ops.

We also generate kernels for generated ops, which merely copy and return the result from the base ops.
These generated kernels can be found in `<gen-out>/aten/src/ATen/CompositeViewCopyKernels.cpp`.

Also notice that for new operators being added to `native_functions.yaml`, if they satisfy the requirements
mentioned above, they should include the `autogen` keyword, since functionalization depends on it. We will
enforce this in codegen.


## Writing an implementation in C++

Implementations of native functions go in an appropriate C++ file in the
`native/` directory (they are organized roughly by topic, but there is no
semantic meaning to their organization aside from the `cuda` directory,
which is the only place the build system knows how to build `cu` files.)
To write a native function, you only need to write a C++
implementation (no header necessary) with a matching signature to
the generated header from the ATen metadata. There are many
simple native functions; take a look at some of them to see what to do.
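
For a rough end-to-end idea, here is a hedged sketch (hypothetical operator name) pairing a
`native_functions.yaml` entry with a matching C++ implementation. With no dispatch section, the
kernel defaults to `CompositeImplicitAutograd`, so autograd comes for free because it only calls
differentiable ops:

```
- func: my_relu6(Tensor self) -> Tensor
  variants: function
```

```
// Sketch only; would live in an appropriate file under aten/src/ATen/native/.
#include <ATen/ATen.h>

namespace at::native {

// The signature must match the declaration generated from native_functions.yaml:
// a `Tensor` argument arrives as `const Tensor&`, and the result is returned by value.
Tensor my_relu6(const Tensor& self) {
  return at::clamp(self, 0, 6);
}

} // namespace at::native
```
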
Although writing an ATen function is mostly writing the algorithm you want
to implement, there are some less obvious details you should also consider.

### Will your function be automatically differentiable?

If you are writing a pair of functions `foo` and `foo_backward`, with
the intent that `foo_backward` implements the derivative of `foo`, then
your implementation of `foo` is probably not automatically differentiable:
it might make use of functions like `data_ptr()` or it dispatches differently
depending on whether it's operating on CPU or CUDA tensors. Once you write these two functions,
you will have to write an entry correlating them together in
`tools/autograd/derivatives.yaml`.

However, in some situations, you can write a function in ATen and it
will be automatically differentiated! This can be the case if the function implementation
only calls other operations which are themselves differentiable. In this
case, you don't have to write an entry in `tools/autograd/derivatives.yaml`.
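
In the non-differentiable case, the entry tying `foo` to `foo_backward` looks roughly like the
sketch below (hypothetical operator and signature; see the comments at the top of
`tools/autograd/derivatives.yaml` for the authoritative format):

```
# Sketch of an entry in tools/autograd/derivatives.yaml
- name: foo(Tensor self, Tensor weight) -> Tensor
  self, weight: foo_backward(grad, self, weight)
```
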
### Choosing the right dispatch keyword

After writing a native function in C++, it's important to think about which dispatch keyword
to use in native_functions.yaml as it gives the dispatcher information about backend and autograd support
of the implementation.

Here are the steps to follow to decide the right dispatch keyword:

1. Think about inference: does your kernel work for all backends?

   - No: you're likely providing different kernels for different backends, e.g.
     backend-dependent logic is used in the implementation or it's implemented through DispatchStub.
     DispatchStub only supports a backend if you explicitly provide a kernel through `REGISTER_DISPATCH`.
     Typically it only supports a few in-tree backends like CPU, CUDA, QuantizedCPU etc. but not
     out-of-tree backends like XLA.
     Write a dispatch section, enumerate all supported backends and point them to the implementations.
     ```
     dispatch:
       CPU: kernel_cpu
       CUDA: kernel_cuda
       QuantizedCPU: kernel_quantized_cpu
     ```

     You're done. Now this op will be called in `CPU/CUDA/QuantizedCPU` backend inference!

     Note: to support training, you're required to write a formula in
     derivatives.yaml since your backend implementations don't support autograd.

   - Yes: you're likely calling other `at::` ops in the implementation. Go to step 2.

2. Think about training: does your kernel support autograd? [check autograd support](#will-your-function-be-automatically-differentiable)
   - Yes: in other words, you're providing a `CompositeImplicitAutograd` kernel which supports both inference and autograd.
     To use autograd support for training, simply skip adding a dispatch
     section and you're done. This will allow this op to be correctly
     registered for both inference and training.

   - Yes, but you still want to provide a numerically stable gradient formula instead of using autograd; write
     ```
     dispatch:
       CompositeExplicitAutograd: kernel
     ```

     You're done. This op will be called in inference for all backends.

     Note: to support training, you're required to add an autograd formula,
     or it'll error out in the backward pass when called with a Tensor that has requires_grad=True.

   - No: ops in this category are mainly using `_out` boilerplate where the out version doesn't have a derivative
     formula defined. For example:
     ```
     Tensor& sign_out(Tensor& result, const Tensor& self) { return unary_op_impl_out(result, self, sign_stub); }
     Tensor sign(const Tensor& self) { return unary_op_impl(self, at::sign_out); }
     Tensor& sign_(Tensor& self) { return unary_op_impl_(self, at::sign_out); }
     ```

     `sign_out` uses DispatchStub so the supported backends are enumerated in its dispatch section.
     For `sign` and `sign_`, write
     ```
     dispatch:
       CompositeExplicitAutograd: kernel
     ```

     You're done. This op will be called in inference for all backends.

     Note: to support training, you're required to add an autograd formula for `sign`,
     or it'll error out in the backward pass when called with a Tensor that has requires_grad=True.

     Note: the current plan of record for ops using this boilerplate is to replace `at::` with `at::native` in
     the implementations and add a dispatch section with device keywords instead.
3. Validate that the computed dispatch table matches what you want. You can use the `PythonDispatcher` provided in
[torch/_python_dispatcher.py](https://github.com/pytorch/pytorch/blob/master/torch/_python_dispatcher.py).
It shows, for a certain operator, what the computed dispatch table looks like after your registrations.

   ```
   dispatcher = PythonDispatcher()
   dispatcher.register(["CPU", "XLA", "AutogradCPU", "CompositeImplicitAutograd"])
   print(dispatcher.dispatchTable()) # Tells you exactly which kernel is used for a certain backend.
   ```

4. TODO: AutogradCPUOrCUDA

Note that in native_functions.yaml you can mix using backend keywords and alias keywords above for one op:
  - direct registration to a backend always has higher precedence than an alias
  - DO NOT provide multiple alias keywords for the same op: alias keywords have precedence `CompositeExplicitAutograd > CompositeImplicitAutograd`,
    e.g. adding both `CompositeImplicitAutograd` and `CompositeExplicitAutograd` kernels for one op will completely ignore the `CompositeImplicitAutograd` kernel for
    both inference and training. Thus this will trigger an error when native_functions.yaml is parsed.



### Will this function be exposed to python? What are the namespaces?

We don't generate python bindings for all functions. There are certain patterns in function
names that we skip in python binding generation, e.g. `*_backward`. Check
`tools/autograd/gen_python_functions.py` for the latest rules.

The generated bindings are either exposed as methods on python_variable or as functions on
the torch._C._nn (marked with `python_module: nn`),
torch._C._fft (marked with `python_module: fft`),
torch._C._linalg (marked with `python_module: linalg`),
torch._C._sparse (marked with `python_module: sparse`),
torch._C._special (marked with `python_module: special`),
or torch._C._nested (marked with `python_module: nested`) objects.

### Undefined tensor conventions

By default, `Tensor` arguments to ATen functions are always defined, unless
you explicitly specified that an undefined tensor was permissible by writing
`Tensor?` or `Tensor? x=[]`; the latter is needed when you have to assign
a default value in C++ (e.g. in the middle of other parameters with default values).

The rules for returning undefined Tensors are a bit more subtle, but there
is only one case you have to remember:

* If the function in question is a backward function which accepts a
  `std::array<bool,N> output_mask` argument, you MUST return an undefined
  `Tensor` at every tuple position `i` for which `output_mask[i]` is false
  (see the sketch at the end of this section), otherwise

* You MUST NOT return an undefined tensor.

The most common situations where you might be tempted to return undefined tensors
are when:

- You have a forward function that may return a buffer if training is enabled, but does not
  return the buffer in inference mode. In this case, just return an appropriately
  typed zero-size tensor.

- You have a backward function where the gradient for an input is zero. In this case, you
  are expected to create a zero-filled tensor of appropriate size to return for this input.
  To get the shape, it may be helpful to take a `TensorGeometry` of the input to use.
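
To illustrate the `output_mask` rule above, here is a minimal backward-kernel sketch (hypothetical
operator; the gradient formulas are placeholders):

```
// Hypothetical backward kernel honoring the output_mask convention.
#include <ATen/ATen.h>
#include <array>
#include <tuple>

std::tuple<at::Tensor, at::Tensor> my_op_backward(
    const at::Tensor& grad,
    const at::Tensor& self,
    const at::Tensor& weight,
    std::array<bool, 2> output_mask) {
  at::Tensor grad_self;    // stays undefined unless requested
  at::Tensor grad_weight;  // stays undefined unless requested
  if (output_mask[0]) {
    grad_self = grad * weight;   // placeholder formula
  }
  if (output_mask[1]) {
    grad_weight = grad * self;   // placeholder formula
  }
  return std::make_tuple(grad_self, grad_weight);
}
```
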
### Debugging tips

If you build ATen and get a linker error, that probably means you copy-pasted
the C++ definition of your function incorrectly. Double check your `Tensor`
arguments, and make sure you wrote `const Tensor&` in your signature.
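
For instance, a mismatch like the following (hypothetical operator) is a common source of that
linker error:

```
// Declared in native_functions.yaml as: my_func(Tensor self) -> Tensor

// Matches the generated declaration:
Tensor my_func(const Tensor& self);

// Does NOT match (Tensor taken by value), so the generated registration still
// refers to the declared-but-undefined symbol and the build fails at link time:
Tensor my_func(Tensor self);
```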