Named Tensors using First-class Dimensions in PyTorch
=====================================================

-- Zachary DeVito [@Zachary_DeVito](https://twitter.com/Zachary_DeVito)

_An implementation of [named tensors](https://namedtensor.github.io) with the functionality of [einsum](http://einops.rocks), batching ([vmap](https://jax.readthedocs.io/en/latest/jax.html#vectorization-vmap), [xmap](https://jax.readthedocs.io/en/latest/notebooks/xmap_tutorial.html)), and tensor indexing, by adding dimension objects to PyTorch_.

The tensor input to a resnet might have the shape [8, 3, 224, 224] but informally we think of those dimensions as 'batch', 'channel', 'width', and 'height'. Even though 'width' and 'height' have the same _size_ we still think of them as separate dimensions, and if we have two _different_ images, we think of both as sharing the _same_ 'channel' dimension.

Named tensors give these dimensions names. [PyTorch's current implementation](https://pytorch.org/docs/stable/named_tensor.html) uses strings to name dimensions. Instead, this library introduces a Python object, a `Dim`, to represent the concept. By expanding the semantics of tensors with dim objects, in addition to naming dimensions, we can get behavior equivalent to batching transforms (xmap, vmap), einops-style rearrangement, and loop-style tensor indexing.

A preview:

```py
import torch
from torchdim import dims

# einsum
def mm(A: torch.Tensor, B: torch.Tensor):
    i, j, k = dims(3)
    r = (A[i, k] * B[k, j]).sum(k)
    return r.order(i, j)

# rearrange
def pixel_shuffle(img: torch.Tensor, upscale_factor=2):
    h2, w2, c, b, h, w = dims(6)
    h2.size = w2.size = upscale_factor
    return img[b, (c, h2, w2), h, w].order(b, c, (h, h2), (w, w2))

# batching
def bmm(A: torch.Tensor, B: torch.Tensor):
    i = dims(1)
    return mm(A[i], B[i]).order(i)

# indexing
def embedding_bag(input: torch.Tensor, embedding_weights: torch.Tensor):
    batch, sequence, features = dims(3)
    r = embedding_weights[input[batch, sequence], features].sum(sequence)
    return r.order(batch, features)
```

Installation
============

_torchdim is a preview release so that we can collect feedback on the API. It may have bugs, and there are known places where performance can be improved._

First-class dims are a library that extends PyTorch, so they need to be installed separately.
We may eventually upstream them into PyTorch itself along with `functorch`.

We have to install a nightly build of PyTorch, so first set up an environment:

```sh
conda create --name dim
conda activate dim
```

First-class dims require a fairly recent nightly build of PyTorch so that functorch will work. You can install it using one of these commands:

```sh
# For CUDA 10.2
conda install pytorch torchvision torchaudio cudatoolkit=10.2 -c pytorch-nightly
# For CUDA 11.3
conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch-nightly
# For CPU-only build
conda install pytorch torchvision torchaudio cpuonly -c pytorch-nightly
```

Install dim. You will be asked for GitHub credentials to access the fairinternal organization.

```sh
pip install ninja  # Makes the build go faster
pip install --user "git+https://github.com/facebookresearch/torchdim"
```

Creating and Binding Dims
=========================

Python objects that represent dimensions are created using the `dims` operator.[^1]

```py
import torch
from torchdim import dims

batch, channel, width, height = dims(4)
```

The existing implementation of [Named Tensors](https://pytorch.org/docs/stable/named_tensor.html) in PyTorch and [JAX's xmap](https://jax.readthedocs.io/en/latest/notebooks/xmap_tutorial.html) use strings to name dimensions. We call these dimensions _first class_ because they are Python objects instead.

In addition to the normal _positional_ dimensions in a tensor, tensors can also have a separate set of first-class dimensions.

You can create tensors with first-class dimensions by indexing the normal positional dimensions of a tensor with a dimension object. The `ndim` property continues to list the number of positional dimensions, while the new `dims` property lists all the bound first-class dimensions.

```py
input = torch.rand(2, 3, 224, 224)
print(input.ndim)
> 4

input_fc = input[batch, channel, width, height]
print(input_fc.dims) # first-class dimensions
> (batch, channel, width, height)


# since we converted all the positional dimensions to
# first-class ones, `input_fc` has 0 positional dimensions now.
print(input_fc.ndim)
> 0
```

Notice that indexing creates a _new_ Tensor, `input_fc`, with bound first-class dimensions. It does not modify the original tensor `input`, which still has 4 positional dimensions.

```py
print(input.ndim) # unchanged
> 4
```

Importantly, indexing with square brackets _applies only to positional dimensions_, so attempting to positionally index a tensor that has only first-class dims will error[^2]:

```py
try:
    input_fc[0]
except ValueError as ve:
    print(ve)
> at least 1 indices were supplied but the tensor only has 0 dimensions
```
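
As footnote 2 describes, the `index` method selects an entry along a first-class dimension by naming that dimension explicitly. A minimal sketch, continuing with the tensors above:

```py
# sketch: select entry 0 along the first-class `batch` dimension;
# the result keeps the remaining first-class dimensions
first_image = input_fc.index(batch, 0)
print(first_image.dims)
> (channel, width, height)
```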

Generally, it is possible to construct tensors with a mixture of positional and first-class dimensions:

```py
input_mixed = input[batch, :, :, height]
print(input_mixed.dims)
> (batch, height)

print(input_mixed.ndim)
> 2
```

Dimension Sizes
---------------

Dimensions will take on the size of the first thing they are bound to:

```py
input = torch.rand(3)
x = dims(1)
input_fc = input[x]
print(x.size)
> 3
```

But you can also directly set the size of a dimension:

```py
i = dims(1)

i.size = 5 # ok, i previously did not have a size

i.size = 5 # ok, it already had the size 5
try:
    i.size = 3
except Exception as e:
    print(e)
> Dim 'i' previously bound to a dimension of size 5 cannot bind to a dimension of size 3

j = dims(sizes=[4]) # can also be set on construction
```

[^1]: We use a bit of Python introspection to set the debug names for the dimensions based on the names of the variables they are assigned to.
[^2]: Indexing of first-class dimensions can be done with the `index` method by specifying the dimension to index into (e.g. `input_fc.index(batch, 0)`).

Semantics of Dimensions
=======================
The power of named tensors arises from how the first-class dimensions in tensors compose with existing operations.

Three rules define how dimension objects behave with existing Tensors.

Rule 1: Implicit Batching
-------------------------
**Tensor operations (e.g. `input + bias`) are implicitly batched over the union of the first-class dimensions in their inputs.**

If `input` has dimensions `batch, channel` and `bias` has dimension `channel`, the output will have the union of those dimensions (`batch, channel`), and the result will be computed as if there were a loop over all the first-class dimensions.[^3]

```py
input_positional = torch.rand(128, 32)
bias_positional = torch.rand(32)

batch, channel = dims(2)
input = input_positional[batch, channel]
bias = bias_positional[channel]

result = input + bias
print(result.dims)
> (batch, channel)
```

It is helpful to think of operators on tensors with first-class dimensions by analogy to code with explicit loops over dimensions, with the first-class dimensions of the inputs acting as implicit `for` loops, and the values in the tensor being scalars within the body of the loop:

```py
# mental model: loop-level analogy
for batch in range(batch.size):
    for channel in range(channel.size):
        input = input_positional[batch, channel]
        bias = bias_positional[channel]
        result[batch, channel] = input + bias # arithmetic on scalars
```

Positional dimensions behave as they did before (e.g. for `+` they will broadcast), and can be thought of as being a standard tensor _used within the implicit loops_ defined by first-class dimensions.
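
For example, here is a small sketch (the dim name `b` and the shapes are illustrative assumptions) of a positional tensor broadcasting inside the implicit loop over a first-class dimension:

```py
b = dims(1)
x = torch.rand(4, 3, 5)   # the first dimension is bound to b below
y = torch.rand(5)         # broadcasts against the remaining positional dims

r = x[b] + y              # b is implicitly batched; (3, 5) + (5,) broadcasts
print(r.dims, r.ndim)
> (b,) 2
```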

In the `input + bias` example, we broke the expression into separate lines that bind the dimensions to positional tensors, and then another line to do the compute. In practice, we often combine these in one statement:

```py
result = input_positional[batch, channel] + bias_positional[channel]
print(result.dims)
> (batch, channel)
```

[^3]: This rule is similar to how named dimensions in xmap behave within a function, but instead of introducing the dimensions via a functional transform, they are bound on the objects using indexing.


Rule 2: Specifying dimensions
-----------------------------
**Wherever an integer is used to specify a dimension in an existing torch operator, a first-class dimension can be used instead to tell the operator to work over that dimension.**

```py
batch, channel, width, height = dims(4)
input_positional = torch.rand(2, 3, 224, 224)
input = input_positional[batch, channel, width, height]
avg_pixel_color = input.mean((width, height))

print(avg_pixel_color.dims)
> (batch, channel)
```

Any other first-class dimensions (e.g. batch, channel) are still implicitly batched according to Rule #1.
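
For instance, in a sketch continuing the example above, reducing over just `width` leaves the other first-class dimensions implicitly batched:

```py
# assumption for illustration: mean over a single first-class dim;
# batch, channel, and height remain batched by Rule #1
row_mean = input.mean(width)
print(row_mean.dims)
> (batch, channel, height)
```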

Rule 3: Dims are Tensors
------------------------
**A first-class dimension `d` can be used wherever a Tensor is expected. It will act as if it were a tensor whose only dimension is itself, `d`, and the values along the dimension are the indices of each entry `(0, 1, 2, ..., d.size - 1)`.**

```py
print(channel.dims)
> (channel,)

print(channel + 1000)
> tensor([1000, 1001, 1002])
> with dims=(channel,) sizes=(3,)
```

This means that a dimension used as a tensor acts as an index into that dimension. Going back to our loop-level analogy, it is analogous to using the loop variable as a value:

```py
# mental model: loop-level analogy
for channel in range(channel.size):
    result[channel] = channel + 1000
```

Arithmetic using dimension indices comes up a lot, for example when computing the mask for the upper-triangular part of a matrix. Using dims as tensors makes it easy:

```py
from torchdim import dims
i, j = dims(sizes=[4, 4])
print(i <= j)
> tensor([[ True,  True,  True,  True],
>         [False,  True,  True,  True],
>         [False, False,  True,  True],
>         [False, False, False,  True]])
> with dims=(i, j) sizes=(4, 4)
```

Because of the intentional similarity to loop-level code, using dimensions as tensors makes complicated indexing arithmetic easier to read.

Here is code that looks up features in an embedding table, given a sequence of ids:

```py
sequence, features = dims(2)
embeddings = torch.rand(8, 128)
words = torch.tensor([5, 4, 0])

state = embeddings[words[sequence], features]
print(state.dims)
> (sequence, features)
```

With the following analogy to loops:

```py
# mental model: loop-level analogy

for sequence in range(words.size(0)):
    for features in range(embeddings.size(1)):
        state = embeddings[words[sequence], features]
```

Earlier we showed how binding a tensor's dimensions is done with indexing `A[i, j]`. In fact, this binding is just the normal indexing operator. Its behavior follows directly from the behavior of indexing with tensor indices combined with Rule #3 and Rule #1. The expression `A[i + 1, j]` also creates a tensor with dimensions `i` and `j` but with different indexing math. The implementation knows when simple indexing patterns are used and only actually runs a kernel to do indexing when needed.
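
As a small sketch of that indexing math (the shapes here are illustrative assumptions), shifting a vector by one position is just arithmetic on the dim:

```py
i = dims(1)
i.size = 4                 # choose how many entries to take
a = torch.arange(5)
shifted = a[i + 1]         # indexes a at positions 1, 2, 3, 4 (Rules #1 and #3)
print(shifted.order(i))
> tensor([1, 2, 3, 4])
```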

Unbinding Dims
--------------
The `order` method converts first-class dimensions in a tensor back to normal positional dimensions by specifying an order for those dimensions.[^4]

By specifying a different order from how things were originally bound, it is easy to do transpositions.

```py
i, j = dims(2)
A = torch.rand(3, 4)
A_T = A[i, j].order(j, i)
assert torch.allclose(A.T, A_T)
```

Indexing acts left-to-right, and `order` also places the new dimensions back on the left, so it is possible to work on tensors that have mixed positional and first-class dimensions:

```py
B = torch.rand(3, 4, 5)
B_T = B[i, j].order(j, i)
assert torch.allclose(B.permute(1, 0, 2), B_T)
```

[^4]: `order` is actually just a synonym for the already-existing `permute` method, which takes a list of dimension specifiers and puts the tensor in that order, because Rule #2 says that first-class dims can be passed as arguments to functions that previously took only integers as dimensions. However, the name `permute` is confusing in this context since it implies dim objects have an original order, so we prefer to use `order` when writing code.

Flattening and Splitting Dims
-----------------------------

**Tuples of dimensions** can be passed to both indexing and `order`. In indexing, this will split the dimension being indexed across the dimensions in the tuple. In `order` it will flatten the dimensions into a single positional dimension:

```py
i, j, k = dims(3)
j.size = 2
A = torch.rand(6, 4)
a = A[(i, j), k] # split dim 0 into i,j
print(i.size, j.size, k.size)
> 3 2 4

r = a.order(i, (j, k)) # flatten j and k
print(r.shape)
> torch.Size([3, 8])
```

The size of one unsized dimension in a tuple, such as `i` above, can be inferred if the other sizes are known.

Examples
========

The usefulness of dimension objects is best seen through examples. Let's look at some different ways they can be used.

Einsum-style Products
---------------------
Rather than having [einsum](https://pytorch.org/docs/stable/generated/torch.einsum.html) as a custom operator, it is possible to express matrix products directly as a composition of multiplies and summations. The implementation will pattern-match any multiplication followed by a sum to the right matrix-multiply operator.

```py
def mm(A, B):
    i, j, k = dims(3)
    r = (A[i, k] * B[k, j]).sum(k)
    return r.order(i, j)
mm(torch.rand(3, 4), torch.rand(4, 5)).shape
```

The implementation of named tensors delays the execution of the multiply to see whether a summation follows it, as it does above. If so, it will turn this pattern into the correct _optimized matrix product_, similar to how the `einsum` function works.
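
As a quick sanity check, a sketch using the `mm` defined above (the shapes are arbitrary):

```py
A, B = torch.rand(3, 4), torch.rand(4, 5)
# the pattern-matched multiply+sum agrees with the builtin matrix multiply
assert torch.allclose(mm(A, B), torch.mm(A, B))
```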

Since it is no longer necessary to manually match math to matrix functions, other tensor products are easier to express, like the Gram matrix used in style transfer:

```py
def gram_matrix_new(y):
    b, c, c2, h, w = dims()
    r = (y[b, c, h, w] * y[b, c2, h, w]).sum((h, w))
    r = r / (h.size * w.size)
    return r.order(b, c, c2)

gram_matrix_new(torch.rand(1, 2, 3, 4))
# [example adapted from http://einops.rocks/pytorch-examples.html]
```

Attention is another example that has several matrix products embedded inside it:

```py
from torchdim import softmax
def attention(K, Q, V):
    batch, channel, key, query = dims(4)
    k = K[batch, channel, key]
    q = Q[batch, channel, query]
    v = V[batch, channel, key]

    a = (k * q).sum(channel) # matrix multiply
    a = softmax(a * (channel.size ** -0.5), dim=key)
    r = (v * a).sum(key) # matrix multiply
    return torch.cat((r.order(batch, channel, query), Q), dim=1)

inputs = (torch.rand(2, 3, 4) for _ in range(3))
attention(*inputs)
# [example adapted from http://einops.rocks/pytorch-examples.html]
```

Reshaping tensors (einops)
--------------------------

Lots of operations in deep learning are just different ways of reshaping, splitting, and joining dimensions, such as the pixel shuffle used to upscale an image by turning channels into pixels:

```py
def pixel_shuffle(img, upscale_factor=2):
    h2, w2, c, b, h, w = dims(6)
    h2.size = w2.size = upscale_factor
    return img[b, (c, h2, w2), h, w].order(b, c, (h, h2), (w, w2))
```

[Einops](http://einops.rocks) is an extension to einsum that adds support for the manipulation of dimensions through a few custom operators such as `rearrange`:

```py
def pixel_shuffle_einops(img, upscale_factor=2):
    from einops import rearrange
    return rearrange(img, 'b (c h2 w2) h w -> b c (h h2) (w w2)', h2=upscale_factor, w2=upscale_factor)
```

Named tensors with first-class dimensions can accomplish the same goal, but using PyTorch's existing operator set.

Automatically batching Code (`vmap`, `xmap`)
--------------------------------------------

The implicit batching of Rule #1 means it is easy to create batched versions of existing PyTorch code. Simply bind a dim to the dimensions that should act as a batch, and then pass the tensor to the unbatched function. Since the unbatched function does not know about the dim, the dim will be implicitly batched over:

```py
batch_size, feature_size = 3, 5
weights = torch.randn(feature_size)

def model(feature_vec):
    # Very simple linear model with activation
    assert feature_vec.dim() == 1
    return feature_vec.dot(weights).relu()

examples = torch.randn(batch_size, feature_size)
batch = dims(1)
# in functorch: result = functorch.vmap(model)(examples)
r = model(examples[batch])
print(r)
> tensor([0.4775, 0.0000, 0.3423])
> with dims=(batch,) sizes=(3,)
```

This pattern also composes well with other code that uses first-class dimensions. For instance, we can write a batched matrix multiply `bmm` by batching the `mm` operator.

It doesn't matter whether the implementation of the function uses dimension objects internally; it is always possible to add additional batch dimensions and then call the function:

```py
def bmm(A, B):
    i = dims(1) # note: i here is a different value from i inside mm so it works
    return mm(A[i], B[i]).order(i)
```

The equivalent code in JAX uses [xmap or vmap](https://jax.readthedocs.io/en/latest/notebooks/quickstart.html#auto-vectorization-with-vmap), which are transforms over functions. As a result, there is a lot of syntactic distance between the specification of the dimension mappings and the values where those mappings apply. Dims express the mapping as indexing of the tensor, right at the place where the function is being applied.


[xmap examples](https://jax.readthedocs.io/en/latest/notebooks/xmap_tutorial.html):

```py
in_axes = [['inputs', 'hidden', ...],
           ['hidden', 'classes', ...],
           ['batch', 'inputs', ...],
           ['batch', ...]]

loss = xmap(named_loss, in_axes=in_axes, out_axes=[...])
print(loss(w1, w2, images, labels))
```

Equivalent with dimension objects:

```py
batch, inputs, hidden, classes = dims(4)
print(loss(w1[inputs, hidden], w2[hidden, classes], images[batch, inputs], labels[batch],
      batch, inputs, hidden, classes))
```


Composing matrix products, reshaping, and batching
--------------------------------------------------

Multi-headed attention is a good example of how these different uses compose. It reshapes the inputs, splitting out different attention heads. It batches over those attention heads, and it uses matrix products to compute attention scores.

```py
from torchdim import softmax
def multiheadattention(q, k, v, num_attention_heads, dropout_prob, use_positional_embedding):
    batch, query_sequence, key_sequence, heads, features = dims(5)
    heads.size = num_attention_heads

    # binding dimensions, and unflattening the heads from the feature dimension
    q = q[batch, query_sequence, [heads, features]]
    k = k[batch, key_sequence, [heads, features]]
    v = v[batch, key_sequence, [heads, features]]

    # einsum-style operators to calculate scores
    attention_scores = (q*k).sum(features) * (features.size ** -0.5)

    # use first-class dim to specify dimension for softmax
    attention_probs = softmax(attention_scores, dim=key_sequence)

    # dropout works pointwise, following Rule #1
    attention_probs = torch.nn.functional.dropout(attention_probs, p=dropout_prob)

    # another matrix product
    context_layer = (attention_probs*v).sum(key_sequence)

    # flatten heads back into features
    return context_layer.order(batch, query_sequence, [heads, features])
```

Indexing
--------

Rule #3 enables indexing because dimensions act as loop indices when used as a tensor. This allows for a lot of powerful behavior. The simplest might be using the dimensions to compute masks, such as extracting the upper triangular part of a matrix:

```py
from torch import where
def triu(A):
    i, j = dims()
    a = A[i, j]
    return where(i <= j, a, 0).order(i, j)
triu(torch.rand(3, 4))
```

Embedding bag does an embedding table lookup followed by a sum, which can be expressed concisely:

```py
def embedding_bag(input, embedding_weights):
    batch, sequence, features = dims(3)
    r = embedding_weights[input[batch, sequence], features].sum(sequence)
    return r.order(batch, features)

input = torch.tensor([[1, 0, 4, 3]])
W = torch.rand(5, 2)
embedding_bag(input, W)
```

Relative positional embeddings associate an embedding vector with the distance between the query and the key in the sequence.
For instance, a key at position 3 and a query at position 5 will have embedding ID `(5-3)=2`. We can use first-class dimensions to do the indexing arithmetic, and the embedding lookup:

```py
def relative_positional_embedding(q, k, distance_embedding_weight):
    batch, query_sequence, key_sequence, heads, features = dims(5)
    q = q[batch, query_sequence, [heads, features]]
    k = k[batch, key_sequence, [heads, features]]

    distance = query_sequence - key_sequence
    n_embeddings = distance_embedding_weight.size(0)
    index_bias = n_embeddings // 2

    assert key_sequence.size + index_bias <= n_embeddings

    # indexing with dims
    positional_embedding = distance_embedding_weight[distance + index_bias, features]

    # matrix multiplies with dims
    relative_position_scores_query = (q*positional_embedding).sum(features)
    relative_position_scores_key = (k*positional_embedding).sum(features)
    return (relative_position_scores_query + relative_position_scores_key).order(batch, heads, key_sequence, query_sequence)
```

Tensor Puzzlers
===============

[Tensor Puzzlers](https://github.com/srush/Tensor-Puzzles), created by Sasha Rush, are a good exercise for learning the numpy and torch APIs by figuring out how to define common operations using a small set of primitive tensor operations.

However, the difficulty of many of the puzzlers lies not in how to compute the answer but in the awkwardness of the primitives themselves.

**With first-class dimensions, these puzzlers are nearly the same as the spec that defines them.**

### Puzzle 3 - outer

Compute [outer](https://numpy.org/doc/stable/reference/generated/numpy.outer.html) - the outer product of two vectors.

```py
def outer_spec(a, b, out):
    for i in range(len(out)):
        for j in range(len(out[0])):
            out[i][j] = a[i] * b[j]

def outer(a, b):
    i, j = dims(2)
    return (a[i] * b[j]).order(i, j)
```

### Puzzle 4 - diag

Compute [diag](https://numpy.org/doc/stable/reference/generated/numpy.diag.html) - the diagonal vector of a square matrix.

```py
def diag_spec(a, out):
    for i in range(len(a)):
        out[i] = a[i][i]

def diag(a):
    i = dims(1)
    return a[i, i].order(i)
```

### Puzzle 5 - eye

Compute [eye](https://numpy.org/doc/stable/reference/generated/numpy.eye.html) - the identity matrix.

```py
from torch import where
def eye_spec(out):
    for i in range(len(out)):
        out[i][i] = 1

def eye(j: int):
    i, j = dims(sizes=[j, j])
    return where(i == j, 1, 0).order(i, j)
```

### Puzzle 6 - triu

Compute [triu](https://numpy.org/doc/stable/reference/generated/numpy.triu.html) - the upper triangular matrix.

```py
def triu_spec(out):
    for i in range(len(out)):
        for j in range(len(out)):
            if i <= j:
                out[i][j] = 1
            else:
                out[i][j] = 0

def triu(j: int):
    i, j = dims(sizes=[j, j])
    return where(i <= j, 1, 0).order(i, j)
```

### Puzzle 8 - diff

Compute [diff](https://numpy.org/doc/stable/reference/generated/numpy.diff.html) - the running difference.

```py
def diff_spec(a, out):
    out[0] = a[0]
    for i in range(1, len(out)):
        out[i] = a[i] - a[i - 1]
def diff(a, i: int):
    i = dims(1)
    d = a[i] - a[i - 1]
    return where(i - 1 >= 0, d, a[i]).order(i)
```

### Puzzle 9 - vstack

Compute [vstack](https://numpy.org/doc/stable/reference/generated/numpy.vstack.html) - the matrix of two vectors

```py
def vstack_spec(a, b, out):
    for i in range(len(out[0])):
        out[0][i] = a[i]
        out[1][i] = b[i]

def vstack(a, b):
    v, i = dims(sizes=[2, None])
    return where(v == 0, a[i], b[i]).order(v, i)
```

### Puzzle 10 - roll

Compute [roll](https://numpy.org/doc/stable/reference/generated/numpy.roll.html) - the vector shifted 1 circular position.

```py
def roll_spec(a, out):
    for i in range(len(out)):
        if i + 1 < len(out):
            out[i] = a[i + 1]
        else:
            out[i] = a[i + 1 - len(out)]

def roll(a, i: int):
    i = dims(sizes=[a.size(0)])
    return a[where(i + 1 < i.size, i + 1, 0)].order(i)
```

### Puzzle 11 - flip

Compute [flip](https://numpy.org/doc/stable/reference/generated/numpy.flip.html) - the reversed vector

```py
def flip_spec(a, out):
    for i in range(len(out)):
        out[i] = a[len(out) - i - 1]

def flip(a, i: int):
    i = dims(sizes=[a.size(0)])
    return a[i.size - i - 1].order(i)
```

### Puzzle 14 - sequence_mask

Compute [sequence_mask](https://www.tensorflow.org/api_docs/python/tf/sequence_mask) - pad out to length per batch.

```py
def sequence_mask_spec(values, length, out):
    for i in range(len(out)):
        for j in range(len(out[0])):
            if j < length[i]:
                out[i][j] = values[i][j]
            else:
                out[i][j] = 0

def sequence_mask(values, length):
    j, i = dims()
    v = values[i, j]
    return where(j < length[i], v, 0).order(i, j)
```

Advantages of First-class Dimensions over String Dimensions
===========================================================

The most prominent difference between named tensors using first-class dimensions and the alternatives (einops, named tensors implemented in PyTorch today, [tensors considered harmful](https://nlp.seas.harvard.edu/NamedTensor), or xmap) is that dimensions are objects rather than strings. Using objects has a number of nice properties.

### Avoiding naming conflicts

Using strings for dimensions introduces the possibility that two unrelated dimensions are given the same name. Using objects instead makes it clear that identical names are not the same dimension. It's like the difference between having only global variables and having the ability to locally bind names in functions.
For instance, we defined `bmm` by batching a call to `mm`, and both use the name `i` to identify a dimension. Because each `i` is a different object, there is no naming conflict:

```py
def mm(A, B):
    i, j, k = dims()
    r = (A[i, k] * B[k, j]).sum(k)
    return r.order(i, j)

def bmm(A, B):
    i = dims() # note: doesn't matter that mm internally also uses i
    return mm(A[i], B[i])
```

Einops avoids conflicts by ensuring names are all introduced and removed in a single expression, but this precludes using long-lived dimensions to provide implicit batching similar to xmap. When nested, JAX's xmap seems to consider axes the same if the string name matches. In the above example it would consider the `i` dimension to be the same dimension in both `bmm` and `mm`, so the code would error.


### Reuse the same operator set

Having a new object type allows us to extend the existing operator set of PyTorch rather than come up with new operators. For instance, binding dimensions using indexing follows semantically from Rules #1 and #3, so there is no need for a special operator to do binding. Even unbinding is just the `permute` operator, which follows from Rule #2, though we call it `order` for clarity. In contrast, using strings requires coming up with new APIs such as `einsum` for matrix multiplies, or `rearrange` for doing permutations.
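
A small sketch of this equivalence (footnote 4 notes that `order` is a synonym for `permute`):

```py
i, j = dims(2)
A = torch.rand(3, 4)
# `permute` accepts dim objects by Rule #2; `order` is just a clearer name
assert torch.allclose(A[i, j].order(j, i), A[i, j].permute(j, i))
```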

### Allows dims to act as tensors

Rule #3 is not possible with strings since we cannot make strings behave as tensors. Without this rule, all of the indirect indexing that dims enable would not be easy to express.

### Dims can have methods
For instance, as objects, dims can have a size, which allows us to do size inference of dimensions in various places in the API where string-based APIs would have to take additional arguments specifying size.
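
A sketch of that size inference, using the `sizes=[2, None]` pattern from the `vstack` puzzle above:

```py
v, i = dims(sizes=[2, None]) # v is fixed at 2; i's size is inferred on binding
x = torch.rand(2, 5)
xv = x[v, i]
print(i.size)
> 5
```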

Comparison to tensor compilers or languages (e.g. TVM or Dex)
=============================================================

The semantics and surface syntax of dimension objects resemble the kind of code written in tensor compilers such as [Halide](https://halide-lang.org), [TVM](https://tvm.apache.org), [Tensor Comprehensions](https://github.com/facebookresearch/TensorComprehensions), or the language [Dex](https://github.com/google-research/dex-lang).

These compilers and languages have syntax and semantics that resemble the loop-level analogy of first-class dimensions. However, as compilers or statically typed languages, they require some binding code to go from running deep learning framework code in Python to using the compiled language. This often at least requires refactoring the compiled parts into their own functions, and may require defining a gradient function. Similar to graph-mode frameworks, this adds friction to using and debugging the code.

Dimension objects are just an extension of the existing PyTorch tensors and eager semantics, so there is no friction switching between normal Python code and code that uses them. However, since the loops over the dimensions are defined implicitly, they can still execute in Python with good performance compared to explicit loops. Furthermore, with dimension objects, a tensor containing dimensions can compute through code that is oblivious to the dimensions, such as when batching examples. There is no need to separate code into 'compiled' vs 'eager'.

In this way, first-class dims are a way of adapting the nicer syntax of these array compilers and languages to eager numpy-style libraries.


Performance Expectations
========================
First-class dimensions are not a compiler. They provide syntax for existing PyTorch operations, such as advanced indexing, that is easier to read and write. For large tensors, the performance of any statements including them will be the same as using the already existing operations. An important exception is the pattern matching of products and summation, where performance is improved by dispatching to a matrix-multiply kernel. The C++ implementation of dimensions adds a small overhead of around 2us on top of PyTorch's normal overhead of 8us to each function that uses them. In the future, the implementation can incorporate more fusion optimization to further improve the performance of this style of code.


## License
Functorch has a BSD-style license, as found in the [LICENSE](LICENSE) file.