# Graph Transform Tool

## Table of Contents

*   [Introduction](#introduction)
*   [Using the Graph Transform Tool](#using-the-graph-transform-tool)
*   [Inspecting Graphs](#inspecting-graphs)
*   [Common Use Cases](#common-use-cases)
    *   [Optimizing for Deployment](#optimizing-for-deployment)
    *   [Fixing Missing Kernel Errors on
        Mobile](#fixing-missing-kernel-errors-on-mobile)
    *   [Shrinking File Size](#shrinking-file-size)
    *   [Eight-bit Calculations](#eight-bit-calculations)
*   [Transform Reference](#transform-reference)
    *   [add_default_attributes](#add_default_attributes)
    *   [backport_concatv2](#backport_concatv2)
    *   [flatten_atrous_conv](#flatten_atrous_conv)
    *   [fold_batch_norms](#fold_batch_norms)
    *   [fold_constants](#fold_constants)
    *   [fold_old_batch_norms](#fold_old_batch_norms)
    *   [freeze_requantization_ranges](#freeze_requantization_ranges)
    *   [fuse_convolutions](#fuse_convolutions)
    *   [insert_logging](#insert_logging)
    *   [merge_duplicate_nodes](#merge_duplicate_nodes)
    *   [obfuscate_names](#obfuscate_names)
    *   [quantize_nodes](#quantize_nodes)
    *   [quantize_weights](#quantize_weights)
    *   [remove_attribute](#remove_attribute)
    *   [remove_device](#remove_device)
    *   [remove_control_dependencies](#remove_control_dependencies)
    *   [remove_nodes](#remove_nodes)
    *   [rename_attribute](#rename_attribute)
    *   [rename_op](#rename_op)
    *   [round_weights](#round_weights)
    *   [sparsify_gather](#sparsify_gather)
    *   [set_device](#set_device)
    *   [sort_by_execution_order](#sort_by_execution_order)
    *   [strip_unused_nodes](#strip_unused_nodes)
*   [Writing Your Own Transforms](#writing-your-own-transforms)
    *   [Transform Functions](#transform-functions)
    *   [Pattern Syntax](#pattern-syntax)
    *   [ReplaceMatchingOpTypes](#replacematchingoptypes)
    *   [Parameters](#parameters)
    *   [Function Libraries](#function-libraries)
    *   [Registering](#registering)

## Introduction

When you have finished training a model and want to deploy it in production,
you'll often want to modify it to better run in its final environment. For
example, if you're targeting a phone you might want to shrink the file size by
quantizing the weights, or optimize away batch normalization or other
training-only features. The Graph Transform framework offers a suite of tools
for modifying computational graphs, and a framework to make it easy to write
your own modifications.

This guide is structured into three main parts: first, tutorials on performing
common tasks; second, a reference covering all of the included transformations,
together with the options that apply to them; and third, a guide to creating
your own transforms.

## Using the Graph Transform Tool

The Graph Transform tool is designed to work on models that are saved as
GraphDef files, usually in a binary protobuf format. This is the low-level
definition of a TensorFlow computational graph, including a list of nodes and
the input and output connections between them. If you're using a Python API to
train your model, this will usually be saved out in the same directory as your
checkpoints, with a '.pb' suffix.

If you want to work with the values of your trained parameters, for example to
quantize weights, you'll need to run
[tensorflow/python/tools/freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py)
to convert the checkpoint values into embedded constants within the graph file
itself.

You call the Graph Transform tool itself like this:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul:0' \
--outputs='softmax:0' \
--transforms='
strip_unused_nodes(type=float, shape="1,299,299,3")
remove_nodes(op=Identity, op=CheckNumerics)
fold_old_batch_norms
'
```

The arguments here specify where to read the graph from, where to write the
transformed version to, what the input and output layers are, and what
transforms to modify the graph with. The transforms are given as a list of
names, and each can have arguments of its own. These transforms define the
pipeline of modifications that are applied in order to produce the output.
Sometimes you need some transforms to happen before others, and the ordering
within the list lets you specify which happen first. Note that
`remove_nodes(op=Identity, op=CheckNumerics)` will break models that contain
control flow operations, such as `tf.cond`, `tf.map_fn`, and `tf.while_loop`.

## Inspecting Graphs

Many of the transforms that the tool supports need to know what the input and
output layers of the model are. The best source for these is the model training
process, where for a classifier the inputs will be the nodes that receive the
data from the training set, and the outputs will be the predictions. If you're
unsure, the
[`summarize_graph`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/summarize_graph_main.cc)
tool can inspect the model and provide guesses about likely input and output
nodes, as well as other information that's useful for debugging. Here's an
example of how to use it on the [Inception V3
graph](https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz):

```bash
bazel build tensorflow/tools/graph_transforms:summarize_graph
bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=tensorflow_inception_graph.pb
```

## Common Use Cases

This section has small guides for some of the most frequently-used
transformation pipelines, aimed at users who want to quickly accomplish one of
these tasks. A lot of them will use the Inception V3 model for their examples,
which can be downloaded from
[https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz](https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz).

### Optimizing for Deployment

If you've finished training your model and want to deploy it on a server or a
mobile device, you'll want it to run as fast as possible, and with as few
non-essential dependencies as you can. This recipe removes all of the nodes that
aren't called during inference, shrinks expressions that are always constant
into single nodes, and optimizes away some multiply operations used during batch
normalization by pre-multiplying the weights for convolutions.

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms'
```

The batch norm folding is included twice because there are two different flavors
of batch normalization used in TensorFlow. The older version was implemented
with a single op like BatchNormWithGlobalNormalization or FusedBatchNorm, which
was deprecated in favor of a more recent approach using individual ops to
implement the same computation. The two transforms are included so that both
styles are recognized and optimized.

### Fixing Missing Kernel Errors on Mobile

The mobile version of TensorFlow is focused on inference, and so by default the
list of supported ops (defined in
[tensorflow/core/kernels/BUILD:android_extended_ops](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/BUILD)
for Bazel) doesn't include a lot that are training-related. This can cause
`No OpKernel was registered to support Op` errors when a GraphDef is loaded,
even if the op isn't going to be executed.

If you see this error and it's an op that you do actually want to run on mobile,
then you'll need to make local modifications to the build files to include the
right .cc file that defines it. In a lot of cases the op is just a vestigial
remnant from the training process though, and if that's true then you can run
the [strip_unused_nodes](#strip_unused_nodes) transform, specifying the inputs
and outputs of your inference usage, to remove those unnecessary nodes:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms'
```

### Shrinking File Size

If you're looking to deploy your model as part of a mobile app, then keeping the
download size as small as possible is important. For most TensorFlow models, the
largest contributors to the file size are the weights passed in to convolutional
and fully-connected layers, so anything that can reduce the storage size for
those is very useful. Luckily most neural networks are resistant to noise, so
it's possible to change the representation of those weights in a lossy way
without losing very much accuracy overall.

On both iOS and Android, app packages are compressed before download, so the
simplest way to reduce the bandwidth your users need to receive your app is to
provide raw data that compresses more easily. By default the weights are stored
as floating-point values, and even tiny differences between numbers result in
very different bit patterns, so these don't compress very well. If you round
the weights so that nearby numbers are stored as exactly the same values, the
resulting bit stream has a lot more repetition and so compresses down a lot more
effectively. To try this technique on your model, run the
[round_weights](#round_weights) transform.

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  round_weights(num_steps=256)'
```

You should see that the `optimized_inception_graph.pb` output file is the same
size as the input, but if you run zip on it to compress it, it's almost 70%
smaller than if you zip the original! The nice thing about this transform is
that it doesn't change the structure of the graph at all, so it's running
exactly the same operations and should have the same latency and memory usage as
before. You can adjust the `num_steps` parameter to control how many values each
weight buffer is rounded to; lower numbers increase the compression at the cost
of accuracy.
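
To verify the compression gain, you can zip both graphs and compare the results.
This is a quick sketch assuming the file names from the recipe above and that
the standard `zip` tool is installed; any general-purpose compressor should show
a similar difference:

```bash
# Compress the original and the rounded graphs separately.
zip original.zip tensorflow_inception_graph.pb
zip rounded.zip optimized_inception_graph.pb
# Compare the compressed sizes; rounded.zip should be much smaller.
ls -lh original.zip rounded.zip
```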

As a further step, you can store the weights into eight-bit values directly.
Here's the recipe for that:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights'
```

You should see that the size of the output graph is about a quarter of the
original. The downside to this approach compared to round_weights is that extra
decompression ops are inserted to convert the eight-bit values back into
floating point, but optimizations in TensorFlow's runtime should ensure these
results are cached and so you shouldn't see the graph run any more slowly.

So far we've been concentrating on weights because those generally take up the
most space. If you have a graph with a lot of small nodes in it, the names of
those nodes can start to take up a noticeable amount of space too. To shrink
those down, you can run the [obfuscate_names](#obfuscate_names) transform, which
replaces all the names (except for inputs and outputs) with short, cryptic but
unique ids:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul:0' \
--outputs='softmax:0' \
--transforms='
  obfuscate_names'
```

### Eight-bit Calculations

For some platforms it's very helpful to be able to do as many calculations as
possible in eight-bit, rather than floating-point. The support for this in
TensorFlow is still experimental and evolving, but you can convert models into
quantized form using the graph transform tool:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  add_default_attributes
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights
  quantize_nodes
  strip_unused_nodes
  sort_by_execution_order'
```

This process converts all the operations in the graph that have eight-bit
quantized equivalents, and leaves the rest in floating point. Only a subset of
ops are supported, and on many platforms the quantized code may actually be
slower than the float equivalents, but this is a way of increasing performance
substantially when all the circumstances are right.

A full guide to optimizing for quantization is beyond the scope of this guide,
but one thing that can help is using the FakeQuantWithMinMaxVars op after Conv2D
or similar operations during training. This trains the min/max variables that
control the range used for quantization, so that the range doesn't have to be
calculated dynamically by RequantizationRange during inference.

## Transform Reference

The `--transforms` string is parsed as a series of transform names, each of
which can have multiple named arguments inside parentheses. Arguments are
separated by commas, and double-quotes (") can be used to hold argument values
if they themselves contain commas (for example shape definitions).

The `--inputs` and `--outputs` flags are shared across all transforms, since
it's common to need to know what the ingoing and outgoing nodes in the graph
are. You should make sure you set these correctly before calling the graph
transform tool, and if you're in doubt check with the model's author, or use the
[`summarize_graph`](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#inspecting-graphs)
tool to examine likely inputs and outputs.

All transforms can be passed the `ignore_errors` flag, with the value set to
either true or false. By default any errors that happen within a transform will
abort the whole process, but if you enable this then an error will just be
logged and the transform skipped. This is especially useful for optional
transforms where version errors or other unimportant problems may trigger an
error.
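
As a concrete illustration of the syntax, here's a hypothetical invocation (the
graph and layer names are placeholders) that combines a quoted,
comma-containing argument, a repeated argument, and the shared `ignore_errors`
flag; it assumes you've already built the tool as shown earlier:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=my_transformed_graph.pb \
--inputs='input' \
--outputs='output' \
--transforms='
  strip_unused_nodes(type=float, shape="1,299,299,3")
  remove_nodes(op=Identity, op=CheckNumerics)
  fold_constants(ignore_errors=true)'
```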

### add_default_attributes

Args: None

When attributes are added to ops in new versions of TensorFlow, they often have
defaults to ensure backwards compatible behavior with their original versions.
These defaults usually get added when the graph is loaded by the runtime, but if
your model is going to be processed outside of the main TensorFlow framework it
can be useful to run this update process as a transform. This process finds any
op attributes that are defined in the current TensorFlow list of ops but not
within the saved model, and sets them to the defined default for that attribute.

### backport_concatv2

Args: None

If you have a GraphDef file that has been produced by a newer version of the
TensorFlow framework and includes ConcatV2, and you want to run it on an older
version that only supports Concat, this transform will take care of converting
those newer ops to the equivalent older form.

### flatten_atrous_conv

Args: None \
Prerequisites: [fold_constants](#fold_constants)

This transform flattens atrous convolution, corresponding to a sequence of
SpaceToBatchND-Conv2D-BatchToSpaceND operations, converting it to a regular
Conv2D op with upsampled filters. This transform should only be used in order
to run graphs having atrous convolution on platforms that do not yet natively
support SpaceToBatchND and BatchToSpaceND operations. You will need to make
sure you run [fold_constants](#fold_constants) after this transform. If
applicable, you should run this transform before
[fold_batch_norms](#fold_batch_norms), as in the sketch below.
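
For example, a pipeline that respects those ordering constraints might look like
this (the graph and layer names are placeholders for your own model's):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=atrous_model.pb \
--out_graph=flattened_model.pb \
--inputs='input' \
--outputs='output' \
--transforms='
  fold_constants(ignore_errors=true)
  flatten_atrous_conv
  fold_constants(ignore_errors=true)
  fold_batch_norms'
```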

### fold_batch_norms

Args: None \
Prerequisites: [fold_constants](#fold_constants)

This transform tries to optimize away the Mul that's introduced after a Conv2D
(or a MatMul) when batch normalization has been used during training. It scans
the graph for any channel-wise multiplies immediately after convolutions, and
multiplies the convolution's (or matrix multiplication's) weights with the Mul
instead so this can be omitted at inference time. You'll need to make sure you
run [fold_constants](#fold_constants) first, since the pattern can only be
spotted if the normal complex expression that's produced by training for the Mul
input is collapsed down into a simple constant.

### fold_constants

Args:

*   clear_output_shapes: Clears tensor shape information saved as attributes.
    Some older graphs contain out-of-date information and may cause import
    errors. Defaults to true.

Prerequisites: None

Looks for any sub-graphs within the model that always evaluate to constant
expressions, and replaces them with those constants. This optimization is always
executed at run-time after the graph is loaded, so running it offline first
won't help latency, but it can simplify the graph and so make further processing
easier. It's often useful to call this with `fold_constants(ignore_errors=true)`
to continue on past transient errors, since this is just an optimization phase.

### fold_old_batch_norms

Args: None \
Prerequisites: None

In the early days of TensorFlow, batch normalization was implemented using
single monolithic ops like `BatchNormWithGlobalNormalization` or
`FusedBatchNorm`. In modern versions, adding batch normalization from Python
will give you a series of smaller math ops instead, to achieve the same effect
without special-purpose code. If you have a graph that uses the older style,
this transform will recognize and optimize those ops for inference, in the same
way that the [fold_batch_norms](#fold_batch_norms) transform does for the new
approach.

### freeze_requantization_ranges

Args:

*   min_max_log_file: Path to a log file containing ranges for ops.
*   min_percentile: Percentage cutoff to use to calculate an overall min.
    Defaults to 5.
*   max_percentile: Percentage cutoff to use to calculate an overall max.
    Defaults to 5.

Quantized operations like convolution or matrix multiplies take their inputs as
8-bit, but produce 32-bit results. To do further operations on these, they need
to be converted back down to the lower depth. To make the most of those eight
bits, you need to scale the thirty-two bits of original data down using a range
that matches the values that are actually used.

Because that range information isn't stored in the original graph, the
[quantization process](#eight-bit-calculations) inserts RequantizationRange ops
before each conversion from 32 to 8 bits. This op looks at the 32-bit output and
calculates the current min and max every time it's run.

This isn't incredibly time-consuming, but it is extra work that's nice to avoid
if possible. One way of optimizing it away is to replace those
RequantizationRange ops with a pair of Const nodes holding known min/max values,
so the scaling down can be done without having to inspect the output every time.

That's what this transform does. It's usually used in conjunction with a copy of
the graph that's had [insert_logging](#insert_logging) run on it to instrument
it to record the min/max values to stderr. Why is logging used rather than
writing to a normal file? As you'll see later, to get the best results you want
to collect data from a lot of runs on real data, and for mobile apps especially
it's a lot easier to do this by copying log files. As an example, here's how
you'd add the logging operations for a quantized version of the Inception v3
graph:

```bash
bazel build tensorflow/tools/graph_transforms:transform_graph
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/quantized_inception.pb \
--out_graph=/tmp/logged_quantized_inception.pb \
--inputs=Mul \
--outputs=softmax \
--transforms='
insert_logging(op=RequantizationRange, show_name=true, message="__requant_min_max:")\
'
```

Now, when you run the `/tmp/logged_quantized_inception.pb` graph, it will write
out log statements that show the value of the min and max calculated by each
RequantizationRange op. Here's an example of running label_image and saving the
log:

```bash
bazel build tensorflow/examples/label_image:label_image
bazel-bin/tensorflow/examples/label_image/label_image \
--image=${HOME}/Downloads/grace_hopper.jpg \
--input_layer=Mul \
--output_layer=softmax \
--graph=/tmp/logged_quantized_inception.pb \
--labels=${HOME}/Downloads/imagenet_comp_graph_label_strings.txt \
2>/tmp/min_max_log_small.txt
```

If you look in `/tmp/min_max_log_small.txt`, you'll see a lot of lines like
this:

```
I0108 21:45:42.261883    1972 logging_ops.cc:79] ;conv/Conv2D/eightbit/requant_range__print__;__requant_min_max:[-20.887871][22.274715]
```

This is a simple way of serializing the name of the RequantizationRange op and
its min/max values every time it's run. It's a file like this that you pass into
the transform as the `min_max_log_file` argument. The transform will attempt to
extract all of the min/max values associated with ops, ignoring any irrelevant
lines in the log, and replace the RequantizationRange ops with two Const nodes
containing the found values.

This isn't the whole story though. The min/max values can vary a lot depending
on what the particular inputs to the graph are on any given run, which means
picking ranges based on just one run can lead to clipping of values and a loss
of accuracy. To get better results, you need to run your network against a range
of different inputs. In Inception's case, I often use a thousand different
images from the training set. You can then pass the whole concatenated log from
all of the runs into the transform, and it will pick ranges based on the
aggregate of the values found for each RequantizationRange op.

To ensure that outliers don't increase the range too much, and so decrease the
accuracy by putting too many bits into rare extreme values, the `min_percentile`
and `max_percentile` arguments control how the overall min and max are chosen.
At their default values of 5, the lowest 5% of the minimum values will be
discarded, taking the minimum of the remainder, and the equivalent is done for
the maximum.
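
Once you've collected a log covering a representative set of runs, the final
step is to apply the transform itself, pointing it at the log file. This sketch
reuses the paths from the examples above; the output path is a placeholder:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=/tmp/quantized_inception.pb \
--out_graph=/tmp/frozen_range_inception.pb \
--inputs=Mul \
--outputs=softmax \
--transforms='
freeze_requantization_ranges(min_max_log_file="/tmp/min_max_log_small.txt")'
```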

### fuse_convolutions

Args: None \
Prerequisites: None

For graphs that use ResizeBilinear or MirrorPad ops before convolutions (e.g. to
scale up in the later stages of an image style transfer model), it can improve
memory usage and latency to combine the spatial transformations with the
convolution's im2col patch generation. This transform looks out for that
particular pattern of ops and replaces them with a fused version that combines
the resizing and padding with the convolution.

### insert_logging

Args:

*   op: Insert a Print after every occurrence of this op type. Can be repeated
    to cover multiple types. If not present, all op types will be instrumented.
*   prefix: Insert a Print after every node whose name starts with this value.
    Can be repeated to cover multiple nodes. If not present, all node names will
    be matched.
*   show_op: If true, the op type will be prepended to all log messages.
*   show_name: If true, the node's name will be prepended to all log messages.
*   message: Arbitrary text to log before the values.
*   first_n: How many times to print before suppressing. Defaults to -1, which
    means never stop.
*   summarize: How long numerical results can be before they're truncated.
    Defaults to 1024.

The Print operator writes strings to stderr when it's run inside a graph, and
prints out the numerical results of the node that it's reading from. This can be
very useful when you're debugging and want to follow particular internal values
while a graph is running. This transform allows you to insert those ops at
particular points in the graph, and customize the message that's displayed. It's
also used in conjunction with the
[freeze_requantization_ranges](#freeze_requantization_ranges) transform to
output information that it needs.
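
As a sketch of the other arguments, here's how you might log the first five
results of every node whose name starts with a (hypothetical) `conv` prefix,
tagging each message with the op type and node name; the graph and layer names
are placeholders:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=logged_graph.pb \
--inputs='input' \
--outputs='output' \
--transforms='
  insert_logging(prefix=conv, show_op=true, show_name=true, message="__debug:", first_n=5)'
```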

### merge_duplicate_nodes

Args: None \
Prerequisites: None

If there are Const nodes with the same types and contents, or nodes with the
same inputs and attributes, this transform will merge them together. It can be
useful when you want to cut down the number of nodes in a graph that has a lot
of redundancy (e.g. this transform is always run as part of
[quantize_nodes](#quantize_nodes) since the processing there can introduce
duplicates of constants that are used in the quantize/dequantize process).

### obfuscate_names

Args: None \
Prerequisites: None

Replaces all nodes' names with short generated ids, other than the inputs and
outputs. This also updates all references within the graph so that the structure
is preserved. This can be useful if you want to shrink the file size, or if you
want to make it harder to understand the architecture of your model before
releasing it.

### quantize_nodes

Args:

*   input_min: The lowest float value for any quantized placeholder inputs.
*   input_max: The highest float value for any quantized placeholder inputs. If
    both input_min and input_max are set, then any float placeholders in the
    graph will be replaced with quantized versions, and consts will be created
    to pass the range to subsequent operations.
*   fallback_min: The lowest float value to use for requantizing activation
    layers.
*   fallback_max: The highest float value to use for requantizing activation
    layers. If both fallback_min and fallback_max are set, then instead of using
    RequantizationRange ops to figure out the useful range dynamically when
    converting the 32-bit output of ops like QuantizedConv2D and
    QuantizedBiasAdd, hardwired consts with these values will be used. This can
    help performance, if you know the range of your activation layers ahead of
    time.

Prerequisites: [quantize_weights](#quantize_weights)

Replaces any calculation nodes with their eight-bit equivalents (if available),
and adds in conversion layers to allow remaining float operations to
interoperate. This is one of the most complex transforms, and involves multiple
passes and a lot of rewriting. It's also still an active area of research, so
results may vary depending on the platform and operations you're using in your
model. You should run quantize_weights first to ensure your Const ops are in
eight-bit form.

### quantize_weights

Args:

*   minimum_size: Tensors with fewer elements than this won't be quantized
    (defaults to 1024).

Prerequisites: None

Converts any large (more than minimum_size) float Const op into an eight-bit
equivalent, followed by a float conversion op so that the result is usable by
subsequent nodes. This is mostly useful for [shrinking file
sizes](#shrinking-file-size), but also helps with the more advanced
[quantize_nodes](#quantize_nodes) transform. Even though there are no
prerequisites, it is advisable to run [fold_batch_norms](#fold_batch_norms) or
[fold_old_batch_norms](#fold_old_batch_norms) first, because rounding variances
down to zero may cause significant loss of precision.
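
If quantizing smaller tensors costs too much accuracy, you can raise the size
threshold. For example, this invocation (a sketch, reusing the Inception file
names from earlier) only quantizes constants with more than 4096 elements:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=tensorflow_inception_graph.pb \
--out_graph=optimized_inception_graph.pb \
--inputs='Mul' \
--outputs='softmax' \
--transforms='
  fold_batch_norms
  fold_old_batch_norms
  quantize_weights(minimum_size=4096)'
```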

### remove_attribute

Args:

*   attribute_name: Name of the attribute you want to remove.
*   op_name: Optional name of a single op to restrict the removal to.

Prerequisites: None

Deletes the given attribute from either all nodes, or just the one specified in
`op_name`. This can be a dangerous transform since it's easy to leave your graph
in an invalid state if you remove a required attribute. It can be useful in
special circumstances though.
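
For example, here's a hedged sketch that removes the `_output_shapes` attribute
(shape metadata that some exporters save on every node) from the whole graph;
substitute whatever attribute you actually need to drop:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=cleaned_graph.pb \
--inputs='input' \
--outputs='output' \
--transforms='remove_attribute(attribute_name=_output_shapes)'
```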

### remove_device

Args: None \
Prerequisites: None

All ops can have a hardware device specified. This can be a problem when you're
loading a graph on a different system than the model was trained on, since some
specified devices may not be available. In order to work with graphs like these,
you can run this transform to wipe the slate clean and delete the device
specifier from all ops.

### remove_control_dependencies

Args: None \
Prerequisites: None

Removes all control dependencies from the graph.

### remove_nodes

Args:

*   op: The name of the op you want to remove. Can be repeated to remove
    multiple ops.

Prerequisites: None

This is a potentially dangerous transform that looks for single-input,
single-output ops with the given names, removes them from the graph, and rewires
all inputs that used to pull from them to pull from the preceding node instead.
This is most useful for getting rid of ops like `CheckNumerics` that are useful
during training but just complicate the graph and increase latency during
inference. It's dangerous because it's possible that removing some ops may
change the output of your graph, so make sure you check the overall accuracy
after using this.

### rename_attribute

Args:

*   old_attribute_name: Current name of the attribute you want to rename.
*   new_attribute_name: Name that you want the attribute to have now.
*   op_name: If this is set, only change attributes for a given op type,
    otherwise apply to all nodes with attribute names that match.

Prerequisites: None

Changes the name of the given attribute. This is often useful for upgrading
graph files as op definitions change over versions, since the renaming is often
enough to deal with minor changes.
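
As an illustrative sketch (the attribute and op names here are hypothetical, not
a real upgrade you need to perform), this would rename a `keep_dims` attribute
to `keepdims` on all Sum ops:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=upgraded_graph.pb \
--inputs='input' \
--outputs='output' \
--transforms='
  rename_attribute(old_attribute_name=keep_dims, new_attribute_name=keepdims, op_name=Sum)'
```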

### rename_op

Args:

*   old_op_name: Current name of the operation.
*   new_op_name: Name to change to.

Prerequisites: None

Finds all ops with the given name, and changes them to the new one. This can be
useful for version upgrading if the changes between ops are minor apart from the
name.
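
Reusing the example from the transform's own error message (`Multiply` is a
hypothetical op name here), this rewrites every Mul node in the graph:

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=renamed_graph.pb \
--inputs='input' \
--outputs='output' \
--transforms='rename_op(old_op_name=Mul, new_op_name=Multiply)'
```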

### round_weights

Args:

*   num_steps: How many unique values to use in each buffer.

Prerequisites: None

Rounds all float values in large Const ops (more than 15 elements) to the given
number of steps. The unique values are chosen per buffer by linearly allocating
between the largest and smallest values present. This is useful when you'll be
deploying on mobile, and you want a model that will compress effectively. See
[shrinking file size](#shrinking-file-size) for more details. Even though there
are no prerequisites, it is advisable to run
[fold_batch_norms](#fold_batch_norms) or
[fold_old_batch_norms](#fold_old_batch_norms) first, because rounding variances
down to zero may cause significant loss of precision.

### sparsify_gather

Args: None \
Prerequisites: None

Transforms `Gather` ops into a sparsified version, where the dense `Const` that
feeds the `params` input of the `Gather` is replaced with a `HashTable`, and the
`Gather` op itself is replaced by a hashtable lookup. This is mostly useful for
reducing the memory footprint of sparse TF.learn linear models.

### set_device

Args:

*   device: What device to assign to ops.
*   if_default: If this is true, only assign to ops with empty existing devices.

Updates nodes to use the specified device. A device is a way to tell the code
that executes the graph which piece of hardware it should run particular nodes
on. The right assignment to use may change between training and deployment, so
this transform (and [remove_device](#remove_device)) provides a way of updating
the placement. If the `if_default` parameter is set, then only ops that don't
have a device assigned already will be updated. This is mostly useful for
preprocessing of graphs for other stages that expect all ops to have an explicit
device assigned.
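
For example, to pin every op that doesn't already have an assignment onto the
CPU (the device string follows TensorFlow's usual format; the graph and layer
names are placeholders):

```bash
bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
--in_graph=my_graph.pb \
--out_graph=placed_graph.pb \
--inputs='input' \
--outputs='output' \
--transforms='set_device(device="/device:CPU:0", if_default=true)'
```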

### sort_by_execution_order

Args: None \
Prerequisites: None

Arranges the nodes in the GraphDef in topological order, so that the inputs of
any given node are always earlier than the node itself. This is especially
useful when you're targeting a minimal inference engine, since you can just
execute the nodes in the given order knowing that the inputs will be computed
before they're needed.

### strip_unused_nodes

Args:

*   type: Default type for any new Placeholder nodes generated, for example
    int32, float, quint8.
*   shape: Default shape for any new Placeholder nodes generated, as
    comma-separated dimensions. For example shape="1,299,299,3". The double
    quotes are important, since otherwise the commas will be taken as argument
    separators.
*   name: Identifier for the placeholder arguments.
*   type_for_name: What type to use for the previously-given name.
*   shape_for_name: What shape to use for the previously-given name.

Prerequisites: None

Removes all nodes not used in calculating the layers given in `--outputs`, fed
by `--inputs`. This is often useful for removing training-only nodes like
save-and-restore or summary ops. It's also handy for solving the [missing kernel
errors problem](#fixing-missing-kernel-errors-on-mobile) when there are decode
or other ops you don't need in the inference path.

The biggest complication is that it sometimes has to create new Placeholder ops,
so there are options to control their characteristics. This will happen if you
bypass a DecodeJpeg op by specifying an input layer deeper in the network, for
example, so you can pass in a raw image array instead of an encoded string as an
input. The decode op will be removed, together with the Placeholder that fed it,
but a new Placeholder is needed for the input layer you specify. The type and
shape arguments let you control the attributes of any new Placeholders that are
created. Plain `type` and `shape` set global defaults, but if you have different
inputs with varying characteristics, you'll need to pass in a list of arguments
where the preceding name specifies what layer each applies to. For example, if
you had two inputs in1 and in2, you could call `strip_unused_nodes(name=in1,
type_for_name=int32, shape_for_name="2,3", name=in2, type_for_name=float,
shape_for_name="1,10,10,3")`.

## Writing Your Own Transforms

The Graph Transform Tool is designed to make it as easy as possible to create
your own optimization, modification, and pre-processing transforms. At their
heart, all of the transforms take in a valid GraphDef, make some changes, and
output a new GraphDef. Each GraphDef is just a list of NodeDefs, each defining
one node in the graph and its connections. You can find more information on the
format at [this guide to TensorFlow model
files](https://www.tensorflow.org/versions/master/extend/tool_developers/index.html),
but for a simple example take a look at
[tensorflow/tools/graph_transforms/rename_op.cc](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/rename_op.cc),
which implements the [rename_op](#rename_op) transform:

```C++
Status RenameOp(const GraphDef& input_graph_def,
                const TransformFuncContext& context,
                GraphDef* output_graph_def) {
  if (!context.params.count("old_op_name") ||
      (context.params.at("old_op_name").size() != 1) ||
      !context.params.count("new_op_name") ||
      (context.params.at("new_op_name").size() != 1)) {
    return errors::InvalidArgument(
        "rename_op expects exactly one 'old_op_name' and 'new_op_name' "
        "argument, e.g. rename_op(old_op_name=Mul, new_op_name=Multiply)");
  }

  const string old_op_name = context.params.at("old_op_name")[0];
  const string new_op_name = context.params.at("new_op_name")[0];
  output_graph_def->Clear();
  for (const NodeDef& node : input_graph_def.node()) {
    NodeDef* new_node = output_graph_def->mutable_node()->Add();
    new_node->CopyFrom(node);
    if (node.op() == old_op_name) {
      new_node->set_op(new_op_name);
    }
  }

  return Status::OK();
}

REGISTER_GRAPH_TRANSFORM("rename_op", RenameOp);
```

The heart of this transform is the loop through the input_graph_def's nodes. We
go through each op, add a new one to the output, copy the original's contents,
and then change the op over if it matches the parameters. There's a standard set
of parameters for every transform, so they all take in a GraphDef and context,
and write out into a new GraphDef. The registration macro at the bottom lets the
tool know what function to call when it finds the `rename_op` string in a
transforms list.

### Transform Functions

The standard signature that all transform functions have is defined as
`TransformFunc`, which takes in an input GraphDef, a `TransformFuncContext`
containing environment information, writes to an output GraphDef, and returns a
Status indicating whether the transform succeeded.

The `TransformFuncContext` has a list of the inputs and outputs for the graph,
and the [parameter arguments](#parameters) that were passed into the transform
by the user.

If you write a function that matches this signature, and [register
it](#registering), the graph transform tool will take care of calling it.

### Pattern Syntax

The `rename_op` example only needs to look at a single node at a time, but one
of the most common needs is to modify small sub-graphs within a model. To make
this easy, the Graph Transform Tool provides the `OpTypePattern` syntax. This is
a simple and compact way to specify patterns of nodes that you want to look for.
The format is:

```
OP_TYPE_PATTERN ::= "{" OP "," INPUTS "}"
INPUTS ::= OP_TYPE_PATTERN
```

The `OP` field can either contain a single "*", which means match any op type,
one op type (for example "Const"), or a set of op types separated by `|` symbols
(for example "Conv2D|MatMul|BiasAdd"). General regex patterns are not supported,
just these special cases.

You can think of these patterns as very limited regular expressions designed to
pick out sub-trees in graphs. They are deliberately very constrained to the kind
of things we commonly find ourselves needing to do, to make creating and
debugging as straightforward as possible.

For example, if you want all Conv2D nodes that have a constant as their second
input, you would set up a pattern like this, using C++ initializer lists to
populate the structure:

```C++
OpTypePattern conv_pattern({"Conv2D", {{"*"}, {"Const"}}});
```

It can be easier to visualize these initializers using indentation to show the
tree structure more clearly:

```C++
OpTypePattern conv_pattern({
  "Conv2D",
  {
    {"*"},
    {"Const"}
  }
});
```

In plain English this is saying: a Conv2D op with two inputs, the first of which
is any op type, and the second of which is a Const op.

Here's a much more complex example, from the [quantize_nodes](#quantize_nodes)
transform:

```C++
{"QuantizeV2",
  {
    {"Dequantize"},
    {"Min",
      {
        {"Reshape",
          {
            {"Dequantize"},
            {"Const"},
          }
        },
        {"Const"},
      }
    },
    {"Max",
      {
        {"Reshape",
          {
            {"Dequantize"},
            {"Const"},
          }
        },
        {"Const"},
      }
    },
  }
}
```

This is looking for QuantizeV2 nodes, with three inputs, the first of which is a
Dequantize, the second is a Min that ultimately pulls from a Dequantize, and the
third is a Max which does the same. Assuming we know the Dequantize ops are
pulling from the same eight-bit buffer, the end result of this sub-graph is a
no-op: it just turns the eight-bit buffer into float, and then immediately
converts it back to eight bits. If we look for this pattern and remove it, we
can optimize the graph without changing the result.

### ReplaceMatchingOpTypes

It's very common to want to find all occurrences of a particular sub-graph in a
model, and replace them all with a different sub-graph that keeps the same local
input and output connections. For example with
[fuse_convolutions](#fuse_convolutions), we needed to find all Conv2D ops that
read their inputs from ResizeBilinear ops, and replace those combinations with a
single FusedResizeAndPadConv2D op, but without affecting other ops.

To make that sort of transformation easy, we created the
`ReplaceMatchingOpTypes` helper. This takes in a graph, an `OpTypePattern`
defining the sub-graph to look for, and a callback function to run for every
occurrence it finds. The job of this callback function is to look at the
`NodeMatch` that contains information about the current sub-graph, and return a
new sub-graph in the new_nodes list that will be used to replace the old
sub-graph.

You can see how it's used in practice in the
[fuse_convolutions](#fuse_convolutions) code:

```C++
TF_RETURN_IF_ERROR(ReplaceMatchingOpTypes(
    input_graph_def,  // clang-format off
    {"Conv2D",
        {
            {"ResizeBilinear"},
            {"*"}
        }
    },  // clang-format on
    [](const NodeMatch& match, const std::set<string>& input_nodes,
       const std::set<string>& output_nodes,
       std::vector<NodeDef>* new_nodes) {
      // Find all the nodes we expect in the subgraph.
      const NodeDef& conv_node = match.node;
      const NodeDef& resize_node = match.inputs[0].node;
      const NodeDef& weights_node = match.inputs[1].node;

      // We'll be reusing the old weights.
      new_nodes->push_back(weights_node);

      // Create a 'no-op' mirror padding node that has no effect.
      NodeDef pad_dims_node;
      pad_dims_node.set_op("Const");
      pad_dims_node.set_name(conv_node.name() + "_dummy_paddings");
      SetNodeAttr("dtype", DT_INT32, &pad_dims_node);
      SetNodeTensorAttr<int32>("value", {4, 2}, {0, 0, 0, 0, 0, 0, 0, 0},
                               &pad_dims_node);
      new_nodes->push_back(pad_dims_node);

      // Set up the new fused version of the convolution op.
      NodeDef fused_conv;
      fused_conv.set_op("FusedResizeAndPadConv2D");
      fused_conv.set_name(match.node.name());
      AddNodeInput(resize_node.input(0), &fused_conv);
      AddNodeInput(resize_node.input(1), &fused_conv);
      AddNodeInput(pad_dims_node.name(), &fused_conv);
      AddNodeInput(conv_node.input(1), &fused_conv);
      CopyNodeAttr(resize_node, "align_corners", "resize_align_corners",
                   &fused_conv);
      SetNodeAttr("mode", "REFLECT", &fused_conv);
      CopyNodeAttr(conv_node, "T", "T", &fused_conv);
      CopyNodeAttr(conv_node, "padding", "padding", &fused_conv);
      CopyNodeAttr(conv_node, "strides", "strides", &fused_conv);
      new_nodes->push_back(fused_conv);

      return Status::OK();
    },
    {}, &replaced_graph_def));
```

Here you can see we define the pattern to look for, and in the callback function
use information from each of the nodes in the old sub-graph to create a new
fused node. We also copy over the old weights input node so that isn't lost.

There are a few things to know about the `ReplaceMatchingOpTypes` function:

*   All of the nodes in any matching sub-graphs are removed from the new graph
    created by the function. If any of them are needed, it's the callback
    function's responsibility to add them back in. There's a `CopyOriginalMatch`
    convenience call that will copy over all of the original nodes if you decide
    you don't actually want to modify a particular sub-graph.

*   It is assumed that the same nodes will never appear in more than one matched
    sub-graph. This is to ensure that sub-trees are only replaced once, but it
    may mean that some sub-graphs aren't spotted if they overlap with earlier
    matches.

*   The calling framework tries to ensure that the graph remains sane, by
    looking at the new_nodes that are returned and making sure that no nodes
    which are needed as inputs by nodes outside the sub-graph are removed. These
    important nodes are listed in the `output_nodes` argument that's passed into
    each replacement function call. You can disable this checking by setting
    `allow_inconsistencies` to true in the options, but otherwise any
    replacements that break the graph constraints will be canceled. If you do
    allow inconsistencies, it's your transform's responsibility to fix them up
    before you return your final result. Functions like `RenameNodeInputs` can
    be useful if you are doing wholesale node renaming for example.

### Parameters

The arguments that are in parentheses after the transform name when the tool is
called are parsed and placed into the params member of the TransformFuncContext
that's given to each transform. For every named argument, there's a vector of
strings containing all the values that it was given, in the order they were
given. These are treated a bit like command-line parameters, and it's the
transform's responsibility to parse them into the data types it needs, and raise
errors by returning a bad Status if any of them are ill-formed.

As an example, here's a hypothetical transform call:

```
some_transform(foo=a, foo=b, bar=2, bob="1,2,3")
```

Here's what the std::map of strings looks like in the params member:

```
{{"foo", {"a", "b"}}, {"bar", {"2"}}, {"bob", {"1,2,3"}}}
```

The double quotes around the comma-separated argument to `bob` are important
because otherwise they'll be treated as separate arguments, and the parsing will
fail.

Here's an example of how [round_weights](#round_weights) reads its `num_steps`
parameter:

```C++
TF_RETURN_IF_ERROR(context.GetOneInt32Parameter("num_steps", 256, &num_steps));
```

If the conversion fails or the parameter occurs more than once, the helper
function will raise a meaningful error through the status result of the
transform. If the parameter isn't specified at all then the default will be
used.

### Function Libraries

A newer feature of TensorFlow is the ability to create libraries of functions as
part of graphs. These are a bit like templates, which define macro operations in
terms of smaller components, which can then be instantiated with different input
and output connections inside the graph just like regular ops. Right now the
graph transform tool just copies these libraries between the input and output
graphs, but it's likely that more complex operations will be supported on them
in the future.

### Registering

The Graph Transform Tool associates names of transforms with the code to
implement them using the `REGISTER_GRAPH_TRANSFORM()` macro. This takes a string
and a function, and automatically registers the transform with the tool. You
will need to watch out for a few things though:

*   Because it's using global C++ objects in each file under the hood, the
    linker can sometimes strip them out and lose the registration. In Bazel you
    need to make sure you're linking any new transforms in as libraries, and set
    the `alwayslink` flag on the `cc_library` rules that contain them.

*   You should be able to create your own copy of the transform_graph tool by
    linking against the transform_graph_main_lib library in
    tensorflow/tools/graph_transforms/BUILD. This contains all the `main()`
    logic to parse command line arguments and call transforms.
1100