1# Graph Transform Tool 2 3## Table of Contents 4 5* [Introduction](#introduction) 6* [Using the Graph Transform Tool](#using-the-graph-transform-tool) 7* [Inspecting Graphs](#inspecting-graphs) 8* [Common Use Cases](#common-use-cases) 9 * [Optimizing for Deployment](#optimizing-for-deployment) 10 * [Fixing Missing Kernel Errors on 11 Mobile](#fixing-missing-kernel-errors-on-mobile) 12 * [Shrinking File Size](#shrinking-file-size) 13 * [Eight-bit Calculations](#eight-bit-calculations) 14* [Transform Reference](#transform-reference) 15 * [add_default_attributes](#add_default_attributes) 16 * [backport_concatv2](#backport_concatv2) 17 * [flatten_atrous_conv](#flatten_atrous_conv) 18 * [fold_batch_norms](#fold_batch_norms) 19 * [fold_constants](#fold_constants) 20 * [fold_old_batch_norms](#fold_old_batch_norms) 21 * [freeze_requantization_ranges](#freeze_requantization_ranges) 22 * [fuse_convolutions](#fuse_convolutions) 23 * [insert_logging](#insert_logging) 24 * [merge_duplicate_nodes](#merge_duplicate_nodes) 25 * [obfuscate_names](#obfuscate_names) 26 * [quantize_nodes](#quantize_nodes) 27 * [quantize_weights](#quantize_weights) 28 * [remove_attribute](#remove_attribute) 29 * [remove_device](#remove_device) 30 * [remove_nodes](#remove_nodes) 31 * [rename_attribute](#rename_attribute) 32 * [rename_op](#rename_op) 33 * [round_weights](#round_weights) 34 * [sparsify_gather](#sparsify_gather) 35 * [set_device](#set_device) 36 * [sort_by_execution_order](#sort_by_execution_order) 37 * [strip_unused_nodes](#strip_unused_nodes) 38* [Writing Your Own Transforms](#writing-your-own-transforms) 39 * [Transform Functions](#transform-functions) 40 * [Pattern Syntax](#pattern-syntax) 41 * [ReplaceMatchingOpTypes](#replacematchingoptypes) 42 * [Parameters](#parameters) 43 * [Function Libraries](#function-libraries) 44 * [Registering](#registering) 45 46## Introduction 47 48When you have finished training a model and want to deploy it in production, 49you'll often want to modify it to better run in its final environment. For 50example if you're targeting a phone you might want to shrink the file size by 51quantizing the weights, or optimize away batch normalization or other 52training-only features. The Graph Transform framework offers a suite of tools 53for modifying computational graphs, and a framework to make it easy to write 54your own modifications. 55 56This guide is structured into three main parts, first giving some tutorials on 57how to perform common tasks, second a reference covering all of the different 58transformations that are included, together with the options that apply to them, 59and third a guide to creating your own transforms. 60 61## Using the Graph Transform Tool 62 63The Graph Transform tool is designed to work on models that are saved as 64GraphDef files, usually in a binary protobuf format. This is the low-level 65definition of a TensorFlow computational graph, including a list of nodes and 66the input and output connections between them. If you're using a Python API to 67train your model, this will usually be saved out in the same directory as your 68checkpoints, and usually has a '.pb' suffix. 69 70If you want to work with the values of your trained parameters, for example to 71quantize weights, you'll need to run 72[tensorflow/python/tools/freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py) 73to convert the checkpoint values into embedded constants within the graph file 74itself. 75 76You call the Graph Transform tool itself like this: 77 78```bash 79bazel build tensorflow/tools/graph_transforms:transform_graph 80bazel-bin/tensorflow/tools/graph_transforms/transform_graph \ 81--in_graph=tensorflow_inception_graph.pb \ 82--out_graph=optimized_inception_graph.pb \ 83--inputs='Mul:0' \ 84--outputs='softmax:0' \ 85--transforms=' 86strip_unused_nodes(type=float, shape="1,299,299,3") 87remove_nodes(op=Identity, op=CheckNumerics) 88fold_old_batch_norms 89' 90``` 91 92The arguments here are specifying where to read the graph from, where to write 93the transformed version to, what the input and output layers are, and what 94transforms to modify the graph with. The transforms are given as a list of 95names, and can each have arguments themselves. These transforms define the 96pipeline of modifications that are applied in order to produce the output. 97Sometimes you need some transforms to happen before others, and the ordering 98within the list lets you specify which happen first. 99Note that the optimization 100`remove_nodes(op=Identity, op=CheckNumerics)` will break the model with control 101flow operations, such as `tf.cond`, `tf.map_fn`, and `tf.while`. 102 103## Inspecting Graphs 104 105Many of the transforms that the tool supports need to know what the input and 106output layers of the model are. The best source for these is the model training 107process, where for a classifier the inputs will be the nodes that receive the 108data from the training set, and the output will be the predictions. If you're 109unsure, the 110[`summarize_graph`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tools/graph_transforms/summarize_graph_main.cc) 111tool can inspect the model and provide guesses about likely input and output nodes, 112as well as other information that's useful for debugging. Here's an example of 113how to use it on the [Inception V3 114graph](https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz): 115 116```bash 117bazel build tensorflow/tools/graph_transforms:summarize_graph 118bazel-bin/tensorflow/tools/graph_transforms/summarize_graph --in_graph=tensorflow_inception_graph.pb 119``` 120 121## Common Use Cases 122 123This section has small guides for some of the most frequently-used 124transformation pipelines, aimed at users who want to quickly accomplish one of 125these tasks. A lot of them will use the Inception V3 model for their examples, 126which can be downloaded from 127[https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz](https://storage.googleapis.com/download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz). 128 129### Optimizing for Deployment 130 131If you've finished training your model and want to deploy it on a server or a 132mobile device, you'll want it to run as fast as possible, and with as few 133non-essential dependencies as you can. This recipe removes all of the nodes that 134aren't called during inference, shrinks expressions that are always constant 135into single nodes, and optimizes away some multiply operations used during batch 136normalization by pre-multiplying the weights for convolutions. 137 138```bash 139bazel build tensorflow/tools/graph_transforms:transform_graph 140bazel-bin/tensorflow/tools/graph_transforms/transform_graph \ 141--in_graph=tensorflow_inception_graph.pb \ 142--out_graph=optimized_inception_graph.pb \ 143--inputs='Mul' \ 144--outputs='softmax' \ 145--transforms=' 146 strip_unused_nodes(type=float, shape="1,299,299,3") 147 remove_nodes(op=Identity, op=CheckNumerics) 148 fold_constants(ignore_errors=true) 149 fold_batch_norms 150 fold_old_batch_norms' 151``` 152 153The batch norm folding is included twice because there are two different flavors 154of batch normalization used in TensorFlow. The older version was implemented 155with a single op like BatchNormWithGlobalNormalization or FusedBatchNorm, and 156BatchNormWithGlobalNormalization was deprecated in favor of a more recent 157approach using individual ops to implement the same computation. The two 158transforms are in there so that both styles are recognized and optimized. 159 160### Fixing Missing Kernel Errors on Mobile 161 162The mobile version of TensorFlow is focused on inference, and so by default the 163list of supported ops (defined in 164[tensorflow/core/kernels/BUILD:android_extended_ops](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/BUILD) 165for Bazel doesn't include a lot that are training related. This can cause 166`No OpKernel was registered to support Op` errors when a GraphDef is loaded, 167even if the op isn't going to be executed. 168 169If you see this error and it's an op that you do actually want to run on mobile, 170then you'll need to make local modifications to the build files to include the 171right .cc file that defines it. In a lot of cases the op is just a vestigial 172remnant from the training process though, and if that's true then you can run 173the [strip_unused_nodes](#strip_unused_nodes), specifying the inputs and outputs 174of your inference usage, to remove those unnecessary nodes: 175 176```bash 177bazel build tensorflow/tools/graph_transforms:transform_graph 178bazel-bin/tensorflow/tools/graph_transforms/transform_graph \ 179--in_graph=tensorflow_inception_graph.pb \ 180--out_graph=optimized_inception_graph.pb \ 181--inputs='Mul' \ 182--outputs='softmax' \ 183--transforms=' 184 strip_unused_nodes(type=float, shape="1,299,299,3") 185 fold_constants(ignore_errors=true) 186 fold_batch_norms 187 fold_old_batch_norms' 188``` 189 190### Shrinking File Size 191 192If you're looking to deploy your model as part of a mobile app, then keeping the 193download size as small as possible is important. For most TensorFlow models, the 194largest contributors to the file size are the weights passed in to convolutional 195and fully-connected layers, so anything that can reduce the storage size for 196those is very useful. Luckily most neural networks are resistant to noise, so 197it's possible to change the representation of those weights in a lossy way 198without losing very much accuracy overall. 199 200On both iOS and Android app packages are compressed before download, so the 201simplest way to reduce the bandwidth your users need to receive your app is to 202provide raw data that compresses more easily. By default the weights are stored 203as floating-point values, and even tiny differences between numbers result in 204very different bit patterns, and so these don't compress very well. If you round 205the weights so that nearby numbers are stored as exactly the same values, the 206resulting bit stream has a lot more repetition and so compresses down a lot more 207effectively. To try this technique on your model, run the 208[round_weights](#round_weights) transform. 209 210```bash 211bazel build tensorflow/tools/graph_transforms:transform_graph 212bazel-bin/tensorflow/tools/graph_transforms/transform_graph \ 213--in_graph=tensorflow_inception_graph.pb \ 214--out_graph=optimized_inception_graph.pb \ 215--inputs='Mul' \ 216--outputs='softmax' \ 217--transforms=' 218 strip_unused_nodes(type=float, shape="1,299,299,3") 219 fold_constants(ignore_errors=true) 220 fold_batch_norms 221 fold_old_batch_norms 222 round_weights(num_steps=256)' 223``` 224 225You should see that the `optimized_inception_graph.pb` output file is the same 226size as the input, but if you run zip on it to compress it, it's almost 70% 227smaller than if you zip the original! The nice thing about this transform is 228that it doesn't change the structure of the graph at all, so it's running 229exactly the same operations and should have the same latency and memory usage as 230before. You can adjust the `num_steps` parameter to control how many values each 231weight buffer is rounded to, so lower numbers will increase the compression at 232the cost of accuracy. 233 234As a further step, you can store the weights into eight-bit values directly. 235Here's the recipe for that: 236 237```bash 238bazel build tensorflow/tools/graph_transforms:transform_graph 239bazel-bin/tensorflow/tools/graph_transforms/transform_graph \ 240--in_graph=tensorflow_inception_graph.pb \ 241--out_graph=optimized_inception_graph.pb \ 242--inputs='Mul' \ 243--outputs='softmax' \ 244--transforms=' 245 strip_unused_nodes(type=float, shape="1,299,299,3") 246 fold_constants(ignore_errors=true) 247 fold_batch_norms 248 fold_old_batch_norms 249 quantize_weights' 250``` 251 252You should see that the size of the output graph is about a quarter of the 253original. The downside to this approach compared to round_weights is that extra 254decompression ops are inserted to convert the eight-bit values back into 255floating point, but optimizations in TensorFlow's runtime should ensure these 256results are cached and so you shouldn't see the graph run any more slowly. 257 258So far we've been concentrating on weights because those generally take up the 259most space. If you have a graph with a lot of small nodes in it, the names of 260those nodes can start to take up a noticeable amount of space too. To shrink 261those down, you can run the [obfuscate_names](#obfuscate_names) transform, which 262replaces all the names (except for inputs and outputs) with short, cryptic but 263unique ids: 264 265```bash 266bazel build tensorflow/tools/graph_transforms:transform_graph 267bazel-bin/tensorflow/tools/graph_transforms/transform_graph \ 268--in_graph=tensorflow_inception_graph.pb \ 269--out_graph=optimized_inception_graph.pb \ 270--inputs='Mul:0' \ 271--outputs='softmax:0' \ 272--transforms=' 273 obfuscate_names' 274``` 275 276### Eight-bit Calculations 277 278For some platforms it's very helpful to be able to do as many calculations as 279possible in eight-bit, rather than floating-point. The support for this in 280TensorFlow is still experimental and evolving, but you can convert models into 281quantized form using the graph transform tool: 282 283```bash 284bazel build tensorflow/tools/graph_transforms:transform_graph 285bazel-bin/tensorflow/tools/graph_transforms/transform_graph \ 286--in_graph=tensorflow_inception_graph.pb \ 287--out_graph=optimized_inception_graph.pb \ 288--inputs='Mul' \ 289--outputs='softmax' \ 290--transforms=' 291 add_default_attributes 292 strip_unused_nodes(type=float, shape="1,299,299,3") 293 remove_nodes(op=Identity, op=CheckNumerics) 294 fold_constants(ignore_errors=true) 295 fold_batch_norms 296 fold_old_batch_norms 297 quantize_weights 298 quantize_nodes 299 strip_unused_nodes 300 sort_by_execution_order' 301``` 302 303This process converts all the operations in the graph that have eight-bit 304quantized equivalents, and leaves the rest in floating point. Only a subset of 305ops are supported, and on many platforms the quantized code may actually be 306slower than the float equivalents, but this is a way of increasing performance 307substantially when all the circumstances are right. 308 309A full guide to optimizing for quantization is beyond the scope of this guide, 310but one thing that can help is using the FakeQuantWithMinMaxVars op after Conv2D 311or similar operations during training. This trains the min/max variables that 312control the range used for quantization, so that the range doesn't have to be 313calculated dynamically by RequantizationRange during inference. 314 315## Transform Reference 316 317The --transforms string is parsed as a series of transform names, each of which 318can have multiple named arguments inside parentheses. Arguments are separated by 319commas, and double-quotes (") can be used to hold argument values if they 320themselves contain commas (for example shape definitions). 321 322The --inputs and --outputs are shared across all transforms, since it's common 323to need to know what the ingoing and outgoing nodes in the graph are. You should 324make sure you set these correctly before calling the graph transform tool, and 325if you're in doubt check with the model's author, or use the [`summarize_graph`](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms#inspecting-graphs) tool 326to examine likely inputs and outputs. 327 328All transforms can be passed the `ignore_errors` flag, with the value set to 329either true or false. By default any errors that happen within a transform will 330abort the whole process, but if you enable this then an error will just be 331logged and the transform skipped. This is especially useful for optional 332transforms where version errors or other unimportant problems may trigger an 333error. 334 335### add_default_attributes 336 337Args: None 338 339When attributes are added to ops in new versions of TensorFlow, they often have 340defaults to ensure backwards compatible behavior with their original versions. 341These defaults usually get added when the graph is loaded by the runtime, but if 342your model is going to be processed outside of the main TensorFlow framework it 343can be useful to run this update process as a transform. This process finds any 344op attributes that are defined in the current TensorFlow list of ops but not 345within the saved model, and sets them to the defined default for that attribute. 346 347### backport_concatv2 348 349Args: None 350 351If you have a GraphDef file that has been produced by a newer version of the 352TensorFlow framework and includes ConcatV2, and you want to run it on an older 353version that only supports Concat, this transform will take care of converting 354those newer ops to the equivalent older form. 355 356### flatten_atrous_conv 357 358Args: None \ 359Prerequisites: [fold_constants](#fold_constants) 360 361This transform flattens atrous convolution, corresponding to a sequence of 362SpaceToBatchND-Conv2D-BatchToSpaceND operations, converting it to a regular 363Conv2D op with upsampled filters. This transforms should only be used in order 364to run graphs having atrous convolution on platforms that do not yet natively 365support SpaceToBatchND and BatchToSpaceND operations. You will need to make 366sure you run [fold_constants](#fold_constants) after this transform. If 367applicable, you should run this transform before 368[fold_batch_norms](#fold_batch_norms). 369 370### fold_batch_norms 371 372Args: None \ 373Prerequisites: [fold_constants](#fold_constants) 374 375This transform tries to optimize away the Mul that's introduced after a Conv2D 376(or a MatMul) when batch normalization has been used during training. It scans 377the graph for any channel-wise multiplies immediately after convolutions, and 378multiplies the convolution's (or matrix multiplication's) weights with the Mul 379instead so this can be omitted at inference time. You'll need to make sure you 380run [fold_constants](#fold_constants) first, since the pattern can only be 381spotted if the normal complex expression that's produced by training for the Mul 382input is collapsed down into a simple constant. 383 384### fold_constants 385 386Args: 387 388* clear_output_shapes: Clears tensor shape information saved as attributes. 389 Some older graphs contains out-of-date information and may cause import 390 errors. Defaults to true. 391 392Prerequisites: None 393 394Looks for any sub-graphs within the model that always evaluate to constant 395expressions, and replaces them with those constants. This optimization is always 396executed at run-time after the graph is loaded, so running it offline first 397won't help latency, but it can simplify the graph and so make further processing 398easier. It's often useful to call this with `fold_constants(ignore_errors=true)` 399to continue on past transient errors, since this is just an optimization phase. 400 401### fold_old_batch_norms 402 403Args: None \ 404Prerequisites: None 405 406In the early days of TensorFlow, batch normalization was implemented using 407single monolithic ops like `BatchNormWithGlobalNormalization` or 408`FusedBatchNorm`. In modern versions, adding batch normalization from Python 409will give you a series of smaller math ops instead, to achieve the same effect 410without special-purpose code. If you have a graph that uses the older-style, 411this transform will recognize and optimize those ops for inference, in the same 412way that the [fold_batch_norms](#fold_batch_norms) transform does for the new 413approach. 414 415### freeze_requantization_ranges 416 417Args: 418 419* min_max_log_file: Path to a log file containing ranges for ops. 420* min_percentile: Percentage cutoff to use to calculate an overall min. 421 Defaults to 5. 422* max_percentile: Percentage cutoff to use to calculate an overall max. 423 Defaults to 5. 424 425Quantized operations like convolution or matrix multiplies take their inputs as 4268-bit, but produce 32-bit results. To do further operations on these, they need 427to be converted back down to the lower depth. To make the most of those eight 428bits, you need to scale the thirty-two bits of original data down using a scale 429that matches the range that's actually being used. 430 431Because that range information isn't stored in the original graph, the 432[quantization process](#eight-bit-calculations) inserts RequantizationRange ops 433before each conversion from 32 to 8 bits. This op looks at the 32-bit output and 434calculates the current min and max every time it's run. 435 436This isn't incredibly time-consuming, but it is extra work that's nice to avoid 437if possible. One way of optimizing that away is replacing those 438RequantizationRange ops with a pair of Const nodes holding known min/max values, 439so the scaling down can be done without having to inspect the output every time. 440 441That's what this transform does. It's usually used in conjunction with a copy of 442the graph that's had [insert_logging](#insert_logging) run on it to instrument 443it to record the min/max values to stderr. Why is logging used rather than 444writing to a normal file? As you'll see later, to get best results you want to 445collect data from a lot of runs on real data, and for mobile apps especially 446it's a lot easier to do this by copying log files. As an example, here's how 447you'd add the logging operations for a quantized version of the Inception v3 448graph: 449 450```bash 451bazel build tensorflow/tools/graph_transforms:transform_graph 452bazel-bin/tensorflow/tools/graph_transforms/transform_graph \ 453--in_graph=/tmp/quantized_inception.pb \ 454--out_graph=/tmp/logged_quantized_inception.pb \ 455--inputs=Mul \ 456--outputs=softmax \ 457--transforms=' 458insert_logging(op=RequantizationRange, show_name=true, message="__requant_min_max:")\ 459' 460``` 461 462Now, when you run the `/tmp/logged_quantized_inception.pb` graph, it will write 463out log statements that show the value of the min and max calculated by each 464RequantizationRange op. Here's an example of running label_image and saving the 465log: 466 467```bash 468bazel build tensorflow/examples/label_image:label_image 469bazel-bin/tensorflow/examples/label_image/label_image \ 470--image=${HOME}/Downloads/grace_hopper.jpg \ 471--input_layer=Mul \ 472--output_layer=softmax \ 473--graph=/tmp/logged_quantized_inception.pb \ 474--labels=${HOME}/Downloads/imagenet_comp_graph_label_strings.txt \ 4752>/tmp/min_max_log_small.txt 476``` 477 478If you look in `/tmp/min_max_log_small.txt`, you'll see a lot of lines like 479this: 480 481``` 482I0108 21:45:42.261883 1972 logging_ops.cc:79] ;conv/Conv2D/eightbit/requant_range__print__;__requant_min_max:[-20.887871][22.274715] 483``` 484 485This is a simple way of serializing the name of the RequantizationRange op and 486its min/max values every time it's run. It's a file like this that you pass into 487the transform as the `min_max_log_file` argument. The transform will attempt to 488extract all of the min/max values associated with ops, ignoring any irrelevant 489lines in the log, and replace the RequantizationRange ops with two Const nodes 490containing the found values. 491 492This isn't the whole story though. The min/max values can vary a lot depending 493on what the particular inputs to the graph are on any given run, which means 494picking ranges based on just one run can lead to clipping of values and a loss 495of accuracy. To get better results, you need to run your network against a range 496of different inputs. In Inception's case, I often use a thousand different 497images from the training set. You can then pass the whole concatenated log from 498all of the runs into the transform, and it will pick ranges based on the 499aggregate of the values found for each RequantizationRange op. 500 501To ensure that outliers don't increase the range too much, and so decrease the 502accuracy by putting too many bits into rare extreme values, the `min_percentile` 503and `max_percentile` arguments control how the overall min and max are chosen. 504At their default values of 5, this means that the lowest 5% of the minimum 505values will be discarded, taking the minimum of the remainder, and the 506equivalent for the maximum. 507 508### fuse_convolutions 509 510Args: None \ 511Prerequisites: None 512 513For graphs that use ResizeBilinear or MirrorPad ops before convolutions (e.g. to 514scale up in the later stages of an image style transfer model), it can improve 515memory usage and latency to combine the spatial transformations with the 516convolution's im2col patch generation. This transform looks out for that 517particular pattern of ops and replaces them with a fused version that combines 518the resizing and padding with the convolution. 519 520### insert_logging 521 522Args: 523 524* op: Insert a Print after every occurrence of this op type. Can be repeated 525 to cover multiple types. If not present, all op types will be instrumented. 526* prefix: Insert a Print after every node whose name starts with this value. 527 Can be repeated to cover multiple nodes. If not present, all node names will 528 be matched. 529* show_op: If true, the op type will be prepended to all log messages. 530* show_name: If true, the node's name will be prepended to all log messages. 531* message: Arbitrary text to log before the values. 532* first_n: How many times to print before suppressing. Defaults to -1, which 533 means never stop. 534* summarize: How long numerical results can be before they're truncated. 535 Defaults to 1024. 536 537The Print operator writes strings to stderr when it's run inside a graph, and 538prints out the numerical results of the node that it's reading from. This can be 539very useful when you're debugging and want to follow particular internal values 540while a graph is running. This transform allows you to insert those ops at 541particular points in the graph, and customize the message that's displayed. It's 542also used in conjunction with the 543[freeze_requantization_ranges](#freeze_requantization_ranges) transform to 544output information that it needs. 545 546### merge_duplicate_nodes 547 548Args: None \ 549Prerequisites: None 550 551If there are Const nodes with the same types and contents, or nodes with the 552same inputs and attributes, this transform will merge them together. It can be 553useful when you want to cut down the number of nodes in a graph that has a lot 554of redundancy (e.g. this transform is always run as part of 555[quantize_nodes](#quantize_nodes) since the processing there can introduce 556duplicates of constants that are used in the quantize/dequantize process). 557 558### obfuscate_names 559 560Args: None \ 561Prerequisites: None 562 563Replaces all nodes' names with short generated ids, other than the inputs and 564outputs. This also updates all references within the graph so that the structure 565is preserved. This can be useful if you want to shrink the file size, or if you 566want to make it harder to understand the architecture of your model before 567releasing it. 568 569### quantize_nodes 570 571Args: 572 573* input_min: The lowest float value for any quantized placeholder inputs. 574* input_max: The highest float value for any quantized placeholder inputs. If 575 both input_min and input_max are set, then any float placeholders in the 576 graph will be replaced with quantized versions, and consts will be created 577 to pass the range to subsequent operations. 578* fallback_min: The lowest float value to use for requantizing activation 579 layers. 580* fallback_max: The highest float value to use for requantizing activation 581 layers. If both fallback_min and fallback_max are set, then instead of using 582 RequantizationRange ops to figure out the useful range dynamically when 583 converting the 32-bit output of ops like QuantizedConv2D and 584 QuantizedBiasAdd, hardwired consts with these values will be used instead. 585 This can help performance, if you know the range of your activation layers 586 ahead of time. 587 588Prerequisites: [quantize_weights](#quantize_weights) 589 590Replaces any calculation nodes with their eight-bit equivalents (if available), 591and adds in conversion layers to allow remaining float operations to 592interoperate. This is one of the most complex transforms, and involves multiple 593passes and a lot of rewriting. It's also still an active area of research, so 594results may vary depending on the platform and operations you're using in your 595model. You should run quantize_weights first to ensure your Const ops are in 596eight-bit form. 597 598### quantize_weights 599 600Args: 601 602* minimum_size: Tensors with fewer elements than this won't be quantized 603(defaults to 1024) 604 605Prerequisites: None 606 607Converts any large (more than minimum_size) float Const op into an eight-bit 608equivalent, followed by a float conversion op so that the result is usable by 609subsequent nodes. This is mostly useful for [shrinking file 610sizes](#shrinking-file-size), but also helps with the more advanced 611[quantize_nodes](#quantize_nodes) transform. Even though there are no 612prerequisites, it is advisable to run [fold_batch_norms](#fold_batch_norms) or 613[fold_old_batch_norms](#fold_old_batch_norms), because rounding variances down 614to zero may cause significant loss of precision. 615 616### remove_attribute 617 618Args: 619 620* attribute_name: Name of the attribute you want to remove. 621* op_name: Optional name of a single op to restrict the removal to. 622 623Prerequisites: None 624 625Deletes the given attribute from either all nodes, or just the one specified in 626`op_name`. This can be a dangerous transform since it's easy to leave your graph 627in an invalid state if you remove a required attribute. It can be useful in 628special circumstances though. 629 630### remove_device 631 632Args: None \ 633Prerequisites: None 634 635All ops can have a hardware device specified. This can be a problem when you're 636loading a graph on a different system than the model was trained on, since some 637specified devices may not be available. In order to work with graphs like these, 638you can run this transform to wipe the slate clean and delete the device 639specifier from all ops. 640 641### remove_control_dependencies 642 643Args: None \ 644Prerequisites: None 645 646Removes all control dependencies from the graph. 647 648### remove_nodes 649 650Args: 651 652* op: The name of the op you want to remove. Can be repeated to remove 653 multiple ops. 654 655Prerequisites: None 656 657This is a potentially dangerous transform that looks for single-input, 658single-output ops with the given names, removes them from the graph, and rewires 659all inputs that use to pull from them to pull from the preceding node instead. 660This is most useful for getting rid of ops like `CheckNumerics` that are useful 661during training but just complicate the graph and increase latency during 662inference. It's dangerous because it's possible that removing some ops may 663change the output of your graph, so make sure you check the overall accuracy 664after using this. 665 666### rename_attribute 667 668Args: 669 670* old_attribute_name: Current name of the attribute you want to rename. 671* new_attribute_name: Name that you want the attribute to have now. 672* op_name: If this is set, only change attributes for a given op type, 673 otherwise apply to all nodes with attribute names that match. 674 675Prerequisites: None 676 677Changes the name of the given attribute. This is often useful for upgrading 678graph files as op definitions change over versions, since the renaming is often 679enough to deal with minor changes. 680 681### rename_op 682 683Args: 684 685* old_op_name: Current name of the operation. 686* new_op_name: Name to change to. 687 688Prerequisites: None 689 690Finds all ops with the given name, and changes them to the new one. This can be 691useful for version upgrading if the changes between ops are minor apart from the 692name. 693 694### round_weights 695 696Args: 697 698* num_steps: How many unique values to use in each buffer. 699 700Prerequisites: None 701 702Rounds all float values in large Const ops (more than 15 elements) to the given 703number of steps. The unique values are chosen per buffer by linearly allocating 704between the largest and smallest values present. This is useful when you'll be 705deploying on mobile, and you want a model that will compress effectively. See 706[shrinking file size](#shrinking-file-size) for more details. Even though there 707are no prerequisites, it is advisable to run 708[fold_batch_norms](#fold_batch_norms) or 709[fold_old_batch_norms](#fold_old_batch_norms), because rounding variances down 710to zero may cause significant loss of precision. 711 712### sparsify_gather 713 714Args: None \ 715Prerequisites: None 716 717Transform 'Gather' op to a sparsified version where 'params' input of 'Gather' 718is replaced from a dense 'Const' to a 'HashTable'. 'Gather' op itself is 719replaced by a hashtable lookup. This is mostly useful for reducing sparse 720TF.learn linear model memory footprint. 721 722### set_device 723 724Args: 725 726* device: What device to assign to ops. 727* if_default: If this is true, only assign to ops with empty existing devices. 728 729Updates nodes to use the specified device. A device is a way to tell the code 730that executes the graph which piece of hardware it should run particular nodes 731on. The right assignment to use may change between training and deployment, so 732this transform (and [remove_device](#remove_device)) provide a way of updating 733the placement. If the `is_default` parameter is set, then only ops that don't 734have a device assigned already will be updated. This is mostly useful for 735preprocessing of graphs for other stages that expect all ops to have an explicit 736device assigned. 737 738### sort_by_execution_order 739 740Args: None \ 741Prerequisites: None 742 743Arranges the nodes in the GraphDef in topological order, so that the inputs of 744any given node are always earlier than the node itself. This is especially 745useful when you're targeting a minimal inference engine, since you can just 746execute the nodes in the given order knowing that the inputs will be computed 747before they're needed. 748 749### strip_unused_nodes 750 751Args: 752 753* type: Default type for any new Placeholder nodes generated, for example 754 int32, float, quint8. 755* shape: Default shape for any new Placeholder nodes generated, as 756 comma-separated dimensions. For example shape="1,299,299,3". The double 757 quotes are important, since otherwise the commas will be taken as argument 758 separators. 759* name: Identifier for the placeholder arguments. 760* type_for_name: What type to use for the previously-given name. 761* shape_for_name: What shape to use for the previously-given name. 762 763Prerequisites: None 764 765Removes all nodes not used in calculated the layers given in `--outputs`, fed by 766`--inputs`. This is often useful for removing training-only nodes like 767save-and-restore or summary ops. It's also handy for solving the [missing kernel 768errors problem](#fixing-missing-kernel-errors-on-mobile) when there are decode 769or other ops you don't need in the inference path. 770 771The biggest complication is that it sometimes has to create new Placeholder ops, 772so there are options to control their characteristics. This will happen if you 773bypass a DecodeJpeg op by specifying an input layer deeper in the network, for 774example, so you can pass in a raw image array instead of an encoded string as an 775input. The decode op will be removed, together with the Placeholder that fed it, 776but a new Placeholder is needed for the input layer you specify. The type and 777shape arguments let you control the attributes of any new Placeholders that are 778created. Plain `type` and `shape` set global defaults, but if you have different 779inputs with varying characteristics, you'll need to pass in a list of arguments 780where the preceding name specifies what layer each applies to. For example, if 781you had two inputs in1 and in2, you could call `strip_unused_nodes(name=in1, 782type_for_name=int32, shape_for_name="2,3", name=in2, type_for_name=float, 783shape_for_name="1,10,10,3")`. 784 785## Writing Your Own Transforms 786 787The Graph Transform Tool is designed to make it as easy as possible to create 788your own optimization, modification, and pre-processing transforms. At their 789heart, all of the transforms take in a valid GraphDef, make some changes, and 790output a new GraphDef. Each GraphDef is just a list of NodeDefs, each defining 791one node in the graph and its connections. You can find more information on the 792format at [this guide to TensorFlow model 793files](https://www.tensorflow.org/versions/master/extend/tool_developers/index.html), 794but for a simple example take a look at 795[tensorflow/tools/graph_transforms/rename_op.cc](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/tools/graph_transforms/rename_op.cc), 796which implements the [rename_op](#rename_op) transform: 797 798```C++ 799Status RenameOp(const GraphDef& input_graph_def, 800 const TransformFuncContext& context, 801 GraphDef* output_graph_def) { 802 if (!context.params.count("old_op_name") || 803 (context.params.at("old_op_name").size() != 1) || 804 !context.params.count("new_op_name") || 805 (context.params.at("new_op_name").size() != 1)) { 806 return errors::InvalidArgument( 807 "rename_op expects exactly one 'old_op_name' and 'new_op_name' " 808 "argument, e.g. rename_op(old_op_name=Mul, new_op_name=Multiply)"); 809 } 810 811 const string old_op_name = context.params.at("old_op_name")[0]; 812 const string new_op_name = context.params.at("new_op_name")[0]; 813 output_graph_def->Clear(); 814 for (const NodeDef& node : input_graph_def.node()) { 815 NodeDef* new_node = output_graph_def->mutable_node()->Add(); 816 new_node->CopyFrom(node); 817 if (node.op() == old_op_name) { 818 new_node->set_op(new_op_name); 819 } 820 } 821 822 return Status::OK(); 823} 824 825REGISTER_GRAPH_TRANSFORM("rename_op", RenameOp); 826``` 827 828The heart of this transform is the loop through the input_graph_def's nodes. We 829go through each op, add a new one to the output, copy the original's contents, 830and then change the op over if it matches the parameters. There's a standard set 831of parameters for every transform, so they all take in a GraphDef and context, 832and write out into a new GraphDef. The registration macro at the bottom lets the 833tool know what function to call when it finds the `rename_op` string in a 834transforms list. 835 836### Transform Functions 837 838The standard signature that all transform functions have is defined as 839`TransformFunc`, which takes in an input GraphDef, a `TransformFuncContext` 840containing environment information, writes to an output GraphDef, and returns a 841Status indicating whether the transform succeeded. 842 843The `TransformFuncContext` has a list of the inputs and outputs for the graph, 844and the [parameter arguments](#parameters) that were passed into the transform 845by the user. 846 847If you write a function that matches this signature, and [register 848it](#registration), the graph transform tool will take care of calling it. 849 850### Pattern Syntax 851 852The `rename_op` example only needs to look at a single node at a time, but one 853of the most common needs is to modify small sub-graphs within a model. To make 854this easy, the Graph Transform Tool provides the `OpTypePattern` syntax. This is 855a simple and compact way to specify patterns of nodes that you want to look for. 856The format is: 857 858``` 859OP_TYPE_PATTERN ::= "{" OP "," INPUTS "}" 860INPUTS ::= OP_TYPE_PATTERN 861``` 862 863The `OP` field can either contain a single "*", which means match any op type, 864one op type (for example "Const"), or a set of op types separated by `|` symbols 865(for example "Conv2D|MatMul|BiasAdd"). General regex patterns are not supported, 866just these special cases. 867 868You can think of these patterns as very limited regular expressions designed to 869pick out sub-trees in graphs. They are deliberately very constrained to the kind 870of things we commonly find ourselves needing to do, to make creating and 871debugging as straightforward as possible. 872 873For example, if you want all Conv2D nodes that have a constant as their second 874input, you would set up a pattern like this, using C++ initializer lists to 875populate the structure: 876 877```C++ 878OpTypePattern conv_pattern({"Conv2D", {{"*"}, {"Const"}}}); 879``` 880 881It can be easier to visualize these initializers using indentation to show the 882tree structure more clearly: 883 884```C++ 885OpTypePattern conv_pattern({ 886 "Conv2D", 887 { 888 {"*"}, 889 {"Const"} 890 } 891}); 892``` 893 894In plain English this is saying, a Conv2D op with two inputs, the first of which 895is any op type, and the second is a Const op. 896 897Here's a much more complex example, from the [quantize_nodes](#quantize_nodes) 898transform: 899 900```C++ 901{"QuantizeV2", 902 { 903 {"Dequantize"}, 904 {"Min", 905 { 906 {"Reshape", 907 { 908 {"Dequantize"}, 909 {"Const"}, 910 } 911 }, 912 {"Const"}, 913 } 914 }, 915 {"Max", 916 { 917 {"Reshape", 918 { 919 {"Dequantize"}, 920 {"Const"}, 921 } 922 }, 923 {"Const"}, 924 } 925 }, 926 } 927} 928``` 929 930This is looking for QuantizeV2 nodes, with three inputs, the first of which is a 931Dequantize, the second is a Min that ultimately pulls from a Dequantize, and the 932third is a Max which does the same. Assuming we know the Dequantize ops are 933pulling from the same eight-bit buffer, the end result of this sub-graph is a 934no-op, since it's just turning the eight-bit buffer into float, and then 935immediately converting it back to eight-bits, so if we look for this pattern and 936remove it we can optimize the graph without changing the result. 937 938### ReplaceMatchingOpTypes 939 940It's very common to want to find all occurrences of a particular sub-graph in a 941model, and replace them all with a different sub-graph that keeps the same local 942input and output connections. For example with 943[fuse_convolutions](#fuse_convolutions), we needed to find all Conv2D ops that 944read their inputs from BilinearResizes, and replace those combinations with a 945single FusedResizeAndPadConv2D op, but without affecting other ops. 946 947To make that sort of transformation easy, we created the 948`ReplaceMatchingOpTypes` helper. This takes in a graph, an `OpTypePattern` 949defining the sub-graph to look for, and a callback function to run for every 950occurrence it finds. The job of this callback function is to look at the 951`NodeMatch` that contains information about the current sub-graph, and return a 952new sub-graph in the new_nodes list that will be used to replace the old 953sub-graph. 954 955You can see how it's used in practice in the 956[fuse_convolutions](#fuse_convolutions) code: 957 958```C++ 959TF_RETURN_IF_ERROR(ReplaceMatchingOpTypes( 960 input_graph_def, // clang-format off 961 {"Conv2D", 962 { 963 {"ResizeBilinear"}, 964 {"*"} 965 } 966 }, // clang-format on 967 [](const NodeMatch& match, const std::set<string>& input_nodes, 968 const std::set<string>& output_nodes, 969 std::vector<NodeDef>* new_nodes) { 970 // Find all the nodes we expect in the subgraph. 971 const NodeDef& conv_node = match.node; 972 const NodeDef& resize_node = match.inputs[0].node; 973 const NodeDef& weights_node = match.inputs[1].node; 974 975 // We'll be reusing the old weights. 976 new_nodes->push_back(weights_node); 977 978 // Create a 'no-op' mirror padding node that has no effect. 979 NodeDef pad_dims_node; 980 pad_dims_node.set_op("Const"); 981 pad_dims_node.set_name(conv_node.name() + "_dummy_paddings"); 982 SetNodeAttr("dtype", DT_INT32, &pad_dims_node); 983 SetNodeTensorAttr<int32>("value", {4, 2}, {0, 0, 0, 0, 0, 0, 0, 0}, 984 &pad_dims_node); 985 new_nodes->push_back(pad_dims_node); 986 987 // Set up the new fused version of the convolution op. 988 NodeDef fused_conv; 989 fused_conv.set_op("FusedResizeAndPadConv2D"); 990 fused_conv.set_name(match.node.name()); 991 AddNodeInput(resize_node.input(0), &fused_conv); 992 AddNodeInput(resize_node.input(1), &fused_conv); 993 AddNodeInput(pad_dims_node.name(), &fused_conv); 994 AddNodeInput(conv_node.input(1), &fused_conv); 995 CopyNodeAttr(resize_node, "align_corners", "resize_align_corners", 996 &fused_conv); 997 SetNodeAttr("mode", "REFLECT", &fused_conv); 998 CopyNodeAttr(conv_node, "T", "T", &fused_conv); 999 CopyNodeAttr(conv_node, "padding", "padding", &fused_conv); 1000 CopyNodeAttr(conv_node, "strides", "strides", &fused_conv); 1001 new_nodes->push_back(fused_conv); 1002 1003 return Status::OK(); 1004 }, 1005 {}, &replaced_graph_def)); 1006``` 1007 1008Here you can see we define the pattern to look for, and in the callback function 1009use information from each of the nodes in the old sub-graph to create a new 1010fused node. We also copy over the old weights input node so that isn't lost. 1011 1012There are a few things to know about the `ReplaceMatchingOpTypes` function: 1013 1014* All of the nodes in any matching sub-graphs are removed from the new graph 1015 created by the function. If any of them are needed, it's the callback 1016 function's responsibility to add them back in. There's a `CopyOriginalMatch` 1017 convenience call that will copy over all of the original nodes if you decide 1018 you don't actually want to modify a particular sub-graph. 1019 1020* It is assumed that the same nodes will never appear in more than one matched 1021 sub-graph. This is to ensure that sub-trees are only replaced once, but it 1022 may mean that some sub-graphs aren't spotted if they overlap with earlier 1023 matches. 1024 1025* The calling framework tries to ensure that the graph remains sane, by 1026 looking at the new_nodes that are returned and making sure that no nodes 1027 which are needed as inputs by nodes outside the sub-graph are removed. These 1028 important nodes are listed in the `output_nodes` argument that's passed into 1029 each replacement function call. You can disable this checking by setting 1030 `allow_inconsistencies` to true in the options, but otherwise any 1031 replacements that break the graph constraints will be canceled. If you do 1032 allow inconsistencies, it's your transform's responsibility to fix them up 1033 before you return your final result. Functions like `RenameNodeInputs` can 1034 be useful if you are doing wholesale node renaming for example. 1035 1036### Parameters 1037 1038The arguments that are in parentheses after the transform name when the tool is 1039called are parsed and placed into the params member of the TransformFuncContext 1040that's given to each transform. For every named argument, there's a vector of 1041strings containing all the values that it was given, in the order they were 1042given. These are treated a bit like command-line parameters, and it's the 1043transform's responsibility to parse them into the data types it needs, and raise 1044errors by returning a bad Status if any of them are ill-formed. 1045 1046As an example, here's a hypothetical transform call: 1047 1048``` 1049some_transform(foo=a, foo=b, bar=2, bob="1,2,3") 1050``` 1051 1052Here's what the std::map of strings looks like in the params member: 1053 1054``` 1055{{"foo", {"a", "b"}}, {"bar", {"2"}}, {"bob", {"1,2,3"}}} 1056``` 1057 1058The double quotes around the comma-separated argument to `bob` are important 1059because otherwise they'll be treated as separate arguments, and the parsing will 1060fail. 1061 1062Here's an example of how [round_weights](#round_weights) reads its `num_steps` 1063parameter: 1064 1065```C++ 1066TF_RETURN_IF_ERROR(context.GetOneInt32Parameter("num_steps", 256, &num_steps)); 1067``` 1068 1069If the conversion fails or the parameter occurs more than once the helper 1070function will raise a meaningful error through the status result of the 1071transform. If the parameter isn't specified at all then the default will be 1072used. 1073 1074### Function Libraries 1075 1076A newer feature of TensorFlow is the ability to create libraries of functions as 1077part of graphs. These are a bit like templates, which define macro operations in 1078terms of smaller components, which can then be instantiated with different input 1079and output connections inside the graph just like regular ops. Right now the 1080graph transform tool just copies these libraries between the input and output 1081graphs, but it's likely that more complex operations will be supported on them 1082in the future. 1083 1084### Registering 1085 1086The Graph Transform Tool associates names of transforms with the code to 1087implement them using the `REGISTER_GRAPH_TRANSFORM()` macro. This takes a string 1088and a function, and automatically registers the transform with the tool. You 1089will need to watch out for a few things though: 1090 1091* Because it's using global C++ objects in each file under the hood, the 1092 linker can sometimes strip them out and lose the registration. In Bazel you 1093 need to make sure you're linking any new transforms in as libraries, and use 1094 the `alwayslink` flag in your `cc_binary` call. 1095 1096* You should be able to create your own copy of the transform_graph tool by 1097 linking against the transform_graph_main_lib library in 1098 tensorflow/tools/graph_transforms/BUILD. This contains all the `main()` 1099 logic to parse command line arguments and call transforms. 1100