<!---
// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
-->

## Introduction to the Go compiler

`cmd/compile` contains the main packages that form the Go compiler. The compiler
may be logically split into several phases, which we will briefly describe alongside
the list of packages that contain their code.

You may sometimes hear the terms "front-end" and "back-end" when referring to
the compiler. Roughly speaking, these translate to the first two and last two
phases we are going to list here. A third term, "middle-end", often refers to
much of the work that happens in between.

Note that the `go/*` family of packages, such as `go/parser` and
`go/types`, are mostly unused by the compiler. Since the compiler was
initially written in C, the `go/*` packages were developed to enable
writing tools working with Go code, such as `gofmt` and `vet`.
However, over time the compiler's internal APIs have slowly evolved to
be more familiar to users of the `go/*` packages.

It should be clarified that the name "gc" stands for "Go compiler", and has
little to do with uppercase "GC", which stands for garbage collection.

### 1. Parsing

* `cmd/compile/internal/syntax` (lexer, parser, syntax tree)

In the first phase of compilation, source code is tokenized (lexical analysis),
parsed (syntax analysis), and a syntax tree is constructed for each source
file.

Each syntax tree is an exact representation of the respective source file, with
nodes corresponding to the various elements of the source such as expressions,
declarations, and statements. The syntax tree also includes position information
which is used for error reporting and the creation of debugging information.

### 2. Type checking

* `cmd/compile/internal/types2` (type checking)

The types2 package is a port of `go/types` to use the syntax package's
AST instead of `go/ast`.

### 3. IR construction ("noding")

* `cmd/compile/internal/types` (compiler types)
* `cmd/compile/internal/ir` (compiler AST)
* `cmd/compile/internal/noder` (create compiler AST)

The compiler middle end uses its own AST definition and representation of Go
types carried over from when it was written in C. All of its code is written in
terms of these, so the next step after type checking is to convert the syntax
and types2 representations to ir and types. This process is referred to as
"noding."

Noding uses a process called Unified IR, which builds a node representation
using a serialized version of the typechecked code from step 2.
Unified IR is also involved in import/export of packages and inlining.

### 4. Middle end

* `cmd/compile/internal/inline` (function call inlining)
* `cmd/compile/internal/devirtualize` (devirtualization of known interface method calls)
* `cmd/compile/internal/escape` (escape analysis)

Several optimization passes are performed on the IR representation:
dead code elimination, (early) devirtualization, function call
inlining, and escape analysis.

The early dead code elimination pass is integrated into the unified IR writer phase.

### 5. Walk

* `cmd/compile/internal/walk` (order of evaluation, desugaring)

The final pass over the IR representation is "walk," which serves two purposes:

1. It decomposes complex statements into individual, simpler statements,
   introducing temporary variables and respecting order of evaluation. This step
   is also referred to as "order."

2. It desugars higher-level Go constructs into more primitive ones. For example,
   `switch` statements are turned into binary search or jump tables, and
   operations on maps and channels are replaced with runtime calls.

### 6. Generic SSA

* `cmd/compile/internal/ssa` (SSA passes and rules)
* `cmd/compile/internal/ssagen` (converting IR to SSA)

In this phase, IR is converted into Static Single Assignment (SSA) form, a
lower-level intermediate representation with specific properties that make it
easier to implement optimizations and to eventually generate machine code from
it.

During this conversion, function intrinsics are applied. These are special
functions that the compiler has been taught to replace with heavily optimized
code on a case-by-case basis.

Certain nodes are also lowered into simpler components during the AST to SSA
conversion, so that the rest of the compiler can work with them. For instance,
the copy builtin is replaced by memory moves, and range loops are rewritten into
for loops. Some of these currently happen before the conversion to SSA due to
historical reasons, but the long-term plan is to move all of them here.

Then, a series of machine-independent passes and rules are applied. These do not
concern any single computer architecture, and thus run on all `GOARCH` variants.
These passes include dead code elimination, removal of
unneeded nil checks, and removal of unused branches. The generic rewrite rules
mainly concern expressions, such as replacing some expressions with constant
values, and optimizing multiplications and float operations.

### 7. Generating machine code

* `cmd/compile/internal/ssa` (SSA lowering and arch-specific passes)
* `cmd/internal/obj` (machine code generation)

The machine-dependent phase of the compiler begins with the "lower" pass, which
rewrites generic values into their machine-specific variants. For example, on
amd64 memory operands are possible, so many load-store operations may be combined.

Note that the lower pass runs all machine-specific rewrite rules, and thus it
currently applies lots of optimizations too.

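A small program can help when exploring the SSA conversion, the generic passes,
and the lower pass on a concrete function. The sketch below is one possible
starting point: `sumTrailingZeros` is a hypothetical name, and the expectation
that `bits.TrailingZeros64` is handled as an intrinsic on your target
architecture is an assumption you would need to confirm in the generated output.
Building it with `GOSSAFUNC=sumTrailingZeros go build` should produce an
`ssa.html` file showing the function after each pass.

```go
package main

import (
	"fmt"
	"math/bits"
)

// sumTrailingZeros adds up the number of trailing zero bits of each element.
// The range loop and the math/bits call are the parts worth watching in the
// SSA output (assumption: the call is intrinsified on your GOARCH).
func sumTrailingZeros(xs []uint64) int {
	total := 0
	for _, x := range xs {
		total += bits.TrailingZeros64(x)
	}
	return total
}

func main() {
	// 1 -> 0, 2 -> 1, 8 -> 3, 0 -> 64 (defined result for zero input).
	fmt.Println(sumTrailingZeros([]uint64{1, 2, 8, 0})) // prints 68
}
```

Comparing the columns of `ssa.html` before and after "lower" is a quick way to
see generic values being replaced by machine-specific ones.
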
Once the SSA has been "lowered" and is more specific to the target architecture,
the final code optimization passes are run. This includes yet another dead code
elimination pass, moving values closer to their uses, the removal of local
variables that are never read from, and register allocation.

Other important pieces of work done as part of this step include stack frame
layout, which assigns stack offsets to local variables, and pointer liveness
analysis, which computes which on-stack pointers are live at each GC safe point.

At the end of the SSA generation phase, Go functions have been transformed into
a series of obj.Prog instructions. These are passed to the assembler
(`cmd/internal/obj`), which turns them into machine code and writes out the
final object file. The object file will also contain reflect data, export data,
and debugging information.

### 7a. Export

In addition to writing a file of object code for the linker, the
compiler also writes a file of "export data" for downstream
compilation units. The export data file holds all the information
computed during compilation of package P that may be needed when
compiling a package Q that directly imports P. It includes type
information for all exported declarations, IR for bodies of functions
that are candidates for inlining, IR for bodies of generic functions
that may be instantiated in another package, and a summary of the
findings of escape analysis on function parameters.

The format of the export data file has gone through a number of
iterations. Its current form is called "unified", and it is a
serialized representation of an object graph, with an index allowing
lazy decoding of parts of the whole (since most imports are used to
provide only a handful of symbols).

The GOROOT repository contains a reader and a writer for the unified
format; it encodes from/decodes to the compiler's IR.
The golang.org/x/tools repository also provides a public API for an export
data reader (using the go/types representation) that always supports the
compiler's current file format and a small number of historic versions.
(It is used by x/tools/go/packages in modes that require type information
but not type-annotated syntax.)

The x/tools repository also provides public APIs for reading and
writing exported type information (but nothing more) using the older
"indexed" format. (For example, gopls uses this version for its
database of workspace information, which includes types.)

Export data usually provides a "deep" summary, so that compilation of
package Q can read the export data files only for each direct import,
and be assured that these provide all necessary information about
declarations in indirect imports, such as the methods and struct
fields of types referred to in P's public API. Deep export data is
simpler for build systems, since only one file is needed per direct
dependency. However, it does have a tendency to grow as one gets
higher up the import graph of a big repository: if there is a set of
very commonly used types with a large API, nearly every package's
export data will include a copy. This problem motivated the "indexed"
design, which allowed partial loading on demand.
(gopls does less work than the compiler for each import and is thus
more sensitive to export data overheads. For this reason, it uses
"shallow" export data, in which indirect information is not recorded
at all. This demands random access to the export data files of all
dependencies, so is not suitable for distributed build systems.)


### 8. Tips

#### Getting Started

* If you have never contributed to the compiler before, a simple way to begin
  can be adding a log statement or `panic("here")` to get some
  initial insight into whatever you are investigating.

* The compiler itself provides logging, debugging and visualization capabilities,
  such as:
  ```
  $ go build -gcflags=-m=2                   # print optimization info, including inlining, escape analysis
  $ go build -gcflags=-d=ssa/check_bce/debug # print bounds check info
  $ go build -gcflags=-W                     # print internal parse tree after type checking
  $ GOSSAFUNC=Foo go build                   # generate ssa.html file for func Foo
  $ go build -gcflags=-S                     # print assembly
  $ go tool compile -bench=out.txt x.go      # print timing of compiler phases
  ```

  Some flags alter the compiler behavior, such as:
  ```
  $ go tool compile -h file.go               # panic on first compile error encountered
  $ go build -gcflags=-d=checkptr=2          # enable additional unsafe pointer checking
  ```

  There are many additional flags. Some descriptions are available via:
  ```
  $ go tool compile -h              # compiler flags, e.g., go build -gcflags='-m=1 -l'
  $ go tool compile -d help         # debug flags, e.g., go build -gcflags=-d=checkptr=2
  $ go tool compile -d ssa/help     # ssa flags, e.g., go build -gcflags=-d=ssa/prove/debug=2
  ```

  There are some additional details about `-gcflags` and the differences between `go build`
  vs. `go tool compile` in a [section below](#-gcflags-and-go-build-vs-go-tool-compile).

* In general, when investigating a problem in the compiler you usually want to
  start with the simplest possible reproduction and understand exactly what is
  happening with it.

#### Testing your changes

* Be sure to read the [Quickly testing your changes](https://go.dev/doc/contribute#quick_test)
  section of the Go Contribution Guide.

* Some tests live within the cmd/compile packages and can be run by `go test ./...` or similar,
  but many cmd/compile tests are in the top-level
  [test](https://github.com/golang/go/tree/master/test) directory:

  ```
  $ go test cmd/internal/testdir                           # all tests in 'test' dir
  $ go test cmd/internal/testdir -run='Test/escape.*.go'   # test specific files in 'test' dir
  ```
  For details, see the [testdir README](https://github.com/golang/go/tree/master/test#readme).
  The `errorCheck` method in [testdir_test.go](https://github.com/golang/go/blob/master/src/cmd/internal/testdir/testdir_test.go)
  is helpful for a description of the `ERROR` comments used in many of those tests.

  In addition, the `go/types` package from the standard library and `cmd/compile/internal/types2`
  have shared tests in `src/internal/types/testdata`, and both type checkers
  should be checked if anything changes there.

* The new [application-based coverage profiling](https://go.dev/testing/coverage/) can be used
  with the compiler, such as:

  ```
  $ go install -cover -coverpkg=cmd/compile/... cmd/compile  # build compiler with coverage instrumentation
  $ mkdir /tmp/coverdir                                      # pick location for coverage data
  $ GOCOVERDIR=/tmp/coverdir go test [...]                   # use compiler, saving coverage data
  $ go tool covdata textfmt -i=/tmp/coverdir -o coverage.out # convert to traditional coverage format
  $ go tool cover -html coverage.out                         # view coverage via traditional tools
  ```

#### Juggling compiler versions

* Many of the compiler tests use the version of the `go` command found in your PATH and
  its corresponding `compile` binary.

* If you are in a branch and your PATH includes `<go-repo>/bin`,
  doing `go install cmd/compile` will build the compiler using the code from your
  branch and install it to the proper location so that subsequent `go` commands
  like `go build` or `go test ./...` will exercise your freshly built compiler.

* [toolstash](https://pkg.go.dev/golang.org/x/tools/cmd/toolstash) provides a way
  to save, run, and restore a known good copy of the Go toolchain. For example, it can be
  a good practice to initially build your branch, save that version of
  the toolchain, then restore the known good version of the tools to compile
  your work-in-progress version of the compiler.

  Sample set up steps:
  ```
  $ go install golang.org/x/tools/cmd/toolstash@latest
  $ git clone https://go.googlesource.com/go
  $ cd go
  $ git checkout -b mybranch
  $ ./src/all.bash               # build and confirm good starting point
  $ export PATH=$PWD/bin:$PATH
  $ toolstash save               # save current tools
  ```
  After that, your edit/compile/test cycle can be similar to:
  ```
  <... make edits to cmd/compile source ...>
  $ toolstash restore && go install cmd/compile   # restore known good tools to build compiler
  <... 'go build', 'go test', etc. ...>           # use freshly built compiler
  ```

* toolstash also allows comparing the installed vs. stashed copy of
  the compiler, such as if you expect equivalent behavior after a refactor.
  For example, to check that your changed compiler produces identical object files to
  the stashed compiler while building the standard library:
  ```
  $ toolstash restore && go install cmd/compile   # build latest compiler
  $ go build -toolexec "toolstash -cmp" -a -v std # compare latest vs. saved compiler
  ```

* If versions appear to get out of sync (for example, with errors like
  `linked object header mismatch` with version strings like
  `devel go1.21-db3f952b1f`), you might need to do
  `toolstash restore && go install cmd/...` to update all the tools under cmd.

#### Additional helpful tools

* [compilebench](https://pkg.go.dev/golang.org/x/tools/cmd/compilebench) benchmarks
  the speed of the compiler.

* [benchstat](https://pkg.go.dev/golang.org/x/perf/cmd/benchstat) is the standard tool
  for reporting performance changes resulting from compiler modifications,
  including whether any improvements are statistically significant:
  ```
  $ go test -bench=SomeBenchmarks -count=20 > new.txt   # use new compiler
  $ toolstash restore                                   # restore old compiler
  $ go test -bench=SomeBenchmarks -count=20 > old.txt   # use old compiler
  $ benchstat old.txt new.txt                           # compare old vs. new
  ```

* [bent](https://pkg.go.dev/golang.org/x/benchmarks/cmd/bent) facilitates running a
  large set of benchmarks from various community Go projects inside a Docker container.

* [perflock](https://github.com/aclements/perflock) helps obtain more consistent
  benchmark results, including by manipulating CPU frequency scaling settings on Linux.

* [view-annotated-file](https://github.com/loov/view-annotated-file) (from the community)
  overlays inlining, bounds check, and escape info back onto the source code.

* [godbolt.org](https://go.godbolt.org) is widely used to examine
  and share assembly output from many compilers, including the Go compiler. It can also
  [compare](https://go.godbolt.org/z/5Gs1G4bKG) assembly for different versions of
  a function or across Go compiler versions, which can be helpful for investigations and
  bug reports.

#### -gcflags and 'go build' vs. 'go tool compile'

* `-gcflags` is a go command [build flag](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies).
  `go build -gcflags=<args>` passes the supplied `<args>` to the underlying
  `compile` invocation(s) while still doing everything that the `go build` command
  normally does (e.g., handling the build cache, modules, and so on). In contrast,
  `go tool compile <args>` asks the `go` command to invoke `compile <args>` a single time
  without involving the standard `go build` machinery. In some cases, it can be helpful to have
  fewer moving parts by doing `go tool compile <args>`, such as if you have a
  small standalone source file that can be compiled without any assistance from `go build`.
  In other cases, it is more convenient to pass `-gcflags` to a build command like
  `go build`, `go test`, or `go install`.

* `-gcflags` by default applies to the packages named on the command line, but can
  use package patterns such as `-gcflags='all=-m=1 -l'`, or multiple package patterns such as
  `-gcflags='all=-m=1' -gcflags='fmt=-m=2'`. For details, see the
  [cmd/go documentation](https://pkg.go.dev/cmd/go#hdr-Compile_packages_and_dependencies).

### Further reading

To dig deeper into how the SSA package works, including its passes and rules,
head to [cmd/compile/internal/ssa/README.md](internal/ssa/README.md).

Finally, if something in this README or the SSA README is unclear
or if you have an idea for an improvement, feel free to leave a comment in
[issue 30074](https://go.dev/issue/30074).