======================================================
Kaleidoscope: Conclusion and other useful LLVM tidbits
======================================================

.. contents::
   :local:

Tutorial Conclusion
===================

Welcome to the final chapter of the "`Implementing a language with
LLVM <index.html>`_" tutorial. In the course of this tutorial, we have
grown our little Kaleidoscope language from being a useless toy to
being a semi-interesting (but probably still useless) toy. :)

It is interesting to see how far we've come, and how little code it has
taken. We built the entire lexer, parser, AST, code generator, and an
interactive run-loop (with a JIT!) by hand in under 700 lines of
(non-comment/non-blank) code.

Our little language supports a couple of interesting features: it
supports user-defined binary and unary operators, it uses JIT
compilation for immediate evaluation, and it supports a few control flow
constructs with SSA construction.

Part of the idea of this tutorial was to show you how easy and fun it
can be to define, build, and play with languages. Building a compiler
need not be a scary or mystical process! Now that you've seen some of
the basics, I strongly encourage you to take the code and hack on it.
For example, try adding:

-  **global variables** - While global variables have questionable value
   in modern software engineering, they are often useful when putting
   together quick little hacks like the Kaleidoscope compiler itself.
   Fortunately, our current setup makes it very easy to add global
   variables: just have value lookup check to see if an unresolved
   variable is in the global variable symbol table before rejecting it.
   To create a new global variable, make an instance of the LLVM
   ``GlobalVariable`` class.
-  **typed variables** - Kaleidoscope currently only supports variables
   of type double. This gives the language a very nice elegance, because
   only supporting one type means that you never have to specify types.
   Different languages have different ways of handling this. The easiest
   way is to require the user to specify types for every variable
   definition, and record the type of the variable in the symbol table
   along with its ``Value*`` (an ``llvalue`` in the OCaml bindings).
-  **arrays, structs, vectors, etc.** - Once you add types, you can start
   extending the type system in all sorts of interesting ways. Simple
   arrays are very easy and are quite useful for many different
   applications. Adding them is mostly an exercise in learning how the
   LLVM `getelementptr <../LangRef.html#getelementptr-instruction>`_ instruction
   works: it is so nifty/unconventional, it `has its own
   FAQ <../GetElementPtr.html>`_! If you add support for recursive types
   (e.g. linked lists), make sure to read the `section in the LLVM
   Programmer's Manual <../ProgrammersManual.html#TypeResolve>`_ that
   describes how to construct them.
-  **standard runtime** - Our current language allows the user to access
   arbitrary external functions, and we use it for things like "printd"
   and "putchard". As you extend the language to add higher-level
   constructs, often these constructs make the most sense if they are
   lowered to calls into a language-supplied runtime. For example, if
   you add hash tables to the language, it would probably make sense to
   add the routines to a runtime, instead of inlining them all the way.
-  **memory management** - Currently we can only access the stack in
   Kaleidoscope. It would also be useful to be able to allocate heap
   memory, either with calls to the standard libc malloc/free interface
   or with a garbage collector. If you would like to use garbage
   collection, note that LLVM fully supports `Accurate Garbage
   Collection <../GarbageCollection.html>`_ including algorithms that
   move objects and need to scan/update the stack.
-  **debugger support** - LLVM supports generation of `DWARF Debug
   info <../SourceLevelDebugging.html>`_ which is understood by common
   debuggers like GDB. Adding support for debug info is fairly
   straightforward. The best way to understand it is to compile some
   C/C++ code with "``clang -g -O0``" and take a look at what it
   produces.
-  **exception handling support** - LLVM supports generation of `zero
   cost exceptions <../ExceptionHandling.html>`_ which interoperate with
   code compiled in other languages. You could also generate code by
   implicitly making every function return an error value and checking
   it. You could also make explicit use of setjmp/longjmp. There are
   many different ways to go here.
-  **object orientation, generics, database access, complex numbers,
   geometric programming, ...** - Really, there is no end of crazy
   features that you can add to the language.
-  **unusual domains** - We've been talking about applying LLVM to a
   domain that many people are interested in: building a compiler for a
   specific language. However, there are many other domains that can use
   compiler technology that are not typically considered. For example,
   LLVM has been used to implement OpenGL graphics acceleration,
   translate C++ code to ActionScript, and many other cute and clever
   things. Maybe you will be the first to JIT compile a regular
   expression interpreter into native code with LLVM?

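As one way to picture the first two suggestions, here is a minimal C
sketch of a name lookup that consults a local table first and falls back
to a global table before reporting an unknown variable, with each entry
recording a type next to its value. All names in it (``TypeKind``,
``Symbol``, ``SymbolTable``, ``find_in``, ``lookup_variable``) are made
up for illustration, and the ``handle`` field merely stands in for the IR
value; none of this is part of Kaleidoscope or LLVM.

.. code-block:: c

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical type tags; Kaleidoscope itself only has doubles. */
    typedef enum { TYPE_DOUBLE, TYPE_INT } TypeKind;

    /* One entry: the name, its declared type, and an opaque handle that
       stands in for the IR value a real front end would store. */
    typedef struct {
        const char *name;
        TypeKind type;
        void *handle;
    } Symbol;

    typedef struct {
        const Symbol *entries;
        size_t count;
    } SymbolTable;

    /* Linear search of one table; returns NULL when the name is absent. */
    const Symbol *find_in(const SymbolTable *table, const char *name) {
        for (size_t i = 0; i < table->count; ++i)
            if (strcmp(table->entries[i].name, name) == 0)
                return &table->entries[i];
        return NULL;
    }

    /* Locals shadow globals; only a miss in both tables is an error. */
    const Symbol *lookup_variable(const SymbolTable *locals,
                                  const SymbolTable *globals,
                                  const char *name) {
        const Symbol *sym = find_in(locals, name);
        return sym != NULL ? sym : find_in(globals, name);
    }

A caller would treat a ``NULL`` result as the "unknown variable name"
error that the code generator already reports today.
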
Have fun - try doing something crazy and unusual. Building a language
like everyone else always has is much less fun than trying something a
little crazy or off the wall and seeing how it turns out. If you get
stuck or want to talk about it, feel free to email the `llvm-dev mailing
list <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_: it has lots
of people who are interested in languages and are often willing to help
out.

Before we end this tutorial, I want to talk about some "tips and tricks"
for generating LLVM IR. These are some of the more subtle things that
may not be obvious, but are very useful if you want to take advantage of
LLVM's capabilities.

Properties of the LLVM IR
=========================

We have a couple of common questions about code in the LLVM IR form -
let's just get these out of the way right now, shall we?

Target Independence
-------------------

Kaleidoscope is an example of a "portable language": any program written
in Kaleidoscope will work the same way on any target that it runs on.
Many other languages have this property, e.g. Lisp, Java, Haskell,
JavaScript, Python, etc. (note that while these languages are portable,
not all their libraries are).

One nice aspect of LLVM is that it is often capable of preserving target
independence in the IR: you can take the LLVM IR for a
Kaleidoscope-compiled program and run it on any target that LLVM
supports, even emitting C code and compiling that on targets that LLVM
doesn't support natively. You can trivially tell that the Kaleidoscope
compiler generates target-independent code because it never queries for
any target-specific information when generating code.

The fact that LLVM provides a compact, target-independent
representation for code gets a lot of people excited. Unfortunately,
these people are usually thinking about C or a language from the C
family when they are asking questions about language portability. I say
"unfortunately", because there is really no way to make (fully general)
C code portable, other than shipping the source code around (and of
course, C source code is not actually portable in general either - ever
port a really old application from 32 to 64 bits?).

The problem with C (again, in its full generality) is that it is heavily
laden with target-specific assumptions. As one simple example, the
preprocessor often destructively removes target-independence from the
code when it processes the input text:

.. code-block:: c

    #ifdef __i386__
      int X = 1;
    #else
      int X = 42;
    #endif

While it is possible to engineer more and more complex solutions to
problems like this, it cannot be solved in full generality in a way that
is better than shipping the actual source code.

That said, there are interesting subsets of C that can be made portable.
If you are willing to fix primitive types to a fixed size (say int =
32 bits and long = 64 bits), don't care about ABI compatibility with
existing binaries, and are willing to give up some other minor features,
you can have portable code. This can make sense for specialized domains
such as an in-kernel language.

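To make the "fix primitive types to a fixed size" idea concrete, the
sketch below spells out widths with the C99 ``<stdint.h>`` types instead
of ``int`` and ``long``. The ``Record`` struct and the
``record_checksum`` function are hypothetical names chosen for
illustration only.

.. code-block:: c

    #include <stdint.h>

    /* Field widths are spelled out, so they do not change between an
       ILP32 and an LP64 target the way plain int and long can. */
    typedef struct {
        int32_t id;       /* always 32 bits */
        int64_t payload;  /* always 64 bits */
    } Record;

    /* Arithmetic is done in an explicitly 64-bit unsigned type, so the
       wrap-around behavior is the same on every target. */
    uint64_t record_checksum(const Record *r) {
        uint64_t sum = (uint64_t)(uint32_t)r->id;
        sum = sum * UINT64_C(31) + (uint64_t)r->payload;
        return sum;
    }

Note that the padding between fields is still chosen by each target's
ABI; only the field widths are pinned down here.
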
Safety Guarantees
-----------------

Many of the languages above are also "safe" languages: it is impossible
for a program written in Java to corrupt its address space and crash the
process (assuming the JVM has no bugs). Safety is an interesting
property that requires a combination of language design, runtime
support, and often operating system support.

It is certainly possible to implement a safe language in LLVM, but LLVM
IR does not itself guarantee safety. The LLVM IR allows unsafe pointer
casts, use-after-free bugs, buffer overruns, and a variety of other
problems. Safety needs to be implemented as a layer on top of LLVM and,
conveniently, several groups have investigated this. Ask on the `llvm-dev
mailing list <http://lists.llvm.org/mailman/listinfo/llvm-dev>`_ if
you are interested in more details.

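As a small illustration of what such a layer does, the C sketch below
shows the kind of check a safe language's front end or runtime might
perform before every array access. ``CheckedArray`` and ``checked_load``
are hypothetical names; a real implementation would more likely emit the
equivalent compare-and-branch directly in LLVM IR than call a C helper.

.. code-block:: c

    #include <stdbool.h>
    #include <stddef.h>

    /* An array that carries its length, so every access can be
       validated before memory is touched. */
    typedef struct {
        const double *data;
        size_t length;
    } CheckedArray;

    /* Stores the element in *out and returns true when index is in
       range; otherwise returns false and leaves *out unchanged. */
    bool checked_load(const CheckedArray *arr, size_t index, double *out) {
        if (index >= arr->length)
            return false;
        *out = arr->data[index];
        return true;
    }

What happens on a failed check (trap, exception, error value) is a
language design decision that this sketch leaves to the caller.
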
Language-Specific Optimizations
-------------------------------

One thing about LLVM that turns off many people is that it does not
solve all the world's problems in one system (sorry 'world hunger',
someone else will have to solve you some other day). One specific
complaint is that people perceive LLVM as being incapable of performing
high-level language-specific optimization: LLVM "loses too much
information".

Unfortunately, this is really not the place to give you a full and
unified version of "Chris Lattner's theory of compiler design". Instead,
I'll make a few observations:

First, you're right that LLVM does lose information. For example, as of
this writing, there is no way to distinguish in the LLVM IR whether an
SSA-value came from a C "int" or a C "long" on an ILP32 machine (other
than debug info). Both get compiled down to an 'i32' value and the
information about what it came from is lost. The more general issue
here is that the LLVM type system uses "structural equivalence" instead
of "name equivalence". Another place this surprises people is if you
have two types in a high-level language that have the same structure
(e.g. two different structs that have a single int field): these types
will compile down into a single LLVM type and it will be impossible to
tell what it came from.

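The two-structs case can be pictured in C. In the sketch below,
``Meters`` and ``Seconds`` (hypothetical names, as are the two helper
functions) are distinct types to the C compiler, yet they have the same
size and field offsets, which is all a structurally typed representation
gets to see.

.. code-block:: c

    #include <stddef.h>
    #include <string.h>

    /* Two nominally different types with identical structure. */
    typedef struct { int value; } Meters;
    typedef struct { int value; } Seconds;

    /* Compile-time guard for the byte copy below (C11). */
    _Static_assert(sizeof(Meters) == sizeof(Seconds), "layouts must match");

    /* Returns 1 when the two types share size and field offset, i.e.
       when they are indistinguishable by layout alone. */
    int same_layout(void) {
        return sizeof(Meters) == sizeof(Seconds) &&
               offsetof(Meters, value) == offsetof(Seconds, value);
    }

    /* Because the layouts match, the bytes of one can be reinterpreted
       as the other; the name that told them apart is gone here. */
    Seconds reinterpret_as_seconds(Meters m) {
        Seconds s;
        memcpy(&s, &m, sizeof s);
        return s;
    }
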
Second, while LLVM does lose information, LLVM is not a fixed target: we
continue to enhance and improve it in many different ways. In addition
to adding new features (LLVM did not always support exceptions or debug
info), we also extend the IR to capture important information for
optimization (e.g. whether an argument is sign- or zero-extended,
information about pointers aliasing, etc.). Many of the enhancements are
user-driven: people want LLVM to include some specific feature, so they
go ahead and extend it.

Third, it is *possible and easy* to add language-specific optimizations,
and you have a number of choices in how to do it. As one trivial
example, it is easy to add language-specific optimization passes that
"know" things about code compiled for a language. In the case of the C
family, there is an optimization pass that "knows" about the standard C
library functions. If you call "exit(0)" in main(), it knows that it is
safe to optimize that into "return 0;" because C specifies what the
'exit' function does.

In addition to simple library knowledge, it is possible to embed a
variety of other language-specific information into the LLVM IR. If you
have a specific need and run into a wall, please bring the topic up on
the llvm-dev list. At the very worst, you can always treat LLVM as if it
were a "dumb code generator" and implement the high-level optimizations
you desire in your front-end, on the language-specific AST.

Tips and Tricks
===============

There is a variety of useful tips and tricks that you come to know after
working on/with LLVM that aren't obvious at first glance. Instead of
letting everyone rediscover them, this section talks about some of these
issues.

Implementing portable offsetof/sizeof
-------------------------------------

One interesting thing that comes up, if you are trying to keep the code
generated by your compiler "target independent", is that you often need
to know the size of some LLVM type or the offset of some field in an
LLVM structure. For example, you might need to pass the size of a type
into a function that allocates memory.

Unfortunately, this can vary widely across targets: for example, the
width of a pointer is trivially target-specific. However, there is a
`clever way to use the getelementptr
instruction <http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt>`_
that allows you to compute this in a portable way.

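Roughly, the trick lets address arithmetic do the work: the address of
element 1 of an array of some type minus the address of element 0 is the
size of that type, and the address of a field minus the address of its
struct is the field's offset. In LLVM IR this is expressed with
``getelementptr`` on a null base pointer, so the number is only filled in
once the target is known. The C sketch below shows the same arithmetic on
a real object instead, because pointer arithmetic on a null pointer is
undefined in C. ``Pair``, ``size_by_address`` and ``offset_of_second``
are hypothetical names used only for this illustration.

.. code-block:: c

    #include <stddef.h>

    typedef struct {
        char tag;
        double second;
    } Pair;

    /* Size of Pair: distance in bytes from element 0 to element 1. */
    size_t size_by_address(void) {
        Pair pairs[2];
        return (size_t)((char *)&pairs[1] - (char *)&pairs[0]);
    }

    /* Offset of 'second': distance in bytes from the struct to the
       field. Only addresses are taken; the object is never read. */
    size_t offset_of_second(void) {
        Pair p;
        return (size_t)((char *)&p.second - (char *)&p);
    }

On any given target these agree with ``sizeof(Pair)`` and
``offsetof(Pair, second)``; the point of the IR version is that the
front end never has to know the concrete numbers.
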
Garbage Collected Stack Frames
------------------------------

Some languages want to explicitly manage their stack frames, often so
that they are garbage collected or to allow easy implementation of
closures. There are often better ways to implement these features than
explicit stack frames, but `LLVM does support
them <http://nondot.org/sabre/LLVMNotes/ExplicitlyManagedStackFrames.txt>`_,
if you want. It requires your front-end to convert the code into
`Continuation Passing
Style <http://en.wikipedia.org/wiki/Continuation-passing_style>`_ and
to use tail calls (which LLVM also supports).

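To give a feel for what "explicitly managed" means, here is a loose C
sketch in which a function's state lives in a heap-allocated frame and
each step hands the frame back to a driver loop instead of recursing on
the machine stack. This is only an approximation of the approach: C does
not guarantee tail calls, so a trampoline loop stands in for them, and
the names ``Frame``, ``sum_step`` and ``run_sum`` are invented for this
example.

.. code-block:: c

    #include <stdlib.h>

    /* A heap-allocated "frame": the state a computation needs, plus the
       code to run next (its continuation). */
    typedef struct Frame Frame;
    struct Frame {
        Frame *(*resume)(Frame *self);  /* next step, NULL when finished */
        long n;                         /* remaining count */
        long acc;                       /* running total */
    };

    /* One step of summing n + (n-1) + ... + 1. It updates the frame and
       returns it to the driver rather than calling itself. */
    Frame *sum_step(Frame *f) {
        if (f->n == 0) {
            f->resume = NULL;   /* nothing left to do */
            return f;
        }
        f->acc += f->n;
        f->n -= 1;
        return f;               /* continue with the same frame */
    }

    /* Driver loop (a trampoline) that keeps resuming the current frame.
       Returns -1 if the frame cannot be allocated. */
    long run_sum(long n) {
        Frame *f = malloc(sizeof *f);
        if (f == NULL)
            return -1;
        f->resume = sum_step;
        f->n = n;
        f->acc = 0;
        while (f->resume != NULL)
            f = f->resume(f);
        long result = f->acc;
        free(f);
        return result;
    }

Because the frame is an ordinary heap object, a collector could trace or
move it like any other allocation, which is the property the text is
after.
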