=================
DataFlowSanitizer
=================

.. toctree::
   :hidden:

   DataFlowSanitizerDesign

.. contents::
   :local:

Introduction
============

DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own.  Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

Usage
=====

With no program changes, applying DataFlowSanitizer to a program
will not alter its behavior.  To use DataFlowSanitizer, the program
calls API functions to apply tags (labels) to data so that it is tracked,
and to check the label of a specific data item.  DataFlowSanitizer manages
the propagation of labels through the program according to its data flow.

The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
For further information about each function, please refer to the header
file.

ABI List
--------

DataFlowSanitizer uses a list of functions known as an ABI list to decide
whether a call to a specific function should use the operating system's native
ABI or a variant of this ABI that also propagates labels through function
parameters and return values.  The ABI list file also controls how labels are
propagated in the former case.

DataFlowSanitizer comes with a default ABI list which is intended to
eventually cover the glibc library on Linux.  Users may need to extend the
ABI list when a particular library or function cannot be instrumented
(e.g. because it is implemented in assembly or another language which
DataFlowSanitizer does not support), or when a function is called from a
library or function which cannot be instrumented.

DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
The pass treats every function in the ``uninstrumented`` category in the
ABI list file as conforming to the native ABI.  Unless the ABI list contains
additional categories for those functions, a call to one of those functions
will produce a warning message, as the labelling behavior of the function
is unknown.  The other supported categories are ``discard``, ``functional``
and ``custom``.

* ``discard`` -- To the extent that this function writes to (user-accessible)
  memory, it also updates labels in shadow memory (this condition is trivially
  satisfied for functions which do not write to user-accessible memory).  Its
  return value is unlabelled.
* ``functional`` -- Like ``discard``, except that the label of its return value
  is the union of the labels of its arguments.
* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
  is called, where ``F`` is the name of the function.  This wrapper may call
  the original function or provide its own implementation.  This category is
  generally used for uninstrumentable functions which write to user-accessible
  memory or which have more complex label propagation behavior.  The signature
  of ``__dfsw_F`` is based on that of ``F``, with one parameter of type
  ``dfsan_label`` appended to the argument list for each argument of ``F``.
  If ``F`` has a non-void return type, a final parameter of type
  ``dfsan_label *`` is appended, through which the custom function can store
  the label for the return value.  For example:

.. code-block:: c++

  void f(int x);
  void __dfsw_f(int x, dfsan_label x_label);

  void *memcpy(void *dest, const void *src, size_t n);
  void *__dfsw_memcpy(void *dest, const void *src, size_t n,
                      dfsan_label dest_label, dfsan_label src_label,
                      dfsan_label n_label, dfsan_label *ret_label);

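As an illustration of the ``custom`` signature rule above, the following is a
minimal sketch of what a wrapper body might look like.  The function
``clamp_to_byte`` is hypothetical and not part of any library, and the
``dfsan_label`` typedef is a stand-in (assumed here to be a 16-bit integer) so
that the sketch compiles without the DataFlowSanitizer runtime; a real build
would include ``sanitizer/dfsan_interface.h`` instead.

.. code-block:: c++

  #include <cstdint>

  // Stand-in for the type provided by <sanitizer/dfsan_interface.h>
  // (assumption: a 16-bit integer), used only to keep this sketch standalone.
  typedef std::uint16_t dfsan_label;

  // Hypothetical uninstrumented function, assumed to be listed as
  //   fun:clamp_to_byte=uninstrumented
  //   fun:clamp_to_byte=custom
  extern "C" int clamp_to_byte(int x) {
    if (x < 0) return 0;
    return x > 255 ? 255 : x;
  }

  // Custom wrapper: the parameters of clamp_to_byte, then one label per
  // argument, then a pointer for the return label (non-void return type).
  extern "C" int __dfsw_clamp_to_byte(int x, dfsan_label x_label,
                                      dfsan_label *ret_label) {
    *ret_label = x_label;     // the result is derived from x alone
    return clamp_to_byte(x);  // delegate to the original function
  }

Because ``clamp_to_byte`` does not write to user-accessible memory, this
wrapper has no shadow memory to update and only reports the return label.
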
If a function defined in the translation unit being compiled belongs to the
``uninstrumented`` category, it will be compiled so as to conform to the
native ABI.  Its arguments will be assumed to be unlabelled, but it will
propagate labels in shadow memory.

For example:

.. code-block:: none

  # main is called by the C runtime using the native ABI.
  fun:main=uninstrumented
  fun:main=discard

  # malloc only writes to its internal data structures, not user-accessible memory.
  fun:malloc=uninstrumented
  fun:malloc=discard

  # tolower is a pure function.
  fun:tolower=uninstrumented
  fun:tolower=functional

  # memcpy needs to copy the shadow from the source to the destination region.
  # This is done in a custom function.
  fun:memcpy=uninstrumented
  fun:memcpy=custom

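The return-label rules for the ``discard`` and ``functional`` categories can be
summarised with a small toy model.  This is only a sketch: it represents a
label as a bit mask with one bit per base label and the union as a bitwise OR,
which is an assumption made for illustration and not a description of how
DataFlowSanitizer stores labels.  The names ``ToyLabel``, ``Category`` and
``return_label`` are hypothetical.

.. code-block:: c++

  #include <cstdint>
  #include <initializer_list>

  // Toy label: one bit per base label; union is bitwise OR.
  // Illustrative only, not DataFlowSanitizer's representation.
  using ToyLabel = std::uint16_t;

  enum class Category { Discard, Functional };

  // Label a caller would observe on the return value of a native-ABI call
  // to a function in the given category.
  ToyLabel return_label(Category category,
                        std::initializer_list<ToyLabel> arg_labels) {
    if (category == Category::Discard)
      return 0;                  // discard: return value is unlabelled
    ToyLabel result = 0;
    for (ToyLabel label : arg_labels)
      result |= label;           // functional: union of the argument labels
    return result;
  }

Under this model, a call to ``tolower`` with a labelled argument yields a
result carrying that label, while the pointer returned by ``malloc`` is
unlabelled even if the size argument was labelled.
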
Example
=======

The following program demonstrates label propagation by checking that
the correct labels are propagated.

.. code-block:: c++

  #include <sanitizer/dfsan_interface.h>
  #include <assert.h>

  int main(void) {
    // Create a label for each variable and attach it to the variable's bytes.
    int i = 1;
    dfsan_label i_label = dfsan_create_label("i", 0);
    dfsan_set_label(i_label, &i, sizeof(i));

    int j = 2;
    dfsan_label j_label = dfsan_create_label("j", 0);
    dfsan_set_label(j_label, &j, sizeof(j));

    int k = 3;
    dfsan_label k_label = dfsan_create_label("k", 0);
    dfsan_set_label(k_label, &k, sizeof(k));

    // The label of i + j contains the labels of i and j, but not that of k.
    dfsan_label ij_label = dfsan_get_label(i + j);
    assert(dfsan_has_label(ij_label, i_label));
    assert(dfsan_has_label(ij_label, j_label));
    assert(!dfsan_has_label(ij_label, k_label));

    // The label of i + j + k contains all three labels.
    dfsan_label ijk_label = dfsan_get_label(i + j + k);
    assert(dfsan_has_label(ijk_label, i_label));
    assert(dfsan_has_label(ijk_label, j_label));
    assert(dfsan_has_label(ijk_label, k_label));

    return 0;
  }

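To see why the assertions in the program above are expected to hold, it can
help to model labels outside the sanitizer.  The sketch below is a toy model
under stated assumptions: a combined label is a node that remembers the two
labels it was formed from, and a membership query walks those nodes.  The
class ``ToyLabelTable`` and its methods are hypothetical names, and the code is
not the DataFlowSanitizer runtime.

.. code-block:: c++

  #include <cstdint>
  #include <vector>

  // Toy model of labels; NOT the DataFlowSanitizer runtime's data structures.
  using ToyLabel = std::uint16_t;

  class ToyLabelTable {
   public:
    ToyLabelTable() : nodes_(1) {}  // index 0 is reserved for "no label"

    // Analogue of dfsan_create_label: a fresh base label.
    ToyLabel create() {
      nodes_.push_back(Node{0, 0});
      return static_cast<ToyLabel>(nodes_.size() - 1);
    }

    // Analogue of combining operand labels, as for the result of i + j.
    ToyLabel unite(ToyLabel a, ToyLabel b) {
      if (a == 0 || a == b) return b;
      if (b == 0) return a;
      nodes_.push_back(Node{a, b});
      return static_cast<ToyLabel>(nodes_.size() - 1);
    }

    // Analogue of dfsan_has_label: is elem a component of label?
    bool has(ToyLabel label, ToyLabel elem) const {
      if (label == elem) return true;
      const Node &node = nodes_[label];
      if (node.l1 == 0) return false;  // base label or no label
      return has(node.l1, elem) || has(node.l2, elem);
    }

   private:
    struct Node { ToyLabel l1, l2; };  // both 0 for a base label
    std::vector<Node> nodes_;
  };

In this model, uniting the labels of ``i`` and ``j`` gives a label that
contains both but not the label of ``k``, and uniting that result with the
label of ``k`` gives a label containing all three, which mirrors the two
groups of assertions in the example.
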
Current status
==============

DataFlowSanitizer is a work in progress, currently under development for
x86\_64 Linux.

Design
======

Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.