=================
DataFlowSanitizer
=================

.. toctree::
   :hidden:

   DataFlowSanitizerDesign

.. contents::
   :local:

Introduction
============

DataFlowSanitizer is a generalised dynamic data flow analysis.

Unlike other Sanitizer tools, this tool is not designed to detect a
specific class of bugs on its own.  Instead, it provides a generic
dynamic data flow analysis framework to be used by clients to help
detect application-specific issues within their own code.

Usage
=====

With no program changes, applying DataFlowSanitizer to a program
will not alter its behavior.  To use DataFlowSanitizer, the program
uses API functions to apply tags to data to cause it to be tracked, and to
check the tag of a specific data item.  DataFlowSanitizer manages
the propagation of tags through the program according to its data flow.

The APIs are defined in the header file ``sanitizer/dfsan_interface.h``.
For further information about each function, please refer to the header
file.
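
As a rough mental model of what "propagation of tags according to data flow"
means, the following plain C++ sketch may help.  It does not use
DataFlowSanitizer and says nothing about how the tool is implemented: the
names ``ToyLabel``, ``Tagged``, ``toy_add`` and ``toy_has_label`` are
hypothetical and exist only for this illustration, and representing a tag as a
bitmask is an assumption made to keep the sketch small.

```cpp
#include <cstdint>

// Toy model only. A tag is modelled as a bitmask of "base" tags; 0 means the
// value was never tagged. This is an illustrative assumption, not the
// representation used by DataFlowSanitizer.
using ToyLabel = std::uint32_t;

// A value paired with the tag that tracks where it came from.
struct Tagged {
  int value;
  ToyLabel label;
};

// The result of an operation depends on both operands, so in this model it
// carries the union of their tags.
inline Tagged toy_add(Tagged a, Tagged b) {
  return Tagged{a.value + b.value, a.label | b.label};
}

// Returns true if `label` includes every base tag present in `elem`.
inline bool toy_has_label(ToyLabel label, ToyLabel elem) {
  return (label & elem) == elem;
}
```

In this model, adding a value tagged ``1u << 0`` to a value tagged ``1u << 1``
yields a result for which ``toy_has_label`` is true for both tags and false
for an unrelated tag such as ``1u << 2``; untagged inputs contribute nothing.
With the real tool the program does not perform this bookkeeping itself: it
only applies and checks tags through the API.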

ABI List
--------

DataFlowSanitizer uses a list of functions known as an ABI list to decide
whether a call to a specific function should use the operating system's native
ABI or whether it should use a variant of this ABI that also propagates labels
through function parameters and return values.  The ABI list file also controls
how labels are propagated in the former case.  DataFlowSanitizer comes with a
default ABI list which is intended to eventually cover the glibc library on
Linux but it may become necessary for users to extend the ABI list in cases
where a particular library or function cannot be instrumented (e.g. because
it is implemented in assembly or another language which DataFlowSanitizer does
not support) or a function is called from a library or function which cannot
be instrumented.

DataFlowSanitizer's ABI list file is a :doc:`SanitizerSpecialCaseList`.
The pass treats every function in the ``uninstrumented`` category in the
ABI list file as conforming to the native ABI.  Unless the ABI list contains
additional categories for those functions, a call to one of those functions
will produce a warning message, as the labelling behavior of the function
is unknown.  The other supported categories are ``discard``, ``functional``
and ``custom``.

* ``discard`` -- To the extent that this function writes to (user-accessible)
  memory, it also updates labels in shadow memory (this condition is trivially
  satisfied for functions which do not write to user-accessible memory).  Its
  return value is unlabelled.
* ``functional`` -- Like ``discard``, except that the label of its return value
  is the union of the label of its arguments.
* ``custom`` -- Instead of calling the function, a custom wrapper ``__dfsw_F``
  is called, where ``F`` is the name of the function.  This function may wrap
  the original function or provide its own implementation.  This category is
  generally used for uninstrumentable functions which write to user-accessible
  memory or which have more complex label propagation behavior.  The signature
  of ``__dfsw_F`` is based on that of ``F`` with each argument having a
  label of type ``dfsan_label`` appended to the argument list.  If ``F``
  is of non-void return type a final argument of type ``dfsan_label *``
  is appended to which the custom function can store the label for the
  return value.  For example:

.. code-block:: c++

  void f(int x);
  void __dfsw_f(int x, dfsan_label x_label);

  void *memcpy(void *dest, const void *src, size_t n);
  void *__dfsw_memcpy(void *dest, const void *src, size_t n,
                      dfsan_label dest_label, dfsan_label src_label,
                      dfsan_label n_label, dfsan_label *ret_label);

If a function defined in the translation unit being compiled belongs to the
``uninstrumented`` category, it will be compiled so as to conform to the
native ABI.  Its arguments will be assumed to be unlabelled, but it will
propagate labels in shadow memory.

For example:

.. code-block:: none

  # main is called by the C runtime using the native ABI.
  fun:main=uninstrumented
  fun:main=discard

  # malloc only writes to its internal data structures, not user-accessible memory.
  fun:malloc=uninstrumented
  fun:malloc=discard

  # tolower is a pure function.
  fun:tolower=uninstrumented
  fun:tolower=functional

  # memcpy needs to copy the shadow from the source to the destination region.
  # This is done in a custom function.
  fun:memcpy=uninstrumented
  fun:memcpy=custom

Example
=======

The following program demonstrates label propagation by checking that
the correct labels are propagated.

.. code-block:: c++

  #include <sanitizer/dfsan_interface.h>
  #include <assert.h>

  int main(void) {
    int i = 1;
    dfsan_label i_label = dfsan_create_label("i", 0);
    dfsan_set_label(i_label, &i, sizeof(i));

    int j = 2;
    dfsan_label j_label = dfsan_create_label("j", 0);
    dfsan_set_label(j_label, &j, sizeof(j));

    int k = 3;
    dfsan_label k_label = dfsan_create_label("k", 0);
    dfsan_set_label(k_label, &k, sizeof(k));

    dfsan_label ij_label = dfsan_get_label(i + j);
    assert(dfsan_has_label(ij_label, i_label));
    assert(dfsan_has_label(ij_label, j_label));
    assert(!dfsan_has_label(ij_label, k_label));

    dfsan_label ijk_label = dfsan_get_label(i + j + k);
    assert(dfsan_has_label(ijk_label, i_label));
    assert(dfsan_has_label(ijk_label, j_label));
    assert(dfsan_has_label(ijk_label, k_label));

    return 0;
  }

Current status
==============

DataFlowSanitizer is a work in progress, currently under development for
x86\_64 Linux.

Design
======

Please refer to the :doc:`design document<DataFlowSanitizerDesign>`.