## fdsan

[TOC]

fdsan is a file descriptor sanitizer added to Android in API level 29.
In API level 29, fdsan warns when it finds a bug.
In API level 30, fdsan aborts when it finds a bug.

### Background
*What problem is fdsan trying to solve? Why should I care?*

fdsan (file descriptor sanitizer) detects mishandling of file descriptor ownership, which tends to manifest as *use-after-close* and *double-close*. These errors are direct analogues of the memory allocation *use-after-free* and *double-free* bugs, but tend to be much more difficult to diagnose and fix. With `malloc` and `free`, implementations have free rein to detect errors and abort on double free. POSIX, on the other hand, requires that a newly allocated file descriptor be the lowest number not currently in use, so a just-closed number is immediately eligible for reuse. As a result, many file descriptor bugs can *never* be noticed on the thread on which the error occurred, and will manifest as "impossible" behavior on another thread.
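
The lowest-available-number rule is easy to observe. The following is a minimal sketch, assuming a single-threaded process in which nothing else opens or closes descriptors in between; the function name `fd_number_is_reused` is made up for this example.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Opens /dev/null, closes it, opens it again, and reports whether the second
// open returned the same descriptor number as the first.
// Returns 1 if the number was reused, 0 if not, and -1 if an open failed.
int fd_number_is_reused() {
  int first = open("/dev/null", O_RDONLY);
  if (first == -1) return -1;
  close(first);

  // `first` is now a stale number. Any later close(first) would hit whatever
  // descriptor happens to be assigned that number next.
  int second = open("/dev/null", O_RDONLY);
  if (second == -1) return -1;

  bool reused = (second == first);
  close(second);
  return reused ? 1 : 0;
}
```

In a multi-threaded process the second `open` could just as well happen on another thread, which is exactly how a stale `close` ends up hitting someone else's descriptor.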

For example, given two threads running the following code:
```cpp
#include <err.h>
#include <fcntl.h>
#include <unistd.h>

void thread_one() {
    int fd = open("/dev/null", O_RDONLY);
    close(fd);
    close(fd);  // Bug: double close. `fd` may now belong to another thread.
}

void thread_two() {
    while (true) {
        int fd = open("log", O_WRONLY | O_APPEND);
        if (write(fd, "foo", 3) != 3) {
            err(1, "write failed!");
        }
    }
}
```
the following interleaving is possible:
```cpp
thread one                                thread two
open("/dev/null", O_RDONLY) = 123
close(123) = 0
                                          open("log", O_WRONLY | O_APPEND) = 123
close(123) = 0
                                          write(123, "foo", 3) = -1 (EBADF)
                                          err(1, "write failed!")
```

Assertion failures are probably the most innocuous result that can arise from these bugs: silent data corruption [[1](#footnotes), [2](#footnotes)] or security vulnerabilities are also possible (e.g. suppose thread two was saving user data to disk when a third thread came in and opened a socket to the Internet).

### Design
*What does fdsan do?*

fdsan attempts to detect and/or prevent file descriptor mismanagement by enforcing file descriptor ownership. Just as most memory allocations can have their ownership handled by types such as `std::unique_ptr`, almost all file descriptors can be associated with a unique owner which is responsible for their closure. fdsan provides functions to associate a file descriptor with an owner; if someone tries to close a file descriptor that they don't own, depending on configuration, either a warning is emitted, or the process aborts.

This is implemented by providing functions to set a 64-bit closure tag on a file descriptor. The tag consists of an 8-bit type byte that identifies the type of the owner (`enum android_fdsan_owner_type` in [`<android/fdsan.h>`](https://android.googlesource.com/platform/bionic/+/main/libc/include/android/fdsan.h)), and a 56-bit value. The value should ideally be something that uniquely identifies the object (object address for native objects and `System.identityHashCode` for Java objects), but in cases where it's hard to derive an identifier for the "owner" that should close a file descriptor, even using the same value for all file descriptors in the module can be useful, since it'll catch other code that closes your file descriptors.
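
To make the tag layout concrete, here is a small sketch of packing an 8-bit type and a 56-bit value into one 64-bit tag. The helper names are hypothetical and this is not the platform code; in particular, placing the type in the top byte is an assumption made for illustration. On a device you would normally call `android_fdsan_create_owner_tag` from `<android/fdsan.h>` rather than pack the bits yourself.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only.
// Assumption: the owner type occupies the top 8 bits, the value the low 56.
constexpr uint64_t kValueMask = (uint64_t{1} << 56) - 1;

// Combines an owner type and a value; bits of `value` above bit 55 are dropped.
constexpr uint64_t make_owner_tag(uint8_t type, uint64_t value) {
  return (static_cast<uint64_t>(type) << 56) | (value & kValueMask);
}

// Extracts the 8-bit owner type from a tag.
constexpr uint8_t owner_type_of(uint64_t tag) {
  return static_cast<uint8_t>(tag >> 56);
}

// Extracts the 56-bit value from a tag.
constexpr uint64_t owner_value_of(uint64_t tag) {
  return tag & kValueMask;
}
```

Because only 56 bits of the value survive, an object address used as the value is truncated; that is fine for distinguishing owners as long as the low 56 bits differ.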

If a file descriptor that's been marked with a tag is closed with an incorrect tag, or without a tag, we know something has gone wrong, and can generate diagnostics or abort.
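
The check itself can be modelled in a few lines. This is a toy model, not fdsan's real implementation: the class `FdOwnerTable` and its methods are hypothetical, a tag of 0 stands for "unowned", and no real descriptors are touched.

```cpp
#include <cstdint>
#include <unordered_map>

// Toy model of tag-based ownership checking (hypothetical, illustration only).
class FdOwnerTable {
 public:
  // Replaces the tag on `fd` if its current tag equals `expected_old`.
  // Returns false on a mismatch, which is where a sanitizer would complain.
  bool exchange_tag(int fd, uint64_t expected_old, uint64_t new_tag) {
    if (current_tag(fd) != expected_old) return false;
    if (new_tag == 0) {
      tags_.erase(fd);
    } else {
      tags_[fd] = new_tag;
    }
    return true;
  }

  // Models closing `fd` while presenting `tag`. Returns false if the caller
  // is not the recorded owner. The entry is forgotten either way, a
  // simplification chosen for this sketch.
  bool close_with_tag(int fd, uint64_t tag) {
    bool is_owner = (current_tag(fd) == tag);
    tags_.erase(fd);
    return is_owner;
  }

 private:
  // Returns the recorded tag, or 0 if the descriptor is unowned.
  uint64_t current_tag(int fd) const {
    auto it = tags_.find(fd);
    return it == tags_.end() ? 0 : it->second;
  }

  std::unordered_map<int, uint64_t> tags_;
};
```

In this model a plain `close` corresponds to `close_with_tag(fd, 0)`, so closing an owned descriptor through the untagged path is reported as a mismatch.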

### Enabling fdsan (as a user)
*How do I use fdsan?*

fdsan has four severity levels:
 - disabled (`ANDROID_FDSAN_ERROR_LEVEL_DISABLED`)
 - warn-once (`ANDROID_FDSAN_ERROR_LEVEL_WARN_ONCE`)
   - Upon detecting an error, emit a warning to logcat, generate a tombstone, and then continue execution with fdsan disabled.
 - warn-always (`ANDROID_FDSAN_ERROR_LEVEL_WARN_ALWAYS`)
   - Same as warn-once, except without disabling after the first warning.
 - fatal (`ANDROID_FDSAN_ERROR_LEVEL_FATAL`)
   - Abort upon detecting an error.

In API level 29, fdsan had a global default of warn-once.
In API level 30 and higher, fdsan has a global default of fatal.
fdsan can be made more or less strict at runtime via the `android_fdsan_set_error_level` function in [`<android/fdsan.h>`](https://android.googlesource.com/platform/bionic/+/main/libc/include/android/fdsan.h).
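
As a rough sketch, a process could switch to warn-always early in startup like this. It assumes an Android build in which `<android/fdsan.h>` declares `android_fdsan_set_error_level` as a weak symbol that can be tested for null; the wrapper name `make_fdsan_warn_always` is made up for this example, and off Android it compiles to a stub.

```cpp
#if defined(__ANDROID__)
#include <android/fdsan.h>
#endif

// Hypothetical wrapper: asks fdsan to warn on every error without aborting.
// Returns true if the level was set, false if fdsan is not available.
bool make_fdsan_warn_always() {
#if defined(__ANDROID__)
  // Assumption: the symbol is weak, so it is null on devices older than
  // API level 29 and the call is skipped there.
  if (android_fdsan_set_error_level) {
    android_fdsan_set_error_level(ANDROID_FDSAN_ERROR_LEVEL_WARN_ALWAYS);
    return true;
  }
#endif
  return false;
}
```

Calling this before any worker threads start keeps the behaviour consistent for the whole run.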

The likelihood of fdsan catching a file descriptor error is proportional to the percentage of file descriptors in your process that are tagged with an owner.

### Using fdsan to fix a bug
*No, really, how do I use fdsan?*

Let's look at a simple contrived example that uses sleeps to force a particular interleaving of thread execution.

```cpp
#include <err.h>
#include <unistd.h>

#include <chrono>
#include <thread>
#include <vector>

#include <android-base/unique_fd.h>

using namespace std::chrono_literals;
using std::this_thread::sleep_for;

void victim() {
  sleep_for(300ms);
  int fd = dup(STDOUT_FILENO);
  sleep_for(200ms);
  ssize_t rc = write(fd, "good\n", 5);
  if (rc == -1) {
    err(1, "good failed to write?!");
  }
  close(fd);
}

void bystander() {
  sleep_for(100ms);
  int fd = dup(STDOUT_FILENO);
  sleep_for(300ms);
  close(fd);
}

void offender() {
  int fd = dup(STDOUT_FILENO);
  close(fd);
  sleep_for(200ms);
  close(fd);
}

int main() {
  std::vector<std::thread> threads;
  for (auto function : { victim, bystander, offender }) {
    threads.emplace_back(function);
  }
  for (auto& thread : threads) {
    thread.join();
  }
}
```

When running the program, the threads' executions will be interleaved as follows:

```cpp
// victim                         bystander                       offender
                                                                  int fd = dup(1); // 3
                                                                  close(3);
                                  int fd = dup(1); // 3
                                                                  close(3);
int fd = dup(1); // 3
                                  close(3);
write(3, "good\n") = -1 (EBADF);
```

which results in the following output:

    fdsan_test: good failed to write?!: Bad file descriptor

This implies that either we're accidentally closing our file descriptor too early, or someone else is helpfully closing it for us. Let's use `android::base::unique_fd` in `victim` to guard the file descriptor with fdsan:

```diff
--- a/fdsan_test.cpp
+++ b/fdsan_test.cpp
@@ -12,13 +12,12 @@ using std::this_thread::sleep_for;

 void victim() {
   sleep_for(300ms);
-  int fd = dup(STDOUT_FILENO);
+  android::base::unique_fd fd(dup(STDOUT_FILENO));
   sleep_for(200ms);
   ssize_t rc = write(fd, "good\n", 5);
   if (rc == -1) {
     err(1, "good failed to write?!");
   }
-  close(fd);
 }

 void bystander() {
```

Now that we've guarded the file descriptor with fdsan, we should be able to find where the double close is:

```
pid: 25587, tid: 25589, name: fdsan_test  >>> fdsan_test <<<
signal 35 (<debuggerd signal>), code -1 (SI_QUEUE), fault addr --------
Abort message: 'attempted to close file descriptor 3, expected to be unowned, actually owned by unique_fd 0x7bf15dc448'
    x0  0000000000000000  x1  00000000000063f5  x2  0000000000000023  x3  0000007bf14de338
    x4  0000007bf14de3b8  x5  3463643531666237  x6  3463643531666237  x7  3834346364353166
    x8  00000000000000f0  x9  0000000000000000  x10 0000000000000059  x11 0000000000000035
    x12 0000007bf1bebcfa  x13 0000007bf14ddf0a  x14 0000007bf14ddf0a  x15 0000000000000000
    x16 0000007bf1c33048  x17 0000007bf1ba9990  x18 0000000000000000  x19 00000000000063f3
    x20 00000000000063f5  x21 0000007bf14de588  x22 0000007bf1f1b864  x23 0000000000000001
    x24 0000007bf14de130  x25 0000007bf13e1000  x26 0000007bf1f1f580  x27 0000005ab43ab8f0
    x28 0000000000000000  x29 0000007bf14de400
    sp  0000007bf14ddff0  lr  0000007bf1b5fd6c  pc  0000007bf1b5fd90

backtrace:
    #00 pc 0000000000008d90  /system/lib64/libc.so (fdsan_error(char const*, ...)+384)
    #01 pc 0000000000008ba8  /system/lib64/libc.so (android_fdsan_close_with_tag+632)
    #02 pc 00000000000092a0  /system/lib64/libc.so (close+16)
    #03 pc 00000000000003e4  /system/bin/fdsan_test (bystander()+84)
    #04 pc 0000000000000918  /system/bin/fdsan_test
    #05 pc 000000000006689c  /system/lib64/libc.so (__pthread_start(void*)+36)
    #06 pc 000000000000712c  /system/lib64/libc.so (__start_thread+68)
```

...in the obviously correct bystander? What's going on here?

This is (hopefully!) not a bug in fdsan; it is a pattern commonly seen when tracking down double-closes in processes that have sparse fdsan coverage. What actually happened is that the culprit, `offender`, closed `bystander`'s file descriptor between its open and close. The freed number was then handed to `victim`, so `bystander`'s own, perfectly reasonable close ended up being blamed for closing `victim`'s fd. If we store `bystander`'s fd in a `unique_fd` as well, we should get something more useful:
```diff
--- a/tmp/fdsan_test.cpp
+++ b/tmp/fdsan_test.cpp
@@ -23,9 +23,8 @@ void victim() {

 void bystander() {
   sleep_for(100ms);
-  int fd = dup(STDOUT_FILENO);
+  android::base::unique_fd fd(dup(STDOUT_FILENO));
   sleep_for(300ms);
-  close(fd);
 }
```
giving us:
```
pid: 25779, tid: 25782, name: fdsan_test  >>> fdsan_test <<<
signal 35 (<debuggerd signal>), code -1 (SI_QUEUE), fault addr --------
Abort message: 'attempted to close file descriptor 3, expected to be unowned, actually owned by unique_fd 0x6fef9ff448'
    x0  0000000000000000  x1  00000000000064b6  x2  0000000000000023  x3  0000006fef901338
    x4  0000006fef9013b8  x5  3466663966656636  x6  3466663966656636  x7  3834346666396665
    x8  00000000000000f0  x9  0000000000000000  x10 0000000000000059  x11 0000000000000039
    x12 0000006ff0055cfa  x13 0000006fef900f0a  x14 0000006fef900f0a  x15 0000000000000000
    x16 0000006ff009d048  x17 0000006ff0013990  x18 0000000000000000  x19 00000000000064b3
    x20 00000000000064b6  x21 0000006fef901588  x22 0000006ff04ff864  x23 0000000000000001
    x24 0000006fef901130  x25 0000006fef804000  x26 0000006ff0503580  x27 0000006368aa18f8
    x28 0000000000000000  x29 0000006fef901400
    sp  0000006fef900ff0  lr  0000006feffc9d6c  pc  0000006feffc9d90

backtrace:
    #00 pc 0000000000008d90  /system/lib64/libc.so (fdsan_error(char const*, ...)+384)
    #01 pc 0000000000008ba8  /system/lib64/libc.so (android_fdsan_close_with_tag+632)
    #02 pc 00000000000092a0  /system/lib64/libc.so (close+16)
    #03 pc 000000000000045c  /system/bin/fdsan_test (offender()+68)
    #04 pc 0000000000000920  /system/bin/fdsan_test
    #05 pc 000000000006689c  /system/lib64/libc.so (__pthread_start(void*)+36)
    #06 pc 000000000000712c  /system/lib64/libc.so (__start_thread+68)
```

Hooray!

In a real application, things are probably not going to be as detectable or reproducible as our toy example, which is a good reason to try to maximize the usage of fdsan-enabled types like `unique_fd` and `ParcelFileDescriptor`, to improve the odds that double closes in other code get detected.

### Enabling fdsan (as a C++ library implementer)

fdsan operates via two main primitives. `android_fdsan_exchange_owner_tag` modifies a file descriptor's close tag, and `android_fdsan_close_with_tag` closes a file descriptor with its tag. In the `<android/fdsan.h>` header, these are marked with `__attribute__((weak))`, so instead of passing down the platform version from JNI, availability of the functions can be queried directly. An example implementation of `unique_fd` follows:

```cpp
/*
 * Copyright (C) 2018 The Android Open Source Project
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 *  * Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 *  * Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in
 *    the documentation and/or other materials provided with the
 *    distribution.
 *
 * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
 * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
 * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
 * FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
 * COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
 * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
 * BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS
 * OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED
 * AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
 * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT
 * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#pragma once

#include <android/fdsan.h>
#include <stdint.h>
#include <unistd.h>

#include <utility>

struct unique_fd {
    unique_fd() = default;

    explicit unique_fd(int fd) {
        reset(fd);
    }

    unique_fd(const unique_fd& copy) = delete;
    unique_fd(unique_fd&& move) {
        *this = std::move(move);
    }

    ~unique_fd() {
        reset();
    }

    unique_fd& operator=(const unique_fd& copy) = delete;
    unique_fd& operator=(unique_fd&& move) {
        if (this == &move) {
            return *this;
        }

        reset();

        if (move.fd_ != -1) {
            fd_ = move.fd_;
            move.fd_ = -1;

            // Acquire ownership from the moved-from object.
            exchange_tag(fd_, move.tag(), tag());
        }

        return *this;
    }

    int get() const { return fd_; }

    int release() {
        if (fd_ == -1) {
            return -1;
        }

        int fd = fd_;
        fd_ = -1;

        // Release ownership.
        exchange_tag(fd, tag(), 0);
        return fd;
    }

    void reset(int new_fd = -1) {
        if (fd_ != -1) {
            close(fd_, tag());
            fd_ = -1;
        }

        if (new_fd != -1) {
            fd_ = new_fd;

            // Acquire ownership of the presumably unowned fd.
            exchange_tag(fd_, 0, tag());
        }
    }

  private:
    int fd_ = -1;

    // The obvious choice of tag to use is the address of the object.
    uint64_t tag() {
        return reinterpret_cast<uint64_t>(this);
    }

    // These functions are marked with __attribute__((weak)), so that their
    // availability can be determined at runtime. These wrappers will use them
    // if available, and fall back to no-ops or regular close on devices older
    // than API level 29.
    static void exchange_tag(int fd, uint64_t old_tag, uint64_t new_tag) {
        if (android_fdsan_exchange_owner_tag) {
            android_fdsan_exchange_owner_tag(fd, old_tag, new_tag);
        }
    }

    static int close(int fd, uint64_t tag) {
        if (android_fdsan_close_with_tag) {
            return android_fdsan_close_with_tag(fd, tag);
        } else {
            return ::close(fd);
        }
    }
};
```

### Frequently seen bugs
 * Native APIs not making it clear when they take ownership of a file descriptor. <br/>
   * Solution: accept `unique_fd` instead of `int` in functions that take ownership.
   * [Example one](https://android-review.googlesource.com/c/platform/system/core/+/721985), [two](https://android-review.googlesource.com/c/platform/frameworks/native/+/709451)
 * Receiving a `ParcelFileDescriptor` via Intent, and then passing it into JNI code that ends up calling close on it. <br/>
   * Solution: ¯\\\_(ツ)\_/¯. Use fdsan?
   * [Example one](https://android-review.googlesource.com/c/platform/system/bt/+/710104), [two](https://android-review.googlesource.com/c/platform/frameworks/base/+/732305)

### Footnotes
1. [How To Corrupt An SQLite Database File](https://www.sqlite.org/howtocorrupt.html#_continuing_to_use_a_file_descriptor_after_it_has_been_closed)

2. [<b><i>50%</i></b> of Facebook's iOS crashes caused by a file descriptor double close leading to SQLite database corruption](https://code.fb.com/ios/debugging-file-corruption-on-ios/)