# Polling Engines

_Author: Sree Kuchibhotla (@sreecha) - Sep 2018_


## Why do we need a 'polling engine'?

The polling engine component was created for the following reasons:

- gRPC code deals with a bunch of file descriptors on which events (the descriptor becoming readable or writable, or having an error) have to be monitored
- gRPC code knows the actions to perform when such events happen
  - For example:
    - `grpc_endpoint` code calls `recvmsg` when the fd is readable and `sendmsg` when the fd is writable
    - `tcp_client` connect code issues an async `connect` and finishes creating the client once the fd is writable (i.e., when the `connect` has actually finished)
- gRPC needed a component that can "efficiently" do the above operations __using the threads provided by the applications (i.e., without creating any new threads)__. By "efficiently" we mean optimized for latency and throughput


## Polling Engine Implementations in gRPC
There are multiple polling engine implementations depending on the OS and the OS version. Fortunately all of them expose the same interface

- Linux:

  - `epoll1` (If glibc version >= 2.9)
  - `poll` (If kernel does not have epoll support)
- Mac: **`poll`** (default)
- Windows: (no name)

## Polling Engine Interface

### Opaque Structures exposed by the polling engine
The following are the **opaque** structures exposed by the polling engine interface (NOTE: Different polling engine implementations have different definitions of these structures)

- **grpc_fd:** Structure representing a file descriptor
- **grpc_pollset:** A set of one or more `grpc_fd`s that are ‘polled’ for readable/writable/error events. One `grpc_fd` can be in multiple `grpc_pollset`s
- **grpc_pollset_worker:** Structure representing a ‘polling thread’; more specifically, the thread that calls the `grpc_pollset_work()` API
- **grpc_pollset_set:** A group of `grpc_fd`s, `grpc_pollset`s and `grpc_pollset_set`s (yes, a `grpc_pollset_set` can contain other `grpc_pollset_set`s)

### Polling engine API

#### grpc_fd
- **grpc\_fd\_notify\_on\_[read|write|error]**
  - Signature: `grpc_fd_notify_on_[read|write|error](grpc_fd* fd, grpc_closure* closure)`
  - Register a [closure](https://github.com/grpc/grpc/blob/v1.15.1/src/core/lib/iomgr/closure.h#L67) to be called when the fd becomes readable/writable or has an error (in gRPC parlance, we refer to this act as “arming the fd”)
  - The closure is called exactly once per event, i.e., once the fd becomes readable (or writable, or has an error), the closure is fired and the fd is ‘unarmed’. To be notified again, the fd has to be armed again.

- **grpc_fd_shutdown**
  - Signature: `grpc_fd_shutdown(grpc_fd* fd)`
  - Any current (or future) closures registered for readable/writable/error events are scheduled immediately with an error

- **grpc_fd_orphan**
  - Signature: `grpc_fd_orphan(grpc_fd* fd, grpc_closure* on_done, int* release_fd, char* reason)`
  - Release the `grpc_fd` structure and call the `on_done` closure when the operation is complete
  - If `release_fd` is set to `nullptr`, then `close()` the underlying fd as well. If not, put the underlying fd in `release_fd` (and do not call `close()`)
    - `release_fd` is set to non-null in cases where the underlying fd is NOT owned by gRPC core (for example, the fds used by the C-Ares DNS resolver)
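
The arm / fire / unarm cycle and the shutdown behavior described above can be illustrated with a small stand-alone sketch. This is not gRPC code: `OneShotEvent`, its method names and the string-based error are hypothetical, and remembering an event that arrives while nothing is armed is an assumption of this sketch rather than something stated above.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for ONE of the per-fd events (read, write or error).
// The real state lives inside each polling engine's grpc_fd definition.
class OneShotEvent {
 public:
  // The closure receives an empty string on success or an error message.
  using Closure = std::function<void(const std::string& error)>;

  // "Arm" the event: run `closure` once, the next time the event fires.
  void NotifyOn(Closure closure) {
    if (shutdown_) {
      // After shutdown, future closures run immediately with the error.
      closure(shutdown_error_);
    } else if (ready_) {
      // Assumption: an event seen while unarmed is consumed by the next arm.
      ready_ = false;
      closure("");
    } else {
      assert(!armed_ && "only one closure may be armed at a time");
      armed_ = std::move(closure);
    }
  }

  // Called by the poller when the fd becomes readable/writable/errored.
  void SetReady() {
    if (shutdown_) return;
    if (armed_) {
      // Fire the closure and unarm; the caller must arm again to hear more.
      Closure c = std::move(armed_);
      armed_ = nullptr;
      c("");
    } else {
      ready_ = true;
    }
  }

  // Fail the currently armed closure (if any) and every future one.
  void Shutdown(const std::string& error) {
    if (shutdown_) return;
    shutdown_ = true;
    shutdown_error_ = error;
    if (armed_) {
      Closure c = std::move(armed_);
      armed_ = nullptr;
      c(shutdown_error_);
    }
  }

 private:
  Closure armed_;
  bool ready_ = false;
  bool shutdown_ = false;
  std::string shutdown_error_;
};
```

In this model a second `SetReady()` without a new `NotifyOn()` never runs a closure twice, which is the "exactly once per event" property; the sketch runs closures inline, whereas the text above says they are scheduled.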

#### grpc_pollset

- **grpc_pollset_add_fd**
  - Signature: `grpc_pollset_add_fd(grpc_pollset* ps, grpc_fd* fd)`
  - Add fd to pollset
    > **NOTE**: There is no `grpc_pollset_remove_fd`. This is because calling `grpc_fd_orphan()` will effectively remove the fd from all the pollsets it’s a part of

- **grpc_pollset_work**
  - Signature: `grpc_pollset_work(grpc_pollset* ps, grpc_pollset_worker** worker, grpc_core::Timestamp deadline)`
    > **NOTE**: `grpc_pollset_work()` requires the pollset mutex to be locked before calling it. Shortly after being called, `grpc_pollset_work()` populates the `*worker` pointer (among other things) and releases the mutex. Once `grpc_pollset_work()` returns, the `*worker` pointer is **invalid** and should not be used anymore. See the code in `completion_queue.cc` to see how this is used.
  - Poll the fds in the pollset for events AND return when ANY of the following is true:
    - Deadline expired
    - Some fds in the pollset were found to be readable/writable/error and the associated closures were ‘scheduled’ (but not necessarily executed)
    - The worker is “kicked” (see `grpc_pollset_kick` for more details)

- **grpc_pollset_kick**
  - Signature: `grpc_pollset_kick(grpc_pollset* ps, grpc_pollset_worker* worker)`
  - “Kick the worker”, i.e., force the worker to return from `grpc_pollset_work()`
  - If `worker == nullptr`, kick ANY worker active on that pollset
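
To make the three return conditions of `grpc_pollset_work()` and the effect of a kick more concrete, here is a minimal POSIX sketch. `MiniPollset`, `WorkResult` and the wakeup-pipe technique are assumptions made for illustration; the sketch supports a single worker, watches only for readability, and returns the ready fds instead of scheduling closures.

```cpp
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <vector>

// Why Work() returned; mirrors the three conditions listed above.
enum class WorkResult { kDeadlineExpired, kFdsReady, kKicked };

// Hypothetical single-worker pollset built on poll() and a wakeup pipe.
class MiniPollset {
 public:
  MiniPollset() {
    int rc = pipe(wakeup_);
    assert(rc == 0);
    (void)rc;
    // Non-blocking so draining the pipe never stalls the worker.
    fcntl(wakeup_[0], F_SETFL, O_NONBLOCK);
    fcntl(wakeup_[1], F_SETFL, O_NONBLOCK);
  }
  ~MiniPollset() {
    close(wakeup_[0]);
    close(wakeup_[1]);
  }
  MiniPollset(const MiniPollset&) = delete;
  MiniPollset& operator=(const MiniPollset&) = delete;

  void AddFd(int fd) { fds_.push_back(fd); }

  // Force the current (or next) Work() call to return.
  void Kick() {
    char byte = 0;
    ssize_t ignored = write(wakeup_[1], &byte, 1);
    (void)ignored;
  }

  // Poll until the timeout passes, an fd is ready, or the worker is kicked.
  WorkResult Work(int timeout_ms, std::vector<int>* ready) {
    assert(ready != nullptr);
    ready->clear();
    std::vector<pollfd> pfds;
    pfds.push_back({wakeup_[0], POLLIN, 0});
    for (int fd : fds_) pfds.push_back({fd, POLLIN, 0});
    int n = poll(pfds.data(), pfds.size(), timeout_ms);
    // A poll() error (e.g. EINTR) is treated like a timeout in this sketch.
    if (n <= 0) return WorkResult::kDeadlineExpired;
    if (pfds[0].revents & POLLIN) {
      char buf[64];
      while (read(wakeup_[0], buf, sizeof(buf)) > 0) {
        // Drain all pending kicks.
      }
    }
    for (std::size_t i = 1; i < pfds.size(); ++i) {
      if (pfds[i].revents & (POLLIN | POLLHUP | POLLERR | POLLNVAL)) {
        ready->push_back(pfds[i].fd);
      }
    }
    // If fds are ready AND a kick arrived, this sketch reports kFdsReady.
    return ready->empty() ? WorkResult::kKicked : WorkResult::kFdsReady;
  }

 private:
  int wakeup_[2];
  std::vector<int> fds_;
};
```

A thread blocked in `Work()` can be released from any other thread by calling `Kick()`, which is the role `grpc_pollset_kick` plays for a worker blocked in `grpc_pollset_work()`.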

#### grpc_pollset_set

- **grpc\_pollset\_set\_[add|del]\_fd**
  - Signature: `grpc_pollset_set_[add|del]_fd(grpc_pollset_set* pss, grpc_fd* fd)`
  - Add/remove the fd to/from the `grpc_pollset_set`

- **grpc\_pollset\_set\_[add|del]\_pollset**
  - Signature: `grpc_pollset_set_[add|del]_pollset(grpc_pollset_set* pss, grpc_pollset* ps)`
  - What does adding a pollset to a pollset_set mean?
    - It means that calling `grpc_pollset_work()` on the pollset will also poll all the fds in the pollset_set, i.e., semantically, it is similar to adding all the fds inside the pollset_set to the pollset.
    - This guarantee is no longer true once the pollset is removed from the pollset_set

- **grpc\_pollset\_set\_[add|del]\_pollset\_set**
  - Signature: `grpc_pollset_set_[add|del]_pollset_set(grpc_pollset_set* bag, grpc_pollset_set* item)`
  - Semantically, this is similar to adding all the fds in the ‘bag’ pollset_set to the ‘item’ pollset_set
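
The containment semantics above can be modeled with plain sets. The following is a sketch under stated assumptions, not the gRPC data structure: `Pollset`, `PollsetSet` and their methods are hypothetical names, fds are plain integers, and only the "add" operations are modeled (removal is left out).

```cpp
#include <cassert>
#include <set>

// Hypothetical pollset: just the set of fds that a worker would poll.
struct Pollset {
  std::set<int> fds;
};

// Hypothetical pollset_set that propagates fds to everything it contains.
class PollsetSet {
 public:
  // Adding an fd makes it visible to every contained pollset and to every
  // contained pollset_set.
  void AddFd(int fd) {
    if (!fds_.insert(fd).second) return;  // Already known; nothing to do.
    for (Pollset* ps : pollsets_) ps->fds.insert(fd);
    for (PollsetSet* item : items_) item->AddFd(fd);
  }

  // Adding a pollset is similar to adding all our fds to that pollset.
  void AddPollset(Pollset* ps) {
    assert(ps != nullptr);
    pollsets_.insert(ps);
    ps->fds.insert(fds_.begin(), fds_.end());
  }

  // `this` plays the role of 'bag': all fds in the bag are added to `item`.
  void AddPollsetSet(PollsetSet* item) {
    assert(item != nullptr);
    items_.insert(item);
    for (int fd : fds_) item->AddFd(fd);
  }

 private:
  std::set<int> fds_;
  std::set<Pollset*> pollsets_;
  std::set<PollsetSet*> items_;
};
```

Note the direction of the last operation in this model: fds flow from the bag into the item, so an fd added later to the item alone does not appear in the bag's pollsets.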


#### Recap:

__Relation between grpc_pollset_worker, grpc_pollset and grpc_fd:__

![image](../images/grpc-ps-pss-fd.png)

__grpc_pollset_set__

![image](../images/grpc-pss.png)


## Polling Engine Implementations

### epoll1

![image](../images/grpc-epoll1.png)

Code at `src/core/lib/iomgr/ev_epoll1_linux.cc`

- The logic to choose a designated poller is quite complicated. Pollsets are internally sharded into what are called `pollset_neighborhood`s (a structure internal to the `epoll1` polling engine implementation). The `grpc_pollset_worker`s that call `grpc_pollset_work` on a given pollset are all queued in a linked list against the `grpc_pollset`. The head of the linked list is called the "root worker"

- There are as many neighborhoods as the number of cores. A pollset is put in a neighborhood based on the CPU core of the root worker thread. When picking the next designated poller, we always try to find another worker on the current pollset. If there are no more workers in the current pollset, the list of `pollset_neighborhood`s is scanned to pick the next pollset and worker that could be the new designated poller.
  - NOTE: There is room to tune this implementation. All we really need is a good way to maintain a list of `grpc_pollset_worker`s with a way to group them per-pollset (needed to implement `grpc_pollset_kick` semantics) and a way to randomly select a new designated poller

- See the [`begin_worker()`](https://github.com/grpc/grpc/blob/v1.15.1/src/core/lib/iomgr/ev_epoll1_linux.cc#L729) function to see how a designated poller is chosen. Similarly, the [`end_worker()`](https://github.com/grpc/grpc/blob/v1.15.1/src/core/lib/iomgr/ev_epoll1_linux.cc#L916) function is called by the worker that has just come out of `epoll_wait()` and has to choose a new designated poller
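
The selection order described above (same pollset first, then the neighborhoods) can be sketched as follows. This is a deliberately simplified illustration and not the algorithm in `begin_worker()` / `end_worker()`: the `Worker`, `Pollset` and `Neighborhood` structs, the modulo mapping from CPU to neighborhood, and the scan order are all assumptions of this sketch, and locking is ignored entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <vector>

struct Worker {
  int id;
};

struct Pollset {
  std::list<Worker*> workers;  // front() plays the role of the "root worker".
};

struct Neighborhood {
  std::vector<Pollset*> pollsets;
};

// Assumed mapping: one neighborhood per core, chosen by the root worker's CPU.
std::size_t NeighborhoodIndex(unsigned cpu, std::size_t neighborhood_count) {
  assert(neighborhood_count > 0);
  return cpu % neighborhood_count;
}

// Returns the worker that should take over as designated poller once
// `leaving` stops polling on `current`, or nullptr if no worker is waiting.
Worker* PickNextPoller(Pollset* current, Worker* leaving,
                       const std::vector<Neighborhood>& neighborhoods,
                       std::size_t start) {
  assert(current != nullptr);
  // 1. Prefer another worker queued on the same pollset.
  for (Worker* w : current->workers) {
    if (w != leaving) return w;
  }
  // 2. Otherwise scan the neighborhoods, beginning at index `start`.
  for (std::size_t i = 0; i < neighborhoods.size(); ++i) {
    const Neighborhood& n = neighborhoods[(start + i) % neighborhoods.size()];
    for (Pollset* ps : n.pollsets) {
      for (Worker* w : ps->workers) {
        if (w != leaving) return w;
      }
    }
  }
  return nullptr;
}
```

Grouping workers per pollset is what keeps step 1 cheap and also what makes per-pollset kicks possible, which matches the tuning note above.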


### Other polling engine implementations (poll and Windows polling engine)
- **poll** polling engine: gRPC's `poll` polling engine is quite complicated. It uses the `poll()` function to do the polling (and hence it is for platforms like macOS where epoll is not available)
  - The implementation is further complicated by the fact that `poll()` is level-triggered (just keep this in mind in case you wonder why the code at `src/core/lib/iomgr/ev_poll_posix.cc` is written in a certain/seemingly complicated way :))

- **Polling engine on Windows**: The Windows polling engine looks nothing like the other polling engines
  - Unlike the gRPC polling engines for Unix systems (epoll1 and poll), the Windows endpoint implementation and polling engine implementation are very closely tied together
  - The Windows endpoint read/write API implementations use the Windows IO API, which requires specifying an [I/O completion port](https://docs.microsoft.com/en-us/windows/desktop/fileio/i-o-completion-ports)
  - In the Windows polling engine’s `grpc_pollset_work()` implementation, ONE of the threads is chosen to wait on the I/O completion port while other threads wait on a condition variable (much like the turnstile polling in epoll1)
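
The level-triggered behavior of `poll()` mentioned in the **poll** bullet above is easy to observe in isolation. The sketch below uses two hypothetical helpers (`ReadableNow` and `CountReadableReports`, neither of which is gRPC code) to show that `poll()` keeps reporting an fd as readable until the data is actually read. One plausible consequence, stated here as an assumption, is that an engine with one-shot closures has to track which fds are currently armed so that it does not keep waking up for an event it has already delivered.

```cpp
#include <poll.h>
#include <unistd.h>

#include <cassert>

// Returns true if poll() reports `fd` as readable right now (zero timeout).
bool ReadableNow(int fd) {
  pollfd p = {fd, POLLIN, 0};
  return poll(&p, 1, 0) == 1 && (p.revents & POLLIN) != 0;
}

// Calls poll() `calls` times WITHOUT reading from `fd` and counts how many
// of those calls report the fd as readable.
int CountReadableReports(int fd, int calls) {
  assert(calls >= 0);
  int reports = 0;
  for (int i = 0; i < calls; ++i) {
    if (ReadableNow(fd)) ++reports;
  }
  return reports;
}
```

With one unread byte sitting in a pipe, every call reports the read end as readable; after the byte is consumed with `read()`, the reports stop.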
129