xref: /aosp_15_r20/external/perfmark/doc/fix-stop-task.md (revision 27e8546d0ef5f99cf83d5252272c7dd38d18d29a)
1# Unbalancing PerfMark Calls
2
3Tl;Dr: `PerfMark.stopTask()` will no longer use task name or tags.
4
5# Background
6
7PerfMark was designed as a replacement for an existing tracing library called
8JEndoscope.  JEndoscope used annotations to indicate which methods to trace.
9When the classes were loaded, a Java agent would rewrite the methods to include
10the tracing calls if JEndoscope was enabled.   One of the benefits of this API
11is that it isn't possible to forget to add the closing trace call, which
12indicates a span is complete.  Thus, the danger of mismatched start and stop
13trace calls was not a problem.
14
15PerfMark was designed to not rely on an agent for tracing, and thus could not
16use annotations to indicate which calls to trace.   To avoid the risk for API
17users, PerfMark offered matching stop trace calls that included the relevant
18task name and tag.  An example usage:
19
20```java
21PerfMark.startTask("makeRPC");
22invokeSomeRPC();
23PerfMark.stopTask("makeRPC");
24```
25
26When PerfMark returned the traces collected, it was expected that the
27start task name matched the stop task name.  It could verify these to ensure
28the trace calls were properly balanced.  It was expected that a warning could
29be emitted to aid the user in fixing the mismatch.
30
31There is an additional benefit to this API that was not needed in the previous
32JEndoscope model.   PerfMark allows tracing to be dynamically enabled and
33disable at runtime.   This means that tracing may be enabled in the middle of a
34start-stop pair.   This would mean that only the stop would be recorded,
35without knowing the starting task name.   Even if the calls were properly
36balanced, if only the stop edge is present, there isn't a way to reconstruct the
37name of the task.   Having the task name in both the start and the stop solves
38this problem.   (The same problem exists on the opposite side too, if the
39traces are read before the task completes.)
40
41Finally, having the task name in both the start and stop has a symmetry to it
42that made it look correct.  It was clear which task was being stopped, even if
43the start trace was many lines above.
44
45# Problems with stopTask().
46
47At the time of the design, the aforementioned choices made sense.   However,
48several problems arose from this pattern that can't easily be worked around.
49
50## Stuttering Arguments Makes Code Verbose
51
52The PerfMark API is very fast, but it accomplishes this by trusting the
53programmer to be careful.   Because of this, a safe invocation of the trace
54involves Wrapping the code in a try-finally block:
55
56```java
57PerfMark.startTask("makeRPC");
58try {
59  invokeSomeRPC();
60} finally {
61  PerfMark.stopTask("makeRPC");
62}
63```
64
65This costs an indent block, as well as 3 extra lines per trace call.   This
66puts us into verbosity debt.  Having multiple such calls makes the code pyramid
67off the page, decreasing the readability of the code.
68
69In order to repay this debt, dropping the redundant task name (and tag) makes
70the code less unpleasant to look at.
71
72While the duplicated names do have technical reasons (mentioned above), users
73feedback indicated the verbosity was more of a problem.  Given the relatively
74rare problems with mismatched tags and split traces, addressing verbosity is
75more important.
76
77## try-with-resources Imposes an Allocation cost
78
79One of the ways Java helps programmers clean up resources is the
80try-with-resources idiom.  PerfMark explored using this as an alternative to
81bare start-stop calls with the following usage:
82
83```java
84try (PerfMarkTask task = PerfMark.task("makeRPC")) {
85  invokeSomeRPC();
86}
87```
88
89In an ideal world, `PerfMarkTask` would be a no-op object when PerfMark is
90disabled, and be a singleton object when enabled.   This would in turn call the
91appropriate start-stop methods.  However, because the preferred start-stop calls
92*both* require the task name, a new object must be allocated to capture the name
93(and tag).   This forces the user to choose between a runtime cost and safe
94programming practice.
95
96## Lazy Task Names complicate API.
97
98Another way PerfMark explored making trace calls cheaper was to use a lambda
99base task naming call.  Sometimes task names are not compile time constant, and
100in fact may have a significant runtime cost.  Eagerly calculating these imposes
101a runtime cost, even when PerfMark is disabled.   The following API shows the
102proposed usage:
103
104```java
105PerfMark.startTask(rpc, RPC::nameAndId);
106// Same as PerfMark.startTask(rpc.nameAndId());
107```
108
109The use of a method handle avoids the unwanted calculation of the `nameAndId`
110string, which the JVM may not be able to eliminate otherwise.
111
112The problem becomes more awkward when encouraging users to use matching names
113for start and stop:
114
115```java
116PerfMark.startTask(rpc, RPC::nameAndId);
117invokeSomeRPC();
118PerfMark.stopTask(rpc, RPC::nameAndId);
119```
120
121Will the expensive function be calculated twice?  What happens if the function
122is not idempotent?  The user has used the API to the best of their ability, but
123the skew problems are still present.  Trying to solve this is difficult without
124imposing other costs, and being hard to reason about.
125
126### Permutations in API size.
127
128PerfMark has several overloads of the start - stop calls:
129
130* `startTask(String name);`
131* `startTask(String name, Tag tag);`
132* `startTask(String name, String subName);`
133
134
135While exploring lambdas to make lazy variants of these methods, the API grew
136substantially.  As the number of start methods grows, so too must the stop
137methods.   This makes usage more error prone since there may be 10s of possible
138overloads to invoke.  This is an unnecessary burden on the user, since the
139library already must deal with the possibility of mismatched names.
140
141
142## Agent Byte Code is more difficult
143
144While JEndoscope required an agent to work, PerfMark can be optionally enhanced
145by using one.  It can add line and file information about the calls
146when instrumenting classes.  However, it's difficult to rewrite this byte code
147when it depends on the possibly changing task name.  JEndoscope worked around
148this problem by forcing all tracing calls to be compile time constants, and
149forcing rewritten calls to use the `ldc` JVM instruction.
150
151# Proposed Change: one stopTask()
152
153To avoid the problems above, a singular `stopTask()` method overload is
154proposed.  The existing stopTask overloads will no longer be recommended, and
155behave as the single `stopTask()` method with extra info.  The tracing library
156will be updated to not expect this info.
157
158This solves each of the problems above.   The usage of PerfMark becomes less
159verbose.  There is no longer a runtime cost to using the
160try-with-resources idiom.  There is no possibility of skew between start and
161stop names and tags.  The API will grow slightly to accommodate this new method,
162but will prevent the doubling of methods.
163
164The benefits brought by having the name twice are seldom seen, and are paid
165at runtime.  The names passed to stopTask() are stored, but are never used
166when visualizing trace diagrams.
167