# Head-Tracking Library For Immersive Audio

This library handles the processing of head-tracking information, necessary for
Immersive Audio functionality. It takes raw sensor readings and turns them into
the final pose fed into a virtualizer.

## Basic Usage

The main entry point into this library is the `HeadTrackingProcessor` class.
This class is provided with the following inputs:

- Head pose, relative to some arbitrary world frame.
- Screen pose, relative to some arbitrary world frame.
- Display orientation, defined as the angle between the "physical" screen and
  the "logical" screen.
- Transform between the screen and the sound stage.
- Desired operational mode:
    - Static: only the sound stage pose is taken into account. This will result
      in an experience where the sound stage moves with the listener's head.
    - World-relative: both the head pose and stage pose are taken into account.
      This will result in an experience where the sound stage is perceived to be
      located at a fixed place in the world.
    - Screen-relative: the head pose, screen pose and stage pose are all taken
      into account. This will result in an experience where the sound stage is
      perceived to be located at a fixed place relative to the screen.

Once inputs are provided, the `calculate()` method will make the following
output available:

- Stage pose, relative to the head. This aggregates all the inputs mentioned
  above and is ready to be fed into a virtualizer.
- Actual operational mode. May deviate from the desired one in cases where the
  desired mode cannot be calculated (for example, as a result of dropped
  messages from one of the sensors).

A `recenter()` operation is also available, which tells the system to treat the
current screen and head poses as the "center" pose, or frame of reference.

## Pose-Related Conventions

### Naming and Composition

When referring to poses in code, it is good practice to follow a naming
convention that makes the reference and target frames explicit:

Bad:

```
Pose3f headPose;
```

Good:

```
Pose3f worldToHead;  // "world" is the reference frame,
                     // "head" is the target frame.
```

This convention makes it easy to check that poses are composed correctly:
adjacent frames in a product must be identical:

```
Pose3f aToD = aToB * bToC * cToD;
```

Similarly, inverting a transform simply flips the reference and target:

```
Pose3f aToB = bToA.inverse();
```

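To make the convention concrete, here is a minimal sketch that uses a
hypothetical planar `Pose2f` type (a yaw angle plus a 2D translation) in place
of the library's `Pose3f`. The type, its members and the `nearlyEqual()` helper
are assumptions made for illustration; only the `aToB * bToC` and `inverse()`
patterns come from the text above.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical planar stand-in for Pose3f: a rotation by `yaw` (radians)
// followed by a translation (x, y), both expressed in the reference frame.
struct Pose2f {
    float x;
    float y;
    float yaw;

    // Composition: aToB * bToC yields aToC.
    Pose2f operator*(const Pose2f& other) const {
        const float c = std::cos(yaw);
        const float s = std::sin(yaw);
        return {x + c * other.x - s * other.y,
                y + s * other.x + c * other.y,
                yaw + other.yaw};
    }

    // Inversion flips reference and target: aToB.inverse() is bToA.
    Pose2f inverse() const {
        const float c = std::cos(yaw);
        const float s = std::sin(yaw);
        return {-(c * x + s * y), -(-s * x + c * y), -yaw};
    }
};

// Compares two poses component-wise, within a small tolerance.
bool nearlyEqual(const Pose2f& a, const Pose2f& b, float eps = 1e-4f) {
    return std::fabs(a.x - b.x) < eps && std::fabs(a.y - b.y) < eps &&
           std::fabs(a.yaw - b.yaw) < eps;
}
```

With these definitions, `worldToHead.inverse() * worldToStage` reads as
`headToWorld * worldToStage`: the adjacent "world" frames match, so the result
is `headToStage`.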
### Twist

“Twist” is to pose what velocity is to distance: it is the time-derivative of a
pose, representing the change in pose over a short period of time. Its naming
convention always states one frame, e.g. `Twist3f headTwist;`.

Here, `headTwist` represents the head-at-time-T to head-at-time-T+dt transform.
Twists are not composable in the same way as poses.

### Frames of Interest

The frames of interest in this library are defined as follows:

#### Head

This is the listener’s head. The origin is at the center point between the
ear-drums, the X-axis goes from left ear to right ear, the Y-axis goes from the
back of the head towards the face and the Z-axis goes from the bottom of the
head to the top.

#### Screen

This is the primary screen that the user will be looking at, which is relevant
for some Immersive Audio use-cases, such as watching a movie. This frame follows
a different convention than the one the Sensor framework uses. The origin is at
the center of the screen. The X-axis goes from left to right, the Z-axis goes
from the screen bottom to the screen top and the Y-axis goes “into” the screen
(away from the viewer). The up/down/left/right of the screen are defined as the
logical directions used for display. So when flipping the display orientation
between “landscape” and “portrait”, the frame of reference will change with
respect to the physical screen.

#### Stage

This is the frame of reference used by the virtualizer for positioning sound
objects. It is not associated with any physical frame. In a typical
multi-channel scenario, the listener is at the origin, the X-axis goes from left
to right, the Y-axis from back to front and the Z-axis from down to up. For
example, a front-right speaker is located at positive X, positive Y and Z=0,
while a height speaker has a positive Z.

#### World

It is sometimes convenient to use an intermediate frame when dealing with
head-to-screen transforms. The “world” frame is a frame of reference in the
physical world, relative to which we can measure the head pose and screen pose.
It is arbitrary, but expected to be stable (fixed).

## Processing Description

![Pose processing graph](PoseProcessingGraph.png)

The diagram above illustrates the processing that takes place from the inputs to
the outputs.

### Predictor

The Predictor block takes a pose and a twist (pose derivative) and extrapolates
them to obtain a predicted head pose for a given latency.

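As a rough illustration of this extrapolation, the sketch below integrates a
constant twist over the latency and composes the result with the current pose.
It is a first-order, planar simplification: `Pose2f`, `Twist2f`, `integrate()`,
`compose()` and `predict()` are hypothetical names, not the library's API, and
the library's own predictor may use a different method.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical planar stand-ins for Pose3f and Twist3f.
struct Pose2f {
    float x, y, yaw;  // translation (m) and rotation (rad)
};
struct Twist2f {
    float vx, vy, yawRate;  // linear (m/s) and angular (rad/s) velocity
};

// First-order integration of a twist: the head-at-T to head-at-T+dt transform.
Pose2f integrate(const Twist2f& twist, float dt) {
    return {twist.vx * dt, twist.vy * dt, twist.yawRate * dt};
}

// Composes aToB with bToC to obtain aToC.
Pose2f compose(const Pose2f& aToB, const Pose2f& bToC) {
    const float c = std::cos(aToB.yaw);
    const float s = std::sin(aToB.yaw);
    return {aToB.x + c * bToC.x - s * bToC.y,
            aToB.y + s * bToC.x + c * bToC.y,
            aToB.yaw + bToC.yaw};
}

// Extrapolates the head pose `latency` seconds ahead, assuming the twist
// stays constant over that interval.
Pose2f predict(const Pose2f& worldToHead, const Twist2f& headTwist, float latency) {
    assert(latency >= 0.f);
    return compose(worldToHead, integrate(headTwist, latency));
}
```

Note how the naming convention carries over: `worldToHead` composed with the
head-at-T to head-at-T+latency transform yields the predicted world-to-head
pose.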
### Bias

The Bias blocks establish the reference frame for the poses by having the
ability to set the current pose as the reference for future poses (recentering).

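One way to picture a bias block is as a stored transform that is updated on
every recenter. The sketch below is an assumption about how such a block could
work, written with a hypothetical planar `Pose2f` and a hypothetical `PoseBias`
class; it is not taken from the library's sources.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical planar stand-in for Pose3f.
struct Pose2f {
    float x, y, yaw;
};

// Composes aToB with bToC to obtain aToC.
Pose2f compose(const Pose2f& aToB, const Pose2f& bToC) {
    const float c = std::cos(aToB.yaw);
    const float s = std::sin(aToB.yaw);
    return {aToB.x + c * bToC.x - s * bToC.y,
            aToB.y + s * bToC.x + c * bToC.y,
            aToB.yaw + bToC.yaw};
}

// Returns bToA given aToB.
Pose2f inverse(const Pose2f& aToB) {
    const float c = std::cos(aToB.yaw);
    const float s = std::sin(aToB.yaw);
    return {-(c * aToB.x + s * aToB.y), -(-s * aToB.x + c * aToB.y), -aToB.yaw};
}

// Hypothetical bias block: reports poses relative to the pose that was
// current at the time of the last recenter.
class PoseBias {
  public:
    void setInput(const Pose2f& worldToInput) { mWorldToInput = worldToInput; }

    // Makes the current input the new reference ("center") pose.
    void recenter() { mReferenceToWorld = inverse(mWorldToInput); }

    // referenceToInput = referenceToWorld * worldToInput.
    Pose2f getOutput() const { return compose(mReferenceToWorld, mWorldToInput); }

  private:
    Pose2f mWorldToInput{0.f, 0.f, 0.f};
    Pose2f mReferenceToWorld{0.f, 0.f, 0.f};
};
```

Immediately after `recenter()`, the output is the identity pose; later inputs
are reported relative to the pose captured at that moment.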
### Orientation Compensation

The Orientation Compensation block applies the display orientation to the screen
pose to obtain the pose of the “logical screen” frame, in which the Z-axis
(bottom to top, as defined for the Screen frame) points in the direction of the
logical screen “up” rather than the physical one.

### Screen-Relative Pose

The Screen-Relative Pose block is provided with a head pose and a screen pose
and estimates the pose of the head relative to the screen. Optionally, this
module may indicate that the user is likely not in front of the screen via the
“valid” output.

### Stillness Detector

The Stillness Detector blocks detect when their incoming pose stream has been
stable for a given amount of time (allowing for a configurable amount of error).
When the head is considered still, a recenter operation is triggered
(“auto-recentering”). When the screen is considered not still, the mode
selector uses this information to force static mode.

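The following sketch shows one possible way to detect stillness over a sliding
time window. It works on a stream of yaw angles only and ignores angle
wrap-around; the class name, its methods and the windowing strategy are
assumptions made for illustration, not the library's implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <deque>

// Hypothetical stillness detector over a stream of yaw angles (radians).
class StillnessDetector {
  public:
    StillnessDetector(int64_t windowNs, float maxErrorRad)
        : mWindowNs(windowNs), mMaxError(maxErrorRad) {
        assert(windowNs > 0);
    }

    void setInput(int64_t timestampNs, float yaw) {
        mSamples.push_back({timestampNs, yaw});
    }

    // Returns true if the stream covers the whole window ending at
    // `timestampNs` and every retained sample is within the allowed error of
    // the latest one.
    bool calculate(int64_t timestampNs) {
        const int64_t windowStart = timestampNs - mWindowNs;
        // Drop old samples, but keep one at or before the window start so
        // that full coverage of the window can be verified.
        while (mSamples.size() > 1 && mSamples[1].timestampNs <= windowStart) {
            mSamples.pop_front();
        }
        if (mSamples.empty() || mSamples.front().timestampNs > windowStart) {
            return false;  // Not enough history yet.
        }
        const float latest = mSamples.back().yaw;
        for (const Sample& sample : mSamples) {
            if (std::fabs(sample.yaw - latest) > mMaxError) {
                return false;
            }
        }
        return true;
    }

  private:
    struct Sample {
        int64_t timestampNs;
        float yaw;
    };

    int64_t mWindowNs;
    float mMaxError;
    std::deque<Sample> mSamples;
};
```

The window length and the error threshold correspond to the "given amount of
time" and the "configurable amount of error" mentioned above.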
### Mode Selector

The Mode Selector block aggregates the various sources of pose information into
a head-to-stage pose that is going to feed the virtualizer. It is controlled by
the “desired mode” signal, which indicates the preferred mode: static,
world-relative or screen-relative.

The actual mode may diverge from the desired mode. It is determined as follows:

- If the desired mode is static, the actual mode is static.
- If the desired mode is world-relative:
    - If head and screen poses are fresh and the screen is stable (stillness
      detector output is true), the actual mode is world-relative.
    - Otherwise the actual mode is static.
- If the desired mode is screen-relative:
    - If head and screen poses are fresh and the “valid” signal is asserted, the
      actual mode is screen-relative.
    - Otherwise, apply the same rules as the desired mode being world-relative.

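The rules above can be sketched as a single pure function. The enum, the
`ModeInputs` struct and the `selectActualMode()` function are hypothetical
names chosen for this example; the decision logic mirrors the list above and
nothing more.

```cpp
#include <cassert>

// Hypothetical mode enum for this sketch.
enum class HeadTrackingMode { STATIC, WORLD_RELATIVE, SCREEN_RELATIVE };

// Hypothetical bundle of the signals the rules depend on.
struct ModeInputs {
    bool headPoseFresh;        // a recent head pose is available
    bool screenPoseFresh;      // a recent screen pose is available
    bool screenStill;          // stillness detector output for the screen
    bool screenRelativeValid;  // "valid" output of the Screen-Relative Pose block
};

// Determines the actual mode from the desired mode and the input signals.
HeadTrackingMode selectActualMode(HeadTrackingMode desired, const ModeInputs& in) {
    const bool posesFresh = in.headPoseFresh && in.screenPoseFresh;

    if (desired == HeadTrackingMode::SCREEN_RELATIVE && posesFresh &&
        in.screenRelativeValid) {
        return HeadTrackingMode::SCREEN_RELATIVE;
    }
    // Reached for world-relative, and as the fallback for screen-relative.
    if (desired != HeadTrackingMode::STATIC && posesFresh && in.screenStill) {
        return HeadTrackingMode::WORLD_RELATIVE;
    }
    return HeadTrackingMode::STATIC;
}
```

Writing the fallback chain this way makes the ordering explicit:
screen-relative degrades to world-relative, which in turn degrades to static.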
### Rate Limiter

A Rate Limiter block is applied to the final output to smooth out any abrupt
transitions caused by any of the following events:

- Mode switch.
- Display orientation switch.
- Recenter operation.
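
To illustrate the idea, the sketch below limits how fast a single yaw angle may
change between successive outputs. It is a scalar simplification under assumed
names (`RateLimiter`, `calculate()`, a limit in radians per second); the
library operates on full poses and its limiter may behave differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical rate limiter for a single yaw angle (radians).
class RateLimiter {
  public:
    explicit RateLimiter(float maxRadPerSec) : mMaxRate(maxRadPerSec) {}

    // Moves the output toward `targetYaw` by at most maxRate * dtSec and
    // returns the new output. The first call adopts the target directly.
    float calculate(float targetYaw, float dtSec) {
        assert(dtSec >= 0.f);
        if (!mInitialized) {
            mOutput = targetYaw;
            mInitialized = true;
            return mOutput;
        }
        const float maxStep = mMaxRate * dtSec;
        const float step = std::max(-maxStep, std::min(maxStep, targetYaw - mOutput));
        mOutput += step;
        return mOutput;
    }

  private:
    float mMaxRate;
    float mOutput = 0.f;
    bool mInitialized = false;
};
```

With a limit of 1 rad/s and a 0.1 s step, an abrupt jump of the target (for
example after a recenter) is spread over several calls, each moving the output
by at most 0.1 rad.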