TP2: Dynamically Generated Cacheable xDS Resources
----
* Author(s): markdroth, htuch
* Approver: htuch
* Implemented in: <xDS client, ...>
* Last updated: 2022-02-09

## Abstract

This xRFC proposes a new mechanism to allow xDS servers to
dynamically generate the contents of xDS resources for individual
clients while at the same time preserving cacheability.  Unlike the
context parameter mechanism that is part of the new xDS naming scheme (see
[xRFC TP1](TP1-xds-transport-next.md)), the mechanism described in
this proposal is visible only to the transport protocol layer, not to the
data model layer.  This means that if a resource has a parameter that
affects its contents, that parameter is not part of the resource's name,
which means that any other resources that refer to the resource do not
need to encode the parameter.  Therefore, use of these parameters is
not viral, thus making the mechanism much easier to use.

## Background

There are many use-cases where a control plane may need to
dynamically generate the contents of xDS resources to tailor the
resources for individual clients.  One common case is where the
server has a list of routes to configure, but individual routes in
the list may be included or excluded based on the client's dynamic
selection parameters (today, conveyed as node metadata).  Thus,
the server needs to generate a slightly different version of the
`RouteConfiguration` for clients based on the parameters they send.  (See
https://cloud.google.com/traffic-director/docs/configure-advanced-traffic-management#config-filtering-metadata
for an example.)

The new xDS naming scheme described in [xRFC TP1](TP1-xds-transport-next.md)
provides a mechanism called context parameters, which is intended to move all
parameters that affect resource contents into the resource name, thus adding
cacheability to the xDS ecosystem.  However, this approach means that these
parameters become part of the resource graph on an individual client, which
causes a number of problems:
- Dynamic context parameters are viral, spreading from a given resource
  to all earlier resources in the resource graph.  For example, if
  two variants of an EDS resource are needed, there need to be two
  different instances of the resource with different names,
  distinguished by a context parameter.  But because the contents of the
  CDS resource include the name of the corresponding EDS resource,
  that means that we also need two different versions of the CDS
  resource, also distinguished by the same context parameter.  And then
  we need two different versions of the RDS resource, since that needs
  to refer to the CDS resource.  And then two different versions of the
  LDS resource, which refers to the RDS resource.  This causes a
  combinatorial explosion in the number of resources needed, and it adds
  complexity to xDS servers, which need to construct the right variants
  of every resource and make sure that they refer to each other using
  the right names.
- In the new xDS naming scheme, context parameters are exact-match-only.
  This means that if a control plane wants to provide the same resource
  both with and without a given parameter, it needs to publish two
  versions of the resource, each with a different name, even though the
  contents are the same, which can also cause unnecessarily poor cache
  performance.  For example, in the "dynamic route selection" use-case,
  let's say that every client uses two different dynamic selection
  parameters, `env` (which can have one of the values `prod`, `canary`, or
  `test`) and `version` (which can have one of the values `v1`, `v2`, or
  `v3`).  Now let's say that there is a `RouteConfiguration` with one route
  that should be selected via the parameter `env=prod` and another route that
  should be selected via the parameter `version=v1`. This means that there
  are only four variants of the `RouteConfiguration` resource (`{env!=prod,
  version!=v1}`, `{env=prod, version!=v1}`, `{env!=prod, version=v1}`, and
  `{env=prod, version=v1}`).  However, the exact-match semantics means
  that there will have to be nine different versions of this resource,
  one for each combination of values of the two parameters.

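The arithmetic in the `env`/`version` example above can be checked with a
short sketch.  The function names are hypothetical and exist only for
illustration; the sketch assumes the route selection rules described above
(one route selected by `env=prod`, another by `version=v1`).

```c++
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Number of distinct resource names needed with exact-match context
// parameters: one per combination of (env, version) values.
std::size_t ExactMatchNames(const std::vector<std::string>& envs,
                            const std::vector<std::string>& versions) {
  return envs.size() * versions.size();
}

// Number of distinct resource contents: the contents depend only on
// whether env is "prod" and whether version is "v1".
std::size_t DistinctContents(const std::vector<std::string>& envs,
                             const std::vector<std::string>& versions) {
  std::set<std::pair<bool, bool>> variants;
  for (const std::string& env : envs) {
    for (const std::string& version : versions) {
      variants.emplace(env == "prod", version == "v1");
    }
  }
  return variants.size();
}
```

With three values for each parameter, `ExactMatchNames` yields nine names
while `DistinctContents` yields only four distinct resource contents.
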
### Related Proposals:
* [xRFC TP1: new xDS naming scheme](TP1-xds-transport-next.md)

## Proposal

This document proposes an alternative approach.  We start with the
observation that resource names are used in two places:

- The **transport protocol** layer, which needs to identify the right
  resource contents to send for a given resource name, often obtaining
  those resource contents from a cache.
- The **resource graph** used on an individual client, where there is a
  set of data model resources that refer to each other by name.  For
  example, a `RouteConfiguration` refers to individual `Cluster` resources
  by name.

The use-cases that we're aware of for dynamic resource selection have
an important property that we can take advantage of.  When multiple
variants of a given resource exist, any given client will only ever use
one of those variants at a given time.  That means that the parameters
that affect which variant of the resource is used are required by the
transport protocol, but they are not required by the client's data model.

It should be noted that caching xDS proxies, unlike "leaf" clients, will
need to track multiple variants of each resource, since a given caching
proxy may be serving clients that need different variants of a given
resource.  However, since caching xDS proxies deal with resources only
at the transport protocol layer, the resource graph layer is
essentially irrelevant in that case.

### Dynamic Parameters

With the above property in mind, this document proposes the following
data structures:
- **Dynamic parameters**, which are a set of key/value pairs sent by the
  client when subscribing to a resource.
- **Dynamic parameter constraints**, which are a set of criteria that
  can be used to determine whether a set of dynamic parameters matches
  the constraints.  These constraints are considered part of the unique
  identifier for an xDS resource (along with the resource name itself)
  on xDS servers, xDS clients, and xDS caching proxies.  This provides a
  mechanism to represent multiple variants of a given resource in a
  cacheable way.

Both of these data structures are used in the xDS transport protocol,
but they are not part of the resource name and therefore do not appear as
part of the resource graph.

When a client subscribes to a resource, it specifies a set of dynamic
parameters.  In response, the server will send a resource whose dynamic
parameter constraints match the dynamic parameters in the subscription
request.  A client that subscribes to multiple variants of a resource (such
as a caching xDS proxy) will use the dynamic parameter constraints on the
returned resource to determine which of its subscriptions the resource is
associated with.

Dynamic parameters, unlike context parameters, will not be
exact-match-only.  Dynamic parameter constraints will be able to represent
certain simple types of flexible matching, such as matching an exact
value or the existence of a key, and simple AND, OR, and NOT combinations
of constraints.  This flexible matching semantic means that there may be
ambiguities when determining which resources match which subscriptions,
which are discussed below.

#### Constraints Representation

Dynamic parameter constraints will be represented in protobuf form as follows:

```proto
message DynamicParameterConstraints {
  // A single constraint for a given key.
  message SingleConstraint {
    message Exists {}
    // The key to match against.
    string key = 1;
    // How to match.
    oneof constraint_type {
      // Matches this exact value.
      string value = 2;
      // Key is present (matches any value except for the key being absent).
      Exists exists = 3;
    }
  }

  message ConstraintList {
    repeated DynamicParameterConstraints constraints = 1;
  }

  oneof type {
    // A single constraint to evaluate.
    SingleConstraint constraint = 1;

    // A list of constraints to be ORed together.
    ConstraintList or_constraints = 2;

    // A list of constraints to be ANDed together.
    ConstraintList and_constraints = 3;

    // The inverse (NOT) of a set of constraints.
    DynamicParameterConstraints not_constraints = 4;
  }
}
```

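To illustrate how such constraints might be evaluated against a set of
dynamic parameters, here is a minimal sketch.  It uses a plain C++ struct as
a stand-in for the generated protobuf class, and all names (`Constraints`,
`Matches`, `Exact`, `Exists`, `Combine`) are hypothetical.  It assumes that a
constraint on a key the client did not send does not match, that an empty AND
list matches everything, and that an empty OR list matches nothing; the last
two edge cases are assumptions that this document does not specify.

```c++
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

using DynamicParameters = std::map<std::string, std::string>;

// Plain-struct stand-in for the DynamicParameterConstraints message
// (hypothetical; real code would use the generated protobuf class).
struct Constraints {
  enum class Type { kSingle, kOr, kAnd, kNot };
  Type type = Type::kAnd;  // Assumption: an empty AND list matches everything.

  // Used when type == kSingle.
  std::string key;
  std::string value;
  bool exists_only = false;  // true: `exists`; false: exact match on `value`.

  // Used when type is kOr, kAnd, or kNot (kNot has exactly one child).
  std::vector<Constraints> children;
};

// Returns true if `params` satisfies the constraints in `c`.
bool Matches(const Constraints& c, const DynamicParameters& params) {
  switch (c.type) {
    case Constraints::Type::kSingle: {
      auto it = params.find(c.key);
      if (it == params.end()) return false;  // Key absent: no match.
      return c.exists_only || it->second == c.value;
    }
    case Constraints::Type::kOr:
      for (const Constraints& child : c.children) {
        if (Matches(child, params)) return true;
      }
      return false;
    case Constraints::Type::kAnd:
      for (const Constraints& child : c.children) {
        if (!Matches(child, params)) return false;
      }
      return true;
    case Constraints::Type::kNot:
      assert(c.children.size() == 1);
      return !Matches(c.children[0], params);
  }
  return false;  // Not reached; keeps compilers quiet.
}

// Convenience constructors for building constraint trees.
Constraints Exact(std::string key, std::string value) {
  Constraints c;
  c.type = Constraints::Type::kSingle;
  c.key = std::move(key);
  c.value = std::move(value);
  return c;
}

Constraints Exists(std::string key) {
  Constraints c;
  c.type = Constraints::Type::kSingle;
  c.key = std::move(key);
  c.exists_only = true;
  return c;
}

Constraints Combine(Constraints::Type type, std::vector<Constraints> children) {
  Constraints c;
  c.type = type;
  c.children = std::move(children);
  return c;
}
```

Note that parameters sent by the client but not mentioned in the constraints
never cause a mismatch in this sketch, because evaluation only looks up the
keys that the constraints name.
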
#### Background: xDS Client and Server Architecture

Before discussing where dynamic parameter matching is performed, it is
useful to provide some additional background on xDS client and server
architecture, independent of this design.

The xDS transport protocol is fundamentally a mechanism that matches up
subscriptions provided by a client with resources provided by a server.
The client controls what it is subscribing to at any given time,
and the server must send the resources from its database that match the
currently active subscriptions.

An xDS server may be thought of as containing a database of resources,
in which each resource has an associated list of clients that are currently
subscribed to that resource.  Whenever a client subscribes to a resource,
the server will send the current version of that resource to the client,
and it will add the client to the list of clients currently subscribed to
that resource.  Whenever the server receives a new version of that resource
in its database, it will send the update to all clients that are currently
subscribed to that resource.  Whenever a client unsubscribes from a
resource, it is removed from the list of clients subscribed to that
resource, so that the server knows not to send it subsequent updates for
that resource.

This same paradigm of matching up subscriptions with resources actually
applies to the xDS client as well.  Because the xDS transport protocol
does not require a server to resend a resource unless its contents have
changed, clients need to cache the most recently seen value locally in
case they need it again.  In general, the best way to structure an xDS
transport protocol client is as an API where the caller can start or
stop subscribing to a given resource at any time, and the xDS client will
handle the wire-level communication and cache the resources returned by
the server.  The cache in the xDS client functions very similarly to the
database in an xDS server: each cache entry contains the current value
of the resource received from the xDS server and a list of subscribers to
that resource.  When the xDS client sees the first subscription start for
a given resource, it will create the cache entry for that resource, add
the subscriber to the list of subscribers for that resource, and request
that resource from the xDS server.  When it receives the resource from
the server, it will store the resource in the cache entry and deliver
it to all subscribers.  When the xDS client sees a second subscription
start for the same resource, it will add the new subscriber to the list
of subscribers for that resource and immediately deliver the cached value
of the resource to the new subscriber.  Whenever the server sends an
updated version of the resource, the xDS client will deliver the update
to all subscribers.  When all subscriptions are stopped, the xDS client
will unsubscribe from the resource on the wire, so that the xDS server
knows to stop sending updates for that resource to the client.

In effect, the logic in an xDS client is essentially the same as that in an
xDS server, with only two differences.  First, subscriptions come from local
API callers instead of downstream RPC clients.  And second, the database does
not contain the authoritative source of the resource contents but rather cached
values obtained from the server, and the database entries are removed when
the last subscription for a given resource is stopped.

The logic in a caching xDS proxy is also essentially the same as that in an xDS
server, with only one difference.  Just like an xDS client, the database
does not contain the authoritative source of the resource contents but
rather cached values obtained from the server.  However, like an xDS
server, subscriptions do come from downstream RPC clients rather than local
API callers.

The following table summarizes this structure:

<table>

  <tr>
    <th>xDS Node Type</th>
    <th>Source of Subscriptions</th>
    <th>Source of Resource Contents</th>
  </tr>

  <tr>
    <td>xDS Server</td>
    <td>downstream xDS clients</td>
    <td>authoritative data</td>
  </tr>

  <tr>
    <td>xDS Client</td>
    <td>local API callers</td>
    <td>cached data from upstream xDS server</td>
  </tr>

  <tr>
    <td>xDS Caching Proxy</td>
    <td>downstream xDS clients</td>
    <td>cached data from upstream xDS server</td>
  </tr>

</table>

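The client-side behavior described in this section (create a cache entry on
the first subscription, serve later subscribers from the cache, fan out
updates, and unsubscribe on the wire when the last subscriber leaves) can be
sketched roughly as follows.  This is only an illustration under simplifying
assumptions: it handles a single resource type, stores resource contents as
a string, and uses two callbacks in place of real wire-level requests.  The
names `ClientCache`, `OnResource`, and `OnServerUpdate` are hypothetical.

```c++
#include <cassert>
#include <functional>
#include <map>
#include <optional>
#include <set>
#include <string>
#include <utility>

// Hypothetical interface for a local API caller watching a resource.
class Subscriber {
 public:
  virtual ~Subscriber() = default;
  virtual void OnResource(const std::string& name,
                          const std::string& contents) = 0;
};

// Sketch of an xDS client cache for a single resource type.  The two
// callbacks stand in for the wire-level subscribe/unsubscribe requests.
class ClientCache {
 public:
  using WireCallback = std::function<void(const std::string& name)>;

  ClientCache(WireCallback subscribe_on_wire, WireCallback unsubscribe_on_wire)
      : subscribe_on_wire_(std::move(subscribe_on_wire)),
        unsubscribe_on_wire_(std::move(unsubscribe_on_wire)) {}

  void Subscribe(const std::string& name, Subscriber* subscriber) {
    assert(subscriber != nullptr);
    auto [it, created] = entries_.try_emplace(name);
    it->second.subscribers.insert(subscriber);
    if (created) {
      // First subscription: request the resource from the server.
      subscribe_on_wire_(name);
    } else if (it->second.contents.has_value()) {
      // Later subscription: deliver the cached value immediately.
      subscriber->OnResource(name, *it->second.contents);
    }
  }

  void Unsubscribe(const std::string& name, Subscriber* subscriber) {
    auto it = entries_.find(name);
    if (it == entries_.end()) return;
    it->second.subscribers.erase(subscriber);
    if (it->second.subscribers.empty()) {
      // Last subscription stopped: drop the entry and tell the server.
      entries_.erase(it);
      unsubscribe_on_wire_(name);
    }
  }

  // Called when the server sends a new version of the resource.
  void OnServerUpdate(const std::string& name, std::string contents) {
    auto it = entries_.find(name);
    if (it == entries_.end()) return;  // No longer subscribed.
    it->second.contents = std::move(contents);
    for (Subscriber* subscriber : it->second.subscribers) {
      subscriber->OnResource(name, *it->second.contents);
    }
  }

 private:
  struct Entry {
    std::optional<std::string> contents;
    std::set<Subscriber*> subscribers;
  };

  WireCallback subscribe_on_wire_;
  WireCallback unsubscribe_on_wire_;
  std::map<std::string, Entry> entries_;
};
```

An xDS server or caching proxy would follow the same shape, with downstream
RPC clients taking the place of local subscribers.
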
#### Where Dynamic Parameter Matching is Performed

Because of the architecture described above, evaluation of matching between
a set of dynamic parameters and a set of constraints may need to be
performed by both xDS servers and xDS clients.

xDS servers that support multiple variants of a resource perform this
matching when deciding which variant of a given resource to return for a
given subscription request.  xDS servers that support multiple variants of
a resource MUST send the dynamic parameter constraints associated with a
resource variant to the client along with that variant.  Any server
implementation that fails to do so is in violation of this specification.

xDS caching proxies that support multiple variants of a resource also
perform this matching when deciding which variant of a given resource to
return for a given subscription request.  Caching proxies MUST store the
dynamic parameter constraints obtained from the upstream server along with
each resource variant, which they will use when deciding which variant of a
given resource to return for a given subscription request from a downstream
xDS client.  Caching proxies MUST send those dynamic parameter constraints to
the downstream client when sending that variant of the resource.

Note this design assumes that a given leaf client will use a fixed set of
dynamic parameters, typically configured in a local bootstrap file, for all
subscriptions over its lifetime.  Given that, it is not strictly necessary
for a leaf client to perform this matching, since it should only ever
receive a single variant of a given resource, which should always match the
dynamic parameters it subscribed with.  However, clients MAY perform this
matching, which may be useful in cases where the same cache implementation
is used on both a leaf client and a caching proxy.

It is important to note that the dynamic parameter matching behavior becomes
an inherent part of the xDS transport protocol.  xDS servers that interact
only with leaf clients may be tempted not to send dynamic parameter
constraints to the client along with the chosen resource variant, and
leaf clients may accept that.  However, as soon as that server wants to
start interacting with a caching proxy or a client that does verify the
constraints, it will run into problems.  xDS server implementors are
strongly encouraged not to omit the dynamic parameter constraints in their
responses.

#### Example: Basic Dynamic Parameters Usage

Let's say that the clients are currently categorized by the parameter
`env`, whose value is either `prod` or `test`.  So any given client will
send one of the following sets of dynamic parameters:
- `{env=prod}`
- `{env=test}`

Now let's say that the server has two variants of a given resource, and
the variants have the following dynamic parameter constraints:

```textproto
// For {env=prod}
{constraint:{key:"env" value:"prod"}}

// For {env=test}
{constraint:{key:"env" value:"test"}}
```

When a client subscribes to this resource with dynamic parameters
`{env=prod}`, the server will return the first variant; when a client
subscribes to this resource with dynamic parameters `{env=test}`, the
server will return the second variant.  A client that performs matching
(which, as noted above, is optional for leaf clients) will then verify
that the dynamic parameters it sent match the constraints of the
returned resource.

#### Unconstrained Parameters

Note that clients may send dynamic parameters that are not specified in
the constraints on the resulting resource.  If a set of constraints does
not specify any constraint for a given parameter sent by the client, that
parameter does not prevent the constraints from matching.  This allows
clients to add new parameters before a server begins using them.
(In general, we expect clients to send a lot of keys that may not
actually be used by the server, since deployments often divide their
clients into categories before they have a need to differentiate the
configs for those categories.)

Continuing the example above, if the server wanted to send the same
contents for a given resource to both `{env=prod}` and `{env=test}` clients,
it would have only a single variant of that resource, and that variant would
not have any constraints.  The server would therefore send that variant to
all clients, and the clients would consider it a match for the dynamic
parameters that they subscribed with.

#### Example: Transition Scenarios

Consider what happens in transition scenarios, where a deployment initially
groups its clients on a single key but then wants to add a second key.
The second key needs to be added both in the constraints on the server
side and in the clients' configurations, but those two changes cannot
occur atomically.

Let's start with the above example where the clients are already divided into
`env=prod` and `env=test`.  Let's say that now the deployment wants to add
an additional key called `version`, whose value will be either `v1` or `v2`,
so that it can further subdivide its clients' configs.

The first step is to add the new key on the clients, so that any given client
will send one of the following sets of dynamic parameters:
- `{env=prod, version=v1}`
- `{env=prod, version=v2}`
- `{env=test, version=v1}`
- `{env=test, version=v2}`

At this point, the server still does not have a variant of any resource
that has constraints for the `version` key; it has only variants that
differentiate between `env=prod` and `env=test`.  But the addition of
the new key on the clients will not affect which resource variant is
sent to each client, because it does not affect the matching.  Clients
sending `{env=prod, version=v1}` or `{env=prod, version=v2}` will both get
the resource variant for `env=prod`, and clients sending
`{env=test, version=v1}` or `{env=test, version=v2}` will both get the
resource variant for `env=test`.

Once the clients have all been updated to send the new key, then the
server can be updated to have different resource variants based on the
`version` key.  For example, it may replace the single resource variant
for `env=prod` with the following two variants:

```textproto
// For {env=prod, version=v1}
{and_constraints:[
  {constraint:{key:"env" value:"prod"}},
  {constraint:{key:"version" value:"v1"}}
]}

// For {env=prod, version=v2}
{and_constraints:[
  {constraint:{key:"env" value:"prod"}},
  {constraint:{key:"version" value:"v2"}}
]}
```

Once that change happens on the server, the clients will start getting
the correct variant of the resource based on their `version` key.

Note that in order to avoid causing matching ambiguity, the server must
handle this kind of change by sending the deletion of the original resource
variant and the creation of the replacement resource variants in a
single xDS response.  This will allow the client to atomically apply the
change to its database.  For any given subscriber, the client should
present the change as if there was only one variant of the resource and
that variant had just been updated.

#### Matching Ambiguity

As mentioned above, this design does introduce the possibility of
matching ambiguity in certain cases, where there may be more than one
variant of a resource that matches the dynamic parameters specified by
the client.

If an xDS transport protocol implementation does encounter multiple
possible matching variants of a resource, its behavior is undefined.
In the following sections, we evaluate the cases where that can occur
and specify how each one will be addressed.

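As a rough illustration of what "more than one matching variant" means, the
sketch below collects every variant whose constraints match a given set of
dynamic parameters; a result with more than one entry is an ambiguity.  To
keep the example short it makes a simplifying assumption: each variant's
constraints are an AND of exact-match constraints, represented as a map from
key to required value.  The names `ExactConstraints` and `MatchingVariants`
are hypothetical.

```c++
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using DynamicParameters = std::map<std::string, std::string>;

// Simplification: an AND of exact-match constraints, stored as
// key -> required value.  An empty map means "no constraints".
using ExactConstraints = std::map<std::string, std::string>;

// Keys in `params` that the constraints do not mention are ignored.
bool Matches(const ExactConstraints& constraints,
             const DynamicParameters& params) {
  for (const auto& [key, value] : constraints) {
    auto it = params.find(key);
    if (it == params.end() || it->second != value) return false;
  }
  return true;
}

// Returns the indices of all variants whose constraints match `params`.
// More than one index means the match is ambiguous.
std::vector<std::size_t> MatchingVariants(
    const std::vector<ExactConstraints>& variants,
    const DynamicParameters& params) {
  std::vector<std::size_t> matches;
  for (std::size_t i = 0; i < variants.size(); ++i) {
    if (Matches(variants[i], params)) matches.push_back(i);
  }
  return matches;
}
```

For instance, with one variant constrained on `env=prod` and another on
`env=prod` plus `version=v1`, the parameters `{env=prod, version=v1}` match
both variants, while `{env=prod}` matches only the first.
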
##### Adding a New Key on the Server First

Consider what would happen in the above transition scenario if we changed
the server to have multiple variants of a resource differentiated by
the new `version` key before all of the clients were upgraded to use
that key.  For clients sending `{env=prod}`, there would be two possible
matching variants of the resource, one for `version=v1` and another for
`version=v2`, and there would be no way to determine which variant to
use for that client.

As stated above, we are optimizing for the case where new keys are added
on clients first, since that is expected to be the common scenario.
However, there may be cases where it is not feasible to have all clients
start sending a new key before the server needs to start making use of
that key.

For example, let's say that this transition scenario is occurring in
an environment where the xDS server is controlled by one team and the
clients are controlled by various other teams, so it's not feasible to
force all clients to start sending the new `version` key all at once.
But there is one particular client team that is eager to start using
the new `version` key to differentiate the configs of their clients,
and they don't want to wait for all of the other client teams to start
sending the new key.

Consider what happens if the server simply adds a variant of the
resource with the new key, while leaving the original resource variant
in place:

```textproto
// Existing variant for older clients that are not yet sending the
// version key.
{constraint:{key:"env" value:"prod"}}

// New variant intended for clients sending the version key.
{and_constraints:[
  {constraint:{key:"env" value:"prod"}},
  {constraint:{key:"version" value:"v1"}}
]}
```

This will work fine for older clients that are not yet sending the
`version` key, because their dynamic parameters will not match the new
variant's constraints.  However, newer clients that are sending dynamic
parameters `{env=prod, version=v1}` will run into ambiguity: those
parameters can match either of the above variants of the resource.

This situation will be avoided via a best practice that all authoritative
xDS servers should have **all variants of a given resource specify
constraints for the same set of keys**.

In order to make this work for the case where the server starts sending
the constraint on the new key before all clients are sending it, we
provide the `exists` matcher, which will allow the server to specify
a default explicitly for clients that are not yet sending a new key.
In this example, the server would actually have the following two
variants:

```textproto
// Existing variant for older clients that are not yet sending the
// version key.
{and_constraints:[
  {constraint:{key:"env" value:"prod"}},
  {not_constraints:
    {constraint:{key:"version" exists:{}}}
  }
]}

// New variant for clients sending the version key.
{and_constraints:[
  {constraint:{key:"env" value:"prod"}},
  {constraint:{key:"version" value:"v1"}}
]}
```

This allows maintaining the requirement that all variants of a given
resource have constraints on the same set of keys, while also allowing
the server to explicitly provide a result for older clients that do not
yet send the new key.

##### Variants With Overlapping Constraint Values

There is also a possible ambiguity that can occur if a server provides
multiple variants of a resource whose constraints for a given key
overlap in terms of the values they can match.  For example, let's say
that a server has the following two variants of a resource:

```textproto
// Matches {env=prod} or {env=test}.
{or_constraints:[
  {constraint:{key:"env" value:"prod"}},
  {constraint:{key:"env" value:"test"}}
]}

// Matches {env=qa} or {env=test}.
{or_constraints:[
  {constraint:{key:"env" value:"qa"}},
  {constraint:{key:"env" value:"test"}}
]}
```

Now consider what happens if a client subscribes with dynamic parameters
`{env=test}`.  Those dynamic parameters can match either of the above
variants of the resource.

This situation will be avoided via a best practice that all authoritative
xDS servers should have **all variants of a given resource specify
non-overlapping constraints for the same set of keys**.  Control planes
must not accept a set of resources that violates this requirement.

#### Matching Behavior and Best Practices

We advise deployments to avoid ambiguity through the following best practices:
- Whenever there are multiple variants of a resource, all variants must
  list the same set of keys.  This allows the server to ignore keys sent
  by the client that do not affect the choice of variant
  without causing ambiguity in cache misses.  Servers may use the
  `exists` mechanism to provide backward compatibility for clients that
  are not yet sending a newly added key.
- The constraints on each variant of a given resource must be mutually
  exclusive.  For example, if one variant of a resource matches a given key
  with values "foo" or "bar", and another variant matches that same key
  with values "bar" or "baz", that would cause ambiguity, because both
  variants would match the value "bar".
- There must be a variant of the resource for every value of a key that is
  going to be present.  For example, if clients will send the `env` key
  with a value of `prod`, `test`, or `qa`, then
  you must have each of those three variants of the resource.  (Failure
  to do this will result in the server acting as if the requested
  resource does not exist.)

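A control plane could check the first two practices mechanically before
accepting a set of variants.  The sketch below is one possible approach under
a simplifying assumption: each variant's constraints are an AND of
exact-match constraints, stored as a map from key to required value (so `OR`,
`NOT`, and `exists` are not handled).  The names `KeysOf`, `CanBothMatch`,
and `FollowsBestPractices` are hypothetical.  The third practice (a variant
for every value that clients will send) depends on knowledge of the clients
and is not checked here.

```c++
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Simplification: an AND of exact-match constraints, stored as
// key -> required value.
using ExactConstraints = std::map<std::string, std::string>;

// Returns the set of keys that a variant constrains.
std::set<std::string> KeysOf(const ExactConstraints& constraints) {
  std::set<std::string> keys;
  for (const auto& entry : constraints) keys.insert(entry.first);
  return keys;
}

// Two variants can both match some set of dynamic parameters unless they
// require different values for at least one shared key.
bool CanBothMatch(const ExactConstraints& a, const ExactConstraints& b) {
  for (const auto& [key, value] : a) {
    auto it = b.find(key);
    if (it != b.end() && it->second != value) return false;
  }
  return true;
}

// Returns true if every pair of variants constrains the same set of keys
// and no pair of variants can match the same dynamic parameters.
bool FollowsBestPractices(const std::vector<ExactConstraints>& variants) {
  for (std::size_t i = 0; i < variants.size(); ++i) {
    for (std::size_t j = i + 1; j < variants.size(); ++j) {
      if (KeysOf(variants[i]) != KeysOf(variants[j])) return false;
      if (CanBothMatch(variants[i], variants[j])) return false;
    }
  }
  return true;
}
```

Under these assumptions, variants for `env=prod` and `env=test` pass the
check, while a variant for `env=prod` alongside one for `env=prod` plus
`version=v1` is rejected because the key sets differ.
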
#### Transport Protocol Changes

The following message will be added to represent a subscription to a
resource by name with associated dynamic parameters:

```proto
// A specification of a resource used when subscribing or unsubscribing.
message ResourceLocator {
  // The resource name to subscribe to.
  string name = 1;

  // A set of dynamic parameters used to match against the dynamic parameter
  // constraints on the resource. This allows clients to select between
  // multiple variants of the same resource.
  map<string, string> dynamic_parameters = 2;
}
```

The following new field will be added to `DiscoveryRequest`, to allow clients
to specify dynamic parameters when subscribing to a resource:

```proto
  // Alternative to resource_names field that allows specifying dynamic
  // parameters along with each resource name. Clients that populate this field
  // must be able to handle responses from the server where resources are
  // wrapped in a Resource message.
  repeated ResourceLocator resource_locators = 7;
```

Similarly, the following fields will be added to `DeltaDiscoveryRequest`:

```proto
  // Alternative to resource_names_subscribe field that allows specifying
  // dynamic parameters along with each resource name.
  repeated ResourceLocator resource_locators_subscribe = 8;

  // Alternative to resource_names_unsubscribe field that allows specifying
  // dynamic parameters along with each resource name.
  repeated ResourceLocator resource_locators_unsubscribe = 9;
```

The following message will be added to represent the name of a specific
variant of a resource:

```proto
// Specifies a concrete resource name.
message ResourceName {
  // The name of the resource.
  string name = 1;

  // Dynamic parameter constraints associated with this resource. To be used by
  // client-side caches (including xDS proxies) when matching subscribed
  // resource locators.
  DynamicParameterConstraints dynamic_parameter_constraints = 2;
}
```

The following field will be added to the `Resource` message, to allow the
server to return the dynamic parameter constraints associated with each
resource:

```proto
  // Alternative to the *name* field, to be used when the server supports
  // multiple variants of the named resource that are differentiated by
  // dynamic parameter constraints.
  // Only one of *name* or *resource_name* may be set.
  ResourceName resource_name = 8;
```

And finally, the following field will be added to `DeltaDiscoveryResponse`:

```proto
  // Alternative to removed_resources that allows specifying which variant of
  // a resource is being removed. This variant must be used for any resource
  // for which dynamic parameter constraints were sent to the client.
  repeated ResourceName removed_resource_names = 8;
```

### Client Configuration

Client configuration is outside of the scope of this design.  However,
this section lists some considerations for client implementors to take
into account.

#### Configuring Dynamic Parameters

Each leaf client should have a way of configuring the dynamic parameters
that it sends.

For old-style resource names (those not using the new `xdstp` URI
scheme from [xRFC TP1](TP1-xds-transport-next.md)), clients should
send the same set of dynamic parameters for all resource subscriptions.
The client's configuration should allow setting these default dynamic
parameters globally.

For new-style resource names, clients should send the same set of
dynamic parameters for all resource subscriptions in a given authority.
The client's configuration should allow setting the dynamic parameters to
use for each authority.

#### Migrating From Node Metadata

Today, the equivalent of dynamic parameters is node metadata,
which can be used by servers to determine the set of resources to send
for LDS and CDS wildcard subscriptions or to determine the contents of
other resources (e.g., to select individual routes to be included in an
RDS resource).  For transition purposes, this mechanism can continue
to be supported by the client performing direct translation of node
metadata to dynamic parameters.

Any given xDS client may support either or both of these mechanisms.

### Considerations for Implementations

This specification does not prescribe implementation details for xDS
clients or servers.  However, for illustration purposes, this section
describes how a naive implementation might be structured.

The database of an xDS server or cache of an xDS client can be thought
of as a map, keyed by resource type and resource name.  Prior to this
specification, the value of the map would have been the current value of the
resource and a list of subscribers that need to be updated when the
resource changes.  In C++ syntax, the data structure might look like this:

```c++
#include <map>
#include <optional>
#include <set>
#include <string>

#include "google/protobuf/any.pb.h"

// Represents a subscriber (either a downstream xDS client or a local API caller).
class Subscriber {
 public:
  // ...
};

struct DatabaseEntry {
  // Current contents of resource.
  // Whenever this changes, the change will be sent to all subscribers.
  std::optional<google::protobuf::Any> resource_contents;

  // Current list of subscribers.
  // Entries are added and removed as subscriptions are started and stopped.
  std::set<Subscriber*> subscribers;
};

using Database =
    std::map<std::string /*resource_type*/,
             std::map<std::string /*resource_name*/, DatabaseEntry>>;
```

This design does not change the key structure of the map, but it does
change the structure of the value of the map.  In particular, instead of
storing a single value for the resource contents, it will need to store
multiple values, keyed by the associated dynamic parameter constraints.
And for each subscriber, it will need to store the dynamic parameters that
the subscriber specified.  In a naive implementation (not optimized at all),
the modified data structure may look like this:

```c++
// Represents a subscriber (either a downstream xDS client or a local API caller).
class Subscriber {
 public:
  // ...

  // Returns the dynamic parameters specified for the subscription.
  DynamicParameters dynamic_parameters() const;
};

struct DatabaseEntry {
  // Resource contents for each variant of the resource, keyed by
  // dynamic parameter constraints.
  // Whenever a given variant of the resource changes, the change will be
  // sent to all subscribers whose dynamic parameters match the constraints
  // of the resource variant that changed.
  std::map<DynamicParameterConstraints,
           std::optional<google::protobuf::Any>> resource_contents;

  // Current list of subscribers.
  // Entries are added and removed as subscriptions are started and stopped.
  std::set<Subscriber*> subscribers;
};
```

When a variant of a resource is updated, the variant is stored in the map
based on its dynamic parameter constraints.  The implementation will then
iterate through the list of subscribers, sending the updated resource
variant and its dynamic parameter constraints to each subscriber whose
dynamic parameters match those constraints.

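The update flow just described might be sketched as follows.  This is
a simplified model rather than a definitive implementation: the
`Constraints` struct is a hypothetical stand-in for the
`DynamicParameterConstraints` proto, resource contents are plain
strings, variants are keyed by a hypothetical canonical string, and the
matching semantics (a key/value constraint matches only when the key is
present with exactly that value) are an assumption made for
illustration.

```c++
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Assumption: dynamic parameters modeled as a flat string-to-string map.
using DynamicParameters = std::map<std::string, std::string>;

// Hypothetical, simplified stand-in for DynamicParameterConstraints.
struct Constraints {
  enum class Kind { kKeyValue, kAnd, kOr, kNot };
  Kind kind;
  std::string key;                    // Used by kKeyValue.
  std::string value;                  // Used by kKeyValue.
  std::vector<Constraints> children;  // Operands; exactly one for kNot.
};

// Assumed semantics: a key/value constraint matches only if the key is
// present in the dynamic parameters with exactly that value.
bool Matches(const Constraints& c, const DynamicParameters& params) {
  switch (c.kind) {
    case Constraints::Kind::kKeyValue: {
      auto it = params.find(c.key);
      return it != params.end() && it->second == c.value;
    }
    case Constraints::Kind::kAnd:
      for (const Constraints& child : c.children) {
        if (!Matches(child, params)) return false;
      }
      return true;
    case Constraints::Kind::kOr:
      for (const Constraints& child : c.children) {
        if (Matches(child, params)) return true;
      }
      return false;
    case Constraints::Kind::kNot:
      assert(c.children.size() == 1);
      return !Matches(c.children[0], params);
  }
  return false;
}

struct Subscriber {
  DynamicParameters dynamic_parameters;
  std::vector<std::string> received;  // Resource contents delivered so far.
};

struct Variant {
  Constraints constraints;
  std::string contents;  // Stand-in for google::protobuf::Any.
};

struct DatabaseEntry {
  // Keyed by a hypothetical canonical serialization of the constraints.
  std::map<std::string, Variant> variants;
  std::vector<Subscriber*> subscribers;
};

// Stores the variant, then delivers it to every matching subscriber.
// Returns the number of subscribers notified.
int UpdateVariant(DatabaseEntry& entry, const std::string& variant_key,
                  const Constraints& constraints,
                  const std::string& contents) {
  entry.variants[variant_key] = Variant{constraints, contents};
  int notified = 0;
  for (Subscriber* subscriber : entry.subscribers) {
    if (Matches(constraints, subscriber->dynamic_parameters)) {
      subscriber->received.push_back(contents);
      ++notified;
    }
  }
  return notified;
}
```

Every update evaluates `Matches()` once per subscriber, which is the
cost that a more optimized implementation would try to avoid.
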
A more optimized implementation may instead choose to store a separate list
of subscribers for each resource variant, thus avoiding the need to perform
matching for every subscriber upon every update of a resource variant.
However, this would require moving subscribers from one variant to another
whenever the dynamic parameter constraints on the resource variants change.

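One possible shape for such a per-variant subscriber index is sketched
below.  All names here are hypothetical, and constraint matching is
abstracted as a predicate (`Matcher`) so that the example stays short.
Matching is performed only when a subscriber or a variant's constraints
change, not on every content update.

```c++
#include <functional>
#include <map>
#include <set>
#include <string>
#include <utility>

// Assumption: dynamic parameters modeled as a flat string-to-string map.
using DynamicParameters = std::map<std::string, std::string>;
// Assumption: constraint matching is abstracted as a predicate.
using Matcher = std::function<bool(const DynamicParameters&)>;

struct Subscriber {
  DynamicParameters dynamic_parameters;
};

struct Variant {
  Matcher matches;                    // Stand-in for the variant's constraints.
  std::set<Subscriber*> subscribers;  // Subscribers that currently match.
};

struct DatabaseEntry {
  std::map<std::string, Variant> variants;  // Keyed by a hypothetical ID.
  std::set<Subscriber*> subscribers;        // All subscribers to the resource.
};

// Re-evaluates one subscriber against every variant.  Needed when a
// subscriber is added or changes its dynamic parameters.
void RebucketSubscriber(DatabaseEntry& entry, Subscriber* subscriber) {
  for (auto& kv : entry.variants) {
    Variant& variant = kv.second;
    if (variant.matches && variant.matches(subscriber->dynamic_parameters)) {
      variant.subscribers.insert(subscriber);
    } else {
      variant.subscribers.erase(subscriber);
    }
  }
}

void AddSubscriber(DatabaseEntry& entry, Subscriber* subscriber) {
  entry.subscribers.insert(subscriber);
  RebucketSubscriber(entry, subscriber);
}

// Re-evaluates every subscriber against one variant.  Needed when a
// variant is added or its constraints change.
void SetVariantConstraints(DatabaseEntry& entry, const std::string& id,
                           Matcher matches) {
  Variant& variant = entry.variants[id];
  variant.matches = std::move(matches);
  variant.subscribers.clear();
  for (Subscriber* subscriber : entry.subscribers) {
    if (variant.matches && variant.matches(subscriber->dynamic_parameters)) {
      variant.subscribers.insert(subscriber);
    }
  }
}
```

With this layout, sending an updated variant only requires iterating
over that variant's own `subscribers` set.
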
### Example

This section shows how the mechanism described in this proposal can be
used to address the use-case described in the "Background" section above.

Let's say that every client uses two different dynamic selection
parameters, `env` (which can have one of the values `prod`, `canary`,
or `test`) and `version` (which can have one of the values `v1`, `v2`,
or `v3`).  Now let's say that there is a `RouteConfiguration` with one
route that should be selected via the parameter `env=prod` and another
route that should be selected via the parameter `version=v1`.  Without
this design, the server would need to actually provide the cross-product
of these parameter values, so there would be 9 different variants of the
resource, even though there are only 4 unique contents for the resource.
However, this design instead allows the server to provide only the 4
unique variants of the resource, with constraints allowing each client
to get the appropriate one:

<table>
  <tr>
    <th>Dynamic Parameter Constraints on Resource</th>
    <th>Resource Contents</th>
  </tr>

  <tr>
    <td>
<code>{and_constraints:[
  {not_constraints:
    {constraint:{key:"env" value:"prod"}}
  },
  {not_constraints:
    {constraint:{key:"version" value:"v1"}}
  }
]}</code>
    </td>
    <td>
      <ul>
      <li>does <i>not</i> include the route for <code>env=prod</code>
      <li>does <i>not</i> include the route for <code>version=v1</code>
      </ul>
    </td>
  </tr>

  <tr>
    <td>
<code>{and_constraints:[
  {constraint:{key:"env" value:"prod"}},
  {not_constraints:
    {constraint:{key:"version" value:"v1"}}
  }
]}</code>
    </td>
    <td>
      <ul>
      <li>does include the route for <code>env=prod</code>
      <li>does <i>not</i> include the route for <code>version=v1</code>
      </ul>
    </td>
  </tr>

  <tr>
    <td>
<code>{and_constraints:[
  {not_constraints:
    {constraint:{key:"env" value:"prod"}}
  },
  {constraint:{key:"version" value:"v1"}}
]}</code>
    </td>
    <td>
      <ul>
      <li>does <i>not</i> include the route for <code>env=prod</code>
      <li>does include the route for <code>version=v1</code>
      </ul>
    </td>
  </tr>

  <tr>
    <td>
<code>{and_constraints:[
  {constraint:{key:"env" value:"prod"}},
  {constraint:{key:"version" value:"v1"}}
]}</code>
    </td>
    <td>
      <ul>
      <li>does include the route for <code>env=prod</code>
      <li>does include the route for <code>version=v1</code>
      </ul>
    </td>
  </tr>

</table>

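To check that the four variants in the table cover all nine parameter
combinations without overlap, the constraints can be written as
predicates and evaluated against each combination.  The sketch below
does this under the assumption that a key/value constraint matches only
when the key is present with exactly that value; the `Has()`,
`ExampleVariants()`, and `MatchingRows()` helpers are hypothetical
names introduced for the example.

```c++
#include <functional>
#include <map>
#include <string>
#include <vector>

// Assumption: dynamic parameters modeled as a flat string-to-string map.
using DynamicParameters = std::map<std::string, std::string>;
using Matcher = std::function<bool(const DynamicParameters&)>;

// Assumed semantics of a single key/value constraint.
bool Has(const DynamicParameters& params, const std::string& key,
         const std::string& value) {
  auto it = params.find(key);
  return it != params.end() && it->second == value;
}

// The constraints from the four rows of the table, in order.
std::vector<Matcher> ExampleVariants() {
  return {
      [](const DynamicParameters& p) {
        return !Has(p, "env", "prod") && !Has(p, "version", "v1");
      },
      [](const DynamicParameters& p) {
        return Has(p, "env", "prod") && !Has(p, "version", "v1");
      },
      [](const DynamicParameters& p) {
        return !Has(p, "env", "prod") && Has(p, "version", "v1");
      },
      [](const DynamicParameters& p) {
        return Has(p, "env", "prod") && Has(p, "version", "v1");
      },
  };
}

// Returns the indices (0 through 3) of the table rows whose constraints
// match the given dynamic parameters.
std::vector<int> MatchingRows(const DynamicParameters& params) {
  std::vector<int> rows;
  const std::vector<Matcher> variants = ExampleVariants();
  for (int i = 0; i < static_cast<int>(variants.size()); ++i) {
    if (variants[i](params)) rows.push_back(i);
  }
  return rows;
}
```

For example, a client sending `env=prod` and `version=v2` matches only
the second row, and a client sending `env=canary` and `version=v3`
matches only the first.
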
## Rationale

This section documents limitations and design alternatives that we
considered.

### Limitation on Enhancing Matching in the Future

One limitation of this design is that, because all xDS transport protocol
implementations (clients, servers, and caching proxies) need to implement
this matching behavior, it will be very difficult to add new matching
behavior in the future.  Doing so will probably require some sort of
client capability.  This will make it feasible to expand this mechanism
in an environment where all of the caching xDS proxies are under centralized
control, but it will be quite difficult to deploy those changes in
environments that depend on distributed third-party caching xDS proxies.

Because of this, reviewers of this design are encouraged to carefully
scrutinize the proposed matching semantics to ensure that they meet our
expected needs.

### Complexity of Constraint Expressions

Although the `DynamicParameterConstraints` proto allows specifying
arbitrarily nested combinations of AND, OR, and NOT expressions, control
planes do not need to actually support that full arbitrary power.  It is
possible to limit the sets of supported constraints to (e.g.) a
simple flat list of AND or OR expressions, which would make it easier
for a control plane to optimize its implementation.

Similarly, caching xDS proxies may be able to provide an optimized
implementation if all of the constraints that they see are limited to
some subset of the full flexibility allowed by the protocol.  However,
any general-purpose caching proxy implementation will likely need to
support a less optimized implementation that does support the full
flexibility allowed by the protocol.

### Using Context Parameters

We considered extending the context parameter mechanism from [xRFC
TP1](TP1-xds-transport-next.md) to support flexible matching semantics,
rather than its current exact-match semantics.  However, that approach had
some down-sides:
- It would not have solved the virality problem described in the "Background"
  section above.
- It would have made the new xDS naming scheme a prerequisite for using
  the dynamic resource selection mechanism.  (The mechanism described in
  this doc is completely independent of the new xDS naming scheme; it can
  be used with the legacy xDS naming scheme as well.)

### Stricter Matching to Avoid Ambiguity

We could avoid much of the matching ambiguity described above by saying that
a set of constraints must specify all keys present in the subscription
request in order to match.  However, this would mean that if the client
starts subscribing with a new key before the corresponding constraint is
added on the resources on the server, then it will fail to match the
existing resources.  In other words, the process would be:

1. Add a variant of all resources on the server side with a constraint
   for the new key (e.g., `version=v1`), in addition to all existing
   constraints.
2. Change clients to start sending the new key.
3. When all clients are updated, remove the resource variants that do
   *not* have the new key.

This will effectively require adding new keys on the server side first,
which seems like a large burden on users.  It also seems fairly tricky
for most users to get exactly the right set of dynamic parameters on
each resource variant, and if they fail to do it right, they will break
their existing configuration.

Ultimately, although this approach is more semantically precise, it is
also considered too rigid and difficult for users to work with.

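The difference between the two approaches can be illustrated with a
small sketch.  To keep it short, constraints are simplified here to a
flat AND of exact key/value matches (`FlatConstraints`, a hypothetical
type), and `MatchesLenient()` and `MatchesStrict()` are illustrative
names rather than part of this proposal.

```c++
#include <map>
#include <string>

// Assumption: dynamic parameters modeled as a flat string-to-string map.
using DynamicParameters = std::map<std::string, std::string>;
// Simplification: constraints as a flat AND of exact key/value matches.
using FlatConstraints = std::map<std::string, std::string>;

// Matches if every constrained key is present with the required value.
// Keys sent by the client that the constraints do not mention are ignored.
bool MatchesLenient(const FlatConstraints& constraints,
                    const DynamicParameters& params) {
  for (const auto& kv : constraints) {
    auto it = params.find(kv.first);
    if (it == params.end() || it->second != kv.second) return false;
  }
  return true;
}

// The stricter alternative: additionally requires the constraints to
// mention every key that the client sent.
bool MatchesStrict(const FlatConstraints& constraints,
                   const DynamicParameters& params) {
  if (!MatchesLenient(constraints, params)) return false;
  for (const auto& kv : params) {
    if (constraints.count(kv.first) == 0) return false;
  }
  return true;
}
```

Under the strict rule, a client that starts sending `version=v1` in
addition to `env=prod` no longer matches a variant constrained only on
`env=prod`, which is why the server-side variants would have to be
added first.
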
## Implementation

TBD (Will probably be implemented in gRPC before Envoy)

## Open issues (if applicable)

N/A
