# Obtaining YAML Line Numbers

## Scenario 1 - finding YAML line numbers from the JSON tree

A great feature of json-schema-validator is its ability to validate YAML documents against a JSON Schema. The way this is done, however (by pre-processing the YAML into a tree of [JsonNode](https://fasterxml.github.io/jackson-databind/javadoc/2.10/com/fasterxml/jackson/databind/JsonNode.html) objects), breaks the connection back to the original YAML source file. Once the YAML has been validated against the schema, it is very common to do additional processing of the JSON tree, checking for semantic errors, content errors or inconsistencies. From an end user's point of view, such errors are ideally reported with line and column references back to the original YAML, but that information is not readily available from the processed JSON tree.

### Scenario 1, solution part 1 - capturing line details during initial parsing

One solution is to use a custom [JsonNodeFactory](https://fasterxml.github.io/jackson-databind/javadoc/2.10/com/fasterxml/jackson/databind/node/JsonNodeFactory.html) that returns custom `JsonNode` objects, created during initial parsing, which record the YAML location that was being parsed when they were created. The example below shows this:

```java
    // Imports needed by this snippet
    import com.fasterxml.jackson.databind.node.*;
    import com.fasterxml.jackson.dataformat.yaml.YAMLParser;

    // Creates location-aware nodes, stamping each with the parser's position at creation time
    public static class MyNodeFactory extends JsonNodeFactory
    {
        final YAMLParser yp;

        public MyNodeFactory(YAMLParser yp)
        {
            super();
            this.yp = yp;
        }

        @Override
        public ArrayNode arrayNode()
        {
            return new MyArrayNode(this, yp.getTokenLocation(), yp.getCurrentLocation());
        }

        @Override
        public BooleanNode booleanNode(boolean v)
        {
            return new MyBooleanNode(v, yp.getTokenLocation(), yp.getCurrentLocation());
        }

        @Override
        public NumericNode numberNode(int v)
        {
            return new MyIntNode(v, yp.getTokenLocation(), yp.getCurrentLocation());
        }

        @Override
        public NullNode nullNode()
        {
            return new MyNullNode(yp.getTokenLocation(), yp.getCurrentLocation());
        }

        @Override
        public ObjectNode objectNode()
        {
            return new MyObjectNode(this, yp.getTokenLocation(), yp.getCurrentLocation());
        }

        @Override
        public TextNode textNode(String text)
        {
            // Mirrors TextNode.valueOf(): a null argument yields null rather than a NullNode
            return (text != null) ? new MyTextNode(text, yp.getTokenLocation(), yp.getCurrentLocation()) : null;
        }
    }
```

The example above covers a basic but usable subset of the possible `JsonNode` types. If your YAML needs them, then you should also consider the others, namely `byte`, `byte[]`, `raw`, `short`, `long`, `float`, `double`, `BigInteger` and `BigDecimal`. Any value created through a factory method you do not override is an ordinary Jackson node without location information; for example, a YAML value such as `1.5` is typically read as a `double`.

There are some other important things to note from the example:

* Even in a reduced set, `ObjectNode` and `NullNode` should be included.
* For methods that receive a `null` argument, the current behavior appears to be to return `null` rather than a `NullNode` (based on inspecting the underlying `valueOf()` methods in the various `JsonNode` subclasses). Hence the `null` check in the `textNode()` method above.

The actual work here is done by the `YAMLParser`: it holds the location of the token being parsed (`getTokenLocation()`) and the current location in the file (`getCurrentLocation()`). The first gives us a line and column number we can use to flag where an error or problem was found. The second, if needed, lets us calculate a span to the end of the token in error, e.g. if we wanted to highlight or underline the text in error.
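
To make the span idea concrete, here is a small, self-contained sketch that needs no Jackson at all. `SpanUnderline` and `underline()` are hypothetical names, and the sketch assumes 1-based line and column numbers (as used by Jackson's `JsonLocation`) and that the current location points at the first column after the token, as described above. A span that continues onto later lines is simply underlined to the end of its first line.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of Jackson or json-schema-validator): renders a
 * source line with a caret underline covering the span from a token's start
 * location to the first column after the token. Lines and columns are 1-based.
 */
public class SpanUnderline {

    static String underline(List<String> sourceLines, int startLine, int startCol,
                            int endLine, int endCol) {
        String line = sourceLines.get(startLine - 1);
        // A span that continues onto later lines is underlined to the end of its first line.
        int stopCol = (endLine == startLine) ? endCol : line.length() + 1;
        // endCol is exclusive (first column after the token); always draw at least one caret.
        int width = Math.max(1, stopCol - startCol);
        return line + "\n" + " ".repeat(startCol - 1) + "^".repeat(width);
    }

    public static void main(String[] args) {
        List<String> yaml = List.of("name: demo", "port: eighty");
        // "eighty" starts at line 2, column 7; the first column after it is 13.
        System.out.println(underline(yaml, 2, 7, 2, 13));
    }
}
```

Running `main` prints `port: eighty` with six carets under `eighty`. With the classes from part 2, the four numbers would come from `getLineNr()`/`getColumnNr()` on the token and current `JsonLocation` objects.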

### Scenario 1, solution part 2 - augmented `JsonNode` subclasses

We can be as simple or fancy as we like in the `JsonNode` subclasses, but basically we need two things from them:

* An interface so that, when post-processing the JSON tree, we can recognize nodes that retain line number information
* An interface that lets us extract the relevant location information

Those could be the same thing, of course, but in our case we separated them, as shown in the following example:
71
72```java
73    public interface LocationProvider
74    {
75        LocationDetails getLocationDetails();
76    }
77
78    public interface LocationDetails
79    {
80        default int getLineNumber()     { return 1; }
81        default int getColumnNumber()   { return 1; }
82        default String getFilename()    { return ""; }
83    }
84
85    public static class LocationDetailsImpl implements LocationDetails
86    {
87        final JsonLocation currentLocation;
88        final JsonLocation tokenLocation;
89
90        public LocationDetailsImpl(JsonLocation tokenLocation, JsonLocation currentLocation)
91        {
92            this.tokenLocation = tokenLocation;
93            this.currentLocation = currentLocation;
94        }
95
96        @Override
97        public int getLineNumber()      { return (tokenLocation != null) ? tokenLocation.getLineNr() : 1; };
98        @Override
99        public int getColumnNumber()    { return (tokenLocation != null) ? tokenLocation.getColumnNr() : 1; };
100        @Override
101        public String getFilename()     { return (tokenLocation != null) ? tokenLocation.getSourceRef().toString() : ""; };
102    }
103
104    public static class MyNullNode extends NullNode implements LocationProvider
105    {
106        final LocationDetails locDetails;
107
108        public MyNullNode(JsonLocation tokenLocation, JsonLocation currentLocation)
109        {
110            super();
111            locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
112        }
113
114        @Override
115        public LocationDetails getLocationDetails()
116        {
117            return locDetails;
118        }
119    }
120
121    public static class MyTextNode extends TextNode implements LocationProvider
122    {
123        final LocationDetails locDetails;
124
125        public MyTextNode(String v, JsonLocation tokenLocation, JsonLocation currentLocation)
126        {
127            super(v);
128            locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
129        }
130
131        @Override
132        public LocationDetails getLocationDetails()     { return locDetails;}
133    }
134
135    public static class MyIntNode extends IntNode implements LocationProvider
136    {
137        final LocationDetails locDetails;
138
139        public MyIntNode(int v, JsonLocation tokenLocation, JsonLocation currentLocation)
140        {
141            super(v);
142            locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
143        }
144
145        @Override
146        public LocationDetails getLocationDetails()     { return locDetails;}
147    }
148
149    public static class MyBooleanNode extends BooleanNode implements LocationProvider
150    {
151        final LocationDetails locDetails;
152
153        public MyBooleanNode(boolean v, JsonLocation tokenLocation, JsonLocation currentLocation)
154        {
155            super(v);
156            locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
157        }
158
159        @Override
160        public LocationDetails getLocationDetails()     { return locDetails;}
161    }
162
163    public static class MyArrayNode extends ArrayNode implements LocationProvider
164    {
165        final LocationDetails locDetails;
166
167        public MyArrayNode(JsonNodeFactory nc, JsonLocation tokenLocation, JsonLocation currentLocation)
168        {
169            super(nc);
170            locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
171        }
172
173        @Override
174        public LocationDetails getLocationDetails()     { return locDetails;}
175    }
176
177    public static class MyObjectNode extends ObjectNode implements LocationProvider
178    {
179        final LocationDetails locDetails;
180
181        public MyObjectNode(JsonNodeFactory nc, JsonLocation tokenLocation, JsonLocation currentLocation)
182        {
183            super(nc);
184            locDetails = new LocationDetailsImpl(tokenLocation, currentLocation);
185        }
186
187        @Override
188        public LocationDetails getLocationDetails()     { return locDetails;}
189    }
190```

### Scenario 1, solution part 3 - using the custom `JsonNodeFactory`

With the pieces we now have, we just need to tell Jackson to make use of them, which involves a small modification to the normal sequence of processing:

```java
    // Assumed to be defined elsewhere: `mapper` (an ObjectMapper), `mySchema` (a JsonSchema),
    // `f` (the YAML File) and processJsonItems() (shown in part 4).
    // Also needs java.io.IOException, java.util.Set and the usual Jackson/json-schema-validator imports.
    this.yamlFactory = new YAMLFactory();

    try (YAMLParser yp = yamlFactory.createParser(f))
    {
        // The factory holds yp, so each node it creates records yp's locations at that moment
        ObjectReader rdr = mapper.reader(new MyNodeFactory(yp));
        JsonNode jsonNode = rdr.readTree(yp);
        Set<ValidationMessage> msgs = mySchema.validate(jsonNode);

        if (msgs.isEmpty())
        {
            // assumes the schema requires "someItem", so get() does not return null here
            for (JsonNode item : jsonNode.get("someItem"))
            {
                processJsonItems(item);
            }
        }
        else
        {
            //  ... we'll look at how to get line locations for ValidationMessage cases in Scenario 2
        }
    }
    // a JsonProcessingException seems to be the base exception for "gross" errors e.g.
    // missing quotes at end of string etc.
    catch (JsonProcessingException jpEx)
    {
        JsonLocation loc = jpEx.getLocation();
        // ... do something with the loc details
    }
    // any other I/O problem (e.g. the file cannot be opened) carries no parse location
    catch (IOException ioEx)
    {
        // ... report the I/O failure
    }
```

Some notes on what is happening here:

* We instantiate our custom `JsonNodeFactory` with the `YAMLParser` reference, and the line locations get recorded for us as the file is parsed.
* If any parsing exceptions are thrown, they will already contain a `JsonLocation` object that we can use directly if needed.
* If we get no validation messages, we know the JSON tree matches the schema, and we can do any post-processing we need on the tree. We'll see how to report any issues found during that post-processing in the next part.
* We'll look at how to get line locations for `ValidationMessage` errors in Scenario 2.

### Scenario 1, solution part 4 - extracting the line details

With everything prepared, actually getting the line locations is easy:

```java
    import java.util.Iterator;
    import java.util.Map;

    void processJsonItems(JsonNode item)
    {
        Iterator<Map.Entry<String, JsonNode>> iter = item.fields();

        while (iter.hasNext())
        {
            Map.Entry<String, JsonNode> node = iter.next();
            extractErrorLocation(node.getValue());
        }
    }

    void extractErrorLocation(JsonNode node)
    {
        // instanceof is false for null, so this also covers node == null
        if (!(node instanceof LocationProvider))    { return; }

        //Note: we also know the "span" of the error section i.e. from token location to current location (first char after the token)
        //      if we wanted at some stage we could use this to highlight/underline all of the text in error
        LocationDetails dets = ((LocationProvider) node).getLocationDetails();
        // ... do something with the details e.g. report an error/issue against the YAML line
    }
```
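
One common thing to "do with the details" is to print them in the `file:line:column: message` style used by many compilers and linters. The sketch below assumes that output format; `DiagnosticFormat` and `format()` are hypothetical helpers, and the `LocationDetails` interface from part 2 is repeated inside it so the example compiles on its own.

```java
/**
 * Hypothetical reporting helper: formats location details in the
 * "file:line:column: message" style used by many compilers and linters.
 */
public class DiagnosticFormat {

    // Same shape as the LocationDetails interface from part 2, repeated so this compiles alone.
    public interface LocationDetails {
        default int getLineNumber()     { return 1; }
        default int getColumnNumber()   { return 1; }
        default String getFilename()    { return ""; }
    }

    static String format(LocationDetails dets, String message) {
        if (dets == null) {
            // No location available (e.g. the node was not a LocationProvider): report the message alone.
            return message;
        }
        String file = dets.getFilename().isEmpty() ? "<unknown>" : dets.getFilename();
        return file + ":" + dets.getLineNumber() + ":" + dets.getColumnNumber() + ": " + message;
    }

    public static void main(String[] args) {
        LocationDetails dets = new LocationDetails() {
            @Override public int getLineNumber()   { return 12; }
            @Override public int getColumnNumber() { return 5; }
            @Override public String getFilename()  { return "config.yaml"; }
        };
        // prints: config.yaml:12:5: port must be a number
        System.out.println(format(dets, "port must be a number"));
    }
}
```

Falling back to the bare message when `dets` is `null` keeps the reporting path usable for nodes the custom factory did not create.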

So that's pretty much it: as we process the JSON tree, whenever we want to report something about its contents, we can do so with a reference back to the original YAML line number.

There is still a problem, though: what if validation against the schema fails?

## Scenario 2 - ValidationMessage line locations

Any failures in validation against the schema come back as a set of `ValidationMessage` objects. These do not contain original YAML source line information either, and there's no easy way to inject it as we did for Scenario 1. Luckily, though, there is a trick we can use here!

Within the `ValidationMessage` object is the 'path' of the error, which we can access with the `getPath()` method. By default the syntax of this path is close to [JSONPath](https://datatracker.ietf.org/doc/draft-ietf-jsonpath-base/), but it can be set explicitly to either [JSONPath](https://datatracker.ietf.org/doc/draft-ietf-jsonpath-base/) or [JSONPointer](https://www.rfc-editor.org/rfc/rfc6901.html) expressions. As we already use [Jackson](https://github.com/FasterXML/jackson), which supports node lookups based on JSONPointer expressions, we will set the path expressions to be JSONPointers. This is done by configuring the reported path type through the `SchemaValidatorsConfig` before we read our schema:

```java
    // `schema` holds the schema content; SchemaValidatorsConfig, PathType and
    // JsonSchemaFactory come from the com.networknt.schema package
    SchemaValidatorsConfig config = new SchemaValidatorsConfig();
    config.setPathType(PathType.JSON_POINTER);
    JsonSchema jsonSchema = JsonSchemaFactory.getInstance().getSchema(schema, config);
```
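
For reference, a JSON Pointer is just a `/`-separated list of reference tokens, with `~1` standing for `/` and `~0` for `~` inside a token ([RFC 6901](https://www.rfc-editor.org/rfc/rfc6901.html), section 4). Jackson's `JsonPointer` already handles this for us; the hypothetical `PointerTokens.tokens()` below only sketches the decoding so it is clear what a path such as `/servers/0/host` (an illustrative example) refers to.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of splitting an RFC 6901 JSON Pointer into decoded reference tokens.
 * Jackson's JsonPointer does this itself; this class is illustrative only.
 */
public class PointerTokens {

    static List<String> tokens(String pointer) {
        List<String> result = new ArrayList<>();
        if (pointer.isEmpty()) {
            return result;                  // "" refers to the whole document
        }
        if (pointer.charAt(0) != '/') {
            throw new IllegalArgumentException("JSON Pointer must be empty or start with '/': " + pointer);
        }
        // limit -1 keeps trailing empty tokens, e.g. "/a/" -> ["a", ""]
        for (String raw : pointer.substring(1).split("/", -1)) {
            // RFC 6901 order: decode "~1" to "/" first, then "~0" to "~"
            result.add(raw.replace("~1", "/").replace("~0", "~"));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(tokens("/servers/0/host"));       // [servers, 0, host]
        System.out.println(tokens("/paths/~1users~1{id}"));  // [paths, /users/{id}]
    }
}
```

The decoding order matters: `~01` must become the literal text `~1`, which only happens if `~1` is replaced before `~0`.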

Having set paths to be JSONPointer expressions, we can use those pointers to locate the appropriate `JsonNode` instances. The following couple of methods illustrate this process:

```java
    import com.fasterxml.jackson.core.JsonPointer;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.networknt.schema.ValidationMessage;

    JsonNode findJsonNode(ValidationMessage msg, JsonNode rootNode)
    {
        // Construct the JSONPointer.
        JsonPointer pathPtr = JsonPointer.valueOf(msg.getPath());
        // Now see if we can find the node. at() never returns null;
        // if nothing matches the pointer it returns a MissingNode.
        JsonNode node = rootNode.at(pathPtr);
        return node;
    }

    LocationDetails getLocationDetails(ValidationMessage msg, JsonNode rootNode)
    {
        LocationDetails retval = null;
        JsonNode node = findJsonNode(msg, rootNode);
        // A MissingNode (or any node type our factory did not create) is not a LocationProvider
        if (node instanceof LocationProvider)
        {
            retval = ((LocationProvider) node).getLocationDetails();
        }
        return retval;
    }
```
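
`getLocationDetails()` returns `null` whenever the node at the pointer carries no location, for instance when `at()` returned a `MissingNode`, or when the value was built by a factory method we did not override (such as a `double`). One possible refinement, sketched here as an assumption rather than anything json-schema-validator provides, is to fall back to the nearest ancestor that does have a location. The hypothetical `PointerAncestors.selfAndAncestors()` lists the candidate pointers, most specific first.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: returns a JSON Pointer followed by each of its ancestors,
 * most specific first, ending with "" (the pointer to the document root).
 */
public class PointerAncestors {

    static List<String> selfAndAncestors(String pointer) {
        if (!pointer.isEmpty() && pointer.charAt(0) != '/') {
            throw new IllegalArgumentException("Not a JSON Pointer: " + pointer);
        }
        List<String> result = new ArrayList<>();
        String current = pointer;
        while (!current.isEmpty()) {
            result.add(current);
            // A '/' inside a token is escaped as "~1", so the last '/' always starts
            // the last reference token; cutting there yields the parent pointer.
            current = current.substring(0, current.lastIndexOf('/'));
        }
        result.add("");  // the root pointer always resolves
        return result;
    }

    public static void main(String[] args) {
        // prints: [/servers/0/port, /servers/0, /servers, ]
        System.out.println(selfAndAncestors("/servers/0/port"));
    }
}
```

Each candidate could then be tried in turn with `rootNode.at(JsonPointer.compile(candidate))` until a `LocationProvider` is found. Jackson's `JsonPointer.head()`, which drops the last segment of a pointer, may serve the same purpose without string handling.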

## Summary

Although not trivial, the steps outlined here give us a way to trace back to the original YAML source for a variety of reporting cases:

* JSON processing exceptions (mostly already done for us)
* Issues flagged during validation of the YAML against the schema
* Anything we need to report with source information during post-processing of the validated JSON tree
310