# License Classifier

[![Build status](https://travis-ci.org/google/licenseclassifier.svg?branch=master)](https://travis-ci.org/google/licenseclassifier)

## Introduction

The license classifier is a library and set of tools that analyze text to
determine which license(s) it contains. It searches a file for license texts
and compares them against an archive of known licenses. The file might be, for
example, a `LICENSE` file containing one or more licenses, or a source code file
with the license text in a comment.

Each result carries a "confidence level" that indicates how closely the text
matched. A confidence level of `1.0` indicates an exact match, while a
confidence level of `0.0` indicates that no license matched the text.
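
This README does not say how the confidence level is computed. As a rough illustration of what a score between `0.0` and `1.0` can mean, the sketch below uses a normalized Levenshtein (edit-distance) similarity. This metric is an assumed stand-in chosen for clarity, not necessarily the classifier's own algorithm, and `similarity` is a hypothetical function name.

```go
package main

import "fmt"

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	m := a
	if b < m {
		m = b
	}
	if c < m {
		m = c
	}
	return m
}

// similarity is a hypothetical confidence score: 1 minus the Levenshtein
// distance between a and b, divided by the length of the longer string.
// Identical inputs score 1.0; the score drops toward 0.0 as more edits
// are needed. Illustrative only, not the license classifier's algorithm.
func similarity(a, b string) float64 {
	ra, rb := []rune(a), []rune(b)
	longest := len(ra)
	if len(rb) > longest {
		longest = len(rb)
	}
	if longest == 0 {
		return 1.0 // two empty texts are trivially identical
	}
	// At the start of iteration i, prev[j] is the edit distance between the
	// first i-1 runes of a and the first j runes of b; curr is the new row.
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			// deletion, insertion, or substitution (free if runes match)
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev
	}
	dist := prev[len(rb)]
	return 1.0 - float64(dist)/float64(longest)
}

func main() {
	fmt.Println(similarity("MIT License", "MIT License"))
	fmt.Println(similarity("abc", "xyz"))
	fmt.Println(similarity("kitten", "sitting"))
}
```

The two-row table keeps memory linear in the length of `b`, while time stays proportional to `len(a)*len(b)`. For multi-kilobyte license texts, comparing normalized words instead of individual characters is a common refinement.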

## Adding a new license

Adding a new license is straightforward:

1.  Create a file in `licenses/`.

    *   Name the file after the license or its abbreviation. If the license is
        an Open Source license, use its identifier as listed at
        https://spdx.org/licenses/.
    *   If the file holds the "header" version of the license, append the
        suffix "`.header`" to the file name. See `licenses/README.md` for more
        details.

2.  Add the license name to the list in `license_type.go`.

3.  Regenerate the `licenses.db` file by running the license serializer:

    ```shell
    $ license_serializer -output licenseclassifier/licenses
    ```

4.  Create and run tests that verify the classifier recognizes the new
    license.
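
The naming rules in step 1 can be expressed as a small check. The sketch below uses a hypothetical helper, `parseLicenseFilename`, which is not part of the library. It strips an optional `.header` suffix and tests the remaining name against the characters SPDX allows in short license identifiers (letters, digits, `-` and `.`). It assumes the file name carries no other extension. A license that is not Open Source and has no SPDX identifier may legitimately fall outside this check.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// headerSuffix marks the "header" version of a license file.
const headerSuffix = ".header"

// spdxIDChars matches names built only from letters, digits, '-' and '.',
// the characters SPDX allows in short license identifiers.
var spdxIDChars = regexp.MustCompile(`^[A-Za-z0-9.-]+$`)

// parseLicenseFilename is a hypothetical helper that splits a file name
// under licenses/ into the license name and whether it is a header version.
func parseLicenseFilename(filename string) (name string, isHeader bool, err error) {
	name = strings.TrimSuffix(filename, headerSuffix)
	isHeader = name != filename
	if !spdxIDChars.MatchString(name) {
		return "", false, fmt.Errorf("license name %q uses characters outside the SPDX identifier set", name)
	}
	return name, isHeader, nil
}

func main() {
	for _, f := range []string{"Apache-2.0", "Apache-2.0.header", "My License"} {
		name, isHeader, err := parseLicenseFilename(f)
		if err != nil {
			fmt.Println("rejected:", err)
			continue
		}
		fmt.Printf("%s -> name=%s header=%t\n", f, name, isHeader)
	}
}
```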

## Tools

### Identify license

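`identify_license` is a command-line tool that identifies the license(s)
within a file. For each match, it reports the license name, the confidence
level, and the offset and extent of the matched text:
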
```shell
$ identify_license LICENSE
LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)
LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)
LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)
```
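
Scripts that consume `identify_license` output have to recover the fields from each line. The sketch below parses lines in the format of the sample output and orders the matches by offset. The `Match` type and `parseLine` function are hypothetical, and the regular expression is inferred from the sample lines only; the output format is not documented as stable. Reading `offset` and `extent` as the start and length of the matched text is likewise an assumption.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// Match is a hypothetical record for one line of identify_license output.
type Match struct {
	File       string
	License    string
	Confidence float64
	Offset     int // assumed: start of the matched text in the file
	Extent     int // assumed: length of the matched text
}

// linePattern is inferred from the sample output in this README.
var linePattern = regexp.MustCompile(
	`^(.+): (\S+) \(confidence: ([0-9.]+), offset: (\d+), extent: (\d+)\)$`)

// parseLine converts one output line into a Match.
func parseLine(line string) (Match, error) {
	m := linePattern.FindStringSubmatch(line)
	if m == nil {
		return Match{}, fmt.Errorf("unrecognized line: %q", line)
	}
	conf, err := strconv.ParseFloat(m[3], 64)
	if err != nil {
		return Match{}, err
	}
	offset, err := strconv.Atoi(m[4])
	if err != nil {
		return Match{}, err
	}
	extent, err := strconv.Atoi(m[5])
	if err != nil {
		return Match{}, err
	}
	return Match{File: m[1], License: m[2], Confidence: conf, Offset: offset, Extent: extent}, nil
}

func main() {
	output := []string{
		"LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)",
		"LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)",
		"LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)",
	}
	var matches []Match
	for _, line := range output {
		m, err := parseLine(line)
		if err != nil {
			fmt.Println(err)
			continue
		}
		matches = append(matches, m)
	}
	// The sample output is not in file order (MIT precedes LGPL-2.1 in the
	// file but is printed after it), so sort by offset.
	sort.Slice(matches, func(i, j int) bool { return matches[i].Offset < matches[j].Offset })
	for _, m := range matches {
		fmt.Printf("%s: [%d, %d) confidence %g\n", m.License, m.Offset, m.Offset+m.Extent, m.Confidence)
	}
}
```

If the offsets turn out to be byte offsets (this README does not say), slicing the file contents as `content[m.Offset : m.Offset+m.Extent]` would recover each matched text.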

### License serializer

The `license_serializer` tool regenerates the `licenses.db` archive. The archive
contains preprocessed license texts so that unknown texts can be compared
against them more quickly.

```shell
$ license_serializer -output licenseclassifier/licenses
```
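
Preprocessing the known licenses once, when the archive is built, leaves only the unknown text to be processed at classification time. The sketch below illustrates that idea under explicit assumptions: the normalization step (lower-casing and collapsing whitespace) and the use of Go's `encoding/gob` for the archive are hypothetical choices made for brevity. They do not describe the actual contents or on-disk format of `licenses.db`.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
	"strings"
)

// normalize is a hypothetical preprocessing step: lower-case the text and
// collapse every run of whitespace into a single space.
func normalize(text string) string {
	return strings.Join(strings.Fields(strings.ToLower(text)), " ")
}

// serialize normalizes each known license once and encodes the results
// into bytes that could be written to disk as an archive.
func serialize(licenses map[string]string) ([]byte, error) {
	normalized := make(map[string]string, len(licenses))
	for name, text := range licenses {
		normalized[name] = normalize(text)
	}
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(normalized); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// deserialize decodes an archive produced by serialize.
func deserialize(data []byte) (map[string]string, error) {
	var archive map[string]string
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(&archive); err != nil {
		return nil, err
	}
	return archive, nil
}

func main() {
	data, err := serialize(map[string]string{
		"MIT": "Permission is hereby granted,\n  free of charge",
	})
	if err != nil {
		panic(err)
	}
	archive, err := deserialize(data)
	if err != nil {
		panic(err)
	}
	// At classification time only the unknown text still needs normalizing.
	unknown := normalize("PERMISSION is hereby   granted, free of charge")
	fmt.Println(archive["MIT"] == unknown)
}
```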

----
This is not an official Google product (experimental or otherwise); it is just
code that happens to be owned by Google.