# License Classifier

[![Build status](https://travis-ci.org/google/licenseclassifier.svg?branch=master)](https://travis-ci.org/google/licenseclassifier)

## Introduction

The license classifier is a library and set of tools that analyze text to
determine what type of license it contains. It searches for license texts in a
file and compares them to an archive of known licenses. These files could be,
e.g., `LICENSE` files containing one or more licenses, or source code files
with the license text in a comment.

A "confidence level" is associated with each result, indicating how close the
match was. A confidence level of `1.0` indicates an exact match, while a
confidence level of `0.0` indicates that no license matched the text.
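
As a rough illustration of how a caller might act on confidence levels, the
sketch below keeps only results at or above a chosen threshold. The `Match`
type, the `FilterByConfidence` function, and the `0.9` threshold are
assumptions made for this example; they are not part of the library's API.

```go
package main

import "fmt"

// Match is a hypothetical stand-in for one classifier result.
// It is NOT the library's own type.
type Match struct {
	Name       string
	Confidence float64 // 0.0 (no match) through 1.0 (exact match)
}

// FilterByConfidence returns the matches whose confidence is at least
// threshold, preserving their original order.
func FilterByConfidence(matches []Match, threshold float64) []Match {
	var kept []Match
	for _, m := range matches {
		if m.Confidence >= threshold {
			kept = append(kept, m)
		}
	}
	return kept
}

func main() {
	// Made-up results, for illustration only.
	matches := []Match{
		{Name: "MIT", Confidence: 1.0},
		{Name: "BSD-3-Clause", Confidence: 0.62},
		{Name: "Apache-2.0", Confidence: 0.91},
	}
	for _, m := range FilterByConfidence(matches, 0.9) {
		fmt.Printf("%s (confidence: %v)\n", m.Name, m.Confidence)
	}
}
```

A suitable threshold depends on how much deviation from the reference text a
caller is willing to accept.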

## Adding a new license

Adding a new license is straightforward:

1.  Create a file in `licenses/`.

    *   The filename should be the name of the license or its abbreviation. If
        the license is an Open Source license, use the appropriate identifier
        specified at https://spdx.org/licenses/.
    *   If the file contains the "header" version of the license, append the
        suffix "`.header`" to the name. See `licenses/README.md` for more
        details.

2.  Add the license name to the list in `license_type.go`.

3.  Regenerate the `licenses.db` file by running the license serializer:

    ```shell
    $ license_serializer -output licenseclassifier/licenses
    ```

4.  Create and run appropriate tests to verify that the new license is indeed
    present.

## Tools

### Identify license

`identify_license` is a command line tool that can identify the license(s)
within a file.

```shell
$ identify_license LICENSE
LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)
LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)
LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)
```
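
If the output needs to be consumed by another program, lines in the format
shown above can be parsed with a regular expression. The following is a minimal
sketch that assumes the output always has exactly that shape; the `Result` type,
`ParseLine` function, and `linePattern` expression are hypothetical names
introduced here, not part of the tool or the library.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
)

// Result holds the fields of one identify_license output line.
// Hypothetical type for this example only.
type Result struct {
	File       string
	License    string
	Confidence float64
	Offset     int
	Extent     int
}

// linePattern assumes the line format shown in the sample output above.
var linePattern = regexp.MustCompile(
	`^(.+): (\S+) \(confidence: ([0-9.]+), offset: (\d+), extent: (\d+)\)$`)

// ParseLine converts one output line into a Result, or returns an error
// if the line does not match the assumed format.
func ParseLine(line string) (Result, error) {
	m := linePattern.FindStringSubmatch(line)
	if m == nil {
		return Result{}, fmt.Errorf("unrecognized line: %q", line)
	}
	conf, err := strconv.ParseFloat(m[3], 64)
	if err != nil {
		return Result{}, err
	}
	offset, err := strconv.Atoi(m[4])
	if err != nil {
		return Result{}, err
	}
	extent, err := strconv.Atoi(m[5])
	if err != nil {
		return Result{}, err
	}
	return Result{File: m[1], License: m[2], Confidence: conf, Offset: offset, Extent: extent}, nil
}

func main() {
	lines := []string{
		"LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)",
		"LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)",
		"LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)",
	}
	var results []Result
	for _, l := range lines {
		r, err := ParseLine(l)
		if err != nil {
			fmt.Println("skipping:", err)
			continue
		}
		results = append(results, r)
	}
	// The sample output is not ordered by position, so sort by offset.
	sort.Slice(results, func(i, j int) bool { return results[i].Offset < results[j].Offset })
	for _, r := range results {
		fmt.Printf("%s at offset %d (extent %d)\n", r.License, r.Offset, r.Extent)
	}
}
```

Sorting by offset lists the licenses in the order they appear in the file,
which the sample output does not guarantee.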

### License serializer

The `license_serializer` tool regenerates the `licenses.db` archive. The archive
contains preprocessed license texts for quicker comparisons against unknown
texts.

```shell
$ license_serializer -output licenseclassifier/licenses
```

----
This is not an official Google product (experimental or otherwise); it is just
code that happens to be owned by Google.