# License Classifier

[Build status](https://travis-ci.org/google/licenseclassifier)

## Introduction

The license classifier is a library and set of tools that can analyze text to
determine what type of license it contains. It searches for license texts in a
file and compares them to an archive of known licenses. These files could be,
for example, `LICENSE` files containing one or more licenses, or source code
files with the license text in a comment.

A "confidence level" is associated with each result, indicating how close the
match was. A confidence level of `1.0` indicates an exact match, while a
confidence level of `0.0` indicates that no license matched the text.

## Adding a new license

Adding a new license is straightforward:

1.  Create a file in `licenses/`.

    *   The filename should be the name of the license or its abbreviation. If
        the license is an Open Source license, use the appropriate identifier
        specified at https://spdx.org/licenses/.
    *   If the license is the "header" version of the license, append the
        suffix "`.header`" to it. See `licenses/README.md` for more details.

2.  Add the license name to the list in `license_type.go`.

3.  Regenerate the `licenses.db` file by running the license serializer:

    ```shell
    $ license_serializer -output licenseclassifier/licenses
    ```

4.  Create and run appropriate tests to verify that the new license is present
    and recognized.

## Tools

### Identify license

`identify_license` is a command-line tool that identifies the license(s)
within a file. For each match it reports the confidence level, the position
where the matched text starts (`offset`), and its length (`extent`).

```shell
$ identify_license LICENSE
LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)
LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)
LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)
```

### License serializer

The `license_serializer` tool regenerates the `licenses.db` archive. The archive
contains preprocessed license texts for quicker comparisons against unknown
texts.

```shell
$ license_serializer -output licenseclassifier/licenses
```

----
This is not an official Google product (experimental or otherwise); it is just
code that happens to be owned by Google.
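
The `identify_license` example under Tools prints one line per match, and the
matches are not necessarily listed in file order (MIT at offset 17255 appears
after LGPL-2.1 at offset 18366). When scripting around the CLI, it can help to
parse those lines, drop results below a confidence threshold, and sort by
offset. The sketch below assumes only the line format shown in that example;
`Match`, `parseMatches`, and the `0.8` threshold are hypothetical names and
values chosen for illustration, not part of the licenseclassifier API.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// Match is a hypothetical record for one line of identify_license output;
// its fields mirror the labels the tool prints.
type Match struct {
	File       string
	License    string
	Confidence float64
	Offset     int
	Extent     int
}

// linePattern assumes lines shaped like
// "LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)".
var linePattern = regexp.MustCompile(
	`^(.+): (\S+) \(confidence: ([0-9.]+), offset: (\d+), extent: (\d+)\)$`)

// parseMatches parses identify_license output, drops matches whose confidence
// is below minConfidence, and returns the rest ordered by offset.
func parseMatches(output string, minConfidence float64) ([]Match, error) {
	var matches []Match
	for _, line := range strings.Split(output, "\n") {
		line = strings.TrimSpace(line)
		if line == "" {
			continue // tolerate blank lines and a trailing newline
		}
		m := linePattern.FindStringSubmatch(line)
		if m == nil {
			return nil, fmt.Errorf("unrecognized line: %q", line)
		}
		conf, err := strconv.ParseFloat(m[3], 64)
		if err != nil {
			return nil, err
		}
		offset, err := strconv.Atoi(m[4])
		if err != nil {
			return nil, err
		}
		extent, err := strconv.Atoi(m[5])
		if err != nil {
			return nil, err
		}
		if conf >= minConfidence {
			matches = append(matches, Match{m[1], m[2], conf, offset, extent})
		}
	}
	// Order matches by where they start in the scanned file.
	sort.Slice(matches, func(i, j int) bool {
		return matches[i].Offset < matches[j].Offset
	})
	return matches, nil
}

func main() {
	// Sample output copied from the identify_license example.
	output := `LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)
LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)
LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)`

	matches, err := parseMatches(output, 0.8)
	if err != nil {
		panic(err)
	}
	for _, m := range matches {
		// Print the half-open range [offset, offset+extent) of each match.
		fmt.Printf("%s: %s [%d, %d)\n", m.File, m.License, m.Offset, m.Offset+m.Extent)
	}
}
```

Run against the sample output, this prints `GPL-2.0 [0, 14794)`, then
`MIT [17255, 18314)`, then `LGPL-2.1 [18366, 42195)`, each prefixed with
`LICENSE:`. Failing on an unrecognized line, rather than skipping it, is a
deliberate choice so that a change in the tool's output format is noticed
instead of silently dropping results.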