# License Classifier

[Travis CI build status](https://travis-ci.org/google/licenseclassifier)

## Introduction

The license classifier is a library and set of tools that can analyze text to
determine what type of license it contains. It searches for license texts in a
file and compares them to an archive of known licenses. These files could be,
for example, `LICENSE` files containing one or more licenses, or source code
files with the license text in a comment.

A "confidence level" is associated with each result, indicating how close the
match was. A confidence level of `1.0` indicates an exact match, while a
confidence level of `0.0` indicates that no license matched the text.

## Adding a new license

Adding a new license is straightforward:

1.  Create a file in `licenses/`.

    *   The filename should be the name of the license or its abbreviation. If
        the license is an Open Source license, use the appropriate identifier
        specified at https://spdx.org/licenses/.
    *   If the file contains the "header" version of the license, append the
        suffix "`.header`" to the filename. See `licenses/README.md` for more
        details.

2.  Add the license name to the list in `license_type.go`.

3.  Regenerate the `licenses.db` file by running the license serializer:

    ```shell
    $ license_serializer -output licenseclassifier/licenses
    ```

4.  Create and run appropriate tests to verify that the license is indeed
    present.

## Tools

### Identify license

`identify_license` is a command line tool that can identify the license(s)
within a file.

```shell
$ identify_license LICENSE
LICENSE: GPL-2.0 (confidence: 1, offset: 0, extent: 14794)
LICENSE: LGPL-2.1 (confidence: 1, offset: 18366, extent: 23829)
LICENSE: MIT (confidence: 1, offset: 17255, extent: 1059)
```

### License serializer

The `license_serializer` tool regenerates the `licenses.db` archive. The archive
contains preprocessed license texts for quicker comparisons against unknown
texts.

```shell
$ license_serializer -output licenseclassifier/licenses
```

----
This is not an official Google product (experimental or otherwise); it is just
code that happens to be owned by Google.
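
The confidence level reported by `identify_license` can be used to filter results with standard shell tools. The sketch below is not part of this project: `filter_matches` is a hypothetical helper, and it assumes each result line has exactly the `(confidence: N, offset: ..., extent: ...)` shape shown in the `identify_license` example, with the confidence printed as a plain decimal number. Lines that do not fit that shape are dropped.

```shell
# Hypothetical helper, not shipped with the license classifier.
# Reads identify_license output on stdin and prints only the lines whose
# confidence is at or above the given threshold (default: 0.9).
# Assumes the "(confidence: N," format from the example output.
filter_matches() {
  awk -v t="${1:-0.9}" '
    match($0, /\(confidence: [0-9.]+,/) {
      # "(confidence: " is 13 characters long; the trailing comma is 1.
      conf = substr($0, RSTART + 13, RLENGTH - 14)
      if (conf + 0 >= t + 0) print
    }'
}

# Example usage (assumes identify_license is on the PATH):
#   identify_license LICENSE | filter_matches 0.95
```

Keeping the filter as a separate pipeline stage leaves the tool's own output untouched, so the threshold can be changed without rerunning the classifier.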