Lines Matching full:dictionary
51 * Zstd dictionary builder
55 * Why should I use a dictionary?
62 * structure, you can train a dictionary ahead of time on some samples of
63 * these files. Then, zstd can use the dictionary to find repetitions that are
66 * When is a dictionary useful?
70 * The larger a file is, the less benefit a dictionary will have. Generally,
71 * we don't expect dictionary compression to be effective past 100KB. And the
72 * smaller a file is, the more we would expect the dictionary to help.
74 * How do I use a dictionary?
77 * Simply pass the dictionary to the zstd compressor with
78 * `ZSTD_CCtx_loadDictionary()`. The same dictionary must then be passed to
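As a rough illustration of this answer, here is a minimal sketch: the same dictionary is loaded into a compression context with `ZSTD_CCtx_loadDictionary()` and into a decompression context with its counterpart `ZSTD_DCtx_loadDictionary()` from zstd.h, which is what the line above is leading into. The helper names are invented for the example and error handling is deliberately simplified.

    #include <zstd.h>

    /* Sketch: compress one buffer using a dictionary already held in memory. */
    static size_t compress_with_dict(void* dst, size_t dstCapacity,
                                     const void* src, size_t srcSize,
                                     const void* dict, size_t dictSize)
    {
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        size_t ret;
        if (cctx == NULL) return (size_t)-1;  /* allocation failure; simplified error signal */
        ret = ZSTD_CCtx_loadDictionary(cctx, dict, dictSize);
        if (!ZSTD_isError(ret))
            ret = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
        ZSTD_freeCCtx(cctx);
        return ret;  /* compressed size, or an error code (test with ZSTD_isError()) */
    }

    /* Sketch: decompress a frame that was compressed with that same dictionary. */
    static size_t decompress_with_dict(void* dst, size_t dstCapacity,
                                       const void* src, size_t srcSize,
                                       const void* dict, size_t dictSize)
    {
        ZSTD_DCtx* const dctx = ZSTD_createDCtx();
        size_t ret;
        if (dctx == NULL) return (size_t)-1;
        ret = ZSTD_DCtx_loadDictionary(dctx, dict, dictSize);
        if (!ZSTD_isError(ret))
            ret = ZSTD_decompressDCtx(dctx, dst, dstCapacity, src, srcSize);
        ZSTD_freeDCtx(dctx);
        return ret;
    }

In practice you would create the contexts once, load the dictionary once, and reuse them across many small frames, since reloading the dictionary for every frame wastes time.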
83 * What is a zstd dictionary?
86 * A zstd dictionary has two pieces: its header and its content. The header
87 * contains a magic number, the dictionary ID, and entropy tables. These
92 * What is a raw content dictionary?
95 * A raw content dictionary is just bytes. It doesn't have a zstd dictionary
96 * header, a dictionary ID, or entropy tables. Any buffer is a valid raw
97 * content dictionary.
99 * How do I train a dictionary?
103 * other. If you have several use cases, you could try to train one dictionary
107 * dictionary. There are a few advanced versions of this function, but this
108 * is a great starting point. If you want to further tune your dictionary
112 * If the dictionary training function fails, that is likely because you
113 * either passed too few samples, or a dictionary would not be effective
114 * for your data. Look at the messages that the dictionary trainer printed;
115 * if it doesn't say too few samples, then a dictionary would not be effective.
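As a sketch of this step: the basic trainer referred to here is `ZDICT_trainFromBuffer()` from zdict.h, which expects all samples concatenated into one flat buffer plus an array giving each sample's size, in order. The packing helper below and its name are illustrative only.

    #include <stdlib.h>
    #include <string.h>
    #include <zdict.h>

    /* Sketch: concatenate in-memory samples and train a dictionary from them. */
    static size_t train_dictionary(void* dictBuffer, size_t dictBufferCapacity,
                                   const void* const* samples, const size_t* sampleSizes,
                                   unsigned nbSamples)
    {
        size_t totalSize = 0;
        size_t offset = 0;
        size_t dictSize;
        unsigned i;
        char* flat;

        for (i = 0; i < nbSamples; ++i) totalSize += sampleSizes[i];
        flat = (char*)malloc(totalSize ? totalSize : 1);
        if (flat == NULL) return (size_t)-1;  /* simplified error signal for this sketch */

        for (i = 0; i < nbSamples; ++i) {
            memcpy(flat + offset, samples[i], sampleSizes[i]);
            offset += sampleSizes[i];
        }

        /* Trains the dictionary; on failure the return value is an error code,
         * which can be tested with ZDICT_isError() and described with
         * ZDICT_getErrorName(). */
        dictSize = ZDICT_trainFromBuffer(dictBuffer, dictBufferCapacity,
                                         flat, sampleSizes, nbSamples);
        free(flat);
        return dictSize;
    }

A `dictBufferCapacity` of around 100-110KB is the reasonable default discussed in the next question.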
117 * How large should my dictionary be?
120 * A reasonable dictionary size, the `dictBufferCapacity`, is about 100KB.
121 * The zstd CLI defaults to a 110KB dictionary. You likely don't need a
122 * dictionary larger than that. But most use cases can get away with a
123 * smaller dictionary. The advanced dictionary builders can automatically
124 * shrink the dictionary for you, and select the smallest size that doesn't
126 * A smaller dictionary can save memory, and potentially speed up
129 * How many samples should I provide to the dictionary builder?
132 * We generally recommend passing ~100x the size of the dictionary
136 * samples can slow down the dictionary builder.
138 * How do I determine if a dictionary will be effective?
141 * Simply train a dictionary and try it out. You can use zstd's built-in
142 * benchmarking tool to test the dictionary's effectiveness.
144 * # Benchmark levels 1-3 without a dictionary
145 * zstd -b1e3 -r /path/to/my/files
146 * # Benchmark levels 1-3 with a dictionary
147 * zstd -b1e3 -r /path/to/my/files -D /path/to/my/dictionary
149 * When should I retrain a dictionary?
152 * You should retrain a dictionary when its effectiveness drops. Dictionary
156 * retrain dictionaries, and if the new dictionary performs significantly
157 * better than the old dictionary, we will ship the new dictionary.
159 * I have a raw content dictionary, how do I turn it into a zstd dictionary?
162 * If you have a raw content dictionary, e.g. by manually constructing it, or
163 * using a third-party dictionary builder, you can turn it into a zstd
164 * dictionary by using `ZDICT_finalizeDictionary()`. You'll also have to
166 * raw content, which contains a dictionary ID and entropy tables, which
167 * will improve compression ratio, and allow zstd to write the dictionary ID
170 * Do I have to use zstd's dictionary builder?
173 * No! You can construct dictionary content however you please; it is just
174 * bytes. It will always be valid as a raw content dictionary. If you want
175 * a zstd dictionary, which can improve compression ratio, use
178 * What is the attack surface of a zstd dictionary?
183 * the dictionary is. However, if an attacker can control the dictionary
191 * Train a dictionary from an array of samples.
196 * The resulting dictionary will be saved into `dictBuffer`.
197 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
199 * Note: Dictionary training will fail if there are not enough samples to construct a
200 * dictionary, or if most of the samples are too small (< 8 bytes being the lower limit).
201 * If dictionary training fails, you should use zstd without a dictionary, as the dictionary
202 … would've been ineffective anyway. If you believe your samples would benefit from a dictionary
205 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
208 …It's recommended that the total size of all samples be about 100x the target size of the dictionary.
218 … * NOTE: The zstd format reserves some dictionary IDs for future use.
220 … * may be used by zstd in a public dictionary registry in the future.
221 * These dictionary IDs are:
228 * Given custom content as a basis for the dictionary, and a set of samples,
229 * finalize the dictionary by adding headers and statistics according to the zstd
230 * dictionary format.
235 * should be representative of what you will compress with this dictionary.
239 * compression level differ, so tuning the dictionary for the compression level
242 * You can set an explicit dictionary ID in `parameters`, or allow us to pick
243 * a random dictionary ID for you, but we can't guarantee no collisions.
248 * is presumed that the most profitable content is at the end of the dictionary,
253 * @return: size of dictionary stored into `dstDictBuffer` (<= `maxDictSize`),
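As a sketch of the call documented here (the wrapper name and the parameter choices are just examples): zero-initializing `ZDICT_params_t` gives the defaults, `compressionLevel` tunes the entropy tables for the level you plan to compress at, and `dictID = 0` lets zstd pick a dictionary ID for you as described above.

    #include <string.h>
    #include <zdict.h>

    /* Sketch: wrap hand-built raw content into a finished zstd dictionary. */
    static size_t finalize_raw_dictionary(void* dstDictBuffer, size_t maxDictSize,
                                          const void* rawContent, size_t rawContentSize,
                                          const void* samplesBuffer,
                                          const size_t* samplesSizes, unsigned nbSamples)
    {
        ZDICT_params_t params;
        memset(&params, 0, sizeof(params));
        params.compressionLevel = 3;  /* tune entropy tables for the level you'll use; 0 = default */
        params.dictID = 0;            /* 0 = let zstd pick a dictionary ID */

        /* Adds the zstd dictionary header (magic number, dictID, entropy tables)
         * in front of the raw content; the samples are needed to build the
         * entropy tables. */
        return ZDICT_finalizeDictionary(dstDictBuffer, maxDictSize,
                                        rawContent, rawContentSize,
                                        samplesBuffer, samplesSizes, nbSamples,
                                        params);
    }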
269 …tBuffer, size_t dictSize); /**< extracts dictID; @return zero if error (not a valid dictionary) */
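The truncated prototype above is presumably `ZDICT_getDictID()`. For illustration, here is a tiny sketch that reads the ID back out of a finished dictionary and, using `ZSTD_getDictID_fromFrame()` from zstd.h, checks which dictionary ID a compressed frame declares:

    #include <stdio.h>
    #include <zdict.h>
    #include <zstd.h>

    /* Sketch: inspect dictionary IDs on both the dictionary and a compressed frame. */
    static void print_dict_ids(const void* dict, size_t dictSize,
                               const void* frame, size_t frameSize)
    {
        unsigned const dictID  = ZDICT_getDictID(dict, dictSize);            /* 0 => not a valid zstd dictionary */
        unsigned const frameID = ZSTD_getDictID_fromFrame(frame, frameSize); /* 0 => none recorded / not decodable */
        printf("dictionary ID: %u, frame declares dictionary ID: %u\n", dictID, frameID);
    }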
311 …minimum size and selects the smallest dictionary that is shrinkDictMaxRegression% worse than the l…
312 …axRegression so that a smaller dictionary can be at worst shrinkDictMaxRegression% worse than the …
324 …minimum size and selects the smallest dictionary that is shrinkDictMaxRegression% worse than the l…
325 …axRegression so that a smaller dictionary can be at worst shrinkDictMaxRegression% worse than the …
331 * Train a dictionary from an array of samples using the COVER algorithm.
334 * The resulting dictionary will be saved into `dictBuffer`.
335 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
339 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
342 …It's recommended that the total size of all samples be about 100x the target size of the dictionary.
353 * dictionary constructed with those parameters is stored in `dictBuffer`.
360 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
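As a sketch of the optimizing COVER trainer just described: leaving k and d at 0 makes the trainer search for good values, and the `shrinkDict` fields enable the automatic size shrinking mentioned in the parameter comments above. These entry points sit in the static-only section of zdict.h, hence the ZDICT_STATIC_LINKING_ONLY define; all names and values below are illustrative, not tuned recommendations.

    #define ZDICT_STATIC_LINKING_ONLY
    #include <string.h>
    #include <zdict.h>

    /* Sketch: run the COVER parameter search and keep the winning dictionary. */
    static size_t train_cover(void* dictBuffer, size_t dictBufferCapacity,
                              const void* samplesBuffer,
                              const size_t* samplesSizes, unsigned nbSamples)
    {
        ZDICT_cover_params_t params;
        memset(&params, 0, sizeof(params));
        params.k = 0;                       /* 0: let the trainer search over segment sizes */
        params.d = 0;                       /* 0: let the trainer search over dmer sizes */
        params.steps = 40;                  /* how many parameter candidates to try (40 is the default) */
        params.nbThreads = 1;
        params.shrinkDict = 1;              /* also try smaller dictionary sizes... */
        params.shrinkDictMaxRegression = 1; /* ...accepting at most 1% regression vs. the largest size */

        /* On success, params is updated with the winning k and d. */
        return ZDICT_optimizeTrainFromBuffer_cover(dictBuffer, dictBufferCapacity,
                                                   samplesBuffer, samplesSizes, nbSamples,
                                                   &params);
    }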
372 * Train a dictionary from an array of samples using a modified version of the COVER algorithm.
377 * The resulting dictionary will be saved into `dictBuffer`.
378 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
382 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
385 …It's recommended that the total size of all samples be about 100x the target size of the dictionary.
396 * dictionary constructed with those parameters is stored in `dictBuffer`.
404 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
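And a corresponding sketch for the fastCover variant, this time with explicit parameters rather than a parameter search; again the values are plausible picks rather than tuned recommendations, and the function also lives in the static-only section of zdict.h.

    #define ZDICT_STATIC_LINKING_ONLY
    #include <string.h>
    #include <zdict.h>

    /* Sketch: one fastCover training run with fixed parameters (no search). */
    static size_t train_fastcover(void* dictBuffer, size_t dictBufferCapacity,
                                  const void* samplesBuffer,
                                  const size_t* samplesSizes, unsigned nbSamples)
    {
        ZDICT_fastCover_params_t params;
        memset(&params, 0, sizeof(params));
        params.k = 200;   /* segment size */
        params.d = 8;     /* dmer size (fastCover typically uses 6 or 8) */
        params.f = 20;    /* log2 of the frequency array size */
        params.accel = 1; /* 1 = slowest / most accurate; larger trades accuracy for speed */

        return ZDICT_trainFromBuffer_fastCover(dictBuffer, dictBufferCapacity,
                                               samplesBuffer, samplesSizes, nbSamples,
                                               params);
    }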
416 unsigned selectivityLevel; /* 0 means default; larger => select more => larger dictionary */
421 * Train a dictionary from an array of samples.
424 * The resulting dictionary will be saved into `dictBuffer`.
426 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
429 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
432 …It's recommended that the total size of all samples be about 100x the target size of the dictionary.