Lines Matching full:dictionary
51 * Zstd dictionary builder
55 * Why should I use a dictionary?
62 * structure, you can train a dictionary ahead of time on some samples of
63 * these files. Then, zstd can use the dictionary to find repetitions that are
66 * When is a dictionary useful?
70 * The larger a file is, the less benefit a dictionary will have. Generally,
71 * we don't expect dictionary compression to be effective past 100KB. And the
72 * smaller a file is, the more we would expect the dictionary to help.
74 * How do I use a dictionary?
77 * Simply pass the dictionary to the zstd compressor with
78 * `ZSTD_CCtx_loadDictionary()`. The same dictionary must then be passed to
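As a rough illustration of this answer, here is a minimal sketch: the same dictionary is loaded into a compression context with `ZSTD_CCtx_loadDictionary()` and into a decompression context with its counterpart `ZSTD_DCtx_loadDictionary()` from zstd.h, which is what the line above is leading into. The helper names are invented for the example and error handling is deliberately simplified.

    #include <zstd.h>

    /* Sketch: compress one buffer using a dictionary already held in memory. */
    static size_t compress_with_dict(void* dst, size_t dstCapacity,
                                     const void* src, size_t srcSize,
                                     const void* dict, size_t dictSize)
    {
        ZSTD_CCtx* const cctx = ZSTD_createCCtx();
        size_t ret;
        if (cctx == NULL) return (size_t)-1;  /* allocation failure; simplified error signal */
        ret = ZSTD_CCtx_loadDictionary(cctx, dict, dictSize);
        if (!ZSTD_isError(ret))
            ret = ZSTD_compress2(cctx, dst, dstCapacity, src, srcSize);
        ZSTD_freeCCtx(cctx);
        return ret;  /* compressed size, or an error code (test with ZSTD_isError()) */
    }

    /* Sketch: decompress a frame that was compressed with that same dictionary. */
    static size_t decompress_with_dict(void* dst, size_t dstCapacity,
                                       const void* src, size_t srcSize,
                                       const void* dict, size_t dictSize)
    {
        ZSTD_DCtx* const dctx = ZSTD_createDCtx();
        size_t ret;
        if (dctx == NULL) return (size_t)-1;
        ret = ZSTD_DCtx_loadDictionary(dctx, dict, dictSize);
        if (!ZSTD_isError(ret))
            ret = ZSTD_decompressDCtx(dctx, dst, dstCapacity, src, srcSize);
        ZSTD_freeDCtx(dctx);
        return ret;
    }

In practice you would create the contexts once, load the dictionary once, and reuse them across many small frames, since reloading the dictionary for every frame wastes time.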
83 * What is a zstd dictionary?
86 * A zstd dictionary has two pieces: its header and its content. The header
87 * contains a magic number, the dictionary ID, and entropy tables. These
92 * What is a raw content dictionary?
95 * A raw content dictionary is just bytes. It doesn't have a zstd dictionary
96 * header, a dictionary ID, or entropy tables. Any buffer is a valid raw
97 * content dictionary.
99 * How do I train a dictionary?
103 * other. If you have several use cases, you could try to train one dictionary
107 * dictionary. There are a few advanced versions of this function, but this
108 * is a great starting point. If you want to further tune your dictionary
112 * If the dictionary training function fails, that is likely because you
113 * either passed too few samples, or a dictionary would not be effective
114 * for your data. Look at the messages that the dictionary trainer printed;
115 * if it doesn't say too few samples, then a dictionary would not be effective.
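As a sketch of this step: the basic trainer referred to here is `ZDICT_trainFromBuffer()` from zdict.h, which expects all samples concatenated into one flat buffer plus an array giving each sample's size, in order. The packing helper below and its name are illustrative only.

    #include <stdlib.h>
    #include <string.h>
    #include <zdict.h>

    /* Sketch: concatenate in-memory samples and train a dictionary from them. */
    static size_t train_dictionary(void* dictBuffer, size_t dictBufferCapacity,
                                   const void* const* samples, const size_t* sampleSizes,
                                   unsigned nbSamples)
    {
        size_t totalSize = 0;
        size_t offset = 0;
        size_t dictSize;
        unsigned i;
        char* flat;

        for (i = 0; i < nbSamples; ++i) totalSize += sampleSizes[i];
        flat = (char*)malloc(totalSize ? totalSize : 1);
        if (flat == NULL) return (size_t)-1;  /* simplified error signal for this sketch */

        for (i = 0; i < nbSamples; ++i) {
            memcpy(flat + offset, samples[i], sampleSizes[i]);
            offset += sampleSizes[i];
        }

        /* Trains the dictionary; on failure the return value is an error code,
         * which can be tested with ZDICT_isError() and described with
         * ZDICT_getErrorName(). */
        dictSize = ZDICT_trainFromBuffer(dictBuffer, dictBufferCapacity,
                                         flat, sampleSizes, nbSamples);
        free(flat);
        return dictSize;
    }

A `dictBufferCapacity` of around 100-110KB is the reasonable default discussed in the next question.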
117 * How large should my dictionary be?
120 * A reasonable dictionary size, the `dictBufferCapacity`, is about 100KB.
121 * The zstd CLI defaults to a 110KB dictionary. You likely don't need a
122 * dictionary larger than that. But most use cases can get away with a
123 * smaller dictionary. The advanced dictionary builders can automatically
124 * shrink the dictionary for you, and select the smallest size that doesn't
126 * A smaller dictionary can save memory, and potentially speed up
129 * How many samples should I provide to the dictionary builder?
132 * We generally recommend passing ~100x the size of the dictionary
136 * samples can slow down the dictionary builder.
138 * How do I determine if a dictionary will be effective?
141 * Simply train a dictionary and try it out. You can use zstd's built-in
142 * benchmarking tool to test the dictionary's effectiveness.
144 * # Benchmark levels 1-3 without a dictionary
145 * zstd -b1e3 -r /path/to/my/files
146 * # Benchmark levels 1-3 with a dictionary
147 * zstd -b1e3 -r /path/to/my/files -D /path/to/my/dictionary
149 * When should I retrain a dictionary?
152 * You should retrain a dictionary when its effectiveness drops. Dictionary
156 * retrain dictionaries, and if the new dictionary performs significantly
157 * better than the old dictionary, we will ship the new dictionary.
159 * I have a raw content dictionary, how do I turn it into a zstd dictionary?
162 * If you have a raw content dictionary, e.g. by manually constructing it, or
163 * using a third-party dictionary builder, you can turn it into a zstd
164 * dictionary by using `ZDICT_finalizeDictionary()`. You'll also have to
166 * raw content, which contains a dictionary ID and entropy tables, which
167 * will improve compression ratio, and allow zstd to write the dictionary ID
170 * Do I have to use zstd's dictionary builder?
173 * No! You can construct dictionary content however you please; it is just
174 * bytes. It will always be valid as a raw content dictionary. If you want
175 * a zstd dictionary, which can improve compression ratio, use
178 * What is the attack surface of a zstd dictionary?
183 * the dictionary is. However, if an attacker can control the dictionary
191 * Train a dictionary from an array of samples.
196 * The resulting dictionary will be saved into `dictBuffer`.
197 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
199 * Note: Dictionary training will fail if there are not enough samples to construct a
200 * dictionary, or if most of the samples are too small (< 8 bytes being the lower limit).
201 * If dictionary training fails, you should use zstd without a dictionary, as the dictionary
202 … would've been ineffective anyway. If you believe your samples would benefit from a dictionary
205 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
208 …It's recommended that the total size of all samples be about 100x the target size of the dictionary.
218 … * NOTE: The zstd format reserves some dictionary IDs for future use.
220 … * may be used by zstd in a public dictionary registry in the future.
221 * These dictionary IDs are:
228 * Given custom content as a basis for the dictionary, and a set of samples,
229 * finalize the dictionary by adding headers and statistics according to the zstd
230 * dictionary format.
235 * should be representative of what you will compress with this dictionary.
239 * compression level differ, so tuning the dictionary for the compression level
242 * You can set an explicit dictionary ID in `parameters`, or allow us to pick
243 * a random dictionary ID for you, but we can't guarantee no collisions.
248 * is presumed that the most profitable content is at the end of the dictionary,
253 * @return: size of dictionary stored into `dstDictBuffer` (<= `maxDictSize`),
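As a sketch of the call documented here (the wrapper name and the parameter choices are just examples): zero-initializing `ZDICT_params_t` gives the defaults, `compressionLevel` tunes the entropy tables for the level you plan to compress at, and `dictID = 0` lets zstd pick a dictionary ID for you as described above.

    #include <string.h>
    #include <zdict.h>

    /* Sketch: wrap hand-built raw content into a finished zstd dictionary. */
    static size_t finalize_raw_dictionary(void* dstDictBuffer, size_t maxDictSize,
                                          const void* rawContent, size_t rawContentSize,
                                          const void* samplesBuffer,
                                          const size_t* samplesSizes, unsigned nbSamples)
    {
        ZDICT_params_t params;
        memset(&params, 0, sizeof(params));
        params.compressionLevel = 3;  /* tune entropy tables for the level you'll use; 0 = default */
        params.dictID = 0;            /* 0 = let zstd pick a dictionary ID */

        /* Adds the zstd dictionary header (magic number, dictID, entropy tables)
         * in front of the raw content; the samples are needed to build the
         * entropy tables. */
        return ZDICT_finalizeDictionary(dstDictBuffer, maxDictSize,
                                        rawContent, rawContentSize,
                                        samplesBuffer, samplesSizes, nbSamples,
                                        params);
    }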
269 …tBuffer, size_t dictSize); /**< extracts dictID; @return zero if error (not a valid dictionary) */
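The truncated prototype above is presumably `ZDICT_getDictID()`. For illustration, here is a tiny sketch that reads the ID back out of a finished dictionary and, using `ZSTD_getDictID_fromFrame()` from zstd.h, checks which dictionary ID a compressed frame declares:

    #include <stdio.h>
    #include <zdict.h>
    #include <zstd.h>

    /* Sketch: inspect dictionary IDs on both the dictionary and a compressed frame. */
    static void print_dict_ids(const void* dict, size_t dictSize,
                               const void* frame, size_t frameSize)
    {
        unsigned const dictID  = ZDICT_getDictID(dict, dictSize);            /* 0 => not a valid zstd dictionary */
        unsigned const frameID = ZSTD_getDictID_fromFrame(frame, frameSize); /* 0 => none recorded / not decodable */
        printf("dictionary ID: %u, frame declares dictionary ID: %u\n", dictID, frameID);
    }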
311 …minimum size and selects the smallest dictionary that is shrinkDictMaxRegression% worse than the l…
312 …axRegression so that a smaller dictionary can be at worst shrinkDictMaxRegression% worse than the …
324 …minimum size and selects the smallest dictionary that is shrinkDictMaxRegression% worse than the l…
325 …axRegression so that a smaller dictionary can be at worst shrinkDictMaxRegression% worse than the …
331 * Train a dictionary from an array of samples using the COVER algorithm.
334 * The resulting dictionary will be saved into `dictBuffer`.
335 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
339 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
342 …It's recommended that the total size of all samples be about 100x the target size of the dictionary.
353 * dictionary constructed with those parameters is stored in `dictBuffer`.
360 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
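As a sketch of the optimizing COVER trainer just described: leaving k and d at 0 makes the trainer search for good values, and the `shrinkDict` fields enable the automatic size shrinking mentioned in the parameter comments above. These entry points sit in the static-only section of zdict.h, hence the ZDICT_STATIC_LINKING_ONLY define; all names and values below are illustrative, not tuned recommendations.

    #define ZDICT_STATIC_LINKING_ONLY
    #include <string.h>
    #include <zdict.h>

    /* Sketch: run the COVER parameter search and keep the winning dictionary. */
    static size_t train_cover(void* dictBuffer, size_t dictBufferCapacity,
                              const void* samplesBuffer,
                              const size_t* samplesSizes, unsigned nbSamples)
    {
        ZDICT_cover_params_t params;
        memset(&params, 0, sizeof(params));
        params.k = 0;                       /* 0: let the trainer search over segment sizes */
        params.d = 0;                       /* 0: let the trainer search over dmer sizes */
        params.steps = 40;                  /* how many parameter candidates to try (40 is the default) */
        params.nbThreads = 1;
        params.shrinkDict = 1;              /* also try smaller dictionary sizes... */
        params.shrinkDictMaxRegression = 1; /* ...accepting at most 1% regression vs. the largest size */

        /* On success, params is updated with the winning k and d. */
        return ZDICT_optimizeTrainFromBuffer_cover(dictBuffer, dictBufferCapacity,
                                                   samplesBuffer, samplesSizes, nbSamples,
                                                   &params);
    }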
372 * Train a dictionary from an array of samples using a modified version of the COVER algorithm.
377 * The resulting dictionary will be saved into `dictBuffer`.
378 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
382 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
385 …It's recommended that the total size of all samples be about 100x the target size of the dictionary.
396 * dictionary constructed with those parameters is stored in `dictBuffer`.
404 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
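And a corresponding sketch for the fastCover variant, this time with explicit parameters rather than a parameter search; again the values are plausible picks rather than tuned recommendations, and the function also lives in the static-only section of zdict.h.

    #define ZDICT_STATIC_LINKING_ONLY
    #include <string.h>
    #include <zdict.h>

    /* Sketch: one fastCover training run with fixed parameters (no search). */
    static size_t train_fastcover(void* dictBuffer, size_t dictBufferCapacity,
                                  const void* samplesBuffer,
                                  const size_t* samplesSizes, unsigned nbSamples)
    {
        ZDICT_fastCover_params_t params;
        memset(&params, 0, sizeof(params));
        params.k = 200;   /* segment size */
        params.d = 8;     /* dmer size (fastCover typically uses 6 or 8) */
        params.f = 20;    /* log2 of the frequency array size */
        params.accel = 1; /* 1 = slowest / most accurate; larger trades accuracy for speed */

        return ZDICT_trainFromBuffer_fastCover(dictBuffer, dictBufferCapacity,
                                               samplesBuffer, samplesSizes, nbSamples,
                                               params);
    }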
416 unsigned selectivityLevel; /* 0 means default; larger => select more => larger dictionary */
421 * Train a dictionary from an array of samples.
424 * The resulting dictionary will be saved into `dictBuffer`.
426 * @return: size of dictionary stored into `dictBuffer` (<= `dictBufferCapacity`)
429 * Tips: In general, a reasonable dictionary has a size of ~ 100 KB.
432 …It's recommended that the total size of all samples be about 100x the target size of the dictionary.