1 /*! 2 This library provides heavily optimized routines for string search primitives. 3 4 # Overview 5 6 This section gives a brief high level overview of what this crate offers. 7 8 * The top-level module provides routines for searching for 1, 2 or 3 bytes 9 in the forward or reverse direction. When searching for more than one byte, 10 positions are considered a match if the byte at that position matches any 11 of the bytes. 12 * The [`memmem`] sub-module provides forward and reverse substring search 13 routines. 14 15 In all such cases, routines operate on `&[u8]` without regard to encoding. This 16 is exactly what you want when searching either UTF-8 or arbitrary bytes. 17 18 # Example: using `memchr` 19 20 This example shows how to use `memchr` to find the first occurrence of `z` in 21 a haystack: 22 23 ``` 24 use memchr::memchr; 25 26 let haystack = b"foo bar baz quuz"; 27 assert_eq!(Some(10), memchr(b'z', haystack)); 28 ``` 29 30 # Example: matching one of three possible bytes 31 32 This examples shows how to use `memrchr3` to find occurrences of `a`, `b` or 33 `c`, starting at the end of the haystack. 34 35 ``` 36 use memchr::memchr3_iter; 37 38 let haystack = b"xyzaxyzbxyzc"; 39 40 let mut it = memchr3_iter(b'a', b'b', b'c', haystack).rev(); 41 assert_eq!(Some(11), it.next()); 42 assert_eq!(Some(7), it.next()); 43 assert_eq!(Some(3), it.next()); 44 assert_eq!(None, it.next()); 45 ``` 46 47 # Example: iterating over substring matches 48 49 This example shows how to use the [`memmem`] sub-module to find occurrences of 50 a substring in a haystack. 51 52 ``` 53 use memchr::memmem; 54 55 let haystack = b"foo bar foo baz foo"; 56 57 let mut it = memmem::find_iter(haystack, "foo"); 58 assert_eq!(Some(0), it.next()); 59 assert_eq!(Some(8), it.next()); 60 assert_eq!(Some(16), it.next()); 61 assert_eq!(None, it.next()); 62 ``` 63 64 # Example: repeating a search for the same needle 65 66 It may be possible for the overhead of constructing a substring searcher to be 67 measurable in some workloads. In cases where the same needle is used to search 68 many haystacks, it is possible to do construction once and thus to avoid it for 69 subsequent searches. This can be done with a [`memmem::Finder`]: 70 71 ``` 72 use memchr::memmem; 73 74 let finder = memmem::Finder::new("foo"); 75 76 assert_eq!(Some(4), finder.find(b"baz foo quux")); 77 assert_eq!(None, finder.find(b"quux baz bar")); 78 ``` 79 80 # Why use this crate? 81 82 At first glance, the APIs provided by this crate might seem weird. Why provide 83 a dedicated routine like `memchr` for something that could be implemented 84 clearly and trivially in one line: 85 86 ``` 87 fn memchr(needle: u8, haystack: &[u8]) -> Option<usize> { 88 haystack.iter().position(|&b| b == needle) 89 } 90 ``` 91 92 Or similarly, why does this crate provide substring search routines when Rust's 93 core library already provides them? 94 95 ``` 96 fn search(haystack: &str, needle: &str) -> Option<usize> { 97 haystack.find(needle) 98 } 99 ``` 100 101 The primary reason for both of them to exist is performance. When it comes to 102 performance, at a high level at least, there are two primary ways to look at 103 it: 104 105 * **Throughput**: For this, think about it as, "given some very large haystack 106 and a byte that never occurs in that haystack, how long does it take to 107 search through it and determine that it, in fact, does not occur?" 108 * **Latency**: For this, think about it as, "given a tiny haystack---just a 109 few bytes---how long does it take to determine if a byte is in it?" 110 111 The `memchr` routine in this crate has _slightly_ worse latency than the 112 solution presented above, however, its throughput can easily be over an 113 order of magnitude faster. This is a good general purpose trade off to make. 114 You rarely lose, but often gain big. 115 116 **NOTE:** The name `memchr` comes from the corresponding routine in `libc`. A 117 key advantage of using this library is that its performance is not tied to its 118 quality of implementation in the `libc` you happen to be using, which can vary 119 greatly from platform to platform. 120 121 But what about substring search? This one is a bit more complicated. The 122 primary reason for its existence is still indeed performance, but it's also 123 useful because Rust's core library doesn't actually expose any substring 124 search routine on arbitrary bytes. The only substring search routine that 125 exists works exclusively on valid UTF-8. 126 127 So if you have valid UTF-8, is there a reason to use this over the standard 128 library substring search routine? Yes. This routine is faster on almost every 129 metric, including latency. The natural question then, is why isn't this 130 implementation in the standard library, even if only for searching on UTF-8? 131 The reason is that the implementation details for using SIMD in the standard 132 library haven't quite been worked out yet. 133 134 **NOTE:** Currently, only `x86_64`, `wasm32` and `aarch64` targets have vector 135 accelerated implementations of `memchr` (and friends) and `memmem`. 136 137 # Crate features 138 139 * **std** - When enabled (the default), this will permit features specific to 140 the standard library. Currently, the only thing used from the standard library 141 is runtime SIMD CPU feature detection. This means that this feature must be 142 enabled to get AVX2 accelerated routines on `x86_64` targets without enabling 143 the `avx2` feature at compile time, for example. When `std` is not enabled, 144 this crate will still attempt to use SSE2 accelerated routines on `x86_64`. It 145 will also use AVX2 accelerated routines when the `avx2` feature is enabled at 146 compile time. In general, enable this feature if you can. 147 * **alloc** - When enabled (the default), APIs in this crate requiring some 148 kind of allocation will become available. For example, the 149 [`memmem::Finder::into_owned`](crate::memmem::Finder::into_owned) API and the 150 [`arch::all::shiftor`](crate::arch::all::shiftor) substring search 151 implementation. Otherwise, this crate is designed from the ground up to be 152 usable in core-only contexts, so the `alloc` feature doesn't add much 153 currently. Notably, disabling `std` but enabling `alloc` will **not** result 154 in the use of AVX2 on `x86_64` targets unless the `avx2` feature is enabled 155 at compile time. (With `std` enabled, AVX2 can be used even without the `avx2` 156 feature enabled at compile time by way of runtime CPU feature detection.) 157 * **logging** - When enabled (disabled by default), the `log` crate is used 158 to emit log messages about what kinds of `memchr` and `memmem` algorithms 159 are used. Namely, both `memchr` and `memmem` have a number of different 160 implementation choices depending on the target and CPU, and the log messages 161 can help show what specific implementations are being used. Generally, this is 162 useful for debugging performance issues. 163 * **libc** - **DEPRECATED**. Previously, this enabled the use of the target's 164 `memchr` function from whatever `libc` was linked into the program. This 165 feature is now a no-op because this crate's implementation of `memchr` should 166 now be sufficiently fast on a number of platforms that `libc` should no longer 167 be needed. (This feature is somewhat of a holdover from this crate's origins. 168 Originally, this crate was literally just a safe wrapper function around the 169 `memchr` function from `libc`.) 170 */ 171 172 #![deny(missing_docs)] 173 #![no_std] 174 // It's just not worth trying to squash all dead code warnings. Pretty 175 // unfortunate IMO. Not really sure how to fix this other than to either 176 // live with it or sprinkle a whole mess of `cfg` annotations everywhere. 177 #![cfg_attr( 178 not(any( 179 all(target_arch = "x86_64", target_feature = "sse2"), 180 all(target_arch = "wasm32", target_feature = "simd128"), 181 target_arch = "aarch64", 182 )), 183 allow(dead_code) 184 )] 185 // Same deal for miri. 186 #![cfg_attr(miri, allow(dead_code, unused_macros))] 187 188 // Supporting 8-bit (or others) would be fine. If you need it, please submit a 189 // bug report at https://github.com/BurntSushi/memchr 190 #[cfg(not(any( 191 target_pointer_width = "16", 192 target_pointer_width = "32", 193 target_pointer_width = "64" 194 )))] 195 compile_error!("memchr currently not supported on non-{16,32,64}"); 196 197 #[cfg(any(test, feature = "std"))] 198 extern crate std; 199 200 #[cfg(any(test, feature = "alloc"))] 201 extern crate alloc; 202 203 pub use crate::memchr::{ 204 memchr, memchr2, memchr2_iter, memchr3, memchr3_iter, memchr_iter, 205 memrchr, memrchr2, memrchr2_iter, memrchr3, memrchr3_iter, memrchr_iter, 206 Memchr, Memchr2, Memchr3, 207 }; 208 209 #[macro_use] 210 mod macros; 211 212 #[cfg(test)] 213 #[macro_use] 214 mod tests; 215 216 pub mod arch; 217 mod cow; 218 mod ext; 219 mod memchr; 220 pub mod memmem; 221 mod vector; 222