[/
  Copyright 2006-2007 John Maddock.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]


[section:unicode Unicode and Boost.Regex]

There are two ways to use Boost.Regex with Unicode strings:

[h4 Rely on wchar_t]

If your platform's `wchar_t` type can hold Unicode strings, and your
platform's C/C++ runtime correctly handles wide character constants
(when passed to `std::iswspace`, `std::iswlower`, etc.), then you can use
`boost::wregex` to process Unicode. However, this approach has several
disadvantages:

* It's not portable: there's no guarantee about the width of `wchar_t`, or
even about whether the runtime treats wide characters as Unicode at all.
Most Windows compilers do so, but many Unix systems do not.
* There's no support for Unicode-specific character classes such as `[[:Nd:]]`, `[[:Po:]]`, etc.
* You can only search strings that are encoded as sequences of wide
characters. It is not possible to search UTF-8, or even UTF-16 on many platforms.

[h4 Use a Unicode Aware Regular Expression Type.]

If you have the
[@http://www.ibm.com/software/globalization/icu/ ICU library], then
Boost.Regex can be
[link boost_regex.install.building_with_unicode_and_icu_su
configured to make use
of it]. It then provides a distinct regular expression type (`boost::u32regex`)
that supports both Unicode-specific character properties and searching
text encoded as UTF-8, UTF-16, or UTF-32. See:
[link boost_regex.ref.non_std_strings.icu
ICU string class support].

[endsect]