It seems that std::codecvt_utf8 works well for converting std::wstring to UTF-8. It passed all my tests (Windows app, Visual Studio 2015, Windows 8 with an EN locale).
I needed a way to convert filenames to UTF-8, so my tests are about filenames.
In my app I use boost::filesystem::path 1.60.0 to deal with file paths. It works well, but it is not able to convert filenames to UTF-8 properly.
Internally, the Windows version of boost::filesystem::path uses std::wstring to store the file path. Unfortunately, the built-in conversion to std::string works badly.
Test case:
- create a file with mixed symbols: c:\test\皀皁皂皃的 (some random Asian symbols)
- scan the directory with boost::filesystem::directory_iterator and get the boost::filesystem::path for the file
- convert it to std::string via the built-in conversion filenamePath.string()
- you get c:\test\?????. The Asian symbols are converted to '?'. Not good.
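boost::filesystem uses std::codecvt internally. In this setup it does not work for the std::wstring -> std::string conversion: as far as I can tell, the default conversion on Windows targets the current code page rather than UTF-8, so characters outside that code page are lost.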
Instead of the built-in boost::filesystem::path conversion, you can define a conversion function like this (original snippet):
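#include <codecvt>
#include <locale>
#include <string>
// Despite its name, this converts a std::wstring to a UTF-8 encoded std::string.
std::string utf8_to_wstring(const std::wstring & str)
{
    std::wstring_convert<std::codecvt_utf8<wchar_t>> myconv;
    return myconv.to_bytes(str);
}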
Then you can easily convert a file path to UTF-8: utf8_to_wstring(filenamePath.wstring()). It worked perfectly in my tests.
It worked for every file path I tried. I tested ASCII strings c:\test\test_file, Asian strings c:\test\皀皁皂皃的, Russian strings c:\test\абвгд, and mixed strings c:\test\test_皀皁皂皃的, c:\test\test_абвгд, c:\test\test_皀皁皂皃的_абвгд. For every string I received a valid UTF-8 representation.
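One caveat, as far as I know: with a 16-bit wchar_t (as on Windows), std::codecvt_utf8<wchar_t> treats the input as UCS-2, so characters outside the Basic Multilingual Plane (stored as surrogate pairs) are not covered by the strings tested above. Also, std::wstring_convert and <codecvt> are deprecated since C++17. Below is a minimal hand-rolled sketch that avoids both issues. The name wide_to_utf8 is hypothetical (it is not part of Boost or the standard library), and replacing invalid input with U+FFFD is my own assumption about the desired behaviour.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not from Boost or the standard library).
// Encodes a wide string as UTF-8. The input may hold UTF-16 code units
// (16-bit wchar_t, as on Windows) or UTF-32 code points (32-bit wchar_t).
std::string wide_to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = static_cast<std::uint32_t>(in[i]);

        // Combine a high surrogate with the low surrogate that follows it.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size()) {
            const std::uint32_t lo = static_cast<std::uint32_t>(in[i + 1]);
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            }
        }

        // Assumption: unpaired surrogates and out-of-range values become U+FFFD.
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            cp = 0xFFFD;
        }

        if (cp < 0x80) {                      // 1 byte
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {              // 2 bytes
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                              // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

It would be called the same way as the snippet above, e.g. wide_to_utf8(filenamePath.wstring()).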