I'm currently learning C, and lately I have been focusing on character encoding. Note that I'm a Windows programmer. While I currently test my code only on Windows, I want to eventually port it to Linux and macOS, so I'm trying to learn the best practices right now.
In the example below, I store a file path in a wchar_t array so that it can be opened later with _wfopen. I need _wfopen because the path may contain characters that are not in my default code page. Afterwards, the file path and a text literal are stored in a char array named message for further use. My understanding is that you can convert a wide string into a multibyte string with the %ls conversion specifier.
char message[8094] = "";
wchar_t file_path[4096] = L"C:\\test\\test.html";
sprintf(message, "Accessing: %ls\n", file_path);
While the code works, GCC/MinGW outputs the following warning and notes:
warning: '%ls' directive writing up to 49146 bytes into a region of size 8083 [-Wformat-overflow=]
note: assuming directive output of 16382 bytes
note: 'sprintf' output between 13 and 49159 bytes into a destination of size 8094
My issue is that I simply do not understand how sprintf could output up to 49159 bytes into the message variable. I output the Accessing: string literal, the file_path variable, the \n char and the \0 char. What else is there to output?
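To check my own accounting, here is a minimal sketch of how I count the bytes. The helper names fixed_bytes and needed_bytes are mine and purely illustrative. The fixed parts come to exactly 13 bytes, which matches the lower bound in the note, so the remaining 49146 bytes must be attributed to %ls alone. I also notice that 49146 is exactly 3 × 16382, which makes me suspect a worst-case assumption about how many bytes each wide character may convert to, but that is only a guess on my part.

```c
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Bytes sprintf always writes for this format, whatever the path is:
 * "Accessing: " (11) + "\n" (1) + terminating NUL (1) = 13. */
static size_t fixed_bytes(void)
{
    return strlen("Accessing: ") + strlen("\n") + 1;
}

/* Bytes actually needed for a given path, including the terminating NUL.
 * snprintf with a NULL buffer and size 0 only measures the output.
 * Returns 0 if the wide-to-multibyte conversion fails. */
static size_t needed_bytes(const wchar_t *path)
{
    int n = snprintf(NULL, 0, "Accessing: %ls\n", path);
    return n < 0 ? 0 : (size_t)n + 1;
}
```

Calling needed_bytes(L"C:\\test\\test.html") measures what this particular path needs at run time, which is far below the compiler's upper bound.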
Sure, I could declare message as a wchar_t array and use swprintf instead of sprintf, but my understanding is that wchar_t does not make for nicely portable code. As such, I'm trying to avoid it unless a specific API requires it.
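One alternative I have been considering, which keeps message as a char array, is to bound the write with snprintf and check its return value. This is only a sketch under my own assumptions: format_access_message is a hypothetical helper name, and I have not confirmed whether this changes which warnings GCC emits.

```c
#include <stdio.h>
#include <wchar.h>

/* Hypothetical helper: formats "Accessing: <path>\n" into dst without ever
 * writing more than dst_size bytes.
 * Returns 0 on success, -1 on a conversion error or if the output
 * would not fit (in which case dst holds a truncated string). */
static int format_access_message(char *dst, size_t dst_size, const wchar_t *path)
{
    int n = snprintf(dst, dst_size, "Accessing: %ls\n", path);

    if (n < 0)
        return -1;              /* wide-to-multibyte conversion failed */
    if ((size_t)n >= dst_size)
        return -1;              /* output was truncated */
    return 0;
}
```

With this, the call in my example would become format_access_message(message, sizeof message, file_path), and a failure can be handled instead of overflowing the buffer.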
So, what am I missing?