linuxdcpp-team team mailing list archive
Message #08756
[Bug 1649066] Re: Invalid UTF-8 data is not always being rejected
The non-Windows versions of Text::utf8ToWide and Text::wideToUtf8
evidently don't handle UTF-16 surrogate pairs at all and therefore
produce incorrect results. See the following test case for 🌍:

  string toUtf8(const wstring& str) {
    string tgt;
    string::size_type n = str.length();
    for (string::size_type i = 0; i < n; ++i) {
      Text::wcToUtf8(str[i], tgt);
    }
    return tgt;
  }

  wstring emoji = L"\U0001F30D";
  ASSERT_EQ(Text::wideToUtf8(emoji), toUtf8(emoji)); // error

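
For illustration, here is a minimal sketch of a conversion that combines surrogate pairs before encoding, instead of encoding each 16-bit unit on its own as the loop in the test case does. It assumes the input is UTF-16 held in a std::u16string. appendUtf8 and utf16ToUtf8 are hypothetical names and not part of the DC++ Text API. Replacing unpaired surrogates with U+FFFD is one possible policy; the core might prefer to reject such input outright.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a DC++ function): append one code point as UTF-8.
static void appendUtf8(char32_t cp, std::string& out) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert UTF-16 to UTF-8, combining surrogate pairs.
// Unpaired surrogates are replaced with U+FFFD (an assumed policy).
std::string utf16ToUtf8(const std::u16string& str) {
    std::string out;
    for (std::size_t i = 0; i < str.size(); ++i) {
        char32_t cp = str[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: it must be followed by a low surrogate.
            if (i + 1 < str.size() && str[i + 1] >= 0xDC00 && str[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (str[i + 1] - 0xDC00);
                ++i; // the low surrogate has been consumed
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // low surrogate without a preceding high surrogate
        }
        appendUtf8(cp, out);
    }
    return out;
}
```

With this sketch, the pair D83C DF0D for U+1F30D is encoded as the four bytes F0 9F 8C 8D, whereas encoding each half separately yields two three-byte sequences that are not valid UTF-8.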
--
You received this bug notification because you are a member of
Dcplusplus-team, which is subscribed to DC++.
https://bugs.launchpad.net/bugs/1649066
Title:
Invalid UTF-8 data is not always being rejected
Status in AirDC++:
Fix Released
Status in DC++:
New
Bug description:
There are various cases where invalid UTF-8 data is being consumed by
the core:
1. Text::convert returns the original string on error (Linux only; the corresponding Windows-specific functions return an empty string on error)
2. When the "utf-8" encoding is used in NMDC hubs, the conversion functions always return the original string without validating it (generally Linux only, since "utf-8" can't be selected from DC++'s GUI)
3. UTF-8 validation is not performed for strings parsed from XML (specifically file/directory names in filelists)
This will cause issues especially when the data is processed by
external sources/libraries that expect to receive valid UTF-8 data
(https://github.com/airdcpp-web/airdcpp-webclient/issues/204). I'm not
sure about security implications.
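
As a rough illustration of the kind of check the three cases above are missing, here is a minimal sketch of a strict UTF-8 validator. isValidUtf8 is a hypothetical name and not necessarily how the core performs validation. It rejects stray or missing continuation bytes, truncated sequences, overlong encodings, encoded surrogates and code points above U+10FFFF.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a DC++ function): strict UTF-8 validation.
bool isValidUtf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c < 0x80) { // plain ASCII
            ++i;
            continue;
        }
        std::size_t len;       // total sequence length in bytes
        std::uint32_t cp;      // decoded code point
        std::uint32_t minimum; // smallest code point allowed for this length
        if ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; minimum = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; minimum = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; minimum = 0x10000; }
        else return false; // stray continuation byte or invalid lead byte
        if (n - i < len) return false; // truncated sequence
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < minimum) return false;                 // overlong encoding
        if (cp > 0x10FFFF) return false;                // outside Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // encoded surrogate
        i += len;
    }
    return true;
}
```

A check like this could be run on strings returned by the conversion functions and on names parsed from XML filelists before they are handed to external libraries.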
Another note: messages that fail UTF-8 validation in ADC hubs are
ignored silently. At least Flexhub seems to be having problems with
data validation which currently goes unnoticed.
To manage notifications about this bug go to:
https://bugs.launchpad.net/airdcpp/+bug/1649066/+subscriptions