linuxdcpp-team team mailing list archive

[Bug 1649066] Re: Invalid UTF-8 data is not always being rejected

 

Committed the alternative patch which sanitizes strings with invalid
data when the flag is present.

The Windows version of Text::utf8ToWide, which conforms more closely to
newer Unicode standards, will replace invalid UTF-8 data with U+FFFD,
the "UNICODE REPLACEMENT CHARACTER".
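
As an illustration of the replacement behavior described above, here is
a minimal sketch that works on UTF-8 strings directly. It is not the
DC++ code: sanitizeUtf8 and validSequenceLength are hypothetical names,
and the sketch emits one U+FFFD per offending byte, which may differ
from how Text::utf8ToWide groups invalid bytes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of DC++): returns the length of the
// well-formed UTF-8 sequence starting at s[i], or 0 if it is invalid.
static std::size_t validSequenceLength(const std::string& s, std::size_t i) {
    const unsigned char c = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    unsigned long cp = 0;
    if (c < 0x80) return 1;                                   // ASCII
    if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
    else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
    else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
    else return 0;  // stray continuation byte, 0xC0/0xC1, or lead > 0xF4

    if (i + len > s.size()) return 0;                         // truncated
    for (std::size_t k = 1; k < len; ++k) {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return 0;                     // not a continuation
        cp = (cp << 6) | (b & 0x3F);
    }
    if (len == 3 && cp < 0x800) return 0;                     // overlong
    if (len == 4 && cp < 0x10000) return 0;                   // overlong
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;               // surrogate
    if (cp > 0x10FFFF) return 0;                              // out of range
    return len;
}

// Hypothetical sanitizer: copies well-formed sequences unchanged and
// replaces every offending byte with U+FFFD (EF BF BD in UTF-8).
std::string sanitizeUtf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const std::size_t len = validSequenceLength(in, i);
        if (len == 0) {
            out += "\xEF\xBF\xBD";
            ++i;                     // skip one byte and resynchronize
        } else {
            out.append(in, i, len);
            i += len;
        }
    }
    return out;
}
```

Skipping a single byte after each error keeps the loop simple and
guarantees that the output is always well-formed UTF-8.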

No change in the non-Windows version of the function for now (for some
reason AirDC++ still seems to use this version for both targets, despite
comment #11).

Anyone affected, please reopen this if the above behavior is considered
appropriate for non-Windows targets as well, or if there is any issue
with the changes that have been made.

Also note that Text::convert is currently not used in DC++; I applied
the proposed change to that function for both targets.


** Changed in: dcplusplus
       Status: New => Fix Committed

-- 
You received this bug notification because you are a member of
Dcplusplus-team, which is subscribed to DC++.
https://bugs.launchpad.net/bugs/1649066

Title:
  Invalid UTF-8 data is not always being rejected

Status in AirDC++:
  Fix Released
Status in DC++:
  Fix Committed

Bug description:
  There are various cases where invalid UTF-8 data is being consumed by
  the core:

  1. Text::convert will return the original string in case of errors (Linux only; the respective Windows-specific functions will return an empty string in case of errors)
  2. When using "utf-8" encoding in NMDC hubs, the original string will always be returned by the conversion functions without validation (generally Linux only, since "utf-8" can't be selected from DC++'s GUI)
  3. UTF-8 validation is not performed for strings parsed from XML (specifically file/directory names in filelists)

  This will cause issues especially when the data is processed by
  external sources/libraries that expect to receive valid UTF-8 data
  (https://github.com/airdcpp-web/airdcpp-webclient/issues/204). I'm not
  sure about security implications.
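
  The kind of check missing in the cases listed above can be sketched as
  follows. This is an assumption-laden illustration, not the validation
  routine used by DC++ or AirDC++: isValidUtf8 is a hypothetical name,
  and the byte ranges follow the well-formed UTF-8 sequences defined by
  the Unicode Standard and RFC 3629.

```cpp
#include <cstddef>
#include <string>

// Hypothetical validator (not part of DC++): returns true only if the
// whole string consists of well-formed UTF-8 byte sequences.
bool isValidUtf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;            // number of continuation bytes
        unsigned char lo = 0x80;          // allowed range for the FIRST
        unsigned char hi = 0xBF;          // continuation byte only

        if (c <= 0x7F) extra = 0;                               // ASCII
        else if (c >= 0xC2 && c <= 0xDF) extra = 1;
        else if (c == 0xE0) { extra = 2; lo = 0xA0; }           // no overlongs
        else if (c == 0xED) { extra = 2; hi = 0x9F; }           // no surrogates
        else if (c >= 0xE1 && c <= 0xEF) extra = 2;
        else if (c == 0xF0) { extra = 3; lo = 0x90; }           // no overlongs
        else if (c == 0xF4) { extra = 3; hi = 0x8F; }           // <= U+10FFFF
        else if (c >= 0xF1 && c <= 0xF3) extra = 3;
        else return false;                // 0x80..0xC1 or 0xF5..0xFF as lead

        if (n - i - 1 < extra) return false;                    // truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if (b < lo || b > hi) return false;
            lo = 0x80;                    // later bytes use the full range
            hi = 0xBF;
        }
        i += extra + 1;
    }
    return true;
}
```

  A caller could run such a check on each file or directory name parsed
  from a filelist and either reject or sanitize the names that fail it.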

  Another note: messages that fail UTF-8 validation in ADC hubs are
  ignored silently. At least Flexhub seems to be having problems with
  data validation which currently goes unnoticed.

To manage notifications about this bug go to:
https://bugs.launchpad.net/airdcpp/+bug/1649066/+subscriptions


