zorba-coders team mailing list archive
Message #12382
[Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
I set some breakpoints and execution never reaches my serialization code,
so the problem is probably in the JSONiq serialization code.
--
You received this bug notification because you are a member of Zorba
Coders, which is the registrant for Zorba.
https://bugs.launchpad.net/bugs/1025622
Title:
incorrect JSON serialization of supplementory plane code points
Status in Zorba - The XQuery Processor:
Incomplete
Bug description:
This bug is a follow-up to bug #1024448.
Currently, the result of the following JSONiq query:
let $message := "👊"
return { "message": $message }
is serialized into incorrect JSON:
{ "message" : "\ufffffff0\uffffff9f\uffffff91\uffffff8a" }
the correct result would be:
{ "message" : "\ud83d\udc4a" }
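As a cross-check of the expected output, here is a minimal sketch using Python's standard json module (not Zorba). It also prints the UTF-8 bytes of the character; these match the trailing hex digits of the four escapes in the incorrect output, which suggests (this is an assumption, the report itself does not say so) that the serializer escapes individual UTF-8 bytes instead of code points.

```python
import json

# U+1F44A, the character used in the query above
message = "\U0001F44A"

# json.dumps escapes non-ASCII characters by default (ensure_ascii=True);
# characters outside the BMP come out as a UTF-16 surrogate pair.
serialized = json.dumps({"message": message})
print(serialized)  # {"message": "\ud83d\udc4a"}

# Parsing the escaped form yields the original single character again.
assert json.loads(serialized)["message"] == message

# UTF-8 bytes of the same character, for comparison with the bad output.
utf8_hex = " ".join(f"{b:02x}" for b in message.encode("utf-8"))
print(utf8_hex)  # f0 9f 91 8a
```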
Explanation:
Characters from the supplementary planes are usually represented as
UTF-16 surrogate pairs in JSON output. The result above is incorrect
in particular because JSON allows exactly 4 hex digits after '\u'.
A UTF-16 encoded character always fits into one or two 4-hex-digit
escapes, which is most probably the reason why UTF-16 is used.
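The surrogate pair arithmetic described above can be sketched as follows. This is an illustration in Python, not the Zorba serializer; the function name json_escape_code_point is hypothetical, and the sketch escapes every code point, which a real serializer would not need to do for printable ASCII.

```python
def json_escape_code_point(cp: int) -> str:
    """Return the JSON \\u escape(s) for a single Unicode code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        # Out of range, or a lone surrogate, which is not a valid character.
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x10000:
        # Basic Multilingual Plane: one escape with exactly 4 hex digits.
        return f"\\u{cp:04x}"
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # high (leading) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trailing) surrogate
    return f"\\u{high:04x}\\u{low:04x}"


print(json_escape_code_point(0x1F44A))  # \ud83d\udc4a
```

For U+1F44A the offset is 0xF44A, so the high surrogate is 0xD800 + 0x3D = 0xD83D and the low surrogate is 0xDC00 + 0x4A = 0xDC4A, matching the expected result in the bug description.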
This has been fixed nicely in the JSON parser by Paul (see merge proposal:
https://code.launchpad.net/~paul-lucas/zorba/bug-1024448/+merge/115248
), but it still needs to be fixed in the serializer.
@Paul: I'm not sure whether you are the right person to assign this bug to.
Thanks
To manage notifications about this bug go to:
https://bugs.launchpad.net/zorba/+bug/1025622/+subscriptions