zorba-coders team mailing list archive
Message #12346
[Bug 1025622] [NEW] Incorrect JSON serialization of supplementary plane code points
Public bug reported:
This bug is a follow-up to bug #1024448.
Currently, the result of the following JSONiq query:
let $message := "👊"
return { "message": $message }
is serialized as invalid JSON:
{ "message" : "\ufffffff0\uffffff9f\uffffff91\uffffff8a" }
The correct result would be:
{ "message" : "\ud83d\udc4a" }
Explanation:
Characters from the supplementary planes (code points above U+FFFF) are
usually represented as UTF-16 surrogate pairs in JSON output. The result
above is incorrect in particular because JSON allows exactly 4 hex digits
after '\u'. A UTF-16 encoding always fits into one or two 4-hex-digit
escapes (one per 16-bit code unit), which is most probably the reason why
UTF-16 is used.
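To make the expected escaping concrete, here is a minimal Python sketch of the surrogate-pair computation. It is not Zorba's serializer code; the function names json_escape_code_point and json_escape_non_ascii are hypothetical, and the sketch ignores quotes, backslashes, control characters and lone surrogates.

```python
def json_escape_code_point(cp: int) -> str:
    """Return the JSON \\uXXXX escape(s) for one Unicode code point."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    if cp < 0x10000:
        # Basic Multilingual Plane: one UTF-16 code unit, one escape.
        return f"\\u{cp:04x}"
    # Supplementary planes: split the 20-bit offset into two 10-bit halves.
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate
    return f"\\u{high:04x}\\u{low:04x}"


def json_escape_non_ascii(text: str) -> str:
    """Escape every non-ASCII character; ASCII passes through unchanged."""
    return "".join(
        c if ord(c) < 0x80 else json_escape_code_point(ord(c)) for c in text
    )


if __name__ == "__main__":
    fist = chr(0x1F44A)  # the character used in the query above
    print(json_escape_non_ascii(fist))   # prints \ud83d\udc4a
    print(fist.encode("utf-8").hex())    # prints f09f918a
```

The second printed line shows the UTF-8 bytes of the character. They match the last two hex digits of each escape in the incorrect output (f0, 9f, 91, 8a), which suggests, although this report does not confirm it, that the serializer escapes sign-extended UTF-8 bytes one by one instead of decoding the code point first.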
This has already been fixed in the JSON parser by Paul (see merge proposal:
https://code.launchpad.net/~paul-lucas/zorba/bug-1024448/+merge/115248
), but it still needs to be fixed in the serializer.
@Paul: I'm not sure whether you are the right person to assign this bug to.
Thanks
** Affects: zorba
Importance: Undecided
Assignee: Paul J. Lucas (paul-lucas)
Status: New
** Tags: incorrect-result jsoniq
--
You received this bug notification because you are a member of Zorba
Coders, which is the registrant for Zorba.
https://bugs.launchpad.net/bugs/1025622
Title:
Incorrect JSON serialization of supplementary plane code points
Status in Zorba - The XQuery Processor:
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/zorba/+bug/1025622/+subscriptions
Follow ups
-
[Bug 1025622] Re: Incorrect JSON serialization of supplementary plane code points
From: Paul J. Lucas, 2012-07-20
-
[Bug 1025622] Re: Incorrect JSON serialization of supplementary plane code points
From: Paul J. Lucas, 2012-07-19
-
[Bug 1025622] Re: Incorrect JSON serialization of supplementory plane code points
From: Paul J. Lucas, 2012-07-19
-
[Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
From: Paul J. Lucas, 2012-07-18
-
[Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
From: Paul J. Lucas, 2012-07-18
-
[Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
From: Chris Hillery, 2012-07-17
-
[Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
From: Chris Hillery, 2012-07-17
-
[Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
From: Paul J. Lucas, 2012-07-17
-
[Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
From: Dennis Knochenwefel, 2012-07-17
-
[Bug 1025622] Re: incorrect JSON serialization of supplementory plane code points
From: Paul J. Lucas, 2012-07-17
-
[Bug 1025622] [NEW] incorrect JSON serialization of supplementory plane code points
From: Dennis Knochenwefel, 2012-07-17