launchpad-dev team mailing list archive
Message #03027
Re: Trouble loading a meliae dump
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Guilherme Salgado wrote:
> Hi John,
>
> I've used meliae to get a memory dump from Launchpad, but when I tried
> to load that dump I got http://paste.ubuntu.com/397273/ (the first line
> there shows the line that causes simplejson.loads() to choke).
>
> From my understanding of [1], this seems to be expected, but I wonder
> how these unpaired surrogates ended up in the dump. Any ideas?
>
> BTW, I did some hacks in my local copy of meliae to replace the
> problematic bits on that line, and after that I was able to load the
> dump. Maybe with that I could try and find out where the unpaired
> surrogates are coming from?
>
> [1] <http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Surrogates>
>
> Cheers,
>
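A minimal sketch of the kind of workaround described above. It assumes the dump is JSON text in which the problem strings carry `\uXXXX` escapes for lone UTF-16 surrogates. `fix_unpaired_surrogates` is a hypothetical helper and is not part of meliae. It rewrites every surrogate escape that is not part of a valid high/low pair to `\ufffd` (the Unicode replacement character), and it leaves valid pairs and escaped backslashes alone. This lets a parser that rejects unpaired surrogates read the line.

```python
import json
import re

# Hypothetical helper (not part of meliae). Scanning left to right, each match
# is exactly one of: an escaped backslash, a valid surrogate pair, or a lone
# surrogate escape. Matching escaped backslashes first stops JSON text such as
# \\ud800 (a literal backslash followed by "ud800") from being treated as an
# escape.
_TOKEN = re.compile(
    r'(?P<bs>\\\\)'
    r'|(?P<pair>\\u[dD][89abAB][0-9a-fA-F]{2}\\u[dD][c-fC-F][0-9a-fA-F]{2})'
    r'|(?P<lone>\\u[dD][89a-fA-F][0-9a-fA-F]{2})'
)


def fix_unpaired_surrogates(line):
    """Replace unpaired surrogate escapes in a line of JSON text with \\ufffd."""
    def repl(match):
        if match.group('lone') is not None:
            return '\\ufffd'      # U+FFFD REPLACEMENT CHARACTER, as a JSON escape
        return match.group(0)     # keep escaped backslashes and valid pairs as-is
    return _TOKEN.sub(repl, line)


if __name__ == "__main__":
    line = '{"value": "abc\\ud83d"}'  # ends with a lone high surrogate escape
    print(json.loads(fix_unpaired_surrogates(line)))
```

Logging the lines on which the `lone` branch fires would be one way to narrow down which objects carry the unpaired surrogates.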
I'm mostly offline on vacation right now, but I'll try to help out when
I get back. I can think of a few possible causes:
1) I trim most output to 100 characters. (So if you have a 1,000-character
string, I only output the first 100.) It is possible that a Unicode
surrogate pair straddled positions 100 and 101, so only its first half
made it into the output.
2) I use a pretty stupid method for encoding 8-bit strings, just mapping
each byte to the Unicode code point with the same value ('\xff' => U+00FF).
Some of that output may be invalid.
3) Other bugs I don't even know about... :)
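A rough sketch of causes 1 and 2, assuming the 100-character trim counts UTF-16 code units, as string slicing does on a narrow (UCS-2) Python 2 build. The function names `truncate_utf16`, `to_json_escapes` and `bytes_as_code_points` are hypothetical illustrations and are not meliae's actual code.

```python
LIMIT = 100  # trim length mentioned in the message


def truncate_utf16(text, limit=LIMIT):
    """Keep the first `limit` UTF-16 code units of `text`.

    This imitates slicing a string on a narrow (UCS-2) Python 2 build, where a
    character outside the BMP occupies two code units (a surrogate pair).
    """
    data = text.encode('utf-16-le')
    units = [int.from_bytes(data[i:i + 2], 'little')
             for i in range(0, len(data), 2)]
    return units[:limit]


def to_json_escapes(units):
    """Write each code unit as a JSON \\uXXXX escape."""
    return ''.join('\\u%04x' % unit for unit in units)


def bytes_as_code_points(raw):
    """Map each byte to the code point with the same value (0xff -> U+00FF)."""
    return ''.join(chr(b) for b in raw)


if __name__ == "__main__":
    # 99 ASCII characters, then U+1F600 (surrogate pair D83D DE00), then 'b'.
    text = 'a' * 99 + '\U0001F600' + 'b'
    units = truncate_utf16(text)
    print(to_json_escapes(units[-2:]))  # prints \u0061\ud83d: DE00 was cut off
```

On Python 3, or on a wide Python 2 build, `text[:100]` counts code points and cannot split a pair. This failure mode only applies if the trim counts UTF-16 code units. The byte mapping from cause 2 matches Python's `latin-1` codec and only yields code points up to U+00FF.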
I'm happy to debug this with you sometime soon. (If you're getting
this, it probably means I'm back home, rather than offline in an airport.)
John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
iEYEARECAAYFAkukX9sACgkQJdeBCYSNAANWfwCgw2CBP2rdIwUEGwNK9yE70sIY
LqoAn2J14Q84GDZEBLPDlqBZjol6iVzn
=MvTl
-----END PGP SIGNATURE-----