
launchpad-dev team mailing list archive

Re: Trouble loading a meliae dump

 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Guilherme Salgado wrote:
> Hi John,
> 
> I've used meliae to get a memory dump from Launchpad, but when I tried
> to load that dump I got http://paste.ubuntu.com/397273/ (the first line
> there shows the line that causes simplejson.loads() to choke).
> 
> From my understanding of [1], this seems to be expected, but I wonder
> how these unpaired surrogates ended up in the dump.  Any ideas?
> 
> BTW, I did some hacks in my local copy of meliae to replace the
> problematic bits on that line, and after that I was able to load the
> dump. Maybe with that I could try and find out where the unpaired
> surrogates are coming from?
> 
> [1] <http://en.wikipedia.org/wiki/Mapping_of_Unicode_characters#Surrogates>
> 
> Cheers,
> 

I'm mostly offline on vacation right now, but I'll try to help out when
I get back. I can think of a few possible causes:

1) I trim most output to 100 characters. (So if you have a 1,000-character
string, I only output the first 100.) It is possible that a Unicode
surrogate pair sat at positions 100 and 101, so the cut kept the first
half and dropped the second.

2) I use a pretty stupid method for encoding 8-bit strings, just mapping
each byte to the Unicode code point with the same value ('\xff' =>
U+00FF). Some of the resulting output may be invalid.

3) Other bugs I don't even know about... :)
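
To make causes 1 and 2 concrete, here is a rough Python 3 sketch. It is
not meliae's code: the helper names (utf16_units, unpaired_surrogates)
and the LIMIT constant are made up for illustration, and it assumes the
dumping process stores non-BMP characters as UTF-16 surrogate pairs (as
a narrow Python 2 build would), so a fixed-length cut can split a pair.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of ints."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little")
            for i in range(0, len(raw), 2)]


def unpaired_surrogates(units):
    """Return the indices of surrogate code units that have no partner."""
    bad = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: valid only if a low surrogate follows.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            bad.append(i)
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate with no preceding high surrogate.
            bad.append(i)
        i += 1
    return bad


LIMIT = 100  # hypothetical constant standing in for the 100-character trim

# Cause 1: a non-BMP character whose pair straddles the cut.
text = "a" * 99 + "\U0001F600"
units = utf16_units(text)      # 99 units for the 'a's + 2 for the pair
trimmed = units[:LIMIT]        # keeps the high surrogate, drops the low one

print(unpaired_surrogates(units))
print(unpaired_surrogates(trimmed))

# Cause 2: mapping each byte to the code point of the same value is what
# a latin-1 decode does; the result never exceeds U+00FF.
mapped = bytes(range(256)).decode("latin-1")
print(max(ord(ch) for ch in mapped))
```

If that assumption holds, the untrimmed string has no unpaired
surrogates and the trimmed one has exactly one, at the cut. The byte
mapping in cause 2 cannot produce a surrogate by itself, since it never
goes above U+00FF, so any invalid output from it would have to be of a
different kind.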

I'm happy to debug this with you sometime soon. (If you're getting
this, it probably means I'm back home, rather than offline in an airport.)

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkukX9sACgkQJdeBCYSNAANWfwCgw2CBP2rdIwUEGwNK9yE70sIY
LqoAn2J14Q84GDZEBLPDlqBZjol6iVzn
=MvTl
-----END PGP SIGNATURE-----



