pyexiv2-developers team mailing list archive

Re: Pickling and multiprocessing

 

On 2011-01-21, Damon Lynch <damonlynch@xxxxxxxxx> wrote:
> On Fri, Jan 21, 2011 at 3:35 AM, Olivier Tilloy <olivier@xxxxxxxxxx> wrote:
> 
> 
>     Out of curiosity, what’s your use case for pickling image metadata?
>     Ultimately, pickling is no more than serializing data (on disk or in
>     memory), and this data is already in the image itself and can be
>     "reconstructed" from just the file name. Wouldn’t that work for you?
> 
> 
> For instance, the problem of copying photos from memory cards onto the
> hard drive and renaming them. For each memory card, a process copies the
> photos and reads the metadata. So if you have two memory cards, that's
> two processes running in parallel. Both processes then send a message to
> a daemon renaming process, whose only task is to rename photos using the
> exif information, sequence numbers, and whatever else is needed. The
> daemon process itself could load the metadata but it's a relatively slow
> operation, and thus better to do in parallel.

I suppose the daemon in charge of renaming photos needs only a rather
small subset of the EXIF/IPTC/XMP metadata to rename a photo. How about
pickling and passing only this subset, e.g. as a dictionary (should be
easy since tags can be pickled)?
You would potentially save a lot of bandwidth in your inter-process
communication.
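
A minimal sketch of that idea, under stated assumptions: the two tag keys,
the file name pattern, and the helper names (extract_subset, build_name,
renamer) are hypothetical, and a plain dict stands in for the metadata so
the sketch runs without pyexiv2. With pyexiv2 the values would presumably
come from metadata[key].value after reading the file.

```python
import multiprocessing as mp
from datetime import datetime

# Hypothetical choice: the only tags the renaming daemon is assumed to need.
NEEDED_KEYS = ("Exif.Photo.DateTimeOriginal", "Exif.Image.Model")


def extract_subset(all_tags, keys=NEEDED_KEYS):
    """Return a plain, picklable dict holding only the values needed."""
    return {key: all_tags[key] for key in keys if key in all_tags}


def build_name(subset, sequence):
    """Build a file name from the capture time and a sequence number."""
    taken = subset["Exif.Photo.DateTimeOriginal"]
    return "{:%Y%m%d-%H%M%S}-{:04d}.jpg".format(taken, sequence)


def renamer(jobs, results):
    """Daemon loop: consume (path, subset) messages until None arrives."""
    sequence = 0
    while True:
        item = jobs.get()
        if item is None:  # sentinel: no more photos
            break
        path, subset = item
        sequence += 1
        # A real daemon would rename the file here; this sketch only
        # reports the name it would use.
        results.put((path, build_name(subset, sequence)))


def main():
    jobs, results = mp.Queue(), mp.Queue()
    daemon = mp.Process(target=renamer, args=(jobs, results))
    daemon.start()

    # Stand-in for metadata read by a copying process (made-up values).
    tags = {
        "Exif.Photo.DateTimeOriginal": datetime(2011, 1, 21, 3, 35, 0),
        "Exif.Image.Model": "ExampleCam",
        "Exif.Photo.FNumber": 2.8,  # not needed for renaming, not sent
    }
    jobs.put(("IMG_0001.JPG", extract_subset(tags)))
    jobs.put(None)

    print(results.get())
    daemon.join()


if __name__ == "__main__":
    main()
```

Only the small dict crosses the process boundary, so the queue pickles a
couple of values per photo rather than the whole metadata object.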

> My preliminary testing using multiprocessing, queues and pipes indicates
> that in the case of copying and renaming photos in parallel, the
> scanning phase (determining what photos are at a location and loading
> them into a TreeView to show the user) takes only 6% of the time it
> takes to do the same thing with threads and locks. Clearly the
> performance gains can be enormous.

Impressive gain indeed, way to go!


