
launchpad-dev team mailing list archive

Re: sql derived data, eager loading and cachedproperty


On Aug 15, 2010, at 4:05 PM, Michael Hudson wrote:

> On Fri, 13 Aug 2010 20:38:26 +1200, Robert Collins <robert.collins@xxxxxxxxxxxxx> wrote:
>> On Fri, Aug 13, 2010 at 4:27 PM, Clint Byrum <clint.byrum@xxxxxxxxxxxxx> wrote:
> 
>>> I'm partial to Gearman for farming work like this out btw. ;)
>>> 
>>> www.gearman.org
>> 
>> I had been meaning to ask - we have some gearman like needs at the
>> moment, and are installing rabbitMQ. Do you see gearman as a
>> replacement for rabbit, or something that might be complementary? I
>> don't think we'd really want the cognitive overhead of two
>> nearly-identical things.
> 
> The only experience I have of gearman is asking the author in his talk
> at LCA how gearman would cope with a job that took a day to execute and
> him completely waffling in response.  The impression I got was that
> gearman was really targeted at getting things done from MySQL triggers,
> which seemed a rather niche use case... probably my impression is wrong
> though :-)
> 

That's like getting the impression that roads are only for motorcycles
because you saw a documentary on motorcycles. :)

But I jest; I can understand why you might get that feeling.

Gearman has two very different modes of operation, and so in explaining
these, Eric may have sounded like he was waffling.

If you had a *synchronous* job that took a day, there would actually be no
real problem other than the job server dying, which would cause the client
to connect to a backup job server and resubmit the job; so unless your
worker had saved some kind of progress, it would start over. However, that
also means you have a client program blocked for a day, which I doubt makes
sense in most cases.

For async jobs, there are a number of options for persisting messages on the
job server, including Drizzle, MySQL, PostgreSQL, SQLite, and Tokyo Cabinet.
In this case, a day-long job simply means that a worker is tied up for a day.
The functionality is a lot like most traditional message queues, though it has
some limitations, like only 3 priorities (high, normal, low) and a fairly
rudimentary persistence design that leads to poor performance.
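As a rough illustration of how that persistence gets selected (the exact
option names here are assumptions from memory and vary across gearmand
versions and builds, so check your own install), it happens when you start
the job server:

```shell
# Start gearmand with a SQLite-backed queue so queued background jobs
# survive a job-server restart. The -q flag picks the persistence module,
# and the module-specific option names the database file.
# (Exact flag spellings depend on your gearmand version/build.)
gearmand -q libsqlite3 --libsqlite3-db=/var/lib/gearmand/queue.db
```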

My suggestion for gearman above is for relatively fast tasks that you do not 
want to "fire and forget", but rather you want to farm off to multiple backend
servers.

The best example I can provide is resizing images. If you accept an upload and
want to resize it to a thumbnail, 320x200, and 1024x768, these are completely
independent operations. Gearman makes it really easy to go

gearman.set_complete_callback(handle_complete)
gearman.add_task('resize',(20,20,image))
gearman.add_task('resize',(320,200,image))
gearman.add_task('resize',(1024,768,image))
gearman.run_tasks()

This feeds all three jobs to the job server, and will utilize any workers
registered for 'resize'. No blocking happens until 'run_tasks()'. As the
images complete, 'handle_complete' is called and fed the result. In this way,
you can very easily distribute load. You can also keep adding tasks as the
previous tasks complete.
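To make the shape of that pattern concrete without a running job server,
here is a self-contained sketch of the same fan-out-plus-completion-callback
flow, with a local thread pool standing in for the gearman server and its
'resize' workers (the function names mirror the pseudocode above; the real
client API differs):

```python
# Fan out independent "resize" tasks, block only at collection time, and
# feed each result to a completion callback as it arrives.
from concurrent.futures import ThreadPoolExecutor, as_completed

def resize(width, height, image):
    # Stand-in worker: a real one would invoke an image library.
    return "%s@%dx%d" % (image, width, height)

def handle_complete(result, results):
    results.append(result)

def run_tasks(tasks):
    results = []
    with ThreadPoolExecutor(max_workers=3) as pool:
        futures = [pool.submit(resize, w, h, img) for (w, h, img) in tasks]
        for future in as_completed(futures):  # blocking happens only here
            handle_complete(future.result(), results)
    return results

results = run_tasks([(20, 20, 'photo'),
                     (320, 200, 'photo'),
                     (1024, 768, 'photo')])
```

As with gearman's async mode, you could keep submitting new tasks from
inside the completion callback as earlier ones finish.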

One of the coolest features is coalescing, where you can add a "unique id" to
any client request, and gearman will multiplex the result of that function +
unique ID to all clients waiting on it. This is great for memcache btw, as it
reduces the thundering herd to one cow per unique key.
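The coalescing idea itself is easy to sketch in-process (this assumes
nothing about gearman's wire protocol; it just shows the semantics: calls
that share a unique id while one is in flight are folded into a single
execution, and every waiter gets the same result):

```python
# Coalesce concurrent requests by unique id: the first caller (the
# "leader") runs the work; callers arriving while it is in flight wait
# and receive the leader's result instead of re-running the work.
import threading
import time

class Coalescer:
    def __init__(self):
        self._lock = threading.Lock()
        self._inflight = {}   # unique_id -> (Event, one-slot result holder)
        self.executions = 0   # how many times the work actually ran

    def call(self, unique_id, fn):
        with self._lock:
            entry = self._inflight.get(unique_id)
            leader = entry is None
            if leader:
                entry = (threading.Event(), [])
                self._inflight[unique_id] = entry
        event, holder = entry
        if leader:
            self.executions += 1
            holder.append(fn())
            with self._lock:
                del self._inflight[unique_id]  # close the coalescing window
            event.set()
        else:
            event.wait()
        return holder[0]

def slow():
    time.sleep(0.3)           # long enough for all callers to pile up
    return "value"

c = Coalescer()
results = []
threads = [threading.Thread(target=lambda: results.append(c.call("key", slow)))
           for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Five callers, one execution: the same one-cow-per-key effect described
above, which is why it pairs so well with filling a cold memcache key.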

