launchpad-dev team mailing list archive


Re: memcache, responsiveness and load {short story, lets turn memcache off}

To: Martin Pool <mbp@xxxxxxxxxxxxx>
From: Jeroen Vermeulen <jtv@xxxxxxxxxxxxx>
Date: Thu, 05 Aug 2010 17:39:59 +0700
Cc: launchpad-dev@xxxxxxxxxxxxxxxxxxx
In-reply-to: <AANLkTikD-6zrW9iuyYJGoP8inQ49uxS7x3l3dRkmql7L@mail.gmail.com>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.11) Gecko/20100713 Thunderbird/3.0.6

On 2010-08-05 07:59, Martin Pool wrote:

If the main problem is "user changes something but doesn't see the change
reflected in the page," that's the same problem we have with replication
lag.  Couldn't we solve that in the same way, by having a user bypass
memcached for a while after a POST?


We could.  It would be a fairly cute way to solve the "but I thought I
just changed that?" bug, but it's not a total solution: "but Jeroen,
I thought you said you fixed that?"  On the whole I would still call
it a bandaid.
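
The per-user bypass proposed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Launchpad code: the class name, the 60-second window and the `render` callback are all hypothetical, and a plain dict stands in for memcached.

```python
import time


class PostBypassCache:
    """Hypothetical fragment cache that a user bypasses right after a POST.

    A plain dict stands in for memcached, and entries never expire here;
    a real cache would also set an expiry time.
    """

    def __init__(self, bypass_seconds=60.0, clock=time.monotonic):
        self.bypass_seconds = bypass_seconds  # assumed bypass window
        self.clock = clock                    # injectable for testing
        self._cache = {}                      # fragment key -> rendered text
        self._last_post = {}                  # user id -> time of last POST

    def record_post(self, user_id):
        """Call after a successful POST by this user."""
        self._last_post[user_id] = self.clock()

    def _recently_posted(self, user_id):
        last = self._last_post.get(user_id)
        return last is not None and self.clock() - last < self.bypass_seconds

    def get(self, user_id, key, render):
        """Return the fragment for `key`, calling `render()` on a miss."""
        if self._recently_posted(user_id):
            # Inside the window: ignore any cached copy, render fresh, and
            # write it back so later readers also get the new version.
            value = render()
            self._cache[key] = value
            return value
        if key not in self._cache:
            self._cache[key] = render()
        return self._cache[key]
```

The bypass only guarantees freshness for the user who posted. Anyone else keeps seeing the old copy until the poster's own request refreshes it (or, in a real cache, until it expires), which is exactly the "but Jeroen, I thought you said you fixed that?" case.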

Actually I think it does solve the problem when the cached lifetime is
shorter than the time it takes for me to tell you I've done it and for
you to load up the page with the expectation to see the work done. This
is why I'm suggesting very short expiry times.

Of course there's also replication lag, browser caching, transparent
proxies, and the reverse proxy, so I am taking a bit of an
I-don't-need-to-outrun-the-bear-I-just-need-to-outrun-you view. If skew
from those other things isn't a problem now, I'm saying we could
cheaply ensure that memcached doesn't make things worse.

One thing we could do is to use feature flags to turn on or off TAL
caching, so that we can make the correctness/throughput tradeoff
dynamically when we're being slashdotted.  (Again, Flickr etc.
apparently use this technique.)
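
A minimal sketch of the feature-flag idea, assuming the flag is re-read on every request from a plain mapping. The flag name, the class and the one-second default expiry are hypothetical and are not Launchpad's actual feature-flag API. With the flag off every request renders (correctness over throughput); with it on, a cached copy is served until it expires.

```python
import time


class FragmentCache:
    """Hypothetical TAL fragment cache gated by a runtime feature flag."""

    FLAG = "hypothetical.tal_cache.enabled"  # illustrative flag name

    def __init__(self, flags, ttl_seconds=1.0, clock=time.monotonic):
        self.flags = flags              # any mapping, consulted per request
        self.ttl_seconds = ttl_seconds  # assumed expiry time
        self.clock = clock              # injectable for testing
        self._entries = {}              # key -> (expires_at, rendered value)

    def get(self, key, render):
        if not self.flags.get(self.FLAG, False):
            # Flag off: always render, never touch the cache.
            return render()
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: cache hit
        value = render()     # missing or expired: render and store
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

Because the flag is looked up on each call, flipping it takes effect on the next request without a deploy, which is what makes the tradeoff adjustable during a spike.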

I'd want it enabled all the time--but with expiry time set just long
enough to take the edge off a load spike for very specific fragments.
Could be as short as a second for all I care.

When slashdot strikes, I would _not_ want our users to time out until
the oopses show up in our email the next morning, and some enterprising
engineer checks the referrer, and the problem is debated with IS, and
decisions are held off until someone responsible comes online, and then
caching is enabled either by cowboy patch or a multi-handoff review
procedure, and finally the jolly lot of us figure out whether any
glitches are due to the spike, to a systemic failure, or to pre-existing
problems that we were hiding because caching was disabled.

Maybe I'm over-focusing on the Sudden Deadly Spike. I just find it a
useful way to think about memcached because it removes all temptation to
reduce timeout counts without fixing latency. It's also what makes me
feel that low hit rates are fine for normal days--as long as they shoot
up (and app/db load stays relatively steady) when Google's Doodle of the
Day happens to link to Bug #1.
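
A toy model shows why a one-second expiry can take the edge off a spike while giving near-zero hit rates on quiet days. The assumptions are all illustrative: a single fragment, one cache shared by every app server, evenly spaced requests, and made-up traffic numbers. Each expiry costs one render, so backend work stays at roughly one render per expiry period however many requests arrive.

```python
def count_renders(request_times, ttl):
    """Count renders when a TTL cache sits in front of one fragment.

    A request renders (cache miss) if the cached copy has expired;
    otherwise it is served from cache. Times are in seconds.
    """
    renders = 0
    expires_at = float("-inf")
    for t in sorted(request_times):
        if t >= expires_at:
            renders += 1
            expires_at = t + ttl
    return renders


def hit_rate(request_times, ttl):
    """Fraction of requests served from cache."""
    total = len(request_times)
    return 1 - count_renders(request_times, ttl) / total if total else 0.0


TTL = 1.0  # seconds; hypothetical value

# Quiet day: one request every 10 seconds for 100 seconds.
quiet = [10.0 * i for i in range(10)]
# Spike: 1000 requests per second for 10 seconds.
spike = [i / 1000 for i in range(10_000)]

print(count_renders(quiet, TTL), f"{hit_rate(quiet, TTL):.1%}")  # 10 0.0%
print(count_renders(spike, TTL), f"{hit_rate(spike, TTL):.1%}")  # 10 99.9%
```

Both cases cost the backend 10 renders; only the hit rate differs. The model ignores concurrent misses at the moment an entry expires (several app servers rendering the same fragment at once), which a real deployment would also have to handle.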



Jeroen

References

memcache, responsiveness and load {short story, lets turn memcache off}
From: Robert Collins, 2010-08-04
Re: memcache, responsiveness and load {short story, lets turn memcache off}
From: Robert Collins, 2010-08-04
Re: memcache, responsiveness and load {short story, lets turn memcache off}
From: Jeroen Vermeulen, 2010-08-04
Re: memcache, responsiveness and load {short story, lets turn memcache off}
From: Martin Pool, 2010-08-05