drizzle-discuss team mailing list archive

Re: Wild feature: query de-duplication

 

Actually, I rather like this idea too.

I can imagine a version that's:

(1) smart enough to ignore SQL comments (some folks stick a comment at the
end of the query with a unique host Id or something for stats and/or
debugging)

(2) configurable, so a site can decide it doesn't care about transaction
state and is willing to treat two queries as duplicates if they match on a
very small set of features (user makes sense, but I'm not sure about tx
isolation level and other "exotic stuff")
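
To make (1) and (2) concrete, here is a rough Python sketch of a
configurable duplicate key. Everything in it is an assumption for
illustration: strip_comments, dedup_key, and the session dict are
made-up names, not Drizzle APIs, and the comment stripping is
deliberately naive (it does not understand comment markers inside
string literals).

```python
import re

# Hypothetical helpers for illustration only; not part of Drizzle.
_BLOCK_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)
_LINE_COMMENT = re.compile(r"--[^\n]*")


def strip_comments(sql):
    """Drop /* ... */ and -- comments, then collapse whitespace.

    Naive on purpose: comment markers inside string literals are not handled.
    """
    sql = _BLOCK_COMMENT.sub(" ", sql)
    sql = _LINE_COMMENT.sub(" ", sql)
    return " ".join(sql.split())


def dedup_key(sql, session, features=("user",)):
    """Build the key under which two queries count as duplicates.

    `session` is an assumed dict of session attributes; `features` is the
    site-configurable list of attributes that must match.
    """
    return (strip_comments(sql),) + tuple(session.get(f) for f in features)


if __name__ == "__main__":
    a = dedup_key("SELECT * FROM t WHERE id = 1 /* host=web01 */",
                  {"user": "app", "tx_isolation": "REPEATABLE-READ"})
    b = dedup_key("SELECT * FROM t WHERE id = 1 /* host=web02 */",
                  {"user": "app", "tx_isolation": "READ-COMMITTED"})
    print(a == b)
```

With the default features only the user has to match, so the two
queries above collapse to the same key; adding "tx_isolation" to
features would keep them apart.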

The reason I suggest a "related" approach to seeing that two queries are
equal is that I bet most folks doing high volumes of possibly duplicate
queries are not doing anything fancy in the first place.  They're just hitting a
bank of slaves from their webbies and hoping to repopulate the failed
memcache tier without killing the DB boxes in the process.
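
For that stampede case, the coalescing itself might look something like
the sketch below: the first caller with a given key runs the query, and
anyone who arrives with the same key while it is in flight just waits
for that result. This is only an application-side illustration in
Python; QueryCoalescer and run_query are invented names, and it says
nothing about how a Drizzle plugin would actually hook in.

```python
import threading
from concurrent.futures import Future


class QueryCoalescer:
    """Run at most one copy of each in-flight query; hypothetical sketch."""

    def __init__(self, run_query):
        self._run_query = run_query      # assumed callable: key -> result
        self._lock = threading.Lock()
        self._in_flight = {}             # key -> Future for the running query

    def execute(self, key):
        with self._lock:
            fut = self._in_flight.get(key)
            leader = fut is None
            if leader:
                # First caller for this key: register a Future others can wait on.
                fut = Future()
                self._in_flight[key] = fut
        if leader:
            try:
                fut.set_result(self._run_query(key))
            except Exception as exc:
                # Followers see the same failure as the leader.
                fut.set_exception(exc)
            finally:
                with self._lock:
                    del self._in_flight[key]
        # Leader and followers all read the shared result here.
        return fut.result()
```

The key would be whatever the site decides makes two queries "the
same"; once a query finishes, the next caller with that key runs it
again, so this is de-duplication of concurrent work, not a cache.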

Or am I nuts?

Jeremy

On Fri, Jun 25, 2010 at 3:17 PM, Roland Bouman <roland.bouman@xxxxxxxxx> wrote:

> It's an interesting idea!
>
> I think it could be quite hard to detect whether the same query will
> in fact deliver the same result though - one would have to check
> whether the query is affected by the state of the session (values of
> session-scoped variables, functions returning session-dependent data
> like current_user(), isolation level of an explicit transaction,
> probably a lot more things).
>
> (of course - i am not a drizzle dev or engineer - I may be
> overestimating the complexity)
>
>
> On Fri, Jun 25, 2010 at 3:05 PM, Brian Moon <brian@xxxxxxxxxxxx> wrote:
> > So, one common use for Gearman seems to be running SQL queries through it so
> > that a given query only ever is running once at a given time. This solves
> > one big issue I have seen with a cache stampede. The application servers all
> > run the same queries at the same time. Pushing this logic down to the
> > database server seems like a good idea from where I sit. The server could
> > recognize that there are N connections all running the same query and not
> > try and do all the work needed to send back the result.
> >
> > What do you guys think? Does this make sense at the database layer? Is the
> > drizzle plugin architecture able to support this type of behavior?
> >
> > --
> >
> > Brian.
> > --------
> > http://brian.moonspot.net/
> >
> > _______________________________________________
> > Mailing list: https://launchpad.net/~drizzle-discuss
> > Post to     : drizzle-discuss@xxxxxxxxxxxxxxxxxxx
> > Unsubscribe : https://launchpad.net/~drizzle-discuss
> > More help   : https://help.launchpad.net/ListHelp
> >
>
>
>
> --
> Roland Bouman
> blog: http://rpbouman.blogspot.com/
> twitter: @rolandbouman
>
> Author of "Pentaho Solutions: Business Intelligence and Data
> Warehousing with Pentaho and MySQL",
> http://tinyurl.com/lvxa88 (Wiley, ISBN: 978-0-470-48432-6)
>
> Author of "Pentaho Kettle Solutions: Building Open Source ETL
> Solutions with Pentaho Data Integration",
> http://tinyurl.com/33r7a8m (Wiley, ISBN: 978-0-470-63517-9)
