
zeitgeist team mailing list archive

Fwd: RFC: Database Schema Changes Blueprint

 

---------- Forwarded message ----------
From: Siegfried Gevatter <rainct@xxxxxxxxxx>
Date: 2009/10/10
Subject: Re: [Zeitgeist] RFC: Database Schema Changes Blueprint
To: Mikkel Kamstrup Erlandsen <mikkel.kamstrup@xxxxxxxxx>


2009/10/10 Mikkel Kamstrup Erlandsen <mikkel.kamstrup@xxxxxxxxx>:
> Ok. I get the idea now - the original intent of origin was not what
> you describe though.
>
> The idea with origin was as follows: If I visit
> http://youtube.com/v?7da84bdksy this would also be the URI of the item
> in question, but the origin of the item would be http://youtube.com.
> The reason we want to extract the origin in a rigorous manner (and not
> simply use some prefix-matching on query time) is that we want to be
> able to cluster events based on their origins. "Which youtube videos
> have I watched lately?". Or ask the more general question "what do I
> usually do after watching a youtube video?".
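To make the idea of a rigorously extracted origin concrete, here is a minimal Python sketch. It assumes the origin is simply the scheme plus host of the item URI, as in the youtube.com example above. The `origin_of` helper and the sample URIs are hypothetical and are not part of Zeitgeist's API. Once each event carries an extracted origin, questions like "which youtube videos have I watched lately?" become a plain grouping or lookup.

```python
from collections import Counter
from urllib.parse import urlsplit


def origin_of(uri: str) -> str:
    """Hypothetical helper: reduce a URI to scheme://host."""
    parts = urlsplit(uri)
    return f"{parts.scheme}://{parts.netloc}"


# Hypothetical event subjects (URIs of visited items).
events = [
    "http://youtube.com/v?7da84bdksy",
    "http://youtube.com/v?abc123",
    "http://example.org/page",
]

# Cluster events by their extracted origin.
counts = Counter(origin_of(uri) for uri in events)
```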

Ah, I see now, but I'm still not convinced we need it (it looks like
data duplication). The first use case can already be achieved by using
"http://youtube.com/%" as the URI filter in FindEvents, which is more
flexible than just having the host name.
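For comparison, the prefix-filter alternative can be sketched with Python's built-in `sqlite3` module. The `item` table, its `uri` column and the `find_by_uri_prefix` helper are hypothetical stand-ins, not the actual Zeitgeist schema or the FindEvents implementation; the sketch only shows how a SQL `LIKE` pattern with a trailing `%` matches every URI under a given prefix.

```python
import sqlite3

# Hypothetical, simplified table: one row per item URI.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, uri TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO item (uri) VALUES (?)",
    [
        ("http://youtube.com/v?7da84bdksy",),
        ("http://youtube.com/v?abc123",),
        ("http://example.org/page",),
    ],
)


def find_by_uri_prefix(conn, pattern):
    """Return URIs matching a LIKE pattern such as 'http://youtube.com/%'."""
    rows = conn.execute(
        "SELECT uri FROM item WHERE uri LIKE ? ORDER BY id", (pattern,)
    )
    return [uri for (uri,) in rows]


youtube = find_by_uri_prefix(conn, "http://youtube.com/%")
```

Note that SQLite's `LIKE` is case-insensitive for ASCII characters by default, and that a pattern like "http://youtube.com/%" will not match URIs on "www.youtube.com", so a prefix filter and an extracted host name do not always select the same rows.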

>>> Questionable: Remove the app table all together?
>>> [...] One less table can save us a SQL JOIN.
>> No, if we get two things from the same table we still need two joins,
>> as the stuff is in different rows. Further, this would increase disk
>> space usage.
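The point about joins can be checked with a small `sqlite3` sketch. Both schemas below are hypothetical (the table and column names and the `firefox.desktop` value are made up for illustration): one keeps applications in a separate `app` table, the other stores them as rows of `item`. Because an event's subject and actor live in different rows, the merged variant still needs two joins, just against the same table under two aliases.

```python
import sqlite3

# Variant 1 (hypothetical): applications in their own "app" table.
split = sqlite3.connect(":memory:")
split.executescript("""
CREATE TABLE app   (id INTEGER PRIMARY KEY, info TEXT UNIQUE NOT NULL);
CREATE TABLE item  (id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL);
CREATE TABLE event (id INTEGER PRIMARY KEY,
                    subject_id INTEGER NOT NULL REFERENCES item(id),
                    actor_id   INTEGER NOT NULL REFERENCES app(id));
INSERT INTO app   (id, info) VALUES (1, 'firefox.desktop');
INSERT INTO item  (id, uri)  VALUES (1, 'http://youtube.com/v?7da84bdksy');
INSERT INTO event (id, subject_id, actor_id) VALUES (1, 1, 1);
""")
rows = split.execute("""
SELECT item.uri, app.info
FROM event
JOIN item ON item.id = event.subject_id
JOIN app  ON app.id  = event.actor_id
""").fetchall()

# Variant 2 (hypothetical): applications stored as rows of "item".
merged = sqlite3.connect(":memory:")
merged.executescript("""
CREATE TABLE item  (id INTEGER PRIMARY KEY, uri TEXT UNIQUE NOT NULL);
CREATE TABLE event (id INTEGER PRIMARY KEY,
                    subject_id INTEGER NOT NULL REFERENCES item(id),
                    actor_id   INTEGER NOT NULL REFERENCES item(id));
INSERT INTO item  (id, uri) VALUES (1, 'http://youtube.com/v?7da84bdksy');
INSERT INTO item  (id, uri) VALUES (2, 'firefox.desktop');
INSERT INTO event (id, subject_id, actor_id) VALUES (1, 1, 2);
""")
# Still two joins: the same table is joined twice under different aliases.
merged_rows = merged.execute("""
SELECT subject.uri, actor.uri
FROM event
JOIN item AS subject ON subject.id = event.subject_id
JOIN item AS actor   ON actor.id   = event.actor_id
""").fetchall()
```

Both queries return the same (subject, actor) pair; merging the tables removes a table but not a join.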
>
> Storing maybe 100 apps in the item table is not a significant
> overhead space-wise, I believe. If we stored 25,000 then maybe, but
> that is way beyond realistic.

Right, space isn't really the reason I'm against it; I just see no
benefit in this. The JOIN is still needed anyway (and it probably
becomes slower, as there is far more stuff in "item" than in "app").

> Have a good weekend!

--
Siegfried-Angel Gevatter Pujals (RainCT)
Free Software Developer       363DEAE3


