
zeitgeist team mailing list archive

[Bug 494288] Re: "apriori": get most used (websites/notes/documents/etc...)

 

I don't think we should consider open/close events when calculating
these relations. If we do, it won't work for contacts and other
non-file-like items.

The initial step of the algorithm: "Fetch the last 7 events for this
subject uri" seems good.

The next step, where you create a time range neighbourhood around each
of these events, is a bit unclear to me. You create the neighbourhood as
(event.timestamp, <next_event_timestamp>). This seems odd at a glance.
Why not (event.timestamp - delta, event.timestamp + delta)?
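
To make the suggestion concrete, here is a minimal Python sketch of the
symmetric neighbourhood. The function name and the choice of delta are
only illustrative assumptions, not anything from the branch:

```python
def symmetric_windows(timestamps, delta):
    """Return one (start, end) window centred on each event timestamp.

    `timestamps` and `delta` must use the same unit (e.g. milliseconds).
    The name and signature are hypothetical, for illustration only.
    """
    return [(t - delta, t + delta) for t in timestamps]


# Two events, each with a window of 10 units on either side.
windows = symmetric_windows([100, 200], 10)
```

Unlike (event.timestamp, <next_event_timestamp>), this also picks up
items that were used shortly before the event.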

The next thing is that I think you can do the last two steps of the
algorithm in one SQL query, i.e. the part where you create the k_tuples
and the part where you calculate the support of the k_tuples. Possibly:

SELECT subj_uri, COUNT(subj_uri) AS support
FROM event_view
WHERE (timestamp > ? AND timestamp < ?) OR (timestamp > ? AND timestamp < ?) OR (...) ...
GROUP BY subj_uri
ORDER BY support DESC
LIMIT 5

I am sure Siegfried can do this even better though :-D
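
For illustration, a rough sketch of how such a query could be built and
run from Python with sqlite3. The two-column event_view table below is a
stand-in I made up for the example, not the real Zeitgeist schema, and
the function name is hypothetical too:

```python
import sqlite3


def most_used_in_windows(conn, windows, limit=5):
    """Count subjects whose events fall inside any (start, end) window.

    Hypothetical helper: builds one OR-ed range clause per window and
    returns (subj_uri, support) rows, highest support first.
    """
    if not windows:
        return []
    where = " OR ".join("(timestamp > ? AND timestamp < ?)" for _ in windows)
    params = [bound for window in windows for bound in window]
    sql = (
        "SELECT subj_uri, COUNT(subj_uri) AS support "
        "FROM event_view "
        f"WHERE {where} "
        "GROUP BY subj_uri "
        "ORDER BY support DESC, subj_uri ASC "
        "LIMIT ?"
    )
    return conn.execute(sql, params + [limit]).fetchall()


# Assumed toy schema and data, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_view (timestamp INTEGER, subj_uri TEXT)")
conn.executemany(
    "INSERT INTO event_view VALUES (?, ?)",
    [(95, "file:///a"), (105, "file:///b"), (195, "file:///a"), (300, "file:///c")],
)
result = most_used_in_windows(conn, [(90, 110), (190, 210)])
```

With 7 events this gives 14 bound parameters, which is well within
SQLite's default limit on host parameters.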

-- 
"apriori": get most used (websites/notes/documents/etc...)
https://bugs.launchpad.net/bugs/494288
You received this bug notification because you are a member of Zeitgeist
Framework, which is the registrant for Zeitgeist Framework.

Status in Zeitgeist Framework: New

Bug description:
We have a branch with the 1-step apriori algorithm built.
Right now it returns the items most used together with another item.
We should make it configurable, so that we can ask for the most used interpretations of items together with other items.
This way we can, for example, ask for the most used "websites" with document X, etc.
What do you think?




