zeitgeist team mailing list archive
Message #02129
[Merge] lp:~seif/zeitgeist/optimize-find-related-uris into lp:zeitgeist
Seif Lotfy has proposed merging lp:~seif/zeitgeist/optimize-find-related-uris into lp:zeitgeist.
Requested reviews:
Zeitgeist Framework Team (zeitgeist)
This is just a small optimization. Instead of calling get_events() for the IDs, which returns full Event objects, I query the database directly for only the id, timestamp and subject URI. This saves both time and memory, and the results come back 2x faster than before.
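A minimal, self-contained sketch of the idea, using Python's standard `sqlite3` module: select only the three columns the ranking needs instead of materializing full event objects. The table here is a simplified stand-in for zeitgeist's `event_view` (the real one has more columns), and `fetch_rows` is a hypothetical helper name. Unlike the patch, it binds the IDs as `?` parameters rather than formatting them into the SQL string.

```python
import sqlite3


def fetch_rows(cursor, ids):
    """Return (id, timestamp, subj_uri) tuples for the given event ids.

    Only the three columns needed for ranking are selected, so no full
    event objects are built. Hypothetical helper: it uses "?" placeholders,
    which is a variation on the patch, not the patch's own code.
    """
    ids = list(ids)
    if not ids:
        return []
    # One "?" per id, e.g. "?,?,?" for three ids.
    placeholders = ",".join("?" * len(ids))
    query = ("SELECT id, timestamp, subj_uri FROM event_view "
             "WHERE id IN (%s)" % placeholders)
    return cursor.execute(query, ids).fetchall()


# Simplified, hypothetical stand-in for zeitgeist's event_view.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE event_view (id INTEGER, timestamp INTEGER, subj_uri TEXT)")
cur.executemany("INSERT INTO event_view VALUES (?, ?, ?)", [
    (1, 100, "file:///a"),
    (2, 200, "file:///b"),
    (3, 300, "file:///a"),
])

rows = fetch_rows(cur, [1, 3])
print(sorted(rows))
```

Binding parameters keeps string formatting out of the SQL, but older SQLite builds cap the number of bound parameters per statement at 999 by default, so a large `pot` could hit that limit. The patch's `"%d" % id` formatting sidesteps the limit and still only ever writes integers into the query.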
--
https://code.launchpad.net/~seif/zeitgeist/optimize-find-related-uris/+merge/38820
Your team Zeitgeist Framework Team is requested to review the proposed merge of lp:~seif/zeitgeist/optimize-find-related-uris into lp:zeitgeist.
=== modified file '_zeitgeist/engine/main.py'
--- _zeitgeist/engine/main.py 2010-10-18 20:09:25 +0000
+++ _zeitgeist/engine/main.py 2010-10-19 10:29:41 +0000
@@ -432,15 +432,18 @@
 				pot.append(x)
 		# Out of the pot we get all respected events and count which uris occur most
-		events = self.get_events(pot)
+		rows = self._cursor.execute("""
+			SELECT id, timestamp, subj_uri FROM event_view
+			WHERE id IN (%s)
+			""" % ",".join("%d" % id for id in pot)).fetchall()
+
 		subject_uri_counter = defaultdict(int)
 		latest_uris = defaultdict(int)
-		for event in events:
-			if event and event.id not in ids:
-				subj = event.subjects[0]
-				subject_uri_counter[subj.uri] += 1
-				if latest_uris[subj.uri] < event.timestamp:
-					latest_uris[subj.uri] = event.timestamp
+		for id, timestamp, uri in rows:
+			if id not in ids:
+				subject_uri_counter[uri] += 1
+				if latest_uris[uri] < timestamp:
+					latest_uris[uri] = timestamp
 		log.debug("FindRelatedUris: Finished ranking subjects %fs." % \
 			(time.time()-t1))
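The ranking loop in the new code can be pulled out into a pure function and exercised without a database. This is a sketch, not zeitgeist code: `rank_uris` and `seed_ids` are hypothetical names, `seed_ids` plays the role of `ids` in the patch, and the loop variable is called `event_id` so it does not shadow Python's built-in `id`.

```python
from collections import defaultdict


def rank_uris(rows, seed_ids):
    """Count how often each subject URI occurs and record its latest timestamp.

    rows:     iterable of (event_id, timestamp, uri) tuples, shaped like the
              result of the SELECT in the patch.
    seed_ids: ids of the events the search started from; they are skipped,
              mirroring the `if id not in ids` check.
    """
    seed_ids = set(seed_ids)  # set membership is O(1) per lookup
    counter = defaultdict(int)
    latest = defaultdict(int)
    for event_id, timestamp, uri in rows:
        if event_id in seed_ids:
            continue
        counter[uri] += 1
        # Keep the most recent timestamp seen for this URI.
        if latest[uri] < timestamp:
            latest[uri] = timestamp
    return dict(counter), dict(latest)


rows = [
    (1, 100, "file:///a"),
    (2, 250, "file:///b"),
    (3, 300, "file:///a"),
    (4, 150, "file:///b"),
]
counts, latest = rank_uris(rows, seed_ids=[1])
print(counts)
print(latest)
```

One thing to keep in mind when comparing against the old loop: `WHERE id IN (...)` matches each row once, so if `pot` can contain the same ID more than once, that ID contributes to the counts once per matching row rather than once per occurrence in `pot`.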