zeitgeist team mailing list archive

[Merge] lp:~seif/zeitgeist/optimize-find-related-uris into lp:zeitgeist

 

Seif Lotfy has proposed merging lp:~seif/zeitgeist/optimize-find-related-uris into lp:zeitgeist.

Requested reviews:
  Zeitgeist Framework Team (zeitgeist)


This is just a tiny optimization. Instead of calling get_events for the ids, which builds and returns full "Event" objects, I now query the DB directly for only the id, timestamp and subject URI. This saves both time and memory. The result is about 2x faster than before.
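
For illustration, here is a minimal standalone sketch of the idea, not the code from the branch. It assumes an in-memory SQLite table standing in for event_view with only the three columns used here; the function name rank_related_uris and the sample rows are made up for the example. Unlike the diff below, the sketch binds the ids through "?" placeholders rather than formatting them into the SQL string.

```python
import sqlite3
from collections import defaultdict


def rank_related_uris(cursor, pot, ids):
    """Count subject URIs for the events in `pot`, skipping events in `ids`.

    Hypothetical helper: it fetches only (id, timestamp, subj_uri) rows
    instead of building full event objects.
    """
    subject_uri_counter = defaultdict(int)
    latest_uris = defaultdict(int)
    if not pot:
        # "IN ()" is not valid SQL, so return early for an empty pot.
        return subject_uri_counter, latest_uris

    # One "?" per id; the values are bound by sqlite3, not string-formatted.
    placeholders = ",".join("?" for _ in pot)
    rows = cursor.execute(
        "SELECT id, timestamp, subj_uri FROM event_view "
        "WHERE id IN (%s)" % placeholders,
        list(pot),
    ).fetchall()

    excluded = set(ids)
    for event_id, timestamp, uri in rows:
        if event_id not in excluded:
            subject_uri_counter[uri] += 1
            # Remember the most recent timestamp seen for each URI.
            if latest_uris[uri] < timestamp:
                latest_uris[uri] = timestamp
    return subject_uri_counter, latest_uris


# Toy table standing in for event_view (assumed schema, sample data).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE event_view (id INTEGER, timestamp INTEGER, subj_uri TEXT)"
)
conn.executemany(
    "INSERT INTO event_view VALUES (?, ?, ?)",
    [
        (1, 100, "file:///a"),
        (2, 200, "file:///b"),
        (3, 300, "file:///b"),
        (4, 400, "file:///c"),
    ],
)

counts, latest = rank_related_uris(conn.cursor(), [1, 2, 3, 4], [1])
```

With this sample data, event 1 is excluded, so "file:///b" is counted twice with a latest timestamp of 300 and "file:///c" once with 400. SQLite limits how many parameters a single statement may bind, so a very large pot would need to be queried in chunks.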
-- 
https://code.launchpad.net/~seif/zeitgeist/optimize-find-related-uris/+merge/38820
Your team Zeitgeist Framework Team is requested to review the proposed merge of lp:~seif/zeitgeist/optimize-find-related-uris into lp:zeitgeist.
=== modified file '_zeitgeist/engine/main.py'
--- _zeitgeist/engine/main.py	2010-10-18 20:09:25 +0000
+++ _zeitgeist/engine/main.py	2010-10-19 10:29:41 +0000
@@ -432,15 +432,18 @@
 						pot.append(x)
 			
 			# Out of the pot we get all respected events and count which uris occur most
-			events = self.get_events(pot)
+			rows = self._cursor.execute("""
+				SELECT id, timestamp, subj_uri FROM event_view
+				WHERE id IN (%s)
+				""" % ",".join("%d" % id for id in pot)).fetchall()
+			
 			subject_uri_counter = defaultdict(int)
 			latest_uris = defaultdict(int)
-			for event in events:
-				if event and event.id not in ids:
-					subj = event.subjects[0]
-					subject_uri_counter[subj.uri] += 1
-					if latest_uris[subj.uri] < event.timestamp:
-						latest_uris[subj.uri] = event.timestamp
+			for id, timestamp, uri in rows:
+				if id not in ids:
+					subject_uri_counter[uri] += 1
+					if latest_uris[uri] < timestamp:
+						latest_uris[uri] = timestamp
 							
 			log.debug("FindRelatedUris: Finished ranking subjects %fs." % \
 				(time.time()-t1))