launchpad-dev team mailing list archive
Suggestions for searching bug attachments
One of the areas I am working on for Summer of Code is attachment
searching. Basically, the aim is to let an Arsenal user do something like
"Search for <text> in the attachments of all bugs in source packages
that <team> is subscribed to".
This would allow Arsenal users to find matching bugs based on patches
and other attachments.
The searching could be done on the Arsenal (client) side or on the
Launchpad (server) side. Doing it on the client side requires downloading
all the attachments in order to search through them. For larger
attachments, this approach becomes painfully slow and inadequate.
Therefore, my project was to implement the functionality in Launchpad.
In Launchpad, I implemented the findAttachments method, which searches
for the text using Horspool's algorithm, reading files in chunks.
However, the original problem of going through a lot of data still
remains, albeit on the server side this time. As Graham has pointed
out in the review, web service requests can time out depending on the
size and number of attachments that need to be searched.
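To make the chunked approach concrete, here is a minimal sketch of a
Horspool search over a file-like object. It is not the findAttachments
code from the branch under review; the function names (build_shift_table,
horspool_find, stream_contains) and the chunk_size parameter are
hypothetical and chosen only for illustration.

```python
import io


def build_shift_table(pattern: bytes) -> dict:
    """Horspool bad-character table: for every byte except the last one,
    the distance from its rightmost occurrence to the end of the pattern."""
    m = len(pattern)
    return {byte: m - 1 - i for i, byte in enumerate(pattern[:-1])}


def horspool_find(buf: bytes, pattern: bytes, shift: dict) -> int:
    """Return the index of the first match of pattern in buf, or -1."""
    m, n = len(pattern), len(buf)
    pos = 0
    while pos + m <= n:
        if buf[pos:pos + m] == pattern:
            return pos
        # Shift by the table entry for the byte aligned with the pattern's
        # last position; bytes not in the table allow a full-length shift.
        pos += shift.get(buf[pos + m - 1], m)
    return -1


def stream_contains(stream, pattern: bytes, chunk_size: int = 64 * 1024) -> bool:
    """Search a binary file-like object chunk by chunk (hypothetical helper)."""
    if not pattern:
        return True
    shift = build_shift_table(pattern)
    tail = b""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return False
        buf = tail + chunk
        if horspool_find(buf, pattern, shift) != -1:
            return True
        # Keep the last len(pattern) - 1 bytes so that a match spanning
        # two chunks is still found on the next iteration.
        tail = buf[-(len(pattern) - 1):] if len(pattern) > 1 else b""


if __name__ == "__main__":
    data = io.BytesIO(b"aaa while (i--) bbb")
    print(stream_contains(data, b"while (i--)", chunk_size=4))
```

Memory use stays bounded by the chunk size plus the pattern length, but
every byte of every attachment is still read, which is why the time-out
concern remains.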
I am therefore looking for alternative ideas and suggestions for
implementing attachment searching. Two ideas have been proposed so far:
1) Full-text indexing (FTI) in the DB for all attachments. For this to
work, constraints would have to be applied so that the searching code
doesn't yield false positives when different source files share the
same normalized lexemes (i.e., differentiating things like while (i--)
from while (i++)). Adding more constraints increases DB overhead.
2) Asynchronous REST requests. I haven't yet worked on them so I don't
know about the possible pitfalls. Graham's comment highlighted issues
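
The false-positive concern in idea 1 can be illustrated with a small
sketch. The normalizer below is a toy stand-in that lowercases text and
drops punctuation; it is an assumption for illustration only and does not
reproduce what the database's full-text indexing actually does. The
function names (toy_lexemes, index_candidates, exact_matches) are
hypothetical. The second stage, an exact substring check on the
candidates, is just one possible constraint.

```python
import re


def toy_lexemes(text: str) -> frozenset:
    """Toy normalizer (assumption): lowercase, keep alphanumeric runs only."""
    return frozenset(re.findall(r"[a-z0-9_]+", text.lower()))


def index_candidates(query: str, attachments: dict) -> list:
    """Names of attachments whose lexeme set contains every query lexeme."""
    wanted = toy_lexemes(query)
    return [name for name, body in attachments.items()
            if wanted <= toy_lexemes(body)]


def exact_matches(query: str, attachments: dict) -> list:
    """Filter the index candidates with an exact substring check."""
    return [name for name in index_candidates(query, attachments)
            if query in attachments[name]]


if __name__ == "__main__":
    attachments = {
        "a.patch": "while (i--) { f(); }",
        "b.patch": "while (i++) { f(); }",
    }
    # Both files normalize to the same lexemes for this query, so the
    # index stage cannot tell them apart; the exact check can.
    print(index_candidates("while (i--)", attachments))
    print(exact_matches("while (i--)", attachments))
```

The exact check only has to read the candidates rather than every
attachment, but it is still extra work per query on top of the index.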
Which implementation do other developers think I should choose? Or,
better yet, please suggest an alternative idea for searching attachment
data if you have one in mind.
Kamran Riaz Khan.