launchpad-dev team mailing list archive

Re: hacking MHonArc to be a microservice

 

Hi Barry.

Thank you for bringing up the topic of Pipermail.
 
On 10/28/2011 11:25 AM, Barry Warsaw wrote:
> On Oct 25, 2011, at 05:23 PM, Curtis Hovey wrote:
>
>> I fixed some critical bugs in the mailing list archives earlier this
>> year. I also saw an opportunity to fix several UI bugs since I was
>> working in the MHonArc templates. As I was testing the changes it
>> occurred to me that in a matter of hours, I could change the
>> templates to generate JSON instead of HTML. I dismissed the idea as
>> outlandish, but after 6 months, I think it may not be crazy.
> This is a very neat idea.  I'd suggest structuring the changes in such a way
> that upstream might accept them, but tbh, I'm not sure MHonArc is still a
> healthy project.  The last release was apparently in January 2011, and the
> ViewCVS for the project is a dead link.  It's hard to tell from the CVS
> snapshot what, if anything, has been happening lately.
MHonArc is dead to me. It gets critical security updates, but no features
or bug fixes. I have no interest in keeping it on a respirator. We
all need to accept that it is an ex-archiver; it has ceased to be.
>
>> Doing this might be easier than fixing private lists.
>>
>> We can replace MHonArc in the future with another system that serves JSON.
> So, let's fix Pipermail to have stable urls and output JSON.  That Perl stuff
> will rot your brain.  :)
I was pondering the effort of hacking in Perl to ensure we get proper
JSON encoding. There are also issues with MHonArc wanting to paginate
the index.

For the edification of other readers, Mailman pipes the message to the
MHonArc command line app, which does five things: append the message to
the mbox, update the db, regenerate the index files, generate the
message file, and regenerate the adjacent messages. The JSON proposal
addresses the last three items.
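For readers who have not seen the JSON proposal, here is a rough sketch of those last three steps, assuming the db is simply a date-ordered list of dicts. The function names, the `index.json` file name and the record fields are all hypothetical; none of them come from Mailman, MHonArc or Launchpad.

```python
import json
import os
from urllib.parse import quote


def write_json(path, data):
    """Write data as UTF-8 JSON, replacing any previous version of the file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, ensure_ascii=False, indent=2)


def render_message(entries, i):
    """Build the JSON document for entries[i], including prev/next links."""
    entry = entries[i]
    return {
        "message_id": entry["message_id"],
        "subject": entry["subject"],
        "date": entry["date"],
        "prev": entries[i - 1]["message_id"] if i > 0 else None,
        "next": entries[i + 1]["message_id"] if i + 1 < len(entries) else None,
    }


def publish(entries, new_pos, out_dir):
    """Steps 3-5 for a message that was just inserted at entries[new_pos]."""
    # Step 3: regenerate the index of visible messages.
    index = [{"message_id": e["message_id"], "subject": e["subject"]}
             for e in entries]
    write_json(os.path.join(out_dir, "index.json"), index)
    # Steps 4 and 5: write the new message file and rewrite its neighbours,
    # whose prev/next links now point at the new message.
    written = []
    for i in range(max(0, new_pos - 1), min(len(entries), new_pos + 2)):
        name = quote(entries[i]["message_id"], safe="") + ".json"
        write_json(os.path.join(out_dir, name), render_message(entries, i))
        written.append(name)
    return written
```

In this sketch each delivery rewrites the index plus at most three message files, and Lp would only ever read data, never markup.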

The db in this case is a Perl hash that is interpreted... not unlike the
JSON proposal. The db stores the index of visible messages by time and
thread. We have had two critical problems with the db in the past. We
had to beat it into working with unicode, and it has a habit of undeleting
messages. The former problem is about hiding public data, while the
latter is about exposing private data. We hate the db and MHonArc's command
line. We use a Python script to regenerate web archives and delete
messages to ensure messages stay in the state we want.

It takes a few lines of Python to append the message to an mbox. I do
not think there is much work in writing a means to store the time
and thread sequence of messages. I want a mechanism that ensures
confidential data is not undeleted by accident, which is pretty easy to
do if the deleted message-ids are stored explicitly instead of being
implied, as MHonArc does.
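A minimal sketch of both pieces using only the standard library: `mailbox.mbox` for the append, and a hypothetical `ArchiveIndex` that keeps deleted message-ids in an explicit set, so every time or thread view filters them out no matter how it is rebuilt. It assumes every message has a Message-ID and a timezone-qualified Date header, threads only on In-Reply-To, and leaves persistence of the index out.

```python
import mailbox
from email.utils import parsedate_to_datetime


def append_to_mbox(path, msg):
    """Append one message to the mbox at path, holding the mailbox lock."""
    box = mailbox.mbox(path)
    box.lock()
    try:
        box.add(msg)
        box.flush()
    finally:
        box.unlock()
        box.close()


def _bare_id(value):
    return value.strip().strip("<>")


class ArchiveIndex:
    """Hypothetical time/thread index with an explicit set of deleted ids.

    Assumes every message has a Message-ID and a Date header that includes
    a timezone, so the parsed dates can be compared with each other.
    """

    def __init__(self):
        self.entries = {}     # message-id -> (date, parent message-id or None)
        self.deleted = set()  # message-ids that must never be shown again

    def add(self, msg):
        parent = msg.get("In-Reply-To")
        self.entries[_bare_id(msg["Message-ID"])] = (
            parsedate_to_datetime(msg["Date"]),
            _bare_id(parent) if parent else None,
        )

    def delete(self, msg_id):
        # Kept as an explicit set, so rebuilding a view cannot resurrect it.
        self.deleted.add(msg_id)

    def _date(self, msg_id):
        return self.entries[msg_id][0]

    def by_date(self):
        """Visible message-ids, oldest first."""
        visible = [m for m in self.entries if m not in self.deleted]
        return sorted(visible, key=self._date)

    def by_thread(self):
        """Visible message-ids in depth-first thread order, siblings by date."""
        children = {}
        for msg_id, (_, parent) in self.entries.items():
            # Replies to unknown messages (or to themselves) start a new thread.
            known = parent in self.entries and parent != msg_id
            children.setdefault(parent if known else None, []).append(msg_id)
        order = []

        def walk(parent):
            for msg_id in sorted(children.get(parent, []), key=self._date):
                if msg_id not in self.deleted:
                    order.append(msg_id)
                walk(msg_id)  # replies to a deleted message stay visible

        walk(None)
        return order
```

The mbox still holds a deleted message; only the explicit set decides visibility, which is the property MHonArc's implicit db lacks.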

I think Pipermail already handles the mbox and the db of threads and time, so
maybe I want to use what already exists. I would also have more confidence
producing JSON with simplejson in Python.
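A minimal sketch of the encoding step, using the standard library `json` module (simplejson exposes the same `dumps`/`loads` interface, so `import simplejson as json` should be a drop-in swap). `header_text` and `to_json` are hypothetical helper names; the first decodes RFC 2047 encoded-words, which is where unicode subjects usually arrive.

```python
import json
from email.header import decode_header, make_header


def header_text(raw_header):
    """Decode RFC 2047 encoded-words (e.g. =?utf-8?q?...?=) into a str."""
    return str(make_header(decode_header(raw_header)))


def to_json(record):
    # The default ensure_ascii=True escapes non-ASCII text as \uXXXX, so the
    # output is plain ASCII however the web server labels its charset.
    return json.dumps(record, sort_keys=True)
```

Escaping everything to ASCII trades a little size for not having to trust any charset declaration along the way.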

As for stable URLs, MHonArc's are not stable (the message numbers change
when messages are deleted). Lp will be dealing with the URLs, and our
JSON serialisation must let Lp look up and get the needed data.

I think, however, Barry, that what you really care about is predictable URLs. We
want to include the URL of the message that *will* be archived in the
outgoing message footer. This is impossible with MHonArc and Pipermail.
I do not think this is hard to solve. Lp requires each message to have a
unique message-id. Lp drops any message that reuses an id, and so do
our Mailman additions. We can construct a URL using the URL-encoded
message-id, e.g.:
   
https://launchpad.net/~haibunku/+mailing-list/+message/j3224d7324324%40sdaf2344.myhost.dom
^ That sure is ugly, but it will work. We could also encode the
message-id if we want the URLs to be opaque.
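A sketch of both URL forms with Python 3's `urllib.parse` and `base64`, reusing the base URL from the example above. The helper names are hypothetical, and stripping the angle brackets is an assumed normalisation. The base64url variant only looks opaque (anyone can decode it), but it lets Lp recover the message-id without a lookup table, and either form can be computed before the message is archived.

```python
import base64
from urllib.parse import quote, unquote

# Base URL taken from the example above; a real list would use its own.
BASE = "https://launchpad.net/~haibunku/+mailing-list/+message/"


def _bare_id(message_id):
    return message_id.strip().strip("<>")


def _token(url, base):
    if not url.startswith(base):
        raise ValueError("not a message URL: %s" % url)
    return url[len(base):]


def message_url(message_id, base=BASE):
    """Predictable URL: the percent-encoded message-id (safe="" encodes '/' too)."""
    return base + quote(_bare_id(message_id), safe="")


def message_id_from_url(url, base=BASE):
    return unquote(_token(url, base))


def opaque_url(message_id, base=BASE):
    """Opaque-looking variant: unpadded base64url of the message-id."""
    raw = _bare_id(message_id).encode("utf-8")
    return base + base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def message_id_from_opaque_url(url, base=BASE):
    token = _token(url, base)
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")
```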

Since I have your attention, Barry: I have assumed that I can run the new
service in parallel with the existing archive by updating the queue
runner to call both the old archiver and the new one.
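A sketch of that parallel run, assuming the queue runner can be handed two callables. `archive_in_parallel`, `primary` and `trial` are hypothetical names, not Mailman's actual runner API. The existing archiver runs first and its errors propagate exactly as they would today, while a failure in the new service is logged and does not block delivery.

```python
import logging

log = logging.getLogger("archive")


def archive_in_parallel(msg, primary, trial):
    """Archive msg with the existing archiver, then with the trial service.

    Returns True if the trial archiver succeeded, False if it raised.
    """
    primary(msg)  # existing archiver: failures propagate as they do today
    try:
        trial(msg)
    except Exception:
        # The new service is on trial; record the failure and carry on.
        log.exception("trial archiver failed")
        return False
    return True
```

Keeping the old archiver authoritative means the trial can be switched off by simply passing a no-op as `trial`.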

-- 
Curtis Hovey
http://launchpad.net/~sinzui

