launchpad-dev team mailing list archive

Re: performance tuesday - services design progress

 

On Sat, Jun 4, 2011 at 4:28 AM, Marc Tardif <marc.tardif@xxxxxxxxxx> wrote:
> * Robert Collins <robertc@xxxxxxxxxxxxxxxxx> [2011-06-01 14:30 +1200]:
> [snip]
>> Rather than repeat the wiki page here, I'd love it if you were to go
>> and have a read, and reply to this thread with your thoughts. All the
>> discussions I previously put off - e.g. should we use a message queue
>> as the default internal protocol - are now up for discussion!
>
> Regarding the blob storage section of the ServiceRoadmap wiki page,
> I would like to propose lpresults.storage in the Launchpad Results
> project. Even though I'm shamelessly pimping my own solution, the pros
> are that it supports modular backends (currently filesystem and S3),
> does not rely on a database to manage blobs and is Zope3 friendly. The
> cons are that it does not coalesce the content of files by hash. Even
> if another solution might be preferable, like HDFS for example, this
> could potentially be implemented as another backend.

Is lpresults.storage a separate standalone daemon? If not, I think we
need to work on refining the definition of service :).

If it is, then I'm certainly happy for it to be considered when we get
around to factoring out the blob store. We need to talk to U1 about
their blob storage needs and examine whether aiming for a single
solution makes sense (it may, it may not).

Considering just LP - we have a SAN for highly available persistence
of files on disk, so I don't see S3 and S3-like solutions as being an
interesting time investment for LP at this point. We will want highly
available front ends and metadata management, and for that a clustered
solution may make sense.

Zope friendliness is uninteresting: as a separate service this can be
written in the leanest, fastest stack we are comfortable with - the
current librarian (also a contender) is written in Twisted, for
instance.

File coalescing is probably a must, but I don't see it being an
exposed aspect of the service.
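
To illustrate what "coalescing without exposing it" could look like, here
is a minimal in-memory sketch. It is not how the librarian or
lpresults.storage works; the class name CoalescingBlobStore and all of its
methods are hypothetical. Callers only ever see opaque ids, while identical
content is stored once behind the scenes, keyed by its SHA-256 digest.

```python
import hashlib
import uuid


class CoalescingBlobStore:
    """Hypothetical sketch: identical content is kept once, ids stay opaque."""

    def __init__(self):
        self._content_by_digest = {}  # sha256 hex digest -> bytes
        self._refcount = {}           # sha256 hex digest -> number of live ids
        self._digest_by_id = {}       # opaque blob id -> sha256 hex digest

    def put(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Only the first upload of a given content actually stores bytes.
        if digest not in self._content_by_digest:
            self._content_by_digest[digest] = data
        self._refcount[digest] = self._refcount.get(digest, 0) + 1
        # The caller gets a fresh random id, never the digest itself.
        blob_id = uuid.uuid4().hex
        self._digest_by_id[blob_id] = digest
        return blob_id

    def get(self, blob_id: str) -> bytes:
        return self._content_by_digest[self._digest_by_id[blob_id]]

    def delete(self, blob_id: str) -> None:
        digest = self._digest_by_id.pop(blob_id)
        self._refcount[digest] -= 1
        # Drop the bytes only when the last id pointing at them is gone.
        if self._refcount[digest] == 0:
            del self._refcount[digest]
            del self._content_by_digest[digest]

    def stored_copies(self) -> int:
        return len(self._content_by_digest)
```

Because the ids are random rather than derived from the hash, a client
cannot tell whether someone else has already uploaded the same file, which
matters for the privacy requirements discussed below.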

Finally, we have a number of accounting (aggregate sizes mapped back to
a 'user'), tenancy (multiple different users whose data shouldn't
overlap: OOPS raw data, email raw data, attachments to things in LP) and
privacy (see the time-limited token facility in the LP tree today)
requirements, all of which need to be taken into account when deciding
how to externalize our blob storage.
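
As a rough sketch of how those three requirements might surface in a
service API (this is an assumption for discussion, not a description of
the token facility in the LP tree): a per-tenant usage ledger covers
accounting and tenancy, and an HMAC-signed expiring token covers
time-limited private access. SECRET, UsageLedger, make_token and
check_token are all hypothetical names.

```python
import hashlib
import hmac
import time
from collections import defaultdict

SECRET = b"example-secret"  # hypothetical; a real service would load this from config


class UsageLedger:
    """Hypothetical accounting: bytes stored, tracked per (tenant, user)."""

    def __init__(self):
        self._bytes = defaultdict(lambda: defaultdict(int))

    def record(self, tenant: str, user: str, size: int) -> None:
        self._bytes[tenant][user] += size

    def total_for_user(self, tenant: str, user: str) -> int:
        return self._bytes[tenant][user]

    def total_for_tenant(self, tenant: str) -> int:
        return sum(self._bytes[tenant].values())


def make_token(tenant: str, blob_id: str, expires_at: int, secret: bytes = SECRET) -> str:
    # Assumes tenant and blob_id never contain ":"; otherwise fields could be confused.
    message = f"{tenant}:{blob_id}:{expires_at}".encode()
    signature = hmac.new(secret, message, hashlib.sha256).hexdigest()
    return f"{expires_at}:{signature}"


def check_token(tenant: str, blob_id: str, token: str, now=None, secret: bytes = SECRET) -> bool:
    now = time.time() if now is None else now
    try:
        expires_text, _signature = token.split(":", 1)
        expires_at = int(expires_text)
    except ValueError:
        return False  # malformed token
    expected = make_token(tenant, blob_id, expires_at, secret)
    # Constant-time comparison; the token is bound to this tenant and blob.
    valid = hmac.compare_digest(expected.encode(), token.encode())
    return valid and now < expires_at
```

Binding the tenant name into the signature means a token issued for, say,
OOPS data cannot be replayed against email raw data, even if the blob ids
happen to collide.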

-Rob

