
launchpad-dev team mailing list archive

Re: performance tuesday - services design progress

 

* Robert Collins <robertc@xxxxxxxxxxxxxxxxx> [2011-06-04 09:36 +1200]:
> On Sat, Jun 4, 2011 at 4:28 AM, Marc Tardif <marc.tardif@xxxxxxxxxx> wrote:
> > * Robert Collins <robertc@xxxxxxxxxxxxxxxxx> [2011-06-01 14:30 +1200]:
> > [snip]
> >> Rather than repeat the wiki page here, I'd love it if you were to go
> >> and have a read, and reply to this thread with your thoughts. All the
> >> discussions I previously put off - e.g. should we use a message queue
> >> as the default internal protocol - are now up for discussion!
> >
> > Regarding the blob storage section of the ServiceRoadmap wiki page,
> > I would like to propose lpresults.storage in the Launchpad Results
> > project. Even though I'm shamelessly pimping my own solution, the pros
> > are that it supports modular backends (currently filesystem and S3),
> > does not rely on a database to manage blobs and is Zope3 friendly. The
> > cons are that it does not coalesce the content of files by hash. Even
> > if another solution might be preferable, like HDFS for example, this
> > could potentially be implemented as another backend.
> 
> lpresults.storage is a separate standalone daemon? If not, I think we
> need to work on refining the definition of service :).

Fortunately, we have the same definition of a service but with some slight
variations for performance reasons. See below...

> Considering just LP - we have a SAN for highly available persistence
> of files on disk, so I don't see S3 and S3-like solutions as being an
> interesting time investment for LP at this point. We will want highly
> available front ends and metadata management, and for that a clustered
> solution may make sense.

So, that means using the filesystem backend in lpresults.storage, which
happens to be the default when the project is installed from a package,
even in EC2. For performance reasons, here are a couple of considerations
you might like to know about:

- For uploading files, instead of exposing a service that transfers files
  over HTTP, for example, I considered simply exposing a Zope component
  that writes to the filesystem directly. The motivation is that having
  each appserver that needs to transfer files mount a shared filesystem
  over an existing protocol like NFS or iSCSI would result in better
  performance than implementing my own protocol.

- For downloading files, lpresults.storage provides a WSGI application
  similar to the one in lazr.restful. However, this is mostly used in
  development; when the project is installed from a package, a much
  simpler RewriteMap rule accounts for directory hashing instead. So,
  users only hit Apache and the filesystem.

In the end, even though I could expose a service for downloading files,
I mostly try to reuse existing services.

> File coalescing is probably a must, but I don't see it being an
> exposed aspect of the service.
> 
> Finally we have a number of accounting (aggregate sizes mapped back to
> 'user'), tenancy (multiple different users which shouldn't overlap -
> oops raw data, email raw data, attachments-to-things-in-LP) and
> privacy requirements (see the time limited token facility in the lp
> tree today) which need to be taken into account when deciding how to
> externalize our blob storage.
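Coalescing by content hash, kept internal to the service rather than exposed, could look roughly like the in-memory Python sketch below. The class name, the choice of SHA-256, reference counting for cleanup and the tenant labels are all hypothetical; a real implementation would keep blobs on the SAN rather than in a dict. Names are scoped per tenant, so tenants never see each other's names, while usage can still be attributed back to each tenant.

```python
import hashlib
from collections import defaultdict


class CoalescingStore:
    """Blob store sketch where identical content is stored only once."""

    def __init__(self):
        self._blobs = {}                    # digest -> bytes, stored once
        self._refcount = defaultdict(int)   # digest -> names referencing it
        self._names = {}                    # (tenant, name) -> digest

    def put(self, tenant, name, data):
        digest = hashlib.sha256(data).hexdigest()
        key = (tenant, name)
        old = self._names.get(key)
        if old == digest:
            return digest
        if old is not None:
            self._release(old)  # overwriting a name drops its old content
        self._names[key] = digest
        self._blobs.setdefault(digest, data)
        self._refcount[digest] += 1
        return digest

    def get(self, tenant, name):
        # Lookups are always scoped by tenant.
        return self._blobs[self._names[(tenant, name)]]

    def delete(self, tenant, name):
        self._release(self._names.pop((tenant, name)))

    def _release(self, digest):
        self._refcount[digest] -= 1
        if self._refcount[digest] == 0:
            del self._refcount[digest]
            del self._blobs[digest]

    def usage(self, tenant):
        """Logical bytes attributed to a tenant (shared content counted in full)."""
        return sum(len(self._blobs[d])
                   for (t, _), d in self._names.items() if t == tenant)

    def physical_bytes(self):
        """Bytes actually stored after coalescing."""
        return sum(len(b) for b in self._blobs.values())
```

For example, storing the same 10 bytes as an oops report and as an email attachment charges 10 bytes to each tenant's usage but stores only 10 bytes physically.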

There are certainly limitations to the lpresults.storage implementation
when compared with the requirements of Launchpad. However, some of these
requirements will be shared with the lpresults project sooner or later,
such as privacy, which will be a must, and aggregate accounting, which
would be nice to have. So, I would definitely be interested in being
involved in any discussion relating to replacing the blob storage in
Launchpad.
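For the privacy requirement, one common way to build time-limited download tokens is an HMAC over the blob path and an expiry time. The Python sketch below is a generic illustration under that assumption, not the facility currently in the Launchpad tree; `make_token`, `check_token` and the `expiry:signature` token format are hypothetical.

```python
import hashlib
import hmac
import time


def _sign(secret, path, expires):
    # The expiry is all digits, so "path:expires" is unambiguous.
    msg = f"{path}:{expires}".encode("utf-8")
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()


def make_token(secret, path, ttl, now=None):
    """Return an 'expiry:signature' token granting access to `path` for `ttl` seconds."""
    expires = int(time.time() if now is None else now) + ttl
    return f"{expires}:{_sign(secret, path, expires)}"


def check_token(secret, path, token, now=None):
    """Accept the token only for the same path, before expiry, with a valid signature."""
    try:
        expires_str, sig = token.split(":", 1)
        expires = int(expires_str)
    except ValueError:
        return False  # malformed token
    if (time.time() if now is None else now) > expires:
        return False
    expected = _sign(secret, path, expires)
    # Constant-time comparison on bytes, which also tolerates non-ASCII input.
    return hmac.compare_digest(expected.encode("utf-8"), sig.encode("utf-8"))
```

Because the secret stays on the server, any front end holding it can validate tokens without a database lookup, which fits the "users only hit Apache and the filesystem" approach if the check is done in a small handler.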

-- 
Marc Tardif <marc.tardif@xxxxxxxxxxxxx>
Freenode: cr3, Jabber: cr3@xxxxxxxxxx
1024D/72679CAD 09A9 D871 F7C4 A18F AC08 674D 2B73 740C 7267 9CAD


