launchpad-dev team mailing list archive

Re: performance tuesday - services design progress

To: Martin Pool <mbp@xxxxxxxxxxxxx>
From: Robert Collins <robertc@xxxxxxxxxxxxxxxxx>
Date: Wed, 1 Jun 2011 18:38:34 +1200
Cc: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <BANLkTimsdSKU9imqf=HQj5a8Np3tSkSiLg@mail.gmail.com>

On Wed, Jun 1, 2011 at 3:37 PM, Martin Pool <mbp@xxxxxxxxxxxxx> wrote:
> That looks great.
>
>> Access to a particular form of persistence (e.g. the librarian's files-on-disk, or a particular postgresql schema) requires being in the same source tree as the definition of that persistence mechanism. This avoids having to synchronise deploys to multiple services when schema changes are occurring: no leaky borders. That is to say that a particular persisted collection should only be written and read by a single service. E.g. only one kind of service will be able to read and write to the postgresql database that contains bugtasks.
>
> This is probably a reasonable principle, I guess.  Other architectures
> of the kind you are describing do relax this; not so much in
> postgresql but through allowing slight version skew between different
> services accessing the same persistence layer.  It almost seems to
> imply all the instances of the service that access the persisted layer
> must be updated strictly in lockstep, which might make deployment
> harder.
>
> Or is this essentially saying all "raw" persistence must be wrapped by
> a service, which is a bottleneck through which everything else
> accesses it.
>
> Queued messages can be considered a form of persistence, but
> presumably we'd be ok with multiple services reading/writing them.
>
> Perhaps this is just a matter of terminology about what is
> "persistence".  Maybe you can make it more clear.
>
> Are you trying to say: when there is a lower-level service (eg psql,
> librarian data files) that requires lock-step updates with its
> dependent services, there should be only one higher-level service,
> which can then serve others with a more loosely coupled interface?

Sure, this is roughly the same thing.
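
A rough sketch of that shape, to make it concrete. Every name here
(BugTaskStore, BugTaskService, WebFrontEnd) is made up for illustration,
and a dict stands in for the postgresql schema; this is not how
Launchpad is laid out today.

```python
class BugTaskStore:
    """Hypothetical raw persistence. Only BugTaskService may touch it."""

    def __init__(self):
        self._rows = {}  # stands in for a postgresql schema

    def insert(self, task_id, title):
        self._rows[task_id] = {"id": task_id, "title": title}

    def select(self, task_id):
        return self._rows.get(task_id)


class BugTaskService:
    """The single owner of the store; lives in the same source tree."""

    def __init__(self):
        self._store = BugTaskStore()

    def create_task(self, task_id, title):
        self._store.insert(task_id, title)
        return self.get_task(task_id)

    def get_task(self, task_id):
        row = self._store.select(task_id)
        if row is None:
            return None
        # Hand callers a copy in the service's own shape, so a schema
        # change only needs this one service to be redeployed.
        return {"id": row["id"], "title": row["title"]}


class WebFrontEnd:
    """Another service: talks to BugTaskService, never to the store."""

    def __init__(self, bugtasks):
        self._bugtasks = bugtasks

    def render(self, task_id):
        task = self._bugtasks.get_task(task_id)
        if task is None:
            return "not found"
        return "Bug task %d: %s" % (task["id"], task["title"])
```

The front end only ever sees what get_task returns, so the store's
layout can change without a synchronised deploy.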

>> If care is taken around how information disclosure is managed, this front end service could dispense with the entire zope security model,
>
> I understand and like the scatter-gather templating thing, but could
> you expand on "dispense with the entire zope security model"?  Do you
> mean the templating layer will just pass-through the user identity,
> and security will be handled entirely on the back end?
>
> I understand public APIs are unconnected from adding internal APIs but
> I wonder how they would be implemented in this model: I suppose a new
> front end service that re-presents some internal APIs?

I'd expect more than simply re-presenting. But yes. Or, alternatively
we productise the internal API and expose it directly. Case-by-case
analysis I expect.

>> XMLRPC: pros: already deploys, batteries included in Python and many other languages. cons: XML, RPC model rather than restful - no opportunity for caching, URLs can be opaque when debugging.
>
> bzr bugs have shown the Python module for it is not that great.

mmm, none of the bugs we've had would affect internal services, FWIW.

>> adhoc restful json based apis. pros: nice to look at by hand, easy to interact with manually. cons: not included in the Python standard library, optimises for things that don't really affect us.
>
> python has a json library built in since 2.6
> <http://docs.python.org/library/json.html> and of course also http.

To the extent that ElementTree is in Python, yes, but xmlrpc is more
than a serialisation framework.
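
To show the difference, a small sketch using only the standard library
(module names are the Python 3 ones; xmlrpc.client was xmlrpclib in
Python 2). The method name "bugtask.set_status" and the arguments are
invented for the example.

```python
import json
import xmlrpc.client  # xmlrpclib in Python 2

params = (42, "open")

# json gives you serialisation only: which method to call, and how to
# route the request, is left for you to design.
json_payload = json.dumps(list(params))

# xmlrpc serialises a complete call: method name plus parameters.
rpc_payload = xmlrpc.client.dumps(params, methodname="bugtask.set_status")

# The receiving side gets both back from the one payload.
decoded_params, decoded_method = xmlrpc.client.loads(rpc_payload)
```

With json you would still have to pick a convention for naming the
call, which is the part a rest mapper or similar would supply.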

> Probably faster (citation needed) than xml.

Measurement needed :)
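
Something along these lines would be a starting point; it is only a
sketch, the payload is invented, and a real comparison would need
representative Launchpad data and the full request path rather than
just the encoders.

```python
import json
import timeit
import xmlrpc.client  # xmlrpclib in Python 2


def measure(payload, number=200):
    """Time an encode/decode round trip in each format, in seconds."""
    as_json = timeit.timeit(
        lambda: json.loads(json.dumps(payload)), number=number)
    as_xmlrpc = timeit.timeit(
        lambda: xmlrpc.client.loads(xmlrpc.client.dumps((payload,))),
        number=number)
    return {"json": as_json, "xmlrpc": as_xmlrpc}


# Hypothetical payload: a batch of bugtask-like records.
sample = [{"id": i, "title": "bug %d" % i, "status": "open"}
          for i in range(50)]
timings = measure(sample)
```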

> Much less annoying for
> shipping around binary or unicode data than xml.  Definitely the
> industry standard.  As a consequence, more likely to fit easily with
> tools we could possibly want to use in the future like document
> databases.  Makes it more easily possible to just directly proxy some
> APIs out to clients (obviously you would want some careful
> firewalling.)

I don't think json makes it any easier (or harder) to supply internal
APIs to clients. FWIW xml is still being added to new services - e.g.
amazon services - today, as is json. I think json is the standard for
javascript APIs for sure, but we're not expecting browsers to consume
internal services - by default. So the thing to do is to get a
-really- low bar for entry for microservices and pay a productising
cost for ones we expose remotely. I'm not arguing for or against
xmlrpc or json+a rest mapper; I'm pointing out that having two
different answers for two different problems isn't a bad thing.

> Beyond caching in the sense of inserting squid between different
> services, I think it's nice to be able to just hang on to the
> representation of a particular object and say "here is that object",
> which works in REST but does not make so much sense if the protocol is
> inherently about rpc.

This might be nice in theory. Currently it's disabled for LP APIs in
production - too slow, was a pessimisation.

An RPC layer answering questions about an object can cache
representations if it wants to anyhow, so this seems unrelated to the
protocol.
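
For instance, a minimal sketch of an RPC handler that caches the
representation it hands out. load_bugtask and rpc_get_bugtask are
hypothetical names, and invalidation on write is deliberately left out.

```python
import functools
import json

calls = []  # records backend hits, so the caching is observable


def load_bugtask(task_id):
    """Hypothetical expensive lookup against the backing store."""
    calls.append(task_id)
    return {"id": task_id, "status": "open"}


@functools.lru_cache(maxsize=1024)
def _cached_representation(task_id):
    # The serialised form is what gets cached, not the live object.
    return json.dumps(load_bugtask(task_id), sort_keys=True)


def rpc_get_bugtask(task_id):
    """RPC-style entry point: answers from the cache when it can."""
    return _cached_representation(task_id)
```

Calling rpc_get_bugtask twice with the same id hits load_bugtask once;
a real service would also need to drop entries when the object changes.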

> Other things to possibly add as services would be debug/production
> logging, an audit trail/activity log, and incoming-mail processing
> (which doesn't necessarily need to be a cron script.)  But I guess
> they are just examples and not necessarily exhaustive.

Please add them to the roadmap, they sound very much like interesting
things to build up.

-Rob

Follow ups

Re: performance tuesday - services design progress
From: John Arbash Meinel, 2011-06-01

References

performance tuesday - services design progress
From: Robert Collins, 2011-06-01
Re: performance tuesday - services design progress
From: Martin Pool, 2011-06-01