launchpad-dev team mailing list archive

Re: micro services: HTTP authentication in the datacentre and default protocol.

 

On Tue, Jun 7, 2011 at 9:48 AM, Jamu Kakar <jkakar@xxxxxxxx> wrote:
> Hi,
>> I'm not sure what we should do queue wise; I'm inclined to stick with
>> Rabbit until it's either too much bother, or something massively better
>> (e.g. massively simpler with sameish facilities, or similar complexity
>> and more facilities (like HA)) comes along.
>
> I can't help but wonder if skipping RabbitMQ for the reasons above is
> going too far.  High availability is important, especially for the
> plumbing that connects everything together, but I wonder how much of
> it you really need?  It seems the benefits of a queue based model are
> many and that using HTTP or protobufs will result in an architecture
> that, while it may be (more) highly available, will involve all kinds
> of other deployment, design and performance hassles.

So, for clarity, the reasons to *consider* skipping rabbit are:
 * it's a PITA to bring up reliably in a test environment
 * something else functionally equivalent but simpler comes along
 * something with more facilities and same complexity comes along

Those seem like pretty darn good reasons to consider skipping *any*
commodity item.

> That said, I don't have enough experience with RabbitMQ to say,
> "you're thinking about this wrong", or conversely, "yes, this is a
> serious issue".  I am slightly concerned that overengineering could
> lead to a suboptimal solution.  The impression I get from the outside,
> and maybe I'm totally off the mark, is that Launchpad often chooses a
> hard path to Do Things Right(tm), and then the end result is that
> everything is hard.
>
> I also wonder how many of these services need to talk to each other.
> Maybe you could run many RabbitMQ instances and use them for
> particular tasks?  For example, a bug-focused queue for bug-related
> operations, a code hosting-focused queue for code-related operations,
> etc.  If one of them falls over you end up with degraded service, as
> opposed to losing everything.  I don't know how viable that is, since
> I don't really understand what the topology of micro-services will
> look like.
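
A minimal sketch of that topology, purely for illustration: one broker
per task area, so an outage in one area degrades only that area.
InMemoryBroker, DomainRouter and the area names are hypothetical
stand-ins, not Launchpad or RabbitMQ APIs.

```python
class BrokerDown(Exception):
    """Raised by a broker that is currently unavailable."""


class InMemoryBroker:
    """Hypothetical stand-in for one RabbitMQ instance."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.messages = []

    def publish(self, body):
        if not self.up:
            raise BrokerDown(self.name)
        self.messages.append(body)


class DomainRouter:
    """Sends each message to the broker that owns its task area."""

    def __init__(self, brokers):
        self.brokers = dict(brokers)

    def publish(self, area, body):
        """Return True if queued, False if that area's broker is down."""
        try:
            self.brokers[area].publish(body)
        except BrokerDown:
            return False
        return True


router = DomainRouter({
    "bugs": InMemoryBroker("bugs"),
    "code": InMemoryBroker("code"),
})
# Simulate the bug-focused instance falling over.
router.brokers["bugs"].up = False
bugs_ok = router.publish("bugs", "reindex bug")   # bug operations degraded
code_ok = router.publish("code", "scan branch")   # code hosting unaffected
```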

> Also, is there something that will solve the HA issues you've brought
> up in the pipeline for RabbitMQ?  Maybe it's something worth
> contributing to and/or living without for some time while support for
> these issues gets baked in?
>
> How do other people use RabbitMQ and sleep at night?

Those are good questions to ask. On the HTTP vs Rabbit space, I think
the decoupling between service point and implementation is a useful
thing to have, but if you look at the list of things we need in place
to consider a microservice maintainable -
https://dev.launchpad.net/ArchitectureGuide/ServicesRequirements -
most of those are not impacted by changing the protocol from HTTP+foo
to AMQP.

Launchpad has a history of awkward implementation decisions - yes, that's
true. However, I think many of them are due to the complexity of
analyzing scaling and performance (consider - predict which bottleneck
we will hit next in codehosting: CPU? memory? network bandwidth to the
main host? disk space? fs locks? concurrent IO rate to disks?...) and
then go back 6 years and predict which design will handle all the
bottlenecks gracefully.

It would be easy to throw stones, but we get 20-20 vision in
hindsight. I think that the folk (which includes me for some decisions
- waaaay back :)) did their best to analyse things at the time.
However I think they over-analysed: many problems our past selves
designed for did not occur, and many problems they did not design for
have occurred.

So, I want us to simultaneously:
 - be able to diagnose problems /fast/
 - be able to recover from operational issues rapidly
 - look after our users' data
 - be able to modify the design rapidly to deal with the things we
have not designed for.
 - have the lowest implementation cost to meet these four things

To that end, saying 'let's start with rabbit without using its
persistence features':
 - lets us leverage the ops team's familiarity with rabbit for
diagnosis, logging, capacity planning
 - and their experience with it for recovering after it breaks
 - avoids concerns about data integrity or storage
 - can be modified easily to permit persistence (add HA) or to move to
a less cumbersome implementation
 - looks pretty cheap to do (we have cookie-cutter deployment
knowledge for http stacks).
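
As a rough sketch of what 'without persistence' means at the AMQP
level, assuming a pika-style Python client (the helper names and the
queue name "lp.jobs" are made up for illustration): queues are declared
non-durable and messages are published with delivery mode 1 (transient)
rather than 2 (persistent).

```python
# AMQP 0-9-1 delivery modes for basic.publish.
TRANSIENT = 1
PERSISTENT = 2


def queue_declare_kwargs(queue, persistent=False):
    """Keyword arguments for a pika-style channel.queue_declare() call."""
    return {"queue": queue, "durable": persistent}


def publish_property_kwargs(persistent=False):
    """Keyword arguments for the message properties of a publish."""
    return {"delivery_mode": PERSISTENT if persistent else TRANSIENT}


# Starting point: nothing is expected to survive a broker restart.
start = (queue_declare_kwargs("lp.jobs"), publish_property_kwargs())
# Later, if needed: durable queue plus persistent messages.
later = (queue_declare_kwargs("lp.jobs", persistent=True),
         publish_property_kwargs(persistent=True))

# With pika this would be used roughly as follows (assumption, not run here):
#   channel.queue_declare(**start[0])
#   channel.basic_publish(exchange="", routing_key="lp.jobs", body=b"...",
#                         properties=pika.BasicProperties(**start[1]))
```

In code the switch is a single flag; on a live broker the queue would
have to be recreated, since RabbitMQ rejects redeclaring an existing
queue with a different durability setting.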

-Rob

