launchpad-dev team mailing list archive
Message #07299
Re: micro services: HTTP authentication in the datacentre and default protocol.
On Tue, Jun 7, 2011 at 9:48 AM, Jamu Kakar <jkakar@xxxxxxxx> wrote:
> Hi,
>> I'm not sure what we should do queue wise; I'm inclined to stick with
>> Rabbit until it's either too much bother, or something massively better
>> (e.g. massively simpler with sameish facilities, or similar complexity
>> and more facilities (like HA)) comes along.
>
> I can't help but wonder if skipping RabbitMQ for the reasons above is
> going too far. High availability is important, especially for the
> plumbing that connects everything together, but I wonder how much of
> it you really need? It seems the benefits of a queue based model are
> many and that using HTTP or protobufs will result in an architecture
> that, while it may be (more) highly available, will involve all kinds
> of other deployment, design and performance hassles.
So, for clarity, the reasons to *consider* skipping rabbit are:
* it's a PITA to bring up reliably in a test environment.
* something else functionally equivalent but simpler comes along
* something with more facilities and same complexity comes along
Those seem like pretty darn good reasons to consider skipping *any*
commodity item.
> That said, I don't have enough experience with RabbitMQ to say,
> "you're thinking about this wrong", or conversely, "yes, this is a
> serious issue". I am slightly concerned that overengineering could
> lead to a suboptimal solution. The impression I get from the outside,
> and maybe I'm totally off the mark, is that Launchpad often chooses a
> hard path to Do Things Right(tm), and then the end result is that
> everything is hard.
>
> I also wonder how many of these services need to talk to each other.
> Maybe you could run many RabbitMQ instances and use them for
> particular tasks? For example, a bug-focused queue for bug-related
> operations, a code hosting-focused queue for code-related operations,
> etc. If one of them falls over you end up with degraded service, as
> opposed to losing everything. I don't know how viable that is, since
> I don't really understand what the topology of micro-services will
> look like.
> Also, is there something that will solve the HA issues you've brought
> up in the pipeline for RabbitMQ? Maybe it's something worth
> contributing to and/or living without for some time while support for
> these issues gets baked in?
>
> How do other people use RabbitMQ and sleep at night?
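The "one RabbitMQ instance per area" idea above can be modelled in a few lines to show why it limits the blast radius. This is a minimal, hypothetical Python sketch: `Broker` and `DomainRouter` are illustrative names, not RabbitMQ or client-library APIs, and a real outage would surface as a client connection error rather than a flag being flipped.

```python
from dataclasses import dataclass, field


@dataclass
class Broker:
    """Hypothetical stand-in for one RabbitMQ instance."""
    name: str
    up: bool = True
    messages: list = field(default_factory=list)

    def publish(self, body: str) -> None:
        if not self.up:
            raise ConnectionError(f"{self.name} is unavailable")
        self.messages.append(body)


class DomainRouter:
    """Send each operation to the broker that owns its domain."""

    def __init__(self, brokers: dict[str, Broker]):
        self.brokers = brokers

    def publish(self, domain: str, body: str) -> bool:
        """Return True if delivered, False if this domain is degraded."""
        try:
            self.brokers[domain].publish(body)
            return True
        except ConnectionError:
            # Only this domain's operations fail; other brokers keep working.
            return False


bugs = Broker("bugs-rabbit")
code = Broker("code-rabbit")
router = DomainRouter({"bugs": bugs, "code": code})

code.up = False  # simulate the code-hosting broker falling over
print(router.publish("bugs", "bug 123 changed"))  # True
print(router.publish("code", "branch pushed"))    # False
```

The trade-off is operational: each extra broker is another thing to deploy, monitor and recover, which is part of what the reply below weighs.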
Those are good questions to ask. On the HTTP vs Rabbit question, I think
the decoupling between service point and implementation is a useful
thing to have, but if you look at the list of things we need in place
to consider a microservice maintainable
(https://dev.launchpad.net/ArchitectureGuide/ServicesRequirements),
most of those are not affected by changing the protocol from HTTP+foo
to AMQP.
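One way to read "decoupling between service point and implementation" is that callers depend on a small transport interface, so swapping HTTP for AMQP changes one class rather than every caller. The sketch below is hypothetical: `Transport`, `InProcessTransport`, `BugClient` and the `"bugs.get"` service name are illustrative only, and the in-process transport stands in for a real HTTP or AMQP one so the example runs without a network.

```python
from typing import Callable, Protocol


class Transport(Protocol):
    """Anything that can deliver a request to a named service and return a reply."""

    def call(self, service: str, payload: dict) -> dict: ...


class InProcessTransport:
    """Hypothetical transport that dispatches directly to registered handlers.

    An HTTP or AMQP transport would implement the same call() method.
    """

    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[dict], dict]] = {}

    def register(self, service: str, handler: Callable[[dict], dict]) -> None:
        self.handlers[service] = handler

    def call(self, service: str, payload: dict) -> dict:
        return self.handlers[service](payload)


class BugClient:
    """Caller code: depends only on the Transport interface, not the protocol."""

    def __init__(self, transport: Transport) -> None:
        self.transport = transport

    def get_title(self, bug_id: int) -> str:
        reply = self.transport.call("bugs.get", {"id": bug_id})
        return reply["title"]


transport = InProcessTransport()
transport.register("bugs.get", lambda p: {"id": p["id"], "title": f"Bug {p['id']}"})
client = BugClient(transport)
print(client.get_title(42))  # Bug 42
```

Logging, monitoring, deployment and the other maintainability requirements sit outside this seam, which is why the choice of protocol matters less than it first appears.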
Launchpad has a history of awkward implementation decisions; yes, that's
true. However, I think many of them are due to how hard it is to
analyze scaling and performance. Consider: predict which bottleneck
we will hit next in codehosting (CPU? memory? network bandwidth to the
main host? disk space? fs locks? concurrent IO rate to disks?...), and
then go back 6 years and predict which design would have handled all
of those bottlenecks gracefully.
It would be easy to throw stones, but hindsight is 20/20. I think the
folk who made those decisions (which includes me for some of them,
waaaay back :)) did their best to analyse things at the time.
However, I think they over-analysed: many problems our past selves
designed for did not occur, and many problems they did not design for
have occurred.
So, I want us to simultaneously:
- be able to diagnose problems /fast/
- be able to recover from operational issues rapidly
- look after our users' data
- be able to modify the design rapidly to deal with the things we
have not designed for.
- have the lowest implementation cost to meet these four things
To that end, saying 'let's start with rabbit without using its
persistence features':
- lets us leverage the ops team's familiarity with rabbit for
diagnosis, logging, capacity planning
- and their experience with it for recovering after it breaks
- avoids concerns about data integrity or storage
- can be modified easily to permit persistence (add HA) or to move to
a less cumbersome implementation
- looks pretty cheap to do (we have cookie-cutter deployment
knowledge for http stacks).
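Starting without persistence and turning it on later can be kept to a single setting. In AMQP 0-9-1 a message's delivery mode is 1 (non-persistent) or 2 (persistent), and in RabbitMQ a message only survives a broker restart if it is persistent and sits in a durable queue. The sketch below is hypothetical: `QueueSettings`, `declare_args` and `publish_properties` are illustrative names; with the pika client the resulting values would be passed to `channel.queue_declare(queue=..., durable=...)` and `pika.BasicProperties(delivery_mode=...)`. HA (replicating queues across nodes) is a separate broker-side concern not shown here.

```python
from dataclasses import dataclass

# AMQP 0-9-1 delivery-mode values.
NON_PERSISTENT = 1
PERSISTENT = 2


@dataclass(frozen=True)
class QueueSettings:
    """Hypothetical per-queue settings; persistence is a single switch."""
    name: str
    persistent: bool = False  # start without Rabbit's persistence features

    @property
    def durable(self) -> bool:
        # A durable queue survives a broker restart; pair it with persistent messages.
        return self.persistent

    @property
    def delivery_mode(self) -> int:
        return PERSISTENT if self.persistent else NON_PERSISTENT


def declare_args(settings: QueueSettings) -> dict:
    """Arguments a client would pass when declaring the queue."""
    return {"queue": settings.name, "durable": settings.durable}


def publish_properties(settings: QueueSettings) -> dict:
    """Message properties a client would attach when publishing."""
    return {"delivery_mode": settings.delivery_mode}


initial = QueueSettings("bug-events")
later = QueueSettings("bug-events", persistent=True)
print(declare_args(initial), publish_properties(initial))
print(declare_args(later), publish_properties(later))
```

Because callers only read these settings, moving a queue from "fast and lossy" to "durable" is a configuration change rather than a code change.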
-Rob