fuel-dev team mailing list archive

Re: Release blocker: Moving management vip breaks rabbitmq sessions

To: Dmitry Borodaenko <dborodaenko@xxxxxxxxxxxx>, "fuel-dev@xxxxxxxxxxxxxxxxxxx" <fuel-dev@xxxxxxxxxxxxxxxxxxx>
From: Bogdan Dobrelya <bdobrelia@xxxxxxxxxxxx>
Date: Fri, 28 Feb 2014 11:09:52 +0200
In-reply-to: <CAM0pNLP5AZ0o0ZvhaDFpSZ4vwZ4FuwLxsm3VhDHCPisp1sv5HA@mail.gmail.com>
Organization: Mirantis
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.3.0

On 02/28/2014 05:44 AM, Dmitry Borodaenko wrote:
> Team,
> 
> Ryan and I have spent all day investigating
> https://bugs.launchpad.net/fuel/+bug/1285449
> 
> What we have found so far confirms that this is a critical bug that
> absolutely must be resolved before 4.1 is released.  I have documented
> our findings in the bug comments; someone please take over the
> investigation when you come to the office tomorrow morning MSK time.
> 
> I have a feeling that once the root cause is found, the fix will be
> low-impact and will involve either a change in the HAProxy configuration for
> RabbitMQ, a patch/upgrade of HAProxy or kombu, or something similar.
> But first we need to understand what exactly breaks, and why this only
> affects some services and not all of them.
> 
> Thanks,
> 

Here is a quote from a recent RabbitMQ discussion in the
Fuel-conductors-support team Skype chat (RU + translation):

Wednesday, February 26, 2014
[4:00:10 PM] Maxim Yefimov: Коллеги, вопрос есть:
(Colleagues, I have a question:)

listen rabbitmq-openstack
  bind 192.168.0.2:5672
  balance  roundrobin

  server  node-1 192.168.0.3:5673   check inter 5000 rise 2 fall 3
  server  node-2 192.168.0.4:5673   check inter 5000 rise 2 fall 3  backup
  server  node-3 192.168.0.5:5673   check inter 5000 rise 2 fall 3  backup

[4:01:01 PM] Maxim Yefimov: Зачем одновременно roundrobin и active-passive?
(Why do we use roundrobin and active-passive at the same time?)

[4:01:39 PM] Miroslav Anashkin: Чтобы коннект не рвался
(So that the connection doesn't break)

[4:02:01 PM] Miroslav Anashkin: У кролика кластер существует строго в
виде мастер-слейв
(RabbitMQ clustering is restricted to master-slave only)

[4:02:23 PM] Miroslav Anashkin: Соответственно даже если какая-то нода с
запросом к слейву придет - та его на мастер отправит
(Hence, even if some node sends a request to a RabbitMQ slave, the slave
forwards it to the master)

[4:02:52 PM] Miroslav Anashkin: Поэтому сделали так чтобы ХАПрокси
всегда всех посылал на одну ноду
(That's why we made HAProxy always send everyone to a single node)
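
To make the effect of that listen block concrete, here is a minimal
sketch of how the server selection plays out. It is a simplified model,
not HAProxy code: the Server class and the eligible() and pick() helpers
are hypothetical names, and the model assumes "option allbackups" is not
set, so only the first live backup takes over when node-1 is down.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    address: str
    backup: bool = False  # mirrors the "backup" keyword on a server line
    up: bool = True       # result of the health check ("check inter ...")


def eligible(servers):
    """Return the servers that would receive new connections right now."""
    active = [s for s in servers if s.up and not s.backup]
    if active:
        # roundrobin rotates over non-backup servers only
        return active
    # Assumption: no "option allbackups", so only the first live backup is used
    backups = [s for s in servers if s.up and s.backup]
    return backups[:1]


def pick(servers, n):
    """Distribute n new connections round-robin over the eligible servers."""
    pool = eligible(servers)
    if not pool:
        return []
    return [pool[i % len(pool)].name for i in range(n)]


# Same layout as the listen block quoted in the chat
servers = [
    Server("node-1", "192.168.0.3:5673"),
    Server("node-2", "192.168.0.4:5673", backup=True),
    Server("node-3", "192.168.0.5:5673", backup=True),
]
```

With a single non-backup server, roundrobin has nothing to rotate over,
so every connection lands on node-1 until its health check fails.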

Honestly, this explanation is not clear to me. Why couldn't we have the
OpenStack services connect directly to arbitrarily chosen (load-balanced)
RabbitMQ nodes, skipping HAProxy altogether? (After all, as quoted above,
a request that reaches a RabbitMQ slave is forwarded to the master anyway.)

Could that resolve the issue? I think I will investigate this option as
well.
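
A rough sketch of what "direct connections to an arbitrarily chosen
node" could look like on the client side. This only illustrates the
idea at the TCP level. The connect_any() helper and the RABBIT_NODES
list are hypothetical (the addresses are the backend servers from the
HAProxy config above), and a real service would rely on its AMQP client
library rather than raw sockets.

```python
import random
import socket

# Assumption: the RabbitMQ backends from the HAProxy config quoted above
RABBIT_NODES = [
    ("192.168.0.3", 5673),
    ("192.168.0.4", 5673),
    ("192.168.0.5", 5673),
]


def connect_any(nodes, connect=socket.create_connection, timeout=5.0, rng=random):
    """Try the nodes in random order and return (node, connection)
    for the first one that accepts the connection."""
    candidates = list(nodes)
    rng.shuffle(candidates)  # spread clients across nodes
    last_error = None
    for node in candidates:
        try:
            return node, connect(node, timeout)
        except OSError as exc:
            # Node is down or unreachable: remember why and try the next one
            last_error = exc
    raise ConnectionError(f"no RabbitMQ node reachable: {last_error}")
```

The connect argument is injectable so the failover logic can be
exercised without a live cluster. Reconnecting after a node failure
would still be the client's job, which is the part that needs
investigating.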


-- 
Best regards,
Bogdan Dobrelya,
Skype #bogdando_at_yahoo.com
Irc #bogdando

Follow ups

Re: Release blocker: Moving management vip breaks rabbitmq sessions
From: Vladimir Kuklin, 2014-02-28

References

Release blocker: Moving management vip breaks rabbitmq sessions
From: Dmitry Borodaenko, 2014-02-28