
fuel-dev team mailing list archive

Re: [openstack-dev] Fuel HA/scalability part 1

 

Message written by Bogdan Dobrelya <bdobrelia@xxxxxxxxxxxx> on 16 May 2014, at 11:14:

> On 05/16/2014 10:57 AM, Bartosz Kupidura wrote:
>> Hello guys!
>> I would like to suggest a few changes to the Fuel HA/scalability features.
>> 
>> 1. [HA] Ensure public/management VIP is running on node where HAproxy is working.
>> 
>> Currently, if HAProxy dies, the VIP is not moved to another node in the cluster.
>> A simple way to reproduce this (HAProxy can die after a segfault, a broken config,
>> an uninstalled package, ...):
>> # echo deadbeef >> /etc/haproxy/haproxy.cfg
>> # /etc/init.d/haproxy stop
>> 
>> What happens:
>> - Corosync cannot start HAProxy
>> - Corosync will NOT move the VIP to another node
>> - ALL connections to the VIPs get 'connection refused'
>> 
>> What should happen:
>> - Corosync cannot start HAProxy
>> - Corosync will move the VIP to another node
>> 
>> Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15617/

Gerrit change in review.openstack.org: https://review.openstack.org/#/c/93884/
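
For reference, the fix essentially amounts to colocating the VIP with the HAProxy clone in Pacemaker, so that when HAProxy cannot be (re)started on a node, the VIP fails over with it. A crmsh sketch with hypothetical resource names (vip__public, clone_p_haproxy; Fuel's actual primitives may differ):

```
# Hypothetical resource names -- adjust to the actual Fuel primitives.
# An infinite-score colocation: the VIP may only run where a haproxy
# clone instance is active, so a dead HAProxy forces a VIP failover.
crm configure colocation vip_public-with-haproxy inf: vip__public clone_p_haproxy
```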

> 
> Hello. Thank you for the great feedback.
> 
> It would be nice to file LP bugs for this patch, as well as for all the
> patches below, and submit them as public OpenStack Gerrit changes. Please
> don't hesitate to submit them; I could also help you address this.
> 
> Since we are discussing issues related not only to Fuel HA but also to
> Oslo.messaging, RabbitMQ, and Nova configuration, I've added the
> openstack-dev tag as well. Perhaps some of these changes could also be
> contributed to Nova and Oslo.
> 
>> 
>> Currently, ocf:mirantis:haproxy only checks whether HAProxy is running; in the future
>> we can implement more sophisticated health checks (backend timeouts, current connection limits, ...)
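
As an illustration of what a richer monitor could do (a sketch, not the actual agent): parse HAProxy's "show stat" CSV output and report failure if any server is DOWN. The helper name and socket path here are assumptions:

```shell
# Hypothetical helper for a richer ocf:mirantis:haproxy monitor action.
# Reads HAProxy "show stat" CSV on stdin; field 18 is the status column.
# Returns 0 (true) if any server/backend reports DOWN.
any_backend_down() {
    awk -F, 'NR > 1 && $18 == "DOWN" { found = 1 } END { exit !found }'
}

# In the agent this would be fed from the stats socket (path is an
# assumption), e.g.:
#   echo "show stat" | socat stdio /var/run/haproxy.sock | any_backend_down \
#       && exit 1   # OCF_ERR_GENERIC
```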
>> 
>> 2. [HA] Tune TCP keepalive sysctl.
>> 
>> Currently we use the default Ubuntu/CentOS values (7200 + 9*75 seconds).
>> This means the kernel will notice a 'silent' (no RST, no FIN) connection failure only after more than 2 hours.
> 
> Yes, the defaults are (always) poor :-)
> Here is a list of the existing issues (and patches, if any were
> submitted already): https://etherpad.openstack.org/p/fuel-ha-rabbitmq
> 
> That document is also a kind of brainstorm; feel free to participate.
> Personally, I like the ideas of putting RabbitMQ cluster management under
> Pacemaker and of considering a two-rabbit-only cluster in order to address
> https://bugs.launchpad.net/oslo.messaging/+bug/856764/comments/22, if
> such strange things indeed happen with cluster sizes of 3+.
> 
>> 
>> From my experience, a good value for HA systems is 180s (120 + 3*20):
>> net.ipv4.tcp_keepalive_time = 120
>> net.ipv4.tcp_keepalive_probes = 3
>> net.ipv4.tcp_keepalive_intvl = 20
>> 
>> Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15618/
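
For the record, with these values a dead peer is detected after tcp_keepalive_time + tcp_keepalive_probes * tcp_keepalive_intvl seconds; a quick sanity check in shell:

```shell
# Proposed values from the patch; detection time = time + probes * intvl.
time=120; probes=3; intvl=20
echo "silent connection failure detected after $(( time + probes * intvl ))s"
# Applying them (requires root):
#   sysctl -w net.ipv4.tcp_keepalive_time=120 \
#             net.ipv4.tcp_keepalive_probes=3 \
#             net.ipv4.tcp_keepalive_intvl=20
```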
> 
> Your choice looks better. There is a patch
> https://review.openstack.org/#/c/93815/ related to
> https://bugs.launchpad.net/oslo.messaging/+bug/856764/comments/19 as
> well. We could discuss which exact TCP keepalive parameters fit best.
> And there is one more related patch for the RabbitMQ cluster:
> https://review.openstack.org/#/c/93411/

We can abandon my patch and stick to https://review.openstack.org/#/c/93815/

> 
>> 
>> 3. [Scalability] shuffle amqp nodes in Openstack configs.
>> 
>> Currently, each OpenStack node (compute, cinder, ...) connects to controller #1;
>> after a failure it reconnects to controller #2, and after that to controller #3.
>> 
>> In this case, ALL AMQP traffic is served by controller #1.
>> 
>> We can shuffle 'rabbit_hosts' on each node.
>> 
>> Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15619/

Gerrit change in review.openstack.org: https://review.openstack.org/#/c/93883/
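
The effect of shuffle() can be sketched in plain shell (hostnames are illustrative; the actual patch does this in Puppet):

```shell
# Every non-controller node is given the same set of rabbit hosts,
# but in a per-node random order, so initial AMQP connections -- and
# therefore the master queues -- spread across the controllers.
hosts="controller1:5673 controller2:5673 controller3:5673"
rabbit_hosts=$(printf '%s\n' $hosts | shuf | paste -sd, -)
echo "rabbit_hosts=$rabbit_hosts"
```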

> 
> That is a brilliant idea. I was investigating related things
> recently and found this thread:
> http://rabbitmq.1065348.n5.nabble.com/Correct-way-of-determining-which-node-is-master-td91.html.
> According to it, there could be a good performance benefit in
> spreading the queue masters around.
> 
> But actually, we already have this in the recent amqp-nodes patches
> accepted for Fuel 5.0. Let me elaborate.
> We configure the rabbit hosts on every controller as:
> 
> rabbit_nodes = 127.0.0.1:5673, rabbit1:5673, ... rabbitX:5673
> 
> As you can see, the initial connection point for new queues will always
> be the node itself, hence all master queues are automatically
> "shuffled" as well.

Yes, this is why for the controllers I only remove the duplicated entry (we already add
127.0.0.1, so we don't need $::internal_address in $amqp_nodes).

But for the other services (compute, cinder, ...), the patch adds shuffle().
Previously, rabbit_hosts was always '#controller1, #controller2, #controller3'.

> 
> Basically, I see this as the main reason why we should never use a VIP
> for the rabbit cluster.
> 
>> 
>> 
>> Best Regards,
>> Bartosz Kupidura
>> 
> 
> 
> -- 
> Best regards,
> Bogdan Dobrelya,
> Skype #bogdando_at_yahoo.com
> Irc #bogdando


