
fuel-dev team mailing list archive

Fuel HA/scalability part 1

 

Hello guys!
I would like to suggest a few changes to Fuel HA/scalability features.

1. [HA] Ensure the public/management VIP runs on a node where HAProxy is working.

Currently, if HAProxy dies, the VIP is not moved to another node in the cluster.
HAProxy can die for many reasons (segfault, wrong config, uninstalled package...).
A simple way to reproduce this:
# echo deadbeef >> /etc/haproxy/haproxy.cfg
# /etc/init.d/haproxy stop

What happens:
- Corosync cannot start HAProxy
- Corosync does NOT move the VIP to another node
- ALL connections to the VIPs get 'connection refused'

What should happen:
- Corosync cannot start HAProxy
- Corosync moves the VIP to another node
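
To make the expected behaviour concrete, here is a toy model of the placement rule in Python. It is only a sketch of the decision, not how the cluster stack implements it: the function name `choose_vip_node` and the node names are made up for illustration, and in a real cluster this coupling is normally expressed through resource constraints rather than code.

```python
from typing import Dict, Optional


def choose_vip_node(haproxy_ok: Dict[str, bool], current: Optional[str]) -> Optional[str]:
    """Return the node that should hold the VIP (hypothetical helper).

    The VIP stays where it is while HAProxy is healthy on that node.
    Otherwise it moves to the first node (sorted by name) where HAProxy
    works. If HAProxy works nowhere, there is no valid placement.
    """
    if current is not None and haproxy_ok.get(current, False):
        return current
    for node in sorted(haproxy_ok):
        if haproxy_ok[node]:
            return node
    return None


# HAProxy is broken on node-1, which currently holds the VIP.
state = {"node-1": False, "node-2": True, "node-3": True}
print(choose_vip_node(state, current="node-1"))  # node-2
```

The current behaviour corresponds to always returning `current`, which is why clients keep hitting a node with no listener behind the VIP.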

Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15617/

Currently ocf:mirantis:haproxy checks only whether haproxy is running; in the future we can
implement more sophisticated health checks (backend timeouts, current connection limits...)
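
As an idea of what a slightly stronger check could look like, the sketch below tests whether something actually accepts TCP connections on the frontend address, instead of only checking that the process exists. This is an assumption about a possible direction, not what the Gerrit change does; the function name, the address and the port are placeholders.

```python
import socket


def haproxy_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    A refused connection, a timeout or any other socket error counts as a
    failed check.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Placeholder address and port; substitute the real VIP and frontend port.
    healthy = haproxy_accepts_connections("127.0.0.1", 80)
    print("haproxy frontend reachable:", healthy)
```

A monitor action built on a check like this would also catch the case where the process is alive but no longer listening.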

2. [HA] Tune TCP keepalive sysctl.

Currently we use the default Ubuntu/CentOS values (7200 + 9*75 = 7875 s).
This means the kernel notices a 'silent' (no RST, no FIN) connection failure only after more than 2 hours.

In my experience, a good value for HA systems is 180 s (120 + 3*20):
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 20
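
The arithmetic behind both numbers, as a small Python sketch: the approximate detection time is the idle time before the first probe plus the number of probes times the probe interval. The helper name is made up for illustration.

```python
def keepalive_detection_seconds(time_s: int, probes: int, intvl_s: int) -> int:
    """Approximate seconds until the kernel gives up on an idle, silently dead peer.

    time_s  -> net.ipv4.tcp_keepalive_time   (idle time before the first probe)
    probes  -> net.ipv4.tcp_keepalive_probes (unanswered probes before giving up)
    intvl_s -> net.ipv4.tcp_keepalive_intvl  (seconds between probes)
    """
    return time_s + probes * intvl_s


default = keepalive_detection_seconds(7200, 9, 75)
proposed = keepalive_detection_seconds(120, 3, 20)

print(default)   # 7875 (about 2 h 11 min)
print(proposed)  # 180
```

Note that these sysctls only affect sockets that have SO_KEEPALIVE enabled, so the gain depends on the services actually turning keepalive on for their connections.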

Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15618/

3. [Scalability] Shuffle AMQP nodes in OpenStack configs.

Currently each OpenStack node (compute, cinder, ...) connects to controller #1;
after a failure it reconnects to #2, and after that to #3.

As a result, ALL AMQP traffic is served by #1.

We can shuffle 'rabbit_hosts' on each node, so that connections are spread across all controllers.
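
One way to do the shuffle, sketched in Python. This is an illustration under my own assumptions, not the code from the Gerrit change: the function name and host names are hypothetical. The order is derived from the node's own name, so every node gets a different but stable order, and repeated deployment runs do not rewrite the config each time.

```python
import hashlib
import random
from typing import List


def shuffled_rabbit_hosts(hosts: List[str], node_name: str) -> List[str]:
    """Return `hosts` in an order that is random per node but stable across runs.

    The shuffle is seeded with a hash of the node name, so the same node
    always gets the same order while different nodes usually get different ones.
    """
    digest = hashlib.sha256(node_name.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    result = list(hosts)  # copy, so the caller's list is left untouched
    rng.shuffle(result)
    return result


# Hypothetical controller and compute node names; 5672 is the default AMQP port.
controllers = ["controller-1", "controller-2", "controller-3"]
order = shuffled_rabbit_hosts(controllers, "compute-17")
print("rabbit_hosts = " + ",".join(host + ":5672" for host in order))
```

With many compute nodes, the first entry of 'rabbit_hosts' should then be spread roughly evenly over the controllers, while failover to the remaining entries still works as before.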

Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15619/


Best Regards,
Bartosz Kupidura
