fuel-dev team mailing list archive
Message #01053
Fuel HA/scalability part 1
Hello guys!
I would like to suggest a few changes to Fuel HA/scalability features.
1. [HA] Ensure the public/management VIP runs on a node where HAProxy is working.
Currently, if HAProxy dies, the VIP is not moved to another node in the cluster.
HAProxy can die for many reasons (segfault, broken config, uninstalled package...).
A simple way to reproduce this:
# echo deadbeef >> /etc/haproxy/haproxy.cfg
# /etc/init.d/haproxy stop
What happens:
- Corosync cannot start HAProxy
- Corosync does NOT move the VIP to another node
- ALL connections to the VIPs get 'connection refused'
What should happen:
- Corosync cannot start HAProxy
- Corosync moves the VIP to another node
Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15617/
Currently ocf:mirantis:haproxy only checks whether haproxy is running. In the future we can
implement more sophisticated health checks (backend timeouts, current connections limit...).
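As one possible direction for the richer health checks mentioned above, here is a minimal Python sketch that inspects HAProxy statistics in CSV form (the format returned by 'show stat' on the stats socket) and flags entries that are down or have reached their session limit. The function name, the sample data and the chosen criteria are illustrative assumptions; none of this is part of the ocf:mirantis:haproxy agent.

```python
import csv
import io


def unhealthy_entries(stats_csv):
    """Return (proxy, server, reason) tuples for entries that look unhealthy.

    stats_csv is HAProxy statistics text in CSV form, whose header line
    starts with '# pxname,svname,...'.
    """
    # Strip the leading '# ' so DictReader sees clean column names.
    text = stats_csv.lstrip()
    if text.startswith("#"):
        text = text[1:].lstrip()

    problems = []
    for row in csv.DictReader(io.StringIO(text)):
        name = (row["pxname"], row["svname"])
        status = row.get("status") or ""
        if status.startswith("DOWN"):
            problems.append(name + ("down",))
            continue
        # scur = current sessions, slim = configured session limit.
        scur = row.get("scur") or ""
        slim = row.get("slim") or ""
        if scur.isdigit() and slim.isdigit() and 0 < int(slim) <= int(scur):
            problems.append(name + ("session limit reached",))
    return problems


# Hypothetical sample, reduced to the columns used above.
SAMPLE = """# pxname,svname,scur,slim,status
keystone,node-1,10,2000,UP
keystone,node-2,0,2000,DOWN
horizon,FRONTEND,2000,2000,OPEN
"""

print(unhealthy_entries(SAMPLE))
```

A resource agent could treat a non-empty result as a failed monitor operation, but which conditions should actually trigger a VIP move is a policy decision left open here.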
2. [HA] Tune TCP keepalive sysctls.
Currently we use the default Ubuntu/CentOS values (7200 + 9*75 seconds).
This means the kernel notices a 'silent' connection failure (no RST, no FIN) only after more than 2 hours.
In my experience, a good value for HA systems is 180 s (120 + 3*20):
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 20
Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15618/
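The arithmetic behind both figures can be checked with a short Python sketch. The function name is made up for illustration; the formula is the one used above (idle time before the first probe, plus the number of unanswered probes times the probe interval).

```python
def keepalive_detection_seconds(time_s, probes, intvl_s):
    """Worst-case time before the kernel gives up on an idle, silent peer."""
    # The first probe is sent after time_s of idleness; the connection is
    # dropped after `probes` unanswered probes spaced intvl_s apart.
    return time_s + probes * intvl_s


default_s = keepalive_detection_seconds(7200, 9, 75)   # distribution defaults
proposed_s = keepalive_detection_seconds(120, 3, 20)   # values proposed above

print(default_s, proposed_s)
```

Note that these sysctls only affect sockets that have SO_KEEPALIVE enabled.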
3. [Scalability] Shuffle AMQP nodes in OpenStack configs.
Currently each OpenStack node (compute, cinder, ...) connects to controller #1;
after a failure it reconnects to #2, and after that to #3.
As a result, ALL AMQP traffic is served by controller #1 as long as it is up.
We can shuffle 'rabbit_hosts' on each node so that the load is spread across controllers.
Gerrit change: http://gerrit.vm.mirantis.net:8080/#/c/15619/
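To illustrate the idea, here is a minimal Python sketch of a per-node shuffle. It is not the code from the Gerrit change. Seeding the shuffle with the node name is an assumption made here so that repeated configuration runs on the same node produce the same order; the addresses and node names are hypothetical.

```python
import hashlib
import random


def shuffled_rabbit_hosts(hosts, node_name):
    """Return hosts in an order that is stable per node but differs between nodes."""
    # Derive a deterministic seed from the node name so the generated
    # config does not change on every run.
    seed = int(hashlib.sha256(node_name.encode("utf-8")).hexdigest(), 16)
    shuffled = list(hosts)
    random.Random(seed).shuffle(shuffled)
    return shuffled


# Hypothetical controller addresses and node names.
CONTROLLERS = ["192.168.0.2:5672", "192.168.0.3:5672", "192.168.0.4:5672"]

for node in ("compute-1", "compute-2", "cinder-1"):
    print(node, "rabbit_hosts =", ",".join(shuffled_rabbit_hosts(CONTROLLERS, node)))
```

With only three controllers some nodes will still share the same first host, but across many nodes the first entries should be spread roughly evenly.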
Best Regards,
Bartosz Kupidura