fuel-dev team mailing list archive
Message #01057
[Fuel] HA Fixes Catalogue
Fuelers,
I have compiled a catalogue of all OpenStack HA fixes we have
implemented so far, researched, or need to research and implement.
Here is a summary of where things stand today (I've added the same
list to https://etherpad.openstack.org/p/fuel-ha-rabbitmq):
Applied in 5.0, needs a backport to 4.1.1:
- https://review.openstack.org/78178 ocf-neutron-dhcp-orphan
- https://review.openstack.org/93927 nova-reap-deleted-instance
- https://review.openstack.org/77276 oslo-ccn-handling
- https://review.openstack.org/76686 oslo-kombu-reconnect-delay
Proposed for 5.0:
- https://review.openstack.org/93884 ocf-haproxy-vip-colocate
- https://review.openstack.org/93411 rabbitmq-keepalive
- https://review.openstack.org/93815 kernel-match-tcp-keepalive-to-nova-report-interval
- https://review.openstack.org/93883 rabbitmq-hosts-shuffle
Must be implemented in 5.0:
- python-kombu-and-amqp-upgrade (multiple CCN fixes)
- https://launchpadlibrarian.net/160766270/transport.py.patch python-amqp-tcp-user-timeout
- https://bugs.launchpad.net/fuel/+bug/1312177 pacemaker-neutron-agent-stickiness
- https://bugs.launchpad.net/fuel/+bug/1297355 ocf-galera-full-stop
- https://bugs.launchpad.net/fuel/+bug/1293680 ocf-galera-take-donor-out
Should be implemented in 5.1:
- https://bugs.launchpad.net/fuel/+bug/1318936 rabbitmq-does-not-restart
Known not to help, or known to cause breakage:
- https://review.openstack.org/34949 rabbitmq-amqp-heartbeat (requires
a heartbeat periodic task in every OpenStack component)
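To illustrate what the rabbitmq-hosts-shuffle item under "Proposed for 5.0" is getting at, here is a minimal Python sketch. It is not the code in the review. It only shows the general idea of randomizing the order of the RabbitMQ host list so that clients do not all try the same broker first. The function name, its parameters, and the addresses are hypothetical.

```python
import random


def shuffled_hosts(hosts, seed=None):
    """Return a copy of the broker host list in random order.

    `hosts` and `seed` are hypothetical names used only for illustration;
    `seed` exists so the result can be made reproducible in tests.
    """
    rng = random.Random(seed)
    result = list(hosts)  # copy, so the caller's list is left untouched
    rng.shuffle(result)
    return result


if __name__ == "__main__":
    # Placeholder addresses from the documentation range, not a real deployment.
    brokers = ["192.0.2.1:5672", "192.0.2.2:5672", "192.0.2.3:5672"]
    print(",".join(shuffled_hosts(brokers)))
```

Each node would get its own ordering, so after a failover the reconnecting clients are spread across the surviving brokers instead of all landing on the first one in the list.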
Below is the full catalogue:
pacemaker-haproxy-reload
- applied in 4.0
- https://bugs.launchpad.net/fuel/+bug/1259639
- https://review.openstack.org/61453
ceph-mon-list
- applied in 4.1
- https://bugs.launchpad.net/fuel/+bug/1268579
- https://review.openstack.org/73106
ocf-neutron-agent-pid-matching
- applied in 4.1
- https://bugs.launchpad.net/fuel/+bug/1269334
- https://review.openstack.org/67101
ocf-galera-restart-wait
- applied in 4.1
- https://bugs.launchpad.net/fuel/+bug/1281625
- https://review.openstack.org/74431
pacemaker-fd-leak
- applied in 4.1
- https://bugs.launchpad.net/fuel/+bug/1272840
- https://github.com/ClusterLabs/libqb/commit/b327dbec7380e7de6896f9bb6cb1ca58677f4ed8
pacemaker-broadcast-calculation
- applied in 4.1 # TODO(angdraug): report to upstream
- https://bugs.launchpad.net/fuel/+bug/1277614
- https://review.openstack.org/72438
rabbitmq-hosts
- applied in 4.1
- https://bugs.launchpad.net/fuel/+bug/1285449
- https://review.openstack.org/77409
mysql-read-timeout
- applied in 4.1
- https://bugs.launchpad.net/fuel/+bug/1285449
- https://review.openstack.org/77643
drop-mysql-on-disconnect
- applied in 4.1.1, 5.0 # TODO(angdraug): confirm all fixes are present in 5.0
- https://bugs.launchpad.net/fuel/+bug/1288438
- https://review.openstack.org/81225
haproxy-netns
- applied in 4.1.1, 5.0
- https://review.openstack.org/82518
rabbitmq3
- applied in 4.1.1, 5.0
- depends on rabbitmq3-ha-mode
- https://bugs.launchpad.net/fuel/+bug/1288831
rabbitmq3-ha-mode
- applied in 4.1.1, 5.0
- https://bugs.launchpad.net/fuel/+bug/1296922
- https://review.openstack.org/84707
rabbitmq-init-retry
- applied in 4.1.1, 5.0
- https://bugs.launchpad.net/fuel/+bug/1314617
- https://review.openstack.org/88593
ocf-gratuitous-arp
- applied in 4.1.1, 5.0
- https://bugs.launchpad.net/fuel/+bug/1310676
- https://review.openstack.org/89378
neutron-l3-rootwrap
- applied in 4.1.1, 5.0 # TODO(rmoe): confirm how this is related to
the neutron umask/pid flock bug (0751)
- https://bugs.launchpad.net/fuel/+bug/1310926
- https://bugs.launchpad.net/neutron/+bug/1311804
ocf-neutron-l3-cleanup-ns
- applied in 4.1.1, 5.0
- https://review.openstack.org/89872
ocf-neutron-dhcp-cleanup-ns
- applied in 4.1.1, 5.0
- https://bugs.launchpad.net/fuel/+bug/1285929
- https://review.openstack.org/89557
rabbitmq-fd-ulimit
- applied in 4.1.1, 5.0
- https://bugs.launchpad.net/fuel/+bug/1279594
- https://gerrit.mirantis.com/10566
ocf-neutron-agent-lost-mysql
- applied in 4.1.1, 5.0
- https://bugs.launchpad.net/fuel/+bug/1287716
- https://review.openstack.org/77895
ocf-neutron-dhcp-orphan
- applied in 5.0 # TODO(xenolog): backport to 4.1.1
- https://bugs.launchpad.net/fuel/+bug/1285929
- https://review.openstack.org/78178
nova-reap-deleted-instance
- applied in 5.0, proposed for 4.1.1
- https://review.openstack.org/93927
oslo-ccn-handling
- applied in 5.0 # TODO(angdraug): backport to 4.1.1
- https://review.openstack.org/77276
oslo-kombu-reconnect-delay
- applied in 5.0 # TODO(angdraug): backport to 4.1.1
- https://review.openstack.org/76686
ocf-haproxy-vip-colocate
- https://review.openstack.org/93884
rabbitmq-keepalive
- https://review.openstack.org/93411
kernel-match-tcp-keepalive-to-nova-report-interval
- https://review.openstack.org/93815
rabbitmq-hosts-shuffle
- https://review.openstack.org/93883
python-kombu-and-amqp-upgrade
- # NOTE(angdraug): multiple CCN handling fixes
- # TODO(rmoe): try kombu 3.0.15 and amqp 1.4.5; if that breaks, check
whether kombu 2.5.13 and amqp 1.0.13 are enough
python-amqp-tcp-user-timeout
- depends on python-kombu-and-amqp-upgrade
- https://launchpadlibrarian.net/160766270/transport.py.patch
pacemaker-neutron-agent-stickiness
- https://bugs.launchpad.net/fuel/+bug/1312177
ocf-galera-full-stop
- # NOTE(angdraug): requires a rewrite of galera OCF script
- https://bugs.launchpad.net/fuel/+bug/1297355
ocf-galera-take-donor-out
- https://bugs.launchpad.net/fuel/+bug/1293680
rabbitmq-does-not-restart
- # NOTE(angdraug): managing rabbitmq by pacemaker is proposed
- https://bugs.launchpad.net/fuel/+bug/1318936
rabbitmq-amqp-heartbeat
- reverted # NOTE(angdraug): requires a heartbeat periodic task in
every OpenStack component
<https://lists.launchpad.net/openstack/msg15111.html>
- https://review.openstack.org/34949
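For readers unfamiliar with the socket options behind the rabbitmq-keepalive and python-amqp-tcp-user-timeout entries, here is a rough Python sketch of enabling TCP keepalive and TCP_USER_TIMEOUT on a client socket. It is an assumption about the general technique, not a copy of the linked patches. The function name and the default values are hypothetical placeholders, and the per-socket tuning options are Linux-specific.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5, user_timeout_ms=None):
    """Turn on TCP keepalive for `sock` and optionally set TCP_USER_TIMEOUT.

    All parameter names and defaults are illustrative placeholders.
    """
    # SO_KEEPALIVE is portable and switches keepalive probes on.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)

    # Per-socket tuning: seconds idle before the first probe, seconds between
    # probes, and number of failed probes before the connection is dropped.
    # These constants are not defined on every platform, so skip missing ones.
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", interval),
                        ("TCP_KEEPCNT", count)):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)

    # TCP_USER_TIMEOUT (Linux) bounds how long sent data may stay
    # unacknowledged before the kernel closes the connection.
    if user_timeout_ms is not None:
        opt = getattr(socket, "TCP_USER_TIMEOUT", None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, user_timeout_ms)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s, idle=10, interval=2, count=3, user_timeout_ms=20000)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

With settings like these, a client whose broker disappears during failover notices the dead connection within a bounded time instead of waiting on the system-wide keepalive defaults.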
Please respond if you know about any other HA fixes and improvements
that can help avoid breakage of OpenStack, RabbitMQ, and MySQL on
failover.
Thanks,
--
Dmitry Borodaenko