sts-sponsors team mailing list archive
Message #00237
[Bug 1657444] [NEW] Can't failover when rabbit_hosts is configured as 3 hosts
You have been subscribed to a public bug by Felipe Reyes (freyes):
[Impact]
When the heartbeat connection times out, the timeout is not treated as a
recoverable error, and no reconnect is attempted by calling
ensure_connection(). This leaves the heartbeat thread attempting to
reconnect to the same host over and over again.
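To illustrate the intended behaviour, here is a minimal sketch in Python. It is not the oslo.messaging or kombu code: FakeConnection, heartbeat_check() and heartbeat_once() are hypothetical names made up for this example, and only the idea of calling ensure_connection() on a timeout comes from the description above.

```python
import socket


class FakeConnection:
    """Hypothetical stand-in for an AMQP connection; not the kombu API."""

    def __init__(self, hosts, dead_hosts):
        self.hosts = list(hosts)
        self.dead_hosts = set(dead_hosts)
        self.index = 0

    @property
    def current_host(self):
        return self.hosts[self.index]

    def heartbeat_check(self):
        # Simulate a heartbeat that times out on an unreachable host.
        if self.current_host in self.dead_hosts:
            raise socket.timeout("heartbeat timed out")

    def ensure_connection(self):
        # Assumption: reconnecting advances to the next configured host.
        self.index = (self.index + 1) % len(self.hosts)


def heartbeat_once(conn):
    """Run one heartbeat check; fail over to the next host on a timeout."""
    try:
        conn.heartbeat_check()
    except socket.timeout:
        # Treat the timeout as recoverable instead of retrying the same host.
        conn.ensure_connection()
        return False
    return True
```

With hosts a, b, c and host a unreachable, the first call to heartbeat_once() fails over to b, and the next call succeeds there.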
[Test Case]
* deploy openstack
bzr branch lp:openstack-charm-testing
cd openstack-charm-testing
juju deployer -c default.yaml -d -v artful-pike
juju add-unit rabbitmq-server
* Force timeout using iptables in a rabbitmq-server node
sudo iptables -I INPUT -p tcp --dport 5672 -j DROP
Expected result:
once the timeout happens, the heartbeat thread reconnects (picking a new rabbit host if needed).
Actual result:
the heartbeat thread is left in a loop (connect, socket closed, retry, connect...)
[Regression Potential]
Without this patch, when the heartbeat connection times out, the client
does not attempt to connect to the next configured rabbit host. The risk
is that in situations where the daemons using this library currently
manage to reconnect to the same host (e.g. the disconnection from the
host lasts only a few seconds), with this change they will instead
reconnect to the next host, so users may see connections flapping
between two (or more) rabbit hosts.
[Other Info]
I have a rabbitmq cluster of 3 nodes
root@47704165d2bb:/# rabbitmqctl cluster_status
Cluster status of node rabbit@47704165d2bb ...
[{nodes,[{disc,[rabbit@0482398a286e,rabbit@3709521b608a,
rabbit@47704165d2bb]}]},
{running_nodes,[rabbit@0482398a286e,rabbit@3709521b608a,rabbit@47704165d2bb]},
{cluster_name,<<"rabbit@47704165d2bb">>},
{partitions,[]},
{alarms,[{rabbit@0482398a286e,[]},
{rabbit@3709521b608a,[]},
{rabbit@47704165d2bb,[]}]}]
root@47704165d2bb:/# rabbitmqctl list_policies
Listing policies ...
/ ha-all all ^ha\\. {"ha-mode":"all"} 0
My oslo.messaging client configuration:
[oslo_messaging_rabbit]
rabbit_hosts=120.0.0.56:5671,120.0.0.57:5671,120.0.0.55:5671
rabbit_userid=cloud
rabbit_password=cloud
rabbit_ha_queues=True
rabbit_retry_interval=1
rabbit_retry_backoff=2
rabbit_max_retries=0
rabbit_durable_queues=False
When I run "service rabbitmq-server stop" on one node to simulate a
failure, I get the following error logs, and the consumer can't fail
over from the bad node. It keeps reconnecting to the failed node forever
instead of trying the other nodes. "kombu_failover_strategy" is left at
its default value of "round-robin".
2009-01-13 18:32:42.785 17 ERROR oslo.messaging._drivers.impl_rabbit [-] [4e976d46-ceee-4617-b9be-5e4821990738] AMQP server 120.0.0.56:5671 closed the connection. Check login credentials: Socket closed
2009-01-13 18:32:43.819 17 ERROR oslo.messaging._drivers.impl_rabbit [-] Unable to connect to AMQP server on 120.0.0.56:5671 after None tries: Socket closed
2009-01-13 18:32:43.819 17 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...
2009-01-13 18:32:58.874 17 ERROR oslo.messaging._drivers.impl_rabbit [-] [4e976d46-ceee-4617-b9be-5e4821990738] AMQP server 120.0.0.56:5671 closed the connection. Check login credentials: Socket closed
2009-01-13 18:32:59.907 17 ERROR oslo.messaging._drivers.impl_rabbit [-] Unable to connect to AMQP server on 120.0.0.56:5671 after None tries: Socket closed
2009-01-13 18:32:59.907 17 WARNING oslo.messaging._drivers.impl_rabbit [-] Unexpected error during heartbeart thread processing, retrying...
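Every retry in the log above goes to 120.0.0.56:5671. For comparison, here is a rough sketch of the rotation one would expect from a round-robin strategy over the three configured hosts. parse_rabbit_hosts() is a hypothetical helper written for this example, not part of oslo.messaging or kombu; only the rabbit_hosts value is taken from the configuration above.

```python
import itertools


def parse_rabbit_hosts(value):
    """Split a rabbit_hosts string into (host, port) tuples (hypothetical helper)."""
    hosts = []
    for entry in value.split(","):
        # rpartition splits on the last colon: "host:port" -> host, ":", port
        host, _, port = entry.strip().rpartition(":")
        hosts.append((host, int(port)))
    return hosts


RABBIT_HOSTS = "120.0.0.56:5671,120.0.0.57:5671,120.0.0.55:5671"

# Round-robin: each failed attempt moves on to the next host, wrapping around.
failover = itertools.cycle(parse_rabbit_hosts(RABBIT_HOSTS))
first_four = [next(failover) for _ in range(4)]
```

Here the fourth attempt wraps back to the first host, whereas the log shows the same host being retried on every attempt.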
Can anyone help me? Thanks!
** Affects: cloud-archive
Importance: Undecided
Status: Invalid
** Affects: cloud-archive/pike
Importance: High
Status: Triaged
** Affects: oslo.messaging
Importance: Undecided
Assignee: Vincent Untz (vuntz)
Status: Fix Released
** Affects: python-oslo.messaging (Ubuntu)
Importance: Undecided
Status: Invalid
** Affects: python-oslo.messaging (Ubuntu Artful)
Importance: High
Assignee: Felipe Reyes (freyes)
Status: Triaged
** Tags: in-stable-pike sts
--
Can't failover when rabbit_hosts is configured as 3 hosts
https://bugs.launchpad.net/bugs/1657444
You received this bug notification because you are a member of STS Sponsors, which is subscribed to the bug report.