yahoo-eng-team team mailing list archive, Message #88745
[Bug 1917645] Re: Nova can't create instances if RabbitMQ notification cluster is down
I'm setting the nova part of this bug to Invalid, as this is fixed by
an oslo.messaging patch.
** Changed in: nova
Status: Confirmed => Invalid
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1917645
Title:
Nova can't create instances if RabbitMQ notification cluster is down
Status in OpenStack Compute (nova):
Invalid
Status in oslo.messaging:
Fix Released
Bug description:
We use independent RabbitMQ clusters for each OpenStack project, Nova
Cells and also for notifications. Recently, I noticed in our test
infrastructure that if the RabbitMQ cluster for notifications has an
outage, Nova can't create new instances. Possibly other operations
will also hang.
Not being able to send a notification, or to connect to the
notification RabbitMQ cluster, shouldn't prevent new instances from
being created. (If blocking on notifications is actually a use case
for some deployments, the operator should have the option to
configure that behaviour.)
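As a sketch of what such operator control could look like: oslo.messaging has a notification `retry` option, and assuming it is honoured by the driver (which is what the oslo.messaging fix for this bug addresses), a deployment could bound how long notification sends may block with something like the following. The value `5` is purely illustrative:

```ini
[oslo_messaging_notifications]
# Illustrative tuning: give up after 5 send attempts instead of the
# default of retrying forever (-1). Option semantics per
# oslo.messaging; defaults may vary between releases.
retry = 5
```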
Tested against the master branch.
If the notification RabbitMQ is stopped, then when creating an
instance, nova-scheduler is stuck with:
```
Mar 01 21:16:28 devstack nova-scheduler[18384]: DEBUG nova.scheduler.request_filter [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Request filter 'accelerators_filter' took 0.0 seconds {{(pid=18384) wrapper /opt/stack/nova/nova/scheduler/request_filter.py:46}}
Mar 01 21:16:32 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 2.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:35 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 4.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:42 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 6.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:16:51 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 8.0 seconds): OSError: [Errno 113] EHOSTUNREACH
Mar 01 21:17:02 devstack nova-scheduler[18384]: ERROR oslo.messaging._drivers.impl_rabbit [None req-353318d1-f4bd-499d-98db-a0919d28ecf7 demo demo] Connection failed: [Errno 113] EHOSTUNREACH (retrying in 10.0 seconds): OSError: [Errno 113] EHOSTUNREACH
(...)
```
Because the notification RabbitMQ cluster is down, Nova gets stuck in:
https://github.com/openstack/nova/blob/5b66caab870558b8a7f7b662c01587b959ad3d41/nova/scheduler/filter_scheduler.py#L85
because oslo messaging never gives up:
https://github.com/openstack/oslo.messaging/blob/5aa645b38b4c1cf08b00e687eb6c7c4b8a0211fc/oslo_messaging/_drivers/impl_rabbit.py#L736
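The retry cadence visible in the log above (2 s, 4 s, 6 s, 8 s, 10 s, ...) is a linearly growing backoff. A minimal, illustrative Python model of that behaviour follows; the parameter names mirror the rabbit driver's `rabbit_retry_interval` / `rabbit_retry_backoff` / `rabbit_interval_max` options, but the generator itself is a sketch, not the driver's actual code path:

```python
from itertools import islice


def retry_intervals(start=2.0, step=2.0, maximum=30.0):
    """Yield linearly increasing reconnect delays, capped at `maximum`.

    Illustrative model of the backoff in the log above; note that the
    generator, like the driver's reconnect loop by default, never
    terminates on its own.
    """
    interval = start
    while True:
        yield interval
        interval = min(interval + step, maximum)


# First five delays match the log output: 2, 4, 6, 8, 10 seconds.
print(list(islice(retry_intervals(), 5)))  # -> [2.0, 4.0, 6.0, 8.0, 10.0]
```

Because the loop has no attempt limit, a caller such as nova-scheduler blocks indefinitely while the notification cluster is unreachable, which is exactly the hang described above.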
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1917645/+subscriptions