yahoo-eng-team team mailing list archive
Message #01524
[Bug 1138184] Re: Scheduler selects deleted baremetal nodes
** Changed in: nova
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1138184
Title:
Scheduler selects deleted baremetal nodes
Status in OpenStack Compute (Nova):
Fix Released
Bug description:
When a baremetal node is deleted, the associated compute_node record
stops receiving periodic updates (but is not actually deleted).
However, the scheduler's ComputeFilter seems to be unaware of this and
continues to try to assign Nova instances to the deleted node.
To reproduce, start devstack with the baremetal driver, enroll a node
(nova baremetal-node-create ...), wait a minute for the PeriodicTask
to update compute, then delete the node (nova baremetal-node-delete
...). Then try to launch an instance (nova boot ...) and observe the
failure.
To see whether this was just a timeout issue, I left devstack running
for many hours after deleting the baremetal node, as can be seen from
the database records below (some columns snipped for brevity).
stack@ubuntu:~/devstack$ mysql nova -e 'select * from compute_nodes\G'
*************************** 1. row ***************************
created_at: 2013-02-28 18:22:38
updated_at: 2013-02-28 18:49:08
deleted_at: NULL
id: 1
service_id: 2
hypervisor_type: baremetal
hypervisor_version: 1
hypervisor_hostname: 653b6c79-35a1-4af8-99a5-edd62fe9625b
deleted: 0
stack@ubuntu:~/devstack$ mysql nova_bm -e 'select * from bm_nodes where uuid="653b6c79-35a1-4af8-99a5-edd62fe9625b"\G'
*************************** 1. row ***************************
created_at: 2013-02-28 18:22:04
updated_at: 2013-02-28 18:48:25
deleted_at: 2013-02-28 21:08:03
deleted: 1
id: 1
instance_uuid: NULL
registration_status: NULL
task_state: deleted
uuid: 653b6c79-35a1-4af8-99a5-edd62fe9625b
instance_name: NULL
stack@ubuntu:~/devstack$ mysql -e 'select now()'
2013-03-01 16:51:34
Here is a snippet from n-schd in devstack when calling "nova boot". What I don't understand, and what seems to be causing this issue, is why the servicegroup API believes this compute_node is still up. Note that the 'updated_at' value logged by servicegroup.api is recent, whereas the 'updated_at' value in the compute_nodes table is much older.
2013-03-01 16:50:12.271 DEBUG nova.scheduler.filter_scheduler [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] Attempting to build 1 instance(s) from (pid=8693) schedule_run_instance /opt/stack/nova/nova/scheduler/filter_scheduler.py:75
2013-03-01 16:50:12.280 DEBUG nova.servicegroup.api [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] Check if the given member [{'binary': u'nova-compute', 'deleted': 0L, 'created_at': datetime.datetime(2013, 2, 28, 17, 40, 45), 'updated_at': datetime.datetime(2013, 3, 1, 16, 50, 2), 'report_count': 8172L, 'topic': u'compute', 'host': u'ubuntu', 'disabled': False, 'deleted_at': None, 'id': 2L}] is part of the ServiceGroup, is up from (pid=8693) service_is_up /opt/stack/nova/nova/servicegroup/api.py:93
2013-03-01 16:50:12.281 DEBUG nova.servicegroup.drivers.db [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] DB_Driver.is_up last_heartbeat = 2013-03-01 16:50:02 elapsed = 10.281252 from (pid=8693) is_up /opt/stack/nova/nova/servicegroup/drivers/db.py:68
2013-03-01 16:50:12.281 DEBUG nova.scheduler.filters.compute_filter [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] ComputeFilter: Service {'binary': u'nova-compute', 'deleted': 0L, 'created_at': datetime.datetime(2013, 2, 28, 17, 40, 45), 'updated_at': datetime.datetime(2013, 3, 1, 16, 50, 2), 'report_count': 8172L, 'topic': u'compute', 'host': u'ubuntu', 'disabled': False, 'deleted_at': None, 'id': 2L} is True from (pid=8693) host_passes /opt/stack/nova/nova/scheduler/filters/compute_filter.py:39
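The timestamp discrepancy can be illustrated with a short sketch. The record logged by servicegroup.api appears to be the nova-compute service row (its 'id' of 2 matches service_id: 2 in the compute_nodes row), not the compute_nodes row itself. The is_up function below is a hypothetical stand-in for a heartbeat freshness check, not nova's actual code, and the 60-second threshold is an assumption based on what I believe is the default of nova's service_down_time option. The timestamps are copied from the log and database output above.

```python
from datetime import datetime, timedelta

# Assumed threshold: believed to match nova's default service_down_time.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def is_up(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """Hypothetical freshness check: is the last heartbeat recent enough?"""
    return abs(now - last_heartbeat) <= down_time


# Time at which the scheduler evaluated the host (from the n-schd log).
scheduler_now = datetime(2013, 3, 1, 16, 50, 12, 281000)

# 'updated_at' of the nova-compute service record (from the n-schd log).
service_updated_at = datetime(2013, 3, 1, 16, 50, 2)

# 'updated_at' of the compute_nodes row (from the mysql output).
compute_node_updated_at = datetime(2013, 2, 28, 18, 49, 8)

service_fresh = is_up(service_updated_at, scheduler_now)
node_fresh = is_up(compute_node_updated_at, scheduler_now)
node_age = scheduler_now - compute_node_updated_at

print("service heartbeat fresh:", service_fresh)
print("compute_node record fresh:", node_fresh)
print("compute_node record age:", node_age)
```

Under these assumptions the service heartbeat passes the check while the compute_nodes record is roughly 22 hours stale, which is consistent with the host being treated as up even though the node's own record has stopped updating.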