yahoo-eng-team team mailing list archive

[Bug 1138184] Re: Scheduler selects deleted baremetal nodes

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Thierry Carrez <thierry.carrez+lp@xxxxxxxxx>
Date: Wed, 20 Mar 2013 16:07:05 -0000
Reply-to: Bug 1138184 <1138184@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx
** Changed in: nova
       Status: Fix Committed => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1138184

Title:
  Scheduler selects deleted baremetal nodes

Status in OpenStack Compute (Nova):
  Fix Released

Bug description:
  When a baremetal node is deleted, the associated compute_node record
  stops receiving periodic updates (but is not actually deleted).
  However, the scheduler's ComputeFilter seems to be unaware of this and
  continues to try to assign Nova instances to the deleted node.

  To reproduce, start devstack with the baremetal driver, enroll a node
  (nova baremetal-node-create ...), wait a minute for the PeriodicTask
  to update compute, then delete the node (nova baremetal-node-delete
  ...). Then try to launch an instance (nova boot ...) and observe the
  failure.

  To see whether this was just a timeout issue, I left devstack running
  for many hours after deleting the baremetal node, as can be seen from
  the database records below (some columns snipped for brevity).

  stack@ubuntu:~/devstack$ mysql nova -e 'select * from compute_nodes\G'
  *************************** 1. row ***************************
            created_at: 2013-02-28 18:22:38
            updated_at: 2013-02-28 18:49:08
            deleted_at: NULL
                    id: 1
            service_id: 2
       hypervisor_type: baremetal
    hypervisor_version: 1
   hypervisor_hostname: 653b6c79-35a1-4af8-99a5-edd62fe9625b
               deleted: 0

  stack@ubuntu:~/devstack$ mysql nova_bm -e 'select * from bm_nodes where uuid="653b6c79-35a1-4af8-99a5-edd62fe9625b"\G'
  *************************** 1. row ***************************
           created_at: 2013-02-28 18:22:04
           updated_at: 2013-02-28 18:48:25
           deleted_at: 2013-02-28 21:08:03
              deleted: 1
                   id: 1
        instance_uuid: NULL
  registration_status: NULL
           task_state: deleted
                 uuid: 653b6c79-35a1-4af8-99a5-edd62fe9625b
        instance_name: NULL

  stack@ubuntu:~/devstack$ mysql -e 'select now()'
    2013-03-01 16:51:34 

  
  Here is a snippet from n-schd in devstack when calling "nova boot".
  What I don't understand, and what seems to be causing this issue, is
  why the servicegroup API believes this compute_node is still up! Note
  that the 'updated_at' value logged by servicegroup.api is recent,
  whereas in the db it is much older.

  2013-03-01 16:50:12.271 DEBUG nova.scheduler.filter_scheduler [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] Attempting to build 1 instance(s) from (pid=8693) schedule_run_instance /opt/stack/nova/nova/scheduler/filter_scheduler.py:75
  2013-03-01 16:50:12.280 DEBUG nova.servicegroup.api [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] Check if the given member [{'binary': u'nova-compute', 'deleted': 0L, 'created_at': datetime.datetime(2013, 2, 28, 17, 40, 45), 'updated_at': datetime.datetime(2013, 3, 1, 16, 50, 2), 'report_count': 8172L, 'topic': u'compute', 'host': u'ubuntu', 'disabled': False, 'deleted_at': None, 'id': 2L}] is part of the ServiceGroup, is up from (pid=8693) service_is_up /opt/stack/nova/nova/servicegroup/api.py:93
  2013-03-01 16:50:12.281 DEBUG nova.servicegroup.drivers.db [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] DB_Driver.is_up last_heartbeat = 2013-03-01 16:50:02 elapsed = 10.281252 from (pid=8693) is_up /opt/stack/nova/nova/servicegroup/drivers/db.py:68
  2013-03-01 16:50:12.281 DEBUG nova.scheduler.filters.compute_filter [req-b8afe75b-dbb9-49dc-a643-eb0712cf3e5f demo demo] ComputeFilter: Service {'binary': u'nova-compute', 'deleted': 0L, 'created_at': datetime.datetime(2013, 2, 28, 17, 40, 45), 'updated_at': datetime.datetime(2013, 3, 1, 16, 50, 2), 'report_count': 8172L, 'topic': u'compute', 'host': u'ubuntu', 'disabled': False, 'deleted_at': None, 'id': 2L} is True from (pid=8693) host_passes /opt/stack/nova/nova/scheduler/filters/compute_filter.py:39
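  The log lines above show a liveness check applied to the nova-compute service record (id 2), whose 'updated_at' is refreshed by its heartbeat, rather than to the compute_nodes row. The sketch below contrasts the two timestamps under the same elapsed-time test. It is a minimal illustration, not Nova code and not necessarily how the fix was implemented: the is_up helper and the 60-second threshold are assumptions made for this example.

```python
from datetime import datetime, timedelta

# Assumed threshold for this sketch; not taken from this report.
SERVICE_DOWN_TIME = timedelta(seconds=60)


def is_up(last_heartbeat, now, down_time=SERVICE_DOWN_TIME):
    """Hypothetical helper: a record counts as up if it was updated recently."""
    return abs(now - last_heartbeat) <= down_time


# Time of the scheduler log lines in this report.
now = datetime(2013, 3, 1, 16, 50, 12, 281000)

# 'updated_at' of the nova-compute service record, as logged by servicegroup.api.
service_updated_at = datetime(2013, 3, 1, 16, 50, 2)

# 'updated_at' of the compute_nodes row, as shown in the database query.
compute_node_updated_at = datetime(2013, 2, 28, 18, 49, 8)

service_ok = is_up(service_updated_at, now)
node_ok = is_up(compute_node_updated_at, now)

print("service considered up:", service_ok)
print("compute_node record considered fresh:", node_ok)
```

  Under these assumptions the service passes the check while the compute_nodes row would not, which is consistent with the host passing ComputeFilter even though the baremetal node behind it has been deleted.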

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1138184/+subscriptions