yahoo-eng-team team mailing list archive

[Bug 1610679] Re: race conditions between compute and schedule disk report

 

Reviewed:  https://review.openstack.org/352317
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=36a0ba9c8141b445f2c6bfc093fde4cc98d229b2
Submitter: Jenkins
Branch:    master

commit 36a0ba9c8141b445f2c6bfc093fde4cc98d229b2
Author: jichenjc <jichenjc@xxxxxxxxxx>
Date:   Sat Jul 23 23:21:39 2016 +0800

    Avoid update resource if compute node not updated
    
    There's a race condition where the compute node record can be created
    and then updated through the objects.compute_node.update_from_virt_driver
    function: some fields, such as free_disk_gb, may not have been written
    yet, so reading the record from the scheduler service finds a 'None'
    value and raises an exception.
    
    Change-Id: Ia68298a3f01d89bbf302ac734389f7282176c553
    Closes-Bug: 1610679
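
The guard the commit describes can be sketched as follows. This is a minimal, illustrative helper, not nova's actual code; the function name and the dict-based record are assumptions made for the example.

```python
def safe_free_disk_mb(compute_node):
    """Convert free_disk_gb to MB, returning None if the compute node
    record has been created but not yet populated by the virt driver
    (hypothetical helper illustrating the guard in the fix)."""
    free_gb = compute_node.get("free_disk_gb")
    if free_gb is None:
        # The record exists but the resource update has not completed;
        # skip rather than crash with a TypeError on None * 1024.
        return None
    return free_gb * 1024
```

The point of the fix is exactly this early return: the scheduler skips a half-initialized record instead of performing arithmetic on None.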


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1610679

Title:
  race conditions between compute and schedule disk report

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  This is a recently built devstack using an out-of-tree virt layer, but
  I don't think that is related to the issue.

  
  The following error appears in the conductor log:

    File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 133, in _process_incoming
      res = self.dispatcher.dispatch(message)

    File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 150, in dispatch
      return self._do_dispatch(endpoint, method, ctxt, args)

    File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 121, in _do_dispatch
      result = func(ctxt, **new_args)

    File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 199, in inner
      return func(*args, **kwargs)

    File "/opt/stack/new/nova/nova/scheduler/manager.py", line 104, in select_destinations
      dests = self.driver.select_destinations(ctxt, spec_obj)

    File "/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 53, in select_destinations
      selected_hosts = self._schedule(context, spec_obj)

    File "/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 104, in _schedule
      hosts = self._get_all_host_states(elevated)

    File "/opt/stack/new/nova/nova/scheduler/filter_scheduler.py", line 145, in _get_all_host_states
      return self.host_manager.get_all_host_states(context)

    File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 597, in get_all_host_states
      self._get_instance_info(context, compute))

    File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 180, in update
      return _locked_update(self, compute, service, aggregates, inst_dict)

    File "/usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py", line 271, in inner
      return f(*args, **kwargs)

    File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 169, in _locked_update
      self._update_from_compute_node(compute)

    File "/opt/stack/new/nova/nova/scheduler/host_manager.py", line 201, in _update_from_compute_node
      free_disk_mb = free_gb * 1024

  TypeError: unsupported operand type(s) for *: 'NoneType' and 'int'
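
The failing line in host_manager.py can be reproduced in isolation: free_disk_gb was still None when the scheduler computed free_disk_mb. A minimal reproduction, with illustrative variable names:

```python
# free_disk_gb read from a half-initialized ComputeNode record
free_gb = None

try:
    free_disk_mb = free_gb * 1024  # the line that fails in _update_from_compute_node
except TypeError as exc:
    # "unsupported operand type(s) for *: 'NoneType' and 'int'"
    message = str(exc)
```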

  
  The scheduler log shows the following:

  2016-07-27 13:45:00.934 21123 DEBUG oslo_concurrency.lockutils [req-fc02af23-3279-4c93-8732-9b4f9f3a0b8d tempest-ServersTestJSON-2014112056 tempest-ServersTestJSON-2014112056] Lock "(u'opnssi1', u'OPNSSI1')" acquired by "nova.scheduler.host_manager._locked_update" :: waited 0.000s inner /usr/local/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:270
  2016-07-27 13:45:00.935 21123 DEBUG nova.scheduler.host_manager [req-fc02af23-3279-4c93-8732-9b4f9f3a0b8d tempest-ServersTestJSON-2014112056 tempest-ServersTestJSON-2014112056] Update host state from compute node: ComputeNode(cpu_allocation_ratio=16.0,cpu_info='{"cec_model": "2827", "architecture": "s390x"}',created_at=2016-07-27T13:45:00Z,current_workload=None,deleted=False,deleted_at=None,disk_allocation_ratio=1.0,disk_available_least=432,free_disk_gb=None,free_ram_mb=None,host='opnssi1',host_ip=10.32.201.222,hypervisor_hostname='OPNSSI1',hypervisor_type='zvm',hypervisor_version=630,id=2,local_gb=556,local_gb_used=124,memory_mb=24576,memory_mb_used=0,metrics=None,numa_topology=None,pci_device_pools=None,ram_allocation_ratio=10.0,running_vms=None,service_id=None,stats={},supported_hv_specs=[HVSpec],updated_at=None,uuid=366d9b37-8570-4188-8aab-bc9819cb2312,vcpus=8,vcpus_used=8) _locked_update /opt/stack/new/nova/nova/scheduler/host_manager.py:168
  2016-07-27 13:45:00.936 21123 WARNING nova.scheduler.host_manager [req-fc02af23-3279-4c93-8732-9b4f9f3a0b8d tempest-ServersTestJSON-2014112056 tempest-ServersTestJSON-2014112056] Host OPNSSI1 has more disk space than database expected (432 GB > None GB)


  
  The compute log shows that the compute node record was created for the
  first time here:

  2016-07-27 13:45:00.396 21125 WARNING nova.compute.resource_tracker
  [req-1ec17be4-cc14-494b-8f88-8bce1999fba1 - -] No compute node record
  for opnssi1:OPNSSI1

  2016-07-27 13:45:01.009 21125 INFO nova.compute.resource_tracker [req-
  1ec17be4-cc14-494b-8f88-8bce1999fba1 - -] Compute_service record
  updated for opnssi1:OPNSSI1
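
Putting the two logs together, the race window is visible in the timestamps: the record is created at 13:45:00.396, the scheduler reads it at 13:45:00.935, and the resource update only completes at 13:45:01.009. A condensed, purely illustrative timeline (names are not nova's actual code):

```python
# Timeline of the race, reconstructed from the log timestamps above.
node = {"free_disk_gb": None}   # 13:45:00.396  compute: record created, fields unset
scheduler_view = dict(node)     # 13:45:00.935  scheduler: snapshots the half-built record
node["free_disk_gb"] = 432      # 13:45:01.009  compute: resource update completes
# The scheduler's snapshot still holds None, which later crashed on free_gb * 1024.
```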

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1610679/+subscriptions
