[Bug 1941819] Re: A mistake caused by temporary_mutation reentry

 

First I tried to reproduce the issue on master. But it turned out that
instance.flavor is not lazy loaded in nova-compute at all any more,
since [1] added an instance.flavor call as part of the compute API
live_migrate[2] call via the block_accelerator decorator[3]. That
access happens in the nova-api service, long before the live migration
reaches the compute service.
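
A paraphrased sketch of that decorator (the real block_accelerators in
nova/compute/api.py [3] differs in its details; the exception raised
here is only a placeholder):

    import functools

    def block_accelerators(func):
        @functools.wraps(func)
        def wrapper(self, context, instance, *args, **kwargs):
            # Dereferencing instance.flavor here triggers the lazy load
            # (if the field is not yet populated) in the nova-api
            # service, not in nova-compute.
            if instance.flavor.extra_specs.get('accel:device_profile'):
                # the real code raises a specific nova exception
                raise Exception('live migration blocked for accelerators')
            return func(self, context, instance, *args, **kwargs)
        return wrapper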

Right now on master there is no double lazy load on the instance during
the live migration, so the problem cannot be reproduced there any more.
It has been fixed by accident since Ussuri. The nova branches older
than Ussuri are in extended maintenance mode [4]. You can still propose
fixes, but there won't be any point releases from Stein any more.

I think a fix that you can try is to simply trigger the instance.flavor
lazy load before _live_migration() spawns the new thread.
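
A minimal sketch of what I mean (the signature and spawn call of
LibvirtDriver._live_migration() in nova/virt/libvirt/driver.py are
abbreviated here; the only point is the forced lazy load):

    def _live_migration(self, context, instance, dest, post_method,
                        recover_method, block_migration=False,
                        migrate_data=None):
        # Touch instance.flavor once in the parent thread so
        # obj_load_attr() and its temporary_mutation of the context
        # finish before the operation thread is spawned.
        instance.flavor  # noqa

        opthread = utils.spawn(self._live_migration_operation,
                               context, instance, dest,
                               block_migration, migrate_data)

        self._live_migration_monitor(context, instance, dest,
                                     post_method, recover_method,
                                     block_migration, migrate_data,
                                     opthread)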


[1] https://review.opendev.org/c/openstack/nova/+/674726
[2] https://github.com/openstack/nova/blob/edaaa97d9911b849f3b5731746274b44a08ce14c/nova/compute/api.py#L5241
[3] https://github.com/openstack/nova/blob/edaaa97d9911b849f3b5731746274b44a08ce14c/nova/compute/api.py#L328
[4] https://docs.openstack.org/project-team-guide/stable-branches.html

** Also affects: nova/stein
   Importance: Undecided
       Status: New

** Changed in: nova
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1941819

Title:
  A mistake caused by temporary_mutation reentry

Status in OpenStack Compute (nova):
  Invalid
Status in OpenStack Compute (nova) stein series:
  New

Bug description:
  nova/virt/libvirt/driver.py:LibvirtDriver._live_migration() spawns a thread to execute _live_migration_operation() (called thread A below), while the original thread executes _live_migration_monitor() (called thread B below).
  In _live_migration_operation(), the assignment inst_type = instance.flavor calls nova/objects/instance.py:obj_load_attr().
  _live_migration_monitor() calls _live_migration_data_gb(), where the assignment ram_gb = instance.flavor.memory_mb * units.Mi / units.Gi also calls nova/objects/instance.py:obj_load_attr().
  obj_load_attr() in turn calls temporary_mutation(). The mistake is caused by temporary_mutation() being entered by the two threads simultaneously:
  Time0: self._context.read_deleted is 'no'.
  Time1: Thread A calls temporary_mutation(); self._context.read_deleted is set to 'yes'. The saved old value is 'no'.
  Time2: Thread B calls temporary_mutation(); self._context.read_deleted is set to 'yes'. The saved old value is 'yes'.
  Time3: Thread A executes the finally block of temporary_mutation(); self._context.read_deleted is restored to 'no'.
  Time4: Thread B executes the finally block of temporary_mutation(); self._context.read_deleted is restored to 'yes'.
  Result: the two overlapping calls to temporary_mutation() leave self._context.read_deleted changed from 'no' to 'yes'. When the source host later calls update_available_resource(ctxt) in _post_live_migration, grabbing all instances assigned to this node will also read deleted instances, which is time-consuming.
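
  A minimal standalone sketch of the race (not nova code; a simplified
  temporary_mutation and two eventlet greenthreads sharing one
  context-like object):

    import contextlib
    import eventlet

    @contextlib.contextmanager
    def temporary_mutation(obj, **kwargs):
        # Simplified version of nova.utils.temporary_mutation: save the
        # old attribute values, apply the new ones, restore on exit.
        old = {k: getattr(obj, k) for k in kwargs}
        for k, v in kwargs.items():
            setattr(obj, k, v)
        try:
            yield
        finally:
            for k, v in old.items():
                setattr(obj, k, v)

    class Context(object):
        read_deleted = 'no'

    ctxt = Context()

    def lazy_load():
        with temporary_mutation(ctxt, read_deleted='yes'):
            eventlet.sleep(0.1)  # simulate the DB call, yields control

    threads = [eventlet.spawn(lazy_load) for _ in range(2)]
    for t in threads:
        t.wait()

    # Both greenthreads entered temporary_mutation before either left it,
    # so the second one saved 'yes' as the "old" value and the restore
    # order leaves the context wrong.
    print(ctxt.read_deleted)  # prints 'yes' instead of 'no'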

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1941819/+subscriptions