[Bug 1864665] [NEW] Circular reference error during re-schedule
Public bug reported:
Description
===========
Server cold migration fails after re-schedule.
Steps to reproduce
==================
* create a devstack with two compute hosts with libvirt driver
* set allow_resize_to_same_host=True on both computes
* set up cells v2 without cell conductor / rabbit separation so that the re-schedule logic can call back to the super conductor / scheduler
* enable the NUMATopologyFilter and make sure both computes have NUMA resources
* create a flavor with hw:cpu_policy='dedicated' extra spec
* boot a server with the flavor and check which compute it is placed on (let's call it host1)
* boot enough servers on host2 that the next scheduling request can still be fulfilled by both computes, but host1 is preferred by the weighers
* cold migrate the pinned server (a rough CLI sketch of these steps follows this list)
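For reference, a rough sketch of these steps with the openstack CLI; the
flavor, image, network, and server names below are made up for
illustration, and host2 still has to be filled up between the boot and
the migration:

  # on both computes, in nova.conf:
  #   [DEFAULT] allow_resize_to_same_host = True
  #   [filter_scheduler] enabled_filters = <defaults>,NUMATopologyFilter

  # flavor with dedicated (pinned) CPUs
  openstack flavor create --vcpus 2 --ram 512 --disk 1 \
      --property hw:cpu_policy=dedicated pinned.small

  # boot the victim server and check where it landed (host1)
  openstack server create --flavor pinned.small --image cirros \
      --network private --wait pinned-server
  openstack server show pinned-server -c OS-EXT-SRV-ATTR:host

  # ... boot filler servers on host2, then trigger the cold migration
  openstack server migrate pinned-server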
Expected result
===============
* the scheduler selects host1 first, but that host fails with an UnableToMigrateToSelf exception as the libvirt driver does not have that capability
* re-schedule happens
* scheduler selects host2 where the server spawns successfully
Actual result
=============
* during the re-schedule, when the conductor sends the prep_resize RPC to host2, the json serialization of the request spec fails with a 'Circular reference detected' error
Environment
===========
* two node devstack with libvirt driver
* stable/pike nova. The bug is expected to be reproducible on newer branches as well, but not since Stein; see the Triage section below
Triage
======
The json serialization blows up in the migrate conductor task [1]. Debugging shows that the infinite loop happens when jsonutils.to_primitive tries to serialize a VirtCPUTopology instance.
The problematic piece of code has been removed by
I4244f7dd8fe74565180f73684678027067b4506e in Stein.
[1]
https://github.com/openstack/nova/blob/4224a61b4f3a8b910dcaa498f9663479d61a6060/nova/conductor/tasks/migrate.py#L87
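For illustration only, a minimal stand-alone sketch of the failure class
(plain json and a made-up class, not nova code): when the encoder's
default hook keeps handing back an object graph that refers to itself,
the circular reference detection fires with the error seen above.

  import json

  class FakeCPUTopology(object):
      """Made-up stand-in for the object that carries the cycle."""
      def __init__(self):
          self.cells = []

  topo = FakeCPUTopology()
  topo.cells.append(topo)  # back-reference standing in for the real cycle

  def naive_default(obj):
      # Mimics a default= hook that exposes instance attributes and
      # thereby re-introduces the same containers to the encoder.
      return obj.__dict__

  json.dumps(topo, default=naive_default)
  # ValueError: Circular reference detected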
** Affects: nova
Importance: Medium
Assignee: Balazs Gibizer (balazs-gibizer)
Status: Invalid
** Affects: nova/ocata
Importance: Undecided
Status: New
** Affects: nova/pike
Importance: Medium
Assignee: Balazs Gibizer (balazs-gibizer)
Status: Triaged
** Affects: nova/queens
Importance: Undecided
Status: New
** Affects: nova/rocky
Importance: Undecided
Status: New
** Tags: stable-only
** Tags added: stable-only
** Changed in: nova
Assignee: (unassigned) => Balazs Gibizer (balazs-gibizer)
** Changed in: nova
Status: New => Triaged
** Changed in: nova
Importance: Undecided => Medium
** Also affects: nova/pike
Importance: Undecided
Status: New
** Also affects: nova/rocky
Importance: Undecided
Status: New
** Also affects: nova/queens
Importance: Undecided
Status: New
** Also affects: nova/ocata
Importance: Undecided
Status: New
** Changed in: nova
Status: Triaged => Invalid
** Changed in: nova/pike
Status: New => Triaged
** Changed in: nova/pike
Importance: Undecided => Medium
** Changed in: nova/pike
Assignee: (unassigned) => Balazs Gibizer (balazs-gibizer)
--
https://bugs.launchpad.net/bugs/1864665