yahoo-eng-team team mailing list archive
Message #95300
[Bug 2097359] [NEW] InstanceNUMACell ovo version is incorrect in the instance_extra table
Public bug reported:
In Victoria the InstanceNUMACell ovo got a new field `pcpuset` (ovo
version 1.5), together with a data migration that moves the value of
the pre-existing `cpuset` field to the new `pcpuset` field for
instances with cpu_policy `dedicated`. The migration runs during the
load of the InstanceNUMATopology ovo from the numa_topology field of the
instance_extra table of the cell DB.
If the nova-conductor is Victoria or newer (supporting ovo version 1.6)
and there is a nova-compute that is older than Victoria (supporting ovo
version 1.4), the nova-compute service gets a wrong InstanceNUMACell ovo
via RPC when loading the instance, e.g. in _init_instance during the
nova-compute startup.
The root cause of the problem is that the data migration logic only
moves the data between the fields but does not bump the ovo version in
the DB. So the DB will contain a data structure in 1.6 format but with
a version field still set to 1.4.
{
  "nova_object.name": "InstanceNUMATopology",
  "nova_object.namespace": "nova",
  "nova_object.version": "1.3",
  "nova_object.data": {
    "cells": [
      {
        "nova_object.name": "InstanceNUMACell",
        "nova_object.namespace": "nova",
        "nova_object.version": "1.4", <------- !!!
        "nova_object.data": {
          "id": 0,
          "cpuset": [],
          "pcpuset": [ <------ !!!
            0
          ],
          "cpuset_reserved": null,
          "memory": 512,
          "pagesize": null,
          "cpu_pinning_raw": {
            "0": 1
          },
          "cpu_policy": "dedicated",
          "cpu_thread_policy": null
        },
        "nova_object.changes": [
          "cpuset_reserved",
          "id",
          "cpuset",
          "cpu_pinning_raw",
          "pcpuset"
        ]
      }
    ],
    "emulator_threads_policy": null
  },
  "nova_object.changes": [
    "emulator_threads_policy",
    "cells"
  ]
}
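The mismatch in the record above can be detected mechanically. The following is a minimal sketch, assuming the numa_topology value has already been read from instance_extra as a JSON string. The helper name find_mislabelled_cells is hypothetical and is not part of nova. It flags every InstanceNUMACell primitive that declares a version older than 1.5 but still carries a pcpuset key.

```python
import json

# First InstanceNUMACell version that defines the `pcpuset` field,
# as stated in the bug description.
PCPUSET_MIN_VERSION = (1, 5)


def parse_version(version):
    """Turn an ovo version string such as "1.4" into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def find_mislabelled_cells(numa_topology_json):
    """Return ids of cells that carry `pcpuset` but declare a version < 1.5.

    Hypothetical helper for illustration only; it is not part of nova.
    """
    topology = json.loads(numa_topology_json)
    bad_cell_ids = []
    for cell in topology["nova_object.data"]["cells"]:
        declared = parse_version(cell["nova_object.version"])
        data = cell["nova_object.data"]
        if "pcpuset" in data and declared < PCPUSET_MIN_VERSION:
            bad_cell_ids.append(data["id"])
    return bad_cell_ids


# Trimmed-down copy of the record shown above (annotation arrows removed).
record = json.dumps({
    "nova_object.name": "InstanceNUMATopology",
    "nova_object.namespace": "nova",
    "nova_object.version": "1.3",
    "nova_object.data": {
        "cells": [
            {
                "nova_object.name": "InstanceNUMACell",
                "nova_object.namespace": "nova",
                "nova_object.version": "1.4",
                "nova_object.data": {
                    "id": 0,
                    "cpuset": [],
                    "pcpuset": [0],
                    "cpu_pinning_raw": {"0": 1},
                    "cpu_policy": "dedicated",
                },
            }
        ],
        "emulator_threads_policy": None,
    },
})

print(find_mislabelled_cells(record))
```

A check like this only reports the inconsistency; it does not decide whether the version or the data should be corrected.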
This results in multiple issues:
1. When the nova-compute gets this data it only sees the cpuset field
and not the pcpuset field, as pcpuset is not part of the ovo 1.4 version
it understands. But because the version field indicates version 1.4 the
compute does not request a backport of the ovo from the conductor, as
the ovo is not considered too new. Instead the compute tries to use the
object as is, with the empty cpuset field. If the compute is configured
to restart the instance at nova-compute startup with the
resume_guest_state_on_host_boot config option, or if the user tries to
reboot the instance via the API, then the nova-compute will generate an
invalid XML based on the empty cpuset field.
<cputune>
  <shares>2048</shares>
  <emulatorpin cpuset=""/>
</cputune>
Then libvirt rejects such XML with
Failed to start libvirt guest: libvirt.libvirtError: invalid argument:
Failed to parse bitmap ''
as the emulatorpin cpuset cannot be empty. So the reboot of the instance
fails and the instance is put into ERROR state.
2. During 1. the compute sets the instance to ERROR state and saves the
new instance state back to the DB. As part of this it sends the
incorrect InstanceNUMACell ovo data back to the conductor, which blindly
persists it into the DB. So the DB will now contain inconsistent data.
The cpuset is empty and the pcpuset field is lost:
{
  "nova_object.name": "InstanceNUMATopology",
  "nova_object.namespace": "nova",
  "nova_object.version": "1.3",
  "nova_object.data": {
    "cells": [
      {
        "nova_object.name": "InstanceNUMACell",
        "nova_object.namespace": "nova",
        "nova_object.version": "1.4",
        "nova_object.data": {
          "id": 0,
          "cpuset": [], <------------------------------- empty!!!
          "cpuset_reserved": null,
          "memory": 4096,
          "pagesize": 1048576,
          "cpu_pinning_raw": {
            "0": 63,
            "1": 7
          },
          "cpu_policy": "dedicated",
          "cpu_thread_policy": null
        },
        "nova_object.changes": [
          "id",
          "cpu_pinning_raw",
          "cpuset_reserved",
          "cpuset",
          "pagesize"
        ]
      }
    ],
    "emulator_threads_policy": null
  },
  "nova_object.changes": [
    "emulator_threads_policy",
    "cells"
  ]
}
Any subsequent instance lifecycle operation will fail due to the empty
cpuset field.
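The two failure modes can be reproduced with a toy model. This is a simplified sketch and not the oslo.versionedobjects implementation: the class OldComputeCell, the NeedsBackport exception and the emulatorpin_cpuset method are hypothetical stand-ins for a pre-Victoria compute that only knows the 1.4 fields. It assumes, as described above, that the receiver accepts any primitive whose declared version it supports and ignores keys it does not know. The emulator pinning rule used here (host CPUs of the vCPUs listed in cpuset) is a further simplifying assumption.

```python
# Fields of the 1.4 format, taken from the record the old compute wrote back.
KNOWN_FIELDS_1_4 = (
    "id", "cpuset", "cpuset_reserved", "memory", "pagesize",
    "cpu_pinning_raw", "cpu_policy", "cpu_thread_policy",
)
SUPPORTED_VERSION = (1, 4)


class NeedsBackport(Exception):
    """The primitive declares a newer version than this service supports."""


class OldComputeCell:
    """Hypothetical model of InstanceNUMACell as seen by an old compute."""

    def __init__(self, data):
        self.data = data

    @classmethod
    def from_primitive(cls, primitive):
        version = primitive["nova_object.version"]
        declared = tuple(int(part) for part in version.split("."))
        if declared > SUPPORTED_VERSION:
            # A correctly versioned record would take this path.
            raise NeedsBackport(version)
        raw = primitive["nova_object.data"]
        # Unknown keys such as `pcpuset` are silently dropped.
        return cls({k: raw[k] for k in KNOWN_FIELDS_1_4 if k in raw})

    def to_primitive(self):
        return {
            "nova_object.name": "InstanceNUMACell",
            "nova_object.namespace": "nova",
            "nova_object.version": "1.4",
            "nova_object.data": dict(self.data),
        }

    def emulatorpin_cpuset(self):
        # Assumption: emulator threads go to the host CPUs of the
        # vCPUs listed in `cpuset`.
        pinning = self.data["cpu_pinning_raw"]
        host_cpus = sorted(pinning[str(vcpu)] for vcpu in self.data["cpuset"])
        return ",".join(str(cpu) for cpu in host_cpus)


db_cell = {
    "nova_object.name": "InstanceNUMACell",
    "nova_object.namespace": "nova",
    "nova_object.version": "1.4",  # stale version label
    "nova_object.data": {
        "id": 0,
        "cpuset": [],
        "pcpuset": [0],  # data already migrated
        "cpuset_reserved": None,
        "memory": 512,
        "pagesize": None,
        "cpu_pinning_raw": {"0": 1},
        "cpu_policy": "dedicated",
        "cpu_thread_policy": None,
    },
}

cell = OldComputeCell.from_primitive(db_cell)  # accepted, no backport request
saved = cell.to_primitive()                    # what gets sent back on save

print(repr(cell.emulatorpin_cpuset()))
print("pcpuset" in saved["nova_object.data"])
```

In this model, changing the declared version of db_cell to "1.6" makes from_primitive raise NeedsBackport instead of silently accepting the record, which is the point where an old compute would ask the conductor for a format it understands.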
** Affects: nova
Importance: Undecided
Assignee: Balazs Gibizer (balazs-gibizer)
Status: New
** Changed in: nova
Assignee: (unassigned) => Balazs Gibizer (balazs-gibizer)
** Summary changed:
- InstanceNUMATopology ovo version is incorrect in the instance_extra table
+ InstanceNUMACell ovo version is incorrect in the instance_extra table
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2097359
Title:
InstanceNUMACell ovo version is incorrect in the instance_extra table
Status in OpenStack Compute (nova):
New