[Bug 1710141] [NEW] Continual warnings in n-cpu logs about being unable to delete inventory for an ironic node with an instance on it
Public bug reported:
Seen here:
http://logs.openstack.org/54/487954/12/check/gate-tempest-dsvm-ironic-ipa-wholedisk-bios-agent_ipmitool-tinyipa-ubuntu-xenial-nv/041c03a/logs/screen-n-cpu.txt.gz#_Aug_09_19_31_21_450705
Aug 09 19:31:21.450705 ubuntu-xenial-internap-mtl01-10351013 nova-compute[19132]: WARNING nova.scheduler.client.report [None req-9db22a6d-e88a-42b0-879e-8fe523dcc664 None None] [req-2eead243-5e63-4dd0-a208-4ceed95478ff] We cannot delete inventory 'VCPU, MEMORY_MB, DISK_GB' for resource provider 38b274b2-2e37-4c23-ad6f-d86c1f0a0e3f because the inventory is in use.
As soon as an ironic node has an instance built on it, the node's provision state is ACTIVE, which means this method returns True:
https://github.com/openstack/nova/blob/c2d33c3271370358d48553233b41bf9119d834fb/nova/virt/ironic/driver.py#L176
In other words, the node is reported as unavailable because it is wholly consumed by that one instance.
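For illustration, the check behaves roughly like this (a paraphrased sketch, not the exact driver code; the node is reduced to the two fields that matter here):

    # Sketch only: an ironic node hosts at most one instance, so any
    # associated instance (or an ACTIVE provision state) means all of
    # the node's resources are in use.
    def node_resources_used(node):
        return (node['instance_uuid'] is not None or
                node['provision_state'] == 'active')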
That's used here:
https://github.com/openstack/nova/blob/c2d33c3271370358d48553233b41bf9119d834fb/nova/virt/ironic/driver.py#L324
And that's checked here when reporting inventory to the resource
tracker:
https://github.com/openstack/nova/blob/c2d33c3271370358d48553233b41bf9119d834fb/nova/virt/ironic/driver.py#L741
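Paraphrased, the driver reports nothing at all for a consumed node (a sketch under that assumption, reusing the check above; the field names are illustrative):

    def get_inventory(node):
        if node_resources_used(node):
            # An empty dict tells the resource tracker that this
            # provider should have no inventory whatsoever.
            return {}
        return {
            'VCPU': {'total': node['cpus']},
            'MEMORY_MB': {'total': node['memory_mb']},
            'DISK_GB': {'total': node['local_gb']},
        }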
The report client then tries to delete the inventory for the node's resource provider in placement, which fails because the instance running on the node is still consuming that inventory:
http://logs.openstack.org/54/487954/12/check/gate-tempest-dsvm-ironic-ipa-wholedisk-bios-agent_ipmitool-tinyipa-ubuntu-xenial-nv/041c03a/logs/screen-n-cpu.txt.gz#_Aug_09_19_31_21_450705
Aug 09 19:31:21.391146 ubuntu-xenial-internap-mtl01-10351013 nova-compute[19132]: INFO nova.scheduler.client.report [None req-9db22a6d-e88a-42b0-879e-8fe523dcc664 None None] Compute node 38b274b2-2e37-4c23-ad6f-d86c1f0a0e3f reported no inventory but previous inventory was detected. Deleting existing inventory records.
Aug 09 19:31:21.450705 ubuntu-xenial-internap-mtl01-10351013 nova-compute[19132]: WARNING nova.scheduler.client.report [None req-9db22a6d-e88a-42b0-879e-8fe523dcc664 None None] [req-2eead243-5e63-4dd0-a208-4ceed95478ff] We cannot delete inventory 'VCPU, MEMORY_MB, DISK_GB' for resource provider 38b274b2-2e37-4c23-ad6f-d86c1f0a0e3f because the inventory is in use.
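The placement-side rule behind that failure is simple, paraphrased here as a sketch (not the actual placement handler code):

    def delete_inventory(inventories, allocations):
        # Placement refuses to remove inventory that still has
        # allocations against it; the report client surfaces that
        # conflict as the WARNING quoted above.
        if allocations:
            return 409, 'inventory in use'
        inventories.clear()
        return 204, ''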
This is also bad because if the node was updated with a resource_class, that resource class won't be automatically created in Placement here, since the driver didn't report it from the get_inventory method:
https://github.com/openstack/nova/blob/c2d33c3271370358d48553233b41bf9119d834fb/nova/scheduler/client/report.py#L789
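Paraphrased, the report client only auto-creates custom resource classes that actually appear in the reported inventory, so an empty inventory means the node's class never reaches Placement (a sketch, not the exact report.py code):

    def ensure_resource_classes(inv_data, known_classes):
        # Custom classes are discovered from the inventory keys; with
        # inv_data == {} there is nothing to create.
        for rc in set(inv_data) - known_classes:
            if rc.startswith('CUSTOM_'):
                known_classes.add(rc)  # stand-in for the placement API call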
That in turn impacts this change, which migrates instance.flavor.extra_specs to carry custom resource class overrides for ironic nodes that now have a resource_class set:
https://review.openstack.org/#/c/487954/
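For context: an ironic resource_class such as 'baremetal-gold' is normalized to a CUSTOM_ class in placement, and the migration aims to stamp an override like this onto the embedded flavor (values hypothetical):

    extra_specs = {
        # node.resource_class 'baremetal-gold' -> 'CUSTOM_BAREMETAL_GOLD';
        # the flavor then requests one whole node of that class.
        'resources:CUSTOM_BAREMETAL_GOLD': '1',
    }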
So we've got a bit of a chicken and egg problem here.
Manually testing the ironic flavor migration code hits this problem, as
seen here:
http://paste.openstack.org/show/618160/
** Affects: nova
Importance: High
Status: Triaged
** Tags: ironic pike-rc-potential placement
** Changed in: nova
Status: New => Triaged
** Changed in: nova
Importance: Undecided => High
** Tags added: pike-rc-potential
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1710141
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1710141/+subscriptions