
[Bug 1710141] [NEW] Continual warnings in n-cpu logs about being unable to delete inventory for an ironic node with an instance on it

Public bug reported:

Seen here:

http://logs.openstack.org/54/487954/12/check/gate-tempest-dsvm-ironic-ipa-wholedisk-bios-agent_ipmitool-tinyipa-ubuntu-xenial-nv/041c03a/logs/screen-n-cpu.txt.gz#_Aug_09_19_31_21_450705

Aug 09 19:31:21.450705 ubuntu-xenial-internap-mtl01-10351013 nova-compute[19132]: WARNING nova.scheduler.client.report [None req-9db22a6d-e88a-42b0-879e-8fe523dcc664 None None] [req-2eead243-5e63-4dd0-a208-4ceed95478ff] We cannot delete inventory 'VCPU, MEMORY_MB, DISK_GB' for resource provider 38b274b2-2e37-4c23-ad6f-d86c1f0a0e3f because the inventory is in use.

As soon as an ironic node has an instance built on it, the node state is
ACTIVE, which means that this method returns True:

https://github.com/openstack/nova/blob/c2d33c3271370358d48553233b41bf9119d834fb/nova/virt/ironic/driver.py#L176

In effect the node is flagged as unavailable, presumably because it is wholly consumed by the instance.
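
For illustration, the shape of that check is roughly the following (a simplified sketch; the function name and state list here are paraphrased, not quoted from the linked code):

    # Hypothetical, simplified version of the driver's "resources used"
    # check: a node with an instance on it, or in a deploy/teardown state
    # such as ACTIVE, is treated as wholly consumed.
    def node_resources_used(node):
        used_states = ('deploying', 'deploy wait', 'active',
                       'cleaning', 'clean wait', 'deleting')
        return (node.instance_uuid is not None
                or node.provision_state in used_states)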

That's used here:

https://github.com/openstack/nova/blob/c2d33c3271370358d48553233b41bf9119d834fb/nova/virt/ironic/driver.py#L324

And that's checked here when reporting inventory to the resource
tracker:

https://github.com/openstack/nova/blob/c2d33c3271370358d48553233b41bf9119d834fb/nova/virt/ironic/driver.py#L741
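
Chaining those two together, the effective behavior is roughly this (again a paraphrased sketch with simplified names and inventory fields, not the verbatim driver code):

    # node_resources_used() feeds the resource dict, and get_inventory()
    # turns zeroed resources into an empty inventory.
    def node_resource(node):
        if node_resources_used(node):
            # Wholly consumed node: report nothing available.
            return {'vcpus': 0, 'memory_mb': 0, 'local_gb': 0}
        return {'vcpus': node.cpus, 'memory_mb': node.memory_mb,
                'local_gb': node.local_gb}

    def get_inventory(node):
        info = node_resource(node)
        if info['vcpus'] == 0:
            # No inventory at all is reported for a "used" node; this is
            # what triggers the deletion attempt described next.
            return {}
        return {'VCPU': {'total': info['vcpus']},
                'MEMORY_MB': {'total': info['memory_mb']},
                'DISK_GB': {'total': info['local_gb']}}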

That in turn causes the report client to try to delete the inventory for
the node's resource provider in placement, which fails because the node
already has an instance running on it that is consuming the inventory:

http://logs.openstack.org/54/487954/12/check/gate-tempest-dsvm-ironic-ipa-wholedisk-bios-agent_ipmitool-tinyipa-ubuntu-xenial-nv/041c03a/logs/screen-n-cpu.txt.gz#_Aug_09_19_31_21_450705

Aug 09 19:31:21.391146 ubuntu-xenial-internap-mtl01-10351013 nova-compute[19132]: INFO nova.scheduler.client.report [None req-9db22a6d-e88a-42b0-879e-8fe523dcc664 None None] Compute node 38b274b2-2e37-4c23-ad6f-d86c1f0a0e3f reported no inventory but previous inventory was detected. Deleting existing inventory records.
Aug 09 19:31:21.450705 ubuntu-xenial-internap-mtl01-10351013 nova-compute[19132]: WARNING nova.scheduler.client.report [None req-9db22a6d-e88a-42b0-879e-8fe523dcc664 None None] [req-2eead243-5e63-4dd0-a208-4ceed95478ff] We cannot delete inventory 'VCPU, MEMORY_MB, DISK_GB' for resource provider 38b274b2-2e37-4c23-ad6f-d86c1f0a0e3f because the inventory is in use.
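
On the report client side, the reaction to an empty inventory looks roughly like this (a sketch using a hypothetical "placement" HTTP client object, not the literal report.py code):

    import logging

    LOG = logging.getLogger(__name__)

    def update_inventory(placement, rp_uuid, inventory):
        if not inventory:
            # An empty inventory from the driver is taken to mean the
            # provider has nothing to offer, so the client tries to wipe
            # the existing records.
            LOG.info('Compute node %s reported no inventory but previous '
                     'inventory was detected. Deleting existing inventory '
                     'records.', rp_uuid)
            resp = placement.delete(
                '/resource_providers/%s/inventories' % rp_uuid)
            if resp.status_code == 409:
                # Placement refuses: allocations (the instance on the
                # node) still consume that inventory -- hence the warning.
                LOG.warning('We cannot delete inventory for resource '
                            'provider %s because the inventory is in '
                            'use.', rp_uuid)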

This is also bad because if the node was updated with a resource_class,
that resource class won't be automatically created in Placement here:

https://github.com/openstack/nova/blob/c2d33c3271370358d48553233b41bf9119d834fb/nova/scheduler/client/report.py#L789

Because the driver didn't report it in the get_inventory method.
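
The mechanics, roughly: the client only ensures resource classes for keys that appear in the reported inventory, so an empty inventory means the custom class is never created (hypothetical sketch, same caveats as above):

    def ensure_resource_classes(placement, inventory):
        # Custom resource classes are created in placement on first use,
        # keyed off the inventory the driver reports. With an empty
        # inventory, the node's resource_class never gets here.
        for rc_name in inventory:
            if rc_name.startswith('CUSTOM_'):
                # Idempotent create-if-missing of the custom class.
                placement.put('/resource_classes/%s' % rc_name, None)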

That in turn affects the code under review to migrate
instance.flavor.extra_specs to carry custom resource class overrides for
ironic nodes that now have a resource_class set:

https://review.openstack.org/#/c/487954/

So we've got a bit of a chicken-and-egg problem here.

Manually testing the ironic flavor migration code hits this problem, as
seen here:

http://paste.openstack.org/show/618160/

** Affects: nova
     Importance: High
         Status: Triaged


** Tags: ironic pike-rc-potential placement

** Changed in: nova
       Status: New => Triaged

** Changed in: nova
   Importance: Undecided => High

** Tags added: pike-rc-potential
