yahoo-eng-team team mailing list archive

[Bug 1793533] Re: Deleting a service with nova-compute binary doesn't remove compute node

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date: Thu, 20 Sep 2018 13:47:45 -0000
Reply-to: Bug 1793533 <1793533@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

The related issue is that the scheduler was not filtering out deleted
compute node records when pulling them from the cell DB:

https://github.com/openstack/nova/blob/d87852ae6a1987b6faa3cb5851f9758b47ef4636/nova/objects/compute_node.py#L443

That query (^) doesn't filter out deleted records. Granted, if the
resource provider record in placement had been cleaned up properly, we
wouldn't have gotten that far anyway, but it's still an issue.
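
To illustrate the difference, here is a minimal sketch using an in-memory
SQLite table. It is not nova's actual query: the table layout, the sample
rows and the helper names (make_demo_db, get_by_uuids) are all hypothetical.
It assumes the soft-delete convention where a live row has deleted = 0 and a
soft-deleted row has a non-zero value.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway table that loosely mimics compute node records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compute_nodes ("
        "id INTEGER PRIMARY KEY, uuid TEXT, "
        "hypervisor_hostname TEXT, deleted INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO compute_nodes (id, uuid, hypervisor_hostname, deleted) "
        "VALUES (?, ?, ?, ?)",
        [
            (1, "uuid-a", "host-a", 0),  # live record
            (2, "uuid-b", "host-b", 2),  # soft-deleted record (assumed convention)
        ],
    )
    return conn


def get_by_uuids(conn, uuids, include_deleted=False):
    """Return hostnames for the given UUIDs, skipping soft-deleted rows by default."""
    if not uuids:
        return []
    placeholders = ", ".join("?" for _ in uuids)
    sql = (
        "SELECT hypervisor_hostname FROM compute_nodes "
        f"WHERE uuid IN ({placeholders})"
    )
    if not include_deleted:
        # This is the filter the lookup by UUID needs.
        sql += " AND deleted = 0"
    sql += " ORDER BY id"
    return [row[0] for row in conn.execute(sql, list(uuids))]


conn = make_demo_db()
rp_uuids = ["uuid-a", "uuid-b"]  # stand-in for the RP UUIDs returned by placement
unfiltered = get_by_uuids(conn, rp_uuids, include_deleted=True)
filtered = get_by_uuids(conn, rp_uuids)
```

With the filter in place, a stale resource provider UUID left behind in
placement no longer resolves to a deleted compute node row.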

** Changed in: nova
       Status: Invalid => Triaged

** Changed in: nova
   Importance: Undecided => Medium

** Summary changed:

- Deleting a service with nova-compute binary doesn't remove compute node
+ Scheduler doesn't filter out deleted compute node records based on placement RP UUIDs

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Also affects: nova/ocata
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Changed in: nova/ocata
       Status: New => Triaged

** Changed in: nova/pike
       Status: New => Triaged

** Changed in: nova/queens
       Status: New => Triaged

** Changed in: nova/rocky
       Status: New => Incomplete

** Changed in: nova/rocky
       Status: Incomplete => Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1793533

Title:
  Scheduler doesn't filter out deleted compute node records based on
  placement RP UUIDs

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) ocata series:
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged

Bug description:
  If you are taking a nova-compute service out of service permanently,
  the logical steps would be:

  1) Take down the service
  2) Delete it from the service list (nova service-delete <uuid>)

  However, this does not delete the compute node record, which stays
  forever and leads the scheduler to keep complaining about it:

  2018-09-20 13:15:45.312 131035 WARNING nova.scheduler.host_manager
  [req-c4a7c383-c606-48a7-b870-cc143710114a
  234412d3482f4707877ca696e105bf5b acb15d2ffaae4eda98580c7b874d7f89 -
  default default] No compute service record found for host
  <snip>.vexxhost.net

  https://github.com/openstack/nova/blob/master/nova/scheduler/host_manager.py#L716-L720

  We should be deleting the compute node if a nova-compute binary is
  deleted, or that section should automatically clean up while warning
  (because service records can be rebuilt anyways?)
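
  As a rough sketch of the "warn and skip" idea (not the host manager's
  actual code), the snippet below separates compute nodes that still have a
  service record from orphaned ones. The function name, the dict-based
  records and the sample hostnames are hypothetical; what to do with the
  orphaned nodes (delete them, or only warn) is left open, as in the
  report.

```python
import logging

LOG = logging.getLogger("scheduler_sketch")


def split_usable_and_orphaned(compute_nodes, services_by_host):
    """Separate nodes with a matching service record from orphaned ones.

    compute_nodes: list of dicts with a "host" key (hypothetical shape).
    services_by_host: dict mapping host name to a service record.
    """
    usable = []
    orphaned = []
    for node in compute_nodes:
        host = node["host"]
        if services_by_host.get(host) is None:
            # Same kind of warning as in the log excerpt above, but the
            # node is also collected so a caller could clean it up.
            LOG.warning("No compute service record found for host %s", host)
            orphaned.append(node)
            continue
        usable.append(node)
    return usable, orphaned


nodes = [{"host": "compute-1"}, {"host": "compute-2"}]
services = {"compute-1": {"binary": "nova-compute"}}  # compute-2 was deleted
usable, orphaned = split_usable_and_orphaned(nodes, services)
```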

References

[Bug 1793533] [NEW] Deleting a service with nova-compute binary doesn't remove compute node
From: Mohammed Naser, 2018-09-20