yahoo-eng-team team mailing list archive

[Bug 1258625] [NEW] Need some kind of 'auto' boolean column in the Service table

 

Public bug reported:

Bug 1250049 reported a problem with automatically disabling/enabling a
host via the libvirt driver. The right fix would have been to add a new
column to the Service table indicating whether an admin intentionally
disabled the host or nova detected a failure and disabled it
automatically. Instead, a hack was added that prefixes 'disabled_reason'
with "AUTO:" and builds some logic in the driver around that.
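
To make the difference concrete, here is a minimal sketch contrasting
the two approaches. It is not nova's actual model code: ServiceRecord,
the 'auto_disabled' column name and both helper functions are
hypothetical and only illustrate why a dedicated boolean is less
fragile than parsing the reason string.

```python
from dataclasses import dataclass
from typing import Optional

AUTO_PREFIX = "AUTO:"  # marker described in this report


@dataclass
class ServiceRecord:
    """Hypothetical stand-in for a row in the Service table."""
    host: str
    disabled: bool = False
    disabled_reason: Optional[str] = None
    # Hypothetical new column; the name is an assumption.
    auto_disabled: bool = False


def is_auto_disabled_by_prefix(service: ServiceRecord) -> bool:
    """Workaround: infer intent from the free-form reason string."""
    reason = service.disabled_reason or ""
    return service.disabled and reason.startswith(AUTO_PREFIX)


def is_auto_disabled_by_column(service: ServiceRecord) -> bool:
    """Proposed fix: read intent from an explicit boolean column."""
    return service.disabled and service.auto_disabled
```

With the prefix approach, an admin who happens to type a reason
starting with "AUTO:" is indistinguishable from an automatic disable;
with the column, the reason text stays free-form.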

The problem with that approach is that the ComputeFilter in the
scheduler can't perform any kind of retry logic around it when needed;
see bug 1257644.

Right now, if the ComputeFilter encounters a disabled host, it just logs
it at debug level and skips it.  If the host was automatically disabled
because of a connection failure, we should at least log that as a
warning in the scheduler (like we do now for hosts that haven't checked
in for a while), or possibly build some retry logic around that to make
it more robust in case the connection failure is just a hiccup that
quickly resolves itself.
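
A rough sketch of the logging half of that idea follows. It is not the
real ComputeFilter: host_passes() here is a standalone function, and the
service dict and its 'auto_disabled' key are assumptions made for
illustration.

```python
import logging

LOG = logging.getLogger(__name__)


def host_passes(service: dict) -> bool:
    """Return True if the host may be scheduled to.

    `service` is a plain dict with hypothetical keys: 'host',
    'disabled', 'disabled_reason' and 'auto_disabled'.
    """
    if not service.get("disabled"):
        return True

    if service.get("auto_disabled"):
        # Automatic disables usually point at a fault, so surface them.
        LOG.warning("%s was disabled automatically: %s",
                    service.get("host"), service.get("disabled_reason"))
    else:
        # An admin did this on purpose; debug level is enough.
        LOG.debug("%s is disabled by an admin: %s",
                  service.get("host"), service.get("disabled_reason"))
    return False
```

Either way the host is still skipped; only the log level changes, so
operators see automatic disables without turning on debug logging.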

One could argue that some kind of connection retry logic could be built
into the libvirt driver instead; I wouldn't be against that.
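
For what it's worth, driver-side retry could look roughly like the
sketch below. connect_with_retry() and its parameters are hypothetical,
and it catches Python's built-in ConnectionError purely as a
placeholder; a real driver would need to catch whatever exception its
hypervisor connection actually raises.

```python
import time


def connect_with_retry(connect, attempts=3, delay=0.5, sleep=time.sleep):
    """Call `connect()` until it succeeds or `attempts` is exhausted.

    `connect` is any zero-argument callable returning a connection.
    The wait grows linearly (delay, 2 * delay, ...) between tries.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:  # placeholder exception type
            last_error = exc
            if attempt < attempts:
                sleep(delay * attempt)
    # Every attempt failed; let the caller decide whether to disable.
    raise last_error
```

Only after the final attempt fails would the host need to be disabled,
which should filter out the short hiccups mentioned above.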

** Affects: nova
     Importance: Undecided
         Status: New


** Tags: db libvirt scheduler

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1258625

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1258625/+subscriptions

