
yahoo-eng-team team mailing list archive

[Bug 1465176] [NEW] Host maintenance mode unavailable in Libvirt.

 

Public bug reported:

When executing the following command on a KVM box with the latest version of Nova:
        nova host-update XXX --maintenance enable
it returns the following error:
        ERROR (HTTPNotImplemented): Virt driver does not implement host maintenance mode.

The libvirt driver has not implemented host maintenance mode.
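
The error above is what one would expect when a driver base class leaves the maintenance hook unimplemented and the API layer translates that into an HTTP 501 response. The following is a simplified model of that pattern, not Nova's actual code; the class names and the api_host_update() helper are hypothetical and exist only for illustration.

```python
class ComputeDriver:
    """Hypothetical base class standing in for a virt driver interface."""

    def host_maintenance_mode(self, host, mode):
        # Default behaviour: a driver that does not override this hook
        # is treated as not supporting host maintenance mode.
        raise NotImplementedError()


class LibvirtLikeDriver(ComputeDriver):
    """Stand-in for a driver that has not implemented the hook."""


def api_host_update(driver, host, maintenance):
    """Return (status_code, message), loosely mimicking an API handler."""
    try:
        result = driver.host_maintenance_mode(host, maintenance)
    except NotImplementedError:
        # The unimplemented hook surfaces to the user as HTTP 501.
        return 501, "Virt driver does not implement host maintenance mode."
    return 200, result
```

In this model, fixing the bug amounts to overriding host_maintenance_mode() in the libvirt-like driver so that the except branch is no longer taken.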

In a workload cloud, nova compute nodes need maintenance (such as a
system or hardware upgrade) from time to time. To minimize the effect
on users, the services and VMs running on those nodes should not simply
be paused. So we need a way to put a compute node into maintenance mode
so that no VM will be scheduled to run on the node and no new instance
will be created on it.

This topic has been discussed since 2013 here
(https://blueprints.launchpad.net/nova/+spec/host-maintenance). In the
previous discussion, Oshrit Feder explained why the current
implementation is not enough and how to improve it. After some
discussion, the basic idea came down to the following (IIUC):

1.	The CLI (nova host-update --maintenance enable/disable) or API (set_host_maintenance) should not send the request directly to the compute node that is being put into maintenance. Instead, nova conductor is responsible for the orchestration.
2.	When the operation starts, nova disables the target host (disables nova-compute) so that no new instance will be created on this host.
3.	Live migrate all instances on this host. The nova conductor delivers the request to the nova scheduler, and the scheduler picks suitable destination hosts for the migration.
4.	Rebuild all not-running instances on destination hosts if necessary.
5.	Set the host mode to maintenance.
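
Steps 2 and 3 interact: once the target host is disabled, it must no longer be a candidate destination, and the scheduler has to pick another host with enough capacity. The sketch below illustrates that idea with a deliberately naive rule (most free RAM wins). The Host type, its fields, and pick_destination() are assumptions made for this example and do not correspond to the real nova scheduler.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical, minimal view of a compute host."""
    name: str
    enabled: bool       # False once nova-compute has been disabled
    free_ram_mb: int


def pick_destination(hosts, source, required_ram_mb):
    """Pick a destination for an instance leaving `source`.

    Disabled hosts, the source host itself, and hosts without enough
    free RAM are skipped. Returns None if no host qualifies.
    """
    candidates = [
        h for h in hosts
        if h.enabled and h.name != source and h.free_ram_mb >= required_ram_mb
    ]
    if not candidates:
        return None
    # Naive placement rule for illustration: prefer the most free RAM.
    return max(candidates, key=lambda h: h.free_ram_mb)
```

A real scheduler weighs many more properties, but the important point for maintenance mode is the filter on the enabled flag.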

According to the source code, I think step 1 has been implemented by
merging the live-migrate and migrate operations and moving the
orchestration to nova conductor, which provides the basic functionality
needed for the other steps. So to enable KVM host maintenance mode, we
need to finish steps 2, 3, 4 and 5 in the libvirt driver.

The basic implementation will be like this:
1.	When the user calls “nova host-update XXX --maintenance enable”, disable the nova-compute service on host XXX.
2.	The command request should be handled in nova/virt/libvirt/host.py. So implement host_maintenance_mode() in host.py, and call it through the libvirt driver.
3.	Deliver the request to nova conductor, which is able to list all the instances on the host.
4.	List all running instances, find their Instance objects, and live migrate them.
5.	Rebuild all not-running instances on destination hosts.
6.	When all of the above has completed successfully, set the host mode to maintenance.
7.	Re-enable the nova-compute service on error.
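
The control flow of the steps above, including the rollback in step 7, can be sketched as follows. This is a minimal in-memory model under stated assumptions, not the planned patch: FakeCloud, MaintenanceError, enter_maintenance() and the pick_dest callback are all hypothetical names, and the real implementation would go through nova conductor and the libvirt driver.

```python
class MaintenanceError(Exception):
    """Raised when the host cannot be evacuated (hypothetical)."""


class FakeCloud:
    """In-memory stand-in for the state the conductor would manage."""

    def __init__(self, instances):
        # Maps instance name -> (host, state), state being "running"
        # or anything else for a not-running instance.
        self.instances = dict(instances)
        self.disabled = set()      # hosts with nova-compute disabled
        self.maintenance = set()   # hosts in maintenance mode

    def instances_on(self, host):
        return sorted(n for n, (h, _) in self.instances.items() if h == host)

    def move(self, name, dest):
        # Models both live migration and rebuild: only the host changes.
        _, state = self.instances[name]
        self.instances[name] = (dest, state)


def enter_maintenance(cloud, host, pick_dest):
    """Evacuate `host` and mark it as in maintenance.

    `pick_dest(name)` returns a destination host name or None.
    """
    cloud.disabled.add(host)                      # step 1: disable service
    try:
        for name in cloud.instances_on(host):     # step 3: list instances
            dest = pick_dest(name)
            if dest is None:
                raise MaintenanceError("no destination for %s" % name)
            # Steps 4 and 5: live migrate running instances, rebuild
            # the rest. Both are modelled by the same move() here.
            cloud.move(name, dest)
        cloud.maintenance.add(host)               # step 6: all succeeded
    except Exception:
        cloud.disabled.discard(host)              # step 7: re-enable
        raise
```

Note that this sketch does not move already-migrated instances back when a later one fails; whether the real implementation should do so is a design question left open here.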

The first version of the patch will come soon.

** Affects: nova
     Importance: Undecided
     Assignee: tangchen (tangchen)
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1465176

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1465176/+subscriptions

