yahoo-eng-team team mailing list archive

[Bug 2018318] [NEW] 'openstack server resize --flavor' should not migrate VMs to another AZ

 

Public bug reported:

Before I start, let me describe the agents involved in the migration and/or
resize flow of OpenStack (in this case, the Nova component). This is the
mapping and interpretation I built while troubleshooting the reported
problem.

- Nova-API: the agent responsible for receiving the HTTP requests (create/resize/migrate) from the OpenStack end user. It does some basic validation and then sends a message with the requested command, via RPC call, to the other agents.
- Nova-conductor: the agent responsible for "conducting/guiding" the workflow. Nova-conductor reads the commands from the RPC queue and processes the requests coming from Nova-API. It does some extra validation, and for every command (create/resize/migrate) it asks the scheduler to define the target host for the operation (if the target host was not defined by the user).
- Nova-scheduler: the agent responsible for "scheduling" VMs on hosts; it defines where a VM must reside. It receives the "select host" request and runs the scheduling algorithms to determine where the VM can be allocated. Before applying those algorithms, it queries the Placement service for the possible hosts where the VM might be allocated, that is, hosts that fit the requested parameters, such as being in a given cell or availability zone (AZ) and having free computing resources to support the VM. The call from Nova-scheduler to Placement is an HTTP request.
- Placement: behaves as an inventory system. It tracks where resources are allocated, their characteristics, and the providers (hosts/storage/network systems) where resources are or can be allocated. It also exposes an API to return the possible hosts where a "request spec" can be fulfilled.
- Nova-compute: the agent responsible for executing/processing the commands and implementing the actions in the hypervisor.


We then have the following workflows for the different operations.

- migrate: Nova-API -> (via RPC call,
nova.conductor.manager.ComputeTaskManager.live_migrate_instance)
Nova-conductor (loads the request spec) -> (via RPC call) Nova-scheduler ->
(via HTTP) Placement -> (after Placement returns) Nova-scheduler filters the
candidate hosts based on the active filters -> (result handed back to the
remaining processing in Nova-conductor) -> (via RPC call) Nova-compute
executes the migration.

- resize: Nova-API -> (via RPC call,
nova.conductor.manager.ComputeTaskManager.migrate_server / _cold_migrate)
Nova-conductor (loads the request spec) -> (via RPC call) Nova-scheduler ->
(via HTTP) Placement -> (after Placement returns) Nova-scheduler filters the
candidate hosts based on the active filters -> (result handed back to the
remaining processing in Nova-conductor) -> (via RPC call) Nova-compute
executes the cold migration and starts the VM again with the new computing
resource definition.

As a side note, this mapping also explains why "resize" was not executing
the CPU compatibility check that "migrate" executes (something else I was
checking, but worth mentioning here). A resize is basically a cold migration
to a new host where a new flavor (definition of the VM) is applied; because
the VM is shut down and started again on the target host, there is no need
to evaluate CPU feature set compatibility.

The problem we are reporting happens with both "migrate" and "resize"
operations. I had to add some logs to see what was going on there (that
whole process is/was "logless"). The issue happens because Placement always
returns all hosts of the environment for a given VM being migrated (resize
is a migration process); this only happens if the VM was deployed without
defining its availability zone in the request spec.
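
For illustration, in deployments where Nova's AZ-to-Placement-aggregate
pre-filter is enabled, the difference boils down to whether the
allocation-candidates query carries a "member_of" restriction. The sketch
below uses the plain Placement HTTP API; placement_url, token and
az_aggregate_uuid are hypothetical placeholders, not values from our
environment.

    # Illustrative sketch only: the shape of the Placement query with and
    # without an AZ restriction. Placeholders must be filled in by the reader.
    import requests

    placement_url = "http://placement.example.org"                 # placeholder
    token = "<keystone-token>"                                     # placeholder
    az_aggregate_uuid = "11111111-2222-3333-4444-555555555555"     # placeholder

    headers = {
        "X-Auth-Token": token,
        "OpenStack-API-Version": "placement 1.36",
    }
    resources = "VCPU:2,MEMORY_MB:4096,DISK_GB:20"

    # Request spec without an AZ: no member_of filter, so candidates come
    # from every compute node that satisfies the resource request.
    all_hosts = requests.get(
        f"{placement_url}/allocation_candidates",
        params={"resources": resources},
        headers=headers,
    )

    # Request spec with an AZ: the AZ is mapped to a host aggregate and a
    # member_of restriction is added, so only hosts in that aggregate are
    # returned.
    az_hosts = requests.get(
        f"{placement_url}/allocation_candidates",
        params={"resources": resources, "member_of": az_aggregate_uuid},
        headers=headers,
    )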

To be more precise, Nova-conductor in
`nova.conductor.tasks.live_migrate.LiveMigrationTask._get_request_spec_for_select_destinations`
(https://github.com/openstack/nova/blob/3d83bb3356e10355437851919e161f258cebf761/nova/conductor/tasks/live_migrate.py#L460)
always uses the original request specification, the one used to deploy the
VM, to find a new host to migrate it to. Therefore, if the VM was deployed
to a specific AZ, the conductor always sends that AZ to Placement (because
the AZ is in the request spec), and Placement filters out hosts that are not
in that AZ. However, if the VM was deployed without defining an AZ, Nova
selects a host (in some AZ) to deploy it, and when later migrating the VM,
Nova does not try to find another host in the AZ where the VM is already
running; it always behaves as if it were selecting a host for a brand-new
deployment.
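
To make this concrete, here is a minimal sketch (assuming it is run on a
node where Nova's configuration and database are reachable, e.g. a
controller node; instance_uuid is a placeholder) of what the conductor
effectively reuses. For a VM booted without an explicit AZ, the stored
request spec keeps availability_zone unset, so every move operation is
scheduled as if it were a fresh deployment.

    # Hedged sketch: inspect the stored request spec that the conductor
    # reuses for move operations. Requires Nova installed and configured.
    import sys

    from nova import config
    from nova import context as nova_context
    from nova import objects

    config.parse_args(sys.argv)        # loads nova.conf (DB access, etc.)
    objects.register_all()             # registers the versioned objects

    ctxt = nova_context.get_admin_context()
    instance_uuid = "<uuid-of-the-vm>"                 # placeholder

    spec = objects.RequestSpec.get_by_instance_uuid(ctxt, instance_uuid)

    # For a VM booted without an explicit AZ this prints None, so the move
    # operation is scheduled without any AZ restriction.
    print(spec.availability_zone)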

That raised a question: how is it possible that the create (deploy VM)
process works? It works because of the parameter "cross_az_attach"
configured in Nova. As we can see in
https://github.com/openstack/nova/blob/3d83bb3356e10355437851919e161f258cebf761/nova/virt/block_device.py#L53,
if this parameter is False, when creating a volume Nova uses the AZ where
the VM was scheduled as the AZ for the volume it creates in Cinder.
Everything works because the host selection process is executed before the
volume is created in Cinder.
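
A simplified paraphrase of that decision (not the exact Nova code, just an
illustration of the behaviour we are describing) is:

    from typing import Optional

    # Simplified illustration (not the actual nova/virt/block_device.py
    # code): which AZ Nova hands to Cinder when it creates a volume for an
    # instance.
    def volume_create_az(cross_az_attach: bool,
                         instance_az: Optional[str]) -> Optional[str]:
        if cross_az_attach:
            # Volumes may live in any AZ, so no AZ is forced on Cinder.
            return None
        # cross_az_attach=False: pin the volume to the AZ the scheduler
        # already selected for the instance.
        return instance_az

    # Example: VM booted without an AZ, scheduler placed it in "az-1".
    print(volume_create_az(cross_az_attach=False, instance_az="az-1"))  # az-1
    print(volume_create_az(cross_az_attach=True, instance_az="az-1"))   # None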

After discovering all that, we were under the impression that OpenStack was
designed to have (require) different cells to implement multiple AZs.
Therefore, we assumed the problem was caused by this code/line
(https://github.com/openstack/nova/blob/3d83bb3356e10355437851919e161f258cebf761/nova/conductor/tasks/live_migrate.py#L495):
whenever a request is made to Nova-scheduler, Nova-conductor always sends
the current cell where the VM resides to Placement. Therefore, if we had
multiple AZs, each of them with its own cell configuration, we would never
have hit this situation; that is why we were thinking the problem might be a
setup one.

However, while discussing, and after checking the documentation
(https://docs.openstack.org/nova/latest/admin/availability-zones.html)
that describes the use of AZs, we concluded that there is an issue with the
code: it should be possible to have multiple AZs sharing the same cell. We
also concluded that, similar to what happens when "cross_az_attach" is False
and we deploy a VM (Nova allocates the Cinder volume in a specific AZ), when
executing migrations this "cross_az_attach" parameter should be evaluated,
and the current AZ of the VM should be added to the request spec sent to
Placement to list the possible hosts the VM can be moved to.
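
A hedged sketch of that idea (not the actual patch under review; the helper
name and the call site are illustrative, while CONF.cinder.cross_az_attach,
RequestSpec.availability_zone and
nova.availability_zones.get_host_availability_zone() are existing Nova
pieces) would look roughly like this:

    # Hedged sketch of the proposed behaviour, not the patch itself.
    import nova.conf
    from nova import availability_zones

    CONF = nova.conf.CONF

    def restrict_move_to_current_az(context, request_spec, instance):
        """Pin a move operation to the AZ the instance already runs in."""
        if CONF.cinder.cross_az_attach:
            # Volumes can be attached across AZs; no restriction needed.
            return
        if request_spec.availability_zone:
            # The user already requested a specific AZ at boot time; keep it.
            return
        # Resolve the AZ of the compute host currently running the instance
        # and add it to the spec handed to the scheduler/Placement.
        request_spec.availability_zone = (
            availability_zones.get_host_availability_zone(
                context, instance.host))

That way, a VM that was placed in an AZ implicitly at boot time keeps being
confined to that AZ on migrate/resize, mirroring what already happens with
its Cinder volume.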

We also discussed whether Placement should be the one doing this check
before returning the possible hosts to migrate the VM to. However, this does
not seem to fit Placement's context/goal/design. Therefore, the place where
we need a patch/fix is Nova.

Furthermore, the solution previously proposed
(https://review.opendev.org/c/openstack/nova/+/469675/12/nova/compute/api.py#1173)
only addresses the cases where the VM is created from volumes: it sets the
AZ of the volumes in the request spec of the VM (even though the user did
not set it there). That is why everything works in setups where
cross_az_attach=False. However, if we create a VM from an image and Nova
then creates a new volume in Cinder, the AZ is not set in the request spec,
yet the request spec is what is used for the first call to Placement to
select the hosts, as described above.

Following the same process that is used with Nova cells, we propose a
solution for this situation at
https://review.opendev.org/c/openstack/nova/+/864760.

Any other comments and reviews are welcome!

** Affects: nova
     Importance: Undecided
         Status: In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2018318

Title:
  'openstack server resize --flavor' should not migrate VMs to another
  AZ

Status in OpenStack Compute (nova):
  In Progress


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2018318/+subscriptions


