yahoo-eng-team team mailing list archive
Message #75948
[Bug 1680773] Re: Migration of a one VM deployed as part of group fails with NoValidHost
*** This bug is a duplicate of bug 1718455 ***
https://bugs.launchpad.net/bugs/1718455
This was fixed a while ago; I need to find the duplicate bug.
** This bug has been marked a duplicate of bug 1718455
[pike] Nova host disable and Live Migrate all instances fail.
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1680773
Title:
Migration of a one VM deployed as part of group fails with NoValidHost
Status in OpenStack Compute (nova):
In Progress
Bug description:
Unable to migrate a VM that was originally deployed as part of a
multi-VM deploy request (e.g. the number of instances was set to
greater than 1 in the UI/REST API).
Steps to reproduce:
- Set up the controller and register compute nodes
- Now, try a multi-deploy of VMs
- Once the deploy is successful, try an untargeted migration of one of the VMs deployed as part of the group ("group" here means several VMs requested in a single deploy request, NOT a server group)
- The operation will fail with a NoValidHost error at the scheduler (see the reproduction sketch after this list)
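A minimal sketch of the steps above using python-novaclient; the auth
details and the AUTH_URL/IMAGE_ID/FLAVOR_ID placeholders are assumptions
for illustration, not values taken from this report:

    # Reproduction sketch only; credentials and IDs below are placeholders.
    from keystoneauth1.identity import v3
    from keystoneauth1 import session
    from novaclient import client as nova_client

    auth = v3.Password(auth_url=AUTH_URL, username='admin', password='secret',
                       project_name='admin', user_domain_id='default',
                       project_domain_id='default')
    nova = nova_client.Client('2.1', session=session.Session(auth=auth))

    # Multi-VM deploy: a single request asking for three instances.
    server = nova.servers.create(name='batch-vm', image=IMAGE_ID,
                                 flavor=FLAVOR_ID, min_count=3, max_count=3)

    # Untargeted (cold) migration of one VM from the batch; with the
    # behaviour described below it fails at the scheduler with NoValidHost.
    nova.servers.migrate(server)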
The issue here is that the request spec the scheduler receives during
the migration has num_instances greater than 1 (however many instances
were initially deployed). That is expected on the initial deploy, but
not on a later migration of a single instance.
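To make the failure mode concrete, here is a toy sketch (not nova's
actual scheduler code) of a selector that, like the filter scheduler,
tries to find capacity for every instance recorded on the request spec;
with a stale num_instances of 3, migrating a single VM suddenly needs
room for three:

    class NoValidHost(Exception):
        pass

    def select_destinations(host_free_vcpus, request_spec):
        """Pick a host for every instance the spec says is in the batch,
        consuming capacity as it goes."""
        free = dict(host_free_vcpus)
        selected = []
        for _ in range(request_spec['num_instances']):
            fits = [h for h, cap in free.items()
                    if cap >= request_spec['vcpus']]
            if not fits:
                raise NoValidHost('no host found for %(num_instances)d '
                                  'instances' % request_spec)
            free[fits[0]] -= request_spec['vcpus']
            selected.append(fits[0])
        return selected

    # Two hosts with 4 free vCPUs each: plenty for the one 4-vCPU VM being
    # migrated, but not for the three instances the stale spec claims.
    select_destinations({'compute-1': 4, 'compute-2': 4},
                        {'num_instances': 3, 'vcpus': 4})  # raises NoValidHost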
The problem seems to be related to nova.compute.api._provision_instances().
In mitaka it was:
    req_spec = objects.RequestSpec.from_components(context,
            instance_uuid, boot_meta, instance_type,
            base_options['numa_topology'],
            base_options['pci_requests'], filter_properties,
            instance_group, base_options['availability_zone'])
    req_spec.create()
In ocata it is:
    req_spec = objects.RequestSpec.from_components(context,
            instance_uuid, boot_meta, instance_type,
            base_options['numa_topology'],
            base_options['pci_requests'], filter_properties,
            instance_group, base_options['availability_zone'],
            security_groups=security_groups)
    # NOTE(danms): We need to record num_instances on the request
    # spec as this is how the conductor knows how many were in this batch.
    req_spec.num_instances = num_instances
    req_spec.create()
In mitaka, on deploy, the RequestSpec was first saved to the database, and num_instances was then set on the in-memory object on the fly, based on the number of instances in the request. So on deploy the scheduler got an object with num_instances equal to the number deployed, but what was saved in the database was the default value of 1. On a later migration, when a new RequestSpec object is created from the database information, it carries that default value of 1.
Now in ocata, the local object's num_instances is updated first and the database record is created/saved afterwards, so the database copy also has the larger value. When a migration is attempted on one of the VMs, the new RequestSpec object created for the migration carries this larger value, causing the migration to fail at the scheduler.
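A toy sketch of that ordering difference (a stand-in class, not the real
RequestSpec): create() persists whatever is set on the object at that
moment, so moving the num_instances assignment above create() is what
changed what ends up in the database:

    class FakeRequestSpec(object):
        """Stand-in for RequestSpec: create() snapshots fields to a fake DB."""
        def __init__(self):
            self.num_instances = 1       # default value
            self.db_row = None

        def create(self):
            self.db_row = {'num_instances': self.num_instances}

    # Mitaka ordering: save first, then tweak only the in-memory object.
    spec = FakeRequestSpec()
    spec.create()
    spec.num_instances = 3
    assert spec.db_row['num_instances'] == 1    # DB keeps the default

    # Ocata ordering: set the batch size first, then save, so the larger
    # value is persisted and reloaded for a later migration's request spec.
    spec = FakeRequestSpec()
    spec.num_instances = 3
    spec.create()
    assert spec.db_row['num_instances'] == 3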
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1680773/+subscriptions