yahoo-eng-team team mailing list archive

[Bug 1830747] Re: Error 500 trying to migrate an instance after wrong request_spec

 

Reviewed:  https://review.opendev.org/661786
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=da453c2bfe86ab7a825f0aa7ebced15886f7a5fd
Submitter: Zuul
Branch:    master

commit da453c2bfe86ab7a825f0aa7ebced15886f7a5fd
Author: Matt Riedemann <mriedem.os@xxxxxxxxx>
Date:   Tue May 28 11:24:11 2019 -0400

    Workaround missing RequestSpec.instance_group.uuid
    
    It's clear that we could have a RequestSpec.instance_group
    without a uuid field if the InstanceGroup is set from the
    _populate_group_info method which should only be used for
    legacy translation of request specs using legacy filter
    properties dicts.
    
    To workaround the issue, we look for the group scheduler hint
    to get the group uuid before loading it from the DB.
    
    The related functional regression recreate test is updated
    to show this solves the issue.
    
    Change-Id: I20981c987549eec40ad9762e74b0db16e54f4e63
    Closes-Bug: #1830747


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1830747

Title:
  Error 500 trying to migrate an instance after wrong request_spec

Status in OpenStack Compute (nova):
  Fix Released
Status in OpenStack Compute (nova) ocata series:
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged
Status in OpenStack Compute (nova) stein series:
  Triaged

Bug description:
  We started an instance last Wednesday, and the compute host it ran on
  failed (possibly a hardware issue). Since the networking looked wrong
  (i.e., network interfaces were missing), I tried to migrate the instance.

  According to Matt, the request_spec entry for the instance looks
  wrong:

  <mriedem> my guess is something like this happened: 1. create server in a group, 2. cold migrate the server which fails on host A and does a reschedule to host B which maybe also fails (would be good to know if previous cold migration attempts failed with reschedules), 3. try to cold migrate again which fails with the instance_group.uuid thing
  <mriedem> the reschedule might be the key b/c like i said conductor has to rebuild a request spec and i think that's probably where we're doing a partial build of the request spec but missing the group uuid

  Here's what I had in my novaapidb:

  {
    "nova_object.name": "RequestSpec",
    "nova_object.version": "1.11",
    "nova_object.data": {
      "ignore_hosts": null,
      "requested_destination": null,
      "instance_uuid": "2098b550-c749-460a-a44e-5932535993a9",
      "num_instances": 1,
      "image": {
        "nova_object.name": "ImageMeta",
        "nova_object.version": "1.8",
        "nova_object.data": {
          "min_disk": 40,
          "disk_format": "raw",
          "min_ram": 0,
          "container_format": "bare",
          "properties": {
            "nova_object.name": "ImageMetaProps",
            "nova_object.version": "1.20",
            "nova_object.data": {},
            "nova_object.namespace": "nova"
          }
        },
        "nova_object.namespace": "nova",
        "nova_object.changes": [
          "properties",
          "min_ram",
          "container_format",
          "disk_format",
          "min_disk"
        ]
      },
      "availability_zone": "AZ3",
      "flavor": {
        "nova_object.name": "Flavor",
        "nova_object.version": "1.2",
        "nova_object.data": {
          "id": 28,
          "name": "cpu2-ram6-disk40",
          "is_public": true,
          "rxtx_factor": 1,
          "deleted_at": null,
          "root_gb": 40,
          "vcpus": 2,
          "memory_mb": 6144,
          "disabled": false,
          "extra_specs": {},
          "updated_at": null,
          "flavorid": "e29f3ee9-3f07-46d2-b2e2-efa4950edc95",
          "deleted": false,
          "swap": 0,
          "description": null,
          "created_at": "2019-02-07T07:48:21Z",
          "vcpu_weight": 0,
          "ephemeral_gb": 0
        },
        "nova_object.namespace": "nova"
      },
      "force_hosts": null,
      "retry": null,
      "instance_group": {
        "nova_object.name": "InstanceGroup",
        "nova_object.version": "1.11",
        "nova_object.data": {
          "members": null,
          "hosts": null,
          "policy": "anti-affinity"
        },
        "nova_object.namespace": "nova",
        "nova_object.changes": [
          "policy",
          "members",
          "hosts"
        ]
      },
      "scheduler_hints": {
        "group": [
          "295c99ea-2db6-469a-877f-454a3903a8d8"
        ]
      },
      "limits": {
        "nova_object.name": "SchedulerLimits",
        "nova_object.version": "1.0",
        "nova_object.data": {
          "disk_gb": null,
          "numa_topology": null,
          "memory_mb": null,
          "vcpu": null
        },
        "nova_object.namespace": "nova",
        "nova_object.changes": [
          "disk_gb",
          "vcpu",
          "memory_mb",
          "numa_topology"
        ]
      },
      "force_nodes": null,
      "project_id": "1bf4dbb3d2c746658f462bf8e59ec6be",
      "user_id": "255cca4584c24b16a684e3e8322b436b",
      "numa_topology": null,
      "is_bfv": false,
      "pci_requests": {
        "nova_object.name": "InstancePCIRequests",
        "nova_object.version": "1.1",
        "nova_object.data": {
          "instance_uuid": "2098b550-c749-460a-a44e-5932535993a9",
          "requests": []
        },
        "nova_object.namespace": "nova"
      }
    },
    "nova_object.namespace": "nova",
    "nova_object.changes": [
      "ignore_hosts",
      "requested_destination",
      "num_instances",
      "image",
      "availability_zone",
      "instance_uuid",
      "flavor",
      "scheduler_hints",
      "pci_requests",
      "instance_group",
      "limits",
      "project_id",
      "user_id",
      "numa_topology",
      "is_bfv",
      "retry"
    ]
  }
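
The dump above shows the condition the commit message describes: the instance_group object has no uuid field, while scheduler_hints still carries the group uuid. A minimal sketch of that fallback, operating on the serialized JSON rather than on nova objects, might look like the following. The helper name recover_group_uuid and the trimmed sample spec are hypothetical and for illustration only; this is not nova's actual code.

```python
def recover_group_uuid(spec_primitive):
    """Return the group uuid for a serialized RequestSpec, or None.

    Prefers instance_group.uuid; falls back to the "group" scheduler
    hint when the uuid field is missing, mirroring the workaround
    described in the commit message.
    """
    data = spec_primitive.get("nova_object.data", {})
    group = data.get("instance_group")
    if not group:
        # The server is not in a group, so there is nothing to recover.
        return None

    uuid = group.get("nova_object.data", {}).get("uuid")
    if uuid:
        return uuid

    # Fallback: the hint is stored as a list in the dump above, but a
    # bare string is tolerated here as well.
    hint = (data.get("scheduler_hints") or {}).get("group")
    if isinstance(hint, str):
        return hint
    return hint[0] if hint else None


# Trimmed copy of the request spec from this bug report.
broken_spec = {
    "nova_object.name": "RequestSpec",
    "nova_object.data": {
        "instance_group": {
            "nova_object.name": "InstanceGroup",
            "nova_object.data": {
                "members": None,
                "hosts": None,
                "policy": "anti-affinity",
            },
        },
        "scheduler_hints": {
            "group": ["295c99ea-2db6-469a-877f-454a3903a8d8"],
        },
    },
}

print(recover_group_uuid(broken_spec))
```

With the recovered uuid, the full group could then be loaded from the database, which is the step the commit message says follows the hint lookup.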

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1830747/+subscriptions

