yahoo-eng-team team mailing list archive

[Bug 2073365] [NEW] Resizing an instance wrongly reports a 409 error when the instance is located on a compute host that has deleted records in the nova DB services table

Public bug reported:

Steps to reproduce
==================
A chronological list of steps which will reproduce the issue:
* Add a KVM host and then remove it. Repeat this 2-3 times so that the nova services table contains records marked as deleted.
* Add the same KVM host back.
* Deploy a VM on this KVM host.
* Select the VM and try to resize it. The request always fails with a 409 error.

Expected result
===============
The resize should succeed.

Actual result
=============
The resize request always fails with a 409 error (Service Unavailable).

Environment
===========
1. OpenStack version 2023.1 (Antelope)

Cause Analysis
==============
The resize request reports 409 (Service Unavailable) because the check_instance_host decorator raises exception.ServiceUnavailable().

    @check_instance_host(check_is_up=True)
    def resize(self, context, instance, flavor_id=None, clean_shutdown=True,
               host_name=None, auto_disk_config=None):
        """Resize (ie, migrate) a running instance.
        ...
        """

def check_instance_host(check_is_up=False):
    """Validate the instance.host before performing the operation.

    At a minimum this method will check that the instance.host is set.

    :param check_is_up: If True, check that the instance.host status is UP
        or MAINTENANCE (disabled but not down).
    :raises: InstanceNotReady if the instance.host is not set
    :raises: ServiceUnavailable if check_is_up=True and the instance.host
        compute service status is not UP or MAINTENANCE
    """
    def outer(function):
        @functools.wraps(function)
        def wrapped(self, context, instance, *args, **kwargs):
            if not instance.host:
                raise exception.InstanceNotReady(instance_id=instance.uuid)
            if check_is_up:
                # Make sure the source compute service is not down otherwise we
                # cannot proceed.
                service = [
                    service for service in instance.services
                        if service.binary == 'nova-compute'][0]
                if not self.servicegroup_api.service_is_up(service):
                    # ComputeServiceUnavailable would make more sense here but
                    # we do not want to leak hostnames to end users.
                    raise exception.ServiceUnavailable()
            return function(self, context, instance, *args, **kwargs)
        return wrapped
    return outer

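Debugging shows that instance.services includes service records that have
already been deleted, which it should not. The decorator takes the first
nova-compute entry in that list, which here is a deleted record (id 10), so
service_is_up() is evaluated against a deleted service instead of the live
one (id 20).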

(Pdb) p instance.services
ServiceList(objects=[Service(a2b948f2-6d82-4d5c-bf67-9d7b45cf9868),Service(50b280ac-9ed6-4bf3-a87f-1f8697d87966),Service(ade10449-24ad-4535-bbaa-38fbb79956fc),Service(dc0fa50d-83ab-4ca3-bdde-68e527a35a8d),Service(2191a868-5e66-4758-b7f1-67b0c4c43294)])

MariaDB [nova]> select * from services where host='kvmcore14';
+---------------------+---------------------+---------------------+----+-----------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
| created_at          | updated_at          | deleted_at          | id | host      | binary       | topic   | report_count | disabled | deleted | disabled_reason | last_seen_up        | forced_down | version | uuid                                 |
+---------------------+---------------------+---------------------+----+-----------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
| 2024-07-02 01:42:07 | 2024-07-02 08:56:51 | 2024-07-02 08:56:58 | 10 | kvmcore14 | nova-compute | compute |          861 |        0 |      10 | NULL            | 2024-07-02 08:56:51 |           0 |      51 | a2b948f2-6d82-4d5c-bf67-9d7b45cf9868 |
| 2024-07-02 09:20:37 | 2024-07-04 01:19:48 | 2024-07-04 01:19:54 | 16 | kvmcore14 | nova-compute | compute |         4742 |        0 |      16 | NULL            | 2024-07-04 01:19:48 |           0 |      51 | 50b280ac-9ed6-4bf3-a87f-1f8697d87966 |
| 2024-07-09 10:09:40 | 2024-07-10 00:47:44 | 2024-07-10 00:48:12 | 17 | kvmcore14 | nova-compute | compute |         1733 |        0 |      17 | NULL            | 2024-07-10 00:47:44 |           0 |      51 | ade10449-24ad-4535-bbaa-38fbb79956fc |
| 2024-07-10 01:04:46 | 2024-07-10 01:04:57 | 2024-07-10 01:05:21 | 19 | kvmcore14 | nova-compute | compute |            1 |        0 |      19 | NULL            | 2024-07-10 01:04:57 |           0 |      51 | dc0fa50d-83ab-4ca3-bdde-68e527a35a8d |
| 2024-07-10 01:22:57 | 2024-07-10 02:13:43 | NULL                | 20 | kvmcore14 | nova-compute | compute |          100 |        0 |       0 | NULL            | 2024-07-10 02:13:43 |           0 |      51 | 2191a868-5e66-4758-b7f1-67b0c4c43294 |
+---------------------+---------------------+---------------------+----+-----------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
5 rows in set (0.002 sec)
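
To illustrate why the first match is the wrong one, here is a minimal Python sketch built from the five rows above. ServiceRow, is_up, pick_first_compute and pick_live_compute are hypothetical names, the 60-second threshold and the value of now are assumptions, and the liveness check is only a simplified stand-in for servicegroup_api.service_is_up, not nova's implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed heartbeat threshold in seconds (illustrative value only).
SERVICE_DOWN_TIME = 60


@dataclass
class ServiceRow:
    """Hypothetical stand-in for one row of the services table."""
    id: int
    binary: str
    deleted: int          # 0 means live; non-zero means soft-deleted
    last_seen_up: datetime


def ts(text):
    """Parse a timestamp in the format shown in the MariaDB output."""
    return datetime.strptime(text, '%Y-%m-%d %H:%M:%S')


def is_up(service, now):
    """Simplified liveness check: a recent heartbeat counts as up."""
    return now - service.last_seen_up <= timedelta(seconds=SERVICE_DOWN_TIME)


def pick_first_compute(services):
    """Same selection as in check_instance_host: first nova-compute entry."""
    return [s for s in services if s.binary == 'nova-compute'][0]


def pick_live_compute(services):
    """Hypothetical variant that ignores soft-deleted rows."""
    return [s for s in services
            if s.binary == 'nova-compute' and s.deleted == 0][0]


rows = [
    ServiceRow(10, 'nova-compute', 10, ts('2024-07-02 08:56:51')),
    ServiceRow(16, 'nova-compute', 16, ts('2024-07-04 01:19:48')),
    ServiceRow(17, 'nova-compute', 17, ts('2024-07-10 00:47:44')),
    ServiceRow(19, 'nova-compute', 19, ts('2024-07-10 01:04:57')),
    ServiceRow(20, 'nova-compute', 0, ts('2024-07-10 02:13:43')),
]

# Assumed "current time": a few seconds after the last heartbeat of id 20.
now = ts('2024-07-10 02:13:50')

stale = pick_first_compute(rows)   # the deleted record with id 10
live = pick_live_compute(rows)     # the only record with deleted == 0

print(stale.id, is_up(stale, now))
print(live.id, is_up(live, now))
```

Under these assumptions the first entry is a record whose last heartbeat is days old, so the check fails even though the live service for the same host is reporting normally.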


The proposed code fix is to add an extra condition, Service.deleted == 0, at:
https://github.com/openstack/nova/blob/stable/2023.1/nova/db/main/models.py#L168
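
The effect of such a condition can be sketched at the SQL level with Python's sqlite3 module. The two-table schema, the instance name vm-1, the join itself and the helper service_ids are all assumptions made for illustration; this is not nova's ORM definition, only a toy model of joining an instance to the services on its host.

```python
import sqlite3

# In-memory toy database; schema and data are illustrative assumptions.
conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE instances (uuid TEXT, host TEXT);
    CREATE TABLE services (id INTEGER, host TEXT, binary TEXT,
                           deleted INTEGER);
    INSERT INTO instances VALUES ('vm-1', 'kvmcore14');
    INSERT INTO services VALUES
        (10, 'kvmcore14', 'nova-compute', 10),
        (16, 'kvmcore14', 'nova-compute', 16),
        (17, 'kvmcore14', 'nova-compute', 17),
        (19, 'kvmcore14', 'nova-compute', 19),
        (20, 'kvmcore14', 'nova-compute', 0);
""")

# Join an instance to the compute services registered for its host.
# {extra} is where the additional "deleted = 0" condition is slotted in.
JOIN = """
    SELECT s.id
    FROM instances AS i
    JOIN services AS s
      ON s.host = i.host AND s.binary = 'nova-compute' {extra}
    WHERE i.uuid = ?
    ORDER BY s.id
"""


def service_ids(extra=''):
    """Return the service ids joined to the hypothetical instance vm-1."""
    query = JOIN.format(extra=extra)
    return [row[0] for row in conn.execute(query, ('vm-1',))]


without_filter = service_ids()
with_filter = service_ids('AND s.deleted = 0')

print(without_filter)
print(with_filter)
```

Without the extra condition the join returns all five rows, including the soft-deleted ones. With it, only the live record (id 20) remains, so taking the first element would be safe.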

** Affects: nova
     Importance: Undecided
         Status: New


-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2073365

Title:
  Resizing an instance wrongly reports a 409 error when the instance is
  located on a compute host that has deleted records in the nova DB
  services table

Status in OpenStack Compute (nova):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2073365/+subscriptions