yahoo-eng-team team mailing list archive
Message #94260
[Bug 2073365] [NEW] Resizing an instance wrongly reports a 409 error when the instance is located on a compute host that has deleted records in the nova DB services table
Public bug reported:
Steps to reproduce
==================
A chronological list of steps that reproduce the issue:
* Add a KVM host and then remove it; repeat this 2-3 times so that the nova services table contains records marked as deleted.
* Add the same KVM host back.
* Deploy a VM on this KVM host.
* Select the VM and try to resize it; the resize always fails with a 409 error.
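The steps above leave several soft-deleted rows for the same host. The following is a minimal sqlite3 sketch of that accumulation. It assumes a simplified services table and a soft delete that sets deleted to the row's own id, which is what the deleted column in the MariaDB output below suggests. The helpers add_compute_host and remove_compute_host are hypothetical, not nova APIs:

```python
import sqlite3

# Hypothetical, simplified subset of nova's services table.
SCHEMA = """
CREATE TABLE services (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    host TEXT NOT NULL,
    binary TEXT NOT NULL,
    deleted INTEGER NOT NULL DEFAULT 0
)
"""


def add_compute_host(conn, host):
    """Hypothetical helper: register a live nova-compute service row."""
    cur = conn.execute(
        "INSERT INTO services (host, binary) VALUES (?, 'nova-compute')",
        (host,),
    )
    return cur.lastrowid


def remove_compute_host(conn, host):
    """Hypothetical helper: soft-delete the live row for host.

    The row is kept and deleted is set to the row id, mirroring the
    deleted column values seen in the bug report.
    """
    conn.execute(
        "UPDATE services SET deleted = id WHERE host = ? AND deleted = 0",
        (host,),
    )


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Add and remove the same host three times, then add it back once more.
for _ in range(3):
    add_compute_host(conn, "kvmcore14")
    remove_compute_host(conn, "kvmcore14")
add_compute_host(conn, "kvmcore14")

rows = conn.execute(
    "SELECT id, deleted FROM services WHERE host = 'kvmcore14' ORDER BY id"
).fetchall()
print(rows)  # [(1, 1), (2, 2), (3, 3), (4, 0)]
```

Only one row is live (deleted = 0), but all four rows still match the host.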
Expected result
===============
The resize should succeed.
Actual result
=============
The resize request is rejected with a 409 error (Service Unavailable).
Environment
===========
1. OpenStack version: 2023.1 (Antelope)
Cause Analysis
==============
The resize reports 409 (Service Unavailable) because the check_instance_host decorator on resize() raises exception.ServiceUnavailable():
# Excerpt from nova/compute/api.py
import functools

from nova import exception


@check_instance_host(check_is_up=True)
def resize(self, context, instance, flavor_id=None, clean_shutdown=True,
           host_name=None, auto_disk_config=None):
    """Resize (ie, migrate) a running instance.
    ...
    """


def check_instance_host(check_is_up=False):
    """Validate the instance.host before performing the operation.

    At a minimum this method will check that the instance.host is set.

    :param check_is_up: If True, check that the instance.host status is UP
        or MAINTENANCE (disabled but not down).
    :raises: InstanceNotReady if the instance.host is not set
    :raises: ServiceUnavailable if check_is_up=True and the instance.host
        compute service status is not UP or MAINTENANCE
    """
    def outer(function):
        @functools.wraps(function)
        def wrapped(self, context, instance, *args, **kwargs):
            if not instance.host:
                raise exception.InstanceNotReady(instance_id=instance.uuid)
            if check_is_up:
                # Make sure the source compute service is not down otherwise we
                # cannot proceed.
                service = [
                    service for service in instance.services
                    if service.binary == 'nova-compute'][0]
                if not self.servicegroup_api.service_is_up(service):
                    # ComputeServiceUnavailable would make more sense here but
                    # we do not want to leak hostnames to end users.
                    raise exception.ServiceUnavailable()
            return function(self, context, instance, *args, **kwargs)
        return wrapped
    return outer
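To make the failure path concrete, here is a self-contained sketch of the same decorator pattern. FakeService, FakeInstance, FakeServiceGroupAPI, FakeComputeAPI and the local exception classes are hypothetical stand-ins for nova's objects and servicegroup API. The sketch shows that the check only ever looks at the first nova-compute entry in instance.services:

```python
import functools
from dataclasses import dataclass, field


class InstanceNotReady(Exception):
    """Stand-in for nova.exception.InstanceNotReady."""


class ServiceUnavailable(Exception):
    """Stand-in for nova.exception.ServiceUnavailable."""


@dataclass
class FakeService:
    uuid: str
    binary: str
    up: bool  # hypothetical: what the servicegroup API would report


@dataclass
class FakeInstance:
    uuid: str
    host: str
    services: list = field(default_factory=list)


class FakeServiceGroupAPI:
    def service_is_up(self, service):
        return service.up


def check_instance_host(check_is_up=False):
    def outer(function):
        @functools.wraps(function)
        def wrapped(self, context, instance, *args, **kwargs):
            if not instance.host:
                raise InstanceNotReady(instance.uuid)
            if check_is_up:
                # Same selection as the nova excerpt: the FIRST nova-compute
                # entry is checked, even if it is a stale (deleted) record.
                service = [s for s in instance.services
                           if s.binary == 'nova-compute'][0]
                if not self.servicegroup_api.service_is_up(service):
                    raise ServiceUnavailable()
            return function(self, context, instance, *args, **kwargs)
        return wrapped
    return outer


class FakeComputeAPI:
    servicegroup_api = FakeServiceGroupAPI()

    @check_instance_host(check_is_up=True)
    def resize(self, context, instance):
        return "resize accepted"


api = FakeComputeAPI()
stale = FakeService("a2b948f2", "nova-compute", up=False)
live = FakeService("2191a868", "nova-compute", up=True)

# Deleted record listed first, as in the pdb output: the check fails.
try:
    api.resize(None, FakeInstance("vm1", "kvmcore14", [stale, live]))
except ServiceUnavailable:
    print("ServiceUnavailable")

# Only the live record present: the resize goes through.
print(api.resize(None, FakeInstance("vm1", "kvmcore14", [live])))
```

The live service being up makes no difference as long as a down record precedes it in the list.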
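Debugging shows that instance.services includes service records that have already been deleted, which it should not. Because check_instance_host takes the first nova-compute entry ([0]), it evaluates the first, already-deleted record (id 10 in the table below) rather than the live record (id 20), and service_is_up() reports it as down: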
(Pdb) p instance.services
ServiceList(objects=[Service(a2b948f2-6d82-4d5c-bf67-9d7b45cf9868),Service(50b280ac-9ed6-4bf3-a87f-1f8697d87966),Service(ade10449-24ad-4535-bbaa-38fbb79956fc),Service(dc0fa50d-83ab-4ca3-bdde-68e527a35a8d),Service(2191a868-5e66-4758-b7f1-67b0c4c43294)])
MariaDB [nova]> select * from services where host='kvmcore14';
+---------------------+---------------------+---------------------+----+-----------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
| created_at | updated_at | deleted_at | id | host | binary | topic | report_count | disabled | deleted | disabled_reason | last_seen_up | forced_down | version | uuid |
+---------------------+---------------------+---------------------+----+-----------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
| 2024-07-02 01:42:07 | 2024-07-02 08:56:51 | 2024-07-02 08:56:58 | 10 | kvmcore14 | nova-compute | compute | 861 | 0 | 10 | NULL | 2024-07-02 08:56:51 | 0 | 51 | a2b948f2-6d82-4d5c-bf67-9d7b45cf9868 |
| 2024-07-02 09:20:37 | 2024-07-04 01:19:48 | 2024-07-04 01:19:54 | 16 | kvmcore14 | nova-compute | compute | 4742 | 0 | 16 | NULL | 2024-07-04 01:19:48 | 0 | 51 | 50b280ac-9ed6-4bf3-a87f-1f8697d87966 |
| 2024-07-09 10:09:40 | 2024-07-10 00:47:44 | 2024-07-10 00:48:12 | 17 | kvmcore14 | nova-compute | compute | 1733 | 0 | 17 | NULL | 2024-07-10 00:47:44 | 0 | 51 | ade10449-24ad-4535-bbaa-38fbb79956fc |
| 2024-07-10 01:04:46 | 2024-07-10 01:04:57 | 2024-07-10 01:05:21 | 19 | kvmcore14 | nova-compute | compute | 1 | 0 | 19 | NULL | 2024-07-10 01:04:57 | 0 | 51 | dc0fa50d-83ab-4ca3-bdde-68e527a35a8d |
| 2024-07-10 01:22:57 | 2024-07-10 02:13:43 | NULL | 20 | kvmcore14 | nova-compute | compute | 100 | 0 | 0 | NULL | 2024-07-10 02:13:43 | 0 | 51 | 2191a868-5e66-4758-b7f1-67b0c4c43294 |
+---------------------+---------------------+---------------------+----+-----------+--------------+---------+--------------+----------+---------+-----------------+---------------------+-------------+---------+--------------------------------------+
5 rows in set (0.002 sec)
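Applying a simple heartbeat rule to the rows above shows why the check fails. This sketch assumes that a service counts as down when its last_seen_up is older than service_down_time (assumed here to be 60 seconds). It also uses 2024-07-10 02:14:00 as a hypothetical "now". It approximates the servicegroup check rather than reproducing nova's service_is_up() code:

```python
from datetime import datetime, timedelta

# (id, deleted, last_seen_up, uuid) copied from the services table above,
# in the same order as instance.services in the pdb output.
ROWS = [
    (10, 10, "2024-07-02 08:56:51", "a2b948f2-6d82-4d5c-bf67-9d7b45cf9868"),
    (16, 16, "2024-07-04 01:19:48", "50b280ac-9ed6-4bf3-a87f-1f8697d87966"),
    (17, 17, "2024-07-10 00:47:44", "ade10449-24ad-4535-bbaa-38fbb79956fc"),
    (19, 19, "2024-07-10 01:04:57", "dc0fa50d-83ab-4ca3-bdde-68e527a35a8d"),
    (20, 0, "2024-07-10 02:13:43", "2191a868-5e66-4758-b7f1-67b0c4c43294"),
]

SERVICE_DOWN_TIME = timedelta(seconds=60)   # assumed threshold
NOW = datetime(2024, 7, 10, 2, 14, 0)       # hypothetical "now"


def is_up(last_seen_up):
    """Approximate heartbeat check: up if seen within SERVICE_DOWN_TIME."""
    seen = datetime.strptime(last_seen_up, "%Y-%m-%d %H:%M:%S")
    return abs(NOW - seen) <= SERVICE_DOWN_TIME


# What check_instance_host evaluates today: the first row.
first = ROWS[0]
print(first[0], is_up(first[2]))    # 10 False

# What it would evaluate if deleted rows were filtered out.
live = [row for row in ROWS if row[1] == 0][0]
print(live[0], is_up(live[2]))      # 20 True
```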
The proposed fix is to add an extra condition, Service.deleted == 0, to the join condition at
https://github.com/openstack/nova/blob/stable/2023.1/nova/db/main/models.py#L168
so that instance.services only returns non-deleted service records.
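At the SQL level, the effect of the proposed condition can be sketched with sqlite3. This sketch assumes simplified instances and services tables and a join on host and the nova-compute binary only. It approximates, rather than reproduces, the SQLAlchemy relationship in models.py. The services_for helper is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE instances (uuid TEXT, host TEXT);
CREATE TABLE services (id INTEGER, host TEXT, binary TEXT, deleted INTEGER);
INSERT INTO instances VALUES ('vm-1', 'kvmcore14');
INSERT INTO services VALUES
    (10, 'kvmcore14', 'nova-compute', 10),
    (16, 'kvmcore14', 'nova-compute', 16),
    (17, 'kvmcore14', 'nova-compute', 17),
    (19, 'kvmcore14', 'nova-compute', 19),
    (20, 'kvmcore14', 'nova-compute', 0);
""")


def services_for(conn, instance_uuid, only_live):
    """Hypothetical helper: service ids joined to an instance.

    only_live=True adds the proposed "deleted = 0" condition to the join.
    """
    extra = " AND s.deleted = 0" if only_live else ""
    sql = (
        "SELECT s.id FROM instances i "
        "JOIN services s ON s.host = i.host AND s.binary = 'nova-compute'"
        + extra
        + " WHERE i.uuid = ? ORDER BY s.id"
    )
    return [row[0] for row in conn.execute(sql, (instance_uuid,))]


print(services_for(conn, "vm-1", only_live=False))  # [10, 16, 17, 19, 20]
print(services_for(conn, "vm-1", only_live=True))   # [20]
```

With the extra condition, the list that check_instance_host indexes with [0] contains only the live service.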
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2073365
Title:
  Resizing an instance wrongly reports a 409 error when the instance is
  located on a compute host that has deleted records in the nova DB
  services table
Status in OpenStack Compute (nova):
New