Message #94413
[Bug 2076614] [NEW] nova-manage db online_data_migrations fails after upgrading to 2024.1
Public bug reported:
Description
===========
After upgrading our OpenStack infrastructure to 2024.1, we are unable to successfully run the required nova-manage db online_data_migrations command.
Steps to reproduce
==================
Run the db online_data_migrations command:
# nova-manage db online_data_migrations --max-count 5000
The command always reports one matching row for the
populate_instance_compute_id migration:
# nova-manage db online_data_migrations --max-count 5000
1 rows matched query populate_instance_compute_id, 0 migrated
+-------------------------------------+--------------+-----------+
| Migration | Total Needed | Completed |
+-------------------------------------+--------------+-----------+
| fill_virtual_interface_list | 0 | 0 |
| migrate_empty_ratio | 0 | 0 |
| migrate_quota_classes_to_api_db | 0 | 0 |
| migrate_quota_limits_to_api_db | 0 | 0 |
| migration_migrate_to_uuid | 0 | 0 |
| populate_dev_uuids | 0 | 0 |
| populate_instance_compute_id | 1 | 0 |
| populate_missing_availability_zones | 0 | 0 |
| populate_queued_for_delete | 0 | 0 |
| populate_user_id | 0 | 0 |
| populate_uuids | 0 | 0 |
+-------------------------------------+--------------+-----------+
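When this check is scripted, the summary table can be parsed to list the migrations that still report pending rows. The following is a minimal sketch, assuming the table layout shown above; the function names parse_migration_summary and pending_migrations are hypothetical helpers and not part of nova.

```python
def parse_migration_summary(text):
    """Return {migration_name: (total_needed, completed)} from the summary table."""
    results = {}
    for line in text.splitlines():
        line = line.strip()
        # Data rows start with '|'; border rows start with '+' and are skipped.
        if not line.startswith("|"):
            continue
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        # The header row has non-numeric cells and is skipped here.
        if len(cells) != 3 or not cells[1].isdigit() or not cells[2].isdigit():
            continue
        results[cells[0]] = (int(cells[1]), int(cells[2]))
    return results


def pending_migrations(summary):
    """Names of migrations that report more rows needed than completed."""
    return sorted(name for name, (needed, done) in summary.items() if needed > done)


# Shortened copy of the table from this report, used as sample input.
SAMPLE = """\
+-------------------------------------+--------------+-----------+
| Migration | Total Needed | Completed |
+-------------------------------------+--------------+-----------+
| fill_virtual_interface_list | 0 | 0 |
| populate_instance_compute_id | 1 | 0 |
| populate_uuids | 0 | 0 |
+-------------------------------------+--------------+-----------+
"""

if __name__ == "__main__":
    print(pending_migrations(parse_migration_summary(SAMPLE)))
```

For the sample above, only populate_instance_compute_id is reported as pending.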
But the entry never gets migrated due to the following error:
# tail /var/log/nova/nova-manage.log
...
2024-08-12 15:37:09.388 1428234 ERROR nova.objects.instance [None req-1234 - - - - - -] [instance: 00000000-0000-0000-0000-000000000000] Unable to migrate instance because host None with node None not found: nova.exception.ComputeHostNotFound: Compute host None could not be found.
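The "1 rows matched, 0 migrated" result together with this error suggests a row that is selected on every run but can never be resolved to a compute node. The following toy model illustrates that behaviour. It is a simplified sketch and not nova's actual implementation; the function populate_compute_id, the dict-based rows, and the host name h1 are assumptions made for illustration.

```python
def populate_compute_id(instances, compute_nodes, max_count):
    """Toy model: set compute_id on rows that lack one.

    instances: list of dicts with 'uuid', 'host', 'node', 'compute_id'.
    compute_nodes: dict mapping (host, node) -> compute node id.
    Returns (matched, migrated), like the "N rows matched, M migrated" line.
    """
    candidates = [i for i in instances if i["compute_id"] is None][:max_count]
    migrated = 0
    for inst in candidates:
        key = (inst["host"], inst["node"])
        if key not in compute_nodes:
            # Stands in for the "host None with node None not found" error:
            # the row counts as matched but is left unchanged.
            continue
        inst["compute_id"] = compute_nodes[key]
        migrated += 1
    return len(candidates), migrated


if __name__ == "__main__":
    rows = [
        # Row resembling the one in this report: no host, no node.
        {"uuid": "00000000-0000-0000-0000-000000000000",
         "host": None, "node": None, "compute_id": None},
        # Hypothetical regular instance on a known compute node.
        {"uuid": "a", "host": "h1", "node": "h1", "compute_id": None},
    ]
    nodes = {("h1", "h1"): 7}
    print(populate_compute_id(rows, nodes, 5000))  # (2, 1)
    print(populate_compute_id(rows, nodes, 5000))  # (1, 0) on every later run
```

In this model the row without a host keeps matching on each run, which is consistent with the migration never reporting completion.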
A closer look at the database reveals that every time I run the
nova-manage command (nova-manage db online_data_migrations --max-count 5000),
an interesting entry is created in the nova database:
MariaDB [nova]> select * from instances where host is null;

| created_at | updated_at | deleted_at | id | internal_id | user_id | project_id | image_ref | kernel_id | ramdisk_id | launch_index | key_name | key_data | power_state | vm_state | memory_mb | vcpus | hostname | host | user_data | reservation_id | launched_at | terminated_at | display_name | display_description | availability_zone | locked | os_type | launched_on | instance_type_id | vm_mode | uuid | architecture | root_device_name | access_ip_v4 | access_ip_v6 | config_drive | task_state | default_ephemeral_device | default_swap_device | progress | auto_disk_config | shutdown_terminate | disable_terminate | root_gb | ephemeral_gb | cell_name | node | deleted | locked_by | cleaned | ephemeral_key_uuid | hidden | compute_id |
+---------------------+------------+---------------------+-------+-------------+--------------------------------------+--------------------------------------+-----------+-----------+------------+--------------+----------+----------+-------------+----------+-----------+-------+----------+------+-----------+----------------+-------------+---------------+--------------+---------------------+-------------------+--------+---------+-------------+------------------+---------+--------------------------------------+--------------+------------------+--------------+--------------+--------------+------------+--------------------------+---------------------+----------+------------------+--------------------+-------------------+---------+--------------+-----------+------+---------+-----------+---------+--------------------+--------+------------+
| 2024-08-12 13:36:59 | NULL | 2024-08-12 13:36:59 | 10898 | NULL | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 00000000-0000-0000-0000-000000000000 | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | 0 | 0 | NULL | NULL | NULL | NULL | 10898 | NULL | 0 | NULL | 0 | NULL |
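The relevant properties of this row are that host and compute_id are both NULL and that the row is already soft-deleted (deleted equals id). The sketch below reproduces the lookup on a reduced schema. It uses an in-memory SQLite database in place of MariaDB and only a handful of the columns shown above; the helper names and the second sample row are hypothetical.

```python
import sqlite3

FAKE_UUID = "00000000-0000-0000-0000-000000000000"


def build_demo_db():
    """Create an in-memory table with a small subset of the instances columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instances ("
        "id INTEGER PRIMARY KEY, uuid TEXT, host TEXT, node TEXT, "
        "deleted INTEGER, compute_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?, ?, ?, ?, ?)",
        [
            # Row modelled on the one in this report.
            (10898, FAKE_UUID, None, None, 10898, None),
            # Hypothetical regular instance that is already migrated.
            (1, "11111111-1111-1111-1111-111111111111", "h1", "h1", 0, 7),
        ],
    )
    return conn


def find_unmigratable_rows(conn):
    """Rows that lack a compute_id and have no host to resolve one from."""
    cur = conn.execute(
        "SELECT id, uuid, deleted FROM instances "
        "WHERE host IS NULL AND compute_id IS NULL ORDER BY id"
    )
    return cur.fetchall()


if __name__ == "__main__":
    for row in find_unmigratable_rows(build_demo_db()):
        print(row)
```

Selecting only id, uuid, and deleted keeps the output readable compared with select * on the full table.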

That entry can be cleaned up by running
# nova-manage db archive_deleted_rows --verbose
But the next run of the online_data_migrations command fails again,
and the instance reappears in the nova database.
I was able to trace the creation of that "empty" instance to this
code:
https://github.com/openstack/nova/blob/stable/2024.1/nova/objects/virtual_interface.py#L30
Expected result
===============
After the first execution, all migrations are completed and no "empty" instances remain in the database.
Actual result
=============
See the "Steps to reproduce" section above.
Environment
===========
1. Exact version of OpenStack you are running: OpenStack Caracal 2024.1
# dpkg -l | grep nova
ii nova-api 3:29.0.1-0ubuntu1.3~cloud0 all OpenStack Compute - API frontend
ii nova-common 3:29.0.1-0ubuntu1.3~cloud0 all OpenStack Compute - common files
ii nova-conductor 3:29.0.1-0ubuntu1.3~cloud0 all OpenStack Compute - conductor service
ii nova-novncproxy 3:29.0.1-0ubuntu1.3~cloud0 all OpenStack Compute - NoVNC proxy
ii nova-scheduler 3:29.0.1-0ubuntu1.3~cloud0 all OpenStack Compute - virtual machine scheduler
ii python3-nova 3:29.0.1-0ubuntu1.3~cloud0 all OpenStack Compute Python 3 libraries
ii python3-novaclient 2:18.5.0-0ubuntu1~cloud0 all client library for OpenStack Compute API - 3.x
** Affects: nova
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2076614
Title:
nova-manage db online_data_migrations fails after upgrading to 2024.1
Status in OpenStack Compute (nova):
New