yahoo-eng-team team mailing list archive

[Bug 2076614] [NEW] nova-manage db online_data_migrations fails after upgrading to 2024.1

 

Public bug reported:

Description
===========
After upgrading our OpenStack infrastructure to 2024.1, we are unable to complete the required nova-manage db online_data_migrations command.


Steps to reproduce
==================
Run the db online_data_migrations command:

    # nova-manage db online_data_migrations --max-count 5000

Every run reports that one row matches the populate_instance_compute_id
query, and that none were migrated:


    # nova-manage db online_data_migrations --max-count 5000
    1 rows matched query populate_instance_compute_id, 0 migrated
    +-------------------------------------+--------------+-----------+
    |              Migration              | Total Needed | Completed |
    +-------------------------------------+--------------+-----------+
    |     fill_virtual_interface_list     |      0       |     0     |
    |         migrate_empty_ratio         |      0       |     0     |
    |   migrate_quota_classes_to_api_db   |      0       |     0     |
    |    migrate_quota_limits_to_api_db   |      0       |     0     |
    |      migration_migrate_to_uuid      |      0       |     0     |
    |          populate_dev_uuids         |      0       |     0     |
    |     populate_instance_compute_id    |      1       |     0     |
    | populate_missing_availability_zones |      0       |     0     |
    |      populate_queued_for_delete     |      0       |     0     |
    |           populate_user_id          |      0       |     0     |
    |            populate_uuids           |      0       |     0     |
    +-------------------------------------+--------------+-----------+
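To spot stuck migrations without reading the table by eye, the summary can be parsed. The following is only a minimal sketch: it assumes the table layout shown above, and incomplete_migrations and SAMPLE are hypothetical names introduced for illustration, not part of nova.

```python
import re

# Matches a data row of the summary table, e.g.
# "| populate_instance_compute_id | 1 | 0 |".
# The header row does not match because "Total Needed" is not a number.
ROW_RE = re.compile(r"^\s*\|\s*(\w+)\s*\|\s*(\d+)\s*\|\s*(\d+)\s*\|\s*$")


def incomplete_migrations(output):
    """Return {migration: (needed, completed)} for rows with needed > completed."""
    pending = {}
    for line in output.splitlines():
        match = ROW_RE.match(line)
        if match is None:
            continue  # border, header, or log line
        name = match.group(1)
        needed, completed = int(match.group(2)), int(match.group(3))
        if needed > completed:
            pending[name] = (needed, completed)
    return pending


# Shortened copy of the output above (assumption: same column layout).
SAMPLE = """\
1 rows matched query populate_instance_compute_id, 0 migrated
+-------------------------------------+--------------+-----------+
|              Migration              | Total Needed | Completed |
+-------------------------------------+--------------+-----------+
|     fill_virtual_interface_list     |      0       |     0     |
|     populate_instance_compute_id    |      1       |     0     |
+-------------------------------------+--------------+-----------+
"""

print(incomplete_migrations(SAMPLE))
```

For the output in this report, only populate_instance_compute_id would be flagged, with 1 needed and 0 completed.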


But that row is never migrated because of the following error:

    # tail /var/log/nova/nova-manage.log
    ...
    2024-08-12 15:37:09.388 1428234 ERROR nova.objects.instance [None req-1234 - - - - - -] [instance: 00000000-0000-0000-0000-000000000000] Unable to migrate instance because host None with node None not found: nova.exception.ComputeHostNotFound: Compute host None could not be found.


A closer look at the database shows that every run of the nova-manage
command (nova-manage db online_data_migrations --max-count 5000) creates
a soft-deleted row with a NULL host and an all-zero UUID in the
instances table of the nova database:

    MariaDB [nova]> select * from instances where host is null;
    +---------------------+------------+---------------------+-------+-------------+--------------------------------------+--------------------------------------+-----------+-----------+------------+--------------+----------+----------+-------------+----------+-----------+-------+----------+------+-----------+----------------+-------------+---------------+--------------+---------------------+-------------------+--------+---------+-------------+------------------+---------+--------------------------------------+--------------+------------------+--------------+--------------+--------------+------------+--------------------------+---------------------+----------+------------------+--------------------+-------------------+---------+--------------+-----------+------+---------+-----------+---------+--------------------+--------+------------+
    | created_at          | updated_at | deleted_at          | id    | internal_id | user_id                              | project_id                           | image_ref | kernel_id | ramdisk_id | launch_index | key_name | key_data | power_state | vm_state | memory_mb | vcpus | hostname | host | user_data | reservation_id | launched_at | terminated_at | display_name | display_description | availability_zone | locked | os_type | launched_on | instance_type_id | vm_mode | uuid                                 | architecture | root_device_name | access_ip_v4 | access_ip_v6 | config_drive | task_state | default_ephemeral_device | default_swap_device | progress | auto_disk_config | shutdown_terminate | disable_terminate | root_gb | ephemeral_gb | cell_name | node | deleted | locked_by | cleaned | ephemeral_key_uuid | hidden | compute_id |
    +---------------------+------------+---------------------+-------+-------------+--------------------------------------+--------------------------------------+-----------+-----------+------------+--------------+----------+----------+-------------+----------+-----------+-------+----------+------+-----------+----------------+-------------+---------------+--------------+---------------------+-------------------+--------+---------+-------------+------------------+---------+--------------------------------------+--------------+------------------+--------------+--------------+--------------+------------+--------------------------+---------------------+----------+------------------+--------------------+-------------------+---------+--------------+-----------+------+---------+-----------+---------+--------------------+--------+------------+
    | 2024-08-12 13:36:59 | NULL       | 2024-08-12 13:36:59 | 10898 |        NULL | 00000000-0000-0000-0000-000000000000 | 00000000-0000-0000-0000-000000000000 | NULL      | NULL      | NULL       |         NULL | NULL     | NULL     |        NULL | NULL     |      NULL |  NULL | NULL     | NULL | NULL      | NULL           | NULL        | NULL          | NULL         | NULL                | NULL              |   NULL | NULL    | NULL        |             NULL | NULL    | 00000000-0000-0000-0000-000000000000 | NULL         | NULL             | NULL         | NULL         | NULL         | NULL       | NULL                     | NULL                |     NULL |             NULL |                  0 |                 0 |    NULL |         NULL | NULL      | NULL |   10898 | NULL      |       0 | NULL               |      0 |       NULL |
    +---------------------+------------+---------------------+-------+-------------+--------------------------------------+--------------------------------------+-----------+-----------+------------+--------------+----------+----------+-------------+----------+-----------+-------+----------+------+-----------+----------------+-------------+---------------+--------------+---------------------+-------------------+--------+---------+-------------+------------------+---------+--------------------------------------+--------------+------------------+--------------+--------------+--------------+------------+--------------------------+---------------------+----------+------------------+--------------------+-------------------+---------+--------------+-----------+------+---------+-----------+---------+--------------------+--------+------------+
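Because select * is hard to read at this width, a narrower query helps. The sketch below models it against an in-memory SQLite table rather than the real MariaDB schema: the reduced column set, the first sample row, and the extra filter on deleted and compute_id are assumptions made for illustration. The second row copies the values shown above.

```python
import sqlite3

# Reduced stand-in for the instances table (assumption: only the
# columns relevant to this report are modelled).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instances ("
    " id INTEGER PRIMARY KEY, uuid TEXT, host TEXT, node TEXT,"
    " deleted INTEGER, deleted_at TEXT, compute_id INTEGER)"
)
conn.executemany(
    "INSERT INTO instances VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        # A normal, live instance (made-up values).
        (1, "11111111-1111-1111-1111-111111111111",
         "compute1", "compute1", 0, None, 7),
        # The row from this report: soft-deleted, no host, no compute_id.
        (10898, "00000000-0000-0000-0000-000000000000",
         None, None, 10898, "2024-08-12 13:36:59", None),
    ],
)

# Select only the columns that matter for the failing migration.
QUERY = (
    "SELECT id, uuid, deleted, host, node, compute_id FROM instances"
    " WHERE host IS NULL AND compute_id IS NULL AND deleted != 0"
)
rows = conn.execute(QUERY).fetchall()
print(rows)
```

The same SELECT should also be usable in the MariaDB client, since it relies only on columns that appear in the output above.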


That entry can be cleaned up by running:

    # nova-manage db archive_deleted_rows --verbose

However, the next run of the online_data_migrations command fails again,
and the "empty" instance reappears in the nova database.


I was able to track the creation of that "empty" instance down to this
code:
https://github.com/openstack/nova/blob/stable/2024.1/nova/objects/virtual_interface.py#L30
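The cycle I am seeing can be modelled with a small toy script. This is not nova's code. It is a sketch under two assumptions drawn from the observations above: that each run re-creates a soft-deleted placeholder instance with the all-zero UUID, and that the compute_id migration counts every row whose compute_id is NULL, including soft-deleted ones. All function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

FAKE_UUID = "00000000-0000-0000-0000-000000000000"


@dataclass
class Row:
    uuid: str
    host: Optional[str]
    compute_id: Optional[int]
    deleted: bool


def create_placeholder(table):
    """Hypothetical stand-in for the step that creates the "empty" instance."""
    if not any(row.uuid == FAKE_UUID for row in table):
        table.append(Row(FAKE_UUID, None, None, deleted=True))


def populate_compute_id(table, hosts):
    """Toy migration: returns (found, migrated) like the summary table."""
    found = migrated = 0
    for row in table:
        if row.compute_id is not None:
            continue
        found += 1
        if row.host in hosts:
            row.compute_id = hosts[row.host]
            migrated += 1
        # Otherwise the row is skipped, mirroring the logged
        # "host None with node None not found" error.
    return found, migrated


def archive_deleted_rows(table):
    """Toy cleanup: drop soft-deleted rows from the table."""
    table[:] = [row for row in table if not row.deleted]


def run_online_data_migrations(table, hosts):
    create_placeholder(table)
    return populate_compute_id(table, hosts)


hosts = {"compute1": 7}  # made-up host-to-compute-node mapping
table = [Row("11111111-1111-1111-1111-111111111111", "compute1", None, False)]

first = run_online_data_migrations(table, hosts)
second = run_online_data_migrations(table, hosts)
archive_deleted_rows(table)
third = run_online_data_migrations(table, hosts)
print(first, second, third)
```

In this model the real instance is migrated on the first run, but the placeholder is found and skipped on every run, and archiving it only helps until the next run re-creates it. That matches the "1 rows matched, 0 migrated" loop described above.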


Expected result
===============
After the first run, all migrations complete and no "empty" instances remain in the database.


Actual result
=============
See the "Steps to reproduce" section above.


Environment
===========
1. Exact version of OpenStack you are running: OpenStack Caracal 2024.1

    # dpkg -l | grep nova
    ii  nova-api                               3:29.0.1-0ubuntu1.3~cloud0                           all          OpenStack Compute - API frontend
    ii  nova-common                            3:29.0.1-0ubuntu1.3~cloud0                           all          OpenStack Compute - common files
    ii  nova-conductor                         3:29.0.1-0ubuntu1.3~cloud0                           all          OpenStack Compute - conductor service
    ii  nova-novncproxy                        3:29.0.1-0ubuntu1.3~cloud0                           all          OpenStack Compute - NoVNC proxy
    ii  nova-scheduler                         3:29.0.1-0ubuntu1.3~cloud0                           all          OpenStack Compute - virtual machine scheduler
    ii  python3-nova                           3:29.0.1-0ubuntu1.3~cloud0                           all          OpenStack Compute Python 3 libraries
    ii  python3-novaclient                     2:18.5.0-0ubuntu1~cloud0                             all          client library for OpenStack Compute API - 3.x

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2076614

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2076614/+subscriptions