yahoo-eng-team team mailing list archive

[Bug 1793419] Re: database online data migration fail due to missing request spec marker

 

OK, I think I see: _get_marker_for_migrate_instances returns the marker
because there is still a request_specs table entry with the marker
instance_uuid (we did not previously clean up request specs on db
archive/purge, but now we do). So when listing instances we passed a
marker for an instance that no longer exists, which raised
MarkerNotFound and made the migration fail.
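
One possible way to tolerate a stale marker is sketched below. This is
a toy model, not nova code: MarkerNotFound, list_after and
list_tolerating_stale_marker are hypothetical stand-ins, and the
fallback assumes that rescanning from the start is safe because a
request spec is only created when one does not already exist.

```python
class MarkerNotFound(Exception):
    """Hypothetical stand-in for the error raised when the marker is gone."""


def list_after(instance_uuids, marker):
    """Return the UUIDs that follow `marker`; raise if the marker is unknown."""
    if marker is None:
        return list(instance_uuids)
    if marker not in instance_uuids:
        raise MarkerNotFound(marker)
    return instance_uuids[instance_uuids.index(marker) + 1:]


def list_tolerating_stale_marker(instance_uuids, marker):
    """Fall back to a full listing when the stored marker is stale."""
    try:
        return list_after(instance_uuids, marker), marker
    except MarkerNotFound:
        # Assumption: creating request specs is idempotent, so starting
        # over from the first instance does no harm.
        return list_after(instance_uuids, None), None


# "uuid-b" was the marker, but that instance has since been purged.
remaining, marker = list_tolerating_stale_marker(["uuid-a", "uuid-c"], "uuid-b")
```

Here the stale marker is dropped and the listing restarts from the
first instance instead of failing.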

** Changed in: nova
   Importance: Undecided => Low

** Changed in: nova
       Status: New => Triaged

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

** Also affects: nova/rocky
   Importance: Undecided
       Status: New

** Changed in: nova/pike
       Status: New => Triaged

** Changed in: nova/queens
       Status: New => Triaged

** Changed in: nova/rocky
       Status: New => Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1793419

Title:
  database online data migration fail due to missing request spec marker

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  Triaged
Status in OpenStack Compute (nova) queens series:
  Triaged
Status in OpenStack Compute (nova) rocky series:
  Triaged

Bug description:
  Description
  ===========
  During upgrade we run the nova online data migration, which goes
  through the list of instances and creates a request spec record in
  the db if one does not exist. As the online migrations are batched,
  the request spec migration leaves a marker record in the
  request_specs table to indicate the last instance uuid that was
  processed. It continues processing starting from that instance on the
  next batch.

  In our upgrade test, we hit a scenario where the marker instance from
  the online migration that was run during the Mitaka->Newton upgrade
  had been deleted and purged from the db by the time we ran the
  Newton->Pike upgrade. This caused the online migration to fail as the
  marker instance couldn't be found.
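
The failure mode described above can be reproduced in a small toy
model. This is only a sketch under stated assumptions: the in-memory
list and dicts stand in for the instances and request_specs tables, and
MarkerNotFound, list_instances and migrate_batch are hypothetical names,
not nova's real functions.

```python
class MarkerNotFound(Exception):
    """Hypothetical stand-in for a missing pagination marker error."""


def list_instances(instances, marker=None, limit=2):
    """Return up to `limit` instance UUIDs that come after `marker`."""
    if marker is None:
        start = 0
    elif marker in instances:
        start = instances.index(marker) + 1
    else:
        raise MarkerNotFound(marker)
    return instances[start:start + limit]


def migrate_batch(instances, request_specs, state, limit=2):
    """Create missing request specs for one batch and persist the marker."""
    batch = list_instances(instances, state.get("marker"), limit)
    for uuid in batch:
        # Only create a request spec if one does not exist yet.
        request_specs.setdefault(uuid, {"instance_uuid": uuid})
    if batch:
        state["marker"] = batch[-1]
    return len(batch)


instances = ["uuid-a", "uuid-b", "uuid-c", "uuid-d"]
request_specs = {}
state = {}

migrate_batch(instances, request_specs, state)  # first run: uuid-a, uuid-b
instances.remove("uuid-b")                      # marker instance deleted and purged

try:
    migrate_batch(instances, request_specs, state)  # later run resumes at the marker
    failed = False
except MarkerNotFound:
    failed = True
```

The second run fails because the stored marker still points at an
instance that is no longer listed.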

  Steps to reproduce
  ==================
  - run the online data migration on an installed Newton load:
    nova-manage db online_data_migrations
  - delete the instance referenced by the marker (instance_uuid 00000000-0000-0000-0000-000000000000)
  - purge db:
    nova-manage db purge
  - upgrade to Pike.

  Expected result
  ===============
  Upgrade successful with no exceptions.
     
  Actual result
  =============
  Exceptions occur during the upgrade because the marker is missing, and the upgrade fails.
  Error attempting to run <function migrate_instances_add_request_spec at 0x5151050> 
  14 rows matched query service_uuids_online_data_migration, 14 migrated 
  13 rows matched query migrate_quota_limits_to_api_db, 13 migrated 
  Error attempting to run <function migrate_instances_add_request_spec at 0x5151050> 
  +---------------------------------------------+--------------+-----------+
  | Migration                                   | Total Needed | Completed |
  +---------------------------------------------+--------------+-----------+
  | delete_build_requests_with_no_instance_uuid | 0            | 0         |
  | migrate_aggregate_reset_autoincrement       | 0            | 0         |
  | migrate_aggregates                          | 0            | 0         |
  | migrate_flavor_reset_autoincrement          | 0            | 0         |
  | migrate_flavors                             | 0            | 0         |
  | migrate_instance_groups_to_api_db           | 0            | 0         |
  | migrate_instance_keypairs                   | 0            | 0         |
  | migrate_instances_add_request_spec          | 0            | 0         |
  | migrate_keypairs_to_api_db                  | 0            | 0         |
  | migrate_quota_classes_to_api_db             | 0            | 0         |
  | migrate_quota_limits_to_api_db              | 0            | 0         |
  | service_uuids_online_data_migration         | 0            | 0         |
  +---------------------------------------------+--------------+-----------+

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1793419/+subscriptions

