yahoo-eng-team team mailing list archive

[Bug 1862205] Re: Instances not visible when hidden=NULL

 

Reviewed:  https://review.opendev.org/706331
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=001f3a7bfe6b2c8af135daff8e154a708792070e
Submitter: Zuul
Branch:    master

commit 001f3a7bfe6b2c8af135daff8e154a708792070e
Author: Dan Smith <dansmith@xxxxxxxxxx>
Date:   Thu Feb 6 09:21:38 2020 -0800

    Fix instance.hidden migration and querying
    
    It was discovered that default= on a Column definition in a schema migration
    will attempt to update the table with the provided value, instead of just
    translating on read, which is often the assumption. The Instance.hidden=False
    change introduced in Train[1] used such a default on the new column, which caused
    at least one real-world deployment to time out rewriting the instances table
    due to size. Apparently SQLAlchemy-migrate also does not consider such a timeout
    to be a failure and proceeds on. The end result is that some existing instances
    in the database have hidden=NULL values, and the DB model layer does not convert
    those to hidden=False when we read/query them, causing those instances to be
    excluded from the API list view.
    
    This change alters the 399 schema migration to remove the default=False
    specification. This does not actually change the schema, but /will/ prevent
    users who have not yet upgraded to Train from rewriting the table.
    
    This change also makes the instance_get_all_by_filters() code handle hidden
    specially, including false and NULL in a query for non-hidden instances.
    
    A future change should add a developer trap test to ensure that future migrations
    do not add default= values to new columns to avoid this situation in the future.
    
    [1] Iaffb27bd8c562ba120047c04bb62619c0864f594
    
    Change-Id: Iace3f653b42c20887b40ee0105c8e9a4edeff1f7
    Closes-Bug: #1862205
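
The query-side part of the fix relies on ordinary SQL NULL semantics: a filter of hidden = 0 never matches a row whose hidden value is NULL, so the NULL case has to be included explicitly. The following is a minimal sketch of that behaviour using Python's built-in sqlite3 module. It is not the nova code; the helper names (build_demo_db, visible_instance_ids) and the three-row table are assumptions made purely for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory table with one row per possible 'hidden' state."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (id INTEGER PRIMARY KEY, hidden BOOLEAN)")
    conn.executemany(
        "INSERT INTO instances (id, hidden) VALUES (?, ?)",
        [(1, 0), (2, 1), (3, None)],  # not hidden, hidden, never backfilled
    )
    return conn


def visible_instance_ids(conn, null_safe):
    """Return ids of non-hidden instances.

    With null_safe=False the filter is a plain equality test, which
    silently drops rows where hidden IS NULL. With null_safe=True the
    NULL rows are treated as not hidden.
    """
    if null_safe:
        where = "hidden = 0 OR hidden IS NULL"
    else:
        where = "hidden = 0"
    rows = conn.execute(
        "SELECT id FROM instances WHERE " + where + " ORDER BY id"
    ).fetchall()
    return [row[0] for row in rows]
```

Calling visible_instance_ids(build_demo_db(), null_safe=False) leaves out the row with hidden=NULL, which mirrors the symptom in this bug; passing null_safe=True brings it back.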


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1862205

Title:
  Instances not visible when hidden=NULL

Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  During an upgrade of a cloud from Stein to Train, a migration adds the
  `hidden` column to the `instances` table.

  It was assumed that this migration does not backfill existing rows.
  However, upon verifying, it actually does backfill every row, and the
  order of operations *seems* to be:

  1. Create new column for `hidden`
  2. Update database migration version
  3. Start backfilling all existing instances with hidden=0

  In my case, the migration did create the column but failed to backfill
  the existing instances because there are so many of them. Running the
  migrations again then simply continues past that migration instead of
  blocking on it, leaving all rows with hidden=NULL.

  ====================
  Feb 06 14:06:13 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:06:13.566 10596 INFO migrate.versioning.api [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] 398 -> 399... 
  Feb 06 14:07:18 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:07:18.129 10596 ERROR oslo_db.sqlalchemy.exc_filters [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] DBAPIError exception wrapped from (pymysql.err.InternalError) (1180, 'Got error 90 "Message too long" during COMMIT')
  Feb 06 14:07:18 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:07:18.132 10596 ERROR oslo_db.sqlalchemy.exc_filters [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] DB exception wrapped.: sqlalchemy.exc.ResourceClosedError: This Connection is closed
  Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.930 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 398 -> 399... 
  Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.985 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
  Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.985 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 399 -> 400... 
  Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.995 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
  Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.995 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 400 -> 401... 
  Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.145 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
  Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.145 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 401 -> 402... 
  Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.244 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
  ====================

  The issue is two-part, because Nova also does not treat hidden=NULL as
  meaning that the instance is not hidden, so it no longer displays those
  instances via the API or any other operation.

  The "very silly" confirmation of this backfilling behaviour is that my
  attempt at patching things up by hand resulted in the same error:

  ====================
  MariaDB [nova]> update instances set hidden=0;
  ERROR 1180 (HY000): Got error 90 "Message too long" during COMMIT
  ====================

  Ideally, Nova shouldn't try to backfill values, and it should treat
  hidden=NULL as 0.
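
  For anyone who still wants to patch the NULL rows by hand, one possible approach is to backfill in small batches and commit after each one, on the assumption that the failure above is tied to the size of the single large transaction. The sketch below only illustrates the batching idea with Python's built-in sqlite3 module; the function name backfill_hidden and the batch size are hypothetical, and the SQL is SQLite-flavoured (on MariaDB a single-table UPDATE ... LIMIT would probably be the more natural form).

```python
import sqlite3


def backfill_hidden(conn, batch_size=1000):
    """Set hidden=0 on rows where it is NULL, one small transaction at a time.

    Returns the total number of rows updated. Rows that already have a
    non-NULL value (0 or 1) are left untouched.
    """
    total = 0
    while True:
        cur = conn.execute(
            "UPDATE instances SET hidden = 0 WHERE id IN ("
            "SELECT id FROM instances WHERE hidden IS NULL LIMIT ?)",
            (batch_size,),
        )
        conn.commit()  # keep each transaction small
        if cur.rowcount == 0:
            break  # nothing left to backfill
        total += cur.rowcount
    return total
```

  Each pass touches at most batch_size rows, so no single commit has to carry the whole table.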

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1862205/+subscriptions

