yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #81546
[Bug 1862205] Re: Instances not visible when hidden=NULL
Reviewed: https://review.opendev.org/706331
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=001f3a7bfe6b2c8af135daff8e154a708792070e
Submitter: Zuul
Branch: master
commit 001f3a7bfe6b2c8af135daff8e154a708792070e
Author: Dan Smith <dansmith@xxxxxxxxxx>
Date: Thu Feb 6 09:21:38 2020 -0800
Fix instance.hidden migration and querying
It was discovered that default= on a Column definition in a schema migration
will attempt to update the table with the provided value, instead of just
translating on read, which is often the assumption. The Instance.hidden=False
change introduced in Train[1] used such a default on the new column, which caused
at least one real-world deployment to time out rewriting the instances table
due to size. Apparently SQLAlchemy-migrate also does not consider such a timeout
to be a failure and proceeds on. The end result is that some existing instances
in the database have hidden=NULL values, and the DB model layer does not convert
those to hidden=False when we read/query them, causing those instances to be
excluded from the API list view.
This change alters the 399 schema migration to remove the default=False
specification. This does not actually change the schema, but /will/ prevent
users who have not yet upgraded to Train from rewriting the table.
This change also makes the instance_get_all_by_filters() code handle hidden
specially, including false and NULL in a query for non-hidden instances.
A future change should add a developer trap test to ensure that future migrations
do not add default= values to new columns to avoid this situation in the future.
[1] Iaffb27bd8c562ba120047c04bb62619c0864f594
Change-Id: Iace3f653b42c20887b40ee0105c8e9a4edeff1f7
Closes-Bug: #1862205
** Changed in: nova
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1862205
Title:
Instances not visible when hidden=NULL
Status in OpenStack Compute (nova):
Fix Released
Bug description:
During an upgrade of a cloud from Stein to Train, there is a migration
which adds the `hidden` field to the database.
In that migration, it was assumed that it does not backfill all of the
columns. However, upon verifying, it actually does backfill all
columns and the order of operations *seems* to be:
1. Create new column for `hidden`
2. Update database migration version
3. Start backfilling all existing instances with hidden=0
In my case, the migration did create the column but failed to backfill
all existing instances because of the large number of instances.
However, running the migrations again seems to simply continue and not
block on that migration, but leaving all columns with hidden=NULL.
====================
Feb 06 14:06:13 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:06:13.566 10596 INFO migrate.versioning.api [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] 398 -> 399...
Feb 06 14:07:18 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:07:18.129 10596 ERROR oslo_db.sqlalchemy.exc_filters [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] DBAPIError exception wrapped from (pymysql.err.InternalError) (1180, 'Got error 90 "Message too long" during COMMIT')
Feb 06 14:07:18 control02-nova-api-container-f89ad8b4 nova-manage[10596]: 2020-02-06 14:07:18.132 10596 ERROR oslo_db.sqlalchemy.exc_filters [req-34f0c5a6-2983-4c8e-9b9d-14167851c984 - - - - -] DB exception wrapped.: sqlalchemy.exc.ResourceClosedError: This Connection is closed
Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.930 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 398 -> 399...
Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.985 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.985 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 399 -> 400...
Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.995 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
Feb 06 14:10:22 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:22.995 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 400 -> 401...
Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.145 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.145 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] 401 -> 402...
Feb 06 14:10:23 control02-nova-api-container-f89ad8b4 nova-manage[14139]: 2020-02-06 14:10:23.244 14139 INFO migrate.versioning.api [req-032e5b40-88c9-4f4b-8ab0-525c50389967 - - - - -] done
====================
This issue is two-part, because now it seems that Nova does not assume
that hidden=NULL means that the instance is not hidden and no longer
displays the instance via API or any other operations.
The "very silly" confirmation of this behaviour of backfilling was my
attempt at patching things up resulted in the same error:
==================
MariaDB [nova]> update instances set hidden=0;
ERROR 1180 (HY000): Got error 90 "Message too long" during COMMIT
===================
Ideally, Nova shouldn't try and backfill values and it should treat
hidden=NULL as 0.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1862205/+subscriptions
References