[Bug 1832860] Re: Failed instances stuck in BUILD state after Rocky upgrade
Reviewed: https://review.opendev.org/665626
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e99937c9a95f7049a53bcae7beff9a2a00c5889c
Submitter: Zuul
Branch: master
commit e99937c9a95f7049a53bcae7beff9a2a00c5889c
Author: Mark Goddard <mark@xxxxxxxxxxxx>
Date: Mon Jun 17 09:56:15 2019 +0100
Exit 1 when db sync runs before api_db sync
Since cells v2 was introduced, nova operators must run two commands to
migrate the database schemas of nova's databases: nova-manage api_db
sync and nova-manage db sync. It is necessary to run them in this order,
since the db sync may depend on schema changes made to the API database
by the api_db sync. Running the db sync first may fail, for example with
the following error seen during a Queens to Rocky upgrade:
nova-manage db sync
ERROR: Could not access cell0.
Has the nova_api database been created?
Has the nova_cell0 database been created?
Has "nova-manage api_db sync" been run?
Has "nova-manage cell_v2 map_cell0" been run?
Is [api_database]/connection set in nova.conf?
Is the cell0 database connection URL correct?
Error: (pymysql.err.InternalError) (1054, u"Unknown column
'cell_mappings.disabled' in 'field list'") [SQL: u'SELECT
cell_mappings.created_at AS cell_mappings_created_at,
cell_mappings.updated_at AS cell_mappings_updated_at,
cell_mappings.id AS cell_mappings_id, cell_mappings.uuid AS
cell_mappings_uuid, cell_mappings.name AS cell_mappings_name,
cell_mappings.transport_url AS cell_mappings_transport_url,
cell_mappings.database_connection AS
cell_mappings_database_connection, cell_mappings.disabled AS
cell_mappings_disabled \nFROM cell_mappings \nWHERE
cell_mappings.uuid = %(uuid_1)s \n LIMIT %(param_1)s'] [parameters:
{u'uuid_1': '00000000-0000-0000-0000-000000000000', u'param_1': 1}]
(Background on this error at: http://sqlalche.me/e/2j85)
Despite this error, the command actually exits zero, so deployment tools
are likely to continue with the upgrade, leading to issues down the
line.
This change modifies the command to exit 1 if the cell0 sync fails.
It also clarifies this ordering in the upgrade and nova-manage
documentation, and adds information on the command's exit codes.
Change-Id: Iff2a23e09f2c5330b8fc0e9456860b65bd6ac149
Closes-Bug: #1832860
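As a rough illustration (not part of the commit above), a deployment script can now rely on that exit code by running the two syncs in the documented order and aborting on failure. A minimal sketch, assuming nova-manage is installed and configured on the host:
#!/bin/bash
# Hypothetical upgrade fragment: the API database schema must be migrated
# before the cell databases.
set -e
nova-manage api_db sync
# With the fix above, db sync exits 1 if the cell0 sync fails, so set -e
# stops the upgrade here instead of silently continuing.
nova-manage db sync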
** Changed in: nova
Status: In Progress => Fix Released
--
https://bugs.launchpad.net/bugs/1832860
Title:
Failed instances stuck in BUILD state after Rocky upgrade
Status in kolla:
Fix Released
Status in kolla rocky series:
Fix Committed
Status in kolla stein series:
Fix Committed
Status in kolla train series:
Fix Released
Status in OpenStack Compute (nova):
Fix Released
Bug description:
Steps to reproduce
==================
Starting with a cloud running the Queens release, upgrade to Rocky.
Create a flavor that cannot fit on any compute node, e.g.
openstack flavor create --ram 100000000 --disk 2147483647 --vcpus 10000 huge
Then create an instance using that flavor:
openstack server create huge --flavor huge --image cirros --network demo-net
Expected
========
The instance fails to boot and ends up in the ERROR state.
Actual
======
The instance fails to boot and gets stuck in the BUILD state.
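One way to observe this (an aside, not from the original report) is to poll the server status with the openstack CLI:
openstack server show huge -f value -c status
which keeps returning BUILD instead of eventually reaching ERROR.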
From nova-conductor.log:
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 1244, in schedule_and_build_instances
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server tags=tags)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 1193, in _bury_in_cell0
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server instance.create()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return fn(self, *args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 600, in create
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server db_inst = db.instance_create(self._context, updates)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 748, in instance_create
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return IMPL.instance_create(context, values)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 170, in wrapper
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 154, in wrapper
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server ectxt.value = e.inner_exc
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server self.force_reraise()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 142, in wrapper
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return f(context, *args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 1774, in instance_create
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server ec2_instance_create(context, instance_ref['uuid'])
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 170, in wrapper
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return f(*args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return f(context, *args, **kwargs)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 5286, in ec2_instance_create
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server ec2_instance_ref.save(context.session)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/models.py", line 50, in save
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server session.flush()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2254, in flush
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server self._flush(objects)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2380, in _flush
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server transaction.rollback(_capture_exception=True)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server compat.reraise(exc_type, exc_value, exc_tb)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2344, in _flush
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server flush_context.execute()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 391, in execute
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server rec.execute(self)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 556, in execute
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server uow
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 181, in save_obj
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server mapper, table, insert)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 866, in _emit_insert_statements
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server execute(statement, params)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return meth(self, multiparams, params)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server return connection._execute_clauseelement(self, multiparams, params)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server compiled_sql, distilled_params
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server context)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server util.raise_from_cause(newraise, exc_info)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server reraise(type(exception), exception, tb=exc_tb, cause=cause)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server context)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server cursor.execute(statement, parameters)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 170, in execute
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server result = self._query(query)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 328, in _query
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server conn.query(q)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 516, in query
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server self._affected_rows = self._read_query_result(unbuffered=unbuffered)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 727, in _read_query_result
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server result.read()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1066, in read
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server first_packet = self.connection._read_packet()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 683, in _read_packet
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server packet.check_error()
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/protocol.py", line 220, in check_error
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server err.raise_mysql_exception(self._data)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server raise errorclass(errno, errval)
2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server DBError: (pymysql.err.InternalError) (1054, u"Unknown column 'trusted_certs' in 'field list'")
[SQL: u'INSERT INTO instance_extra (created_at, updated_at, deleted_at, deleted, instance_uuid, device_metadata, numa_topology, pci_requests, flavor, vcpu_model, migration_context, keypairs, trusted_certs) VALUES (%(created_at)s, %(updated_at)s, %(deleted_at)s, %(deleted)s, %(instance_uuid)s, %(device_metadata)s, %(numa_topology)s, %(pci_requests)s, %(flavor)s, %(vcpu_model)s, %(migration_context)s, %(keypairs)s, %(trusted_certs)s)']
[parameters: {'instance_uuid': u'df1bd38c-67cb-4eb0-b2d2-ac08233dadae', 'keypairs': '{"nova_object.version": "1.3", "nova_object.name": "KeyPairList", "nova_object.data": {"objects": []}, "nova_object.namespace": "nova"}', 'pci_requests': '[]', 'vcpu_model': None, 'device_metadata': None, 'created_at': datetime.datetime(2019, 6, 12, 15, 0, 24, 430084), 'updated_at': None, 'numa_topology': None, 'trusted_certs': None, 'deleted': 0, 'migration_context': None, 'flavor': '{"new": null, "old": null, "cur": {"nova_object.version": "1.2", "nova_object.name": "Flavor", "nova_object.data": {"disabled": false, "root_gb": 214 ... (234 characters truncated) ... , "swap": 0, "rxtx_factor": 1.0, "is_public": true, "deleted_at": null, "vcpu_weight": 0, "id": 6, "name": "huge"}, "nova_object.namespace": "nova"}}', 'deleted_at': None}]
(Background on this error at: http://sqlalche.me/e/2j85)
Workaround
==========
On the controller, perform a nova DB sync:
docker exec -it nova_api nova-manage db sync
Although this makes no changes to the database (verified with
mysqldump), it appears to 'fix' nova: new instances created with the
'huge' flavor now go to the ERROR state as expected.
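As an aside (not from the original report), one rough way to confirm that the sync makes no schema changes is to compare schema-only dumps taken before and after running it. A sketch, assuming the database is named nova and that mysqldump credentials come from a defaults file or extra options:
mysqldump --no-data --skip-dump-date nova > /tmp/nova-schema-before.sql
docker exec -it nova_api nova-manage db sync
mysqldump --no-data --skip-dump-date nova > /tmp/nova-schema-after.sql
# No output from diff means the schema did not change.
diff /tmp/nova-schema-before.sql /tmp/nova-schema-after.sql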