
yahoo-eng-team team mailing list archive

[Bug 1832860] Re: Failed instances stuck in BUILD state after Rocky upgrade


Reviewed:  https://review.opendev.org/665626
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e99937c9a95f7049a53bcae7beff9a2a00c5889c
Submitter: Zuul
Branch:    master

commit e99937c9a95f7049a53bcae7beff9a2a00c5889c
Author: Mark Goddard <mark@xxxxxxxxxxxx>
Date:   Mon Jun 17 09:56:15 2019 +0100

    Exit 1 when db sync runs before api_db sync
    
    Since cells v2 was introduced, nova operators must run two commands to
    migrate the database schemas of nova's databases - nova-manage api_db
    sync and nova-manage db sync. It is necessary to run them in this order,
    since the db sync may depend on schema changes made to the api database
    in the api_db sync. Executing the db sync first may fail, for example
    with the following seen in a Queens to Rocky upgrade:
    
    nova-manage db sync
    ERROR: Could not access cell0.
    Has the nova_api database been created?
    Has the nova_cell0 database been created?
    Has "nova-manage api_db sync" been run?
    Has "nova-manage cell_v2 map_cell0" been run?
    Is [api_database]/connection set in nova.conf?
    Is the cell0 database connection URL correct?
    Error: (pymysql.err.InternalError) (1054, u"Unknown column
            'cell_mappings.disabled' in 'field list'") [SQL: u'SELECT
    cell_mappings.created_at AS cell_mappings_created_at,
    cell_mappings.updated_at AS cell_mappings_updated_at,
    cell_mappings.id AS cell_mappings_id, cell_mappings.uuid AS
    cell_mappings_uuid, cell_mappings.name AS cell_mappings_name,
    cell_mappings.transport_url AS cell_mappings_transport_url,
    cell_mappings.database_connection AS
    cell_mappings_database_connection, cell_mappings.disabled AS
    cell_mappings_disabled \nFROM cell_mappings \nWHERE
    cell_mappings.uuid = %(uuid_1)s \n LIMIT %(param_1)s'] [parameters:
    {u'uuid_1': '00000000-0000-0000-0000-000000000000', u'param_1': 1}]
    (Background on this error at: http://sqlalche.me/e/2j85)
    
    Despite this error, the command actually exits zero, so deployment tools
    are likely to continue with the upgrade, leading to issues down the
    line.
    
    This change modifies the command to exit 1 if the cell0 sync fails.
    
    This change also clarifies this ordering in the upgrade and nova-manage
    documentation, and adds information on exit codes for the command.
    
    Change-Id: Iff2a23e09f2c5330b8fc0e9456860b65bd6ac149
    Closes-Bug: #1832860
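
For deployment tools, the practical takeaway from the change above is
to run the two schema migrations in the documented order and to treat
a non-zero exit status as fatal. A minimal shell sketch, assuming a
plain nova-manage invocation (any container or virtualenv wrapper is
deployment-specific):

    # Run the API database migration first, then the main/cell database sync.
    # With the fix above, a failed cell0 sync returns exit code 1, so "set -e"
    # (or an explicit check of $?) is enough to halt the upgrade early.
    set -e
    nova-manage api_db sync
    nova-manage db sync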


** Changed in: nova
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1832860

Title:
  Failed instances stuck in BUILD state after Rocky upgrade

Status in kolla:
  Fix Released
Status in kolla rocky series:
  Fix Committed
Status in kolla stein series:
  Fix Committed
Status in kolla train series:
  Fix Released
Status in OpenStack Compute (nova):
  Fix Released

Bug description:
  Steps to reproduce
  ==================

  Starting with a cloud running the Queens release, upgrade to Rocky.

  Create a flavor that cannot fit on any compute node, e.g.

  openstack flavor create --ram 100000000 --disk 2147483647 --vcpus 10000 huge

  Then create an instance using that flavor:

  openstack server create huge --flavor huge --image cirros --network demo-net
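
  The resulting instance state can then be checked with the standard
  OpenStack client, for example (assuming python-openstackclient's
  value output formatter):

  openstack server show huge -f value -c status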

  Expected
  ========

  The instance fails to boot and ends up in the ERROR state.

  Actual
  ======

  The instance fails to boot and gets stuck in the BUILD state.

  From nova-conductor.log:

  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 265, in dispatch
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_messaging/rpc/dispatcher.py", line 194, in _do_dispatch
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 1244, in schedule_and_build_instances
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     tags=tags)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/conductor/manager.py", line 1193, in _bury_in_cell0
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     instance.create()
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_versionedobjects/base.py", line 226, in wrapper
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     return fn(self, *args, **kwargs)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/objects/instance.py", line 600, in create
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     db_inst = db.instance_create(self._context, updates)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/db/api.py", line 748, in instance_create
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     return IMPL.instance_create(context, values)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 170, in wrapper
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 154, in wrapper
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     ectxt.value = e.inner_exc
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     self.force_reraise()
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     six.reraise(self.type_, self.value, self.tb)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_db/api.py", line 142, in wrapper
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     return f(context, *args, **kwargs)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 1774, in instance_create
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     ec2_instance_create(context, instance_ref['uuid'])
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 170, in wrapper
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     return f(*args, **kwargs)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 227, in wrapped
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     return f(context, *args, **kwargs)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/nova/db/sqlalchemy/api.py", line 5286, in ec2_instance_create
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     ec2_instance_ref.save(context.session)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/oslo_db/sqlalchemy/models.py", line 50, in save
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     session.flush()
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2254, in flush
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     self._flush(objects)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2380, in _flush
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     transaction.rollback(_capture_exception=True)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/langhelpers.py", line 66, in __exit__
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     compat.reraise(exc_type, exc_value, exc_tb)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/session.py", line 2344, in _flush
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     flush_context.execute()
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 391, in execute
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     rec.execute(self)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/unitofwork.py", line 556, in execute
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     uow
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 181, in save_obj
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     mapper, table, insert)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py", line 866, in _emit_insert_statements
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     execute(statement, params)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 948, in execute
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     return meth(self, multiparams, params)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/sql/elements.py", line 269, in _execute_on_connection
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     return connection._execute_clauseelement(self, multiparams, params)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1060, in _execute_clauseelement
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     compiled_sql, distilled_params
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1200, in _execute_context
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     context)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1409, in _handle_dbapi_exception
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     util.raise_from_cause(newraise, exc_info)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/util/compat.py", line 203, in raise_from_cause
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     reraise(type(exception), exception, tb=exc_tb, cause=cause)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/base.py", line 1193, in _execute_context
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     context)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib64/python2.7/site-packages/sqlalchemy/engine/default.py", line 507, in do_execute
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     cursor.execute(statement, parameters)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 170, in execute
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     result = self._query(query)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/pymysql/cursors.py", line 328, in _query
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     conn.query(q)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 516, in query
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     self._affected_rows = self._read_query_result(unbuffered=unbuffered)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 727, in _read_query_result
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     result.read()
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 1066, in read
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     first_packet = self.connection._read_packet()
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/pymysql/connections.py", line 683, in _read_packet
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     packet.check_error()
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/pymysql/protocol.py", line 220, in check_error
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     err.raise_mysql_exception(self._data)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/site-packages/pymysql/err.py", line 109, in raise_mysql_exception
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server     raise errorclass(errno, errval)
  2019-06-12 15:00:24.443 6 ERROR oslo_messaging.rpc.server DBError: (pymysql.err.InternalError) (1054, u"Unknown column 'trusted_certs' in 'field list'") [SQL: u'INSERT INTO instance_extra (created_at, updated_at, deleted_at, deleted, instance_uuid, device_metadata, numa_topology, pci_requests, flavor, vcpu_model, migration_context, keypairs, trusted_certs) VALUES (%(created_at)s, %(updated_at)s, %(deleted_at)s, %(deleted)s, %(instance_uuid)s, %(device_metadata)s, %(numa_topology)s, %(pci_requests)s, %(flavor)s, %(vcpu_model)s, %(migration_context)s, %(keypairs)s, %(trusted_certs)s)'] [parameters: {'instance_uuid': u'df1bd38c-67cb-4eb0-b2d2-ac08233dadae', 'keypairs': '{"nova_object.version": "1.3", "nova_object.name": "KeyPairList", "nova_object.data": {"objects": []}, "nova_object.namespace": "nova"}', 'pci_requests': '[]', 'vcpu_model': None, 'device_metadata': None, 'created_at': datetime.datetime(2019, 6, 12, 15, 0, 24, 430084), 'updated_at': None, 'numa_topology': None, 'trusted_certs': None, 'deleted': 0, 'migration_context': None, 'flavor': '{"new": null, "old": null, "cur": {"nova_object.version": "1.2", "nova_object.name": "Flavor", "nova_object.data": {"disabled": false, "root_gb": 214 ... (234 characters truncated) ... , "swap": 0, "rxtx_factor": 1.0, "is_public": true, "deleted_at": null, "vcpu_weight": 0, "id": 6, "name": "huge"}, "nova_object.namespace": "nova"}}', 'deleted_at': None}] (Background on this error at: http://sqlalche.me/e/2j85)

  Workaround
  ==========

  On the controller, perform a nova DB sync:

  docker exec -it nova_api nova-manage db sync

  Although this makes no changes to the database (verified with
  mysqldump), it appears to 'fix' nova: new instances created using
  the 'huge' flavor now go to the ERROR state as expected.
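
  For reference, the ordering that avoids the failure in the first
  place is to run the API database migration before the main db sync.
  A sketch only, reusing the nova_api container name from the kolla
  deployment above (other deployments will differ):

  docker exec -it nova_api nova-manage api_db sync
  docker exec -it nova_api nova-manage db sync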

To manage notifications about this bug go to:
https://bugs.launchpad.net/kolla/+bug/1832860/+subscriptions