yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #73767
[Bug 1781286] Re: CantStartEngineError in cell conductor during rebuild
OK looking at the stacktrace I see it's not the '_destroy_build_request'
call that's blowing up on reschedule, it's the up-call to get the
availability zone for the next chosen host from the list of alternates:
https://github.com/openstack/nova/blob/39b05ee9e34ae7e7c1854439f887588ec157bc69/nova/conductor/manager.py#L647
And if an AZ is not requested during server create, we are free to move
the instance to another AZ during reschedule. So it seems we've fallen
into a dreaded up-call hole here that needs to be tracked:
https://docs.openstack.org/nova/latest/user/cellsv2-layout.html
#operations-requiring-upcalls
** Summary changed:
- CantStartEngineError in cell conductor during rebuild
+ CantStartEngineError in cell conductor during reschedule - get_host_availability_zone up-call
** Changed in: nova
Status: New => Triaged
** Changed in: nova
Importance: Undecided => Medium
** Tags added: cells
** Also affects: nova/pike
Importance: Undecided
Status: New
** Also affects: nova/queens
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1781286
Title:
CantStartEngineError in cell conductor during reschedule -
get_host_availability_zone up-call
Status in OpenStack Compute (nova):
Triaged
Status in OpenStack Compute (nova) pike series:
New
Status in OpenStack Compute (nova) queens series:
New
Bug description:
In a stable/queens devstack environment with multiple PowerVM compute
nodes, everytime I see this in devstack@n-cond-cell1.service logs:
Jul 11 15:48:57 myhostname nova-conductor[3796]: DEBUG
nova.conductor.manager [None req-af22375c-f920-4747-bd2f-0de80ee69465
admin admin] Rescheduling: True {{(pid=4108) build_instances
/opt/stack/nova/nova/conductor/manager.py:571}}
it is shortly thereafter followed by:
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server [None req-af22375c-f920-4747-bd2f-0de80ee69465 admin admin] Exception during message handling: CantStartEngineError: No sql_connection parameter is established
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/manager.py", line 652, in build_instances
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server host.service_host))
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/availability_zones.py", line 95, in get_host_availability_zone
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server key='availability_zone')
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server result = fn(cls, context, *args, **kwargs)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/objects/aggregate.py", line 541, in get_by_host
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server _get_by_host_from_db(context, host, key=key)]
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 987, in wrapper
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server with self._transaction_scope(context):
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server return self.gen.next()
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1037, in _transaction_scope
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server context=context) as resource:
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server return self.gen.next()
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 640, in _session
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server bind=self.connection, mode=self.mode)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 404, in _create_session
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server self._start()
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 491, in _start
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server engine_args, maker_args)
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 513, in _setup_for_connection
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server "No sql_connection parameter is established")
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server CantStartEngineError: No sql_connection parameter is established
Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server
The nova_cell1.conf does have [database]connection set:
[database]
connection = mysql+pymysql://root:mysql@127.0.0.1/nova_cell1?charset=utf8
This may be related to https://bugs.launchpad.net/nova/+bug/1736946 ,
though that is supposedly fixed in stable/queens and the trace is
different, hence the new defect.
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1781286/+subscriptions
References