← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1781286] Re: CantStartEngineError in cell conductor during rebuild

 

OK looking at the stacktrace I see it's not the '_destroy_build_request'
call that's blowing up on reschedule, it's the up-call to get the
availability zone for the next chosen host from the list of alternates:

https://github.com/openstack/nova/blob/39b05ee9e34ae7e7c1854439f887588ec157bc69/nova/conductor/manager.py#L647

And if an AZ is not requested during server create, we are free to move
the instance to another AZ during reschedule. So it seems we've fallen
into a dreaded up-call hole here that needs to be tracked:

https://docs.openstack.org/nova/latest/user/cellsv2-layout.html
#operations-requiring-upcalls

** Summary changed:

- CantStartEngineError in cell conductor during rebuild
+ CantStartEngineError in cell conductor during reschedule - get_host_availability_zone up-call

** Changed in: nova
       Status: New => Triaged

** Changed in: nova
   Importance: Undecided => Medium

** Tags added: cells

** Also affects: nova/pike
   Importance: Undecided
       Status: New

** Also affects: nova/queens
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1781286

Title:
  CantStartEngineError in cell conductor during reschedule -
  get_host_availability_zone up-call

Status in OpenStack Compute (nova):
  Triaged
Status in OpenStack Compute (nova) pike series:
  New
Status in OpenStack Compute (nova) queens series:
  New

Bug description:
  In a stable/queens devstack environment with multiple PowerVM compute
  nodes, everytime I see this in devstack@n-cond-cell1.service logs:

  Jul 11 15:48:57 myhostname nova-conductor[3796]: DEBUG
  nova.conductor.manager [None req-af22375c-f920-4747-bd2f-0de80ee69465
  admin admin] Rescheduling: True {{(pid=4108) build_instances
  /opt/stack/nova/nova/conductor/manager.py:571}}

  it is shortly thereafter followed by:

  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server [None req-af22375c-f920-4747-bd2f-0de80ee69465 admin admin] Exception during message handling: CantStartEngineError: No sql_connection parameter is established
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/server.py", line 163, in _process_incoming
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     res = self.dispatcher.dispatch(message)
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 220, in dispatch
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     return self._do_dispatch(endpoint, method, ctxt, args)
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/dispatcher.py", line 190, in _do_dispatch
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     result = func(ctxt, **new_args)
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/conductor/manager.py", line 652, in build_instances
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     host.service_host))
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/availability_zones.py", line 95, in get_host_availability_zone
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     key='availability_zone')
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_versionedobjects/base.py", line 184, in wrapper
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     result = fn(cls, context, *args, **kwargs)
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/opt/stack/nova/nova/objects/aggregate.py", line 541, in get_by_host
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     _get_by_host_from_db(context, host, key=key)]
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 987, in wrapper
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     with self._transaction_scope(context):
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     return self.gen.next()
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 1037, in _transaction_scope
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     context=context) as resource:
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     return self.gen.next()
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 640, in _session
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     bind=self.connection, mode=self.mode)
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 404, in _create_session
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     self._start()
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 491, in _start
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     engine_args, maker_args)
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server   File "/usr/local/lib/python2.7/dist-packages/oslo_db/sqlalchemy/enginefacade.py", line 513, in _setup_for_connection
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server     "No sql_connection parameter is established")
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server CantStartEngineError: No sql_connection parameter is established
  Jul 11 15:48:57 myhostname nova-conductor[3796]: ERROR oslo_messaging.rpc.server

  The nova_cell1.conf does have [database]connection set:

  [database]
  connection = mysql+pymysql://root:mysql@127.0.0.1/nova_cell1?charset=utf8

  This may be related to https://bugs.launchpad.net/nova/+bug/1736946 ,
  though that is supposedly fixed in stable/queens and the trace is
  different, hence the new defect.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1781286/+subscriptions


References