← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1733933] [NEW] nova-conductor is masking error when rescheduling

 

Public bug reported:

Sometimes when build_instance fails on n-cpu, the error that n-cond
receives is mangles like this:

Nov 22 17:39:04 jh-devstack-03 nova-conductor[26556]: ERROR nova.scheduler.utils [None req-fd8acb29-8a2c-4603-8786-54f2580d0ea9 tempest-FloatingIpSameNetwork-1597192363 tempest-FloatingIpSameNetwork-1597192363] 
[instance: 5ee9d527-0043-474e-bfb3-e6621426662e] Error from last host: jh-devstack-03 (node jh-devstack03): [u'Traceback (most recent call last):\n', u'  File "/opt/stack/nova/nova/compute/manager.py", line 1847, in
 _do_build_and_run_instance\n    filter_properties)\n', u'  File "/opt/stack/nova/nova/compute/manager.py", line 2086, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', 
u"RescheduledException: Build of instance 5ee9d527-0043-474e-bfb3-e6621426662e was re-scheduled: operation failed: domain 'instance-00000028' already exists with uuid 
93974d36e3a7-4139bbd8-2d5b51195a5f\n"]
Nov 22 17:39:04 jh-devstack-03 nova-conductor[26556]: WARNING nova.scheduler.utils [None req-fd8acb29-8a2c-4603-8786-54f2580d0ea9 tempest-FloatingIpSameNetwork-1597192363 tempest-FloatingIpSameNetwork-1597192363] 
Failed to compute_task_build_instances: No sql_connection parameter is established: CantStartEngineError: No sql_connection parameter is established
Nov 22 17:39:04 jh-devstack-03 nova-conductor[26556]: WARNING nova.scheduler.utils [None req-fd8acb29-8a2c-4603-8786-54f2580d0ea9 tempest-FloatingIpSameNetwork-1597192363 tempest-FloatingIpSameNetwork-1597192363] 
[instance: 5ee9d527-0043-474e-bfb3-e6621426662e] Setting instance to ERROR state.: CantStartEngineError: No sql_connection parameter is established

Seem to occur quite often in gate, too.
http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Setting%20instance%20to%20ERROR%20state.%3A%20CantStartEngineError%5C%22

The result is that the instance information shows "No sql_connection
parameter is established" instead of the original error, making
debugging the root cause quite difficult.

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1733933

Title:
  nova-conductor is masking error when rescheduling

Status in OpenStack Compute (nova):
  New

Bug description:
  Sometimes when build_instance fails on n-cpu, the error that n-cond
  receives is mangles like this:

  Nov 22 17:39:04 jh-devstack-03 nova-conductor[26556]: ERROR nova.scheduler.utils [None req-fd8acb29-8a2c-4603-8786-54f2580d0ea9 tempest-FloatingIpSameNetwork-1597192363 tempest-FloatingIpSameNetwork-1597192363] 
  [instance: 5ee9d527-0043-474e-bfb3-e6621426662e] Error from last host: jh-devstack-03 (node jh-devstack03): [u'Traceback (most recent call last):\n', u'  File "/opt/stack/nova/nova/compute/manager.py", line 1847, in
   _do_build_and_run_instance\n    filter_properties)\n', u'  File "/opt/stack/nova/nova/compute/manager.py", line 2086, in _build_and_run_instance\n    instance_uuid=instance.uuid, reason=six.text_type(e))\n', 
  u"RescheduledException: Build of instance 5ee9d527-0043-474e-bfb3-e6621426662e was re-scheduled: operation failed: domain 'instance-00000028' already exists with uuid 
  93974d36e3a7-4139bbd8-2d5b51195a5f\n"]
  Nov 22 17:39:04 jh-devstack-03 nova-conductor[26556]: WARNING nova.scheduler.utils [None req-fd8acb29-8a2c-4603-8786-54f2580d0ea9 tempest-FloatingIpSameNetwork-1597192363 tempest-FloatingIpSameNetwork-1597192363] 
  Failed to compute_task_build_instances: No sql_connection parameter is established: CantStartEngineError: No sql_connection parameter is established
  Nov 22 17:39:04 jh-devstack-03 nova-conductor[26556]: WARNING nova.scheduler.utils [None req-fd8acb29-8a2c-4603-8786-54f2580d0ea9 tempest-FloatingIpSameNetwork-1597192363 tempest-FloatingIpSameNetwork-1597192363] 
  [instance: 5ee9d527-0043-474e-bfb3-e6621426662e] Setting instance to ERROR state.: CantStartEngineError: No sql_connection parameter is established

  Seem to occur quite often in gate, too.
  http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22Setting%20instance%20to%20ERROR%20state.%3A%20CantStartEngineError%5C%22

  The result is that the instance information shows "No sql_connection
  parameter is established" instead of the original error, making
  debugging the root cause quite difficult.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1733933/+subscriptions


Follow ups