yahoo-eng-team team mailing list archive

[Bug 1291991] Re: ipmi cmds run too fast, cause BMC to run out of resources

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Dan Prince <dprince@xxxxxxxxxx>
Date: Tue, 09 Sep 2014 12:33:58 -0000
Reply-to: Bug 1291991 <1291991@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

** Changed in: nova
       Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1291991

Title:
  ipmi cmds run too fast, cause BMC to run out of resources

Status in OpenStack Bare Metal Provisioning Service (Ironic):
  Fix Released
Status in OpenStack Compute (Nova):
  Won't Fix

Bug description:
  When using Nova baremetal the IPMI power commands are still proving to
  be too fast. I routinely get stack traces that look like this when
  deleting baremetal instances:

  2 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] ProcessExecutionError: Unexpected error while running command.
  Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Command: ipmitool -I lanplus -H 10.1.8.23 -U ooo-dev -f /tmp/tmpMa8D4u power status
  Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Exit code: 1
  Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Stdout: ''
  Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Stderr: 'Error in open session response message : insufficient resources for session\n\nError: Unable to establish IPMI v2 / RMCP+ session\nUnable to get Chassis Power Status\n'
  Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce]
  Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.931 5112 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: Unexpected error while running command.

  ----

  The root cause seems to be in the _power_off routine, which repeatedly
  calls "power status" to determine whether the instance has powered
  down after issuing the "power off". Once this fails, simply resetting
  the instance state and retrying the delete usually fixes the issue.

  Run by hand on the CLI, the same commands always seem to work.

  Our retry code still seems to be too aggressive: we need to wait
  longer between IPMI retries.
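
  A minimal sketch of what a less aggressive poll could look like, assuming
  an exponential backoff between "power status" calls. This is not the Nova
  or Ironic code; the names wait_for_power_off and ipmi_power_status, and
  the default delays, are hypothetical and chosen only for illustration. The
  ipmitool flags are the ones shown in the trace above.

```python
import subprocess
import time


def ipmi_power_status(host, user, password_file):
    """Run `ipmitool ... power status` once (hypothetical helper).

    Raises subprocess.CalledProcessError if ipmitool exits non-zero, for
    example when the BMC reports "insufficient resources for session".
    """
    cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
           "-f", password_file, "power", "status"]
    return subprocess.check_output(cmd, stderr=subprocess.STDOUT).decode()


def wait_for_power_off(get_status, attempts=6, first_delay=1.0, factor=2.0,
                       max_delay=30.0, sleep=time.sleep):
    """Poll get_status() until it reports power off, backing off each time.

    A failed call is treated as "BMC busy" and retried after the next,
    longer delay. Returns True if the node was seen powered off, False if
    all attempts were used up. The defaults are assumptions, not tuned values.
    """
    delay = first_delay
    for attempt in range(attempts):
        try:
            # ipmitool prints "Chassis Power is off" or "Chassis Power is on".
            if get_status().strip().lower().endswith("off"):
                return True
        except (subprocess.CalledProcessError, OSError):
            pass  # session refused; give the BMC time to free resources
        if attempt < attempts - 1:
            sleep(delay)
            delay = min(delay * factor, max_delay)
    return False
```

  A caller would issue "power off" first and then call, for instance,
  wait_for_power_off(lambda: ipmi_power_status(host, user, pw_file)).
  Growing the delay keeps the number of sessions opened against the BMC
  low while it is still busy tearing down earlier ones.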

To manage notifications about this bug go to:
https://bugs.launchpad.net/ironic/+bug/1291991/+subscriptions

References

[Bug 1291991] [NEW] ipmi cmds run to fast
From: Dan Prince, 2014-03-13