[Bug 1291991] [NEW] ipmi cmds run too fast

Public bug reported:

When using Nova baremetal, the IPMI power commands are still being
issued too quickly. I routinely get stack traces like the following when
deleting baremetal instances:

2 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] ProcessExecutionError: Unexpected error while running command.
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Command: ipmitool -I lanplus -H 10.1.8.23 -U ooo-dev -f /tmp/tmpMa8D4u power status
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Exit code: 1
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Stdout: ''
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Stderr: 'Error in open session response message : insufficient resources for session\n\nError: Unable to establish IPMI v2 / RMCP+ session\nUnable to get Chassis Power Status\n'
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce]
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.931 5112 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: Unexpected error while running command.

----

The root cause seems to be in the _power_off routine, which repeatedly
calls "power status" to determine whether the instance has actually
powered down after the "power off" is issued. Once this fails, simply
resetting the instance state and retrying the delete usually fixes the
issue.
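For illustration only, here is a minimal sketch of that polling pattern
using plain subprocess calls (this is not the actual nova driver code;
the function names and arguments are made up for the example):

    import subprocess

    def ipmitool(host, user, password_file, *args):
        # Shell out to ipmitool the same way the log above shows.
        cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
               "-f", password_file] + list(args)
        return subprocess.check_output(cmd).decode()

    def power_off_and_poll(host, user, password_file, retries=10):
        # Issue "power off", then immediately poll "power status" until
        # the node reports off.  Every poll opens a new RMCP+ session;
        # with no pause between polls some BMCs run out of session slots
        # and reply "insufficient resources for session", which is
        # exactly the error in the trace above.
        ipmitool(host, user, password_file, "power", "off")
        for _ in range(retries):
            if "off" in ipmitool(host, user, password_file,
                                 "power", "status"):
                return True
        return False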

The same commands always seem to work when run manually from the CLI.

It does seem like our retry code is still too aggressive and that we
need to wait longer between IPMI retries.
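One possible direction, purely as a hedged sketch (the delay and retry
numbers below are placeholders, not existing config options): sleep
between status polls and treat a failed session as transient, e.g.:

    import subprocess
    import time

    def wait_for_power_off(host, user, password_file,
                           retries=10, delay=5):
        # Same polling loop, but pause between attempts so the BMC has
        # time to release the previous session before we open another.
        cmd = ["ipmitool", "-I", "lanplus", "-H", host, "-U", user,
               "-f", password_file, "power", "status"]
        for _ in range(retries):
            try:
                if "off" in subprocess.check_output(cmd).decode():
                    return True
            except subprocess.CalledProcessError:
                # e.g. "Unable to establish IPMI v2 / RMCP+ session";
                # treat it as transient and retry after the delay.
                pass
            time.sleep(delay)
        return False

Making the delay (and/or the retry count) configurable would let
deployers tune it per BMC.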

** Affects: ironic
     Importance: Undecided
     Assignee: Dan Prince (dan-prince)
         Status: In Progress

** Affects: nova
     Importance: High
     Assignee: Dan Prince (dan-prince)
         Status: In Progress

** Changed in: nova
     Assignee: (unassigned) => Dan Prince (dan-prince)

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova
       Status: New => In Progress

** Also affects: ironic
   Importance: Undecided
       Status: New

** Changed in: ironic
     Assignee: (unassigned) => Dan Prince (dan-prince)

** Changed in: ironic
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1291991

Title:
  ipmi cmds run too fast

Status in Ironic (Bare Metal Provisioning):
  In Progress
Status in OpenStack Compute (Nova):
  In Progress

Bug description:
  (See the full description above.)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ironic/+bug/1291991/+subscriptions

