yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #12884
[Bug 1291991] Re: ipmi cmds run too fast, cause BMC to run out of resources
** Changed in: ironic
Status: Fix Committed => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1291991
Title:
ipmi cmds run too fast, cause BMC to run out of resources
Status in Ironic (Bare Metal Provisioning):
Fix Released
Status in OpenStack Compute (Nova):
In Progress
Bug description:
When using Nova baremetal the IPMI power commands are still proving to
be too fast. I routinely get stack traces that look like this when
deleting baremetal instances:
2 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] ProcessExecutionError: Unexpected error while running command.
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Command: ipmitool -I lanplus -H 10.1.8.23 -U ooo-dev -f /tmp/tmpMa8D4u power status
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Exit code: 1
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Stdout: ''
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce] Stderr: 'Error in open session response message : insufficient resources for session\n\nError: Unable to establish IPMI v2 / RMCP+ session\nUnable to get Chassis Power Status\n'
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.351 5112 TRACE nova.compute.manager [instance: 032250f7-3255-47f5-b866-35687dcd14ce]
Mar 12 10:39:33 undercloud-undercloud-7be7u2y6y5cz nova-compute[5112]: 2014-03-12 10:39:33.931 5112 ERROR oslo.messaging.rpc.dispatcher [-] Exception during message handling: Unexpected error while running command.
----
The root cause seems to be in the _power_off routine which repeatedly
calls "power status" to determine if the instance has properly powered
down after issuing the "power off". Once this fails simply resetting
the instance state and retrying the delete again usually fixes the
issue.
On the CLI the same commands always seem to work as well.
It does seem like our retry code is still too aggressive and we need
to wait longer for each IPMI retry.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ironic/+bug/1291991/+subscriptions
References