← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1447884] [NEW] Boot from volume, block device allocate timeout cause VM error, but volume would be available later

 

Public bug reported:

When we  try to boot multi instances from volume (with a  large image
source)  at the same time,  we usually got a block device allocate error
as the logs in nova-compute.log:

2015-03-30 23:22:46.920 6445 WARNING nova.compute.manager [-] Volume id: 551ea616-e1c4-4ef2-9bf3-b0ca6d4474dc finished being created but was not set as 'available'
2015-03-30 23:22:47.131 6445 ERROR nova.compute.manager [-] [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] Instance failed block device setup
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] Traceback (most recent call last):
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1829, in _prep_block_device
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     do_check_attach=do_check_attach) +
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 407, in attach_block_devices
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     map(_log_and_attach, block_device_mapping)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 405, in _log_and_attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     bdm.attach(*attach_args, **attach_kwargs)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 339, in attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     do_check_attach=do_check_attach)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 46, in wrapped
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     ret_val = method(obj, context, *args, **kwargs)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 229, in attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     volume_api.check_attach(context, volume, instance=instance)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/volume/cinder.py", line 305, in check_attach
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     raise exception.InvalidVolume(reason=msg)
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] InvalidVolume: Invalid volume: status must be 'available'
2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]

This error cause the VM in "error" status:
+--------------------------------------+------------+--------+----------------------+-------------+----------------------+
| ID                                   | Name       | Status | Task State           | Power State | Networks             |
+--------------------------------------+------------+--------+----------------------+-------------+----------------------+
| 1fa2d7aa-8bd9-4a22-8538-0a07d9dae8aa | inst02     | ERROR  | block_device_mapping | NOSTATE     |                      |
+--------------------------------------+------------+--------+----------------------+-------------+----------------------+
But the volume was in "available" status:
-----------------------+
|                  ID                  |   Status  | Name | Size | Volume Type | Bootable |             Attached to              |
+--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
| a9ab2dc2-b117-44ef-8678-f71067a9e770 | available | None |  2   |     None    |   true   |                                      |
+--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
+--------------------------------------+------------+--------+----------------------+-------------+-------


And when we teminiate this VM, the volume will still exist,  since there is no volume attachment info stored in this VM.

This can be easily reproduced:
1. add the following options  to nova.conf  in compute node ( make sure the error will  be happened even with small image such as cirros):
block_device_allocate_retries = 1
block_device_allocate_retries_interval = 1

2. restart the nova-compute service and try boot one instance from
volume:

nova boot --flavor 1 --block-device source=image,id=929dfa0b-
0b9b-4819-9f84-e4ed22e81f33,dest=volume,size=2,shutdown=remove,bootindex=0
--nic net-id=a302dfa6-0c5d-46ab-9295-8f8b9119fe83 --availability-zone
nova:compute01-juno.ibm.com inst02

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1447884

Title:
  Boot from volume, block device allocate timeout cause VM error, but
  volume would be available later

Status in OpenStack Compute (Nova):
  New

Bug description:
  When we  try to boot multi instances from volume (with a  large image
  source)  at the same time,  we usually got a block device allocate
  error as the logs in nova-compute.log:

  2015-03-30 23:22:46.920 6445 WARNING nova.compute.manager [-] Volume id: 551ea616-e1c4-4ef2-9bf3-b0ca6d4474dc finished being created but was not set as 'available'
  2015-03-30 23:22:47.131 6445 ERROR nova.compute.manager [-] [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] Instance failed block device setup
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] Traceback (most recent call last):
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 1829, in _prep_block_device
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     do_check_attach=do_check_attach) +
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 407, in attach_block_devices
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     map(_log_and_attach, block_device_mapping)
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 405, in _log_and_attach
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     bdm.attach(*attach_args, **attach_kwargs)
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 339, in attach
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     do_check_attach=do_check_attach)
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 46, in wrapped
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     ret_val = method(obj, context, *args, **kwargs)
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/virt/block_device.py", line 229, in attach
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     volume_api.check_attach(context, volume, instance=instance)
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]   File "/usr/lib/python2.6/site-packages/nova/volume/cinder.py", line 305, in check_attach
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]     raise exception.InvalidVolume(reason=msg)
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6] InvalidVolume: Invalid volume: status must be 'available'
  2015-03-30 23:22:47.131 6445 TRACE nova.compute.manager [instance: 483472b2-61b3-4574-95e2-8cd0304f90f6]

  This error cause the VM in "error" status:
  +--------------------------------------+------------+--------+----------------------+-------------+----------------------+
  | ID                                   | Name       | Status | Task State           | Power State | Networks             |
  +--------------------------------------+------------+--------+----------------------+-------------+----------------------+
  | 1fa2d7aa-8bd9-4a22-8538-0a07d9dae8aa | inst02     | ERROR  | block_device_mapping | NOSTATE     |                      |
  +--------------------------------------+------------+--------+----------------------+-------------+----------------------+
  But the volume was in "available" status:
  -----------------------+
  |                  ID                  |   Status  | Name | Size | Volume Type | Bootable |             Attached to              |
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  | a9ab2dc2-b117-44ef-8678-f71067a9e770 | available | None |  2   |     None    |   true   |                                      |
  +--------------------------------------+-----------+------+------+-------------+----------+--------------------------------------+
  +--------------------------------------+------------+--------+----------------------+-------------+-------

  
  And when we teminiate this VM, the volume will still exist,  since there is no volume attachment info stored in this VM.

  This can be easily reproduced:
  1. add the following options  to nova.conf  in compute node ( make sure the error will  be happened even with small image such as cirros):
  block_device_allocate_retries = 1
  block_device_allocate_retries_interval = 1

  2. restart the nova-compute service and try boot one instance from
  volume:

  nova boot --flavor 1 --block-device source=image,id=929dfa0b-
  0b9b-4819-9f84-e4ed22e81f33,dest=volume,size=2,shutdown=remove,bootindex=0
  --nic net-id=a302dfa6-0c5d-46ab-9295-8f8b9119fe83 --availability-zone
  nova:compute01-juno.ibm.com inst02

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1447884/+subscriptions


Follow ups

References