yahoo-eng-team team mailing list archive
Message #63147
[Bug 1681998] [NEW] Bypass the dirty BDM entry no matter how it is produced
Public bug reported:
Sometimes the following dirty BDM entry (row 1) can be seen in the
database: multiple BDMs exist with the same volume_id and instance_uuid.
mysql> select * from block_device_mapping where volume_id='153bcab4-1f88-440c-9782-3c661a7502a8' \G
*************************** 1. row ***************************
created_at: 2017-02-02 02:28:45
updated_at: NULL
deleted_at: NULL
id: 9754
device_name: /dev/vdb
delete_on_termination: 0
snapshot_id: NULL
volume_id: 153bcab4-1f88-440c-9782-3c661a7502a8
volume_size: NULL
no_device: NULL
connection_info: NULL
instance_uuid: b52f9264-d8b3-406a-bf9b-d7d7471b13fc
deleted: 0
source_type: volume
destination_type: volume
guest_format: NULL
device_type: NULL
disk_bus: NULL
boot_index: NULL
image_id: NULL
*************************** 2. row ***************************
created_at: 2017-02-02 02:29:31
updated_at: 2017-02-27 10:59:42
deleted_at: NULL
id: 9757
device_name: /dev/vdc
delete_on_termination: 0
snapshot_id: NULL
volume_id: 153bcab4-1f88-440c-9782-3c661a7502a8
volume_size: NULL
no_device: NULL
connection_info: {"driver_volume_type": "rbd", "serial": "153bcab4-1f88-440c-9782-3c661a7502a8", "data": {"secret_type": "ceph", "name": "cinder-ceph/volume-153bcab4-1f88-440c-9782-3c661a7502a8", "secret_uuid": null, "qos_specs": null, "hosts": ["10.7.1.202", "10.7.1.203", "10.7.1.204"], "auth_enabled": true, "access_mode": "rw", "auth_username": "cinder-ceph", "ports": ["6789", "6789", "6789"]}}
instance_uuid: b52f9264-d8b3-406a-bf9b-d7d7471b13fc
deleted: 0
source_type: volume
destination_type: volume
guest_format: NULL
device_type: disk
disk_bus: virtio
boot_index: NULL
image_id: NULL
This then causes the volume detach to fail with the following error,
since the connection_info of row 1 is NULL.
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher self._detach_volume(context, instance, bdm)
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4801, in _detach_volume
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher connection_info = jsonutils.loads(bdm.connection_info)
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_serialization/jsonutils.py", line 215, in loads
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher return json.loads(encodeutils.safe_decode(s, encoding), **kwargs)
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher File "/usr/lib/python2.7/dist-packages/oslo_utils/encodeutils.py", line 33, in safe_decode
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher raise TypeError("%s can't be decoded" % type(text))
2017-03-23 13:28:05.360 1865733 TRACE oslo_messaging.rpc.dispatcher TypeError: <type 'NoneType'> can't be decoded
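The failure itself is easy to reproduce in isolation, since oslo_serialization's jsonutils.loads() hands its argument to encodeutils.safe_decode(), which rejects None. A minimal reproduction (assuming oslo.serialization is installed):

from oslo_serialization import jsonutils

# connection_info comes back as None for BDM row 1 above, because the
# column is NULL; jsonutils.loads(None) then raises the TypeError seen
# in the trace ("... can't be decoded").
connection_info = None
try:
    jsonutils.loads(connection_info)
except TypeError as exc:
    print(exc)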
This kind of dirty data can be produced when the call to volume_bdm.destroy() in _attach_volume() [1] fails to run. I think these conditions may cause it to happen (modelled in the sketch below):
1. losing the database connection during volume_bdm.destroy()
2. losing the MQ connection, or an RPC timeout, during volume_bdm.destroy()
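For context, the attach path referenced in [1] creates the BDM record before doing the attach work and only removes it again via volume_bdm.destroy() in the failure path. The snippet below is an illustrative, self-contained model of that pattern (not nova code; all names are made up) showing how a failed rollback leaves the NULL-connection_info row behind:

class FakeBDM(object):
    # Stand-in for the BlockDeviceMapping object; real nova persists this
    # row in the database as soon as the attach request is accepted.
    def __init__(self, volume_id):
        self.volume_id = volume_id
        self.connection_info = None   # stays NULL until the host attaches

    def destroy(self, db_reachable=True):
        if not db_reachable:
            # Models conditions 1/2: the DB or MQ goes away mid-rollback.
            raise RuntimeError("lost DB/MQ connection during destroy()")


def attach_volume(volume_id, attach_ok=True, db_reachable=True):
    volume_bdm = FakeBDM(volume_id)        # row exists before the attach
    try:
        if not attach_ok:
            raise RuntimeError("reserve/attach failed")
    except Exception:
        volume_bdm.destroy(db_reachable)   # rollback; may itself fail
        raise
    return volume_bdm


try:
    attach_volume("153bcab4-1f88-440c-9782-3c661a7502a8",
                  attach_ok=False, db_reachable=False)
except RuntimeError:
    # The rollback never completed, so the row with connection_info NULL
    # is left behind, matching row 1 in the query output above.
    pass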
If you lose the database during any operation, things are going to be
bad, so in general I'm not sure how realistic guarding for that case is.
Losing an MQ connection or RPC timing out is probably more realistic.
The fix [2] seems to be aimed at point 2.
However, I'm wondering whether we can bypass the dirty BDM entry whenever
its connection_info is NULL, no matter how it was produced.
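A rough sketch of what such a guard could look like in the detach path (illustrative only, not a proposed patch; the helper name is made up): skip jsonutils.loads() when connection_info is NULL so the caller can simply destroy the dirty BDM instead of tracing out.

from oslo_serialization import jsonutils

def _load_connection_info(bdm):
    # Dirty entry left behind by a failed volume_bdm.destroy(): nothing was
    # ever connected on the host, so there is nothing to disconnect and the
    # caller can skip the driver detach and just destroy the BDM record.
    if bdm.connection_info is None:
        return None
    return jsonutils.loads(bdm.connection_info)

_detach_volume() would then branch on the None result, destroying the BDM record without calling down to the virt driver or Cinder.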
[1] https://github.com/openstack/nova/blob/master/nova/compute/api.py#L3724
[2] https://review.openstack.org/#/c/290793
** Affects: nova
Importance: Undecided
Status: New
--
https://bugs.launchpad.net/bugs/1681998