[Bug 1860991] [NEW] fail to evacuate instance

Public bug reported:

Description
===========
When the VM host (nova-02) loses its network connection, I try to evacuate an instance from nova-02 to another host (nova-04). The status shows no problem, but the instance is broken.

Migrating/live-migrating the instance works without problems.

Steps to reproduce
==================
1. Cut off the running host's networking; after 1 minute, verify the compute services:
[root@tp-osc-01 ~]# openstack compute service list
+-----+----------------+------------+----------+---------+-------+----------------------------+
|  ID | Binary         | Host       | Zone     | Status  | State | Updated At                 |
+-----+----------------+------------+----------+---------+-------+----------------------------+
| 293 | nova-compute   | tp-nova-01 | nova     | enabled | up    | 2020-01-27T14:05:50.000000 |
| 327 | nova-compute   | tp-nova-02 | nova     | enabled | down  | 2020-01-27T14:04:32.000000 |
| 329 | nova-compute   | tp-nova-04 | nova     | enabled | up    | 2020-01-27T14:05:51.000000 |
| 331 | nova-compute   | tp-nova-05 | nova     | enabled | up    | 2020-01-27T14:05:53.000000 |
| 333 | nova-compute   | tp-nova-03 | nova     | enabled | up    | 2020-01-27T14:05:51.000000 |
+-----+----------------+------------+----------+---------+-------+----------------------------+
2. Evacuate the VM instance:
[root@tp-osc-01 ~]# nova evacuate test1
3. Check the VM status (a cross-check sketch follows the output below):
[root@tp-osc-01 ~]# openstack server list
+--------------------------------------+-------+---------+---------------------------+----------------------+--------+
| ID                                   | Name  | Status  | Networks                  | Image                | Flavor |
+--------------------------------------+-------+---------+---------------------------+----------------------+--------+
| 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a | test1 | REBUILD | net_vlan1040=172.22.40.70 | CentOS-7-x86_64-1907 | m1     |
+--------------------------------------+-------+---------+---------------------------+----------------------+--------+
[root@tp-osc-01 ~]# openstack server list
+--------------------------------------+-------+--------+---------------------------+----------------------+--------+
| ID                                   | Name  | Status | Networks                  | Image                | Flavor |
+--------------------------------------+-------+--------+---------------------------+----------------------+--------+
| 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a | test1 | ACTIVE | net_vlan1040=172.22.40.70 | CentOS-7-x86_64-1907 | m1     |
+--------------------------------------+-------+--------+---------------------------+----------------------+--------+
[root@tp-osc-01 ~]# openstack server show test1
+-------------------------------------+------------------------------------------------+
| Field                               | Value                                          |
+-------------------------------------+------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                         |
| OS-EXT-AZ:availability_zone         | nova                                           |
| OS-EXT-SRV-ATTR:host                | tp-nova-04                                     |
| OS-EXT-SRV-ATTR:hypervisor_hostname | tp-nova-04                                     |
| OS-EXT-SRV-ATTR:instance_name       | instance-0000007c                              |
| OS-EXT-STS:power_state              | Running                                        |
| OS-EXT-STS:task_state               | None                                           |
| OS-EXT-STS:vm_state                 | active                                         |
| OS-SRV-USG:launched_at              | 2020-01-27T14:09:19.000000                     |
| OS-SRV-USG:terminated_at            | None                                           |
| accessIPv4                          |                                                |
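
A quick cross-check of steps 2-3 (a minimal sketch, assuming the standard novaclient/openstackclient shipped with this release; output omitted): the migration record and the new host can be listed directly.
[root@tp-osc-01 ~]# nova migration-list
[root@tp-osc-01 ~]# openstack server show test1 -c OS-EXT-SRV-ATTR:host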

Expected result
===============
The instance is successfully evacuated from the dead host to a new one.

Actual result
=============
The instance console log shows a failure to mount /sysroot:

[  OK  ] Started File System Check on /dev/d...806-efd7-4eef-aaa2-2584909365ff.
         Mounting /sysroot...
[    4.088225] SGI XFS with ACLs, security attributes, no debug enabled
[    4.096798] XFS (vda1): Mounting V5 Filesystem
[    4.245558] blk_update_request: I/O error, dev vda, sector 8395962
[    4.252896] blk_update_request: I/O error, dev vda, sector 8396970
[    4.259931] blk_update_request: I/O error, dev vda, sector 8397994
[    4.266486] blk_update_request: I/O error, dev vda, sector 8399018
[    4.272896] blk_update_request: I/O error, dev vda, sector 8400042
[    4.279461] XFS (vda1): xfs_do_force_shutdown(0x1) called from line 1240 of file fs/xfs/xfs_buf.c.  Return address = 0xffffffffc054265c
[    4.290789] XFS (vda1): I/O Error Detected. Shutting down filesystem
[    4.296513] XFS (vda1): Please umount the filesystem and rectify the problem(s)
[    4.304120] XFS (vda1): metadata I/O error: block 0x8014ba ("xlog_bwrite") error 5 numblks 8192
[    4.312798] XFS (vda1): failed to locate log tail
[    4.317410] XFS (vda1): log mount/recovery failed: error -5
[    4.322768] XFS (vda1): log mount failed
[    4.333181] blk_update_request: I/O error, dev vda, sector 0
[FAILED] Failed to mount /sysroot.
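
The I/O errors above point at the RBD-backed root disk rather than at the guest file system itself. A minimal way to inspect the backing image from a Ceph client node (a sketch; the pool name "vms" and the "<uuid>_disk" image naming are assumptions from a typical Nova/RBD setup):
# rbd -p vms info 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a_disk
# rbd -p vms status 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a_disk
"rbd status" lists the registered watchers; a watcher still held from tp-nova-02 would suggest the old QEMU process on the cut-off host keeps the image open while the evacuated copy runs on tp-nova-04.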

Environment
===========
1. CentOS 7.7.1908, openstack-train
# rpm -qa|grep nova
python2-novaclient-15.1.0-1.el7.noarch
openstack-nova-common-20.0.1-1.el7.noarch
openstack-nova-compute-20.0.1-1.el7.noarch
python2-nova-20.0.1-1.el7.noarch
2. Which hypervisor did you use?
Libvirt + KVM
3. Which storage type did you use?
# rpm -qa|grep ceph
python-ceph-argparse-14.2.6-0.el7.x86_64
centos-release-ceph-nautilus-1.2-2.el7.centos.noarch
python-cephfs-14.2.6-0.el7.x86_64
libcephfs2-14.2.6-0.el7.x86_64
ceph-common-14.2.6-0.el7.x86_64
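
To confirm the ephemeral disks really live in Ceph (an assumption here, given the installed ceph packages), the relevant [libvirt] options in nova.conf can be checked:
# grep -E '^(images_type|images_rbd_pool|images_rbd_ceph_conf)' /etc/nova/nova.conf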

Logs & Configs
==============
2020-01-27 22:09:07.488 8033 INFO nova.compute.manager [req-cf1812d7-04ac-4e9c-80e7-a7d59849a6d2 454223bb7b1a4230a9354ac3e6d348ce 2f568d0c51664761bd1f8b3855e07fdc - default default] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] Evacuating instance
2020-01-27 22:09:07.586 8033 INFO nova.compute.claims [req-cf1812d7-04ac-4e9c-80e7-a7d59849a6d2 454223bb7b1a4230a9354ac3e6d348ce 2f568d0c51664761bd1f8b3855e07fdc - default default] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] Claim successful on node tp-nova-04
2020-01-27 22:09:07.814 8033 INFO nova.compute.resource_tracker [req-cf1812d7-04ac-4e9c-80e7-a7d59849a6d2 454223bb7b1a4230a9354ac3e6d348ce 2f568d0c51664761bd1f8b3855e07fdc - default default] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] Updating resource usage from migration 4f1bc10c-879c-494c-9975-c21344b2885a
2020-01-27 22:09:08.766 8033 INFO nova.compute.manager [req-cf1812d7-04ac-4e9c-80e7-a7d59849a6d2 454223bb7b1a4230a9354ac3e6d348ce 2f568d0c51664761bd1f8b3855e07fdc - default default] disk on shared storage, evacuating using existing disk
2020-01-27 22:09:12.096 8033 INFO nova.network.neutronv2.api [req-cf1812d7-04ac-4e9c-80e7-a7d59849a6d2 454223bb7b1a4230a9354ac3e6d348ce 2f568d0c51664761bd1f8b3855e07fdc - default default] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] Updating port 421618b1-f5e7-4059-8afc-ef1b5aeb3cd5 with attributes {'binding:profile': {}, 'device_owner': u'compute:nova', 'binding:host_id': 'tp-nova-04'}
2020-01-27 22:09:12.610 8033 WARNING nova.compute.manager [req-bb068ecf-73cf-484b-9a27-6cb7969d5df9 9a2c4c899358464093aae8a2d1a21fa6 d880e8f1ab2c443ca1b1e974a2045c55 - default default] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] Received unexpected event network-vif-unplugged-421618b1-f5e7-4059-8afc-ef1b5aeb3cd5 for instance with vm_state active and task_state rebuilding.
2020-01-27 22:09:14.652 8033 INFO nova.virt.libvirt.driver [req-cf1812d7-04ac-4e9c-80e7-a7d59849a6d2 454223bb7b1a4230a9354ac3e6d348ce 2f568d0c51664761bd1f8b3855e07fdc - default default] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] Creating image
2020-01-27 22:09:16.115 8033 INFO os_vif [req-cf1812d7-04ac-4e9c-80e7-a7d59849a6d2 454223bb7b1a4230a9354ac3e6d348ce 2f568d0c51664761bd1f8b3855e07fdc - default default] Successfully plugged vif VIFBridge(active=False,address=fa:16:3e:53:dc:ae,bridge_name='brq5bcb7802-b7',has_traffic_filtering=True,id=421618b1-f5e7-4059-8afc-ef1b5aeb3cd5,network=Network(5bcb7802-b789-40b2-95b1-0ae0db5f31b1),plugin='linux_bridge',port_profile=<?>,preserve_on_delete=False,vif_name='tap421618b1-f5')
2020-01-27 22:09:16.715 8033 INFO nova.compute.manager [req-0f2a2985-50a1-45db-ab71-0f2594a4ffa9 - - - - -] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] VM Started (Lifecycle Event)
2020-01-27 22:09:16.803 8033 INFO nova.compute.manager [req-0f2a2985-50a1-45db-ab71-0f2594a4ffa9 - - - - -] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] VM Paused (Lifecycle Event)
2020-01-27 22:09:16.949 8033 INFO nova.compute.manager [req-0f2a2985-50a1-45db-ab71-0f2594a4ffa9 - - - - -] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] During the sync_power process the instance has moved from host tp-nova-02 to host tp-nova-04
2020-01-27 22:09:19.237 8033 INFO nova.compute.resource_tracker [req-9f908fc6-ae50-43e8-84f4-81c574f2cb1a - - - - -] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] Updating resource usage from migration 4f1bc10c-879c-494c-9975-c21344b2885a
2020-01-27 22:09:19.582 8033 INFO nova.compute.manager [req-0f2a2985-50a1-45db-ab71-0f2594a4ffa9 - - - - -] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] VM Resumed (Lifecycle Event)
2020-01-27 22:09:19.591 8033 INFO nova.virt.libvirt.driver [-] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] Instance spawned successfully.
2020-01-27 22:09:19.741 8033 INFO nova.compute.manager [req-0f2a2985-50a1-45db-ab71-0f2594a4ffa9 - - - - -] [instance: 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a] During the sync_power process the instance has moved from host tp-nova-02 to host tp-nova-04
2020-01-27 22:09:19.755 8033 WARNING nova.compute.resource_tracker [req-9f908fc6-ae50-43e8-84f4-81c574f2cb1a - - - - -] Instance 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a has been moved to another host tp-nova-02(tp-nova-02). There are allocations remaining against the source host that might need to be removed: {u'resources': {u'MEMORY_MB': 4096, u'VCPU': 2, u'DISK_GB': 10}}.
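
The final warning can be followed up through the placement API; a minimal sketch using the osc-placement plugin (assuming it is installed; the consumer ID is the instance UUID):
# openstack resource provider allocation show 1d8d3b6d-34f4-4f49-9c19-72c0e84f498a
Allocations left against the source host are normally removed once nova-compute on the failed host starts again and processes the evacuation records.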

** Affects: nova
     Importance: Undecided
         Status: New


--
https://bugs.launchpad.net/nova/+bug/1860991