yahoo-eng-team team mailing list archive

[Bug 2102034] [NEW] Some instance artefacts remain after evacuation

 

Public bug reported:

1) Deploy several VMs on one hypervisor, e.g. 25:
for i in {1..25}; do openstack --os-compute-api-version 2.74 server create --flavor m1.tiny --image cirros-0.6.3-x86_64-disk --network private1 --host comp-01 vm_$i; done

2) Crash the hypervisor that hosts the VMs (run on the hypervisor itself):
echo c > /proc/sysrq-trigger

3) Wait until the compute service is reported as "down":
openstack compute service list --service nova-compute --long

4) Hard-reboot the failed hypervisor (via Horizon) and immediately evacuate all VMs:
for i in {1..25}; do openstack server evacuate vm_$i; done

5) Wait until all VMs are evacuated:
openstack server list --long

6) Wait until the compute service is reported as "up":
openstack compute service list --service nova-compute --long

7) Check for artefacts of the evacuated VMs on the failed hypervisor, e.g.:
ls /var/lib/docker/volumes/nova_compute/_data/instances

8) Try to live-migrate all VMs back to the failed hypervisor:
for i in {1..25}; do openstack --os-compute-api-version 2.30 server migrate --live-migration --block-migration --host comp-01 vm_$i; done
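
Steps 3 and 6 both mean "re-run the listing until the state changes". A minimal sketch of a polling helper that could automate this is shown below. The function names wait_until and service_state_is are hypothetical, and the predicate assumes that the "State" column of the service listing prints "up" or "down" for the given host.

```shell
# wait_until MAX_ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND until it exits 0. Returns 1 if it still fails
# after MAX_ATTEMPTS tries.
wait_until() {
    wu_max="$1"
    wu_interval="$2"
    shift 2
    wu_attempt=0
    while [ "$wu_attempt" -lt "$wu_max" ]; do
        if "$@"; then
            return 0
        fi
        wu_attempt=$((wu_attempt + 1))
        sleep "$wu_interval"
    done
    return 1
}

# Hypothetical predicate: succeeds when nova-compute on host $1
# reports state $2 ("up" or "down"). Assumes the "State" column.
service_state_is() {
    [ "$(openstack compute service list --service nova-compute --host "$1" -f value -c State)" = "$2" ]
}
```

For example, "wait_until 120 5 service_state_is comp-01 down" would poll every 5 seconds for up to 120 attempts before giving up.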

Expected:
No artefacts of the evacuated VMs remain, and the live migrations succeed.

Actual:
Artefacts of the evacuated VMs remain on the failed hypervisor, and the live migrations succeed only on the second try.
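
The check in step 7 can be made less error-prone than reading the ls output by eye. The sketch below assumes that each instance keeps its files in a directory named after the instance UUID under the instances path. The function name leftover_instance_dirs is hypothetical.

```shell
# leftover_instance_dirs INSTANCES_DIR UUID...
# Prints every given UUID that still has a directory under INSTANCES_DIR.
# Prints nothing when no artefacts are left.
leftover_instance_dirs() {
    lid_dir="$1"
    shift
    for lid_uuid in "$@"; do
        if [ -d "$lid_dir/$lid_uuid" ]; then
            printf '%s\n' "$lid_uuid"
        fi
    done
}
```

On the failed hypervisor it could be called with the instances path from step 7 and the UUIDs of the evacuated VMs, which can probably be collected with something like "openstack server list --name '^vm_' -f value -c ID" (an assumption, adjust to the deployment).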


Troubleshooting:

1) During initialization, the compute service tries to destroy the artefacts of evacuated VMs.
The service processes evacuations in every status except "failed" and "completed".
For each one it checks whether the instance storage is shared; if it is, the instance disk is not destroyed.
To check, the service creates a temporary file in the instance directory and asks the instance's new host (via an RPC request) whether it can see that file. If the answer is "yes", or if no reply arrives, the service treats the instance storage as shared.
2) If the evacuation started recently, the RPC request always times out.
The service then leaves the instance artefacts in place and sets the evacuation status to "completed".
That evacuation is never processed again, so the instance artefacts stay there forever.
3) New evacuations that start while the evacuation cleanup is running are not processed either.
4) If an evacuation is going to fail, the service always gets an RPC timeout and behaves as in item 2.
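
The decision described in items 1 and 2 can be summarised as a toy model. This is not Nova code: the function name disk_cleanup_decision and the reply values are hypothetical and only restate the behaviour reported above.

```shell
# disk_cleanup_decision REPLY
# REPLY is the outcome of the shared-storage probe sent to the new host:
#   yes     - the new host sees the temporary file (storage is shared)
#   no      - the new host does not see it (storage is local)
#   timeout - no reply arrived
disk_cleanup_decision() {
    case "$1" in
        no)
            echo "destroy-disks"
            ;;
        yes|timeout)
            # A missing reply is treated the same as "shared". Per the
            # report, the evacuation is then marked "completed" and the
            # artefacts are never revisited.
            echo "keep-disks"
            ;;
        *)
            echo "unknown reply: $1" >&2
            return 2
            ;;
    esac
}
```

The "timeout" branch is the problematic one: the probe gave no information about the storage, yet the outcome is the same as for confirmed shared storage.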

** Affects: nova
     Importance: Undecided
     Assignee: Mitya Eremeev (mitos)
         Status: In Progress

** Changed in: nova
     Assignee: (unassigned) => Mitya Eremeev (mitos)

** Changed in: nova
       Status: New => In Progress

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/2102034

Title:
  Some instance artefacts remain after evacuation

Status in OpenStack Compute (nova):
  In Progress

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/2102034/+subscriptions