yahoo-eng-team team mailing list archive

[Bug 1260644] [NEW] ServerRescueTest may fail due to RESCUE taking too long

 

Public bug reported:

In the grenade test [0] for a bp I'm working on, the ServerRescueTestXML
rescue_unrescue test failed because the VM did not get into RESCUE state
in time. It seems that the test is flaky.

From the tempest log [1] I see the sequence: VM ACTIVE, RESCUE issued,
WAIT, timeout, DELETE VM.
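
The WAIT/timeout step above can be pictured as a polling loop with a
deadline. This is only a minimal sketch of that pattern, not the actual
tempest code: wait_for_status, StatusTimeout and the get_status callable
are hypothetical names introduced for illustration.

```python
import time


class StatusTimeout(Exception):
    """Raised when the expected status is not reached before the deadline."""


def wait_for_status(get_status, expected, timeout, interval=1.0):
    """Poll get_status() until it returns `expected` or `timeout` seconds pass.

    get_status is a hypothetical callable standing in for "ask the API
    for the server status"; it is not a real tempest or nova function.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == expected:
            return status
        if time.monotonic() >= deadline:
            raise StatusTimeout(
                "status is %r, expected %r after %s seconds"
                % (status, expected, timeout))
        time.sleep(interval)
```

With a loop like this, a RESCUE request that sits in a queue for longer
than the timeout fails the test even though the request itself was
accepted.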

From the nova cpu log [2], following request ID
req-6c20654c-c00c-4932-87ad-8cfec9866399, I see that the RESCUE RPC is
received immediately by n-cpu, but the request then starves for 3
minutes waiting for a "compute_resources" lock.

The VM is then deleted by the test, and when nova tries to process the
RESCUE it throws an exception as the VM is not there:

bc-b27a-83c39b7566c8] Traceback (most recent call last):
bc-b27a-83c39b7566c8]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2664, in rescue_instance
bc-b27a-83c39b7566c8]     rescue_image_meta, admin_password)
bc-b27a-83c39b7566c8]   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 2109, in rescue
bc-b27a-83c39b7566c8]     write_to_disk=True)
bc-b27a-83c39b7566c8]   File "/opt/stack/new/nova/nova/virt/libvirt/driver.py", line 3236, in to_xml
bc-b27a-83c39b7566c8]     libvirt_utils.write_to_file(xml_path, xml)
bc-b27a-83c39b7566c8]   File "/opt/stack/new/nova/nova/virt/libvirt/utils.py", line 494, in write_to_file
bc-b27a-83c39b7566c8]     with open(path, 'w') as f:
bc-b27a-83c39b7566c8] IOError: [Errno 2] No such file or directory: u'/opt/stack/data/nova/instances/a5099beb-f4a2-47bc-b27a-83c39b7566c8/libvirt.xml'
bc-b27a-83c39b7566c8] 
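
The ordering that leads to this traceback can be reproduced in
isolation. The sketch below is not nova code: simulate_rescue_race is a
hypothetical helper, a plain threading.Lock stands in for the
"compute_resources" lock, and it assumes that deleting the VM removes
its instance directory, which is what the missing libvirt.xml path
suggests.

```python
import errno
import os
import shutil
import tempfile
import threading


def simulate_rescue_race():
    """Return the errno seen by a 'rescue' that runs after the delete."""
    base = tempfile.mkdtemp()
    instance_dir = os.path.join(base, "instance")
    os.mkdir(instance_dir)
    lock = threading.Lock()  # stand-in for the "compute_resources" lock
    result = {}

    def rescue():
        with lock:  # blocks for as long as another task holds the lock
            try:
                path = os.path.join(instance_dir, "libvirt.xml")
                with open(path, "w") as f:
                    f.write("<domain/>")
                result["errno"] = None
            except OSError as exc:
                result["errno"] = exc.errno

    lock.acquire()                # another task already holds the lock
    worker = threading.Thread(target=rescue)
    worker.start()                # RESCUE is received but cannot proceed
    shutil.rmtree(instance_dir)   # the test times out and deletes the VM
    lock.release()                # the lock finally becomes available
    worker.join()
    shutil.rmtree(base, ignore_errors=True)
    return result["errno"]


if __name__ == "__main__":
    print(simulate_rescue_race() == errno.ENOENT)
```

Because the worker cannot take the lock until after the directory is
removed, the write always fails with ENOENT, matching the
"[Errno 2] No such file or directory" above.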

There may be a problem in nova as well, as RESCUE is held for 3 minutes
waiting on a lock.

[0] https://review.openstack.org/#/c/60434/
[1] http://logs.openstack.org/34/60434/5/check/check-grenade-dsvm/1d2852d/logs/tempest.txt.gz
[2] http://logs.openstack.org/34/60434/5/check/check-grenade-dsvm/1d2852d/logs/new/screen-n-cpu.txt.gz?

** Affects: nova
     Importance: Undecided
         Status: New

** Affects: tempest
     Importance: Undecided
         Status: New

** Also affects: nova
   Importance: Undecided
       Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1260644

Title:
  ServerRescueTest may fail due to RESCUE taking too long

Status in OpenStack Compute (Nova):
  New
Status in Tempest:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1260644/+subscriptions

