
yahoo-eng-team team mailing list archive

[Bug 1462305] [NEW] multi-node test causes nova-compute to lockup

 

Public bug reported:

It's not very clear what's going on here, but here is the symptom.

One of the nova-compute nodes appears to lock up:
http://logs.openstack.org/67/175067/2/check/check-tempest-dsvm-multinode-full/7a95fb0/logs/screen-n-cpu.txt.gz#_2015-05-29_23_27_48_296
It was just completing the termination of an instance:
http://logs.openstack.org/67/175067/2/check/check-tempest-dsvm-multinode-full/7a95fb0/logs/screen-n-cpu.txt.gz#_2015-05-29_23_27_48_153

This is also seen in the scheduler reporting the node as down:
http://logs.openstack.org/67/175067/2/check/check-tempest-dsvm-multinode-full/7a95fb0/logs/screen-n-sch.txt.gz#_2015-05-29_23_31_02_711

On further inspection, it seems the other nova-compute node had just started a migration:
http://logs.openstack.org/67/175067/2/check/check-tempest-dsvm-multinode-full/7a95fb0/logs/subnode-2/screen-n-cpu.txt.gz#_2015-05-29_23_27_48_079


We have had issues in the past where oslo locks can lead to deadlocks; it's not totally clear if that's happening here. All the periodic tasks happen in the same greenlet, so you can stop them from running if you hold a lock in an RPC call that's being processed, etc. No idea if that's happening here though.
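To illustrate the hazard described above: a minimal sketch of how holding a lock inside an RPC handler can starve a periodic task that needs the same lock. Plain threading stands in for eventlet greenthreads here, and all the names (compute_lock, rpc_handler, periodic_task) are illustrative, not nova's actual code.

```python
import threading
import time

compute_lock = threading.Lock()  # stands in for an oslo-style named lock
events = []

def rpc_handler():
    # An RPC call (e.g. terminating an instance) takes the lock and then
    # blocks on slow work while still holding it.
    with compute_lock:
        events.append("rpc: lock held")
        time.sleep(0.5)  # slow virt-driver call, migration, etc.
    events.append("rpc: lock released")

def periodic_task():
    # The periodic task needs the same lock; while the RPC handler
    # holds it, the task (and the service heartbeat behind it) stalls,
    # so the scheduler eventually reports the node as down.
    if compute_lock.acquire(timeout=0.1):
        events.append("periodic: ran")
        compute_lock.release()
    else:
        events.append("periodic: starved")

t = threading.Thread(target=rpc_handler)
t.start()
time.sleep(0.05)   # let the handler grab the lock first
periodic_task()    # simulates the periodic-task greenthread firing
t.join()
```

With real eventlet the situation is worse than this sketch suggests: all greenthreads share one OS thread, so a handler that blocks without yielding prevents the periodic-task greenthread from being scheduled at all, lock or no lock.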

** Affects: nova
     Importance: Undecided
     Assignee: Joe Gordon (jogo)
         Status: Incomplete


** Tags: testing

** Changed in: nova
       Status: New => Incomplete

** Changed in: nova
     Assignee: (unassigned) => Joe Gordon (jogo)

** Tags added: testing

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1462305


To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1462305/+subscriptions

