yahoo-eng-team team mailing list archive
Message #33461
[Bug 1462305] [NEW] multi-node test causes nova-compute to lockup
Public bug reported:
It's not very clear what's going on here, but here is the symptom.
One of the nova-compute nodes appears to lock up:
http://logs.openstack.org/67/175067/2/check/check-tempest-dsvm-multinode-full/7a95fb0/logs/screen-n-cpu.txt.gz#_2015-05-29_23_27_48_296
It was just completing the termination of an instance:
http://logs.openstack.org/67/175067/2/check/check-tempest-dsvm-multinode-full/7a95fb0/logs/screen-n-cpu.txt.gz#_2015-05-29_23_27_48_153
This is also seen in the scheduler reporting the node as down:
http://logs.openstack.org/67/175067/2/check/check-tempest-dsvm-multinode-full/7a95fb0/logs/screen-n-sch.txt.gz#_2015-05-29_23_31_02_711
On further inspection it seems like the other nova-compute node had just started a migration:
http://logs.openstack.org/67/175067/2/check/check-tempest-dsvm-multinode-full/7a95fb0/logs/subnode-2/screen-n-cpu.txt.gz#_2015-05-29_23_27_48_079
We have had issues in the past where oslo locks can lead to deadlocks; it's not totally clear if that's happening here. All the periodic tasks run in the same greenlet, so you can stop them from running if you hold a lock in an RPC call that's being processed, etc. No idea if that's happening here, though.
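To illustrate the failure mode described above (not nova's actual code): if an RPC handler and a periodic task share one lock and run cooperatively in the same thread of control, the periodic task is starved for as long as the handler holds the lock. The function names and the single service-wide lock here are hypothetical; a real deadlock would also need the handler to wait on something the periodic task produces.

```python
import threading

# Hypothetical single service-wide lock shared by an RPC handler and a
# periodic task, mirroring the oslo-style synchronized pattern.
service_lock = threading.Lock()
events = []

def periodic_update_available_resource():
    # Periodic tasks share the same greenlet, so they must not block;
    # a non-blocking acquire shows the task simply being starved.
    if service_lock.acquire(blocking=False):
        try:
            events.append("periodic: updated resources")
            return True
        finally:
            service_lock.release()
    return False

def rpc_terminate_instance():
    # The RPC handler holds the lock for the duration of the call.
    with service_lock:
        events.append("rpc: holding lock")
        # While the lock is held, the periodic task cannot make progress.
        ran = periodic_update_available_resource()
        events.append("periodic ran under lock: %s" % ran)

rpc_terminate_instance()
periodic_update_available_resource()  # succeeds once the lock is free
```

If the handler then blocked waiting for state that only the starved periodic task could update, the service would appear hung exactly as in the logs, with the scheduler eventually reporting the node as down.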
** Affects: nova
Importance: Undecided
Assignee: Joe Gordon (jogo)
Status: Incomplete
** Tags: testing
** Changed in: nova
Status: New => Incomplete
** Changed in: nova
Assignee: (unassigned) => Joe Gordon (jogo)
** Tags added: testing
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1462305
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1462305/+subscriptions