yahoo-eng-team team mailing list archive
Message #32756
[Bug 1453350] [NEW] race between neutron port create and nova
Public bug reported:
I am doing load testing with tempest scenario tests and am seeing what I believe is a race condition between neutron DHCP standup and nova boot. I believe the scenario I am seeing is a more general case of https://bugs.launchpad.net/neutron/+bug/1334447.
Test environment: 5 compute nodes, 1 controller node running all API and
neutron services. Ubuntu Juno, hand-patched with the fixes for bugs 1382064
and 1385257 and my workaround from bug 1451492. Standard neutron setup otherwise.
If I run the tempest scenario test test_server_basic_ops 30 times in
parallel, things consistently work fine. If I increase to 60 in parallel,
I get many failures (see below). Upon investigation, it looks like
neutron's standup of the network namespace and its dnsmasq process is too
slow and loses the race with nova boot: the VM comes up without a
(DHCP-provided) IP address, causing SSH to time out and fail.
Traceback (most recent call last):
  File "/home/aqua/tempest/tempest/test.py", line 125, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "/home/aqua/tempest/tempest/scenario/test_server_basic_ops_38.py", line 105, in test_server_basicops
    self.verify_ssh()
  File "/home/aqua/tempest/tempest/scenario/test_server_basic_ops_38.py", line 95, in verify_ssh
    private_key=self.keypair['private_key'])
  File "/home/aqua/tempest/tempest/scenario/manager.py", line 310, in get_remote_client
    linux_client.validate_authentication()
  File "/home/aqua/tempest/tempest/common/utils/linux/remote_client.py", line 55, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/home/aqua/tempest/tempest/common/ssh.py", line 150, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/home/aqua/tempest/tempest/common/ssh.py", line 87, in _get_ssh_connection
    password=self.password)
tempest.exceptions.SSHTimeout: Connection to the 172.17.205.21 via SSH timed out.
User: cirros, Password: None

Ran 60 tests in 742.931s
FAILED (failures=47)
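The failure mode above can be demonstrated in miniature without OpenStack: a client that gives up after its first attempt loses any race against a slow-starting service, while a bounded retry loop tolerates the lag. The helper below is a hypothetical sketch (the name `wait_for` and its parameters are mine, not from tempest); the "service" is simulated by a marker file that appears after a delay, the way dnsmasq in the qdhcp namespace lags nova boot:

```shell
#!/bin/sh
# wait_for CMD TIMEOUT_SECS: retry CMD once per second until it succeeds
# or TIMEOUT_SECS seconds elapse. Returns 0 on success, 1 on timeout.
wait_for() {
    cmd=$1
    timeout=$2
    elapsed=0
    while ! sh -c "$cmd" >/dev/null 2>&1; do
        elapsed=$((elapsed + 1))
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep 1
    done
    return 0
}

# Simulate a slow-starting service: the marker file only appears after ~2s.
marker=$(mktemp -u)
( sleep 2; touch "$marker" ) &

wait_for "test -e $marker" 10 && echo "ready" || echo "timed out"  # prints "ready"
rm -f "$marker"
```

A single-shot `test -e $marker` at time zero would fail every time; the retry loop succeeds as soon as the service catches up, which is essentially what a DHCP-retry in the guest (or a longer SSH timeout in tempest) buys you.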
To reproduce the test environment:
1) check out tempest and remove all tempest scenario tests except test_server_basic_ops
2) make 59 copies of the test:
   for i in {1..59}; do
     cp -p test_server_basic_ops.py test_server_basic_ops_$i.py
     sed --in-place \
       -e "s/class TestServerBasicOps(manager.ScenarioTest):/class TestServerBasicOps$i(manager.ScenarioTest):/" \
       -e "s/ super(TestServerBasicOps, self).setUp()/ super(TestServerBasicOps$i, self).setUp()/" \
       -e "s/ @test.idempotent_id('7fff3fb3-91d8-4fd0-bd7d-0204f1f180ba')/ @test.idempotent_id(\'$(uuidgen)\')/" \
       test_server_basic_ops_$i.py
   done
3) run 30 tests and observe a successful run: OS_TEST_TIMEOUT=1200 ./run_tempest.sh tempest.scenario -- --concurrency=30
4) run 60 tests and observe failures: OS_TEST_TIMEOUT=1200 ./run_tempest.sh tempest.scenario -- --concurrency=60
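To check the timing theory directly, one can measure how long after boot the dnsmasq process for the network's qdhcp namespace actually appears. The helper below is a hedged sketch: `time_until` is my own generic polling function, and the commented `ip netns pids qdhcp-$NET_ID` invocation assumes the standard juno DHCP-agent namespace layout (`NET_ID` is a placeholder for your neutron network UUID). The runnable demonstration polls a local stand-in instead:

```shell
#!/bin/sh
# time_until CHECK_CMD MAX_SECS: print the number of whole seconds that
# elapse before CHECK_CMD first succeeds (polling once per second), or
# "never" if it does not succeed within MAX_SECS.
time_until() {
    check=$1
    max=$2
    t=0
    while [ "$t" -lt "$max" ]; do
        if sh -c "$check" >/dev/null 2>&1; then
            echo "$t"
            return 0
        fi
        sleep 1
        t=$((t + 1))
    done
    echo "never"
    return 1
}

# On the controller you would point it at the DHCP namespace, e.g.
# (NET_ID is an assumed placeholder, substitute your network UUID):
#   time_until "ip netns pids qdhcp-$NET_ID | xargs -r ps -o comm= -p | grep -q dnsmasq" 120
#
# Local stand-in: a marker file that becomes visible after ~2 seconds.
marker=$(mktemp -u)
( sleep 2; touch "$marker" ) &
time_until "test -e $marker" 10
rm -f "$marker"
```

Comparing that number against the interval between `nova boot` and the guest's first DHCPDISCOVER would show directly whether dnsmasq standup is losing the race at 60-way concurrency.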
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1453350