
yahoo-eng-team team mailing list archive

[Bug 1453350] [NEW] race between neutron port create and nova

 

Public bug reported:


I am doing load testing with tempest scenario tests and see what I think is a race condition between neutron DHCP standup and nova boot. I believe the scenario I am seeing is a more general case of https://bugs.launchpad.net/neutron/+bug/1334447.

Test environment: 5 compute nodes and 1 controller node running all API
and neutron services, on Ubuntu with Juno, hand-patched with the fixes
for bugs 1382064 and 1385257 plus my workaround from bug 1451492.
Otherwise a standard neutron setup.

If I run the tempest scenario test test_server_basic_ops 30 times in
parallel, everything consistently works. If I increase that to 60 in
parallel, I get many failures (see below). Upon investigation, it looks
to me that neutron's standup of the network namespace (netns) and its
dnsmasq process is too slow and loses the race with nova boot, so the
VM comes up without a DHCP-provided IP address, which causes SSH to
time out and fail.
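To make the suspected race concrete, here is a toy Python model under loudly hypothetical assumptions: every test requests its network at the same moment, a single agent stands up one netns plus dnsmasq at a time in a fixed `setup_s` seconds, and each guest's DHCP client starts after `boot_s` seconds and gives up after `dhcp_window_s` more. The class and parameter names are invented for illustration, the numbers are arbitrary, and this is not a description of how the neutron DHCP agent actually schedules work. It only shows why a roughly fixed boot time can win the race once the setup backlog grows with concurrency.

```python
from dataclasses import dataclass


@dataclass
class RaceModel:
    """Toy model of the suspected DHCP-vs-boot race (all parameters hypothetical)."""

    setup_s: float        # time to stand up one netns + dnsmasq
    boot_s: float         # time until the guest starts asking for a DHCP lease
    dhcp_window_s: float  # how long the guest keeps retrying before giving up

    def failures(self, concurrency: int) -> int:
        """Count guests whose dnsmasq is not ready before their DHCP client gives up.

        Assumes all networks are requested at t=0 and set up strictly one
        after another, so the i-th network (0-based) is ready at (i + 1) * setup_s.
        """
        deadline = self.boot_s + self.dhcp_window_s
        return sum(
            1 for i in range(concurrency) if (i + 1) * self.setup_s > deadline
        )


if __name__ == "__main__":
    model = RaceModel(setup_s=2.0, boot_s=30.0, dhcp_window_s=40.0)
    for n in (30, 60):
        print(f"concurrency={n}: {model.failures(n)} guests without an address")
```

With these made-up numbers the model reports 0 failures at concurrency 30 and 25 at concurrency 60; they are not meant to reproduce the 47 failures observed below, only the shape of the problem.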


Traceback (most recent call last):
  File "/home/aqua/tempest/tempest/test.py", line 125, in wrapper
    return f(self, *func_args, **func_kwargs)
  File "/home/aqua/tempest/tempest/scenario/test_server_basic_ops_38.py", line 105, in test_server_basicops
    self.verify_ssh()
  File "/home/aqua/tempest/tempest/scenario/test_server_basic_ops_38.py", line 95, in verify_ssh
    private_key=self.keypair['private_key'])
  File "/home/aqua/tempest/tempest/scenario/manager.py", line 310, in get_remote_client
    linux_client.validate_authentication()
  File "/home/aqua/tempest/tempest/common/utils/linux/remote_client.py", line 55, in validate_authentication
    self.ssh_client.test_connection_auth()
  File "/home/aqua/tempest/tempest/common/ssh.py", line 150, in test_connection_auth
    connection = self._get_ssh_connection()
  File "/home/aqua/tempest/tempest/common/ssh.py", line 87, in _get_ssh_connection
    password=self.password)
tempest.exceptions.SSHTimeout: Connection to the 172.17.205.21 via SSH timed out.
User: cirros, Password: None


Ran 60 tests in 742.931s

FAILED (failures=47)
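The run summary above works out to 47 of 60 tests failing, roughly 78%. When comparing runs at different concurrency levels, a small helper like the following can pull those numbers out of the runner's output. It is a sketch that assumes the `Ran N tests in Xs` and `FAILED (failures=M)` summary lines look as they do in this report; `parse_summary` is a hypothetical name, and a run with no FAILED line is treated as zero failures.

```python
import re
from typing import Tuple


def parse_summary(output: str) -> Tuple[int, int]:
    """Return (tests_run, failures) parsed from a test-runner summary.

    Assumes summary lines shaped like the ones in this report; a run
    without a FAILED line is counted as zero failures.
    """
    ran = re.search(r"Ran (\d+) tests? in [\d.]+s", output)
    if ran is None:
        raise ValueError("no 'Ran N tests in Xs' line found")
    failed = re.search(r"FAILED \(.*?failures=(\d+)", output)
    failures = int(failed.group(1)) if failed else 0
    return int(ran.group(1)), failures


if __name__ == "__main__":
    summary = "Ran 60 tests in 742.931s\n\nFAILED (failures=47)\n"
    run, failures = parse_summary(summary)
    print(f"{failures}/{run} failed ({failures / run:.1%})")  # 47/60 failed (78.3%)
```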


To reproduce the test environment:
1) Check out tempest and remove all tempest scenario tests except test_server_basic_ops.
2) Run this command to make 59 copies of the test, each with its own class name and idempotent_id: for i in {1..59}; do cp -p test_server_basic_ops.py test_server_basic_ops_$i.py; sed --in-place -e "s/class TestServerBasicOps(manager.ScenarioTest):/class TestServerBasicOps$i(manager.ScenarioTest):/" -e "s/        super(TestServerBasicOps, self).setUp()/        super(TestServerBasicOps$i, self).setUp()/" -e "s/    @test.idempotent_id('7fff3fb3-91d8-4fd0-bd7d-0204f1f180ba')/    @test.idempotent_id(\'$(uuidgen)\')/" test_server_basic_ops_$i.py; done
3) Run 30 tests concurrently and observe a successful run: OS_TEST_TIMEOUT=1200 ./run_tempest.sh tempest.scenario -- --concurrency=30
4) Run 60 tests concurrently and observe failures: OS_TEST_TIMEOUT=1200 ./run_tempest.sh tempest.scenario -- --concurrency=60
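For readers who prefer not to unpick the one-line shell loop in step 2, the same transformation can be sketched in Python. This is an illustrative equivalent, not part of the original report: `make_copies` is a hypothetical helper, it uses `uuid.uuid4()` where the shell loop calls `uuidgen`, and unlike `cp -p` it does not preserve file modes or timestamps. It applies the same three substitutions (class name, `super()` call, and `idempotent_id`), presumably so that each copy is treated as a distinct test.

```python
import uuid
from pathlib import Path
from typing import List

# idempotent_id of the original test_server_basic_ops, as used in the sed command
ORIGINAL_ID = "7fff3fb3-91d8-4fd0-bd7d-0204f1f180ba"


def make_copies(src: Path, count: int) -> List[Path]:
    """Write src_1.py .. src_<count>.py next to src, each with a unique class and id."""
    text = src.read_text()
    copies = []
    for i in range(1, count + 1):
        new = text.replace(
            "class TestServerBasicOps(manager.ScenarioTest):",
            f"class TestServerBasicOps{i}(manager.ScenarioTest):",
        )
        new = new.replace(
            "super(TestServerBasicOps, self).setUp()",
            f"super(TestServerBasicOps{i}, self).setUp()",
        )
        # give every copy a fresh random UUID, like $(uuidgen) in the shell loop
        new = new.replace(
            f"@test.idempotent_id('{ORIGINAL_ID}')",
            f"@test.idempotent_id('{uuid.uuid4()}')",
        )
        dst = src.with_name(f"{src.stem}_{i}{src.suffix}")
        dst.write_text(new)
        copies.append(dst)
    return copies
```

Run from the tempest checkout as, for example, `make_copies(Path("tempest/scenario/test_server_basic_ops.py"), 59)`; that relative path is an assumption based on the file locations in the traceback.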

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1453350

Title:
  race between neutron port create and nova

Status in OpenStack Neutron (virtual network service):
  New
