← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1492249] [NEW] regression in network performance causing large-ops jobs failures

 

Public bug reported:

We're getting a pretty increased failure rate with large-ops jobs in
Nova, and it looks like it's around message timeouts in tearing down
networks. This very distinctly started showing up this week in the race
to merge code.

Example failure: http://logs.openstack.org/29/220229/3/check/gate-
tempest-dsvm-large-
ops/4b9bd8f/logs/screen-n-cpu-1.txt.gz#_2015-09-04_10_47_08_361


2015-09-04 10:47:08.226 ERROR nova.compute.manager [req-bd86a3c8-121b-43c9-ac02-c1d3dcdb2136 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Failed to deallocate network for instance.
2015-09-04 10:47:08.329 DEBUG nova.compute.manager [req-08f02702-32ba-44f2-aef3-269a0eff3550 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: 2999f5a3-746b-4c7a-9dcd-0f82c031f540] terminating bdm BlockDeviceMapping(boot_index=0,connection_info=None,created_at=2015-09-04T10:42:57Z,delete_on_termination=True,deleted=False,deleted_at=None,destination_type='local',device_name='/dev/sda',device_type='disk',disk_bus=None,guest_format=None,id=281,image_id='4850537c-18cc-438b-b154-1aafb3c36ec8',instance=<?>,instance_uuid=2999f5a3-746b-4c7a-9dcd-0f82c031f540,no_device=False,snapshot_id=None,source_type='image',updated_at=2015-09-04T10:43:54Z,volume_id=None,volume_size=None) _cleanup_volumes /opt/stack/new/nova/nova/compute/manager.py:2299
2015-09-04 10:47:08.361 ERROR nova.compute.manager [req-bd86a3c8-121b-43c9-ac02-c1d3dcdb2136 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Setting instance vm_state to ERROR
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Traceback (most recent call last):
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2396, in do_terminate_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     self._delete_instance(context, instance, bdms, quotas)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/hooks.py", line 149, in inner
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     rv = f(*args, **kwargs)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2375, in _delete_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     quotas.rollback()
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     six.reraise(self.type_, self.value, self.tb)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2338, in _delete_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     self._shutdown_instance(context, instance, bdms)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2265, in _shutdown_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     self._try_deallocate_network(context, instance, requested_networks)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2194, in _try_deallocate_network
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     self._set_instance_obj_error_state(context, instance)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     six.reraise(self.type_, self.value, self.tb)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2189, in _try_deallocate_network
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     self._deallocate_network(context, instance, requested_networks)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 1812, in _deallocate_network
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     context, instance, requested_networks=requested_networks)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/network/api.py", line 49, in wrapped
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     return func(self, context, *args, **kwargs)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/network/api.py", line 297, in deallocate_for_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     requested_networks=requested_networks)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/network/rpcapi.py", line 175, in deallocate_for_instance
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     return cctxt.call(ctxt, 'deallocate_for_instance', **kwargs)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     retry=self.retry)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     timeout=timeout, retry=retry)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     retry=retry)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 420, in _send
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     result = self._waiter.wait(msg_id, timeout)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 318, in wait
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     message = self.waiters.get(msg_id, timeout=timeout)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 223, in get
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     'to message ID %s' % msg_id)
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] MessagingTimeout: Timed out waiting for a reply to message ID 4ce92e0a720b4f7fb1b26e1af1eff6ae
2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]

** Affects: nova
     Importance: High
         Status: Confirmed


** Tags: gate

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova
       Status: New => Confirmed

** Tags added: gate

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1492249

Title:
  regression in network performance causing large-ops jobs failures

Status in OpenStack Compute (nova):
  Confirmed

Bug description:
  We're getting a pretty increased failure rate with large-ops jobs in
  Nova, and it looks like it's around message timeouts in tearing down
  networks. This very distinctly started showing up this week in the
  race to merge code.

  Example failure: http://logs.openstack.org/29/220229/3/check/gate-
  tempest-dsvm-large-
  ops/4b9bd8f/logs/screen-n-cpu-1.txt.gz#_2015-09-04_10_47_08_361

  
  2015-09-04 10:47:08.226 ERROR nova.compute.manager [req-bd86a3c8-121b-43c9-ac02-c1d3dcdb2136 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Failed to deallocate network for instance.
  2015-09-04 10:47:08.329 DEBUG nova.compute.manager [req-08f02702-32ba-44f2-aef3-269a0eff3550 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: 2999f5a3-746b-4c7a-9dcd-0f82c031f540] terminating bdm BlockDeviceMapping(boot_index=0,connection_info=None,created_at=2015-09-04T10:42:57Z,delete_on_termination=True,deleted=False,deleted_at=None,destination_type='local',device_name='/dev/sda',device_type='disk',disk_bus=None,guest_format=None,id=281,image_id='4850537c-18cc-438b-b154-1aafb3c36ec8',instance=<?>,instance_uuid=2999f5a3-746b-4c7a-9dcd-0f82c031f540,no_device=False,snapshot_id=None,source_type='image',updated_at=2015-09-04T10:43:54Z,volume_id=None,volume_size=None) _cleanup_volumes /opt/stack/new/nova/nova/compute/manager.py:2299
  2015-09-04 10:47:08.361 ERROR nova.compute.manager [req-bd86a3c8-121b-43c9-ac02-c1d3dcdb2136 tempest-TestLargeOpsScenario-1437588360 tempest-TestLargeOpsScenario-2081713896] [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Setting instance vm_state to ERROR
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] Traceback (most recent call last):
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2396, in do_terminate_instance
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     self._delete_instance(context, instance, bdms, quotas)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/hooks.py", line 149, in inner
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     rv = f(*args, **kwargs)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2375, in _delete_instance
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     quotas.rollback()
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     six.reraise(self.type_, self.value, self.tb)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2338, in _delete_instance
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     self._shutdown_instance(context, instance, bdms)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2265, in _shutdown_instance
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     self._try_deallocate_network(context, instance, requested_networks)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2194, in _try_deallocate_network
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     self._set_instance_obj_error_state(context, instance)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     six.reraise(self.type_, self.value, self.tb)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2189, in _try_deallocate_network
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     self._deallocate_network(context, instance, requested_networks)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 1812, in _deallocate_network
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     context, instance, requested_networks=requested_networks)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/network/api.py", line 49, in wrapped
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     return func(self, context, *args, **kwargs)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/network/api.py", line 297, in deallocate_for_instance
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     requested_networks=requested_networks)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/opt/stack/new/nova/nova/network/rpcapi.py", line 175, in deallocate_for_instance
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     return cctxt.call(ctxt, 'deallocate_for_instance', **kwargs)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     retry=self.retry)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     timeout=timeout, retry=retry)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     retry=retry)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 420, in _send
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     result = self._waiter.wait(msg_id, timeout)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 318, in wait
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     message = self.waiters.get(msg_id, timeout=timeout)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 223, in get
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]     'to message ID %s' % msg_id)
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e] MessagingTimeout: Timed out waiting for a reply to message ID 4ce92e0a720b4f7fb1b26e1af1eff6ae
  2015-09-04 10:47:08.361 8218 ERROR nova.compute.manager [instance: da1c0939-67ae-47bf-a403-1323ae2e1f7e]

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1492249/+subscriptions