← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1491949] Re: gate-tempest-dsvm-large-ops fails to deallocate instance network due to rpc timeout

 

Reviewed:  https://review.openstack.org/220525
Committed: https://git.openstack.org/cgit/openstack-dev/devstack/commit/?id=975243189216561f66ca91520495e0c6e2f747e2
Submitter: Jenkins
Branch:    master

commit 975243189216561f66ca91520495e0c6e2f747e2
Author: Matt Riedemann <mriedem@xxxxxxxxxx>
Date:   Fri Sep 4 14:15:27 2015 +0000

    Revert "turn multi host true for nova network by default"
    
    This reverts commit 2e1a91c50b73ca7f46871d3a906ade93bbcac6a7
    
    It looks like this introduced race bug 1491949 in the
    gate-tempest-dsvm-large-ops job causing rpc timeouts when
    deallocating network information for an instance,
    specifically around the dnsmasq callback to release the
    fixed IP that the instance was using which triggers the
    disassociation between the fixed IP and the instance in the
    nova database.
    
    Change-Id: I163cdeea75e92485f241647c69aea0d7456c3258
    Closes-Bug: #1491949


** Changed in: devstack
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1491949

Title:
  gate-tempest-dsvm-large-ops fails to deallocate instance network due
  to rpc timeout

Status in devstack:
  Fix Released
Status in OpenStack Compute (nova):
  Invalid

Bug description:
  http://logs.openstack.org/96/219696/4/check/gate-tempest-dsvm-large-
  ops/158f061/logs/screen-n-cpu-1.txt.gz?level=TRACE

  2015-09-03 15:11:10.090 ERROR nova.compute.manager [req-ae96c425-a199-472f-a0db-e1b48147bb4c tempest-TestLargeOpsScenario-1690771764 tempest-TestLargeOpsScenario-1474206998] [instance: 195361d7-95c3-4740-825b-1acab707969e] Failed to deallocate network for instance.
  2015-09-03 15:11:11.051 ERROR nova.compute.manager [req-ae96c425-a199-472f-a0db-e1b48147bb4c tempest-TestLargeOpsScenario-1690771764 tempest-TestLargeOpsScenario-1474206998] [instance: 195361d7-95c3-4740-825b-1acab707969e] Setting instance vm_state to ERROR
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e] Traceback (most recent call last):
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2396, in do_terminate_instance
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     self._delete_instance(context, instance, bdms, quotas)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/opt/stack/new/nova/nova/hooks.py", line 149, in inner
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     rv = f(*args, **kwargs)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2375, in _delete_instance
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     quotas.rollback()
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     six.reraise(self.type_, self.value, self.tb)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2338, in _delete_instance
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     self._shutdown_instance(context, instance, bdms)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2265, in _shutdown_instance
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     self._try_deallocate_network(context, instance, requested_networks)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2194, in _try_deallocate_network
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     self._set_instance_obj_error_state(context, instance)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 195, in __exit__
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     six.reraise(self.type_, self.value, self.tb)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 2189, in _try_deallocate_network
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     self._deallocate_network(context, instance, requested_networks)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/opt/stack/new/nova/nova/compute/manager.py", line 1812, in _deallocate_network
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     context, instance, requested_networks=requested_networks)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/opt/stack/new/nova/nova/network/api.py", line 49, in wrapped
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     return func(self, context, *args, **kwargs)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/opt/stack/new/nova/nova/network/api.py", line 297, in deallocate_for_instance
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     requested_networks=requested_networks)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/opt/stack/new/nova/nova/network/rpcapi.py", line 175, in deallocate_for_instance
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     return cctxt.call(ctxt, 'deallocate_for_instance', **kwargs)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/rpc/client.py", line 158, in call
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     retry=self.retry)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/transport.py", line 90, in _send
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     timeout=timeout, retry=retry)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 431, in send
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     retry=retry)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 420, in _send
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     result = self._waiter.wait(msg_id, timeout)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 318, in wait
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     message = self.waiters.get(msg_id, timeout=timeout)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]   File "/usr/local/lib/python2.7/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 223, in get
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e]     'to message ID %s' % msg_id)
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e] MessagingTimeout: Timed out waiting for a reply to message ID 20d504c45f1c45e980c0194ac492d531
  2015-09-03 15:11:11.051 22635 ERROR nova.compute.manager [instance: 195361d7-95c3-4740-825b-1acab707969e] 


  I see this in the n-net logs which I think we only do if something
  else fails:

  http://logs.openstack.org/96/219696/4/check/gate-tempest-dsvm-large-
  ops/158f061/logs/screen-n-net.txt.gz#_2015-09-03_15_11_10_664

  2015-09-03 15:11:10.664 DEBUG nova.network.manager [req-ae96c425-a199
  -472f-a0db-e1b48147bb4c tempest-TestLargeOpsScenario-1690771764
  tempest-TestLargeOpsScenario-1474206998] [instance: 195361d7-95c3-4740
  -825b-1acab707969e] Explicitly disassociating fixed IP %s from
  instance. deallocate_fixed_ip
  /opt/stack/new/nova/nova/network/manager.py:1077

  https://github.com/openstack/nova/blob/master/nova/network/manager.py#L1068

                      # NOTE(yufang521247): This is probably a failed dhcp fixed
                      # ip. DHCPRELEASE packet sent to dnsmasq would not trigger
                      # dhcp-bridge to run. Thus it is better to disassociate
                      # such fixed ip here.
                      fixed_ip_ref = objects.FixedIP.get_by_address(
                          context, address)
                      if (instance_uuid == fixed_ip_ref.instance_uuid and
                              not fixed_ip_ref.leased):
                          LOG.debug('Explicitly disassociating fixed IP %s from '
                                    'instance.', instance_uuid=instance_uuid)
                          fixed_ip_ref.disassociate()

To manage notifications about this bug go to:
https://bugs.launchpad.net/devstack/+bug/1491949/+subscriptions


References