← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1349590] [NEW] nova-network appears to lock up while associating floating ip's

 

Public bug reported:

Seeing oslo.messaging timeouts while assocating floating IPs in the API
logs:

http://logs.openstack.org/08/110008/1/gate/gate-tempest-dsvm-postgres-
full/04890fa/logs/screen-n-api.txt.gz?level=TRACE#_2014-07-28_16_18_36_915

2014-07-28 16:18:36.915 ERROR nova.api.openstack.compute.contrib.floating_ips [req-d0ee3c07-0ae3-49d2-bad7-c4874096cc52 FloatingIPsTestJSON-352614651 FloatingIPsTestJSON-1773137245] Error. Unable to associate floating ip
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips Traceback (most recent call last):
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/api/openstack/compute/contrib/floating_ips.py", line 258, in _add_floating_ip
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     fixed_address=fixed_address)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/network/api.py", line 44, in wrapped
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     return func(self, context, *args, **kwargs)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/network/base_api.py", line 60, in wrapper
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     res = f(self, context, *args, **kwargs)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/network/api.py", line 190, in associate_floating_ip
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     context, floating_address, fixed_address, affect_auto_assigned)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/utils.py", line 949, in wrapper
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     return func(*args, **kwargs)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/server.py", line 138, in inner
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     return func(*args, **kwargs)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/network/floating_ips.py", line 347, in associate_floating_ip
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     fixed_ip.instance_uuid)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/network/rpcapi.py", line 324, in _associate_floating_ip
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     interface=interface, instance_uuid=instance_uuid)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/client.py", line 152, in call
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     retry=self.retry)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/transport.py", line 90, in _send
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     timeout=timeout, retry=retry)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 404, in send
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     retry=retry)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 393, in _send
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     result = self._waiter.wait(msg_id, timeout)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 281, in wait
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     reply, ending = self._poll_connection(msg_id, timeout)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 231, in _poll_connection
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     % msg_id)
2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips MessagingTimeout: Timed out waiting for a reply to message ID b463ca53f2bb42f6992b5e262b494f78

And when you look in the nova-network logs, it appears there is a
deadlock with iptables while playing with floating IPs:

http://logs.openstack.org/08/110008/1/gate/gate-tempest-dsvm-postgres-
full/04890fa/logs/screen-n-net.txt.gz

The last message in the n-net logs is trying to grab a lock:

2014-07-28 16:17:37.011 DEBUG nova.openstack.common.lockutils [req-
d0ee3c07-0ae3-49d2-bad7-c4874096cc52 FloatingIPsTestJSON-352614651
FloatingIPsTestJSON-1773137245] Attempting to grab external lock
"iptables" external_lock
/opt/stack/new/nova/nova/openstack/common/lockutils.py:230

>From logstash:

message:"HTTP exception thrown: Error. Unable to associate floating ip"
AND tags:"screen-n-api.txt"

There are 94 hits in 2 weeks, looks like it started on 7/19:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiSFRUUCBleGNlcHRpb24gdGhyb3duOiBFcnJvci4gVW5hYmxlIHRvIGFzc29jaWF0ZSBmbG9hdGluZyBpcFwiIEFORCB0YWdzOlwic2NyZWVuLW4tYXBpLnR4dFwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiJjdXN0b20iLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsiZnJvbSI6IjIwMTQtMDctMTRUMjE6NDM6MTIrMDA6MDAiLCJ0byI6IjIwMTQtMDctMjhUMjE6NDM6MTIrMDA6MDAiLCJ1c2VyX2ludGVydmFsIjoiMCJ9LCJzdGFtcCI6MTQwNjU4MzkyMjc5Nn0=

This is also showing up in the api logs:

2014-07-28 16:20:48.331 WARNING nova.compute.api [req-5c540168-bde5
-45fe-926f-f24a8f242b72 ServersAdminTestJSON-253527515
ServersAdminTestJSON-735286156] [instance: 748ad6e1-7741-4ff4-be1b-
66af4466e019] instance's host devstack-trusty-hpcloud-b5-1122950 is
down, deleting from database

I'm not sure if it's related though, but looking at logstash:

message:"instance's host" AND message:"is down, deleting from database"
AND tags:"screen-n-api.txt"

There are 548 hits in 2 weeks, which also appears to have started around
7/19:

http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiaW5zdGFuY2UncyBob3N0XCIgQU5EIG1lc3NhZ2U6XCJpcyBkb3duLCBkZWxldGluZyBmcm9tIGRhdGFiYXNlXCIgQU5EIHRhZ3M6XCJzY3JlZW4tbi1hcGkudHh0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6ImN1c3RvbSIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJmcm9tIjoiMjAxNC0wNy0xNFQyMTozNTozNiswMDowMCIsInRvIjoiMjAxNC0wNy0yOFQyMTozNTozNiswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sInN0YW1wIjoxNDA2NTg0MDIwNjE5fQ==

Those are only on nova-network jobs though so most likely related.

** Affects: nova
     Importance: High
         Status: Confirmed


** Tags: gate-failure network nova-network testing

** Changed in: nova
   Importance: Undecided => High

** Changed in: nova
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1349590

Title:
  nova-network appears to lock up while associating floating ip's

Status in OpenStack Compute (Nova):
  Confirmed

Bug description:
  Seeing oslo.messaging timeouts while assocating floating IPs in the
  API logs:

  http://logs.openstack.org/08/110008/1/gate/gate-tempest-dsvm-postgres-
  full/04890fa/logs/screen-n-api.txt.gz?level=TRACE#_2014-07-28_16_18_36_915

  2014-07-28 16:18:36.915 ERROR nova.api.openstack.compute.contrib.floating_ips [req-d0ee3c07-0ae3-49d2-bad7-c4874096cc52 FloatingIPsTestJSON-352614651 FloatingIPsTestJSON-1773137245] Error. Unable to associate floating ip
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips Traceback (most recent call last):
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/api/openstack/compute/contrib/floating_ips.py", line 258, in _add_floating_ip
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     fixed_address=fixed_address)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/network/api.py", line 44, in wrapped
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     return func(self, context, *args, **kwargs)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/network/base_api.py", line 60, in wrapper
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     res = f(self, context, *args, **kwargs)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/network/api.py", line 190, in associate_floating_ip
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     context, floating_address, fixed_address, affect_auto_assigned)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/utils.py", line 949, in wrapper
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     return func(*args, **kwargs)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/server.py", line 138, in inner
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     return func(*args, **kwargs)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/network/floating_ips.py", line 347, in associate_floating_ip
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     fixed_ip.instance_uuid)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/opt/stack/new/nova/nova/network/rpcapi.py", line 324, in _associate_floating_ip
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     interface=interface, instance_uuid=instance_uuid)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/rpc/client.py", line 152, in call
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     retry=self.retry)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/transport.py", line 90, in _send
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     timeout=timeout, retry=retry)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 404, in send
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     retry=retry)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 393, in _send
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     result = self._waiter.wait(msg_id, timeout)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 281, in wait
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     reply, ending = self._poll_connection(msg_id, timeout)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips   File "/usr/local/lib/python2.7/dist-packages/oslo/messaging/_drivers/amqpdriver.py", line 231, in _poll_connection
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips     % msg_id)
  2014-07-28 16:18:36.915 18485 TRACE nova.api.openstack.compute.contrib.floating_ips MessagingTimeout: Timed out waiting for a reply to message ID b463ca53f2bb42f6992b5e262b494f78

  And when you look in the nova-network logs, it appears there is a
  deadlock with iptables while playing with floating IPs:

  http://logs.openstack.org/08/110008/1/gate/gate-tempest-dsvm-postgres-
  full/04890fa/logs/screen-n-net.txt.gz

  The last message in the n-net logs is trying to grab a lock:

  2014-07-28 16:17:37.011 DEBUG nova.openstack.common.lockutils [req-
  d0ee3c07-0ae3-49d2-bad7-c4874096cc52 FloatingIPsTestJSON-352614651
  FloatingIPsTestJSON-1773137245] Attempting to grab external lock
  "iptables" external_lock
  /opt/stack/new/nova/nova/openstack/common/lockutils.py:230

  From logstash:

  message:"HTTP exception thrown: Error. Unable to associate floating
  ip" AND tags:"screen-n-api.txt"

  There are 94 hits in 2 weeks, looks like it started on 7/19:

  http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiSFRUUCBleGNlcHRpb24gdGhyb3duOiBFcnJvci4gVW5hYmxlIHRvIGFzc29jaWF0ZSBmbG9hdGluZyBpcFwiIEFORCB0YWdzOlwic2NyZWVuLW4tYXBpLnR4dFwiIiwiZmllbGRzIjpbXSwib2Zmc2V0IjowLCJ0aW1lZnJhbWUiOiJjdXN0b20iLCJncmFwaG1vZGUiOiJjb3VudCIsInRpbWUiOnsiZnJvbSI6IjIwMTQtMDctMTRUMjE6NDM6MTIrMDA6MDAiLCJ0byI6IjIwMTQtMDctMjhUMjE6NDM6MTIrMDA6MDAiLCJ1c2VyX2ludGVydmFsIjoiMCJ9LCJzdGFtcCI6MTQwNjU4MzkyMjc5Nn0=

  This is also showing up in the api logs:

  2014-07-28 16:20:48.331 WARNING nova.compute.api [req-5c540168-bde5
  -45fe-926f-f24a8f242b72 ServersAdminTestJSON-253527515
  ServersAdminTestJSON-735286156] [instance: 748ad6e1-7741-4ff4-be1b-
  66af4466e019] instance's host devstack-trusty-hpcloud-b5-1122950 is
  down, deleting from database

  I'm not sure if it's related though, but looking at logstash:

  message:"instance's host" AND message:"is down, deleting from
  database" AND tags:"screen-n-api.txt"

  There are 548 hits in 2 weeks, which also appears to have started
  around 7/19:

  http://logstash.openstack.org/#eyJzZWFyY2giOiJtZXNzYWdlOlwiaW5zdGFuY2UncyBob3N0XCIgQU5EIG1lc3NhZ2U6XCJpcyBkb3duLCBkZWxldGluZyBmcm9tIGRhdGFiYXNlXCIgQU5EIHRhZ3M6XCJzY3JlZW4tbi1hcGkudHh0XCIiLCJmaWVsZHMiOltdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6ImN1c3RvbSIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJmcm9tIjoiMjAxNC0wNy0xNFQyMTozNTozNiswMDowMCIsInRvIjoiMjAxNC0wNy0yOFQyMTozNTozNiswMDowMCIsInVzZXJfaW50ZXJ2YWwiOiIwIn0sInN0YW1wIjoxNDA2NTg0MDIwNjE5fQ==

  Those are only on nova-network jobs though so most likely related.

To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1349590/+subscriptions


Follow ups

References