
yahoo-eng-team team mailing list archive

[Bug 1323475] [NEW] Losing network info_cache sometimes


Public bug reported:

We are using stable/havana.

For reasons we cannot explain, some instances lost their network
information. The result looks like this:


$ nova list
| a8f8a437-d203-4265-aca2-7bd35539c5d1 | test                                              | ACTIVE | -                | Running     |          |

$ neutron port-list --device-id a8f8a437-d203-4265-aca2-7bd35539c5d1
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| id                                   | name | mac_address       | fixed_ips                                                                          |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+
| 6b042778-76bb-45ca-86a8-abfdb1ba1a62 |      | fa:16:3e:67:9a:88 | {"subnet_id": "90b338d3-7711-48fd-a0f6-11a27388cb42", "ip_address": "10.162.82.2"} |
| 9800fd03-5e07-4a54-8568-28d501073c5f |      | fa:16:3e:d0:86:4a | {"subnet_id": "9a1fc59d-aec1-4e3a-bd88-99ea558e8b29", "ip_address": "192.168.0.5"} |
+--------------------------------------+------+-------------------+------------------------------------------------------------------------------------+

Neutron reports two ports bound to the instance, but nova reports that
the instance has no ports (the Networks column is empty).
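To spot other instances in the same state, one could compare each instance's cached VIFs in nova with the ports Neutron reports for that device ID. The sketch below assumes both sides have already been fetched into plain Python dicts (fetching them, e.g. with the nova and neutron CLIs shown above, is left out). The function name find_cache_mismatches and the dict shapes are hypothetical, not part of nova or Neutron.

```python
from typing import Dict, List, Set


def find_cache_mismatches(
    nova_cache: Dict[str, List[dict]],
    neutron_ports: Dict[str, List[dict]],
) -> Dict[str, Set[str]]:
    """Return, per instance UUID, the Neutron port IDs missing from nova's info_cache.

    Hypothetical helper. Assumed shapes:
    - nova_cache: instance UUID -> list of cached VIF dicts, each with an 'id'
    - neutron_ports: device_id (instance UUID) -> list of port dicts, each with an 'id'
    """
    mismatches: Dict[str, Set[str]] = {}
    for device_id, ports in neutron_ports.items():
        # An instance absent from nova_cache is treated like an empty cache.
        cached_ids = {vif["id"] for vif in nova_cache.get(device_id, [])}
        missing = {port["id"] for port in ports} - cached_ids
        if missing:
            mismatches[device_id] = missing
    return mismatches


if __name__ == "__main__":
    uuid = "a8f8a437-d203-4265-aca2-7bd35539c5d1"
    # Port data copied from the `neutron port-list` output in this report.
    ports = {
        uuid: [
            {"id": "6b042778-76bb-45ca-86a8-abfdb1ba1a62", "mac_address": "fa:16:3e:67:9a:88"},
            {"id": "9800fd03-5e07-4a54-8568-28d501073c5f", "mac_address": "fa:16:3e:d0:86:4a"},
        ]
    }
    # nova's info_cache for this instance is empty, as `nova list` shows.
    cache = {uuid: []}
    for instance, missing in find_cache_mismatches(cache, ports).items():
        print(instance, sorted(missing))
```

For the instance above this reports both Neutron ports as missing from nova's cache.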

We dug through the logs and found that something went wrong after
_heal_instance_info_cache ran. One log line shows the instance's
info_cache being updated with [], while the previous update shows it
populated. From that point on, the info_cache stayed empty and never
healed itself.

An excerpt of the logs is pasted below; the full log is here:
http://paste.openstack.org/show/81605/


....
2014-05-26 03:47:13.258 14884 DEBUG nova.network.api [-] Updating cache with info: [VIF({'ovs_interfaceid': u'5953e098-e131-48eb-b53c-5eb095f3bfee', 'network': Network({'bridge': 'br-int', 'subnets': [Subnet({'ips': [FixedIP({'meta': {}, 'version': 4, 'type': 'fixed', 'floating_ips': [], 'address': u'10.162.81.4'})], 'version': 4, 'meta': {'dhcp_server': u'10.162.81.3'}, 'dns': [], 'routes': [], 'cidr': u'10.162.81.0/28', 'gateway': IP({'meta': {}, 'version': None, 'type': 'gateway', 'address': None})})], 'meta': {'injected': False, 'tenant_id': u'c10373fb5d234e31af4d5d56527994fc'}, 'id': u'b0bb08c1-dc05-4e17-a021-f3b850a823ba', 'label': u'idc_c10373fb5d234e31af4d5d56527994fc'}), 'devname': u'tap5953e098-e1', 'qbh_params': None, 'meta': {}, 'address': u'fa:16:3e:40:34:4c', 'type': u'ovs', 'id': u'5953e098-e131-48eb-b53c-5eb095f3bfee', 'qbg_params': None})] update_instance_cache_with_nw_info /usr/lib/python2.7/dist-packages/nova/network/api.py:71
2014-05-26 03:47:13.263 14884 DEBUG nova.compute.manager [-] [instance: 49a806a9-986e-4ce3-ae9f-d3c4317255a3] Updated the info_cache for instance _heal_instance_info_cache /usr/lib/python2.7/dist-packages/nova/compute/manager.py:5146
.....
2014-05-26 03:52:14.255 14884 DEBUG nova.network.api [-] Updating cache with info: [] update_instance_cache_with_nw_info /usr/lib/python2.7/dist-packages/nova/network/api.py:71
.....
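To find when a cache was emptied, one could scan the nova-compute log for cache updates whose info is an empty list. The Python sketch below assumes the DEBUG line format shown in the excerpt above. Because the nova.network.api line carries no instance UUID (its context is [-]), it borrows the UUID from the next "Updated the info_cache for instance" line logged by the same process ID; that pairing is an assumption about log ordering, not something nova guarantees. The function find_empty_cache_updates and both regexes are hypothetical.

```python
import re
import sys
from typing import Dict, Iterable, List, Optional

# nova.network.api line; 'info' is the first token of the logged network info.
CACHE_UPDATE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) (?P<pid>\d+) DEBUG "
    r"nova\.network\.api \[.*?\] Updating cache with info: (?P<info>\S+)"
)
# nova.compute.manager line that names the instance whose cache was healed.
HEAL_DONE = re.compile(
    r"^\S+ \S+ (?P<pid>\d+) DEBUG nova\.compute\.manager \[.*?\] "
    r"\[instance: (?P<uuid>[0-9a-f-]+)\] Updated the info_cache for instance"
)


def find_empty_cache_updates(lines: Iterable[str]) -> List[Dict[str, Optional[str]]]:
    """Return every 'Updating cache with info: []' event found in the log lines."""
    events: List[Dict[str, Optional[str]]] = []
    pending: Dict[str, Dict[str, Optional[str]]] = {}  # pid -> event still lacking a UUID
    for line in lines:
        update = CACHE_UPDATE.match(line)
        if update:
            pid = update.group("pid")
            if update.group("info") == "[]":
                event = {"timestamp": update.group("ts"), "pid": pid, "instance": None}
                events.append(event)
                pending[pid] = event
            else:
                # A non-empty update from this process supersedes any pending one.
                pending.pop(pid, None)
            continue
        done = HEAL_DONE.match(line)
        if done and done.group("pid") in pending:
            # Same dict object as in `events`, so this fills in the UUID there too.
            pending.pop(done.group("pid"))["instance"] = done.group("uuid")
    return events


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        for event in find_empty_cache_updates(log):
            print(event["timestamp"], event["pid"], event["instance"] or "unknown instance")
```

Usage: python find_empty_cache.py nova-compute.log (the script and log file names are placeholders).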


I tried hard but could not reproduce the bug manually. The key question is why the info_cache went missing. On the other hand, nova should also be able to heal itself in this case.
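One possible shape for that self-healing, sketched in Python: during a periodic heal, refuse to write an empty cache while the network service still reports ports bound to the instance, and rebuild the entries from those ports instead. This is a hypothetical helper, not nova's _heal_instance_info_cache; the callables list_bound_ports and build_vif stand in for whatever lookups a real implementation would use.

```python
from typing import Callable, List


def heal_info_cache(
    cached: List[dict],
    refreshed: List[dict],
    list_bound_ports: Callable[[], List[dict]],
    build_vif: Callable[[dict], dict],
) -> List[dict]:
    """Decide what to write to an instance's info_cache during a periodic heal.

    Hypothetical helper, not nova code:
    - cached: VIF dicts currently stored in the info_cache (each with an 'id')
    - refreshed: VIF dicts just rebuilt from the network service
    - list_bound_ports: returns the ports the network service reports for the instance
    - build_vif: turns one such port into a VIF dict
    """
    if refreshed:
        return refreshed  # normal path: trust the fresh result
    ports = list_bound_ports()
    if not ports:
        return []  # the instance really has no ports, so an empty cache is correct
    # The refresh came back empty although ports are still bound: do not wipe
    # the cache. Reuse cached entries for ports that are still bound and build
    # the rest, so an already-empty cache can recover as well.
    cached_by_id = {vif["id"]: vif for vif in cached}
    return [cached_by_id.get(port["id"]) or build_vif(port) for port in ports]


if __name__ == "__main__":
    # Ports from the `neutron port-list` output in this report.
    bound = [
        {"id": "6b042778-76bb-45ca-86a8-abfdb1ba1a62", "mac_address": "fa:16:3e:67:9a:88"},
        {"id": "9800fd03-5e07-4a54-8568-28d501073c5f", "mac_address": "fa:16:3e:d0:86:4a"},
    ]
    healed = heal_info_cache(
        cached=[],
        refreshed=[],
        list_bound_ports=lambda: bound,
        build_vif=lambda port: {"id": port["id"], "address": port["mac_address"]},
    )
    print(healed)
```

The design choice is to treat the bound ports as authoritative whenever a refresh comes back empty, so a wiped cache is repopulated on the next heal instead of staying empty; the cost is one extra port lookup on that path.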

** Affects: nova
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1323475

