yahoo-eng-team team mailing list archive

[Bug 1300005] [NEW] agent down time should be larger

To: yahoo-eng-team@xxxxxxxxxxxxxxxxxxx
From: Liping Mao <limao@xxxxxxxxx>
Date: Mon, 31 Mar 2014 04:56:55 -0000
Reply-to: Bug 1300005 <1300005@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

Public bug reported:

When we use the default config in neutron.conf:
# report_interval = 4
# agent_down_time = 5

When I boot VMs, the port status of one VM is sometimes DOWN, while the other VMs work fine.

I found the following log entry in /var/log/neutron/openvswitch.log:
2014-03-28 09:50:45.201 5972 WARNING neutron.plugins.ml2.drivers.mech_agent [-] Attempting to bind with dead agent: {'binary': u'neutron-openvswitch-agent', 'description': None, 'admin_state_up': True, 'heartbeat_timestamp': datetime.datetime(2014, 3, 28, 9, 50, 40), 'alive': False, 'topic': u'N/A', 'host': u'ci91szcmp001.webex.com', 'agent_type': u'Open vSwitch agent', 'created_at': datetime.datetime(2014, 2, 27, 8, 39, 48), 'started_at': datetime.datetime(2014, 3, 12, 4, 59, 29), 'id': u'b9dfa98f-ed19-47a3-b66c-3db895d7a227', 'configurations': {u'tunnel_types': [], u'tunneling_ip': u'', u'bridge_mappings': {u'physnet1': u'br-eth1'}, u'l2_population': False, u'devices': 9}}


I think this happens because the heartbeat timestamp is (2014, 3, 28, 9, 50, 40) and the current time is 2014-03-28 09:50:45.201, so more than 5 seconds have passed. The agent is therefore marked 'alive' = False, and the port ends up with the wrong status.
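
The arithmetic can be checked with a small Python sketch using the two timestamps from the log. The helper is_agent_alive is hypothetical and only assumes that an agent counts as alive when its last heartbeat is no older than agent_down_time; the actual check in neutron may differ in detail.

```python
from datetime import datetime, timedelta

AGENT_DOWN_TIME = 5  # seconds, the default quoted from neutron.conf


def is_agent_alive(heartbeat, now, agent_down_time=AGENT_DOWN_TIME):
    """Hypothetical helper: alive if the heartbeat is at most agent_down_time old."""
    return now - heartbeat <= timedelta(seconds=agent_down_time)


# Values taken from the log line above.
heartbeat = datetime(2014, 3, 28, 9, 50, 40)
now = datetime(2014, 3, 28, 9, 50, 45, 201000)

elapsed = (now - heartbeat).total_seconds()
alive_with_default = is_agent_alive(heartbeat, now)
alive_with_8_sec = is_agent_alive(heartbeat, now, agent_down_time=8)

print(elapsed, alive_with_default, alive_with_8_sec)
```

With these inputs the heartbeat is a little over 5 seconds old, so the check fails at the default of 5 and passes at 8.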

The following factors contribute to this issue:
1.   The clocks of the controller nodes and compute nodes are not exactly in sync.
2.   heartbeat_timestamp is rounded to the second, so it is not very accurate.
3.   The Python periodic task is not executed exactly every 4 seconds.
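
As a rough sketch of how these three factors add up, the function below estimates how old the last heartbeat can appear just before the next report arrives. The function name and the sample skew and delay values are illustrative assumptions, not measurements, and the 1 second truncation term assumes the timestamp loses its sub-second part.

```python
REPORT_INTERVAL = 4   # seconds, default from neutron.conf
AGENT_DOWN_TIME = 5   # seconds, default from neutron.conf


def apparent_heartbeat_age(report_interval, clock_skew, task_delay, truncation=1.0):
    """Rough upper bound (seconds) on the apparent age of the last heartbeat.

    clock_skew: how far the server clock runs ahead of the agent clock (assumed).
    task_delay: how late the periodic report task fires (assumed).
    truncation: up to 1 second lost when the timestamp drops sub-second precision.
    """
    return report_interval + clock_skew + task_delay + truncation


# Hypothetical values: 100 ms of clock skew and 100 ms of scheduling delay.
age = apparent_heartbeat_age(REPORT_INTERVAL, clock_skew=0.1, task_delay=0.1)

print(age > AGENT_DOWN_TIME)  # exceeds the 5 second default
print(age > 8)                # stays within the proposed 8 seconds
```

Under these assumed values the 1 second margin between report_interval and agent_down_time is already used up, while an 8 second threshold leaves room to spare.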

So I suggest changing the default value of agent_down_time from 5
seconds to 8 seconds.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1300005

Title:
  agent down time should be larger

Status in OpenStack Neutron (virtual network service):
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1300005/+subscriptions
