yahoo-eng-team team mailing list archive
Message #12461
[Bug 1300005] [NEW] agent down time should be larger
Public bug reported:
When we use the default config in neutron.conf:
# report_interval = 4
# agent_down_time = 5
When I boot VMs, I sometimes find that the port status of one VM is DOWN, while the other VMs work fine.
I got the following log in /var/log/neutron/openvswitch.log.
2014-03-28 09:50:45.201 5972 WARNING neutron.plugins.ml2.drivers.mech_agent [-] Attempting to bind with dead agent: {'binary': u'neutron-openvswitch-agent', 'description': None, 'admin_state_up': True, 'heartbeat_timestamp': datetime.datetime(2014, 3, 28, 9, 50, 40), 'alive': False, 'topic': u'N/A', 'host': u'ci91szcmp001.webex.com', 'agent_type': u'Open vSwitch agent', 'created_at': datetime.datetime(2014, 2, 27, 8, 39, 48), 'started_at': datetime.datetime(2014, 3, 12, 4, 59, 29), 'id': u'b9dfa98f-ed19-47a3-b66c-3db895d7a227', 'configurations': {u'tunnel_types': [], u'tunneling_ip': u'', u'bridge_mappings': {u'physnet1': u'br-eth1'}, u'l2_population': False, u'devices': 9}}
I think this is because the heartbeat timestamp is (2014, 3, 28, 9, 50, 40) and the current time is 2014-03-28 09:50:45.201, so more than 5 sec have passed. The agent is therefore marked 'alive': False, and the port status ends up wrong.
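To make the arithmetic concrete, here is a minimal sketch of the kind of comparison described above, using the two timestamps from the log. The function name and its signature are illustrative assumptions, not Neutron's actual code; only the timestamps and the 5 sec default come from this report.

```python
from datetime import datetime, timedelta

# Default from neutron.conf as quoted in this report.
AGENT_DOWN_TIME = 5  # seconds


def is_agent_down(heartbeat, now, agent_down_time=AGENT_DOWN_TIME):
    """Hypothetical check: the agent counts as down once its last
    heartbeat is older than agent_down_time seconds."""
    return now - heartbeat > timedelta(seconds=agent_down_time)


# Timestamps taken from the log line above.
heartbeat = datetime(2014, 3, 28, 9, 50, 40)
now = datetime(2014, 3, 28, 9, 50, 45, 201000)

print(is_agent_down(heartbeat, now))                     # True: 5.201 sec > 5 sec
print(is_agent_down(heartbeat, now, agent_down_time=8))  # False: 5.201 sec <= 8 sec
```

With the 5 sec default the heartbeat is only 0.201 sec past the limit, which is why the failure shows up only occasionally.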
The following things make this issue happen:
1. The clocks of the controller nodes and compute nodes are not exactly in sync.
2. heartbeat_timestamp is rounded to the second, so it is not very accurate.
3. The Python periodic task does not run exactly every 4 sec.
So I suggest we change the default value of agent_down_time from 5
sec to 8 sec.
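As a rough illustration of how the three factors above add up, the sketch below estimates an upper bound on how old the latest heartbeat can appear to the server. The function name and the clock skew and jitter values are hypothetical, chosen only for illustration; the report does not state measured values.

```python
def worst_case_heartbeat_age(report_interval, clock_skew, task_jitter,
                             truncation=1.0):
    """Rough upper bound (seconds) on the apparent age of the last heartbeat.

    truncation: up to 1 sec lost because the timestamp keeps whole seconds.
    clock_skew: how far the compute node clock lags the controller (assumed).
    task_jitter: how late the periodic report may fire (assumed).
    """
    return report_interval + truncation + clock_skew + task_jitter


# report_interval = 4 is the default quoted in this report;
# the skew and jitter values are made-up examples.
age = worst_case_heartbeat_age(report_interval=4, clock_skew=0.5,
                               task_jitter=0.5)

print(age)      # 6.0
print(age > 5)  # True: exceeds the current default agent_down_time
print(age > 8)  # False: stays within the proposed value
```

Under these assumed values, a 5 sec limit leaves no margin over a 4 sec report interval, while 8 sec leaves about 2 sec of headroom.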
** Affects: neutron
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1300005
Title:
agent down time should be larger
Status in OpenStack Neutron (virtual network service):
New
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1300005/+subscriptions