yahoo-eng-team team mailing list archive

[Bug 1583503] Re: L3 HA broken after network node crash

 

It seems that this is mainly a bug in how keepalived handles an empty or
otherwise broken PID file. According to the upstream changelog, the
issue has been fixed in 1.2.20.

@Ubuntu: Can you look into updating the package or backporting the fix?

** Also affects: keepalived (Ubuntu)
   Importance: Undecided
       Status: New

** Summary changed:

- L3 HA broken after network node crash
+ keepalived fails to start when PID file is empty

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1583503

Title:
  keepalived fails to start when PID file is empty

Status in neutron:
  New
Status in keepalived package in Ubuntu:
  New

Bug description:
  After a crash of a network node, we were left with empty PID files for
  some keepalived processes:

  root@network-node14:~# ls -l /var/lib/neutron/ha_confs/0ab5f647-1e04-4345-ae9b-ee66c6f08882.pid
  -rw-r--r-- 1 root root 0 May 19 08:41 /var/lib/neutron/ha_confs/0ab5f647-1e04-4345-ae9b-ee66c6f08882.pid

  This causes the L3 agent to log the following errors, which repeat every
  minute:

  2016-05-19 08:46:44.525 13554 ERROR neutron.agent.linux.utils [-] Unable to convert value in /var/lib/neutron/ha_confs/0ab5f647-1e04-4345-ae9b-ee66c6f08882.pid
  2016-05-19 08:46:44.526 13554 ERROR neutron.agent.linux.external_process [-] keepalived for router with uuid 0ab5f647-1e04-4345-ae9b-ee66c6f08882 not found. The process should not have died
  2016-05-19 08:46:44.526 13554 WARNING neutron.agent.linux.external_process [-] Respawning keepalived for uuid 0ab5f647-1e04-4345-ae9b-ee66c6f08882
  2016-05-19 08:46:44.526 13554 ERROR neutron.agent.linux.utils [-] Unable to convert value in /var/lib/neutron/ha_confs/0ab5f647-1e04-4345-ae9b-ee66c6f08882.pid
  2016-05-19 08:46:44.526 13554 ERROR neutron.agent.linux.utils [-] Unable to convert value in /var/lib/neutron/ha_confs/0ab5f647-1e04-4345-ae9b-ee66c6f08882.pid-vrrp

  and the keepalived process fails to start. As a result, the routers
  hosted by this agent are non-functional.
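
  To check a node for other affected routers, something like the
  following could be used. This is only a rough sketch, not code from
  neutron or keepalived: the helper names read_pid and
  find_broken_pid_files are made up for illustration, and the file
  suffixes .pid and .pid-vrrp are taken from the log lines above. It
  only reports PID files that are empty or do not contain a positive
  integer; it does not check whether the process is alive and does not
  modify anything.

```python
import os


def read_pid(path):
    """Return the PID stored in path, or None if the file is unreadable,
    empty, or does not contain a positive integer."""
    try:
        with open(path) as f:
            text = f.read().strip()
    except OSError:
        return None
    try:
        pid = int(text)  # int("") raises ValueError, so empty files land here
    except ValueError:
        return None
    return pid if pid > 0 else None


def find_broken_pid_files(conf_dir):
    """List PID files in conf_dir (names ending in .pid or .pid-vrrp,
    an assumption based on the log above) whose content is not a valid PID."""
    broken = []
    for name in sorted(os.listdir(conf_dir)):
        if name.endswith((".pid", ".pid-vrrp")):
            path = os.path.join(conf_dir, name)
            if read_pid(path) is None:
                broken.append(path)
    return broken
```

  On the node from this report it would be called as
  find_broken_pid_files("/var/lib/neutron/ha_confs") and should list the
  empty file shown above.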

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1583503/+subscriptions

