yahoo-eng-team team mailing list archive

[Bug 1583503] [NEW] L3 HA broken after network node crash

Public bug reported:

After a crash of a network node, we were left with empty PID files for
some keepalived processes:

root@network-node14:~# ls -l /var/lib/neutron/ha_confs/0ab5f647-1e04-4345-ae9b-ee66c6f08882.pid
-rw-r--r-- 1 root root 0 May 19 08:41 /var/lib/neutron/ha_confs/0ab5f647-1e04-4345-ae9b-ee66c6f08882.pid

This causes the L3 agent to log the following errors, which repeat every
minute:

2016-05-19 08:46:44.525 13554 ERROR neutron.agent.linux.utils [-] Unable to convert value in /var/lib/neutron/ha_confs/0ab5f647-1e04-4345-ae9b-ee66c6f08882.pid
2016-05-19 08:46:44.526 13554 ERROR neutron.agent.linux.external_process [-] keepalived for router with uuid 0ab5f647-1e04-4345-ae9b-ee66c6f08882 not found. The process should not have died
2016-05-19 08:46:44.526 13554 WARNING neutron.agent.linux.external_process [-] Respawning keepalived for uuid 0ab5f647-1e04-4345-ae9b-ee66c6f08882
2016-05-19 08:46:44.526 13554 ERROR neutron.agent.linux.utils [-] Unable to convert value in /var/lib/neutron/ha_confs/0ab5f647-1e04-4345-ae9b-ee66c6f08882.pid
2016-05-19 08:46:44.526 13554 ERROR neutron.agent.linux.utils [-] Unable to convert value in /var/lib/neutron/ha_confs/0ab5f647-1e04-4345-ae9b-ee66c6f08882.pid-vrrp

and the keepalived process fails to start. As a result, the routers
hosted by this agent are non-functional.
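
To check a node for this condition, the following is a minimal sketch that
lists PID files whose content cannot be parsed as an integer, which is what
the "Unable to convert value" errors above point at. The directory
/var/lib/neutron/ha_confs and the .pid / .pid-vrrp suffixes are taken from
the output above; the function name find_unparsable_pid_files is
hypothetical and is not part of neutron. The sketch only reports files; it
does not remove anything, since this report does not establish whether
deleting them is a safe workaround.

```python
import os
import tempfile


def find_unparsable_pid_files(conf_dir):
    """Return paths of PID files in conf_dir that do not contain an integer.

    Hypothetical helper for illustration; not part of neutron.
    """
    bad = []
    for name in sorted(os.listdir(conf_dir)):
        # Suffixes as seen in the log output of this report.
        if not (name.endswith(".pid") or name.endswith(".pid-vrrp")):
            continue
        path = os.path.join(conf_dir, name)
        try:
            with open(path) as f:
                # An empty file yields "", and int("") raises ValueError.
                int(f.read().strip())
        except (ValueError, OSError):
            bad.append(path)
    return bad


if __name__ == "__main__":
    # Demonstration on a throwaway directory, not on a real network node.
    demo_dir = tempfile.mkdtemp()
    open(os.path.join(demo_dir, "router-a.pid"), "w").close()  # empty file
    with open(os.path.join(demo_dir, "router-b.pid"), "w") as f:
        f.write("13554\n")  # well-formed PID file
    for path in find_unparsable_pid_files(demo_dir):
        print(path)
```

On an affected node this would be called with the ha_confs directory shown
above, run as a user that can read those files.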

** Affects: neutron
     Importance: Undecided
         Status: New

** Affects: keepalived (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1583503

Title:
  L3 HA broken after network node crash

Status in neutron:
  New
Status in keepalived package in Ubuntu:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1583503/+subscriptions