← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1511311] [NEW] L3 agent failed to respawn keepalived process

 

Public bug reported:

I enabled the l3 ha in neutron configuration, and I usually see the
following log in l3_agent.log:

2015-10-14 22:30:16.397 21460 ERROR neutron.agent.linux.external_process [-] default-service for router with uuid 59de181e-8f02-470d-80f6-cb9f0d46f78b not found. The process should not have died
2015-10-14 22:30:16.397 21460 ERROR neutron.agent.linux.external_process [-] respawning keepalived for uuid 59de181e-8f02-470d-80f6-cb9f0d46f78b
2015-10-14 22:30:16.397 21460 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid get_value_from_file /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:222
2015-10-14 22:30:16.398 21460 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-59de181e-8f02-470d-80f6-cb9f0d46f78b', 'keepalived', '-P', '-f', '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b/keepalived.conf', '-p', '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid',  '-r', '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid-vrrp'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:84  

And I noticed that the counts of vrrp pid files were usually bigger than
the "pid" files:

root@neutron2:~# ls /var/lib/neutron/ha_confs/ | grep pid | grep -v vrrp | wc -l
664
root@neutron2:~# ls /var/lib/neutron/ha_confs/ | grep vrrp | wc -l
677

And seems that if "pid.vrrp" file existed,  we can't successfully respawn  the keepalived process using this kind of command:
keepalived -P -f /var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb/keepalived.conf -p /var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb.pid -r /var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb.pid-vrrp

So  I think in neutron,  after we checked that the pid is not active,
can we check the existence of "pid" file and "vrrp pid" file and remove
them before respawn the keepalived process to make sure the process can
be started successfully ?

https://github.com/openstack/neutron/blob/master/neutron/agent/linux/external_process.py#L91-L92

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1511311

Title:
  L3 agent failed to respawn keepalived process

Status in neutron:
  New

Bug description:
  I enabled the l3 ha in neutron configuration, and I usually see the
  following log in l3_agent.log:

  2015-10-14 22:30:16.397 21460 ERROR neutron.agent.linux.external_process [-] default-service for router with uuid 59de181e-8f02-470d-80f6-cb9f0d46f78b not found. The process should not have died
  2015-10-14 22:30:16.397 21460 ERROR neutron.agent.linux.external_process [-] respawning keepalived for uuid 59de181e-8f02-470d-80f6-cb9f0d46f78b
  2015-10-14 22:30:16.397 21460 DEBUG neutron.agent.linux.utils [-] Unable to access /var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid get_value_from_file /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:222
  2015-10-14 22:30:16.398 21460 DEBUG neutron.agent.linux.utils [-] Running command: ['sudo', '/usr/bin/neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qrouter-59de181e-8f02-470d-80f6-cb9f0d46f78b', 'keepalived', '-P', '-f', '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b/keepalived.conf', '-p', '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid',  '-r', '/var/lib/neutron/ha_confs/59de181e-8f02-470d-80f6-cb9f0d46f78b.pid-vrrp'] create_process /usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:84  

  And I noticed that the counts of vrrp pid files were usually bigger
  than the "pid" files:

  root@neutron2:~# ls /var/lib/neutron/ha_confs/ | grep pid | grep -v vrrp | wc -l
  664
  root@neutron2:~# ls /var/lib/neutron/ha_confs/ | grep vrrp | wc -l
  677

  And seems that if "pid.vrrp" file existed,  we can't successfully respawn  the keepalived process using this kind of command:
  keepalived -P -f /var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb/keepalived.conf -p /var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb.pid -r /var/lib/neutron/ha_confs/cb01b1de-fa6c-461e-ba39-4d506dfdfccb.pid-vrrp

  So  I think in neutron,  after we checked that the pid is not active,
  can we check the existence of "pid" file and "vrrp pid" file and
  remove them before respawn the keepalived process to make sure the
  process can be started successfully ?

  https://github.com/openstack/neutron/blob/master/neutron/agent/linux/external_process.py#L91-L92

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1511311/+subscriptions


Follow ups