
yahoo-eng-team team mailing list archive

[Bug 1846198] Re: packet loss during active L3 HA agent restart

 

A fix has been released for Victoria.

** Changed in: openstack-ansible
       Status: New => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1846198

Title:
   packet loss during active L3 HA agent restart

Status in neutron:
  Invalid
Status in openstack-ansible:
  Fix Released

Bug description:
  Deployment:

  OpenStack-Ansible 19.0.3 (Stein) with two dedicated network nodes (is_metal=True), using linuxbridge + VXLAN.
  Ubuntu 16.04.6 4.15.0-62-generic

  neutron l3-agent-list-hosting-router R1
  +--------------------------------------+---------------+----------------+-------+----------+
  | id                                   | host          | admin_state_up | alive | ha_state |
  +--------------------------------------+---------------+----------------+-------+----------+
  | 1b3b1b5d-08e7-48a1-ab8d-256d94099fb6 | test-network2 | True           | :-)   | standby  |
  | fa402ada-7716-4ad4-a004-7f8114fb1edf | test-network1 | True           | :-)   | active   |
  +--------------------------------------+---------------+----------------+-------+----------+

  How to reproduce: restart the active L3 agent (systemctl restart
  neutron-l3-agent.service).

  test-network1 server-side events:

  systemctl restart neutron-l3-agent: @02:58:56.135635630
  ip monitor terminated (kill -9)     @02:58:56.208922038
  vip ips removed                     @02:58:56.268074480
  keepalived terminated               @02:58:57.318596743
  l3-agent terminated                 @02:59:07.504366398
  keepalived-state-change terminated  @03:01:07.735281710
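  The timeline is easier to read as offsets from the restart command. The
  following is a minimal Python sketch, not part of the original report: the
  labels and timestamps are copied from the list above, a dummy date is
  assumed because only the time of day was recorded, and the helper names
  (parse_ts, offsets) are illustrative only.

```python
from datetime import datetime

# Server-side events on test-network1, copied from the timeline above.
# Only the time of day was recorded; strptime fills in a dummy date, and all
# events fall within the same day, so no midnight rollover is handled.
EVENTS = [
    ("systemctl restart neutron-l3-agent", "02:58:56.135635630"),
    ("ip monitor terminated (kill -9)", "02:58:56.208922038"),
    ("vip ips removed", "02:58:56.268074480"),
    ("keepalived terminated", "02:58:57.318596743"),
    ("l3-agent terminated", "02:59:07.504366398"),
    ("keepalived-state-change terminated", "03:01:07.735281710"),
]


def parse_ts(ts: str) -> datetime:
    """Parse HH:MM:SS.fffffffff. datetime only keeps microseconds, so the
    nanosecond fraction is truncated to its first six digits."""
    hms, frac = ts.split(".")
    return datetime.strptime(f"{hms}.{frac[:6]}", "%H:%M:%S.%f")


def offsets(events):
    """Return (label, seconds since the first event) for every event."""
    t0 = parse_ts(events[0][1])
    return [(label, (parse_ts(ts) - t0).total_seconds()) for label, ts in events]


if __name__ == "__main__":
    for label, dt in offsets(EVENTS):
        print(f"{dt:9.3f} s  {label}")
```

  By this arithmetic the VIPs are removed about 0.13 s after the restart is
  issued, keepalived exits after about 1.18 s, the l3-agent after about
  11.37 s, and keepalived-state-change lingers for about 131.6 s.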

  test-network1 journal:
    @02:58:56 test-network1 systemd[1]: Stopping neutron-l3-agent service...
    @02:58:56 test-network1 Keepalived_vrrp[24400]: VRRP_Instance(VR_217) sent 0 priority
    @02:58:56 test-network1 Keepalived_vrrp[24400]: VRRP_Instance(VR_217) removing protocol Virtual Routes
    @02:58:56 test-network1 Keepalived_vrrp[24400]: VRRP_Instance(VR_217) removing protocol VIPs.
    @02:58:56 test-network1 Keepalived_vrrp[24400]: VRRP_Instance(VR_217) removing protocol E-VIPs.
    @02:58:56 test-network1 Keepalived[24394]: Stopping
    @02:58:56 test-network1 neutron-keepalived-state-change[24278]: 2019-10-01 02:58:56.193 24278 DEBUG neutron.agent.linux.utils [-] enax_custom_log: pid: 24283, signal: 9 kill_process /openstack/venvs/neutron-19.0.4.dev1/lib/python2.7/site-packages/neutron/agent/linux/utils.py:243
    @02:58:56 test-network1 audit[24089]: USER_END pid=24089 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:session_close acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=? res=success'
    @02:58:56 test-network1 sudo[24089]: pam_unix(sudo:session): session closed for user root
    @02:58:56 test-network1 audit[24089]: CRED_DISP pid=24089 uid=0 auid=4294967295 ses=4294967295 msg='op=PAM:setcred acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=? res=success'
    @02:58:57 test-network1 Keepalived_vrrp[24400]: Stopped
    @02:58:57 test-network1 Keepalived[24394]: Stopped Keepalived v1.3.9 (10/21,2017)

  TCPDUMP qrouter-24010932-a0a4-4454-9539-27c1535c5ed8 ha-57528491-1b:
    @02:58:53.130735 IP 169.254.195.168 > 224.0.0.18: VRRPv2, Advertisement, vrid 217, prio 50, authtype simple, intvl 2s, length 20
    @02:58:55.131926 IP 169.254.195.168 > 224.0.0.18: VRRPv2, Advertisement, vrid 217, prio 50, authtype simple, intvl 2s, length 20
    @02:58:56.188558 IP 169.254.195.168 > 224.0.0.18: VRRPv2, Advertisement, vrid 217, prio 0, authtype simple, intvl 2s, length 20
    @02:58:56.215889 IP 169.254.195.168 > 224.0.0.22: igmp v3 report, 1 group record(s)
    @02:58:56.539804 IP 169.254.195.168 > 224.0.0.22: igmp v3 report, 1 group record(s)
    @02:58:56.995386 IP 169.254.194.242 > 224.0.0.18: VRRPv2, Advertisement, vrid 217, prio 50, authtype simple, intvl 2s, length 20
    @02:58:58.998565 ARP, Request who-has 169.254.0.217 (ff:ff:ff:ff:ff:ff) tell 169.254.0.217, length 28
    @02:58:59.000138 ARP, Request who-has 169.254.0.217 (ff:ff:ff:ff:ff:ff) tell 169.254.0.217, length 28
    @02:58:59.001063 ARP, Request who-has 169.254.0.217 (ff:ff:ff:ff:ff:ff) tell 169.254.0.217, length 28
    @02:58:59.002173 ARP, Request who-has 169.254.0.217 (ff:ff:ff:ff:ff:ff) tell 169.254.0.217, length 28
    @02:58:59.003018 ARP, Request who-has 169.254.0.217 (ff:ff:ff:ff:ff:ff) tell 169.254.0.217, length 28
    @02:58:59.003860 IP 169.254.194.242 > 224.0.0.18: VRRPv2, Advertisement, vrid 217, prio 50, authtype simple, intvl 2s, length 20
    @02:59:01.004772 IP 169.254.194.242 > 224.0.0.18: VRRPv2, Advertisement, vrid 217, prio 50, authtype simple, intvl 2s, length 20
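  In VRRP, an advertisement with priority 0 signals that the current master is
  stopping, so the capture can be used to measure how long the standby takes
  to take over. The Python sketch below is an illustrative analysis, not part
  of the report: it uses a verbatim subset of the capture, assumes the exact
  tcpdump line format shown above, and treats an ARP request whose sender and
  target IP are identical as a gratuitous ARP. The function names are
  hypothetical.

```python
import re
from datetime import datetime

# A verbatim subset of the tcpdump capture above (leading "@" included).
CAPTURE = """\
@02:58:55.131926 IP 169.254.195.168 > 224.0.0.18: VRRPv2, Advertisement, vrid 217, prio 50, authtype simple, intvl 2s, length 20
@02:58:56.188558 IP 169.254.195.168 > 224.0.0.18: VRRPv2, Advertisement, vrid 217, prio 0, authtype simple, intvl 2s, length 20
@02:58:56.215889 IP 169.254.195.168 > 224.0.0.22: igmp v3 report, 1 group record(s)
@02:58:56.995386 IP 169.254.194.242 > 224.0.0.18: VRRPv2, Advertisement, vrid 217, prio 50, authtype simple, intvl 2s, length 20
@02:58:58.998565 ARP, Request who-has 169.254.0.217 (ff:ff:ff:ff:ff:ff) tell 169.254.0.217, length 28
@02:58:59.003860 IP 169.254.194.242 > 224.0.0.18: VRRPv2, Advertisement, vrid 217, prio 50, authtype simple, intvl 2s, length 20
"""

# VRRP advertisements sent to the VRRP multicast group 224.0.0.18.
VRRP_RE = re.compile(
    r"^@(\S+) IP (\S+) > 224\.0\.0\.18: VRRPv2, Advertisement, vrid \d+, prio (\d+)"
)
# Assumed gratuitous ARP: broadcast request where sender IP equals target IP.
GARP_RE = re.compile(r"^@(\S+) ARP, Request who-has (\S+) \(ff:ff:ff:ff:ff:ff\) tell \2,")


def parse_ts(s: str) -> datetime:
    return datetime.strptime(s, "%H:%M:%S.%f")


def failover_timings(capture: str) -> tuple:
    """Seconds from the first priority-0 advertisement to (a) the first
    advertisement from a different sender and (b) the first gratuitous ARP."""
    resign = old_master = new_advert = garp = None
    for line in capture.splitlines():
        m = VRRP_RE.match(line)
        if m:
            t, src, prio = parse_ts(m.group(1)), m.group(2), int(m.group(3))
            if resign is None and prio == 0:
                resign, old_master = t, src
            elif resign is not None and src != old_master and new_advert is None:
                new_advert = t
            continue
        g = GARP_RE.match(line)
        if g and resign is not None and garp is None:
            garp = parse_ts(g.group(1))
    if None in (resign, new_advert, garp):
        raise ValueError("capture does not contain a complete failover")
    return ((new_advert - resign).total_seconds(), (garp - resign).total_seconds())


if __name__ == "__main__":
    takeover, first_garp = failover_timings(CAPTURE)
    print(f"new master advertising after {takeover:.3f} s")
    print(f"first gratuitous ARP after {first_garp:.3f} s")
```

  Applied to this subset, the sketch gives about 0.81 s from the priority-0
  advertisement to the first advertisement from 169.254.194.242, and about
  2.81 s to the first gratuitous ARP for 169.254.0.217.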

  
  After the l3-agent restart:

  neutron l3-agent-list-hosting-router R1
  +--------------------------------------+---------------+----------------+-------+----------+
  | id                                   | host          | admin_state_up | alive | ha_state |
  +--------------------------------------+---------------+----------------+-------+----------+
  | 1b3b1b5d-08e7-48a1-ab8d-256d94099fb6 | test-network2 | True           | :-)   | active   |
  | fa402ada-7716-4ad4-a004-7f8114fb1edf | test-network1 | True           | :-)   | standby  |
  +--------------------------------------+---------------+----------------+-------+----------+
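  Comparing the two tables confirms that the active role moved from
  test-network1 to test-network2. A minimal Python sketch of that check is
  shown below; it assumes the ASCII table layout printed by
  `neutron l3-agent-list-hosting-router` exactly as shown in this report, and
  the helpers (parse_ha_states, active_host) are hypothetical.

```python
BEFORE = """\
+--------------------------------------+---------------+----------------+-------+----------+
| id                                   | host          | admin_state_up | alive | ha_state |
+--------------------------------------+---------------+----------------+-------+----------+
| 1b3b1b5d-08e7-48a1-ab8d-256d94099fb6 | test-network2 | True           | :-)   | standby  |
| fa402ada-7716-4ad4-a004-7f8114fb1edf | test-network1 | True           | :-)   | active   |
+--------------------------------------+---------------+----------------+-------+----------+
"""

AFTER = """\
+--------------------------------------+---------------+----------------+-------+----------+
| id                                   | host          | admin_state_up | alive | ha_state |
+--------------------------------------+---------------+----------------+-------+----------+
| 1b3b1b5d-08e7-48a1-ab8d-256d94099fb6 | test-network2 | True           | :-)   | active   |
| fa402ada-7716-4ad4-a004-7f8114fb1edf | test-network1 | True           | :-)   | standby  |
+--------------------------------------+---------------+----------------+-------+----------+
"""


def parse_ha_states(table: str) -> dict:
    """Map host -> ha_state; border lines (starting with '+') are skipped."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table.splitlines()
        if line.strip().startswith("|")
    ]
    header, body = rows[0], rows[1:]
    host_i, state_i = header.index("host"), header.index("ha_state")
    return {row[host_i]: row[state_i] for row in body}


def active_host(states: dict):
    """Return the single host reporting ha_state 'active', or None."""
    active = [host for host, state in states.items() if state == "active"]
    return active[0] if len(active) == 1 else None


if __name__ == "__main__":
    before, after = parse_ha_states(BEFORE), parse_ha_states(AFTER)
    print(f"active before restart: {active_host(before)}")
    print(f"active after restart:  {active_host(after)}")
```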

  Logs and configs are in the attachment.

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1846198/+subscriptions

