
yahoo-eng-team team mailing list archive

[Bug 2004004] [NEW] keepalived virtual_routes wrong order

 

Public bug reported:

Neutron version: 13.0.6 (I will try to test this with Zed as well)
ML2: OVS

1. Create a provider network (e.g. named "public").
2. Create a subnet pool (e.g. 100.100.100.32/27).
3. Create a 1st /29 subnet from that subnet pool on network "public": 100.100.100.32/29 --gateway 100.100.100.33
4. Create a 2nd /29 subnet from that subnet pool on network "public": 100.100.100.40/29 --gateway 100.100.100.33

NOTE: the "physical" gateway of the whole subnet pool is 100.100.100.33,
so the gateway of the 2nd subnet lies inside the range of the 1st subnet, not inside the 2nd subnet itself!
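
The overlap can be checked with Python's standard ipaddress module. This is only an illustrative sketch using the example addresses from the steps above; the variable names are made up for this example.

```python
import ipaddress

gateway = ipaddress.ip_address("100.100.100.33")
subnet_1 = ipaddress.ip_network("100.100.100.32/29")
subnet_2 = ipaddress.ip_network("100.100.100.40/29")

# The shared gateway is an address inside the 1st subnet ...
gateway_in_subnet_1 = gateway in subnet_1
# ... but it lies outside the 2nd subnet, so a router port that only has an
# address from the 2nd subnet needs an extra on-link route to reach it.
gateway_in_subnet_2 = gateway in subnet_2

print(gateway_in_subnet_1, gateway_in_subnet_2)
```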

neutron_l3_agent then generates a keepalived.conf like this:

global_defs {
    notification_email_from neutron@openstack.local
    router_id neutron
}

vrrp_script ha_health_check_186 {
    script "/var/lib/neutron/ha_confs/f9ed7361-29b2-48e1-a96b-1a2919062021/ha_check_script_186.sh"
    interval 5
    fall 2
    rise 2
}

vrrp_instance VR_186 {
    state BACKUP
    interface ha-f3350150-28
    virtual_router_id 186
    priority 50
    garp_master_delay 60
    nopreempt
    advert_int 2
    authentication {
        auth_type PASS
        auth_pass somepass
    }
    track_interface {
        ha-f3350150-28
    }
    virtual_ipaddress {
        169.254.0.186/24 dev ha-f3350150-28
    }
    virtual_ipaddress_excluded {
        192.168.199.1/24 dev qr-24c07a36-4f
        100.100.100.34/32 dev qg-7b9963a7-72
        100.100.100.42/32 dev qg-7b9963a7-72
        100.100.100.43/29 dev qg-7b9963a7-72
        fe80::xxxx:xxxx:xxxx:xxxx/64 dev qg-7b9963a7-72 scope link
        fe80::xxxx:xxxx:xxxx:xxxx/64 dev qr-24c07a36-4f scope link
    }
    virtual_routes {
        0.0.0.0/0 via 100.100.100.33 dev qg-7b9963a7-72
        100.100.100.32/29 dev qg-7b9963a7-72 scope link
    }
    track_script {
        ha_health_check_186
    }
}


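So keepalived tries to create the default route BEFORE the on-link route "100.100.100.32/29 dev qg-7b9963a7-72 scope link" exists. At that point the gateway 100.100.100.33 is not yet reachable through qg-7b9963a7-72, so the kernel rejects the default route and keepalived logs an error like: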

Jan 26 17:49:56 xxxxxxxx Keepalived_vrrp[1532]: (VR_186) Receive advertisement timeout
Jan 26 17:49:56 xxxxxxxx Keepalived_vrrp[1532]: (VR_186) Entering MASTER STATE
Jan 26 17:49:56 xxxxxxxx Keepalived_vrrp[1532]: (VR_186) setting VIPs.
Jan 26 17:49:56 xxxxxxxx Keepalived_vrrp[1532]: (VR_186) setting E-VIPs.
Jan 26 17:49:56 xxxxxxxx Keepalived_vrrp[1532]: (VR_186) setting Virtual Routes
Jan 26 17:49:56 xxxxxxxx Keepalived_vrrp[1532]: Netlink: error: Network is unreachable(101), type=RTM_NEWROUTE(24), seq=1674751752, pid=0
Jan 26 17:49:56 xxxxxxxx Keepalived_vrrp[1532]: (VR_186) Sending/queueing gratuitous ARPs on ha-f3350150-28 for 169.254.0.186
...

As a result, the default route is missing from the routing table:

169.254.0.0/24 dev ha-a1f79365-fc proto kernel scope link src 169.254.0.186
169.254.192.0/18 dev ha-a1f79365-fc proto kernel scope link src 169.254.192.14
192.168.199.0/24 dev qr-24c07a36-4f proto kernel scope link src 192.168.199.1
100.100.100.32/29 dev qg-7b9963a7-72 scope link
100.100.100.40/29 dev qg-7b9963a7-72 proto kernel scope link src 100.100.100.43


Swapping the two entries in "virtual_routes", so that the on-link route comes first, fixes the issue:

    }
    virtual_routes {
        100.100.100.32/29 dev qg-7b9963a7-72 scope link
        0.0.0.0/0 via 100.100.100.33 dev qg-7b9963a7-72
    }
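
Why the order matters can be illustrated with a small model. This is a deliberate simplification and not the kernel's actual route validation: it assumes a route with a gateway is accepted only if the gateway falls inside a network that is already known to be on-link, and that the only connected network on the qg- device is 100.100.100.40/29 (from the address 100.100.100.43/29). The function name apply_routes is hypothetical.

```python
import ipaddress


def apply_routes(connected, routes):
    """Add routes in order and report which are installed or rejected.

    connected: networks that are on-link before any route is added.
    routes: list of (destination, gateway) tuples; gateway is None for an
            on-link ("scope link") route.
    """
    on_link = [ipaddress.ip_network(net) for net in connected]
    installed, rejected = [], []
    for dest, gateway in routes:
        if gateway is None:
            # An on-link route makes its destination directly reachable.
            on_link.append(ipaddress.ip_network(dest))
            installed.append(dest)
        elif any(ipaddress.ip_address(gateway) in net for net in on_link):
            installed.append(dest)
        else:
            # Corresponds to "Network is unreachable" in this model.
            rejected.append(dest)
    return installed, rejected


connected = ["100.100.100.40/29"]
generated_order = [("0.0.0.0/0", "100.100.100.33"), ("100.100.100.32/29", None)]
swapped_order = list(reversed(generated_order))

print(apply_routes(connected, generated_order))
print(apply_routes(connected, swapped_order))
```

In this model the generated order rejects the default route, while the swapped order installs both routes.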


The error message is now gone:

Jan 27 10:09:31 xxxxxxxxx Keepalived_vrrp[1532]: (VR_186) Receive advertisement timeout
Jan 27 10:09:31 xxxxxxxxx Keepalived_vrrp[1532]: (VR_186) Entering MASTER STATE
Jan 27 10:09:31 xxxxxxxxx Keepalived_vrrp[1532]: (VR_186) setting VIPs.
Jan 27 10:09:31 xxxxxxxxx Keepalived_vrrp[1532]: (VR_186) setting E-VIPs.
Jan 27 10:09:31 xxxxxxxxx Keepalived_vrrp[1532]: (VR_186) setting Virtual Routes
Jan 27 10:09:31 xxxxxxxxx Keepalived_vrrp[1532]: (VR_186) Sending/queueing gratuitous ARPs on ha-f3350150-28 for 169.254.0.186
Jan 27 10:09:31 xxxxxxxxx Keepalived_vrrp[1532]: Sending gratuitous ARP on ha-f3350150-28 for 169.254.0.186

And the routing table looks fine:

default via 100.100.100.33 dev qg-7b9963a7-72
169.254.0.0/24 dev ha-a1f79365-fc proto kernel scope link src 169.254.0.186
169.254.192.0/18 dev ha-a1f79365-fc proto kernel scope link src 169.254.192.14
192.168.199.0/24 dev qr-24c07a36-4f proto kernel scope link src 192.168.199.1
100.100.100.32/29 dev qg-7b9963a7-72 scope link
100.100.100.40/29 dev qg-7b9963a7-72 proto kernel scope link src 100.100.100.43

To me, it looks like the order in which neutron/agent/linux/keepalived.py
writes the virtual_routes entries has to be changed?
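
One possible way to express such a reordering is sketched below. This is not Neutron's code: how keepalived.py represents routes internally is not shown in this report, so the sketch assumes, purely for illustration, that the routes are available as the text lines that end up in the virtual_routes block. The function name order_virtual_routes is hypothetical.

```python
def order_virtual_routes(routes):
    """Return the routes with on-link entries (no "via") first.

    sorted() is stable, so routes within each group keep their
    original relative order.
    """
    return sorted(routes, key=lambda line: "via" in line.split())


routes = [
    "0.0.0.0/0 via 100.100.100.33 dev qg-7b9963a7-72",
    "100.100.100.32/29 dev qg-7b9963a7-72 scope link",
]
ordered = order_virtual_routes(routes)
for line in ordered:
    print(line)
```

Sorting on a boolean key places the routes without a gateway before those with one, which is the order that worked in the manual test above.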

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2004004

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/2004004/+subscriptions


