yahoo-eng-team team mailing list archive

[Bug 1796703] [NEW] HA router interfaces in standby state

 

Public bug reported:

Hello,
Floating IPs stopped working for one particular project in OpenStack because its HA router interfaces are in the wrong state.
I have an OpenStack-Ansible setup with 3 Neutron containers. Floating IPs and router interface creation in other OpenStack projects work fine at the same time.
Debugging showed that all HA interfaces of the router in this project have the status "standby". Neutron CLI command output:
neutron l3-agent-list-hosting-router c71008d3-5685-4e11-b650-7f7f49408643
+--------------------------------------+------------------------------------------+----------------+-------+----------+
| id                                   | host                                     | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------------+----------------+-------+----------+
| 44738018-e88c-4358-829a-167502be4f3b | infra2-neutron-agents-container-6ed576b6 | True           | :-)   | standby  |
| c1d95367-9193-42e0-aa1f-9919d4ad79a3 | infra1-neutron-agents-container-946dca38 | True           | :-)   | standby  |
| c7023dd7-ce87-4e85-a82f-f725caf649f0 | infra3-neutron-agents-container-fb07941c | True           | :-)   | standby  |
+--------------------------------------+------------------------------------------+----------------+-------+----------+
root@infra1-utility-container-59becf74:/# openstack port list --router c71008d3-5685-4e11-b650-7f7f49408643
+--------------------------------------+-------------------------------------------------+-------------------+------------------------------------------------------------------------------+--------+
| ID                                   | Name                                            | MAC Address       | Fixed IP Addresses                                                           | Status |
+--------------------------------------+-------------------------------------------------+-------------------+------------------------------------------------------------------------------+--------+
| 10227cbb-d5d0-4563-aa34-f501caedc501 |                                                 | fa:16:3e:7d:f8:da | ip_address='x.x.x.8', subnet_id='4ba80f19-4ef8-431f-8f4f-4fe844f4b673'  | ACTIVE |
| 49e60bd7-3661-4c8f-bbbf-ba6942dc960a | HA port tenant 6d439dafcdca4e06bd30935f83a24bb0 | fa:16:3e:32:59:5f | ip_address='169.254.192.6', subnet_id='8ae9245b-eb38-4069-98c8-ccf3e41f3516' | DOWN   |
| 81b8f9ab-3026-44f4-aa1c-59b0a2754ae5 | HA port tenant 6d439dafcdca4e06bd30935f83a24bb0 | fa:16:3e:e9:ba:50 | ip_address='169.254.192.8', subnet_id='8ae9245b-eb38-4069-98c8-ccf3e41f3516' | DOWN   |
| bbe4833f-f67a-40b1-b72d-12a1a825cbd1 | HA port tenant 6d439dafcdca4e06bd30935f83a24bb0 | fa:16:3e:71:66:85 | ip_address='169.254.192.5', subnet_id='8ae9245b-eb38-4069-98c8-ccf3e41f3516' | DOWN   |
| cab8cacb-1abf-4d94-8133-83fbaa67d048 |                                                 | fa:16:3e:f2:f4:db | ip_address='172.16.0.1', subnet_id='dc422b33-1e45-43c4-8510-8b0f14baf181'    | ACTIVE |
+--------------------------------------+-------------------------------------------------+-------------------+------------------------------------------------------------------------------+--------+
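The symptom in the agent table (no agent reporting "active") could be checked automatically. The following is only a minimal sketch: it assumes the text output of "neutron l3-agent-list-hosting-router" has been captured into a string, and the helper names parse_ha_states and has_active_agent are hypothetical, not part of any Neutron tool.

```python
# Sketch: detect an HA router that has no active L3 agent by parsing
# the ASCII table printed by the neutron CLI. The function names are
# hypothetical; the sample rows are copied from the report.

SAMPLE = """\
+--------------------------------------+------------------------------------------+----------------+-------+----------+
| id                                   | host                                     | admin_state_up | alive | ha_state |
+--------------------------------------+------------------------------------------+----------------+-------+----------+
| 44738018-e88c-4358-829a-167502be4f3b | infra2-neutron-agents-container-6ed576b6 | True           | :-)   | standby  |
| c1d95367-9193-42e0-aa1f-9919d4ad79a3 | infra1-neutron-agents-container-946dca38 | True           | :-)   | standby  |
| c7023dd7-ce87-4e85-a82f-f725caf649f0 | infra3-neutron-agents-container-fb07941c | True           | :-)   | standby  |
+--------------------------------------+------------------------------------------+----------------+-------+----------+
"""


def parse_ha_states(table_text):
    """Return {host: ha_state} parsed from the CLI table text."""
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +----+ border lines and blank lines
        rows.append([cell.strip() for cell in line.strip("|").split("|")])
    if not rows:
        return {}
    header, body = rows[0], rows[1:]
    host_idx = header.index("host")
    state_idx = header.index("ha_state")
    return {row[host_idx]: row[state_idx] for row in body}


def has_active_agent(states):
    """True if at least one hosting agent reports ha_state 'active'."""
    return any(state == "active" for state in states.values())


if __name__ == "__main__":
    states = parse_ha_states(SAMPLE)
    for host, state in sorted(states.items()):
        print(host, state)
    print("active agent present:", has_active_agent(states))
```

For the table in this report, every host maps to "standby", so the check reports that no active agent is present.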


The router namespaces inside the containers have no floating IPs, router IP addresses, or internal network gateway (172.16.0.1) assigned:
root@infra1-neutron-agents-container-946dca38:/# ip netns exec qrouter-c71008d3-5685-4e11-b650-7f7f49408643 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ha-bbe4833f-f6@if137: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:71:66:85 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.192.5/18 brd 169.254.255.255 scope global ha-bbe4833f-f6
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe71:6685/64 scope link 
       valid_lft forever preferred_lft forever
3: qr-cab8cacb-1a@if138: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:f2:f4:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
4: qg-10227cbb-d5@if139: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:7d:f8:da brd ff:ff:ff:ff:ff:ff link-netnsid 0
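To make the missing addresses easier to spot, the "ip a" output can be reduced to a map of interface name to IPv4 addresses. This is a rough sketch that assumes the output was saved as text with interface lines starting at the first column; addresses_by_interface is a hypothetical helper name, and the sample is abbreviated from the output in this report.

```python
# Sketch: list IPv4 addresses per interface from saved "ip a" output.
# The helper name is hypothetical; sample lines are abbreviated from
# the namespace output in the report.
import re

SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
    inet6 ::1/128 scope host
2: ha-bbe4833f-f6@if137: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:71:66:85 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 169.254.192.5/18 brd 169.254.255.255 scope global ha-bbe4833f-f6
3: qr-cab8cacb-1a@if138: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:f2:f4:db brd ff:ff:ff:ff:ff:ff link-netnsid 0
4: qg-10227cbb-d5@if139: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether fa:16:3e:7d:f8:da brd ff:ff:ff:ff:ff:ff link-netnsid 0
"""

IFACE_RE = re.compile(r"^\d+:\s+([^:@\s]+)")   # "2: ha-bbe4833f-f6@if137: ..."
INET_RE = re.compile(r"^\s+inet\s+(\S+)")      # IPv4 only; "inet6" does not match


def addresses_by_interface(ip_a_output):
    """Return {interface_name: [ipv4_cidr, ...]} in order of appearance."""
    result = {}
    current = None
    for line in ip_a_output.splitlines():
        match = IFACE_RE.match(line)
        if match:
            current = match.group(1)
            result[current] = []
            continue
        match = INET_RE.match(line)
        if match and current is not None:
            result[current].append(match.group(1))
    return result


if __name__ == "__main__":
    for name, addrs in addresses_by_interface(SAMPLE).items():
        print(name, addrs if addrs else "no IPv4 address")
```

In this output only lo and the ha- interface carry an IPv4 address; the qr- and qg- interfaces come back with empty lists, which matches the missing gateway and floating IPs.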

The Keepalived process is not running for router ID "c71008d3-5685-4e11-b650-7f7f49408643":
root@infra1-neutron-agents-container-946dca38:/# ps auxw | grep c71008d3-5685-4e11-b650-7f7f49408643
neutron    90394  0.0  0.1 166660 72136 ?        S    Oct05   0:00 /openstack/venvs/neutron-16.0.1/bin/python /openstack/venvs/neutron-16.0.1/bin/neutron-keepalived-state-change --router_id=c71008d3-5685-4e11-b650-7f7f49408643 --namespace=qrouter-c71008d3-5685-4e11-b650-7f7f49408643 --conf_dir=/var/lib/neutron/ha_confs/c71008d3-5685-4e11-b650-7f7f49408643 --monitor_interface=ha-bbe4833f-f6 --monitor_cidr=169.254.0.1/24 --pid_file=/var/lib/neutron/external/pids/c71008d3-5685-4e11-b650-7f7f49408643.monitor.pid --state_path=/var/lib/neutron --user=999 --group=999
root      103561  0.0  0.0  11284   928 ?        S+   14:14   0:00 grep c71008d3-5685-4e11-b650-7f7f49408643
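Because the grep matches both the state-change monitor and the grep command itself, it is easy to misread this output. The sketch below shows one way to test specifically for a keepalived binary in saved "ps auxw" text. It assumes the standard 11-column "ps aux" layout; keepalived_running is a hypothetical helper, and the sample lines are abbreviated from the report.

```python
# Sketch: decide whether a keepalived process (not the
# neutron-keepalived-state-change monitor, and not grep) exists for a
# router, given saved "ps auxw" output. Helper name is hypothetical.
import os

ROUTER_ID = "c71008d3-5685-4e11-b650-7f7f49408643"

# Abbreviated from the report: only the monitor and grep are present.
SAMPLE_PS = (
    "neutron    90394  0.0  0.1 166660 72136 ?        S    Oct05   0:00 "
    "/openstack/venvs/neutron-16.0.1/bin/python "
    "/openstack/venvs/neutron-16.0.1/bin/neutron-keepalived-state-change "
    "--router_id=" + ROUTER_ID + "\n"
    "root      103561  0.0  0.0  11284   928 ?        S+   14:14   0:00 "
    "grep " + ROUTER_ID + "\n"
)


def keepalived_running(ps_output, router_id):
    """True if a process whose executable is 'keepalived' mentions router_id."""
    for line in ps_output.splitlines():
        # ps aux prints 11 columns; the last one is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        if router_id not in command:
            continue
        executable = command.split()[0]
        if os.path.basename(executable) == "keepalived":
            return True
    return False


if __name__ == "__main__":
    print("keepalived running:", keepalived_running(SAMPLE_PS, ROUTER_ID))
```

On the output above the check is negative, since the only matches are the Python-based monitor and grep.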

Neutron launches Keepalived from the configuration folder
"/var/lib/neutron/ha_confs/c71008d3-5685-4e11-b650-7f7f49408643/".
The folder contains 2 files: "keepalived.conf" and "state". Neutron usually
updates the "keepalived.conf" file on the router create command, but it
doesn't do this for the router with ID
"c71008d3-5685-4e11-b650-7f7f49408643". It only updates the "state" file.
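The observation about the two files could be confirmed by comparing their modification times. This is a small sketch only: the helper names are hypothetical, and treating "keepalived.conf older than state" as stale is an assumption made for illustration, not a rule taken from Neutron.

```python
# Sketch: compare modification times of the two files in a router's
# ha_confs directory. Helper names are hypothetical, and the "stale"
# rule below is an assumed heuristic, not Neutron behaviour.
import os

CONF_DIR = "/var/lib/neutron/ha_confs/c71008d3-5685-4e11-b650-7f7f49408643"


def conf_dir_mtimes(conf_dir):
    """Return {filename: mtime or None} for keepalived.conf and state."""
    result = {}
    for name in ("keepalived.conf", "state"):
        path = os.path.join(conf_dir, name)
        result[name] = os.path.getmtime(path) if os.path.exists(path) else None
    return result


def conf_is_stale(mtimes):
    """Assumed heuristic: config is missing, or older than the state file."""
    conf = mtimes["keepalived.conf"]
    state = mtimes["state"]
    if conf is None:
        return True
    return state is not None and conf < state


if __name__ == "__main__":
    mtimes = conf_dir_mtimes(CONF_DIR)
    print(mtimes)
    print("keepalived.conf looks stale:", conf_is_stale(mtimes))
```

Run inside the agent container, this would show whether "state" keeps being rewritten while "keepalived.conf" is left untouched.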

I can't provide step-by-step reproduction steps because the trigger of this problem is unclear to me. According to my research, this error can be fixed by recreating the router, but I would rather not do this because it would not address the root cause. The Neutron log output is attached.
I suspect the problem may be caused by incorrect Neutron database records, but I wasn't able to find which script generates "keepalived.conf". Please let me know which script/task does this so that I can continue debugging.
Thanks for your attention.

Software description:
OpenStack was deployed via OpenStack-Ansible playbook, Pike 16.0.1, commit ebe2bc8734845b44c17819c04f2322a2ca7152db.
OpenStack services run inside LXC containers. The Neutron server, API, agent, and scheduler are placed in one container.
Linux OS - Ubuntu 16.04.4 LTS, kernel - 4.4.0-134-generic
neutron-keepalived-state-change - 11.0.2.dev2
neutron-l3-agent 11.0.2.dev2
neutron-server 11.0.2.dev2
neutron CLI - 6.5.0

** Affects: neutron
     Importance: Undecided
         Status: New

** Patch added: "file contains logs from neutron-l3-agent.log, neutron.log , neutron-server.log files"
   https://bugs.launchpad.net/bugs/1796703/+attachment/5198714/+files/neutron-full.log

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1796703

Title:
  HA router interfaces in standby state

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1796703/+subscriptions