yahoo-eng-team team mailing list archive

[Bug 2012104] [NEW] Neutron picking incorrect ovn records

 

Public bug reported:

For one of our compute machines I'm seeing two network agents that
appear unhealthy:

```
$ os network agent list | fgrep "register deleted"
| compute1                              | OVN Controller agent | ("Chassis" register deleted) |                   | XXX   | UP    | ovn-controller             |
| c085d57a-3a2b-4f97-8250-23d3f914b078 | OVN Metadata agent   | ("Chassis" register deleted) |                   | XXX   | UP    | neutron-ovn-metadata-agent |
```

The ("Chassis" register deleted) message appears to come from the fix
for this: https://bugs.launchpad.net/neutron/+bug/1951149

Searching for that external ID, I can find this private chassis, and its
chassis indeed seems empty:

```
$ sudo ovn-sbctl find chassis-private | grep -A 5 e621e0fb-83d3-4a18-82b3-c842996548ed
_uuid               : e621e0fb-83d3-4a18-82b3-c842996548ed
chassis             : []
external_ids        : {"neutron:liveness_check_at"="2022-06-17T08:43:33.393639+00:00", "neutron:metadata_liveness_check_at"="2022-06-17T02:27:21.309718+00:00", "neutron:ovn-metadata-id"="c085d57a-3a2b-4f97-8250-23d3f914b078", "neutron:ovn-metadata-sb-cfg"="150397"}
name                : compute1
nb_cfg              : 150397
nb_cfg_timestamp    : 1657729945956
```
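Rows in this state can also be picked out mechanically. Below is a rough sketch, assuming the `column : value` layout printed by `ovn-sbctl` with a blank line between records. `parse_records` and `orphaned_private_chassis` are hypothetical helper names, not Neutron or OVN APIs, and the sample is trimmed to three columns of the two `chassis-private` rows quoted in this report:

```python
# Trimmed copy of the two Chassis_Private rows quoted in this report.
SAMPLE = """\
_uuid               : e621e0fb-83d3-4a18-82b3-c842996548ed
chassis             : []
name                : compute1

_uuid               : cbec617d-19dc-481c-ba99-b4132244773c
chassis             : 164cb56b-1a3c-4401-bc52-6fa5e58d8f2a
name                : compute1.stack
"""


def parse_records(text):
    """Split ovn-sbctl style record output into a list of dicts."""
    records, current = [], {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only; values may contain colons themselves.
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def orphaned_private_chassis(records):
    """Return names of rows whose chassis reference is empty."""
    return [r["name"] for r in records if r.get("chassis") == "[]"]


print(orphaned_private_chassis(parse_records(SAMPLE)))  # prints ['compute1']
```

This only reports rows with an empty `chassis` reference; whether such a row is safe to delete is a separate question.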

But there's also:

```
$ sudo ovn-sbctl find chassis hostname=compute1.stack
_uuid               : 164cb56b-1a3c-4401-bc52-6fa5e58d8f2a
encaps              : [c442312a-9dfa-4ffe-9db7-afe5f9055962]
external_ids        : {datapath-type=system, iface-types="bareudp,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", is-interconn="false", "neutron:ovn-metadata-sb-cfg"="250161", ovn-bridge-mappings="", ovn-chassis-mac-mappings="", ovn-cms-options="", ovn-enable-lflow-cache="true", ovn-limit-lflow-cache="", ovn-memlimit-lflow-cache-kb="", ovn-monitor-all="false", ovn-trim-limit-lflow-cache="", ovn-trim-timeout-ms="", ovn-trim-wmark-perc-lflow-cache="", port-up-notif="true"}
hostname            : compute1.stack
name                : compute1.stack
nb_cfg              : 0
other_config        : {datapath-type=system, iface-types="bareudp,erspan,geneve,gre,gtpu,internal,ip6erspan,ip6gre,lisp,patch,stt,system,tap,vxlan", is-interconn="false", ovn-bridge-mappings="", ovn-chassis-mac-mappings="", ovn-cms-options="", ovn-enable-lflow-cache="true", ovn-limit-lflow-cache="", ovn-memlimit-lflow-cache-kb="", ovn-monitor-all="false", ovn-trim-limit-lflow-cache="", ovn-trim-timeout-ms="", ovn-trim-wmark-perc-lflow-cache="", port-up-notif="true"}
transport_zones     : []
vtep_logical_switches: []

$ sudo ovn-sbctl find chassis-private chassis=164cb56b-1a3c-4401-bc52-6fa5e58d8f2a
_uuid               : cbec617d-19dc-481c-ba99-b4132244773c
chassis             : 164cb56b-1a3c-4401-bc52-6fa5e58d8f2a
external_ids        : {"neutron:ovn-metadata-id"="3328a0c7-081b-58a9-9e91-baf5c8c259cd", "neutron:ovn-metadata-sb-cfg"="312321"}
name                : compute1.stack
nb_cfg              : 312321
nb_cfg_timestamp    : 1679042105359
```


This looks like the correct entry. Should Neutron not pick up this entry rather than the one with "chassis             : []"?
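The `nb_cfg_timestamp` values give a hint about which row is stale. A small sketch, assuming the column holds milliseconds since the Unix epoch (an assumption based on the 13-digit values; `ms_to_utc` is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(ms):
    """Convert milliseconds since the Unix epoch to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


# nb_cfg_timestamp values copied from the two Chassis_Private rows above.
rows = {"compute1": 1657729945956, "compute1.stack": 1679042105359}
for name, ts in rows.items():
    print(f"{name:15} {ms_to_utc(ts).date().isoformat()}")
```

Under that assumption, the `compute1` row was last updated on 2022-07-13 and the `compute1.stack` row on 2023-03-17, which would fit the first row being a leftover.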

Software versions:

ii  neutron-server  2:20.2.0-0ubuntu1~cloud0  all    Neutron is a virtual network service for Openstack - server
ii  ovn-central     22.03.0-0ubuntu1~cloud0   amd64  OVN central components

Distributor ID: Ubuntu
Description:    Ubuntu 20.04.4 LTS
Release:        20.04
Codename:       focal


Please let me know if I can provide more diagnostics.

** Affects: neutron
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/2012104

Title:
  Neutron picking incorrect ovn records

Status in neutron:
  New
