yahoo-eng-team team mailing list archive

[Bug 1926838] [NEW] [OVN] infinite loop in ovsdb_monitor

 

Public bug reported:

I am running the OVN sandbox, a second chassis, and Neutron. I
synchronize the Neutron database with the sandbox databases, run
neutron-server, and sometimes run a few ovs-vsctl commands on the
chassis to set up OVS ports.

I noticed that some commands on the chassis can trigger what looks like
an infinite loop in Neutron. For example:

    ovs-vsctl set open . external-ids:ovn-cms-options=enable-chassis-as-gw
    ovs-vsctl set open . external-ids:ovn-cms-options=xx
    ovs-vsctl set open . external-ids:ovn-cms-options=enable-chassis-as-gw

When run on the second chassis, these commands trigger transactions
"in a loop" in neutron-server:

    ...
    Successfully bumped revision number for resource f32ac6cc (type: ports) to 571
    Router 079cde19-0b92-48f8-bef2-5e35b939a7a1 is bound to host sandbox
    Running txn n=1 command(idx=0): CheckRevisionNumberCommand
    Running txn n=1 command(idx=1): UpdateLRouterPortCommand
    Running txn n=1 command(idx=2): SetLRouterPortInLSwitchPortCommand
    Successfully bumped revision number for resource f32ac6cc (type: router_ports) to 572
    Running txn n=1 command(idx=0): CheckRevisionNumberCommand
    Running txn n=1 command(idx=1): SetLSwitchPortCommand
    Running txn n=1 command(idx=2): PgDelPortCommand
    Successfully bumped revision number for resource f32ac6cc (type: ports) to 572
    Router 079cde19-0b92-48f8-bef2-5e35b939a7a1 is bound to host sandbox
    Running txn n=1 command(idx=0): CheckRevisionNumberCommand
    Running txn n=1 command(idx=1): UpdateLRouterPortCommand
    Running txn n=1 command(idx=2): SetLRouterPortInLSwitchPortCommand
    Successfully bumped revision number for resource f32ac6cc (type: router_ports) to 573
    Running txn n=1 command(idx=0): CheckRevisionNumberCommand
    Running txn n=1 command(idx=1): SetLSwitchPortCommand
    Running txn n=1 command(idx=2): PgDelPortCommand
    ...


This is not limited to changes of external-ids:ovn-cms-options; other
ovs-vsctl commands can trigger the same issue.

neutron-server CPU consumption jumps to 100% and the revision_number of
the affected ports keeps increasing. Restarting neutron-server fixes the
issue temporarily.
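
The steadily increasing revision numbers can be checked mechanically
from a saved log. This is a minimal sketch, assuming the neutron-server
log lines have the same format as the excerpt in this report. The
function names and the min_bumps threshold are hypothetical and are not
part of Neutron:

```python
import re
from collections import defaultdict

# Matches lines such as:
#   Successfully bumped revision number for resource f32ac6cc (type: ports) to 571
# (format assumed from the log excerpt in this report)
BUMP_RE = re.compile(
    r"Successfully bumped revision number for resource (\S+) "
    r"\(type: (\w+)\) to (\d+)"
)


def summarize_bumps(lines):
    """Return {(resource, type): [revision, ...]} in order of appearance."""
    bumps = defaultdict(list)
    for line in lines:
        match = BUMP_RE.search(line)
        if match:
            resource, res_type, revision = match.groups()
            bumps[(resource, res_type)].append(int(revision))
    return dict(bumps)


def suspected_loops(bumps, min_bumps=2):
    """Hypothetical heuristic: flag resources that were bumped at least
    min_bumps times with strictly increasing revision numbers.

    Returns {(resource, type): (first_revision, last_revision, count)}.
    """
    flagged = {}
    for key, revisions in bumps.items():
        increasing = all(a < b for a, b in zip(revisions, revisions[1:]))
        if len(revisions) >= min_bumps and increasing:
            flagged[key] = (revisions[0], revisions[-1], len(revisions))
    return flagged


# The revision bump lines from the excerpt in this report.
SAMPLE = """\
Successfully bumped revision number for resource f32ac6cc (type: ports) to 571
Successfully bumped revision number for resource f32ac6cc (type: router_ports) to 572
Successfully bumped revision number for resource f32ac6cc (type: ports) to 572
Successfully bumped revision number for resource f32ac6cc (type: router_ports) to 573
"""

if __name__ == "__main__":
    found = suspected_loops(summarize_bumps(SAMPLE.splitlines()))
    for (resource, res_type), (first, last, count) in found.items():
        print(f"{resource} ({res_type}): {count} bumps, {first} -> {last}")
```

For a real log, pass the open file object to summarize_bumps() instead
of the sample. Revision numbers also increase during normal operation,
so a flagged resource is only a hint; a long run of bumps for the same
resource in a short time window is what would suggest the loop.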

I am not sure how to provide a simple reproducer because I did not find
any instructions for running standalone Neutron with two OVN chassis. I
will investigate what is happening locally.

Versions: main branch of OVN (d41a337fe3b608a8f90de8722d148344011f0bd8)
and of Neutron (94d36862c207b1e4d984d28874ca2f3bd09c855f)

It's not a blocker as long as it happens only on my laptop.

** Affects: neutron
     Importance: Undecided
         Status: New


** Tags: ovn

** Attachment added: "logs of one loop"
   https://bugs.launchpad.net/bugs/1926838/+attachment/5494052/+files/logs1

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1926838
