
yahoo-eng-team team mailing list archive

[Bug 1714802] [NEW] DVR and HA migration tests failing intermittently for gate-tempest-dsvm-neutron-dvr-multinode-scenario-ubuntu-xenial-nv job

 

Public bug reported:

For the migration test failures Jakub has already created this etherpad
https://etherpad.openstack.org/p/neutron-dvr-multinode-scenario-gate-failures

My analysis:
The DVR and HA migration tempest scenario tests fail (or pass) intermittently. In the existing tests, we try ssh connectivity immediately after the port update API returns, without checking that the dependent resources (listed below) have been created or updated properly:
1) new interfaces are created
2) existing interfaces are updated
3) interfaces are bound to agents
4) interface status is updated
5) agents create namespaces, etc.
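
As a rough illustration of checks 1 to 4 above, here is a minimal sketch of a
readiness predicate over port data. It is not the code from the actual test
change. The function name ports_ready is hypothetical, the dict keys are
assumed to follow the Neutron API port representation, and the device_owner
strings in the usage note are my understanding of Neutron's constants and
should be verified before use. Check 5 (namespaces on the agents) is not
visible through port data and is not covered here.

```python
def ports_ready(ports, expected_owners):
    """Return True once every expected device_owner has at least one
    port, and every such port is bound to a host and ACTIVE.

    `ports` is a list of dicts shaped like Neutron API port objects
    (assumed keys: "device_owner", "status", "binding:host_id").
    """
    for owner in expected_owners:
        matching = [p for p in ports if p.get("device_owner") == owner]
        if not matching:
            # Check 1: the new interface has not been created yet.
            return False
        for port in matching:
            if not port.get("binding:host_id"):
                # Check 3: the interface is not bound to an agent's host.
                return False
            if port.get("status") != "ACTIVE":
                # Check 4: the interface status has not been updated.
                return False
    return True
```

For a DVR to HA migration, expected_owners might contain values such as
"network:router_ha_interface" and "network:ha_router_replicated_interface"
(assumed values), with the port list fetched from the server for the router
under test.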

For example, during DVR to HA migration, as soon as the router update
API returns, the ssh test might try to use the old data plane created
for the DVR router, because the agents might not yet have synced with
the server (removed namespaces, OVS flows and IP routes). If the ssh
reply packets arrive back before the old data plane is removed, ssh can
succeed. If the data path is reconstructed (because of the migration)
before the packet arrives, ssh can fail. Although ssh can retry, it may
use the existing connection tracking entry and try to follow the same
old data path (this is just my assumption).

When I updated the tests to check for the dependent resources before
trying ssh, the tests passed reliably. So we can add these checks before
we try ssh connectivity.
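
The idea of gating the ssh check on a readiness condition can be sketched as
a simple polling helper. This is only an illustration under assumptions, not
the helper used in the tests: the name wait_until and its parameters are
hypothetical, and the predicate passed in would be whatever resource check
the test implements.

```python
import time


def wait_until(predicate, timeout=60.0, interval=1.0,
               clock=time.monotonic, sleep=time.sleep):
    """Poll `predicate` until it returns a truthy value or `timeout`
    seconds have elapsed. Return True on success, False on timeout.

    `clock` and `sleep` are injectable so the helper can be exercised
    without real delays.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test would call this right after the router update returns, fail with a
clear message if it returns False, and only then attempt ssh. That way a
timeout points at the resources that never became ready rather than at a
generic connectivity failure.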

** Affects: neutron
     Importance: Undecided
     Assignee: venkata anil (anil-venkata)
         Status: New


** Tags: l3-dvr-backlog l3-ha tempest

** Changed in: neutron
     Assignee: (unassigned) => venkata anil (anil-venkata)

** Tags added: l3-dvr-backlog l3-ha tempest

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1714802

Title:
  DVR and HA migration tests failing intermittently for gate-tempest-
  dsvm-neutron-dvr-multinode-scenario-ubuntu-xenial-nv job

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1714802/+subscriptions

