
yahoo-eng-team team mailing list archive

[Bug 1714802] Re: DVR and HA migration tests failing intermittently for gate-tempest-dsvm-neutron-dvr-multinode-scenario-ubuntu-xenial-nv job

 

Reviewed:  https://review.openstack.org/500384
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f5718972257cf229c8a9db0a5fc4349acbaade12
Submitter: Jenkins
Branch:    master

commit f5718972257cf229c8a9db0a5fc4349acbaade12
Author: venkata anil <anilvenkata@xxxxxxxxxx>
Date:   Tue Sep 19 07:41:19 2017 +0000

    tempest: check router interface exists before ssh
    
    As explained in the bug, tempest DVR and HA migration scenario
    tests are failing intermittently, as we are not checking if the
    new router interfaces are ready after migration and might try
    to use the old dataplane if the pre-migration router resources
    (like interfaces, namespaces, etc) still exist and are not yet
    destroyed.
    
    We need to check that the pre-migration router interfaces are
    deleted and the new interfaces are created and active (as we
    can't check namespace existence on other nodes, we rely on port
    status set by L2 agent after wiring the port) before
    attempting ssh connectivity.
    
    Closes-Bug: 1714802
    Change-Id: I2a933d4cdd6de4e5ff31c8e3f97477819ba27afa


** Changed in: neutron
       Status: In Progress => Fix Released

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1714802

Title:
  DVR and HA migration tests failing intermittently for
  gate-tempest-dsvm-neutron-dvr-multinode-scenario-ubuntu-xenial-nv job

Status in neutron:
  Fix Released

Bug description:
  For the migration test failures, Jakub has already created this
  etherpad: https://etherpad.openstack.org/p/neutron-dvr-multinode-scenario-gate-failures

  My analysis is as follows.
  DVR and HA migration tempest scenario tests are failing (or passing) intermittently. In the existing tests, we attempt ssh connectivity immediately after the port update API returns, without checking that the dependent resources (listed below) have been created or updated properly:
  1) new interfaces are created
  2) existing interfaces are updated
  3) interfaces are bound to agents
  4) interface status is updated
  5) agents create namespaces, etc.

  For example, during DVR to HA migration, as soon as the router update
  API returns, the ssh test might use the old data plane created for the
  DVR router, because the agents might not yet have synced with the
  server (removed namespaces, OVS flows and IP routes). If the ssh reply
  packets arrive before the old data plane is removed, ssh can succeed.
  If the data path is reconstructed (because of the migration) before
  the packets arrive, ssh can fail. Although ssh retries, the retries may
  reuse the existing connection tracking entry and follow the same old
  data path (just my assumption).

  When I updated the tests to check for the dependent resources before
  attempting ssh, they passed reliably. So we can add these checks
  before we attempt ssh connectivity.
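The check described in the commit message (old router interfaces deleted, new ones ACTIVE, relying on the port status the L2 agent sets after wiring the port) can be illustrated with a polling helper. This is a minimal, self-contained sketch, not the code merged in review 500384: `wait_until`, `router_interfaces_migrated` and the `list_router_ports` callable are hypothetical names, and in a real tempest test the callable would wrap a Neutron port listing filtered by the router's device_id. The device_owner strings are Neutron's router-interface owners for legacy, DVR and HA routers.

```python
import time
from typing import Callable, Dict, List

# Neutron device_owner values for router interfaces on legacy, DVR and HA routers.
LEGACY_INTF = "network:router_interface"
DVR_INTF = "network:router_interface_distributed"
HA_INTF = "network:ha_router_replicated_interface"

Port = Dict[str, str]


def wait_until(predicate: Callable[[], bool], timeout: float = 60.0,
               interval: float = 1.0) -> None:
    """Poll predicate() until it returns True; raise after timeout seconds."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(interval)


def router_interfaces_migrated(list_router_ports: Callable[[], List[Port]],
                               old_owner: str, new_owner: str) -> bool:
    """Hypothetical check: no pre-migration interface left, all new ones ACTIVE."""
    ports = list_router_ports()
    if any(p["device_owner"] == old_owner for p in ports):
        return False  # old data plane may still be in use
    new_ports = [p for p in ports if p["device_owner"] == new_owner]
    # We cannot inspect namespaces on other nodes, so rely on the port
    # status, which the L2 agent sets to ACTIVE after wiring the port.
    return bool(new_ports) and all(p["status"] == "ACTIVE" for p in new_ports)


if __name__ == "__main__":
    # Hypothetical port snapshots during a DVR -> HA migration, one per poll.
    snapshots = iter([
        [{"device_owner": DVR_INTF, "status": "ACTIVE"}],
        [{"device_owner": HA_INTF, "status": "DOWN"}],
        [{"device_owner": HA_INTF, "status": "ACTIVE"}],
    ])
    wait_until(lambda: router_interfaces_migrated(lambda: next(snapshots),
                                                  DVR_INTF, HA_INTF),
               timeout=5, interval=0.01)
    print("router interfaces ready; ssh can be attempted")
```

Polling with a deadline keeps the test from hanging if the agents never converge; the ssh step would only run after `wait_until` returns.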

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1714802/+subscriptions

