yahoo-eng-team team mailing list archive
Message #68433
[Bug 1714802] Re: DVR and HA migration tests failing intermittently for gate-tempest-dsvm-neutron-dvr-multinode-scenario-ubuntu-xenial-nv job
Reviewed: https://review.openstack.org/500384
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=f5718972257cf229c8a9db0a5fc4349acbaade12
Submitter: Jenkins
Branch: master
commit f5718972257cf229c8a9db0a5fc4349acbaade12
Author: venkata anil <anilvenkata@xxxxxxxxxx>
Date: Tue Sep 19 07:41:19 2017 +0000
tempest: check router interface exists before ssh
As explained in the bug, tempest DVR and HA migration scenario
tests are failing intermittently, as we are not checking if the
new router interfaces are ready after migration and might try
to use the old dataplane if the pre-migration router resources
(like interfaces, namespaces, etc.) still exist and are not yet
destroyed.
We need to check that the pre-migration router interfaces are
deleted and the new interfaces are created and active (as we
can't check namespace existence on other nodes, we rely on port
status set by L2 agent after wiring the port) before
attempting ssh connectivity.
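The check described in the commit message is a "wait until true" poll, run before the ssh step. Below is a minimal sketch in Python. It is not the code from the commit. It assumes a hypothetical `list_router_ports(owner)` callable that returns the router's ports (dicts with `id` and `status`), filtered by `device_owner`. Old and new interfaces are told apart by device_owner rather than by port ID, because a migration may change the owner of an existing port. The timeout and interval defaults are illustrative only.

```python
import time
from typing import Callable, Dict, List


def wait_for_migrated_interfaces(
    list_router_ports: Callable[[str], List[Dict[str, str]]],
    old_owner: str,
    new_owner: str,
    timeout: float = 300.0,
    interval: float = 5.0,
) -> List[Dict[str, str]]:
    """Block until no port with the pre-migration device_owner is left and
    every port with the post-migration device_owner is ACTIVE.

    `list_router_ports(owner)` is a hypothetical callable standing in for a
    Neutron port listing filtered by the router's device_id and `owner`.
    """
    deadline = time.monotonic() + timeout
    while True:
        old_ports = list_router_ports(old_owner)
        new_ports = list_router_ports(new_owner)
        # An empty new-port list is not "ready": the interfaces may not exist yet.
        # ACTIVE is set by the L2 agent once it has wired the port.
        ready = (not old_ports and bool(new_ports)
                 and all(p["status"] == "ACTIVE" for p in new_ports))
        if ready:
            return new_ports
        if time.monotonic() >= deadline:
            raise TimeoutError(
                "router not migrated: %d old port(s) left, new port status %s"
                % (len(old_ports), [p["status"] for p in new_ports]))
        time.sleep(interval)
```

For a DVR to HA migration, `old_owner` and `new_owner` would presumably be the distributed and HA-replicated interface owners. In neutron-lib these are `network:router_interface_distributed` and `network:ha_router_replicated_interface`. The ssh check would only run after this function returns.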
Closes-Bug: 1714802
Change-Id: I2a933d4cdd6de4e5ff31c8e3f97477819ba27afa
** Changed in: neutron
Status: In Progress => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1714802
Title:
DVR and HA migration tests failing intermittently for gate-tempest-dsvm-neutron-dvr-multinode-scenario-ubuntu-xenial-nv job
Status in neutron:
Fix Released
Bug description:
For the migration test failures, Jakub has already created this
etherpad: https://etherpad.openstack.org/p/neutron-dvr-multinode-scenario-gate-failures
My analysis is this:
DVR and HA migration tempest scenario tests are failing (or passing) intermittently. In the existing tests, we attempt ssh connectivity immediately after the port update API returns. We do not first check whether the dependent resources listed below have been created or updated properly:
1) new interfaces are created
2) existing interfaces are updated
3) interfaces are bound to agents
4) interface status is updated
5) agents create namespaces, etc.
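Items 1, 2 and 4 above can be observed through the port API: whether the port exists, its `device_owner` and its `status`. Items 3 and 5 happen on the agent nodes. The fix relies on ACTIVE status as a proxy for them, since the L2 agent sets it after wiring the port. Below is a minimal sketch that reports which observable conditions are still unmet for a snapshot of the router's ports. It assumes the ports are plain dicts carrying Neutron's `id`, `device_owner` and `status` attributes. The function name and message format are illustrative only.

```python
from typing import Dict, List


def unmet_conditions(ports: List[Dict[str, str]], expected_owner: str) -> List[str]:
    """List reasons why a router's ports do not yet look migrated.

    `expected_owner` is the device_owner the interfaces are expected to carry
    after migration. Binding to agents (item 3) and namespace creation
    (item 5) are not checked directly; ACTIVE status stands in for them.
    """
    problems = []
    if not ports:
        problems.append("no router interfaces exist yet")  # item 1
    for port in ports:
        owner = port.get("device_owner")
        if owner != expected_owner:  # item 2: interface not updated yet
            problems.append("%s: device_owner is %r" % (port["id"], owner))
        status = port.get("status")
        if status != "ACTIVE":  # item 4: L2 agent has not wired the port yet
            problems.append("%s: status is %r" % (port["id"], status))
    return problems
```

An empty result means every observable condition holds. A test would keep polling until that is the case before attempting ssh.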
For example, during DVR to HA migration, the ssh test may start as
soon as the router update API returns. At that point it might use the
old data plane created for the DVR router, because the agents may not
yet have synced with the server (removed namespaces, OVS flows and IP
routes). If the ssh reply packets arrive before the old data plane is
removed, ssh can succeed. If the data path is reconstructed (because
of the migration) before the packets arrive, ssh can fail. Although
ssh can retry, it may reuse the existing connection tracking entry and
try to follow the same old data path (just my assumption).
When I updated the tests to check for the dependent resources before
attempting ssh, they passed reliably. So we can add these checks
before we try ssh connectivity.
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1714802/+subscriptions