Message #67179
[Bug 1714802] [NEW] DVR and HA migration tests failing intermittently for gate-tempest-dsvm-neutron-dvr-multinode-scenario-ubuntu-xenial-nv job
Public bug reported:
For the migration test failures, Jakub has already created this etherpad:
https://etherpad.openstack.org/p/neutron-dvr-multinode-scenario-gate-failures
My analysis:
DVR and HA migration tempest scenario tests are failing (or passing) intermittently. In the existing tests, we try ssh connectivity immediately after the port update API returns, without checking that the dependent resources (listed below) have been created or updated properly.
1) new interfaces are created
2) existing interfaces are updated
3) interfaces are bound to agents
4) interface statuses are updated
5) agents have created namespaces, etc.
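
A rough sketch of how checks 1 to 4 could be expressed against port
records, assuming they are plain dicts as returned by the Neutron API.
The helper name router_ports_ready and its parameters are hypothetical
and are not taken from the existing tests. The fields device_owner,
status and binding:host_id are standard port attributes, but which
device owners to expect after a given migration, and whether a binding
check applies to them, is an assumption the caller has to supply.

```python
def router_ports_ready(ports, expected_owners, require_binding=True):
    """Return True once every expected port type exists, is ACTIVE and is bound.

    ports: iterable of port dicts (hypothetical input, e.g. a port listing
           filtered by the router's device_id).
    expected_owners: device_owner values that must be present after migration.
    require_binding: set to False for port types that carry no host binding.
    """
    ports = list(ports)
    for owner in expected_owners:
        matching = [p for p in ports if p.get("device_owner") == owner]
        if not matching:
            # Interface of this type has not been created or updated yet.
            return False
        for port in matching:
            if port.get("status") != "ACTIVE":
                # Agent has not reported the port as wired up.
                return False
            if require_binding and not port.get("binding:host_id"):
                # Port is not bound to any host/agent yet.
                return False
    return True
```

Check 5 (namespaces on the agents) cannot be verified from port data
alone and is left out of this sketch.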
For example, during DVR to HA migration, as soon as the router update
API returns, the ssh test might try to use the old data plane created
for the DVR router, because the agents might not yet have synced with
the server (removed namespaces, OVS flows and IP routes). If the ssh
reply packets arrive back before the old data plane is removed, ssh can
succeed. If the data path is reconstructed (because of the migration)
before the packets arrive, ssh can fail. Although ssh can retry, it may
reuse the existing connection tracking entry and follow the same old
data path (this is just my assumption).
When I updated the tests to check for the dependent resources before
trying ssh, the tests passed reliably. So we can add these checks
before we test ssh connectivity.
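
The ordering proposed here (update the router, wait for the dependent
resources, only then try ssh) might look roughly like the sketch below.
All names (wait_until_true, migrate_then_check and the three callables
passed in) are hypothetical placeholders, not the helpers used in the
actual tests, and the timeout values are arbitrary.

```python
import time


def wait_until_true(predicate, timeout=60.0, interval=1.0):
    """Poll predicate() until it returns True or timeout seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def migrate_then_check(update_router, resources_ready, check_ssh,
                       timeout=120.0, interval=2.0):
    """Run the migration, wait for dependent resources, then test ssh.

    update_router:   callable issuing the router update (the migration).
    resources_ready: callable returning True once interfaces are created,
                     bound and active, and agents have synced.
    check_ssh:       callable performing the ssh connectivity check.
    """
    update_router()
    if not wait_until_true(resources_ready, timeout, interval):
        raise RuntimeError(
            "dependent resources not ready after router migration")
    # Only now exercise the data plane, so ssh does not race the agents.
    check_ssh()
```

Failing with an explicit error when the wait times out also separates
"the migration never converged" from "the data plane is broken", which
the current tests cannot tell apart.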
** Affects: neutron
Importance: Undecided
Assignee: venkata anil (anil-venkata)
Status: New
** Tags: l3-dvr-backlog l3-ha tempest
** Changed in: neutron
Assignee: (unassigned) => venkata anil (anil-venkata)
** Tags added: l3-dvr-backlog l3-ha tempest
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1714802
To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1714802/+subscriptions