yahoo-eng-team team mailing list archive

[Bug 1817956] [NEW] Metadata not reachable when dvr_snat L3 agent is used on compute node

 

Public bug reported:

When L3 agents are deployed on compute nodes in dvr_snat agent mode
(as is done e.g. in CI jobs) and DVR+HA routers are used, it may
happen that metadata is not reachable from instances.

For example, in the neutron-tempest-dvr-ha-multinode-full job we have
the following topology (see the l3_agent.ini sketch after the list):

- controller (all in one) with L3 agent in dvr mode,
- compute-1 with L3 agent in dvr_snat mode,
- compute-2 with L3 agent in dvr_snat mode.
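
For reference, the L3 agent mode is controlled by the agent_mode
option in l3_agent.ini. A minimal sketch of settings matching the
topology above (file paths assume a default DevStack layout):

    # /etc/neutron/l3_agent.ini on the controller (all-in-one)
    [DEFAULT]
    agent_mode = dvr

    # /etc/neutron/l3_agent.ini on compute-1 and compute-2
    [DEFAULT]
    agent_mode = dvr_snat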

Now, if a VM is scheduled e.g. on host compute-2 and is connected to
a DVR+HA router which is active on compute-1 and standby on
compute-2, then the metadata haproxy is not spawned on compute-2 and
the VM cannot reach the metadata IP.
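
To confirm this state on an affected node, something like the
following can be used (<router-id> is a placeholder; the exact
options depend on the installed client version):

    # which L3 agents host the router, and which one is active/standby
    openstack network agent list --router <router-id> --long

    # on compute-2 the standby qrouter namespace exists ...
    sudo ip netns | grep qrouter-<router-id>

    # ... but no haproxy metadata proxy was spawned for it
    ps aux | grep haproxy | grep <router-id>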

I found this when I tried to migrate the existing legacy neutron-tempest-dvr-ha-multinode-full job to Zuul v3. It turned out that the legacy job is in fact a "non-HA" job, because the "l3_ha" option is set to False there, so routers are created as non-HA DVR routers.
When I switched the job to DVR+HA in https://review.openstack.org/#/c/633979/ I hit the error described above.
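
For context, the relevant server-side defaults live in neutron.conf.
A DVR+HA setup along the lines of the patched job would use something
like this devstack post-config block (a sketch, not the exact job
definition):

    [[post-config|$NEUTRON_CONF]]
    [DEFAULT]
    router_distributed = True
    l3_ha = True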

Example of failed tests:
http://logs.openstack.org/79/633979/16/check/neutron-tempest-dvr-ha-multinode-full/710fb3d/job-output.txt.gz
All VMs to which SSH was not possible could not reach the metadata IP.

** Affects: neutron
     Importance: Medium
     Assignee: Slawek Kaplonski (slaweq)
         Status: Confirmed


** Tags: gate-failure l3-dvr-backlog

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1817956

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1817956/+subscriptions

