yahoo-eng-team team mailing list archive

[Bug 1791268] [NEW] [DVR] ARP chaos in mixed cloud scenarios

 

Public bug reported:

Suppose the tenant network type is vlan, and we have a neutron network with vlan id 1000 (CIDR: 192.168.111.0/24, gateway IP: 192.168.111.1).
We also have a physical switch (SWITCH-1), which connects compute nodes NODE-1 and NODE-2.
On these compute nodes, the l3 agent_mode is set to `dvr_no_external`.
We also have a network node, NODE-3, whose l3 agent_mode is `dvr_snat`.

We have one bare metal machine, NODE-4, whose switch port is set to vlan id 1000.
Assume vm-1 runs on NODE-1 and vm-2 on NODE-2; the qrouter namespace is then created on both hosts.
For SNAT traffic, the qrouter namespace is also created on network node NODE-3.

Then the VMs and the bare metal machine can reach each other.

Things get strange for the internal gateway ARP, when the bare metal machine sends an ARP request for the MAC of the internal gateway IP (192.168.111.1).
We get 3 ARP responses, from compute NODE-1, compute NODE-2 and network NODE-3, because they all have the qrouter namespace with the same qr-device, the same IP (192.168.111.1) and the same MAC.

But the ARP responses do not look exactly the same to the physical switch (SWITCH-1).
NODE-1 responds with its own dvr_host MAC as the source MAC, while the data segment carries the correct MAC for 192.168.111.1.
NODE-2 and NODE-3 behave the same way.
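
The mismatch described above can be illustrated with a minimal sketch that builds the three ARP replies by hand, using only the Python standard library. All MAC addresses and the bare metal IP are hypothetical placeholders (only 192.168.111.1 comes from the report), and the sketch assumes each node uses its own dvr_host MAC as the Ethernet source while the ARP payload carries the shared gateway MAC.

```python
import struct

# Hypothetical addresses for illustration; only GATEWAY_IP is from the report.
GATEWAY_IP = "192.168.111.1"
GATEWAY_MAC = "fa:16:3e:00:00:01"    # assumed MAC of the shared qr-device
BAREMETAL_IP = "192.168.111.50"      # assumed address of NODE-4
BAREMETAL_MAC = "52:54:00:00:00:04"  # assumed MAC of NODE-4
DVR_HOST_MACS = {                    # assumed per-host dvr_host MACs
    "NODE-1": "fa:16:3f:00:00:01",
    "NODE-2": "fa:16:3f:00:00:02",
    "NODE-3": "fa:16:3f:00:00:03",
}


def mac_to_bytes(mac):
    return bytes.fromhex(mac.replace(":", ""))


def ip_to_bytes(ip):
    return bytes(int(octet) for octet in ip.split("."))


def bytes_to_mac(raw):
    return ":".join(f"{b:02x}" for b in raw)


def build_arp_reply(eth_src, eth_dst, sender_mac, sender_ip, target_mac, target_ip):
    """Return an Ethernet frame (14-byte header) carrying an ARP reply (28 bytes)."""
    ethernet = mac_to_bytes(eth_dst) + mac_to_bytes(eth_src) + struct.pack("!H", 0x0806)
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, oper=2 (reply)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 2)
    arp += mac_to_bytes(sender_mac) + ip_to_bytes(sender_ip)
    arp += mac_to_bytes(target_mac) + ip_to_bytes(target_ip)
    return ethernet + arp


def parse(frame):
    """Extract the fields that differ between the switch's and the host's view."""
    return {
        "eth_src": bytes_to_mac(frame[6:12]),          # what the switch learns from
        "arp_sender_mac": bytes_to_mac(frame[22:28]),  # what the bare metal host caches
        "arp_sender_ip": ".".join(str(b) for b in frame[28:32]),
    }


# One reply per node: different Ethernet source, identical ARP payload.
replies = {
    node: build_arp_reply(host_mac, BAREMETAL_MAC,
                          GATEWAY_MAC, GATEWAY_IP,
                          BAREMETAL_MAC, BAREMETAL_IP)
    for node, host_mac in DVR_HOST_MACS.items()
}

for node, frame in replies.items():
    fields = parse(frame)
    print(node, fields["eth_src"], fields["arp_sender_mac"], fields["arp_sender_ip"])
```

In this sketch the bare metal machine would cache the gateway MAC from the ARP payload, while the switch only ever sees the three dvr_host MACs as frame sources.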

This may cause the physical switch to flood the ARP request again and
again, since it does not know which physical port (possibly an fdb
entry) leads to the MAC of 192.168.111.1.
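
One possible reading of the flooding concern can be sketched with a toy MAC-learning switch. This is a generic model, not the behavior of any particular switch; the port numbers, the MAC addresses and the assumption that NODE-3 is attached to the same switch are all hypothetical. The switch learns only from the Ethernet source address, so the gateway MAC never enters its forwarding table and frames addressed to it are flooded.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"
GATEWAY_MAC = "fa:16:3e:00:00:01"    # assumed MAC of the shared qr-device
BAREMETAL_MAC = "52:54:00:00:00:04"  # assumed MAC of NODE-4
BAREMETAL_PORT = 4                   # assumed switch port of NODE-4
DVR_HOST_PORTS = {                   # assumed switch port -> dvr_host MAC
    1: "fa:16:3f:00:00:01",          # NODE-1
    2: "fa:16:3f:00:00:02",          # NODE-2
    3: "fa:16:3f:00:00:03",          # NODE-3
}


class LearningSwitch:
    """Toy switch: learns source MACs, floods unknown or broadcast destinations."""

    def __init__(self):
        self.fdb = {}  # MAC address -> port

    def receive(self, in_port, src_mac, dst_mac):
        # Learning uses only the Ethernet source address of the frame.
        self.fdb[src_mac] = in_port
        if dst_mac == BROADCAST or dst_mac not in self.fdb:
            return "flood"
        return "forward to port %d" % self.fdb[dst_mac]


switch = LearningSwitch()
events = []

# 1. The bare metal machine broadcasts its ARP request for the gateway IP.
events.append(switch.receive(BAREMETAL_PORT, BAREMETAL_MAC, BROADCAST))

# 2. Each node replies with its own dvr_host MAC as the Ethernet source.
for port, host_mac in DVR_HOST_PORTS.items():
    events.append(switch.receive(port, host_mac, BAREMETAL_MAC))

# 3. The bare metal machine sends a unicast frame to the gateway MAC
#    it cached from the ARP payload.
events.append(switch.receive(BAREMETAL_PORT, BAREMETAL_MAC, GATEWAY_MAC))

print(events)
print("gateway MAC learned:", GATEWAY_MAC in switch.fdb)
```

Under these assumptions the last frame is flooded because no frame ever arrived with the gateway MAC as its source.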

So this bug is meant to find a solution for DVR and bare metal
(ironic): can they work together now?

** Affects: neutron
     Importance: Undecided
         Status: New

** Description changed:

  Suppose the tenant network type is vlan, and we have a neutron network with vlan id 1000 (CIDR: 192.168.111.0/24, gateway IP: 192.168.111.1).
  We also have a physical switch (SWITCH-1), which connects compute nodes NODE-1 and NODE-2.
  On these compute nodes, the l3 agent_mode is set to `dvr_no_external`.
  We also have a network node, NODE-3, whose l3 agent_mode is `dvr_snat`.
  
  We have one bare metal machine, NODE-4, whose switch port is set to vlan id 1000.
  Assume vm-1 runs on NODE-1 and vm-2 on NODE-2; the qrouter namespace is then created on both hosts.
  For SNAT traffic, the qrouter namespace is also created on network node NODE-3.
  
- Then the vm and the bare metal machine can reach each other; actually,
- some of this work was done by the ironic service.
+ Then the VMs and the bare metal machine can reach each other.
  
  Things get strange for the internal gateway ARP, when the bare metal machine sends an ARP request for the MAC of the internal gateway IP (192.168.111.1).
- We get 3 ARP responses, from compute NODE-1, compute NODE-2 and network NODE-3, because they all have the qrouter namespace with the same device qr-device, the same IP (192.168.111.1) and the same MAC.
+ We get 3 ARP responses, from compute NODE-1, compute NODE-2 and network NODE-3, because they all have the qrouter namespace with the same qr-device, the same IP (192.168.111.1) and the same MAC.
  
  But the ARP responses do not look exactly the same to the physical switch (SWITCH-1).
  NODE-1 responds with its own dvr_host MAC as the source MAC, while the data segment carries the correct MAC for 192.168.111.1.
- NODE-2 and NODE-4 behave the same way.
+ NODE-2 and NODE-3 behave the same way.
  
  This may cause the physical switch to flood the ARP request again and
  again, since it does not know which physical port (possibly an fdb
  entry) leads to the MAC of 192.168.111.1.
  
  So this bug is meant to find a solution for DVR and bare metal
  (ironic): can they work together now?

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1791268

Title:
  [DVR] ARP chaos in mixed cloud scenarios

Status in neutron:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1791268/+subscriptions