[Bug 1550400] Re: Macvtap driver/agent migrates instances on an invalid physical network

 

Bug closed due to lack of activity, please feel free to reopen if
needed.

** Changed in: neutron
       Status: In Progress => Won't Fix

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to neutron.
https://bugs.launchpad.net/bugs/1550400

Title:
  Macvtap driver/agent migrates instances on an invalid physical network

Status in neutron:
  Won't Fix

Bug description:
  Scenario1 - Migration on wrong physical network - High Prio
  ===========================================================
  Host1 has physical_interface_mappings: physnet1:eth0, physnet2:eth2
  Host2 has physical_interface_mappings: physnet1:eth1, physnet2:eth0

  Live migration of an instance hosted on host1 (connected to physnet1) to host2 succeeds. Libvirt just migrates the whole server with its domain.xml, and the macvtap is plugged into eth0 on the target side.
  The instance then no longer has access to its network, but instead has access to a different physical network. The behavior is documented, however it needs to be fixed!

  Scenario2 - Migration fails - Low Prio
  ======================================
  Host1 has physical_interface_mappings: physnet1:eth0
  Host2 has physical_interface_mappings: physnet1:eth1

  Let's assume a vlan setup and a migration from host1 to host2. Host2
  does NOT have an interface eth0. The migration will fail and the
  instance will remain active on the source, as the nova plug on host2
  failed to create a vlan device on eth0.

  If you have a flat network, the definition of the libvirt xml will
  fail on host2.

  Two approaches are conceivable:
  * Solve the problem (Scenarios 1+2)
  * Just prevent such an invalid migration (let scenario 1 fail the way scenario 2 fails today)

  Solve the problem
  =================

  #1 Solve it in Nova pre live migration
  --------------------------------------

  This would allow migration even though the physical_interface_mappings
  are different.

  a) In pre live migration, nova should change the binding:host_id to
  the migration target (a minimal sketch follows below). This will
  trigger port binding and the mech driver, which will update the
  vif_details with the right macvtap source device information. Libvirt
  can then adapt the migration XML to reflect the changes.

  Currently the update of the binding is done in post migration, after the migration succeeded. Can we already do it in pre_live_migration and undo it in rollback on failure?
  - There's no issue for the reference implementations - see the prototype: https://review.openstack.org/297100
  - But there might be mechanisms in external SDN controllers that shut down ports on the source host as soon as the host_id is updated. On the other hand, if controllers rely on this mechanism, they already set the port up a little too late today, as the host_id update is sent only after live migration succeeded.
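
  A minimal sketch of what option a) boils down to on the nova side,
  assuming python-neutronclient; the helper name is made up and the
  actual prototype is the review linked above. Updating binding:host_id
  re-triggers port binding, so the macvtap mech driver can put the
  target-side source device into binding:vif_details before the
  migration XML is built:

      # Sketch only, not the prototype from https://review.openstack.org/297100:
      # moving binding:host_id to the target host makes ML2 re-run port binding,
      # so the macvtap mech driver refreshes binding:vif_details for the target.
      from neutronclient.v2_0 import client as neutron_client

      def rebind_port_to_target(neutron, port_id, dest_host):
          # The caller must undo this in rollback if the migration fails.
          return neutron.update_port(
              port_id, {'port': {'binding:host_id': dest_host}})

      neutron = neutron_client.Client(username='nova', password='secret',
                                      tenant_name='service',
                                      auth_url='http://controller:5000/v2.0')
      updated = rebind_port_to_target(
          neutron, 'b780af01-a3c6-4279-a355-fa3289bc1ec3', 'host2')
      # updated['port']['binding:vif_details'] now describes the macvtap source
      # device on host2 and can be used to adapt the libvirt migration XML.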

  b) The alternative would be to allow a port to be bound to multiple hosts simultaneously. So in pre_live_migration, nova would add a binding for the target host, and in post_live_migration it would remove the original binding.
  This would require
  - simultaneous port binding. This will be achieved by https://bugs.launchpad.net/neutron/+bug/1367391
  - allow such a binding for compute ports as well
  - Update APIs to reflect multiple port_bindings
    - Create / Update / Show Port
    - host_id is not reflected for DVR ports today [1]

  #2 Moved to Prevent section
  ---------------------------

  #3 Device renaming in the macvtap agent
  ---------------------------------------
  This would allow migration even though the physical_interface_mappings are different.

  Instead of
       physical_interface_mapping = physnet1:eth0
  use a
       physical_interface_mac_mapping = physnet1:00:11:22:33:44:55      # where 00:11:22:33:44:55 is the mac address of the interface to use

  On agent startup, the agent could rename the associated device to
  "physnet1" (or to some other generic value) that is consistent across
  all hosts!

  We would need to document that this interface must not be used by any
  other application that relies on the interface name.
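
  A rough sketch of the rename step at agent startup, assuming the
  MAC-based mapping option from above has already been parsed into a
  dict (a real implementation would likely go through neutron's ip_lib
  instead of calling ip directly; the MAC and names are illustrative):

      import os
      import subprocess

      def find_device_by_mac(mac):
          # Look up the current interface name for the given MAC address.
          for dev in os.listdir('/sys/class/net'):
              with open('/sys/class/net/%s/address' % dev) as f:
                  if f.read().strip().lower() == mac.lower():
                      return dev
          return None

      def rename_physical_interfaces(mac_mappings):
          # mac_mappings comes from the proposed physical_interface_mac_mapping
          # option, e.g. {'physnet1': '00:11:22:33:44:55'}.
          for physnet, mac in mac_mappings.items():
              dev = find_device_by_mac(mac)
              if dev is None or dev == physnet:
                  continue
              # A device has to be down while it is renamed.
              subprocess.check_call(['ip', 'link', 'set', dev, 'down'])
              subprocess.check_call(['ip', 'link', 'set', dev, 'name', physnet])
              subprocess.check_call(['ip', 'link', 'set', physnet, 'up'])

      rename_physical_interfaces({'physnet1': '00:11:22:33:44:55'})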

  #4 Use generic vlan device names
  --------------------------------

  This solves the problem only for vlan networks! For flat networks it
  would still exist.

  Today, the agent generates the vlan device names like this: for eth1,
  eth1.<vlan-id>. We could get rid of this pattern and use
  <network-uuid>.<vlan-id> instead, where <network-uuid> is the first 10
  chars of the network id.

  But this would not solve the issue for flat networks. Therefore the
  device renaming proposed in #3 would be required.
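
  A small sketch of the proposed naming scheme. With a 10-character
  prefix, a dot and a VLAN id of up to 4 digits, the name stays within
  the 15-character Linux interface name limit:

      def vlan_device_name(network_id, vlan_id):
          # e.g. the network from [1] on vlan 1024 -> "2be9f80a-e.1024",
          # identical on every host regardless of the underlying interface.
          return '%s.%s' % (network_id[:10], vlan_id)

      name = vlan_device_name('2be9f80a-e3ac-42e6-9249-5ccca241ad85', 1024)
      assert name == '2be9f80a-e.1024'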

  Prevent invalid migration
  =========================

  #1 Let Port binding fail
  ------------------------

  The idea is to detect an invalid migration in the mechanism driver and
  let port binding fail.

  This approach has two problems
  a) Port binding happens AFTER the migration happened. In post live migration, nova requests to update the binding:host_id to the target. But when doing so, the instance is already running on the target host. The binding will fail, but the migration happened nevertheless.
  --> But at least the instance would be in error state and the user is aware of that! In addition, we might drop all traffic related to this instance.

  b) Detecting a migration in the mech driver is difficult. The idea is to use PortContext.original. This works if we add the original port to the context somewhere in the ml2 plugin (see the proposed patch). The problem is that this is not a reliable indicator of a live migration: the original port will also be added if scheduling on another node failed before and the current node is now picked. There was no live migration, but PortContext.original is set to another host. Maybe this can be solved, but it's worth mentioning here.
  --> In the worst case, use the profile information added with https://review.openstack.org/#/c/275073/

  see patch https://review.openstack.org/293404
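
  A simplified sketch of the host comparison such a check could do (the
  linked patch differs in detail; interface_mappings_by_host is a made-up
  cache of the mappings each agent reported, and context.original /
  context.original_host are only meaningful if the ml2 plugin passes the
  original port into the context as described above):

      class MacvtapMechanismDriver(object):  # simplified skeleton, not the real class
          # Made-up cache of reported agent mappings, e.g.
          # {'host1': {'physnet1': 'eth0'}, 'host2': {'physnet1': 'eth1'}}
          interface_mappings_by_host = {}

          def _is_invalid_migration(self, context, segment):
              # Only relevant when the port is moving to a different host.
              if not context.original or context.original_host in (None, context.host):
                  return False
              physnet = segment.get('physical_network')
              source = self.interface_mappings_by_host.get(
                  context.original_host, {}).get(physnet)
              target = self.interface_mappings_by_host.get(
                  context.host, {}).get(physnet)
              return source != target

          def try_to_bind_segment_for_agent(self, context, segment, agent):
              if self._is_invalid_migration(context, segment):
                  # Leaving the segment unbound makes the port binding fail.
                  return False
              # ... normal macvtap binding logic would continue here ...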

  #2 Let agent detect invalid migration (not working)
  ---------------------------------------------------
  An invalid migration could be detected in the agent, so that the agent
  avoids setting the device status to up.

  But this is too late, as the agent detects the device after migration
  already started. There is no way to stop it again.

  see patch https://review.openstack.org/293403

  #3 Solve it in nova post live migration
  ---------------------------------------
  The idea is that nova starts the migration and then listens for the plug_vif event that is emitted by neutron after the agent reported the device as up. Nova also waits for the port binding to occur. If either runs into a timeout or fails, the migration should be rolled back (if still possible) or the instance should be set into error state and the network locked down (which is the default for ovs - not sure about others right now).
  There are some patch sets out there that try to achieve something similar, but only for the ovs-hybrid plug. For others it's much more complicated, as the agent will only report the device up after it occurred on the target (after the migration already started): https://review.openstack.org/246898

  #4 Prohibit agent start with invalid mapping
  --------------------------------------------
  Do not allow different mappings at all.

  How to trigger the validation?
  * Have an RPC call from the Agent to the Neutron plugin at agent start.
  --> Less resource consumption, but extra rpc call

  * Use the regular agent status reports.
  -->  Checking on every status report consumes a lot of resources (db query and potential mech_driver calls)

  What to compare?
  * Have a master interface mappings config option configured on the plugin/mech_driver. All agent mappings must match that mapping
  --> If the server changes the master mapping, there's no way to notify the agents (or it must get implemented)
  --> Config option duplication

  * Query the master mapping from the server and compare on the agent
  side, or ignore the local mapping entirely if a master mapping has
  been configured.

  * Compare against the existing mappings in the database. The first agent that sends its mapping via status reports defines the valid mapping.
  --> We need explicit table locking (locking rows is not sufficient) to avoid races, especially when the first agents get added.

  Where to do the validation/gather data for validation?
  * In the mech driver
  --> Most natural way, but requires a new mech_driver interface
  --> Also a new plugin interface is required

  * In the rpc callbacks class
  --> As the validation depends on the mech_driver, we would have mech_driver specific code there. But we would get around new interfaces

  * In the agent

  Proposal:
  * The agent gets a new config option "safe_migration_mode" = True|False (default False, to stay backward compatible)
  * If it is set, the server's master mapping is queried by the agent via RPC on agent start.
  * If it does not match the local mapping, the agent terminates.
  * The RPC call will be made to the plugin, which then triggers all mechanism drivers. Those have a generic method like 'get_plugin_configuration()' or similar.
  * If this method is not present, the code will just continue (to not break other drivers).
  * The plugin returns a dict mech_driver:configuration to the agent. If a mech_driver did not provide any configuration (not required or method not implemented), it will not be part of the returned dict.
  * If the master mapping on the server got changed but the agents haven't been restarted, the local mapping will not be validated against the new master mapping again (that would require an agent restart). A rough sketch of the agent-side check follows below.
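
  A sketch of the agent-side part of this proposal; the option name,
  get_plugin_configuration() and the shape of the returned dict all come
  from the proposal above, not from existing neutron interfaces:

      import sys

      def validate_mappings_or_die(conf, plugin_rpc, local_mappings):
          if not conf.macvtap.safe_migration_mode:
              return  # default: keep today's behaviour
          # Proposed RPC: the plugin asks every mech driver for its configuration.
          server_config = plugin_rpc.get_plugin_configuration()
          master = server_config.get('macvtap', {}).get(
              'physical_interface_mappings')
          if master is None:
              # The mech driver provided no master mapping -> nothing to check.
              return
          if master != local_mappings:
              sys.exit('physical_interface_mappings %s does not match the '
                       "server's master mapping %s; refusing to start" %
                       (local_mappings, master))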

  References
  ==========

  [1] curl -g -i -X GET http://192.168.122.30:9696/v2.0/ports/b780af01-a3c6-4279-a355-fa3289bc1ec3.json
  {"port": {"status": "ACTIVE", "binding:host_id": "", "description": "", "allowed_address_pairs": [], "extra_dhcp_opts": [], "updated_at": "2016-04-08T08:59:08", "device_owner": "network:router_interface_distributed", "port_security_enabled": false, "binding:profile": {}, "fixed_ips": [{"subnet_id": "7c57f270-46a9-4cde-a2a7-82e66da1d084", "ip_address": "10.0.0.1"}], "id": "b780af01-a3c6-4279-a355-fa3289bc1ec3", "security_groups": [], "device_id": "48120dd2-2523-4b11-9f67-8ea944d11012", "name": "", "admin_state_up": true, "network_id": "2be9f80a-e3ac-42e6-9249-5ccca241ad85", "dns_name": null, "binding:vif_details": {}, "binding:vnic_type": "normal", "binding:vif_type": "distributed", "tenant_id": "1c9d0fc21afc40a2959d3d3d4acca528", "mac_address": "fa:16:3e:a7:d8:80", "created_at": "2016-04-07T15:05:57"}}

  [2] https://review.openstack.org/342872

To manage notifications about this bug go to:
https://bugs.launchpad.net/neutron/+bug/1550400/+subscriptions


