cloud-init team mailing list archive

Re: Azure Networking Support in cloud-init

 

On Fri, Feb 9, 2018 at 6:23 PM, Sushant Sharma (AZURE) <
Sushant.Sharma@xxxxxxxxxxxxx> wrote:

> Hi Ryan, Please see my comments below
>
>
>
> *<< During early boot at cloud-init local time, which runs before
> system-wide networking is enabled, the Azure Datasource will DHCP on the
> primary interface, consume metadata, and within that, read the
> PreprovisionedVM value and if so, goes down a path of polling until it's
> time to come up. >>*
>
>
>
> The new provisioning data for the VM will not be returned unless the VM is
> moved into the customer’s new network.
>
> So first, we have to get the new IP address, and only then will a subsequent
> poll return the new provisioning data so that cloud-init can move on.
>

The code in the Azure Datasource handles getting a new IP address during
the polling loop[1].  If the network goes down, or any exception other than
a 404 is raised, the polling loop will attempt DHCP lease discovery again.

Cloud-init invokes the Azure datasource only twice during boot: once
before system networking is up (cloud-init local) and once after
(cloud-init net).

I assumed that the pre-provisioned flag in metadata (or ovf-xml) is set
prior to boot, implying that we'd poll IMDS at cloud-init local time,
before networking comes up.  In that case I think it's clear that we don't
need an additional DHCP bounce, as the polling loop along with system
networking will ensure we acquire a new DHCP lease as needed.

If the flag is not set until after the system brings networking online,
we'd have this scenario:

a. launch a pre-provisioned VM
b. cloud-init local runs, creates an Azure datasource
c. cloud-init calls _get_data() on the Azure datasource, which reads ovf
   for metadata; no preprovision flag is set
d. cloud-init local applies fallback network config (dhcp on eth0)
e. cloud-init local exits
f. networking layer comes up and dhcp's on eth0
g. cloud-init net runs
h. cloud-init net restores the Azure datasource object, calls _get_data()
   on it
i. while reading metadata, detects flag for pre-provision, writes marker
   file, and enters _poll_imds
j. _poll_imds() will poll the specified URL up to 5 times, waiting up to 60
   seconds each poll loop.  Each time the loop runs, cloud-init attempts
   to acquire a DHCP lease.  The exit from _reprovision is successful when
   the URL returns metadata, which is then parsed by read_azure_ovf()

At this point, cloud-init now has the new provisioning metadata.  Did I
miss a scenario?
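
To make step (j) concrete, here is a minimal sketch of the polling
behaviour as I've described it.  It is not the code in DataSourceAzure.py:
NotReadyError, poll_for_provisioning_data(), fetch and acquire_lease are
hypothetical names used only for illustration, and the retry count and
delay are parameters rather than the real values.

```python
import time


class NotReadyError(Exception):
    """Hypothetical: the metadata endpoint answered 404 (VM not released)."""


def poll_for_provisioning_data(fetch, acquire_lease, max_polls=5,
                               delay=1.0, sleep=time.sleep):
    """Return the new provisioning data, or None if max_polls is exhausted.

    fetch() returns the data, raises NotReadyError on a 404, and raises
    any other exception when the network is unusable.
    acquire_lease() performs DHCP discovery on the primary interface.
    """
    need_lease = True
    for _ in range(max_polls):
        try:
            if need_lease:
                acquire_lease()
                need_lease = False
            return fetch()
        except NotReadyError:
            # Endpoint is reachable but the VM is not released yet:
            # keep the current lease and poll again.
            pass
        except Exception:
            # Network went away (e.g. the VM moved networks): request a
            # fresh DHCP lease on the next iteration.
            need_lease = True
        sleep(delay)
    return None
```

The point of the sketch is that a 404 keeps the current lease, while any
other failure triggers a new DHCP attempt before the next poll, so no
separate link watcher is involved.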

1. https://git.launchpad.net/cloud-init/tree/cloudinit/sources/DataSourceAzure.py#n445




>
>
> Instead of interpreting that  *[call to host has failed]* indicates *[VM
> has moved to the new network]*, we are relying on netlink to
> deterministically tell us that this switch has happened.
>

I see now that you're suggesting to replace the existing code above with
what's in the branch here:

https://code.launchpad.net/~tamilmani1989/cloud-init/+git/cloud-init/+merge/336392

I've now reviewed this branch, and would like to understand what currently
does not work with the existing code
that's already merged, and how we can test/verify whether the landed code
is fully functional or not.
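
For readers following along, the netlink-based detection being discussed
amounts to roughly the following.  This is only a rough sketch and is not
taken from the branch: parse_link_event() and watch_carrier() are
hypothetical names, the numeric constants are copied from the Linux
headers, and only the first message in each datagram is examined.

```python
import socket
import struct

RTM_NEWLINK = 16          # <linux/rtnetlink.h>
RTMGRP_LINK = 1           # <linux/rtnetlink.h>: link state multicast group
IFF_LOWER_UP = 0x10000    # <linux/if.h>: carrier is present

NLMSGHDR = struct.Struct("=IHHII")    # len, type, flags, seq, pid
IFINFOMSG = struct.Struct("=BxHiII")  # family, pad, type, index, flags, change


def parse_link_event(data):
    """Return (ifindex, carrier_up) for an RTM_NEWLINK message, else None."""
    if len(data) < NLMSGHDR.size + IFINFOMSG.size:
        return None
    _len, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(data, 0)
    if msg_type != RTM_NEWLINK:
        return None
    _family, _dev_type, index, flags, _change = IFINFOMSG.unpack_from(
        data, NLMSGHDR.size)
    return index, bool(flags & IFF_LOWER_UP)


def watch_carrier(on_change):
    """Block on a netlink socket (Linux only) and report carrier changes.

    on_change(ifindex, carrier_up) is a hypothetical callback; a caller
    could use it to trigger a re-DHCP when the carrier returns.
    """
    sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                         socket.NETLINK_ROUTE)
    sock.bind((0, RTMGRP_LINK))
    try:
        while True:
            event = parse_link_event(sock.recv(65535))
            if event is not None:
                on_change(*event)
    finally:
        sock.close()
```

Note that this only observes carrier state; it says nothing about whether
the metadata service is ready, which is what the existing polling loop
checks.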



> *<< The bigger point here is that you're interested in *telling* the
> instance that it should reconfigure its networking.  Your suggestion is
> to utilize the loss of carrier on the nic as an indication that it should,
> and while that may be what needs to be done specifically on Azure, it's not
> a general case mechanism, and only works if using DHCP >>*
>
> You are correct in saying that this mechanism of host indicating to VM
> that networking needs to be reconfigured is specific to Azure’s
> pre-provisioning scenario, and may not hold in general and for other public
> clouds.
>

The pre-provisioning scenario is Azure specific, but in general there is a
need to reconfigure instance networking outside of that scenario and
across other clouds and distros; to that end, I'd like network
reconfiguration to be signaled more generically.


>
> Thanks,
>
> Sushant
>
>
>
> *From:* Ryan Harper [mailto:ryan.harper@xxxxxxxxxxxxx]
> *Sent:* Friday, February 9, 2018 11:43 AM
> *To:* Tamilmani Manoharan <tamanoha@xxxxxxxxxxxxx>
> *Cc:* Sushant Sharma (AZURE) <Sushant.Sharma@xxxxxxxxxxxxx>; Douglas
> Jordan <Douglas.Jordan@xxxxxxxxxxxxx>; cloud-init@xxxxxxxxxxxxxxxxxxx;
> Nisheeth Srivastava <Nisheeth.Srivastava@xxxxxxxxxxxxx>
>
> *Subject:* Re: [Cloud-init] Azure Networking Support in cloud-init
>
>
>
>
>
>
>
> On Thu, Feb 8, 2018 at 8:14 PM, Tamilmani Manoharan <
> tamanoha@xxxxxxxxxxxxx> wrote:
>
> Ryan,
>
> The VM has to be configured with new data that should be read after the
> network switch, and this VM configuration happens even before
> systemd-networkd starts. We need some mechanism to listen for network switch
> events and issue DHCP. So we can’t rely on systemd-networkd to resend
> DHCP.
>
>
>
> The process, as I understand it based on what's been committed in
> c03bdd3d8ed762cada813c5e95a40b14d2047b57:
>
> During early boot at cloud-init local time, which runs before system-wide
> networking is enabled, the Azure Datasource will DHCP on the primary
> interface, consume metadata, and within that, read the PreprovisionedVM
> value and, if set, go down a path of polling until it's time to come up.
> When system execution continues, cloud-init local exits, having written a
> network configuration to DHCP on the primary interface.  Then the
> networking layer is activated and the system blocks until it has come
> online (which will ensure a DHCP request has been completed).
>
> After the system network is online, cloud-init net mode runs and will
> refetch metadata due to the on-disk marker that the instance was a
> reprovision VM.
>
> If I've somehow misunderstood things, please let me know, but as I
> understand things now, this scenario does not need cloud-init itself to
> watch the netlink layer to reissue DHCP.
>
>
>
>
>
> *From:* Ryan Harper [mailto:ryan.harper@xxxxxxxxxxxxx]
> *Sent:* Thursday, February 8, 2018 5:51 PM
> *To:* Sushant Sharma (AZURE) <Sushant.Sharma@xxxxxxxxxxxxx>
> *Cc:* cloud-init@xxxxxxxxxxxxxxxxxxx; Tamilmani Manoharan <
> tamanoha@xxxxxxxxxxxxx>; Nisheeth Srivastava <
> Nisheeth.Srivastava@xxxxxxxxxxxxx>; Douglas Jordan <
> Douglas.Jordan@xxxxxxxxxxxxx>
>
>
> *Subject:* Re: [Cloud-init] Azure Networking Support in cloud-init
>
>
>
>
>
>
>
> On Thu, Feb 8, 2018 at 7:07 PM, Sushant Sharma (AZURE) <
> Sushant.Sharma@xxxxxxxxxxxxx> wrote:
>
> Hi Ryan,
>
> Thank you for your email. We just updated the PR with some more changes.
>
> To answer your specific questions
>
>
>
>    1. The specific scenario addressed here is to start the VM and have it
>    running, and once customer asks for a new VM in Azure, move the VM into
>    customer’s network and apply all customer specific configurations in the
>    VM. This is why the earlier PR by Douglas blocks cloud-init in Azure unless
>    we learn that VM is ready to be moved into customer’s Network. This
>    cloud-init block happens even before systemd-networkd starts.
>
>
>
> In this scenario which is before "networking" there's no need to "bounce"
> networking, rather once the VM is "released", it will issue a DHCP at
> network configuration time.  This works on Xenial (ifupdown based) as well
> as Artful and newer (systemd-networkd).
>
>
>
>
>    2. The goal of this PR is to learn whenever the switch to customer’s
>    network has happened, and issue a DHCP request upon this event to learn
>    the new IP address. The IP address may or may not change. Can you help us
>    understand why removing and adding the nic may be needed in this scenario?
>
> Right now, on the Linux side, I'm not aware of any distro which contains a
> networking configuration daemon which watches for carrier/no-carrier
> changes and subsequently issues a DHCP release and DHCP renew.  The
> exception to this is systemd-networkd; though whether it would request a
> new DHCP lease does depend on how long the link goes away for.  The
> bigger point here is that you're interested in *telling* the instance that
> it should reconfigure its networking.  Your suggestion is to utilize the
> loss of carrier on the nic as an indication that it should, and while that
> may be what needs to be done specifically on Azure, it's not a general
> case mechanism, and only works if using DHCP.
>
>
>
> As a stepping stone toward a real cloud to instance communication
> mechanism, hotplug can work more generically on Linux as an indication
> that the instance network configuration needs to change (and possibly be
> restarted), which could result in a DHCP release and renew.
>
>
>
>
>
>
>
> Regarding the newer scenarios, I think it is great that you have shared
> the document where we can add the description and discuss solutions. Let’s
> do that separately in parallel to this PR.
>
> At the moment, we will appreciate if you and others can take a look at
> this PR and provide feedback so that it can be accepted.
>
>
>
> For the previous proposal, which included spawning a python process to
> watch the netlink socket: I'm not comfortable with such an approach, for
> many of the reasons Robert already indicated in his review of the initial
> PR. I'll follow up directly in the PR with more specific concerns.
>
>
>
> Ryan
>
>
>
>
>
> Thanks,
>
> Sushant
>
>
>
> *From:* Ryan Harper [mailto:ryan.harper@xxxxxxxxxxxxx]
> *Sent:* Friday, January 26, 2018 8:16 AM
> *To:* Sushant Sharma (AZURE) <Sushant.Sharma@xxxxxxxxxxxxx>
> *Cc:* cloud-init@xxxxxxxxxxxxxxxxxxx; Tamilmani Manoharan <
> tamanoha@xxxxxxxxxxxxx>; Nisheeth Srivastava <
> Nisheeth.Srivastava@xxxxxxxxxxxxx>
> *Subject:* Re: [Cloud-init] Azure Networking Support in cloud-init
>
>
>
>
>
>
>
> On Tue, Jan 9, 2018 at 5:31 PM, Sushant Sharma (AZURE) <
> Sushant.Sharma@xxxxxxxxxxxxx> wrote:
>
> Hi cloud-init members,
>
>
>
> We would like to discuss with you our proposal to add a network module in
> cloud-init to support various networking scenarios in Azure.
>
> To begin with, we would like to support moving a virtual machine (VM)
> from one network to another in Azure.
>
> As such, it will listen for media disconnect/connect (via netlink) and
> issue a re-DHCP when required (this design is based on how Azure moves a VM
> from one network to another).
>
>
>
> Thanks for starting the discussion here.  For this use-case, are you
> migrating the entire VM or are we changing an existing nic from one
> subnet to another?  Since the link goes down (stopping traffic), is it
> possible to remove the nic and re-add it instead?
>
>
>
> Operating system behavior around link state change varies depending on the
> network service managing things.  In Ubuntu where ifupdown and
> isc-dhcp-client are utilized, as you know, netlink changes are not
> handled.  Under Ubuntu Artful and Bionic, which utilize systemd-networkd,
> link state changes are watched; if the device loses carrier, then when it
> is restored networkd will reacquire a lease.
>
>
>
> Over time, we plan to support more advanced networking scenarios in Azure.
> Please let us know your thoughts before we work on adding the module.
>
>
>
> I'm very much interested in enumerating additional scenarios.  Some
> user-stories which I think need to be addressed:
>
>
>
> 1. add additional network device and configure
>
> 2. remove network device (and update configuration)
>
> 3. add additional ip addresses to one or more network devices
>
> 4. remove ip address from one or more network devices
>
> 5. modify the configuration of an existing network device (changes outside
>    of 3 and 4)
>
>
>
> Cases 1 and 2 are generally covered by a udev hook handler.  3 and 4 can
> be partially addressed by updating cloud-init to read network config
> metadata and render a complete network configuration, and may be combined
> with 1 and 2.
>
>
>
> What's not easily covered by a udev hook is the case where users modify
> existing network configuration without adding or removing devices.  To
> handle this sort of scenario a cloud will need to provide some
> notification mechanism to which cloud-init can react.  This may be
> something simple like a websocket cloud-init can select() on, or some
> other hypervisor event injection.  This area is not well defined and will
> certainly vary from provider to provider, which will require some time to
> form a general solution.
>
>
>
> I'd like to continue the discussion in a shared document:
>
>
>
> https://hackmd.io/MzCsBYBMEMHYCMC0AOS4BmjwAYCM3F54A2JGZAY3gCYBTaSYa2IA?both
>
>
>
>
>
>
>
> Thanks,
>
> Sushant
>
>
>
>
> --
> Mailing list: https://launchpad.net/~cloud-init
> Post to     : cloud-init@xxxxxxxxxxxxxxxxxxx
> Unsubscribe : https://launchpad.net/~cloud-init
> More help   : https://help.launchpad.net/ListHelp
>
>
>
>
>
>
>
