← Back to team overview

yahoo-eng-team team mailing list archive

[Bug 1844191] [NEW] azure advanced networking sometimes triggers duplicate mac detection

 

Public bug reported:

Hi, we're still being affected by this on Azure with 19.2-24-ge7881d5c-
0ubuntu1~18.04.1 - using PACKER to build from image: BuildSource :
Marketplace/Canonical/UbuntuServer/18.04-DAILY-LTS

Here is the packer config:
````
    "provisioners": [
        {
          "type": "shell",
          "inline": [
            "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do echo 'Waiting for cloud-init...'; sleep 1; done"
          ]
        },
        {
            "type": "ansible",
            "playbook_file": "{{user `ansible_playbook`}}",
            "user": "packer",
            "extra_arguments": [ "--extra-vars", "codeVersion={{user `code_version`}} managed_image_name={{user `managed_image_name`}}" ]
        },
        {
            "type": "shell",
            "execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
            "inline_shebang": "/bin/sh -x",
            "inline": [ "/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync" ]
    }]
````

Here is the playbook:
````
---
- hosts: all
  remote_user: ubuntu
  become: yes
  become_method: sudo
  become_user: root

  environment:
    DEBIAN_FRONTEND: noninteractive
````

Note: we are applying `enableAcceleratedNetworking: true` to the NIC,
anecdotally we think this is related.

Usually our playbook has more in it (obviously) but Azure kept pointing
fingers at us that our image was causing the problem, so I ran this test
simply deploying a blank deprovisioned image via our same process.

And here's what happens on the serial console log:

````
[   20.337603] sh[910]: + [ -e /var/lib/cloud/instance/obj.pkl ]
[   20.343177] sh[910]: + echo cleaning persistent cloud-init object
[   20.349027] [  OK  ] Started Network Time Synchronization.
[  OK  ] Reached target System Time Synchronized.
sh[910]: cleaning persistent cloud-init object
[   20.361066] sh[910]: + rm /var/lib/cloud/instance/obj.pkl
[   20.412333] sh[910]: + exit 0
[   34.282291] cloud-init[938]: Cloud-init v. 19.2-24-ge7881d5c-0ubuntu1~18.04.1 running 'init-local' at Mon, 16 Sep 2019 18:02:23 +0000. Up 32.02 seconds.
[   34.288809] cloud-init[938]: 2019-09-16 18:02:25,262 - util.py[WARNING]: failed stage init-local
[   34.423057] cloud-init[938]: failed run of stage init-local
[   34.437716] cloud-init[938]: ------------------------------------------------------------
[   34.441088] cloud-init[938]: Traceback (most recent call last):
[   34.443719] cloud-init[938]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 653, in status_wrapper
[   34.448072] cloud-init[938]:     ret = functor(name, args)
[   34.450532] cloud-init[938]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 362, in main_init
[   34.454849] cloud-init[938]:     init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
[   34.458725] cloud-init[938]:   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 697, in apply_network_config
[   34.463421] cloud-init[938]:     net.wait_for_physdevs(netcfg)
[   34.466051] cloud-init[938]:   File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 344, in wait_for_physdevs
[   34.470673] cloud-init[938]:     present_macs = get_interfaces_by_mac().keys()
[   34.473964] cloud-init[938]:   File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 633, in get_interfaces_by_mac
[   34.479325] cloud-init[938]:     (name, ret[mac], mac))
[   34.481838] cloud-init[938]: RuntimeError: duplicate mac found! both 'eth0' and 'enP1s1' have mac '00:0d:3a:7c:f7:3f'
[   34.486614] cloud-init[938]: ------------------------------------------------------------
[FAILED] Failed to start Initial cloud-init job (pre-networking).
See 'systemctl status cloud-init-local.service' for details.
[  OK  ] Reached target Network (Pre).
         Starting Network Service...
[  OK  ] Started Network Service.
         Starting Wait for Network to be Configured...
         Starting Network Name Resolution...
[  OK  ] Started Wait for Network to be Configured.
         Starting Initial cloud-init job (metadata service crawler)...
[  OK  ] Started Network Name Resolution.
[  OK  ] Reached target Host and Network Name Lookups.
[  OK  ] Reached target Network.
````

When this happens, the machine never boots, and we get an
OSProvisioningTimedOut error after about 30 minutes, and the machine
never reaches healthy state.

** Affects: cloud-init
     Importance: High
         Status: Triaged

** Changed in: cloud-init
   Importance: Undecided => High

** Changed in: cloud-init
       Status: New => Triaged

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1844191

Title:
  azure advanced networking sometimes triggers duplicate mac detection

Status in cloud-init:
  Triaged

Bug description:
  Hi, we're still being affected by this on Azure with 19.2-24
  -ge7881d5c-0ubuntu1~18.04.1 - using PACKER to build from image:
  BuildSource : Marketplace/Canonical/UbuntuServer/18.04-DAILY-LTS

  Here is the packer config:
  ````
      "provisioners": [
          {
            "type": "shell",
            "inline": [
              "while [ ! -f /var/lib/cloud/instance/boot-finished ]; do echo 'Waiting for cloud-init...'; sleep 1; done"
            ]
          },
          {
              "type": "ansible",
              "playbook_file": "{{user `ansible_playbook`}}",
              "user": "packer",
              "extra_arguments": [ "--extra-vars", "codeVersion={{user `code_version`}} managed_image_name={{user `managed_image_name`}}" ]
          },
          {
              "type": "shell",
              "execute_command": "chmod +x {{ .Path }}; {{ .Vars }} sudo -E sh '{{ .Path }}'",
              "inline_shebang": "/bin/sh -x",
              "inline": [ "/usr/sbin/waagent -force -deprovision+user && export HISTSIZE=0 && sync" ]
      }]
  ````

  Here is the playbook:
  ````
  ---
  - hosts: all
    remote_user: ubuntu
    become: yes
    become_method: sudo
    become_user: root

    environment:
      DEBIAN_FRONTEND: noninteractive
  ````

  Note: we are applying `enableAcceleratedNetworking: true` to the NIC,
  anecdotally we think this is related.

  Usually our playbook has more in it (obviously) but Azure kept
  pointing fingers at us that our image was causing the problem, so I
  ran this test simply deploying a blank deprovisioned image via our
  same process.

  And here's what happens on the serial console log:

  ````
  [   20.337603] sh[910]: + [ -e /var/lib/cloud/instance/obj.pkl ]
  [   20.343177] sh[910]: + echo cleaning persistent cloud-init object
  [   20.349027] [  OK  ] Started Network Time Synchronization.
  [  OK  ] Reached target System Time Synchronized.
  sh[910]: cleaning persistent cloud-init object
  [   20.361066] sh[910]: + rm /var/lib/cloud/instance/obj.pkl
  [   20.412333] sh[910]: + exit 0
  [   34.282291] cloud-init[938]: Cloud-init v. 19.2-24-ge7881d5c-0ubuntu1~18.04.1 running 'init-local' at Mon, 16 Sep 2019 18:02:23 +0000. Up 32.02 seconds.
  [   34.288809] cloud-init[938]: 2019-09-16 18:02:25,262 - util.py[WARNING]: failed stage init-local
  [   34.423057] cloud-init[938]: failed run of stage init-local
  [   34.437716] cloud-init[938]: ------------------------------------------------------------
  [   34.441088] cloud-init[938]: Traceback (most recent call last):
  [   34.443719] cloud-init[938]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 653, in status_wrapper
  [   34.448072] cloud-init[938]:     ret = functor(name, args)
  [   34.450532] cloud-init[938]:   File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 362, in main_init
  [   34.454849] cloud-init[938]:     init.apply_network_config(bring_up=bool(mode != sources.DSMODE_LOCAL))
  [   34.458725] cloud-init[938]:   File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 697, in apply_network_config
  [   34.463421] cloud-init[938]:     net.wait_for_physdevs(netcfg)
  [   34.466051] cloud-init[938]:   File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 344, in wait_for_physdevs
  [   34.470673] cloud-init[938]:     present_macs = get_interfaces_by_mac().keys()
  [   34.473964] cloud-init[938]:   File "/usr/lib/python3/dist-packages/cloudinit/net/__init__.py", line 633, in get_interfaces_by_mac
  [   34.479325] cloud-init[938]:     (name, ret[mac], mac))
  [   34.481838] cloud-init[938]: RuntimeError: duplicate mac found! both 'eth0' and 'enP1s1' have mac '00:0d:3a:7c:f7:3f'
  [   34.486614] cloud-init[938]: ------------------------------------------------------------
  [FAILED] Failed to start Initial cloud-init job (pre-networking).
  See 'systemctl status cloud-init-local.service' for details.
  [  OK  ] Reached target Network (Pre).
           Starting Network Service...
  [  OK  ] Started Network Service.
           Starting Wait for Network to be Configured...
           Starting Network Name Resolution...
  [  OK  ] Started Wait for Network to be Configured.
           Starting Initial cloud-init job (metadata service crawler)...
  [  OK  ] Started Network Name Resolution.
  [  OK  ] Reached target Host and Network Name Lookups.
  [  OK  ] Reached target Network.
  ````

  When this happens, the machine never boots, and we get an
  OSProvisioningTimedOut error after about 30 minutes, and the machine
  never reaches healthy state.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1844191/+subscriptions


Follow ups