yahoo-eng-team team mailing list archive

[Bug 1910552] Re: machines fail to boot if MAAS doesn't respond to cloud-init

 

Deployed a machine using an alpha build of MAAS 3.4.0 with cloud-init
23.1.1-0ubuntu0~20.04.1

From initial deployment:

ubuntu@petrel:~$ cloud-init analyze boot
-- Most Recent Boot Record --
    Kernel Started at: 2023-04-21 10:21:24.408999
    Kernel ended boot at: 2023-04-21 10:21:28.556996
    Kernel time to boot (seconds): 4.14799690246582
    Cloud-init activated by systemd at: 2023-04-21 10:21:33.759920
    Time between Kernel end boot and Cloud-init activation (seconds): 5.202924013137817
    Cloud-init start: 2023-04-21 10:21:36.521000
successful

Then rebooted with MAAS still available

ubuntu@petrel:~$ cloud-init analyze boot
-- Most Recent Boot Record --
    Kernel Started at: 2023-04-21 10:29:39.458092
    Kernel ended boot at: 2023-04-21 10:29:43.598001
    Kernel time to boot (seconds): 4.139909029006958
    Cloud-init activated by systemd at: 2023-04-21 10:29:48.546367
    Time between Kernel end boot and Cloud-init activation (seconds): 4.948365926742554
    Cloud-init start: 2023-04-21 10:29:51.154000
successful
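
For comparing boot records like the ones in this message, here is a minimal sketch that parses the text printed by `cloud-init analyze boot` and computes the delay from kernel start to cloud-init start. The function names are made up for illustration and are not part of cloud-init; the sample text is the initial-deployment record quoted above. By this measure, all three boots in this message take about 12 seconds, including the one with MAAS turned off.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def parse_boot_record(text):
    """Split 'Key: value' lines of a boot record into a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # lines without ': ' (banner, 'successful') are skipped
            fields[key] = value
    return fields


def seconds_to_cloud_init_start(text):
    """Return seconds elapsed between kernel start and cloud-init start."""
    fields = parse_boot_record(text)
    kernel = datetime.strptime(fields["Kernel Started at"], TIMESTAMP_FORMAT)
    start = datetime.strptime(fields["Cloud-init start"], TIMESTAMP_FORMAT)
    return (start - kernel).total_seconds()


# Record from the initial deployment, copied from this message
initial_boot = """\
-- Most Recent Boot Record --
    Kernel Started at: 2023-04-21 10:21:24.408999
    Kernel ended boot at: 2023-04-21 10:21:28.556996
    Kernel time to boot (seconds): 4.14799690246582
    Cloud-init activated by systemd at: 2023-04-21 10:21:33.759920
    Time between Kernel end boot and Cloud-init activation (seconds): 5.202924013137817
    Cloud-init start: 2023-04-21 10:21:36.521000
successful
"""

delay = seconds_to_cloud_init_start(initial_boot)
print(f"kernel start -> cloud-init start: {delay:.3f} s")
```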


Then turned off MAAS, and rebooted a second time

ubuntu@petrel:~$ cloud-init analyze boot
-- Most Recent Boot Record --
    Kernel Started at: 2023-04-21 10:34:03.435137
    Kernel ended boot at: 2023-04-21 10:34:07.571495
    Kernel time to boot (seconds): 4.136358022689819
    Cloud-init activated by systemd at: 2023-04-21 10:34:12.638482
    Time between Kernel end boot and Cloud-init activation (seconds): 5.066987037658691
    Cloud-init start: 2023-04-21 10:34:15.266000
successful

and confirmed in the cloud-init logs that we saw the expected warning:

2023-04-21 10:34:27,759 - handlers.py[WARNING]: Multiple consecutive
failures in WebHookHandler. Cancelling all queued events.
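
The warning above suggests that event reporting gives up after repeated failures instead of blocking the boot. Below is a minimal sketch of that general idea, not cloud-init's actual WebHookHandler code. The class name, the threshold of 3, and the event names are all assumptions made for illustration.

```python
from collections import deque


class CutoffReporter:
    """Hypothetical event reporter that gives up after repeated failures."""

    def __init__(self, post, max_failures=3):
        self.post = post                  # callable sending one event; raises OSError on failure
        self.max_failures = max_failures  # assumed threshold, not cloud-init's actual value
        self.failures = 0                 # consecutive failures seen so far
        self.disabled = False
        self.pending = deque()

    def publish(self, event):
        """Queue an event unless reporting has already been abandoned."""
        if not self.disabled:
            self.pending.append(event)

    def flush(self):
        """Send queued events in order; return the ones that were delivered."""
        sent = []
        while self.pending:
            event = self.pending.popleft()
            try:
                self.post(event)
            except OSError:
                self.failures += 1
                if self.failures >= self.max_failures:
                    # Endpoint looks dead: drop everything still queued and stop reporting
                    self.pending.clear()
                    self.disabled = True
                continue
            self.failures = 0  # any success resets the streak
            sent.append(event)
        return sent


def unreachable(event):
    """Stand-in for an HTTP POST to an endpoint that is down."""
    raise ConnectionError("endpoint unreachable")


reporter = CutoffReporter(unreachable)
for i in range(5):
    reporter.publish(f"event-{i}")  # illustrative event names
delivered = reporter.flush()
print(delivered, reporter.disabled, len(reporter.pending))
```

With the endpoint down, the reporter tries the first three events, then drops the remaining two and ignores anything published later, so the caller is never held up waiting on a dead endpoint.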


** Changed in: maas
     Assignee: (unassigned) => Adam Collard (adam-collard)

** Changed in: maas
       Status: Triaged => Won't Fix

** Changed in: maas
       Status: Won't Fix => Fix Released

** Changed in: maas
       Status: Fix Released => Won't Fix

** Changed in: maas
       Status: Won't Fix => Invalid

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1910552

Title:
  machines fail to boot if MAAS doesn't respond to cloud-init

Status in cloud-init:
  Fix Released
Status in MAAS:
  Invalid

Bug description:
  We have a recurring issue on a MAAS 2.3.7 (xenial) deployment, where once in a while we need to restart rackd and regiond to make MAAS respond to machines rebooting.
  This itself would be a different bug though.
  What I'd like to report here is that a machine should be able to finish its boot sequence even if it can't talk to the MAAS API.

  Observed behaviour:

  [  OK  ] Started Raise network interfaces.
  [  OK  ] Reached target Network.
           Starting Initial cloud-init job (metadata service crawler)...
  (stuck here indefinitely)

  (restart rackd and regiond)

  the machine reboots successfully.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1910552/+subscriptions