
yahoo-eng-team team mailing list archive

[Bug 1303925] Re: commissioning fails silently if a node can't reach the region controller

 

cloud-init is executing code that MAAS told it to execute, so MAAS
needs to tell it to execute code that has some "last ditch catch".

To be clear, cloud-init got data from MAAS (via the kernel cmdline) that
told it to get some code from the metadata server and execute it.  It
then executed it.  That code failed.  *That* is the code that needs to
be more resilient.  cloud-init is, by design, doing exactly what MAAS
tells it to do.
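
As a rough illustration of what a "last ditch catch" could look like in
the code that MAAS hands out, here is a minimal sketch. It is not taken
from MAAS or cloud-init; the function name run_with_last_ditch_catch and
the report callback are hypothetical, and a real script would presumably
report somewhere more visible than stderr.

```python
import sys
import traceback


def run_with_last_ditch_catch(task, report=sys.stderr.write):
    """Run task(); on any error, report it loudly and return False.

    `task` is any zero-argument callable (hypothetical stand-in for the
    downloaded commissioning code). `report` receives text to surface.
    """
    try:
        task()
    except Exception as exc:  # deliberately broad: nothing should fail silently
        report("commissioning step failed: %r\n" % (exc,))
        report(traceback.format_exc())
        return False
    return True
```

The point of the sketch is only that the outermost layer of the
executed code catches everything and says what went wrong, rather than
exiting quietly.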

** No longer affects: cloud-init

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1303925

Title:
  commissioning fails silently if a node can't reach the region
  controller

Status in MAAS:
  Triaged

Bug description:
  We recently had a node which completely refused to commission in MAAS.
  After (literally) several man days of debugging, we figured out that
  it was because the node couldn't talk to the region controller over
  HTTP.

  Obviously, that's ultimately our mistake/problem, but MAAS could have
  been a lot better at helping us to help ourselves; currently, there's
  absolutely no indication from the boot process that the HTTP
  connection to the region controller is the problem.
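
One way the missing indication could be produced is a pre-flight check
that tries the HTTP connection and turns a failure into a readable
message. The sketch below is an assumption about how such a check might
look, not existing MAAS behaviour; describe_http_reachability and its
opener parameter are hypothetical names, and the URL to test would be
whatever address the node was told to contact.

```python
import urllib.error
import urllib.request


def describe_http_reachability(url, timeout=10, opener=urllib.request.urlopen):
    """Return None if `url` answers over HTTP, else a human-readable reason."""
    try:
        with opener(url, timeout=timeout):
            return None
    except urllib.error.HTTPError:
        # The server answered with an error status, so the network path
        # itself works; this sketch only cares about reachability.
        return None
    except OSError as exc:
        # URLError, timeouts and refused connections are all OSError subclasses.
        return "cannot reach %s over HTTP: %s" % (url, exc)
```

A caller would print the returned message to the console when it is not
None, so that the serial console output names the HTTP connection as the
problem.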

  Attached is the serial console output (from the point of boot) for the
  node that was failing to commission.  91.189.94.35 is the MAAS region
  controller and 91.189.88.20 is the MAAS cluster controller.

To manage notifications about this bug go to:
https://bugs.launchpad.net/maas/+bug/1303925/+subscriptions