
yahoo-eng-team team mailing list archive

[Bug 1555238] Re: cloud-init FAIL status message doesn't differentiate between a critical failure vs

 

Generally speaking, I think this bug is an off-the-cuff response to an issue
that is now fixed (bug 1554152).  Spending engineering resources developing a
complex solution is not justified by a single bug.

In that regard:
a.) that bug is fixed, and won't happen again.
b.) cloud-init is not in a position to determine "NONFATAL".
 Some security-conscious users may consider failure to get good entropy in
 system boot to be fatal.  Some of those users use MAAS to deploy their
 systems.
c.) What would maas be expected to do in a case where it received a
FAILCONTINUE?  Would it then look at that log and make a decision?
Clearly something went wrong, and that something may have adverse effects.

cloud-init generally knows less about what is supposed to happen than the
user of it does, so it is less in a position to make such decisions.
I find it much better for our system as a whole to *not* have transient
failures, even unimportant ones.

If there are more cases of cloud-init correctly reporting failure that are
problematic to maas we can consider engineering a way that MAAS can tell
cloud-init what types of failures it would consider not important.

Consider the following cases:
1.) /dev/random did not get seeded with data from entropy.canonical.com
  Some people might consider this fatal, some people might find it
  desirable.
2.) cloud-init failed to add a configured user
3.) cloud-init failed to add a configured user to a specific group
4.) user provided code (runcmd / user-data-script) exited non-zero
   (Note, this is how 'curtin install' is provided to cloud-init)

Each of these is fatal for some users and non-fatal for others.
Looking at the cases above, maas might consider 1, 2 and 3
to be FAILCONTINUE, as it does not need the users at all.

However, a user who launched a system expecting their admins to be
able to get into it is very much bothered by '2'.

Case '4' is pretty straightforward, but lots of times my scripts fail
and things deal with that.

Implementing a solution to this really means the user of cloud-init (in
this case maas) needs to know what *they* consider fatal or non-fatal and
either tell cloud-init to report those things as fatal, or interpret the
reports as fatal or non-fatal itself.
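
As a rough sketch of the second option (the consumer interpreting the
reports itself), something like the following could work. It assumes
events shaped like the example in the bug description ("name" and
"result" fields). The event names in the policy set and the classify()
helper are hypothetical, made up for illustration; they are not part of
cloud-init or MAAS.

```python
# Hypothetical consumer-side policy. These names are illustrative only;
# they are not real cloud-init or MAAS identifiers.
NONFATAL_FOR_THIS_CONSUMER = {"seed-random", "add-user", "add-user-to-group"}


def classify(event, nonfatal_names=NONFATAL_FOR_THIS_CONSUMER):
    """Return the consumer's own view of a reported event's severity."""
    result = event.get("result")
    if result != "FAIL":
        # Anything that is not a failure is passed through unchanged.
        return result
    if event.get("name") in nonfatal_names:
        # The consumer has decided this failure should not stop a deployment.
        return "FAILCONTINUE"
    return "FAIL"


events = [
    {"event_type": "finish", "name": "seed-random", "result": "FAIL"},
    {"event_type": "finish", "name": "cmd-install", "result": "FAIL"},
    {"event_type": "finish", "name": "add-user", "result": "SUCCESS"},
]
verdicts = [classify(e) for e in events]
```

A consumer that does care about users being added would simply pass a
different nonfatal_names set, which is the point: the policy lives with
whoever knows what the deployment is for.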


** Changed in: cloud-init
       Status: New => Won't Fix

** Changed in: cloud-init
   Importance: Undecided => Low

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1555238

Title:
  cloud-init FAIL status message doesn't differentiate between a
  critical failure vs

Status in cloud-init:
  Won't Fix

Bug description:
  Cloud-init status messages that are sent to MAAS provide a
  SUCCESS/FAIL result for the different modules that cloud-init runs.
  As such, if a module failed, MAAS captures that FAIL message and acts
  upon it; for example, it marks a machine Failed Deployment.

  That being said, when using a MAAS data source/endpoint, there are
  some cloud-init modules for which a failure is not critical; meaning
  that cloud-init won't stop working or cause a deployment failure if
  the module has failed. However, this doesn't reflect in the messaging.
  Even if it is not a critical module, cloud-init will continue to send
  a FAIL message to MAAS, which causes MAAS to mark a machine Failed
  Deployment.

  As such, cloud-init shouldn't tell MAAS that a module run has
  FAILED if it is not critical to a MAAS deployment (that will cause a
  machine to FAIL). In turn, cloud-init should be sending:

  A different 'result' i.e. SUCCESS/FAIL/WARNING (or FAILCONTINUE)

  As an example, the info sent to MAAS is:

              "event_type": "finish",
              "origin": "curtin",
              "description": "Finished XYZ",
              "name": "cmd-install",
              "result": "FAIL",

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1555238/+subscriptions

