
yahoo-eng-team team mailing list archive

[Bug 1810859] [NEW] ds-identify runs too early

 

Public bug reported:

ds-identify is executed from a systemd generator [1]. As I understand
the intent of both, this creates a timing conflict that cannot be
resolved.

Generators run very early in the boot process.

The cloud-init generator runs ds-identify, which in turn runs "blkid" to
find filesystems with specific labels, such as "cidata" for the nocloud
data source. However, it is possible to construct an environment where
the filesystem with the "cidata" label is on an attached device and the
generator runs before the kernel knows about that device. The output of
blkid then cannot reflect the actual state: the "cidata" label is not
found, the "nocloud" data source is not identified, and as a result the
cloud-init.target unit is disabled.
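To make the failure mode concrete, here is a minimal Python sketch of a
label lookup over "blkid -o export" output (one KEY=value pair per line,
devices separated by blank lines). It is not ds-identify's code, which is
a shell script; the function names and the case-insensitive label match
are assumptions for illustration. The point is that the answer can only
reflect devices the kernel already knew about when blkid ran.

```python
from typing import Dict, List


def parse_blkid_export(text: str) -> List[Dict[str, str]]:
    """Split "blkid -o export" style output into one dict per device."""
    devices: List[Dict[str, str]] = []
    current: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current device record.
            if current:
                devices.append(current)
                current = {}
            continue
        key, sep, value = line.partition("=")
        if sep:
            current[key] = value
    if current:
        devices.append(current)
    return devices


def has_label(text: str, label: str) -> bool:
    """True if any device in the output carries the label (case-insensitive, an assumption)."""
    wanted = label.lower()
    return any(dev.get("LABEL", "").lower() == wanted
               for dev in parse_blkid_export(text))


if __name__ == "__main__":
    # Hypothetical output captured before and after the data disk appears.
    early = "DEVNAME=/dev/vda1\nTYPE=ext4\nLABEL=rootfs\n"
    late = early + "\nDEVNAME=/dev/vdb\nTYPE=iso9660\nLABEL=cidata\n"
    print(has_label(early, "cidata"))  # False: /dev/vdb not yet known
    print(has_label(late, "cidata"))   # True
```

Nothing in the lookup itself is wrong; the same code gives a different
answer depending only on when it runs.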

Observed in a test environment with qemu and the data source on a
separate virtual device.

According to [1], generators should not add synchronization points such
as "udevadm settle", so I am not certain how this could be resolved.
Also, given that we cannot control when the generator executes, this
appears difficult to get under control.

Would it make sense to give ds-identify the option to simply exit and
leave things alone?

In the present setup the generator runs ds-identify, which in turn
disables cloud-init.target if no data source can be identified.
However, the Python code usually runs late enough that things not
available in early boot are found and data sources are identified
properly. If users who know they run in a specific environment could
set a "ds=no-check" flag on the kernel command line, the timing issue
could be avoided.
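The proposal can be modelled as a small decision function. This is a
hypothetical Python sketch, not cloud-init code: it assumes the data
source is named by a "ds=<name>" token on the kernel command line, that
the last such token wins, and that anything after a ";" is an option to
ignore. "no-check" is the flag proposed in this report and does not
exist today; the function names are made up for illustration.

```python
from typing import Optional


def ds_from_cmdline(cmdline: str) -> Optional[str]:
    """Return the value of the last ds= token, without ;-suffixed options."""
    value: Optional[str] = None
    for token in cmdline.split():
        if token.startswith("ds="):
            value = token[len("ds="):].split(";", 1)[0]
    return value


def generator_action(cmdline: str, probe_found_datasource: bool) -> str:
    """Map the kernel command line and the early probe result to an action.

    Returns "leave-alone", "enable" or "disable".
    """
    ds = ds_from_cmdline(cmdline)
    if ds == "no-check":
        # Proposed behaviour: exit without touching cloud-init.target and
        # let the later Python stage do the detection.
        return "leave-alone"
    if ds:
        # In this model an explicitly named data source (e.g. ds=nocloud)
        # is trusted without probing.
        return "enable"
    # Otherwise the outcome rests on the early probe, which is what fails
    # when the cidata device has not appeared yet.
    return "enable" if probe_found_datasource else "disable"
```

With "ds=nocloud" the early probe result no longer matters, which
mirrors the ds=nocloud workaround; "ds=no-check" would instead defer
the decision to the Python stage.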

I realize that for the nocloud case a user can set "ds=nocloud" on the
kernel command line to work around the timing issue described here. I
also realize that "ds=no-check" would circumvent the basic intention of
the generator: allowing cloud-init to be installed anywhere while
quickly detecting environments where the cloud-init Python code should
not run, and thus saving boot time.

My point is that, IMHO, ds-identify cannot avoid timing issues in
general because of when systemd executes generators. Thus giving users
a general way to disable ds-identify may be useful.

I would be happy to be proven wrong and to see the timing issue
resolved.


[1] https://www.freedesktop.org/wiki/Software/systemd/Generators/

** Affects: cloud-init
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1810859
