yahoo-eng-team team mailing list archive
Message #71418
[Bug 1750780] Re: Race with local file systems can make open-vm-tools fail to start
Installed another Xenial and Bionic in VMware to take a deeper look.
- Xenial (with backported open-vm-tools): affected
- Bionic (with the interim fix reverted): no hit in several retries, explanation below
Systemd fixed it (via our assumed implicit dependency).
In Bionic, PrivateTmp gives it a dependency on systemd-tmpfiles-setup.service (seen in systemd-analyze; there might be more, but not on the critical path).
This is configured by default to include /var/tmp in /usr/lib/tmpfiles.d/tmp.conf.
Regarding your thoughts about changing the cloud-init ordering later on:
that won't help you, as the dependency is there either way (implicit or
explicit doesn't matter).
For the Xenial case, where I reliably hit the issue, I cut things short instead of stracing.
A service with the following exposes exactly the same error:
[Unit]
Description=foo
DefaultDependencies=no
[Service]
PrivateTmp=yes
ExecStart=/bin/true
[Install]
WantedBy=multi-user.target
So back on Xenial it is PrivateTmp plus starting too early that breaks it.
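The failing combination can be checked mechanically. The following is a minimal sketch in Python, not part of open-vm-tools or systemd. The names parse_unit, is_race_prone and SAFE_AFTER are hypothetical, and the set of "safe" orderings is an assumption taken from this thread (the explicit After=local-fs.target fix and the systemd-tmpfiles-setup.service dependency seen on Bionic). It only inspects explicit After= entries in the unit text, so it cannot see implicit dependencies that systemd adds on its own.

```python
from collections import defaultdict

# Spellings systemd accepts for a true boolean value.
TRUE_VALUES = {"1", "yes", "true", "on"}
# Assumption from this thread: ordering after either of these avoids the race.
SAFE_AFTER = {"local-fs.target", "systemd-tmpfiles-setup.service"}


def parse_unit(text):
    """Parse unit-file text into {section: {key: [values, ...]}}."""
    sections = defaultdict(lambda: defaultdict(list))
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#;":
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
        elif current is not None and "=" in line:
            key, value = line.split("=", 1)
            # Repeated keys (e.g. several Before= lines) are kept in order.
            sections[current][key.strip()].append(value.strip())
    return sections


def is_race_prone(text):
    """True if the unit sets PrivateTmp, disables default dependencies,
    and has no explicit After= on one of SAFE_AFTER."""
    unit = parse_unit(text)

    def flag(section, key, default):
        values = unit[section][key]
        return values[-1].lower() in TRUE_VALUES if values else default

    after = {name for value in unit["Unit"]["After"] for name in value.split()}
    return (
        flag("Service", "PrivateTmp", False)
        and not flag("Unit", "DefaultDependencies", True)
        and not (after & SAFE_AFTER)
    )


# The reproducer unit quoted above.
REPRO = """\
[Unit]
Description=foo
DefaultDependencies=no
[Service]
PrivateTmp=yes
ExecStart=/bin/true
[Install]
WantedBy=multi-user.target
"""

FIXED = REPRO.replace("[Service]", "After=local-fs.target\n[Service]")

print(is_race_prone(REPRO))  # True
print(is_race_prone(FIXED))  # False
```

Such a check is only a heuristic: a unit it flags may still start fine on a release where systemd orders PrivateTmp users after the tmpfiles setup by itself, as observed on Bionic.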
Xenial vs Bionic critical chain according to "systemd-analyze critical-chain open-vm-tools.service":
Xenial with fix:
open-vm-tools.service @3.482s
└─local-fs.target @3.460s
└─local-fs-pre.target @3.460s
└─systemd-remount-fs.service @3.442s +9ms
└─system.slice @220ms
└─-.slice @204ms
Xenial without fix:
└─run-vmblock\x2dfuse.mount @6.076s +390ms
└─sys-fs-fuse-connections.mount @5.510s +375ms
└─systemd-modules-load.service @1.996s +75ms
└─system.slice @1.984s
└─-.slice @1.966s
Bionic
open-vm-tools.service @3.566s
└─systemd-tmpfiles-setup.service @3.421s +100ms
└─systemd-journal-flush.service @3.054s +342ms
└─systemd-journald.service @825ms +2.219s
└─syslog.socket @808ms
└─system.slice @621ms
└─-.slice @613ms
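To compare such chains without reading them by eye, the output can be parsed. This is a rough sketch under stated assumptions: parse_chain and waits_for are hypothetical helper names, the input is the text printed by systemd-analyze critical-chain as quoted above, and only plain "@<number>s" or "@<number>ms" timestamps are handled (longer formats that include minutes are skipped).

```python
import re

# Matches "<unit> @<time><s|ms>" at the start of an already stripped line.
ENTRY = re.compile(r"^(?P<unit>\S+)\s+@(?P<value>\d+(?:\.\d+)?)(?P<scale>ms|s)\b")


def parse_chain(output):
    """Return [(unit, activation time in seconds), ...] from top to bottom."""
    entries = []
    for raw in output.splitlines():
        # Strip indentation and the tree glyphs, but keep a leading ASCII "-"
        # so that "-.slice" survives.
        line = raw.lstrip(" \t\u2514\u2500")
        match = ENTRY.match(line)
        if match:
            seconds = float(match["value"])
            if match["scale"] == "ms":
                seconds /= 1000
            entries.append((match["unit"], seconds))
    return entries


def waits_for(output, dependency):
    """True if `dependency` appears anywhere in the critical chain."""
    return any(unit == dependency for unit, _ in parse_chain(output))


# Chains copied from the listings above.
XENIAL_WITH_FIX = r"""
open-vm-tools.service @3.482s
└─local-fs.target @3.460s
└─local-fs-pre.target @3.460s
└─systemd-remount-fs.service @3.442s +9ms
└─system.slice @220ms
"""

XENIAL_WITHOUT_FIX = r"""
└─run-vmblock\x2dfuse.mount @6.076s +390ms
└─sys-fs-fuse-connections.mount @5.510s +375ms
└─systemd-modules-load.service @1.996s +75ms
└─system.slice @1.984s
└─-.slice @1.966s
"""

print(waits_for(XENIAL_WITH_FIX, "local-fs.target"))     # True
print(waits_for(XENIAL_WITHOUT_FIX, "local-fs.target"))  # False
```

The critical chain only shows the path that took longest, so a missing entry does not prove that an ordering dependency is absent; it only shows that it was not on the critical path.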
To summarize, we can:
- revert the fix for Bionic (or later): just make it a sync when convenient down the road; it doesn't hurt for now, as it is (almost) the same as the implicit dependency
- add a Xenial systemd bug task (probably too complex to fix as -upstream)
- until said systemd bug is fixed, a backport of open-vm-tools needs this fix
** Also affects: systemd (Ubuntu)
Importance: Undecided
Status: New
** Also affects: open-vm-tools (Ubuntu Xenial)
Importance: Undecided
Status: New
** Also affects: systemd (Ubuntu Xenial)
Importance: Undecided
Status: New
** Changed in: open-vm-tools (Ubuntu Xenial)
Status: New => Triaged
** Changed in: systemd (Ubuntu)
Status: New => Fix Released
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1750780
Title:
Race with local file systems can make open-vm-tools fail to start
Status in cloud-init:
Invalid
Status in open-vm-tools package in Ubuntu:
Fix Released
Status in systemd package in Ubuntu:
Fix Released
Status in open-vm-tools source package in Xenial:
Triaged
Status in systemd source package in Xenial:
New
Status in open-vm-tools package in Debian:
Incomplete
Bug description:
Since the change in [1], open-vm-tools.service starts very (very) early.
Not so much due to the
Before=cloud-init-local.service
But much more by
DefaultDependencies=no
That can trigger an issue that looks like
root@ubuntuguest:~# systemctl status -l open-vm-tools.service
● open-vm-tools.service - Service for virtual machines hosted on VMware
Loaded: loaded (/lib/systemd/system/open-vm-tools.service; enabled; vendor preset: enabled)
Active: failed (Result: resources)
As it is right now, open-vm-tools can race with the other early-starting units and then fail.
In detail one can find a message like:
open-vm-tools.service: Failed to run 'start' task: Read-only file system
This is due to PrivateTmp=yes, which is also set and needs a writable
/var/tmp [2].
To ensure this works, either PrivateTmp would have to be removed (not good) or some After= dependencies added that make this work reliably.
I added
After=local-fs.target
which made it work for me in 3/3 tests.
I'd like to have an ack from the cloud-init team that this does not totally kill the originally intended Before=cloud-init-local.service.
I think it does not, as local-fs can complete before cloud-init-local, then open-vm-tools can initialize, and finally cloud-init-local can pick up the data.
To summarize:
# cloud-init-local #
DefaultDependencies=no
Wants=network-pre.target
After=systemd-remount-fs.service
Before=NetworkManager.service
Before=network-pre.target
Before=shutdown.target
Before=sysinit.target
Conflicts=shutdown.target
RequiresMountsFor=/var/lib/cloud
# open-vm-tools #
DefaultDependencies=no
Before=cloud-init-local.service
Proposed is to add to the latter:
After=local-fs.target
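Whether the proposed After= is compatible with the existing Before= can be sanity-checked by treating the ordering settings as a graph. The sketch below is an illustration only: it uses Python's graphlib (3.9+), it contains just the ordering edges quoted in this report plus the local-fs chain from the critical-chain output, and it ignores Wants=, Conflicts= and RequiresMountsFor=. The dictionary name after is hypothetical.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# unit -> set of units it is ordered after.
# "Before=X" on unit U is recorded here as "X is ordered after U".
after = {
    "local-fs-pre.target": {"systemd-remount-fs.service"},
    "local-fs.target": {"local-fs-pre.target"},
    # Proposed: After=local-fs.target
    "open-vm-tools.service": {"local-fs.target"},
    "cloud-init-local.service": {
        "systemd-remount-fs.service",  # After=systemd-remount-fs.service
        "open-vm-tools.service",       # open-vm-tools: Before=cloud-init-local.service
    },
    # cloud-init-local: Before=network-pre.target, Before=sysinit.target
    "network-pre.target": {"cloud-init-local.service"},
    "sysinit.target": {"cloud-init-local.service"},
}

try:
    order = list(TopologicalSorter(after).static_order())
except CycleError as err:
    # A cycle would mean the orderings contradict each other.
    raise SystemExit(f"ordering cycle: {err.args[1]}")

position = {unit: index for index, unit in enumerate(order)}
ok = (
    position["local-fs.target"]
    < position["open-vm-tools.service"]
    < position["cloud-init-local.service"]
)
print(ok)  # True
```

With these edges no cycle is reported, which matches the expectation that local-fs can complete first, then open-vm-tools, then cloud-init-local. It does not replace an ack from the cloud-init side, since units not listed here could add further constraints.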
[1]: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=859677
[2]: https://github.com/systemd/systemd/issues/5610