yahoo-eng-team team mailing list archive
-
yahoo-eng-team team
-
Mailing list archive
-
Message #76083
[Bug 1806701] [NEW] cloud-init may hang OS boot process due to grep for the entire ISO file when it is attached
Public bug reported:
We have found in our test for SLES15 with cloud-init installed, if we
attach a ISO file with the VM before VM is boot, it often takes more
than 10 minutes to start the SLES OS. Sometimes it failed to start the
SLES OS at all.
We've root caused it is due to the "is_cdrom_ovf()" func of "tools/ds-
identify".
In this function, there is the following logic to detect if an ISO contains certain string:
> local idstr="http://schemas.dmtf.org/ovf/environment/1"
> grep --quiet --ignore-case "$idstr" "${PATH_ROOT}$dev"
ref: https://git.launchpad.net/cloud-init/tree/tools/ds-identify
It is trying to grep the who ISO file for a certain string, which causes intense IO pressure for the system.
What is worse is that sometimes the ISO file is large (e.g. >5GB for installer DVD) and it is mounted over NFS. The "grep" process often consume 99% CPU and seems hang. Then the systemd starts more and more "grep" process which smoke the CPU and consumes all the IO bandwidth for the ISO file. Then the system may hang for a long time and sometimes failed to start.
To fix this issue, I suggest that we should not grep for the entire ISO
file. Rather then we should just check if the file/dir exists with
os.path.exists().
-------------------------debug log snip------------------------
pek2-gosv-16-dhcp180:~ # ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 13:32 ? 00:00:04 /usr/lib/systemd/systemd --switched-root --system --deserialize 24
…
root 474 1 0 13:34 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 482 474 2 13:34 ? 00:00:15 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
root 1020 1 0 13:35 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1039 1020 1 13:35 ? 00:00:07 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
polkitd 1049 1 0 13:37 ? 00:00:00 /usr/lib/polkit-1/polkitd --no-debug
root 1051 1 0 13:37 ? 00:00:00 /usr/sbin/wickedd --systemd --foreground
root 1052 1 0 13:37 ? 00:00:00 /usr/lib/systemd/systemd-logind
root 1054 1 0 13:37 ? 00:00:00 /usr/sbin/wickedd-nanny --systemd --foreground
root 1073 1 0 13:37 ? 00:00:00 /usr/bin/vmtoolsd
root 1097 1 0 13:37 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1110 1097 1 13:37 ? 00:00:04 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
root 1304 1 0 13:38 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1312 1304 1 13:38 ? 00:00:03 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
root 1537 1 0 13:40 ? 00:00:00 /usr/bin/plymouth --wait
root 1613 1 0 13:40 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1645 1613 0 13:40 ? 00:00:02 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
…
Grep use nearly 100% cpu, system very slow.
top - 13:46:37 up 26 min, 2 users, load average: 14.14, 15.03, 10.57
Tasks: 225 total, 6 running, 219 sleeping, 0 stopped, 0 zombie
%Cpu(s): 40.1 us, 49.3 sy, 0.0 ni, 0.0 id, 1.4 wa, 0.0 hi, 9.1 si, 0.0 st
KiB Mem : 1000916 total, 64600 free, 355880 used, 580436 buff/cache
KiB Swap: 1288168 total, 1285600 free, 2568 used. 492688 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4427 root 20 0 40100 3940 3084 R 99.90 0.394 0:27.41 top
1016 root 20 0 197796 4852 3400 R 99.90 0.485 1:26.44 vmtoolsd
1723 root 20 0 7256 1860 1556 D 99.90 0.186 0:28.44 grep
484 root 20 0 7256 1684 1396 D 99.90 0.168 1:51.22 grep
1278 root 20 0 7256 1856 1556 D 99.90 0.185 0:38.44 grep
1398 root 20 0 7256 1860 1556 R 99.90 0.186 0:28.53 grep
1061 root 20 0 7256 1856 1556 D 99.90 0.185 0:56.62 grep
-------------------------debug log snip------------------------
** Affects: cloud-init
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1806701
Title:
cloud-init may hang OS boot process due to grep for the entire ISO
file when it is attached
Status in cloud-init:
New
Bug description:
We have found in our test for SLES15 with cloud-init installed, if we
attach a ISO file with the VM before VM is boot, it often takes more
than 10 minutes to start the SLES OS. Sometimes it failed to start the
SLES OS at all.
We've root caused it is due to the "is_cdrom_ovf()" func of "tools
/ds-identify".
In this function, there is the following logic to detect if an ISO contains certain string:
> local idstr="http://schemas.dmtf.org/ovf/environment/1"
> grep --quiet --ignore-case "$idstr" "${PATH_ROOT}$dev"
ref: https://git.launchpad.net/cloud-init/tree/tools/ds-identify
It is trying to grep the who ISO file for a certain string, which causes intense IO pressure for the system.
What is worse is that sometimes the ISO file is large (e.g. >5GB for installer DVD) and it is mounted over NFS. The "grep" process often consume 99% CPU and seems hang. Then the systemd starts more and more "grep" process which smoke the CPU and consumes all the IO bandwidth for the ISO file. Then the system may hang for a long time and sometimes failed to start.
To fix this issue, I suggest that we should not grep for the entire
ISO file. Rather then we should just check if the file/dir exists with
os.path.exists().
-------------------------debug log snip------------------------
pek2-gosv-16-dhcp180:~ # ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 13:32 ? 00:00:04 /usr/lib/systemd/systemd --switched-root --system --deserialize 24
…
root 474 1 0 13:34 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 482 474 2 13:34 ? 00:00:15 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
root 1020 1 0 13:35 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1039 1020 1 13:35 ? 00:00:07 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
polkitd 1049 1 0 13:37 ? 00:00:00 /usr/lib/polkit-1/polkitd --no-debug
root 1051 1 0 13:37 ? 00:00:00 /usr/sbin/wickedd --systemd --foreground
root 1052 1 0 13:37 ? 00:00:00 /usr/lib/systemd/systemd-logind
root 1054 1 0 13:37 ? 00:00:00 /usr/sbin/wickedd-nanny --systemd --foreground
root 1073 1 0 13:37 ? 00:00:00 /usr/bin/vmtoolsd
root 1097 1 0 13:37 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1110 1097 1 13:37 ? 00:00:04 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
root 1304 1 0 13:38 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1312 1304 1 13:38 ? 00:00:03 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
root 1537 1 0 13:40 ? 00:00:00 /usr/bin/plymouth --wait
root 1613 1 0 13:40 ? 00:00:00 /bin/sh /usr/lib/cloud-init/ds-identify
root 1645 1613 0 13:40 ? 00:00:02 grep --quiet --ignore-case http://schemas.dmtf.org/ovf/environment/1 /dev/sr1
…
Grep use nearly 100% cpu, system very slow.
top - 13:46:37 up 26 min, 2 users, load average: 14.14, 15.03, 10.57
Tasks: 225 total, 6 running, 219 sleeping, 0 stopped, 0 zombie
%Cpu(s): 40.1 us, 49.3 sy, 0.0 ni, 0.0 id, 1.4 wa, 0.0 hi, 9.1 si, 0.0 st
KiB Mem : 1000916 total, 64600 free, 355880 used, 580436 buff/cache
KiB Swap: 1288168 total, 1285600 free, 2568 used. 492688 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4427 root 20 0 40100 3940 3084 R 99.90 0.394 0:27.41 top
1016 root 20 0 197796 4852 3400 R 99.90 0.485 1:26.44 vmtoolsd
1723 root 20 0 7256 1860 1556 D 99.90 0.186 0:28.44 grep
484 root 20 0 7256 1684 1396 D 99.90 0.168 1:51.22 grep
1278 root 20 0 7256 1856 1556 D 99.90 0.185 0:38.44 grep
1398 root 20 0 7256 1860 1556 R 99.90 0.186 0:28.53 grep
1061 root 20 0 7256 1856 1556 D 99.90 0.185 0:56.62 grep
-------------------------debug log snip------------------------
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1806701/+subscriptions
Follow ups