yahoo-eng-team team mailing list archive

[Bug 1913354] Re: NFS mounts in /etc/fstab and cloud-init may cause boot hang

 

Tracked in GitHub Issues as
https://github.com/canonical/cloud-init/issues/3834

** Bug watch added: github.com/canonical/cloud-init/issues #3834
   https://github.com/canonical/cloud-init/issues/3834

** Changed in: cloud-init
       Status: Triaged => Expired

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1913354

Title:
  NFS mounts in /etc/fstab and cloud-init may cause boot hang

Status in cloud-init:
  Expired

Bug description:
  Azure, RHEL 7.8, 7.9 and OEL 7.8, 7.9.

  On OEL 7.8 cloud-init is cloud-init-18.5-6.el7.x86_64

  On both OEL and RHEL 7.* (certainly 7.8 and 7.9), if there is an NFS
  mount in /etc/fstab (unknown whether this also applies to NFSv4), boot
  may not complete. The system hangs and cannot be reached over SSH or
  through serial console login.

  Everything points to a deadlock between the startup of the
  rpc-statd and rpc-statd-notify services and cloud-init.service.

  This happens because rpc-statd and rpc-statd-notify declare the
  following dependencies:

  # rpc-statd.service:
  [Unit]
  Description=NFS status monitor for NFSv2/3 locking.
  DefaultDependencies=no
  Conflicts=umount.target
  Requires=nss-lookup.target rpcbind.socket
  Wants=network-online.target                                   # <---
  After=network-online.target nss-lookup.target rpcbind.socket  # <---

  PartOf=nfs-utils.service

  Wants=nfs-config.service
  After=nfs-config.service

  [Service]
  Environment=RPC_STATD_NO_NOTIFY=1
  EnvironmentFile=-/run/sysconfig/nfs-utils
  Type=forking
  PIDFile=/var/run/rpc.statd.pid
  ExecStart=/usr/sbin/rpc.statd $STATDARGS

  # rpc-statd-notify.service:

  [Unit]
  Description=Notify NFS peers of a restart
  DefaultDependencies=no
  Wants=network-online.target                                   # <---
  After=local-fs.target network-online.target nss-lookup.target # <---

  # Do not start up in HA environments
  ConditionPathExists=!/var/lib/nfs/statd/sm.ha

  # if we run an nfs server, it needs to be running before we
  # tell clients that it has restarted.
  After=nfs-server.service

  PartOf=nfs-utils.service

  Wants=nfs-config.service
  After=nfs-config.service

  [Service]
  EnvironmentFile=-/run/sysconfig/nfs-utils
  Type=forking
  ExecStart=-/usr/sbin/sm-notify $SMNOTIFYARGS

  while cloud-init.service is:

  [Unit]
  Description=Initial cloud-init job (metadata service crawler)
  Wants=cloud-init-local.service
  Wants=sshd-keygen.service
  Wants=sshd.service
  After=cloud-init-local.service
  After=NetworkManager.service network.service
  Before=network-online.target                     # <---
  Before=sshd-keygen.service
  Before=sshd.service
  Before=systemd-user-sessions.service
  ConditionPathExists=!/etc/cloud/cloud-init.disabled
  ConditionKernelCommandLine=!cloud-init=disabled

  [Service]
  Type=oneshot
  ExecStart=/usr/bin/cloud-init init
  RemainAfterExit=yes
  TimeoutSec=0

  # Output needs to appear in instance console output
  StandardOutput=journal+console

  [Install]
  WantedBy=cloud-init.target

  So cloud-init is to be started before network-online.target, while
  rpc-statd* are to be started after network-online.target.
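
  The ordering above can be checked with a small sketch. It is not part
  of the original report: the Before=/After= lists are copied from the
  unit files quoted above, while the helper names (build_order_graph,
  starts_after) are made up for illustration. It models only systemd
  ordering, not what cloud-init does at runtime.

```python
from collections import defaultdict

# Ordering directives copied from the unit files quoted in this report.
# Wants=/Requires= are left out on purpose: they pull units in but do not order them.
UNITS = {
    "rpc-statd.service": {
        "After": ["network-online.target", "nss-lookup.target",
                  "rpcbind.socket", "nfs-config.service"],
    },
    "rpc-statd-notify.service": {
        "After": ["local-fs.target", "network-online.target",
                  "nss-lookup.target", "nfs-server.service",
                  "nfs-config.service"],
    },
    "cloud-init.service": {
        "After": ["cloud-init-local.service", "NetworkManager.service",
                  "network.service"],
        "Before": ["network-online.target", "sshd-keygen.service",
                   "sshd.service", "systemd-user-sessions.service"],
    },
}


def build_order_graph(units):
    """Return {unit: set of units that must be started before it}."""
    must_follow = defaultdict(set)
    for name, directives in units.items():
        for dep in directives.get("After", []):
            must_follow[name].add(dep)       # name starts after dep
        for dep in directives.get("Before", []):
            must_follow[dep].add(name)       # dep starts after name
    return must_follow


def starts_after(graph, unit, other, seen=None):
    """True if `unit` is ordered, directly or transitively, after `other`."""
    seen = set() if seen is None else seen
    if unit in seen:
        return False
    seen.add(unit)
    preds = graph.get(unit, set())
    return other in preds or any(
        starts_after(graph, p, other, seen) for p in preds
    )


graph = build_order_graph(UNITS)
for svc in ("rpc-statd.service", "rpc-statd-notify.service"):
    print(svc, "ordered after cloud-init.service:",
          starts_after(graph, svc, "cloud-init.service"))
```

  Both services come out as ordered after cloud-init.service via
  network-online.target. If something that runs inside
  cloud-init.service waits on either of them, neither side can make
  progress, which is consistent with the hang described here.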

  CX has demonstrated this to my satisfaction.

  I see a few possible paths here:

  1. CX has to change the (rpc-statd|rpc-statd-notify).service so that
  they now state:

  Before=network-online.target
  #Wants=network-online.target
  #After=network-online.target

  2. CX has to change cloud-init.service so that it now states:

  Wants=network-online.target
  After=network-online.target
  #Before=network-online.target

  3. CX removes the NFS mount from /etc/fstab, and adds it as a systemd
  .mount unit

  
  CX opted for change #1 above, and now sees no boot issues.

  There is a Red Hat bug about this:
  https://bugzilla.redhat.com/show_bug.cgi?id=1858930, but it was closed
  WONTFIX because... support for RHEL7 ended :-(. I also searched
  Bugzilla and Launchpad for related bugs on RHEL(7|8), but did not
  find any.

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1913354/+subscriptions
