yahoo-eng-team team mailing list archive

[Bug 1805446] [NEW] per-instance is not run after reboot during boot

 

Public bug reported:

I ran into a problem with a missing user after a node rebooted during
per-instance module execution. The file lock is acquired, but the module
does not finish before the reboot[1]. After the reboot, cloud-init
thinks the module ran successfully, although it was interrupted before
it managed to add the user[2]. Later, other modules fail because the
user is missing[3].

I suppose there are attempts to run the same module multiple times at
the same time, so the lock file has to be written before module
execution? Obviously it would be easiest if sem.lock() could just be
moved to after execution. If this is not possible, I guess the locking
could be improved so that has_run() would distinguish a module that is
running or finished from one that failed mid-run on first boot.
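
To illustrate the second option, here is a minimal sketch, assuming a
marker-file scheme with separate "running" and "done" states. The names
TwoPhaseSemaphore, run_once and was_interrupted are hypothetical and are
not cloud-init's actual semaphore API. The sketch also does not try to
guard against two concurrent runs within the same boot.

```python
import os


class TwoPhaseSemaphore:
    """Hypothetical marker-file semaphore (not cloud-init's real API)."""

    def __init__(self, sem_dir):
        self.sem_dir = sem_dir

    def _path(self, name, state):
        return os.path.join(self.sem_dir, "%s.%s" % (name, state))

    def has_run(self, name):
        # Only a "done" marker counts as a completed run.
        return os.path.exists(self._path(name, "done"))

    def was_interrupted(self, name):
        # A "running" marker without a "done" marker means an earlier
        # attempt started but never finished (crash, reboot, exception).
        return (os.path.exists(self._path(name, "running"))
                and not self.has_run(name))

    def run_once(self, name, functor, *args):
        """Run functor unless a completed run is already recorded.

        Returns (ran, results).
        """
        if self.has_run(name):
            return (False, None)
        running = self._path(name, "running")
        with open(running, "w") as fh:
            fh.write("started\n")
            fh.flush()
            os.fsync(fh.fileno())  # push the marker to disk before running
        # If this raises, or the machine reboots here, only the "running"
        # marker is left behind and the next boot will retry.
        results = functor(*args)
        # Promote the marker to "done" only after the work has finished.
        os.rename(running, self._path(name, "done"))
        return (True, results)
```

With this layout, an interrupted users-groups run would leave only a
"running" marker, so has_run() would return False on the next boot and
the module would be executed again instead of being reported as
"previously ran".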


[1]
2018-11-23 11:11:18,195 - stages.py[DEBUG]: Running module users-groups (<module 'cloudinit.config.cc_users_groups' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_users_groups.pyc'>) with frequency once-per-instance
2018-11-23 11:11:18,195 - handlers.py[DEBUG]: start: init-network/config-users-groups: running config-users-groups with frequency once-per-instance
2018-11-23 11:11:18,196 - util.py[DEBUG]: Running command sync with allowed return codes [0] (shell=False, capture=True)
[... long run of NUL bytes (^@) in the log file, elided ...]
2018-11-23 11:11:51,582 - util.py[DEBUG]: Cloud-init v. 17.1 running 'init-local' at Fri, 23 Nov 2018 11:11:51 +0000. Up 13.86 seconds.
2018-11-23 11:11:51,608 - main.py[DEBUG]: No kernel command line url found.

[2]
2018-11-23 11:12:02,624 - stages.py[DEBUG]: Running module users-groups (<module 'cloudinit.config.cc_users_groups' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_users_groups.pyc'>) with frequency once-per-instance
2018-11-23 11:12:02,625 - handlers.py[DEBUG]: start: init-network/config-users-groups: running config-users-groups with frequency once-per-instance
2018-11-23 11:12:02,625 - helpers.py[DEBUG]: config-users-groups already ran (freq=once-per-instance)
2018-11-23 11:12:02,625 - handlers.py[DEBUG]: finish: init-network/config-users-groups: SUCCESS: config-users-groups previously ran

[3]
2018-11-23 11:12:31,806 - util.py[DEBUG]: Running module ssh-authkey-fingerprints (<module 'cloudinit.config.cc_ssh_authkey_fingerprints' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_ssh_authkey_fingerprints.pyc'>) failed
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/cloudinit/stages.py", line 793, in _run_modules
    freq=freq)
  File "/usr/lib/python2.7/site-packages/cloudinit/cloud.py", line 54, in run
    return self._runners.run(name, functor, args, freq, clear_on_fail)
  File "/usr/lib/python2.7/site-packages/cloudinit/helpers.py", line 198, in run
    results = functor(*args)
  File "/usr/lib/python2.7/site-packages/cloudinit/config/cc_ssh_authkey_fingerprints.py", line 108, in handle
    (key_fn, key_entries) = ssh_util.extract_authorized_keys(user_name)
  File "/usr/lib/python2.7/site-packages/cloudinit/ssh_util.py", line 212, in extract_authorized_keys
    (ssh_dir, pw_ent) = users_ssh_info(username)
  File "/usr/lib/python2.7/site-packages/cloudinit/ssh_util.py", line 205, in users_ssh_info
    pw_ent = pwd.getpwnam(username)
KeyError: 'getpwnam(): name not found: testuser'

** Affects: cloud-init
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to cloud-init.
https://bugs.launchpad.net/bugs/1805446

Title:
  per-instance is not run after reboot during boot

Status in cloud-init:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1805446/+subscriptions

