group.of.nepali.translators team mailing list archive

[Bug 1864669] Re: [linux-azure] overlayfs regression - internal getxattr operations without sepolicy checking

 

** Also affects: linux-azure-4.15 (Ubuntu)
   Importance: Undecided
       Status: New

** Also affects: linux-azure (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: linux-azure-4.15 (Ubuntu Xenial)
   Importance: Undecided
       Status: New

** Also affects: linux-azure (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux-azure-4.15 (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: linux-azure (Ubuntu Focal)
   Importance: Undecided
       Status: Fix Released

** Also affects: linux-azure-4.15 (Ubuntu Focal)
   Importance: Undecided
       Status: New

** Also affects: linux-azure (Ubuntu Eoan)
   Importance: Undecided
       Status: New

** Also affects: linux-azure-4.15 (Ubuntu Eoan)
   Importance: Undecided
       Status: New

** Changed in: linux-azure-4.15 (Ubuntu Xenial)
       Status: New => Invalid

** Changed in: linux-azure-4.15 (Ubuntu Bionic)
       Status: New => Fix Committed

** Changed in: linux-azure-4.15 (Ubuntu Eoan)
       Status: New => Invalid

** Changed in: linux-azure-4.15 (Ubuntu Focal)
       Status: New => Invalid

** Changed in: linux-azure (Ubuntu Eoan)
       Status: New => Fix Committed

** Changed in: linux-azure (Ubuntu Bionic)
       Status: New => Fix Committed

** Changed in: linux-azure (Ubuntu Xenial)
       Status: New => Fix Committed

-- 
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1864669

Title:
  [linux-azure] overlayfs regression - internal getxattr operations
  without sepolicy checking

Status in linux-azure package in Ubuntu:
  Fix Released
Status in linux-azure-4.15 package in Ubuntu:
  Invalid
Status in linux-azure source package in Xenial:
  Fix Committed
Status in linux-azure-4.15 source package in Xenial:
  Invalid
Status in linux-azure source package in Bionic:
  Fix Committed
Status in linux-azure-4.15 source package in Bionic:
  Fix Committed
Status in linux-azure source package in Eoan:
  Fix Committed
Status in linux-azure-4.15 source package in Eoan:
  Invalid
Status in linux-azure source package in Focal:
  Fix Released
Status in linux-azure-4.15 source package in Focal:
  Invalid

Bug description:
  Bug description and repro:

  Run the following commands on host instances:

  Prepare the overlayfs directories:
  $ cd /tmp
  $ mkdir -p base/dir1/dir2 upper olwork merged
  $ touch base/dir1/dir2/file
  $ chown -R 100000:100000 base upper olwork merged

  Verify that the directory is owned by user 100000:
  $ ls -al merged/ 
  total 8
  drwxr-xr-x  2 100000 100000 4096 Nov  1 07:08 .
  drwxrwxrwt 16 root   root   4096 Nov  1 07:08 ..

  We use lxc-usernsexec to start a new shell as user 100000.
  $ lxc-usernsexec -m b:0:100000:1 -- /bin/bash
  $$ ls -al merged/
  total 8
  drwxr-xr-x  2 root   root    4096 Nov  1 07:08 .
  drwxrwxrwt 16 nobody nogroup 4096 Nov  1 07:08 ..

  Notice that the ownership of . and .. has changed because the new shell is running as the remapped user.
  Now mount the overlayfs as the unprivileged (remapped) user in the new shell. This step is the key to triggering the bug.
  $$ mount -t overlay -o lowerdir=base,upperdir=upper,workdir=olwork none merged
  $$ ls -al merged/dir1/dir2/file 
  -rw-r--r-- 1 root root 0 Nov  1 07:09 merged/dir1/dir2/file

  We can see the file in the base layer from the mount directory. Now trigger the bug:
  $$ rm -rf merged/dir1/dir2/
  $$ mkdir merged/dir1/dir2
  $$ ls -al merged/dir1/dir2
  total 12
  drwxr-xr-x 2 root root 4096 Nov  1 07:10 .
  drwxr-xr-x 1 root root 4096 Nov  1 07:10 ..

  As expected, the file does not show up in the newly created dir2. However, it reappears after we remount the filesystem (or do anything else that might evict the cached dentry, such as attempting to delete the parent directory):
  $$ umount merged
  $$ mount -t overlay -o lowerdir=base,upperdir=upper,workdir=olwork none merged
  $$ ls -al merged/dir1/dir2
  total 12
  drwxr-xr-x 1 root root 4096 Nov  1 07:10 .
  drwxr-xr-x 1 root root 4096 Nov  1 07:10 ..
  -rw-r--r-- 1 root root    0 Nov  1 07:09 file
  $$ exit
  $
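
  The analysis that follows centers on the "trusted.overlay.opaque" extended attribute. As a minimal sketch, the helper below reads that attribute from user space with getxattr(2), which can help confirm whether the flag is actually stored on the recreated directory in the upper layer. The function name opaque_flag is hypothetical, and the sketch assumes it is run as root on the host (outside the user namespace), since unprivileged reads of trusted.* attributes report the attribute as absent.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/xattr.h>

/*
 * Hypothetical helper, not part of the bug report.
 * Returns 1 if `path` carries trusted.overlay.opaque == "y",
 * 0 if the attribute is absent (or has a different one-byte or empty value),
 * and -1 on any other error (missing path, value too long, ...).
 */
int opaque_flag(const char *path)
{
	char val = 0;
	ssize_t res = getxattr(path, "trusted.overlay.opaque", &val, 1);

	if (res >= 0)
		return res == 1 && val == 'y';
	if (errno == ENODATA)
		return 0;	/* attribute not set, or caller lacks privilege */
	return -1;
}
```

  With the directories from the repro, the call would be opaque_flag("/tmp/upper/dir1/dir2"). A result of 1 would suggest the flag is on disk and that the problem lies in how the kernel reads it back.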

  This is a recent kernel regression. I tried the steps above on an older
  kernel (4.4.0-1072-aws) and could not reproduce the problem.


  I looked through the Linux source code and found where the "regression" comes from. The issue lies in how overlayfs reads the "opaque" flag from the underlying upper-layer filesystem: it checks the "trusted.overlay.opaque" extended attribute to decide whether to hide the directory contents of the lower layer. The logic differs between the 4.4 and 4.15 kernels.
  In 4.4: https://elixir.bootlin.com/linux/v4.4/source/fs/overlayfs/super.c#L255
  static bool ovl_is_opaquedir(struct dentry *dentry)
  {
  	int res;
  	char val;
  	struct inode *inode = dentry->d_inode;

  	if (!S_ISDIR(inode->i_mode) || !inode->i_op->getxattr)
  		return false;

  	res = inode->i_op->getxattr(dentry, OVL_XATTR_OPAQUE, &val, 1);
  	if (res == 1 && val == 'y')
  		return true;

  	return false;
  }

  In 4.15: https://elixir.bootlin.com/linux/v4.15/source/fs/overlayfs/util.c#L349
  static bool ovl_is_opaquedir(struct dentry *dentry)
  {
  	return ovl_check_dir_xattr(dentry, OVL_XATTR_OPAQUE);
  }

  bool ovl_check_dir_xattr(struct dentry *dentry, const char *name)
  {
  	int res;
  	char val;

  	if (!d_is_dir(dentry))
  		return false;

  	res = vfs_getxattr(dentry, name, &val, 1);
  	if (res == 1 && val == 'y')
  		return true;

  	return false;
  }

  The 4.4 version simply calls the underlying filesystem's inode callback, inode->i_op->getxattr, which performs no permission check. The 4.15 version instead calls the VFS interface vfs_getxattr, which performs several permission checks before calling the unchecked internal helper __vfs_getxattr:
  See https://elixir.bootlin.com/linux/v4.15/source/fs/xattr.c#L317
  ssize_t
  vfs_getxattr(struct dentry *dentry, const char *name, void *value, size_t size)
  {
  	struct inode *inode = dentry->d_inode;
  	int error;

  	error = xattr_permission(inode, name, MAY_READ);
  	if (error)
  		return error;

  	error = security_inode_getxattr(dentry, name);
  	if (error)
  		return error;

  	if (!strncmp(name, XATTR_SECURITY_PREFIX,
  				XATTR_SECURITY_PREFIX_LEN)) {
  		const char *suffix = name + XATTR_SECURITY_PREFIX_LEN;
  		int ret = xattr_getsecurity(inode, suffix, value, size);
  		/*
  		 * Only overwrite the return value if a security module
  		 * is actually active.
  		 */
  		if (ret == -EOPNOTSUPP)
  			goto nolsm;
  		return ret;
  	}
  nolsm:
  	return __vfs_getxattr(dentry, inode, name, value, size);
  }

  In 4.15, ovl_is_opaquedir is reached through the following call chain:
  ovl_is_opaquedir <-
  ovl_lookup_single <-
  ovl_lookup_layer <-
  ovl_lookup
  ovl_lookup is the entry point for name lookups in overlayfs directories. Importantly, it assumes the filesystem mounter's credentials to perform all internal lookup operations:
  struct dentry *ovl_lookup(struct inode *dir, struct dentry *dentry,
  			  unsigned int flags)
  {
     old_cred = ovl_override_creds(dentry->d_sb);
     // perform lookups
     // ....
     revert_creds(old_cred);   
  }

  The "credential switching" logic also does not exist in the 4.4 kernel: https://elixir.bootlin.com/linux/v4.4/source/fs/overlayfs/super.c#L397
  That means that on 4.15, overlayfs uses the filesystem mounter's credentials to fetch the "trusted.overlay.opaque" xattr from the underlying filesystem. The permission check can therefore fail if the overlayfs was mounted by a remapped user, who does not hold CAP_SYS_ADMIN in the initial user namespace.
  See https://elixir.bootlin.com/linux/v4.15/source/fs/xattr.c#L115:
  static int xattr_permission(struct inode *inode, const char *name, int mask)
  {
   ....
    	/*
  	 * The trusted.* namespace can only be accessed by privileged users.
  	 */
  	if (!strncmp(name, XATTR_TRUSTED_PREFIX, XATTR_TRUSTED_PREFIX_LEN)) {
  		if (!capable(CAP_SYS_ADMIN))
  			return (mask & MAY_WRITE) ? -EPERM : -ENODATA;
  		return 0;
  	}
  ....
  }

  When this call fails, overlayfs assumes the upper directory is not
  "opaque" and combines the content from the lower directory in the
  result.
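
  The failure path described above can be modeled in user space. The sketch below is not kernel code: the struct, the model_* names, and the MODEL_MAY_* constants are stand-ins invented for illustration, and only the branch logic follows the quoted xattr_permission and ovl_check_dir_xattr excerpts.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_MAY_WRITE 0x2	/* stand-ins for the kernel's MAY_* masks */
#define MODEL_MAY_READ  0x4

/* Hypothetical stand-in for a directory in the upper layer. */
struct model_dir {
	bool opaque_on_disk;	/* trusted.overlay.opaque == "y" is stored */
};

/* Models the trusted.* branch of xattr_permission(). */
int model_xattr_permission(const char *name, int mask, bool has_cap_sys_admin)
{
	if (!strncmp(name, "trusted.", strlen("trusted.")) && !has_cap_sys_admin)
		return (mask & MODEL_MAY_WRITE) ? -EPERM : -ENODATA;
	return 0;
}

/* Models vfs_getxattr(): permission check first, then the raw read. */
static int model_vfs_getxattr(const struct model_dir *dir, const char *name,
			      char *val, bool has_cap_sys_admin)
{
	int err = model_xattr_permission(name, MODEL_MAY_READ, has_cap_sys_admin);

	if (err)
		return err;
	if (!dir->opaque_on_disk)
		return -ENODATA;
	*val = 'y';
	return 1;
}

/* Models ovl_check_dir_xattr(): any failure reads as "not opaque". */
bool model_is_opaquedir(const struct model_dir *dir, bool mounter_has_cap)
{
	char val = 0;
	int res = model_vfs_getxattr(dir, "trusted.overlay.opaque", &val,
				     mounter_has_cap);

	return res == 1 && val == 'y';
}
```

  In this model, a directory whose flag is stored on disk is still reported as non-opaque when the mounter lacks the capability, because the -ENODATA from the permission check is indistinguishable from a missing attribute.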

  
  There's a proposed patch to fix this issue: https://lkml.org/lkml/2019/7/30/787
  The patch calls the unchecked __vfs_getxattr to fetch the opaque flag, so that it bypasses the permission check even though the other lookup operations are done under the mounter's credentials.
  However, the patch has not been merged into the upstream Linux kernel as of today (see https://elixir.bootlin.com/linux/v5.4-rc5/source/fs/overlayfs/util.c#L551).
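
  To illustrate the idea behind the proposed patch, here is a small user-space model contrasting the two call paths. It is a sketch under stated assumptions rather than the patch itself: struct upper_dir and the sketch_* functions are hypothetical stand-ins for the kernel objects, with sketch_raw_getxattr playing the role of __vfs_getxattr and sketch_checked_getxattr the role of vfs_getxattr on a trusted.* name.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a directory in the upper layer. */
struct upper_dir {
	bool opaque_on_disk;	/* trusted.overlay.opaque == "y" is stored */
};

/* Role of __vfs_getxattr(): raw read, no permission check. */
static int sketch_raw_getxattr(const struct upper_dir *dir, char *val)
{
	if (!dir->opaque_on_disk)
		return -ENODATA;
	*val = 'y';
	return 1;
}

/* Role of vfs_getxattr() for trusted.*: refuse unprivileged readers. */
static int sketch_checked_getxattr(const struct upper_dir *dir, char *val,
				   bool has_cap_sys_admin)
{
	if (!has_cap_sys_admin)
		return -ENODATA;
	return sketch_raw_getxattr(dir, val);
}

/* Behavior described for 4.15: the result depends on the mounter's capability. */
bool sketch_opaque_before_patch(const struct upper_dir *dir, bool mounter_has_cap)
{
	char val = 0;

	return sketch_checked_getxattr(dir, &val, mounter_has_cap) == 1 && val == 'y';
}

/* Behavior the patch aims for: the capability no longer matters. */
bool sketch_opaque_after_patch(const struct upper_dir *dir, bool mounter_has_cap)
{
	char val = 0;

	(void)mounter_has_cap;	/* deliberately ignored on the unchecked path */
	return sketch_raw_getxattr(dir, &val) == 1 && val == 'y';
}
```

  The trade-off in this approach is that overlayfs reads its own internal attribute without consulting the permission and security hooks, which is acceptable only because the attribute is private to overlayfs and never handed back to the caller.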

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-azure/+bug/1864669/+subscriptions