kernel-packages team mailing list archive

[Bug 1371591] Re: file not initialized to 0s under some conditions on VMWare

To: kernel-packages@xxxxxxxxxxxxxxxxxxx
From: Chris J Arges <1371591@xxxxxxxxxxxxxxxxxx>
Date: Fri, 19 Sep 2014 21:13:03 -0000
Reply-to: Bug 1371591 <1371591@xxxxxxxxxxxxxxxxxx>
Sender: bounces@xxxxxxxxxxxxx

The bisect resulted in the following:

dc019b21fb92d620a3b52ccecc135ac968a7c7ec is the first bad commit
commit dc019b21fb92d620a3b52ccecc135ac968a7c7ec
Author: Mike Snitzer <snitzer@xxxxxxxxxx>
Date:   Fri May 10 14:37:16 2013 +0100

    dm table: fix write same support
    
    If device_not_write_same_capable() returns true then the iterate_devices
    loop in dm_table_supports_write_same() should return false.
    
    Reported-by: Bharata B Rao <bharata.rao@xxxxxxxxx>
    Signed-off-by: Mike Snitzer <snitzer@xxxxxxxxxx>
    Cc: stable@xxxxxxxxxxxxxxx # v3.8+
    Signed-off-by: Alasdair G Kergon <agk@xxxxxxxxxx>

:040000 040000 d8b62d18789b5c9e5b52c076abcf4c8c066b5d59 71a5511a8ea76f43bd167524a9186c1d78407bce M      drivers

--

However, I don't think this patch itself is at fault. The function 'device_not_write_same_capable()' correctly returns:
   return q && !q->limits.max_write_same_sectors;
If max_write_same_sectors is 0 (write_same not supported), it returns true, meaning the device is 'not_write_same_capable'.

Likewise, the function 'dm_table_supports_write_same' iterates through
the targets of a dm table and checks:

 if (!ti->type->iterate_devices ||
                    ti->type->iterate_devices(ti, device_not_write_same_capable, NULL))
                        return false;

So if iterate_devices is NULL, the function returns false. Otherwise
iterate_devices is called with device_not_write_same_capable as the
callback, and if that reports 'true' the function returns 'false'. (This
is a bit confusing: the parent function answers 'supports_write_same' by
means of a 'not_write_same_capable' callback.)
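
The inverted check described above can be modelled in user space. This is only a sketch: the model_* types and functions are hypothetical stand-ins, not kernel code, and the model leaves out any other checks the real dm_table_supports_write_same() may perform. It assumes that iterate_devices reports true as soon as the callback returns true for any underlying device.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one underlying device's queue limits. */
struct model_dev {
    unsigned int max_write_same_sectors; /* 0 means WRITE SAME unsupported */
};

/* Hypothetical stand-in for a dm target and the devices it maps to. */
struct model_target {
    const struct model_dev *devs; /* NULL models a missing iterate_devices */
    size_t ndevs;
};

/* Same shape as the quoted predicate: true when the device can NOT do it. */
bool model_dev_not_write_same_capable(const struct model_dev *d)
{
    return d && !d->max_write_same_sectors;
}

/* Assumed behaviour: true if the callback is true for any device. */
bool model_iterate_devices(const struct model_target *t,
                           bool (*fn)(const struct model_dev *))
{
    for (size_t i = 0; i < t->ndevs; i++)
        if (fn(&t->devs[i]))
            return true;
    return false;
}

/* A table supports WRITE SAME only if no target has an incapable device. */
bool model_table_supports_write_same(const struct model_target *targets,
                                     size_t ntargets)
{
    for (size_t i = 0; i < ntargets; i++) {
        const struct model_target *t = &targets[i];

        if (!t->devs ||
            model_iterate_devices(t, model_dev_not_write_same_capable))
            return false;
    }
    return true;
}
```

In this model a single device with max_write_same_sectors == 0 anywhere in the table is enough to make the whole table report no write_same support, which is the behavior the commit message describes.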

That logic was introduced in d54eaa5a0fde0a202e4e91f200f818edcef15bee
(v3.8-rc1), so kernels older than that may not show the same behavior,
which could account for 2.6.32 (Ubuntu 10.04) not failing this test case.

Relevant thread: http://www.spinics.net/lists/dm-devel/msg19583.html

--

Looking at the affected VM:

This explains why only LVM setups are affected, and it also explains the
helpful kernel message output. If we check the dm devices backing our
LVM volume groups, we see the following:

ubuntu@ubuntu:~$ lsblk
NAME                         MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                            8:0    0    20G  0 disk 
├─sda1                         8:1    0   243M  0 part /boot
├─sda2                         8:2    0     1K  0 part 
└─sda5                         8:5    0  19.8G  0 part 
  ├─ubuntu--vg-root (dm-0)   252:0    0  18.8G  0 lvm  /
  └─ubuntu--vg-swap_1 (dm-1) 252:1    0  1020M  0 lvm  [SWAP]
sr0                           11:0    1   572M  0 rom  

ubuntu@ubuntu:~$ cat /sys/dev/block/252\:1/queue/write_same_max_bytes 
33553920
ubuntu@ubuntu:~$ cat /sys/dev/block/252\:0/queue/write_same_max_bytes 
33553920

So write_same support is enabled on both dm devices, and using it is what
leads to the failure. At this point, I wonder whether the underlying
virtual SCSI device is at fault.
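
For reference, 33553920 bytes is 65535 sectors if one assumes 512-byte sectors (65535 * 512 = 33553920). 65535 is also the largest value a 16-bit block count can hold, which would be consistent with a SCSI WRITE SAME style limit, but that reading is my assumption and is not confirmed here. The sketch below parses such a sysfs value and converts it; the function names are made up for illustration, and the attribute may simply be absent on other kernels, in which case the reader returns -1.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a sysfs byte count such as "33553920\n"; -1 on malformed input. */
long long parse_sysfs_bytes(const char *text)
{
    char *end;
    long long v;

    errno = 0;
    v = strtoll(text, &end, 10);
    if (errno != 0 || end == text || v < 0)
        return -1;
    return v;
}

/* Convert a byte count to sectors, assuming 512-byte sectors. */
long long bytes_to_sectors(long long bytes)
{
    return bytes / 512;
}

/* Read write_same_max_bytes for a device named as "MAJ:MIN"; -1 on error. */
long long read_write_same_max_bytes(const char *majmin)
{
    char path[128];
    char buf[64];
    char *got;
    FILE *f;

    snprintf(path, sizeof path,
             "/sys/dev/block/%s/queue/write_same_max_bytes", majmin);
    f = fopen(path, "r");
    if (!f)
        return -1;
    got = fgets(buf, sizeof buf, f);
    fclose(f);
    return got ? parse_sysfs_bytes(buf) : -1;
}
```

Calling read_write_same_max_bytes("252:0") on the VM above would be expected to give the same number that cat printed; a result of 0 would mean write_same is disabled for that device.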

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1371591

Title:
  file not initialized to 0s under some conditions on VMWare

Status in “linux” package in Ubuntu:
  In Progress
Status in “linux” source package in Trusty:
  New

Bug description:
  Under some conditions, after fallocate() the file is observed not to
  be completely initialized to 0s: some 4KB pages have left-over data
  from previous files that occupied those pages. Note that in addition
  to causing functional problems for applications expecting files to be
  initialized to 0s, this is a security issue because it allows data to
  "leak" from one file to another, bypassing file access controls.

  The problem has been seen running under the following VMWare-based virtual environments:
  Fusion 6.0.2
  ESXi 5.1.0

  And under the following versions of Ubuntu:
  Ubuntu 12.04, 3.11.0-26-generic
  Ubuntu 14.04.1, 3.13.0-32-generic
  Ubuntu 14.04.1, 3.13.0-35-generic

  But did not reproduce under the following version:
  Ubuntu 10.04, 2.6.32-38-server

  The problem reproduced under LVM, but did not reproduce without LVM.

  I reproduced the problem as follows under VMWare Fusion:
  set up custom VM with default disk size (20 GB) and memory size (1 GB)
  attach Ubuntu 14.04.1 ISO to CDROM, set it as boot device, boot up
  select all defaults during installation _including_ LVM
  install gcc
  unpack the attached repro.tgz
  run repro.sh

  what it does:
  * fills the disk with a file containing bytes of 0xcc then deletes it
  * repeatedly runs the repro program which creates two files and accesses them in a certain pattern
  * checks the file f0 with hexdump; it should contain all 0s, but if pages 0x1000-0x7000 contain 0xcc you have reproduced the problem
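
  The hexdump check in the last step can also be done programmatically. This is a rough sketch, not part of the attached repro.tgz: the function name first_dirty_page and the PAGE_BYTES constant are made up for illustration, and a 4KB page size is assumed, matching the description above.

```c
#include <stddef.h>
#include <stdio.h>

#define PAGE_BYTES 4096 /* assumed page size, per the bug description */

/*
 * Scan an open stream from the start and return the byte offset of the
 * first 4KB page that contains a non-zero byte, or -1 if every byte read
 * is zero. Read errors are not distinguished from end of file.
 */
long first_dirty_page(FILE *f)
{
    unsigned char page[PAGE_BYTES];
    long offset = 0;
    size_t n;

    rewind(f);
    while ((n = fread(page, 1, sizeof page, f)) > 0) {
        for (size_t i = 0; i < n; i++)
            if (page[i] != 0)
                return offset; /* start of the page holding the byte */
        offset += (long)n;
    }
    return -1;
}
```

  Opening f0 with fopen("f0", "rb") and passing it to this function should give -1 on a healthy system; a result such as 4096 (0x1000) would match the failure pattern described above.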

  If the problem does not appear to reproduce, please try waiting a bit
  and checking the f0 files with hexdump again. This behavior was
  observed by a customer reproducing the problem under ESXi. I have since
  added a sync after running the repro binary, which I think will
  fix that.

  If you still can't reproduce the problem please let me know if there's
  anything I can do to help. For example can we trace the disk accesses
  at the SCSI level to verify whether the appropriate SCSI commands are
  being sent? This may help determine whether the problem is in Linux or
  in VMWare.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1371591/+subscriptions