group.of.nepali.translators team mailing list archive
Message #32365
[Bug 1567557] Re: Performance degradation of "zfs clone"
** Changed in: zfs
Status: New => Fix Released
--
You received this bug notification because you are a member of नेपाली
भाषा समायोजकहरुको समूह, which is subscribed to Xenial.
Matching subscriptions: Ubuntu 16.04 Bugs
https://bugs.launchpad.net/bugs/1567557
Title:
Performance degradation of "zfs clone"
Status in Native ZFS for Linux:
Fix Released
Status in zfs-linux package in Ubuntu:
Fix Released
Status in zfs-linux source package in Xenial:
Fix Released
Status in zfs-linux source package in Zesty:
Fix Released
Status in zfs-linux source package in Artful:
Fix Released
Bug description:
[SRU Justification]
Creating thousands of clones can be prohibitively slow. The
underlying mechanism that gathers clone information uses a 16K buffer,
which limits performance. In addition, the initial approach is to pass
a zero-sized buffer to the underlying ioctl() to learn how large a
buffer is needed to fetch the information back to userspace.
If we bump the initial buffer to a larger size, we reduce the need
for two ioctl() calls, which improves performance.
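The probe-then-retry pattern described above can be illustrated with a small simulation. This is only a sketch and not ZFS code: `fake_ioctl`, `fetch_clone_info` and the 50-byte `ENTRY_BYTES` record size are hypothetical stand-ins, chosen so that roughly 5000 entries fit in 256K.

```python
# Simulation only: a stand-in for the kernel side of the ioctl, not real ZFS code.
ENTRY_BYTES = 50  # hypothetical size of one clone record (an assumption)


def fake_ioctl(num_clones, buf_size):
    """Return (ok, needed_bytes); ok is False when the buffer is too small."""
    needed = num_clones * ENTRY_BYTES
    return buf_size >= needed, needed


def fetch_clone_info(num_clones, initial_buf_size):
    """Return how many calls were needed to fetch the clone data."""
    calls = 1
    ok, needed = fake_ioctl(num_clones, initial_buf_size)
    if not ok:
        # The first call only reported the required size; retry with
        # a buffer that is large enough.
        calls += 1
        ok, needed = fake_ioctl(num_clones, needed)
    return calls


# Compare a 16K and a 256K initial buffer for 4000 clones.
for size in (16 * 1024, 256 * 1024):
    print(f"initial buffer {size} bytes: {fetch_clone_info(4000, size)} call(s)")
```

Under these assumptions, the smaller initial buffer costs a second call for 4000 clones, while the larger one is filled on the first call.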
[Fix]
Bump initial buffer size from 16K to 256K
[Regression Potential]
This is minimal, as the change is only a tweak to the initial buffer size. ZFS already handles larger sizes correctly, since they are normally used on the second ioctl() call once the first ioctl() call has established the required buffer size. A larger initial buffer simply removes the need for the initial size estimation in most cases where the number of clones is less than ~5000. There is a risk that a larger buffer could lead to an ENOMEM error when allocating the buffer, but the buffer size used is still trivial for modern large 64-bit servers running ZFS.
[Test case]
Create 4000 clones. With the fix this takes 35-40% less time than without it. See the example test.sh script for one way to create this many clones.
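The test.sh script itself is not reproduced in this message. As a rough sketch of what such a test could look like, the following builds one `zfs clone` command per clone. The dataset names `testpool/base@snap` and `testpool/clone-` are hypothetical, and the helpers `clone_commands` and `run_all` are illustrative rather than taken from the bug report.

```python
import subprocess


def clone_commands(snapshot, prefix, count):
    """Build one `zfs clone` command per clone, without running anything."""
    return [["zfs", "clone", snapshot, f"{prefix}{i}"] for i in range(count)]


def run_all(commands):
    """Run each command in turn; needs root and an existing snapshot."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Dry run: only show what would be executed.
# Hypothetical names; adjust to a scratch pool before calling run_all(cmds).
cmds = clone_commands("testpool/base@snap", "testpool/clone-", 4000)
print(f"{len(cmds)} commands, first: {' '.join(cmds[0])}")
```

Timing `run_all(cmds)` on a scratch pool before and after the fix would give the comparison the test case asks for.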
------------------------------------------
I've been running some scale tests for LXD and what I've noticed is
that "zfs clone" gets slower and slower as the zfs filesystem gets
busier.
It feels like "zfs clone" requires some kind of pool-wide lock and so
needs all other operations to complete before it can clone a new
filesystem.
A basic LXD scale test with btrfs vs zfs shows what I mean; see below
for the reports.
The test is run on a completely dedicated physical server with the
pool on a dedicated SSD. The exact same machine and SSD were used for
the btrfs test.
The zfs filesystem is configured with these settings:
- relatime=on
- sync=disabled
- xattr=sa
So it shouldn't be related to pending sync() calls...
The workload in this case is ultimately 1024 containers running busybox as their init system and udhcpc grabbing an IP.
The problem gets significantly worse if spawning busier containers, say a full Ubuntu system.
=== zfs ===
root@edfu:~# /home/ubuntu/lxd-benchmark spawn --count=1024 --image=images:alpine/edge/amd64 --privileged=true
Test environment:
Server backend: lxd
Server version: 2.0.0.rc8
Kernel: Linux
Kernel architecture: x86_64
Kernel version: 4.4.0-16-generic
Storage backend: zfs
Storage version: 5
Container backend: lxc
Container version: 2.0.0.rc15
Test variables:
Container count: 1024
Container mode: privileged
Image: images:alpine/edge/amd64
Batches: 128
Batch size: 8
Remainder: 0
[Apr 3 06:42:51.170] Importing image into local store: 64192037277800298d8c19473c055868e0288b039349b1c6579971fe99fdbac7
[Apr 3 06:42:52.657] Starting the test
[Apr 3 06:42:53.994] Started 8 containers in 1.336s
[Apr 3 06:42:55.521] Started 16 containers in 2.864s
[Apr 3 06:42:58.632] Started 32 containers in 5.975s
[Apr 3 06:43:05.399] Started 64 containers in 12.742s
[Apr 3 06:43:20.343] Started 128 containers in 27.686s
[Apr 3 06:43:57.269] Started 256 containers in 64.612s
[Apr 3 06:46:09.112] Started 512 containers in 196.455s
[Apr 3 06:58:19.309] Started 1024 containers in 926.652s
[Apr 3 06:58:19.309] Test completed in 926.652s
=== btrfs ===
Test environment:
Server backend: lxd
Server version: 2.0.0.rc8
Kernel: Linux
Kernel architecture: x86_64
Kernel version: 4.4.0-16-generic
Storage backend: btrfs
Storage version: 4.4
Container backend: lxc
Container version: 2.0.0.rc15
Test variables:
Container count: 1024
Container mode: privileged
Image: images:alpine/edge/amd64
Batches: 128
Batch size: 8
Remainder: 0
[Apr 3 07:42:12.053] Importing image into local store: 64192037277800298d8c19473c055868e0288b039349b1c6579971fe99fdbac7
[Apr 3 07:42:13.351] Starting the test
[Apr 3 07:42:14.793] Started 8 containers in 1.442s
[Apr 3 07:42:16.495] Started 16 containers in 3.144s
[Apr 3 07:42:19.881] Started 32 containers in 6.530s
[Apr 3 07:42:26.798] Started 64 containers in 13.447s
[Apr 3 07:42:42.048] Started 128 containers in 28.697s
[Apr 3 07:43:13.210] Started 256 containers in 59.859s
[Apr 3 07:44:26.238] Started 512 containers in 132.887s
[Apr 3 07:47:30.708] Started 1024 containers in 317.357s
[Apr 3 07:47:30.708] Test completed in 317.357s
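To make the two runs easier to compare, the milestone timings from the logs above can be put side by side. The sketch below only does arithmetic on the figures already shown; the helper names `slowdown` and `marginal` are illustrative.

```python
# (containers started, elapsed seconds) copied from the two logs above
ZFS = [(8, 1.336), (16, 2.864), (32, 5.975), (64, 12.742),
       (128, 27.686), (256, 64.612), (512, 196.455), (1024, 926.652)]
BTRFS = [(8, 1.442), (16, 3.144), (32, 6.530), (64, 13.447),
         (128, 28.697), (256, 59.859), (512, 132.887), (1024, 317.357)]


def slowdown(zfs, btrfs):
    """Ratio of zfs to btrfs elapsed time at each milestone."""
    return [(n, tz / tb) for (n, tz), (_, tb) in zip(zfs, btrfs)]


def marginal(series):
    """Average seconds per container within each interval between milestones."""
    out = []
    prev_n, prev_t = 0, 0.0
    for n, t in series:
        out.append((n, (t - prev_t) / (n - prev_n)))
        prev_n, prev_t = n, t
    return out


rows = zip(slowdown(ZFS, BTRFS), marginal(ZFS), marginal(BTRFS))
for (n, ratio), (_, mz), (_, mb) in rows:
    print(f"{n:5d}  zfs/btrfs={ratio:.2f}  zfs s/ctr={mz:.3f}  btrfs s/ctr={mb:.3f}")
```

In these logs zfs is slightly ahead up to 128 containers, falls behind from 256 onward, and at 1024 containers takes roughly 2.9 times as long as btrfs.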