yahoo-eng-team team mailing list archive
Message #21755
[Bug 1218372] Re: [libvirt] resize fails when using NFS shared storage
** No longer affects: nova/grizzly
--
You received this bug notification because you are a member of Yahoo!
Engineering Team, which is subscribed to OpenStack Compute (nova).
https://bugs.launchpad.net/bugs/1218372
Title:
[libvirt] resize fails when using NFS shared storage
Status in OpenStack Compute (Nova):
Triaged
Bug description:
The setup is two hosts installed using devstack in a multi-node
configuration, with the directory /opt/stack/data/nova/instances/
shared over NFS.
When performing a resize I get the following error (Complete traceback
in http://paste.openstack.org/show/45368/):
"qemu-img: Could not open
'/opt/stack/data/nova/instances/7dbeb7f2-39e2-4f1d-8228-0b7a84d27745/disk':
Permission denied\n"
This problem was introduced with patch
https://review.openstack.org/28424, which modified the behaviour of
migrate/resize when using shared storage. Before that, the disk was
moved to the new host using ssh even when using shared storage (which
could cause some data loss when an error happened). Now, with shared
storage, nova does not send the disk to the other host and only
assumes that it will be accessible from there. Since both hosts are
using the same storage, why should this be a problem?
After doing some research on how NFS handles its shares on the client
side, I realized that the NFS client keeps a cache of file names and
their inodes which, if no process asks for it earlier, is refreshed
at intervals of 3 to 60 seconds (see the nfs options
ac[reg|dir][min|max] in the nfs manpage). So, if a process tries to
access a file which has been renamed on the remote server, it will be
accessing the old version because the name is still pointing to the
old inode (the cache is not updated when accessing a file, only when
asking for the file attributes, e.g. ls -lh).
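As a rough illustration of the point above, the sketch below asks for
the attributes of a path and of its parent directory before opening
it, which is roughly what "ls -lh" does. The helper name
refresh_then_open is hypothetical and not part of nova. Whether these
calls actually make an NFS client revalidate its cache depends on the
client and its mount options, so this is an assumption and not a
confirmed fix.

```python
import os


def refresh_then_open(path, mode="rb"):
    """Ask for attributes (like ``ls -lh`` would) before opening ``path``.

    Hypothetical helper, not nova code. On NFS this is assumed to prompt
    the client to revalidate its cached name -> inode mapping; on a
    local filesystem it is just a few harmless extra calls.
    """
    parent = os.path.dirname(os.path.abspath(path))
    os.listdir(parent)   # re-read the directory entries
    os.stat(parent)      # attributes of the directory itself
    os.stat(path)        # attributes of the file about to be opened
    return open(path, mode)
```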
In the resize case, the origin compute node renamed the instance
directory to "$INSTANCE_DIR/<instance_uuid>_resize" (owned by root
after qemu stops) and created the new instance disk from it under the
new directory "$INSTANCE_DIR/<instance_uuid>".
From the destination host, even though we were trying to access the
new disk file at "$INSTANCE_DIR/<instance_uuid>/disk", we were still
holding the old inode for that path, which pointed to
"$INSTANCE_DIR/<instance_uuid>_resize/disk" (owned by root,
inaccessible, the wrong image, etc.).
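The rename sequence described above can be replayed on a local
directory to see why a stale name -> inode mapping ends up at the
"_resize" disk. This is only a sketch: the dict named cache stands in
for the NFS client cache, the directory name "instance" replaces the
real <instance_uuid>, and nothing here touches NFS or nova.

```python
import os
import tempfile


def simulate_stale_lookup(base):
    """Replay the resize rename inside the local directory ``base``.

    ``cache`` is a plain dict standing in for the NFS client's
    name -> inode cache (an illustration, not how the kernel stores it).
    Returns (cached_inode, resize_inode, current_inode).
    """
    inst = os.path.join(base, "instance")
    disk = os.path.join(inst, "disk")
    os.mkdir(inst)
    with open(disk, "w") as f:
        f.write("old image")

    # The "destination host" looks the path up once and remembers the inode.
    cache = {disk: os.stat(disk).st_ino}

    # The "origin host" renames the directory and creates a new disk.
    os.rename(inst, inst + "_resize")
    os.mkdir(inst)
    with open(disk, "w") as f:
        f.write("new image")

    resize_inode = os.stat(os.path.join(inst + "_resize", "disk")).st_ino
    current_inode = os.stat(disk).st_ino
    return cache[disk], resize_inode, current_inode


if __name__ == "__main__":
    cached, resized, current = simulate_stale_lookup(tempfile.mkdtemp())
    print("cached inode is the _resize disk:", cached == resized)
    print("cached inode is the new disk:", cached == current)
```

The cached inode still identifies the renamed file, while the same
path now resolves to a different inode, which matches the mismatch
seen from the destination host.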
One workaround is to mount the NFS share with the option "noac", which
(from the manpage) "forces application writes to become synchronous so
that local changes to a file become visible on the server
immediately". This prevents the files from getting out of sync, but it
comes with the drawback of issuing a network call for every file
operation, which may cause performance issues.
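To check which NFS shares on a host are mounted without "noac",
something like the following could be used. It is a minimal sketch:
the function name is hypothetical, and it assumes the input follows
the /proc/mounts layout (device, mount point, filesystem type,
options, dump, pass) and that "noac" shows up in the options field
when it is set.

```python
def nfs_mounts_without_noac(mounts_text):
    """Return mount points of NFS shares whose options lack ``noac``.

    ``mounts_text`` is expected in /proc/mounts layout:
    device mountpoint fstype options dump pass
    """
    missing = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mountpoint, fstype, options = fields[1], fields[2], fields[3]
        if fstype in ("nfs", "nfs4") and "noac" not in options.split(","):
            missing.append(mountpoint)
    return missing
```

On a Linux host the input could be read with
open("/proc/mounts").read().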
To manage notifications about this bug go to:
https://bugs.launchpad.net/nova/+bug/1218372/+subscriptions