openstack team mailing list archive
Message #14287
Re: Libvirt LXC with volume-attach broken ?
Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
> "Daniel P. Berrange" <berrange@xxxxxxxxxx> writes:
>
> > On Thu, Jul 05, 2012 at 06:49:06PM -0700, Eric W. Biederman wrote:
> >> Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx> writes:
> >>
> >> > Quoting Daniel P. Berrange (berrange@xxxxxxxxxx):
> >> >> On Thu, Jul 05, 2012 at 03:00:26PM +0100, Daniel P. Berrange wrote:
> >> >> > Now, when using 'nova volume-attach':
> >> >> >
> >> >> > # nova volume-attach 05eb16df-03b8-451b-85c1-b838a8757736 a5ad1d37-aed0-4bf6-8c6e-c28543cd38ac /dev/sdf
> >> >> >
> >> >> > nova will import an iSCSI LUN from the nova volume service, on the compute
> >> >> > node. The kernel will assign it the next free SCSI drive letter, in my
> >> >> > case '/dev/sdc'.
> >> >> >
> >> >> > The libvirt nova driver will then do a mknod, using the volume name
> >> >> > passed to 'nova volume-attach'.
> >> >> > eg it will do
> >> >> >
> >> >> > mknod /var/lib/nova/instances/instance-0000000e/rootfs/dev/sdf
> >> >>
> >> >> Oops, I'm slightly wrong here. What it actually does is
> >> >>
> >> >> mount --bind /dev/sdc /var/lib/nova/instances/instance-0000000e/rootfs/dev/sdf
> >> >>
> >> >> so you get a 'sdf' device, but with the major/minor number of the 'sdc'
> >> >> device. I can't say I particularly like this approach. Ultimately I
> >> >> think we need the kernel support to make this work correctly. In any
> >> >
> >> > Yes, that's what the 'devices namespace' is meant to address. I'm hoping
> >> > we can have some serious design discussion on that in the next few months.
> >>
> >> This is not the device namespace problem.
> >>
> >> This is the setns problem for mount namespaces, and the unprivileged
> >> mount problem.
> >>
> >> There may be a notification issue, so that user space can perform actions
> >> in a container when a device shows up.
> >>
> >> But it should be very possible, on the host, to call:
> >> setns(containers_mount_namespace);
> >> mknod("/dev/foo");
> >> chown("/dev/foo", CONTAINER_ROOT_UID, CONTAINER_ROOT_GID);
> >>
> >> And then from inside the container, especially when I get the rest of
> >> the user namespace merged, it should be very possible to manipulate
> >> the block device because you have permission, and to mount the
> >> partitions of the block device because you are root in your container.
> >>
> >> But until the user namespace is merged, you really are root, so you can
> >> mount whatever.
> >>
> >> Daniel does that sound like the support you are looking for?
> >
> > Yes, the setns(mnt) approach you describe above is exactly what I'd
> > like to be able to do, to solve the first half of the problem.
> >
> > The other part of the problem is that I have a /dev/sdf, or even a
> > /dev/volgroup00/logvol3 in the host (with whatever major:minor
> > number that implies), and I want to be able to make it always
> > appear as /dev/sda in the container (with the correspondingly
> > different major:minor number). I'm guessing this is what Serge
> > was referring to as the 'device' namespace problem.
Right.
> Getting the device to always appear with the name /dev/sda is easy.
It's easy to log in and make it look that way. It's not easy to
make all distros see it that way across boot.
> Where does the need to have a specific device come from? I would have
> thought by now that hotplug had been around long enough that in general
> user space would not care.
Yes, the *primary* need for the devices namespace is to prevent a udev
storm in the host, to send uevents to the right place, and to handle
macvtap and loop devices.
> The only case that I know of where keeping the same device number seems
> reasonable is in the case of live migration of an application, in order to
> avoid issues with stat changing for the same file over the transition,
> and I think a synthesized hotplug event could probably handle that case.
>
> Is there another case, besides buggy applications that have hard-coded
> device numbers, that needs specific device numbers?
Other cases where specific device major:minor numbers are important
are things like makedev. There is a lot of software, especially
automatic update software, which insists that things have specific
'correct' major:minor numbers.
FWIW my (presumably naive) view is that for each non-init devicens
we'd have a list of
type-major:minor::type2-major2:minor2
(:: meaning maps-to). Then if a uevent comes through not aimed at
any type2-major2:minor2 valid in the namespace, that ns doesn't get
the uevent.
-serge