
openstack team mailing list archive

Re: [libvirt] [RFC PATCH] lxc: don't return error on GetInfo when cgroups not yet set up

 

Quoting Daniel P. Berrange (berrange@xxxxxxxxxx):
> On Thu, Sep 29, 2011 at 10:12:17PM -0500, Serge E. Hallyn wrote:
> > Quoting Daniel P. Berrange (berrange@xxxxxxxxxx):
> > > On Wed, Sep 28, 2011 at 02:14:52PM -0500, Serge E. Hallyn wrote:
> > > > Nova (openstack) calls libvirt to create a container, then
> > > > periodically checks using GetInfo to see whether the container
> > > > is up.  If it does this too quickly, then libvirt returns an
> > > > error, which in libvirt.py causes an exception to be raised,
> > > > the same type as if the container was bad.
> > > lxcDomainGetInfo() holds a mutex on 'dom' for the duration of
> > > its execution. It checks for virDomainObjIsActive() before
> > > trying to use the cgroups.
> > 
> > Yes, it does, but
> > 
> > > lxcDomainStart() holds the mutex on 'dom' for the duration of
> > > its execution, and does not return until the container is running
> > > and cgroups are present.
> > 
> > No.  It calls the lxc_controller with --background.  The controller's
> > main task in turn exits before the cgroups have been set up.  That
> > is the race.
> 
> The lxcDomainStart() method isn't actually waiting on the child
> pid directly, so the --background flag ought not to matter. We
> have a pipe that we pass into the controller, which we wait on
> for a notification after running the process. The controller
> does not notify the 'handshake' FD until after cgroups have
> been set up, unless I'm misinterpreting our code.

That's the call to lxcContainerWaitForContinue(), right?  If so, that's
done by lxcContainerChild(), which is called by the lxc_controller.
AFAICS there is nothing in the lxc_driver which will wait on that
before dropping the driver->lock mutex.
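
To make the two readings concrete, here is a minimal Python sketch of the
handshake pattern under discussion.  It is not libvirt code: controller(),
start_domain(), get_info() and the state dict are hypothetical stand-ins,
and a thread plays the part of the controller process.  It only shows that
if the starter drops the lock without first reading the handshake pipe, an
immediate GetInfo-style poll can see an active domain with no cgroups yet.

```python
import os
import threading
import time


def controller(handshake_w, state):
    """Hypothetical stand-in for the controller process."""
    time.sleep(0.2)                    # simulated cgroup setup delay
    state["cgroups_ready"] = True
    os.write(handshake_w, b"1")        # notify only after setup is done
    os.close(handshake_w)


def start_domain(lock, state, wait_for_handshake):
    """Hypothetical start path: spawn the controller while holding the lock."""
    r, w = os.pipe()
    with lock:
        t = threading.Thread(target=controller, args=(w, state))
        t.start()
        state["active"] = True
        if wait_for_handshake:
            os.read(r, 1)              # block until the controller notifies
    return t, r


def get_info(lock, state):
    """Hypothetical info path: takes the same lock, then needs cgroups."""
    with lock:
        if not state["active"]:
            return "inactive"
        if not state["cgroups_ready"]:
            return "error: no cgroups"
        return "ok"


def demo(wait_for_handshake):
    lock = threading.Lock()
    state = {"active": False, "cgroups_ready": False}
    t, r = start_domain(lock, state, wait_for_handshake)
    first = get_info(lock, state)      # poll immediately after start returns
    t.join()
    os.close(r)
    return first, get_info(lock, state)
```

Calling demo(True) models the behaviour Daniel describes, where the start
path blocks on the handshake before releasing the lock.  demo(False) models
what I believe the driver does today, and the first poll is expected to hit
the error path.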

-serge

