← Back to team overview

launchpad-dev team mailing list archive

Re: Proof-of-concept 'forking' lp-serve

 

On Mon, 16 Aug 2010 19:40:43 -0500, John Arbash Meinel <john@xxxxxxxxxxxxxxxxx> wrote:
> >> for numbers:
> >>
> >>  900ms 'time (echo hello | bzr serve --inet)'
> >> 2400ms 'time (echo hello | bzr lp-serve --inet)'
> >>  ~80ms 'time (overly complex stuff to spawn and do the same work)'
> >>
> >> The last is a bit of an approximation, because my test front end hack
> >> uses select.poll() over file descriptors, and doesn't seem to be getting
> >> POLLHUP on any of the fifos.
> > 
> > I don't entirely understand the consequence of this :-)
> 
> So the big takeaway is 2.4s => 0.08s. 

I understood that bit! :-)

> There is some concern that we won't easily detect when the smart
> server is finished going this route.  As I at least wasn't seeing it
> disconnect from the fifo (echo foo > script, sends a POLLHUP event to
> select when the input is finished, I wasn't seeing that when lp-serve
> on the other side of a fifo finished.)

Ah, I see.

> >> 1) Plain socket.socket() service versus a 'twisted' implementation. I'm
> >>    currently just creating a simple TCP/IP socket, and listening on it
> >>    for a request to spawn a child (or quit), then forking, creating
> >>    fifos on disk, reporting the path to the requester. The forked child
> >>    then hangs on those handles, and remaps stdin/out/err and running the
> >>    lp-serve code.
> >>
> >>    a) Eventually I would hook up the existing
> >>       lib/lp/codehosting/sshserver code to make the request to the
> >>       daemon, and hook up a fake ProcessTransport(?) instead of the
> >>       self._transport = self.reactor.spawnProcess()
> > 
> > I'm not sure I entirely understand what you're planning here -- it seems
> > from your branch that you could just launch the connect_to_lpservice
> > process and wire up its real ProcessTransport process.
> > 
> > This still leaves us paying Python startup overhead, but that's not so
> > bad if you don't import very much.  Then later we can move to doing what
> > connect_to_lpservice does inside the ssh server, if we find that it
> > would be worth it.
> > 
> > I don't understand why connect_to_lpservice is in bzrplugins -- it isn't
> > a plugin!
> 
> Because it was a quick hack to sit next to lpserve.py I wasn't intending
> for it to stay live. I was intending to pull that code into session.py

Ah ok.  Then you'll be wanting to rewrite it to be twisty, I guess.
Shouldn't be too hard.

> Also, as mentioned above, I'm not seeing the service stop inside of
> connect_to_lpservice, which means that the 'connect' script doesn't stop
> itself, which means the bzr+ssh may hang as well (I haven't tried
> anything at that level).

OK.  Nothing helpful leaps to mind for me here, I'm afraid.

> >>    b) Current service is single threaded, though I thought it would
> >>       probably be enough because it only blocks long enough to fork().
> > 
> > The idea of multiple threads forking somehow scares me rather...
> > 
> >>    c) If most of the launchpad internals are based around twisted
> >>       services, it probably would be best for consistency to also have a
> >>       twisted service.  I personally know very very little about
> >>       twisted.
> > 
> > I think it would be required to rewrite connect_to_lpservice using
> > twisted apis should that move into the ssh server process.  I don't
> > think it would make sense to rewrite the forking process using
> > twisted -- I don't think forking once the reactor is running (and not
> > promptly calling exec()) is really supported with Twisted.
> > 
> >> 2) Robert mentioned in the past 'why not just have sshserver fork
> >>    itself'. This would be possible, though I'm already seeing some
> >>    issues wrt isolation. (logging and bzrlib.ui.ui_factory have some
> >>    state wrt sys.std* that I have to manually poke at.)
> > 
> > See above.
> > 
> >> 3) Prefork. I think the existing code would lend itself pretty easily to
> >>    preforking. The forking code would probably be reworked a \
> >>    little, and timing would need to be sorted out. (At the moment the
> >>    spawned child doesn't return the path to the caller until the files
> >>    have been created, but w/ prefork we would need a way to inform the
> >>    daemon that the files exist now, though the daemon itself could
> >>    create them and inform the child what path to use.)
> > 
> > Doesn't seem worth it for this first cut?
> 
> On my host with no load, the time to fork seems to be in the tens of
> milliseconds (~30ms). It doesn't seem necessary yet to prefork. Though
> it is something to keep in mind.

OK.

> If we *really* wanted to keep exec() level isolation, then prefork would
> be necessary. My biggest concern is that it doesn't really reduce the
> amount of startup *work* being done, so if the service is busy, you'll
> still be slow.

Right.

> >> 4) Launchpad code review, testing, landing, rollout, etc. I really don't
> >>    have much knowledge beyond "submit a merge proposal". I've read
> >>    dev.launchpad.net/Getting /Help and /Build. And certainly have things
> >>    building locally. /Help basically says to ask here :)
> > 
> > Well, um.  I can see some trivial style problems in the code, but mostly
> > it looks fine.  Tests are certainly needed, as is some actually hooking
> > into the ssh server.
> > 
> >>    I guess I'm basically asking for some mentoring, and not really sure
> >>    who to talk to about it. (stuff like ec2-land and running the lp test
> >>    suite still scare me a bit)
> > 
> > Hope this helped a bit.
> > 
> > Cheers,
> > mwh
> > 
> 
> It was nice to get some feedback from you.

I'm glad to hear it.

Cheers,
mwh



Follow ups

References