launchpad-dev team mailing list archive

Thread
Date

Re: Proof-of-concept 'forking' lp-serve

To: John Arbash Meinel <john@xxxxxxxxxxxxxxxxx>, Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
From: Michael Hudson <michael.hudson@xxxxxxxxxxxxx>
Date: Tue, 17 Aug 2010 11:44:52 +1200
In-reply-to: <4C699912.4050909@arbash-meinel.com>
User-agent: Notmuch/0.3.1 (http://notmuchmail.org) Emacs/23.1.50.1 (x86_64-pc-linux-gnu)

On Mon, 16 Aug 2010 15:01:22 -0500, John Arbash Meinel <john@xxxxxxxxxxxxxxxxx> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> It has been on the radar for a while that the time to connect to the
> codehosting service is a bit long. Specifically, the time it takes to do
> a full ssh handshake, and then get a live bzr smart server to talk to
> can take as long as 9 seconds, though seems to average about 3-4s.
> 
> https://lpstats.canonical.com/graphs/CodehostingPerformance/ [1]

Indeed; thanks for working on this!

> The idea is that essentially, we are paying the python startup overhead
> every time we get a connection, while we could instead preload all of
> that into a process, and just fork() it when a request comes in. (And
> arguably we could eventually prefork, and just use this one we already
> have over here.)
> 
> My basic proof of concept seems to be working, but I'm a bit stalled as
> to how to get from PoC to having it landed and running on
> Codebrowse. As

FWIW, in Launchpad jargon, codebrowse == codebounce == the loggerhead
side.  We generally refer to both the ssh service and the more general
area as 'codehosting'.

> for numbers:
> 
>  900ms 'time (echo hello | bzr serve --inet)'
> 2400ms 'time (echo hello | bzr lp-serve --inet)'
>  ~80ms 'time (overly complex stuff to spawn and do the same work)'
> 
> The last is a bit of an approximation, because my test front end hack
> uses select.poll() over file descriptors, and doesn't seem to be getting
> POLLHUP on any of the fifos.

I don't entirely understand the consequence of this :-)

> Anyway, it certainly looks like the concept has the potential to a
> couple of seconds from the connection time.
> 
> The code is available here:
>   lp:~jameinel/launchpad/lp-service
> 
> There isn't any unit testing yet, but part of that was because I wasn't
> 100% sure of the specific design. So I'd like to solicit some feedback.
> I'll try to bring up my specific doubts, but any feedback on the work is
> welcome.
> 
> 1) Plain socket.socket() service versus a 'twisted' implementation. I'm
>    currently just creating a simple TCP/IP socket, and listening on it
>    for a request to spawn a child (or quit), then forking, creating
>    fifos on disk, reporting the path to the requester. The forked child
>    then hangs on those handles, and remaps stdin/out/err and running the
>    lp-serve code.
> 
>    a) Eventually I would hook up the existing
>       lib/lp/codehosting/sshserver code to make the request to the
>       daemon, and hook up a fake ProcessTransport(?) instead of the
>       self._transport = self.reactor.spawnProcess()

I'm not sure I entirely understand what you're planning here -- it seems
from your branch that you could just launch the connect_to_lpservice
process and wire up its real ProcessTransport process.

This still leaves us paying Python startup overhead, but that's not so
bad if you don't import very much.  Then later we can move to doing what
connect_to_lpservice does inside the ssh server, if we find that it
would be worth it.

I don't understand why connect_to_lpservice is in bzrplugins -- it isn't
a plugin!

>    b) Current service is single threaded, though I thought it would
>       probably be enough because it only blocks long enough to fork().

The idea of multiple threads forking somehow scares me rather...

>    c) If most of the launchpad internals are based around twisted
>       services, it probably would be best for consistency to also have a
>       twisted service.  I personally know very very little about
>       twisted.

I think it would be required to rewrite connect_to_lpservice using
twisted apis should that move into the ssh server process.  I don't
think it would make sense to rewrite the forking process using
twisted -- I don't think forking once the reactor is running (and not
promptly calling exec()) is really supported with Twisted.

> 2) Robert mentioned in the past 'why not just have sshserver fork
>    itself'. This would be possible, though I'm already seeing some
>    issues wrt isolation. (logging and bzrlib.ui.ui_factory have some
>    state wrt sys.std* that I have to manually poke at.)

See above.

> 3) Prefork. I think the existing code would lend itself pretty easily to
>    preforking. The forking code would probably be reworked a \
>    little, and timing would need to be sorted out. (At the moment the
>    spawned child doesn't return the path to the caller until the files
>    have been created, but w/ prefork we would need a way to inform the
>    daemon that the files exist now, though the daemon itself could
>    create them and inform the child what path to use.)

Doesn't seem worth it for this first cut?

> 4) Launchpad code review, testing, landing, rollout, etc. I really don't
>    have much knowledge beyond "submit a merge proposal". I've read
>    dev.launchpad.net/Getting /Help and /Build. And certainly have things
>    building locally. /Help basically says to ask here :)

Well, um.  I can see some trivial style problems in the code, but mostly
it looks fine.  Tests are certainly needed, as is some actually hooking
into the ssh server.

>    I guess I'm basically asking for some mentoring, and not really sure
>    who to talk to about it. (stuff like ec2-land and running the lp test
>    suite still scare me a bit)

Hope this helped a bit.

Cheers,
mwh

Follow ups

Re: Proof-of-concept 'forking' lp-serve
From: John Arbash Meinel, 2010-08-17

References

Proof-of-concept 'forking' lp-serve
From: John Arbash Meinel, 2010-08-16