launchpad-dev team mailing list archive

Proof-of-concept 'forking' lp-serve

To: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
From: John Arbash Meinel <john@xxxxxxxxxxxxxxxxx>
Date: Mon, 16 Aug 2010 15:01:22 -0500
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

It has been on the radar for a while that the time to connect to the
codehosting service is a bit long. Specifically, doing a full ssh
handshake and then getting a live bzr smart server to talk to can take
as long as 9 seconds, though it seems to average about 3-4s.

https://lpstats.canonical.com/graphs/CodehostingPerformance/ [1]

The idea is that essentially, we are paying the python startup overhead
every time we get a connection, while we could instead preload all of
that into a process, and just fork() it when a request comes in. (And
arguably we could eventually prefork, and just use this one we already
have over here.)
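
As a rough sketch of the preload-then-fork idea (this is not the code in
the branch): the parent pays the import cost once, and each request only
costs a fork(). The names preload() and serve_one() are hypothetical, and
the json import merely stands in for bzrlib and the plugins a real server
would load.

```python
import os


def preload():
    """Stand-in for the expensive imports a real server would do once."""
    import json  # hypothetical placeholder for bzrlib and plugin imports
    return json


def serve_one(modules, payload):
    """Fork a child that reuses the preloaded modules.

    Returns (child output as text, child exit code).
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the modules are already in memory, so no import cost here.
        os.close(read_fd)
        code = 1
        try:
            os.write(write_fd, modules.dumps(payload).encode("utf-8"))
            code = 0
        finally:
            os._exit(code)  # leave without running the parent's cleanup
    # Parent: collect the child's answer, then reap it.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        data = reader.read()
    _, status = os.waitpid(pid, 0)
    return data.decode("utf-8"), os.WEXITSTATUS(status)
```

The child leaves through os._exit() so that it never runs exit handlers
inherited from the long-lived parent.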

My basic proof of concept seems to be working, but I'm a bit stalled as
to how to get from PoC to having it landed and running on Codebrowse. As
for numbers:

 900ms 'time (echo hello | bzr serve --inet)'
2400ms 'time (echo hello | bzr lp-serve --inet)'
 ~80ms 'time (overly complex stuff to spawn and do the same work)'

The last is a bit of an approximation, because my test front end hack
uses select.poll() over file descriptors, and doesn't seem to be getting
POLLHUP on any of the fifos.
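
For reference, this is roughly what I would expect a poll loop to look
like. It is a minimal sketch, assuming a platform that has select.poll(),
and it uses an anonymous pipe rather than a named fifo. The helper name
drain_until_hangup() is hypothetical. It treats an empty read as
end-of-file too, so it does not depend on POLLHUP alone, since how hangup
is reported varies between platforms.

```python
import os
import select


def drain_until_hangup(fd, timeout_ms=1000):
    """Read from fd until the writer closes it; return the bytes seen."""
    poller = select.poll()
    poller.register(fd, select.POLLIN | select.POLLHUP)
    chunks = []
    while True:
        events = poller.poll(timeout_ms)
        if not events:
            raise TimeoutError("no POLLIN or POLLHUP within the timeout")
        for _, mask in events:
            if mask & select.POLLIN:
                data = os.read(fd, 4096)
                if data:
                    chunks.append(data)
                    continue  # go back and poll again for more
                # An empty read means the writer is gone (EOF).
                return b"".join(chunks)
            if mask & (select.POLLHUP | select.POLLERR):
                # Writer closed and nothing is left to read.
                return b"".join(chunks)
```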

Anyway, it certainly looks like the concept has the potential to shave a
couple of seconds off the connection time.

The code is available here:
  lp:~jameinel/launchpad/lp-service

There isn't any unit testing yet, but part of that was because I wasn't
100% sure of the specific design. So I'd like to solicit some feedback.
I'll try to bring up my specific doubts, but any feedback on the work is
welcome.

1) Plain socket.socket() service versus a 'twisted' implementation. I'm
   currently just creating a simple TCP/IP socket, and listening on it
   for a request to spawn a child (or quit), then forking, creating
   fifos on disk, and reporting the path to the requester. The forked
   child then hangs on those handles, remaps stdin/out/err, and runs the
   lp-serve code.

   a) Eventually I would hook up the existing
      lib/lp/codehosting/sshserver code to make the request to the
      daemon, and hook up a fake ProcessTransport(?) instead of the
      self._transport = self.reactor.spawnProcess()

   b) Current service is single threaded, though I thought it would
      probably be enough because it only blocks long enough to fork().

   c) If most of the launchpad internals are based around twisted
      services, it probably would be best for consistency to also have a
      twisted service.  I personally know very very little about
      twisted.

2) Robert mentioned in the past 'why not just have sshserver fork
   itself'. This would be possible, though I'm already seeing some
   issues wrt isolation. (logging and bzrlib.ui.ui_factory have some
   state wrt sys.std* that I have to manually poke at.)

3) Prefork. I think the existing code would lend itself pretty easily to
   preforking. The forking code would probably be reworked a
   little, and timing would need to be sorted out. (At the moment the
   spawned child doesn't return the path to the caller until the files
   have been created, but w/ prefork we would need a way to inform the
   daemon that the files exist now, though the daemon itself could
   create them and inform the child what path to use.)

4) Launchpad code review, testing, landing, rollout, etc. I really don't
   have much knowledge beyond "submit a merge proposal". I've read
   dev.launchpad.net/Getting, /Help and /Build, and certainly have
   things building locally. /Help basically says to ask here :)

   I guess I'm basically asking for some mentoring, and not really sure
   who to talk to about it. (stuff like ec2-land and running the lp test
   suite still scare me a bit)
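
To make point 1 concrete, the fork-and-fifo handoff can be sketched as
below. This is a minimal illustration under assumptions, not the code in
the branch: spawn_child(), echo_upper() and the temporary directory prefix
are hypothetical names, the socket listener is left out, and echo_upper()
merely stands in for the lp-serve entry point.

```python
import os
import tempfile


def spawn_child(work):
    """Fork a child wired to fresh fifos; return (pid, fifo directory)."""
    base = tempfile.mkdtemp(prefix="lp-service-child-")  # hypothetical name
    for name in ("stdin", "stdout", "stderr"):
        os.mkfifo(os.path.join(base, name))
    pid = os.fork()
    if pid != 0:
        # Parent: this is the path that gets reported to the requester.
        return pid, base
    # Child: each open() blocks until the requester opens the other end.
    code = 1
    try:
        in_fd = os.open(os.path.join(base, "stdin"), os.O_RDONLY)
        out_fd = os.open(os.path.join(base, "stdout"), os.O_WRONLY)
        err_fd = os.open(os.path.join(base, "stderr"), os.O_WRONLY)
        # Remap the standard descriptors onto the fifos.
        os.dup2(in_fd, 0)
        os.dup2(out_fd, 1)
        os.dup2(err_fd, 2)
        for fd in (in_fd, out_fd, err_fd):
            if fd > 2:
                os.close(fd)
        code = work()
    finally:
        os._exit(code)  # never return into the daemon's main loop


def echo_upper():
    """Stand-in for the served code: uppercase stdin onto stdout."""
    data = os.read(0, 4096)
    os.write(1, data.upper())
    return 0
```

The requester has to open the fifos in the same order as the child
(stdin, then stdout, then stderr), otherwise both sides block in open().
The sketch also leaves the fifo directory behind; a real service would
need to clean it up once the child exits.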

Thanks for whatever help you can provide,
John
=:->


[1]: That graph is for a local network connection. The primary overhead
seems to be the time for 'bzr lp-serve' to start up a new bzr instance
and load all the related libraries.

Follow ups

Re: Proof-of-concept 'forking' lp-serve
From: Michael Hudson, 2010-08-16