launchpad-dev team mailing list archive

Re: stories for external test helpers (librarian, memcache, etc)

 

On Mon, 27 Sep 2010 10:41:51 +1200, Robert Collins <robertc@xxxxxxxxxxxxxxxxx> wrote:
> I've hit my tolerance level for stale librarian processes and am
> looking to address this in the test environment. I want to make sure I
> preserve existing use cases - please tell me if I have missed any.

I freely admit that for whatever reason I'm not thinking very clearly
today, so I may be being dense and not understanding things.  But I hope
I'm making some kind of sense at least.  I also started to reply before
reading the whole mail, sorry about that...

> 1) Run an external process for the life of *a* test

I think that there are two subcases here: one for a server-type
process, where you start the process, get it to do some things and then
shut it down (the librarian would be a prototypical example, I guess),
and one where you want to invoke a script to do something (e.g. the
script that performs a code import).

I also think there are two cases when it comes to intent: one where the
external process is part of the test fixture and one where it's the
system under test.

For all that, I don't think the implementation needs to take this into
account very much -- I think the problems each case faces are about the
same.

> * create a working config for it
> * start it

It's a trivial point, but one that AIUI requires some engineering time:
we need to start it, *using the previously created config*.  I guess we
can do this using the LPCONFIG variable -- maybe we establish a
convention that if LPCONFIG starts with a / it is interpreted as an
absolute path to a directory containing config files.  We also need to
make sure that lazr.config can serialize itself out to a config file
that it can read in again.

These may both already be implemented, for all I know.  But I don't
think so, and I'm pretty sure that they're not routinely used.

Using a hidden parameter like an environment variable is somewhat gross,
I guess, but I can't think of a better way.  We *could* be super regular
about having each of our scripts take a --config option, and similarly
use a super regular API in all our tests to invoke each subprocess with
the appropriate value for this option, but somehow that doesn't seem
likely to come to pass.
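
To make that convention concrete, I imagine the resolution looking
something like this (find_config_dir and the 'configs' root are names
I've just made up for illustration, and the 'development' default is
only my guess at what we effectively do today):

import os

def find_config_dir(lpconfig=None):
    """Resolve an LPCONFIG value to a directory of config files.

    If the value starts with a /, treat it as an absolute path to a
    directory containing config files; otherwise treat it as the name
    of a config under the tree's configs/ directory.
    """
    value = lpconfig or os.environ.get('LPCONFIG', 'development')
    if value.startswith('/'):
        return value
    return os.path.join('configs', value)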

> * run test
> * kill it
> * clean up any overhead left in the work environment [that we care about]
> 
> 2) For many tests
> * create working config
> * start it
> -run tests-
>  * check it's still ok and do per-test isolation
>  * run a test
> * kill it
> * clean up

In some sense, this doesn't seem interestingly different from the
problem above; you just start and stop the process in different places.

Again, there's the issue of propagating 'which config to use' around.

> 3) For 'make run'
> * Use a small commandline tool that
>  * creates working config
>  * starts it
>  * waits for SIGINT
>  * stops it
>  * cleans up
> 
> 4) for concurrent testing
> * As for many tests, but the creation of a working config needs to be
> safe in the presence of concurrent activity.
> * The created config needs to be sharable with *other* external
> processes (e.g. the buildmaster may want to talk to the librarian)

Putting this bullet here doesn't really make sense to me, although I
agree it's necessary.  We already have many tests that invoke scripts
that need to talk to the librarian, for example.

> 5) For low-overhead iteration
> * Find an existing external process
> * Must 'know' its config a-priori
> -run tests-
>  * check the process is running, do per-test isolation
>  * run a test

Hm.  This feels a bit tricky, but I guess you don't need this at the
same time as concurrent testing...

> 6) Start a particular server in production
>  * I think we should probably -not- have this as a use case: server
> management, rotation, graceful setup and tear down are much more
> complex than in a testing environment. Instead we may need some
> supporting logic around this, in the server bring up/tear down code,
> but at least for now that should be considered a separate problem.

Sounds sensible to me.

> If the above set is complete, then I am proposing to combine things in
> the following way.
> Firstly, because it's a good building block, the make run use case.
> Note that the current code is duplicative/overlapping with the test
> helper code - I'm proposing to consolidate this. This shouldn't look
> too different to our runlaunchpad Service today, except that we'll
> have more entry points (to do cleanup etc).
>  - the small helper will do the following for a given service:
>    start up the instance
>    optionally print a bash snippet with variables (like ssh-agent
> does), including the helpers pid
>      - this is useful for running up isolated copies
>  - main process runs

I think I agree that the code we use in the 'make run' case is nicer and
has a nicer API than the code we use in the tests, but I'm not sure the
'make run' use case is very much like the concurrent testing case --
partly because of state that's embedded in things like the apache
config, and partly because the way you interact with things in the
'make run' case doesn't give you a convenient parent-process/child-process
relationship for knowing which running instance you should be talking
to.

> This lets us capture useful things from starting it off without
> needing a well known location a-priori.

To state my above point again: in this case, sure you can start it off
without needing a well-known location, but how do you find it again?

> We can even 'run' postgresql in this fashion, and have it return the
> DB name to use.

Oh, I see, maybe you're thinking of this more at the level of how the
'make run' code builds up the various interacting processes it needs to
start?

> Now, the cheap test iteration case can be addressed:
>  - devs run eval `bin/run -i test --daemonise`
>    - this outputs all the variables for all the servers started.
>  - test code looks for a *per-service* information about pid files etc.
>    e.g. LP_LIBRARIAN_PIDFILE and LP_LIBRARIAN_CONFIGFILE rather than
> LP_PERSISTENT_TEST_SERVICES
>  - to kill, eval `bin/test-servers --stop`
>    (Which will kill the daemonised wrapper, and unset the environment
> variables).
>  - If LP_PERSISTENT_TEST_SERVICES is set and a service isn't running,
> I propose to error, because I think it usefully indicates a bug in
> that external process, and this is easier than detecting both 'not
> started yet' and 'started but crashed' - especially given the test
> runner's tendency to fork sub runners.
> 
> 
> Concurrent testing then is easy: as long as all the fixtures are
> meeting this contract, if the *default* behaviour is to  bring up a
> unique instance, everything will come up fine.

Well, "easy": I think we'll find we have a lot of absolute paths in our
testrunner config currently, and these will need to be shaken out before
we can reliably run tests in parallel on the same machine.  I think that
if we can solve the external process problem, the other problems will be
shallow even if they might be numerous, but maybe you are focusing a bit
too much on the process case here.
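
Just to check I've understood the contract you're proposing for the
per-service variables, I read it as roughly the following (the function
name and the details are my guesses, not anything we have today):

import errno
import os

def find_existing_service(label):
    """Find a running instance of the service named by `label`.

    Returns (pid, configfile) if one is registered in the environment,
    or None if we should bring up our own instance.
    """
    pidfile = os.environ.get('LP_%s_PIDFILE' % label)
    configfile = os.environ.get('LP_%s_CONFIGFILE' % label)
    if pidfile is None or configfile is None:
        return None
    pid = int(open(pidfile).read().strip())
    try:
        os.kill(pid, 0)  # Signal 0 just checks that the process exists.
    except OSError as e:
        if e.errno != errno.ESRCH:
            raise
        if 'LP_PERSISTENT_TEST_SERVICES' in os.environ:
            # Per your proposal: a registered-but-dead service indicates
            # a bug in that service, so fail loudly rather than restart.
            raise RuntimeError(
                '%s is registered in the environment but not running'
                % label)
        return None
    return pid, configfile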

To try to be super clear, here's an example/use case: the acceptance
tests for the scanner currently create branches in a known location
(/var/tmp/bazaar.launchpad.net/mirrors, I think), invoke the scanner
script and check that the database has been updated.  In the new world,
I guess they would create some branches in a temporary directory, use
some API to point the supermirror.hosted_branches_by_id (iirc) config
key at this temporary directory, and then invoke the scanner script and
check the database as before.

This mythical API would have to work in a way that crosses process
boundaries, possibly by creating yet another temporary directory,
creating a file in it called launchpad.conf that looks like this:

extends: $previous-value-of-LPCONFIG/launchpad.conf

[supermirror]
hosted_branches_by_id: file:///path/to/tmp/dir

and updating the LPCONFIG environment variable to point at this
directory (as well as updating the in-process, in-memory config).  It
would presumably be organized as a fixture to make undoing all this
easy.
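
In code, I'm imagining something vaguely like the following, assuming
the "LPCONFIG is an absolute directory" convention from earlier (the
class name and details are invented, and I'm ignoring the in-memory
config here):

import os
import shutil
import tempfile

class ConfigOverrideFixture:
    """Override a single config key for this process and its children.

    Writes a launchpad.conf into a fresh temporary directory that
    extends the currently active config, and points LPCONFIG at it.
    """

    def __init__(self, section, key, value):
        self.section = section
        self.key = key
        self.value = value

    def setUp(self):
        # Assumes LPCONFIG already holds an absolute config directory.
        self._old_lpconfig = os.environ['LPCONFIG']
        self.config_dir = tempfile.mkdtemp()
        conf = open(os.path.join(self.config_dir, 'launchpad.conf'), 'w')
        try:
            conf.write('extends: %s/launchpad.conf\n\n' % self._old_lpconfig)
            conf.write(
                '[%s]\n%s: %s\n' % (self.section, self.key, self.value))
        finally:
            conf.close()
        os.environ['LPCONFIG'] = self.config_dir

    def cleanUp(self):
        os.environ['LPCONFIG'] = self._old_lpconfig
        shutil.rmtree(self.config_dir, ignore_errors=True)

The scanner test would then set up
ConfigOverrideFixture('supermirror', 'hosted_branches_by_id',
'file:///path/to/tmp/dir'), run the script, and clean up afterwards.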

Does this make sense?

> Note that in this model there is never a need to do more than 'kill
> helper-pid' to shut something down: that helper pid will encapsulate
> all the cleanup logic, kill-9ing, dropdb of temporary dbs etc, and the
> helper code should be simple and robust. This will help give us a
> simple, robust interface.

Right, that bit sounds nice.  I don't think we can rely on our processes
shutting down nicely when asked (we should of course work on fixing
that), so having a wrapper that deals with it sounds good.

> In the python code, I think something like the following will do:
> 
> class ExternalService(Fixture):
>     """An external service used by Launchpad.
> 
>     :ivar service_config: A config fragment with the variables for this
>         service.
>     :ivar service_label: the label for the service. e.g. LIBRARIAN.
>         Used in generating environment variable names.
>     """
> 
>     def setUp(self, use_existing=False, unique_instance=True):
>         """Setup an external service.
> 
>         :param use_existing: If True, look for and use an existing instance.
>         :param unique_instance: If False, use the LP_CONFIG service definition.
>             Otherwise, create a new service definition for this instance,
>             which can be found on self.service_config.
>         """
> 
>     def reset(self):
>         """Ensure the service is running and ready for another client.
> 
>         Any state accumulated since setUp is discarded or ignored
>         (which is up to the service implementation).
>         """
> 
>     def cleanUp(self):
>         """Shutdown the service and remove any state created as part
>         of setUp."""
> 
> 
> The wrapper helper becomes something like the following (but with
> private stdout/stderr to avoid race conditions and console spew).
> 
> def wrapper_process(service):
>     pid = os.fork()
>     if pid:
>         wait_for_ok(..)
>         print "LP_%s_CONTROL_PID %d" % (service.service_label, pid)
>         exit()
>     service.setUp()
>     try:
>         print "LP_%s_CONFIGFILE %s" % (service.service_label,
>             stash_config(service.service_config))
>         signal.pause()
>     finally:
>         service.cleanUp()
> 
> 
> Note that reset() with persistent test services is obviously a little
> more limited.
> 
> What do you think?

I think it by and large makes sense.  The only part that I think might
have to change is that you talk about environment variables like
LP_LIBRARIAN_CONFIGFILE, but in fact in Launchpad all services are
configured from one config file.  So I think we'll end up managing a
stack of config files, and pushing and popping from the stack will be a
matter of updating where LPCONFIG points.
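
i.e. pushing and popping might end up as simple as something like this
(purely illustrative, not code we have):

import os

_config_stack = []

def push_config(config_dir):
    """Make config_dir the active config, remembering what it replaced."""
    _config_stack.append(os.environ.get('LPCONFIG'))
    os.environ['LPCONFIG'] = config_dir

def pop_config():
    """Restore whichever config was active before the last push."""
    previous = _config_stack.pop()
    if previous is None:
        del os.environ['LPCONFIG']
    else:
        os.environ['LPCONFIG'] = previous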

Apologies for the slightly rambling reply; I hope it's useful.

Cheers,
mwh


