launchpad-dev team mailing list archive

Thread
Date
Re: stories for external test helpers (librarian, memcache, etc)

To: Robert Collins <robert.collins@xxxxxxxxxxxxx>
From: Michael Hudson <michael.hudson@xxxxxxxxxxxxx>
Date: Mon, 04 Oct 2010 17:47:42 +1300
Cc: Launchpad Community Development Team <launchpad-dev@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <AANLkTimQxRbya74ENdjcFVHYcmUKNwCWhm-3LaVrV_EQ@mail.gmail.com>
User-agent: Notmuch/0.3.1 (http://notmuchmail.org) Emacs/23.1.50.1 (x86_64-pc-linux-gnu)
On Mon, 4 Oct 2010 12:22:57 +1300, Robert Collins <robert.collins@xxxxxxxxxxxxx> wrote:
> On Mon, Oct 4, 2010 at 11:40 AM, Michael Hudson
> <michael.hudson@xxxxxxxxxxxxx> wrote:
> > On Mon, 27 Sep 2010 10:41:51 +1200, Robert Collins <robertc@xxxxxxxxxxxxxxxxx> wrote:
> >> I've hit my tolerance level for stale librarian processes and am
> >> looking to address this in the test environment. I want to make sure I
> >> preserve existing use cases - please tell me if I have missed any.
> >
> > I freely admit that for whatever reason I'm not thinking very clearly
> > today, so I may be being dense and not understanding things. Â But I hope
> > I'm making some kind of sense at least. Â I also started to reply before
> > reading the whole mail, sorry about that...
> 
> Thanks for digging in in detail.

NP.

> >> 1) Run an external process for the life of *a* test
> >
> > I think that there are two subcases here, one for a server-type process
> > where you start the process, get it to do some things and then shut it
> > down (the librarian would be a prototypical example, I guess) and the
> > case where you want to invoke a script to do something (e.g. the script
> > that performs a code import).
> >
> > I also think there are two cases when it comes to intent: one where the
> > external process is part of the test fixture and one where it's the
> > system under test.
> >
> > For all that, I don't think the implementation needs to take this into
> > account very much -- I think the problems each case faces are about the
> > same.
> >
> >> * create a working config for it
> >> * start it
> >
> > It's a trivial point, but one that AIUI requires some engineering time:
> > we need to start it, *using the previously created config*. Â I guess we
> > can do this using the LPCONFIG variable -- maybe we establish a
> > convention that if LPCONFIG starts with a / it is interpreted as an
> > absolute path to a directory containing config files. Â We also need to
> > make sure that lazr.config can serialize itself out to a config file
> > that it can read in again.
> 
> We don't need lazr.config serialization, as long as we can hand-write
> (per helper) the fragment as a string template.

True.

> what we need is the ability to combine multiple
> fragments. e.g. inheriting [for a config on disk] / push() (for an in
> memory fragment).

Luckily this is easy.

> We can in the simplest fashion create a config called by the name of
> the working dir we create for the helper, if absolute paths don't Just
> Work - that will occasionally leave grot around on disk in the tree,
> but easily cleaned, and bugs can be filed to improve.

I guess, although I suspect making absolute paths Just Work will be
little effort.

> > These may both be already implemented for all I know. Â But I don't think
> > so, and I'm pretty sure that they're not routinely utilized.
> >
> > Using a hidden parameter like an environment variable is somewhat gross
> > I guess, but I can't think of a better way. Â We *could* be super regular
> > about having each of our scripts take a --config option and similarly
> > use a super regular API in all our tests to invoke each subprocess with
> > the appropriate value for this option, but that doesn't seem likely to
> > come to pass somehow.
> 
> We need the main appserver and *everything* to take this option, and
> to apply it it to any child processes it takes. So I'm pretty sure an
> option isn't anywhere as easy as an environment variable.

Yeah.

> >> * run test
> >> * kill it
> >> * clean up any overhead left in the work environment [that we care about]
> >>
> >> 2) For many tests
> >> * create working config
> >> * start it
> >> -run tests-
> >> Â * check its still ok and do per-test isolation
> >> Â * run a test
> >> * kill it
> >> * clean up
> >
> > In some sense, this doesn't seem interestingly different to the above
> > problem, you just start and stop the process in different places.
> 
> The main difference is needing a per-test step, which in layers is
> 'setUpTest' and 'tearDownTest', I'm not convinced that having two
> separate calls is needed, I think one call (reset) is sufficient.

Right.

> > Again, there's the issue of propagating 'which config to use' around.
> >
> >> 3) For 'make run'
> >> * Use a small commandline tool that
> >> Â * creates working config
> >> Â * starts it
> >> Â * waits for SIGINT
> >> Â * stops it
> >> Â * cleans up
> >>
> >> 4) for concurrent testing
> >> * As for many tests, but the creation of a working config needs to be
> >> safe in the presence of concurrent activity.
> >> * The created config needs to be sharable with *other* external
> >> processes (e.g. the buildmaster may want to talk to the librarian)
> >
> > Putting this bullet here doesn't really make sense to me, although I
> > agree it's necessary. Â We already have many tests that invoke scripts
> > that need to talk to the librarian, for example.
> 
> We don't have any helpers that depend on a tree of dynamically created
> config fragments at the moment, and we have to make sure that we can
> glue them together.

I hope we only need a stack, not a tree...

> For instance, without this point we might do: - start unique helpers
> by running an external process which returns a config - helpers don't
> get given a special config, we just apply the config they spit out
> in-process.
> 
> So yes, its necessary, and we could easily implement in a way that
> fails this if if we don't require it.

Ah yes.  Well I think I get the point you're making anyway :)

> >> 5) For low-overhead iteration
> >> * Find an existing external process
> >> * Must 'know' its config a-priori
> >> -run tests-
> >> Â * check the process is running, do per-test isolation
> >> Â * run a test
> >
> > Hm. Â This feels a bit tricky, but I guess you don't need this at the
> > same time as concurrent testing...
> 
> Ideally starting things like appservers and the librarian wouldn't
> take 6-7 seconds. removing zcml will probably fix that (and then we
> can remove the low-overhead iteration story completely).

Um, I'm not sure what you mean by removing the ZCML?  It was my
impression (and hey, I might be wrong) that it was the work the ZCML
implied not the ZCML itself that takes up most of the time of setting up
the component architecture.  Although I agree -- it would be nice if
services started really fast.

> >> If the above set is complete, then I am proposing to combine things in
> >> the following way.
> >> Firstly, because its a good building block, the make run use case.
> >> Note that the current code is duplicative/overlapping with the test
> >> helper code - I'm proposing to consolidate this. This shouldn't look
> >> too different to our runlaunchpad Service today, except that we'll
> >> have more entry points (to do cleanup etc).
> >> Â - the small helper will do the following for a given service:
> >> Â  Â start up the instance
> >> Â  Â optionally print a bash snippet with variables (like ssh-agent
> >> does), including the helpers pid
> >> Â  Â  Â - this is useful for running up isolated copies
> >> Â - main process runs
> >
> > I think I agree that the code we use in the 'make run' case is nicer and
> > has a nicer API than the code we use in the tests, but I'm not sure that
> > the make run use case is very much like the concurrent testing case --
> > partly because of state that's embedded in things like the apache config
> > and partly because the ways you interact with things in the 'make run'
> > case doesn't have a convenient parent process/child process relationship
> > to know which running instance we should be talking to.
> 
> I don't quite follow you here. I'm saying that there is a DAG (nothing
> has mutual dependencies - this is true today, and if it stops being
> true we could separate 'allocate and execute' to get a DAG); given
> that we have a parent->child structure we can create (but it won't be
> process based, rather we'd bring up helpers, and inject their returned
> config into the next as per the dag).
> 
> >> This lets us capture useful things from starting it off without
> >> needing a well known location a-priori.
> >
> > To state my above point again: in this case, sure you can start it off
> > without needing a well-known location, but how do you find it again?
> 
> It returns it as shell variables - 'bash snippets'.

I'm slightly unconvinced by this interface -- but only slightly, and
need to think a bit more.  It'll work, I guess.

> >> We can even 'run' postgresql in this fashion, and have it return the
> >> DB name to use.
> >
> > Oh, I see, maybe you're thinking of this more at the level of how the
> > 'make run' code builds up the various interacting processes it needs to
> > start?
> 
> Yes.

This means that some of my waffle above is a bit misdirected :-)

> >> Now, the cheap test iteration case can be addressed:
> >> Â - devs run eval `bin/run -i test --daemonise`
> >> Â  Â - this outputs all the variables for all the servers started.
> >> Â - test code looks for a *per-service* information about pid files etc.
> >> Â  Â e.g. LP_LIBRARIAN_PIDFILE and LP_LIBRARIAN_CONFIGFILE rather than
> >> LP_PERSISTENT_TEST_SERVICES
> >> Â - to kill, eval `bin/test-servers --stop`
> >> Â  Â (Which will kill the daemonised wrapper, and unset the environment
> >> variables).
> >> Â - If LP_PERSISTENT_TEST_SERVICES is set and a service isn't running,
> >> I propose to error, because I think it usefully indicates a bug in
> >> that external process, and this is easier than detecting both 'not
> >> started yet' and 'started but crashed' - especially given the test
> >> runners tendancy to fork sub runners.
> >>
> >>
> >> Concurrent testing then is easy: as long as all the fixtures are
> >> meeting this contract, if the *default* behaviour is to Â bring up a
> >> unique instance, everything will come up fine.
> >
> > Well, "easy": I think we'll find we have a lot of absolute paths in our
> > testrunner config currently, and these will need to be shaken out before
> > we can reliably run tests in parallel on the same machine. Â I think that
> > if we can solve the external process problem, the other problems will be
> > shallow even if they might be numerous, but maybe you are focusing a bit
> > too much on the process case here.
> >
> > To try to be super clear, here's an example/use case: the acceptance
> > tests for the scanner currently create branches in a known location
> > (/var/tmp/bazaar.launchpad.net/mirrors I think), invoke the scanner
> > script and check that the database has been updated. Â In the new world,
> > I guess they would create some branches in a temporary directory and
> > then use some api to set the supermirror.hosted_branches_by_id (iirc)
> > config key to point at this temporary directory and invoke the scanner
> > script and check the database as before.
> 
> yes. Specifically I would probably layer it like this:
> branches = BranchHostingFixture()
> scanner = ScannerFixture(branches)
> branches.setUp()
> scanner.setUp()
> # make branches in branches.mirror_dir
> scanner.trigger_scan()

I think that makes sense.  If BranchHostingFixture() is the thing that
sets LPCONFIG I'm not sure it needs to be passed to ScannerFixture -- if
they're communicating via a hidden channel (an env var) it's perhaps
best to not make it look like they're communicating over a visible one?
Minor detail though.

> > This mythical API would have to work in a way that crosses process
> > boundaries, possibly by creating yet another temporary directory,
> > creating a file in it called launchpad.conf that looks like this:
> >
> > extends: $previous-value-of-LPCONFIG/launchpad.conf
> >
> > [supermirror]
> > hosted_branches_by_id: file:///path/to/tmp/dir
> 
> created by branches.setUp(); found by scanner.setUp()

Right.

> > and update the LPCONFIG environment variable to point at this directory
> > (as well as updating the in-process in-memory config). Â It would
> > presuambly be organized as a fixture to make undoing all this easy.
> >
> > Does this make sense?
> 
> yes.

Good.

> >> Note that in this model there is never a need to do more than 'kill
> >> helper-pid' to shut something down: that helper pid will encapsulate
> >> all the cleanup logic, kill-9ing, dropdb of temporary dbs etc, and the
> >> helper code should be simple and robust. This will help give us a
> >> simple, robust interface.
> >
> > Right, that bit sounds nice. Â We should of course work on fixing it, but
> > I don't think we can rely on our processes shutting down nicely when
> > asked. Â Having a wrapper that does that sounds good.
> >

[...]

> >> What do you think?
> >
> > I think it by and large makes sense. Â The only part that I think might
> > have to change is that you talk about environment variables like
> > LP_LIBRARIAN_CONFIGFILE, but in fact in Launchpad all services are
> > configured from one config file. Â So I think we'll end up managing a
> > stack of config files, and pushing and popping from the stack will be a
> > matter of updating where LPCONFIG points.
> >
> > Apologies for the slightly rambling reply, I hope it's useful.
> 
> It is indeed useful.

Glad to hear it.

Cheers,
mwh
Follow ups

Re: stories for external test helpers (librarian, memcache, etc)
From: Jonathan Lange, 2010-10-04
References

stories for external test helpers (librarian, memcache, etc)
From: Robert Collins, 2010-09-26
Re: stories for external test helpers (librarian, memcache, etc)
From: Michael Hudson, 2010-10-03
Re: stories for external test helpers (librarian, memcache, etc)
From: Robert Collins, 2010-10-03