Re: stories for external test helpers (librarian, memcache, etc)

On Mon, Oct 4, 2010 at 11:40 AM, Michael Hudson
<michael.hudson@xxxxxxxxxxxxx> wrote:
> On Mon, 27 Sep 2010 10:41:51 +1200, Robert Collins <robertc@xxxxxxxxxxxxxxxxx> wrote:
>> I've hit my tolerance level for stale librarian processes and am
>> looking to address this in the test environment. I want to make sure I
>> preserve existing use cases - please tell me if I have missed any.
>
> I freely admit that for whatever reason I'm not thinking very clearly
> today, so I may be being dense and not understanding things.  But I hope
> I'm making some kind of sense at least.  I also started to reply before
> reading the whole mail, sorry about that...

Thanks for digging in in detail.

>> 1) Run an external process for the life of *a* test
>
> I think that there are two subcases here, one for a server-type process
> where you start the process, get it to do some things and then shut it
> down (the librarian would be a prototypical example, I guess) and the
> case where you want to invoke a script to do something (e.g. the script
> that performs a code import).
>
> I also think there are two cases when it comes to intent: one where the
> external process is part of the test fixture and one where it's the
> system under test.
>
> For all that, I don't think the implementation needs to take this into
> account very much -- I think the problems each case faces are about the
> same.
>
>> * create a working config for it
>> * start it
>
> It's a trivial point, but one that AIUI requires some engineering time:
> we need to start it, *using the previously created config*.  I guess we
> can do this using the LPCONFIG variable -- maybe we establish a
> convention that if LPCONFIG starts with a / it is interpreted as an
> absolute path to a directory containing config files.  We also need to
> make sure that lazr.config can serialize itself out to a config file
> that it can read in again.

We don't need lazr.config serialization, as long as we can hand-write
(per helper) the fragment as a string template. What we need is the
ability to combine multiple fragments, e.g. 'extends' for a config on
disk, push() for an in-memory fragment. In the simplest fashion we
can create a config named after the working dir we create for the
helper, if absolute paths don't Just Work - that will occasionally
leave grot on disk in the tree, but it's easily cleaned up, and bugs
can be filed to improve it.
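
A rough sketch of what I mean, assuming the 'LPCONFIG can point at a
config directory' convention you describe above (section and key
names here are invented, not the real schema):

import os

# Hand-written per-helper fragment, filled in with the helper's
# working directory and allocated port.
LIBRARIAN_FRAGMENT = """\
[librarian_server]
root: %(root)s
port: %(port)s
"""

def write_helper_config(workdir, port):
    # Create a config dir inside the helper's working dir whose
    # launchpad.conf extends whatever config is currently active.
    config_dir = os.path.join(workdir, 'config')
    os.makedirs(config_dir)
    with open(os.path.join(config_dir, 'launchpad.conf'), 'w') as f:
        f.write('extends: %s/launchpad.conf\n\n' % os.environ['LPCONFIG'])
        f.write(LIBRARIAN_FRAGMENT % {'root': workdir, 'port': port})
    return config_dir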

> These may both be already implemented for all I know.  But I don't think
> so, and I'm pretty sure that they're not routinely utilized.
>
> Using a hidden parameter like an environment variable is somewhat gross
> I guess, but I can't think of a better way.  We *could* be super regular
> about having each of our scripts take a --config option and similarly
> use a super regular API in all our tests to invoke each subprocess with
> the appropriate value for this option, but that doesn't seem likely to
> come to pass somehow.

We need the main appserver and *everything* else to take this option,
and to pass it on to any child processes they spawn. So I'm pretty
sure an option isn't anywhere near as easy as an environment variable.
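
i.e. something like this everywhere we spawn a child (a sketch only;
run_script is an invented helper):

import os
import subprocess

def run_script(script, config_dir, args=()):
    # Propagate the active config to the child via LPCONFIG rather
    # than threading a --config option through every entry point.
    env = dict(os.environ)
    env['LPCONFIG'] = config_dir
    return subprocess.call([script] + list(args), env=env)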

>> * run test
>> * kill it
>> * clean up any overhead left in the work environment [that we care about]
>>
>> 2) For many tests
>> * create working config
>> * start it
>> -run tests-
>>  * check its still ok and do per-test isolation
>>  * run a test
>> * kill it
>> * clean up
>
> In some sense, this doesn't seem interestingly different to the above
> problem, you just start and stop the process in different places.

The main difference is needing a per-test step, which in layers means
testSetUp and testTearDown. I'm not convinced that having two
separate calls is needed; I think one call (reset) is sufficient.
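
Sketch of the layer shape I have in mind (LibrarianFixture is a
stand-in for whatever the real fixture ends up being called):

class LibrarianLayer:

    @classmethod
    def setUp(cls):
        cls.fixture = LibrarianFixture()
        cls.fixture.setUp()

    @classmethod
    def testSetUp(cls):
        # One call covers both 'check it is still ok' and per-test
        # isolation.
        cls.fixture.reset()

    @classmethod
    def tearDown(cls):
        cls.fixture.cleanUp()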

> Again, there's the issue of propagating 'which config to use' around.
>
>> 3) For 'make run'
>> * Use a small commandline tool that
>>  * creates working config
>>  * starts it
>>  * waits for SIGINT
>>  * stops it
>>  * cleans up
>>
>> 4) for concurrent testing
>> * As for many tests, but the creation of a working config needs to be
>> safe in the presence of concurrent activity.
>> * The created config needs to be sharable with *other* external
>> processes (e.g. the buildmaster may want to talk to the librarian)
>
> Putting this bullet here doesn't really make sense to me, although I
> agree it's necessary.  We already have many tests that invoke scripts
> that need to talk to the librarian, for example.

We don't have any helpers that depend on a tree of dynamically created
config fragments at the moment, and we have to make sure that we can
glue them together. For instance, without this point we might do:
 - start unique helpers by running an external process which returns a config
 - helpers don't get given a special config; we just apply the config
they spit out in-process.

So yes, it's necessary, and we could easily implement this in a way
that fails it if we don't require it.
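
e.g. a sketch, using the service_config attribute from the fixture
API proposed further down and the push() mentioned above:

librarian = LibrarianFixture()  # hypothetical helper fixture
librarian.setUp()
# The helper hands back a fragment describing how to find it; apply
# it to the in-process config rather than giving the helper a special
# config of its own.
config.push('librarian-fixture', librarian.service_config)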

>> 5) For low-overhead iteration
>> * Find an existing external process
>> * Must 'know' its config a-priori
>> -run tests-
>>  * check the process is running, do per-test isolation
>>  * run a test
>
> Hm.  This feels a bit tricky, but I guess you don't need this at the
> same time as concurrent testing...

Ideally starting things like appservers and the librarian wouldn't
take 6-7 seconds. Removing ZCML will probably fix that (and then we
can drop the low-overhead iteration story completely).


>> If the above set is complete, then I am proposing to combine things in
>> the following way.
>> Firstly, because it's a good building block, the make run use case.
>> Note that the current code is duplicative/overlapping with the test
>> helper code - I'm proposing to consolidate this. This shouldn't look
>> too different to our runlaunchpad Service today, except that we'll
>> have more entry points (to do cleanup etc).
>>  - the small helper will do the following for a given service:
>>    start up the instance
>>    optionally print a bash snippet with variables (like ssh-agent
>> does), including the helpers pid
>>      - this is useful for running up isolated copies
>>  - main process runs
>
> I think I agree that the code we use in the 'make run' case is nicer and
> has a nicer API than the code we use in the tests, but I'm not sure that
> the make run use case is very much like the concurrent testing case --
> partly because of state that's embedded in things like the apache config
> and partly because the ways you interact with things in the 'make run'
> case doesn't have a convenient parent process/child process relationship
> to know which running instance we should be talking to.

I don't quite follow you here. I'm saying that there is a DAG (nothing
has mutual dependencies - this is true today, and if it stops being
true we could separate 'allocate' and 'execute' to keep a DAG). Given
that, we have a parent->child structure we can create - though it
won't be process based; rather we'd bring up helpers and inject their
returned config into the next one, as per the DAG.
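
Concretely, something like this (fixture names invented):

# Bring helpers up in dependency order; each one is handed the config
# fragments of the helpers it depends on, rather than finding them at
# well-known locations.
pg = PostgresFixture()
pg.setUp()
librarian = LibrarianFixture(extra_config=pg.service_config)
librarian.setUp()
appserver = AppServerFixture(
    extra_config=pg.service_config + librarian.service_config)
appserver.setUp()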

>> This lets us capture useful things from starting it off without
>> needing a well known location a-priori.
>
> To state my above point again: in this case, sure you can start it off
> without needing a well-known location, but how do you find it again?

It returns it as shell variables - 'bash snippets'.
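
i.e. something ssh-agent-ish on stdout (illustrative only - the exact
variable names and format are still to be decided):

LP_LIBRARIAN_CONTROL_PID=12345; export LP_LIBRARIAN_CONTROL_PID
LP_LIBRARIAN_CONFIGFILE=/tmp/lp-librarian-tmpdir/launchpad.conf; export LP_LIBRARIAN_CONFIGFILE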

>> We can even 'run' postgresql in this fashion, and have it return the
>> DB name to use.
>
> Oh, I see, maybe you're thinking of this more at the level of how the
> 'make run' code builds up the various interacting processes it needs to
> start?

Yes.

>> Now, the cheap test iteration case can be addressed:
>>  - devs run eval `bin/run -i test --daemonise`
>>    - this outputs all the variables for all the servers started.
>>  - test code looks for *per-service* information about pid files etc.
>>    e.g. LP_LIBRARIAN_PIDFILE and LP_LIBRARIAN_CONFIGFILE rather than
>> LP_PERSISTENT_TEST_SERVICES
>>  - to kill, eval `bin/test-servers --stop`
>>    (Which will kill the daemonised wrapper, and unset the environment
>> variables).
>>  - If LP_PERSISTENT_TEST_SERVICES is set and a service isn't running,
>> I propose to error, because I think it usefully indicates a bug in
>> that external process, and this is easier than detecting both 'not
>> started yet' and 'started but crashed' - especially given the test
>> runner's tendency to fork sub runners.
>>
>>
>> Concurrent testing then is easy: as long as all the fixtures are
>> meeting this contract, if the *default* behaviour is to  bring up a
>> unique instance, everything will come up fine.
>
> Well, "easy": I think we'll find we have a lot of absolute paths in our
> testrunner config currently, and these will need to be shaken out before
> we can reliably run tests in parallel on the same machine.  I think that
> if we can solve the external process problem, the other problems will be
> shallow even if they might be numerous, but maybe you are focusing a bit
> too much on the process case here.
>
> To try to be super clear, here's an example/use case: the acceptance
> tests for the scanner currently create branches in a known location
> (/var/tmp/bazaar.launchpad.net/mirrors I think), invoke the scanner
> script and check that the database has been updated.  In the new world,
> I guess they would create some branches in a temporary directory and
> then use some api to set the supermirror.hosted_branches_by_id (iirc)
> config key to point at this temporary directory and invoke the scanner
> script and check the database as before.

Yes. Specifically, I would probably layer it like this:
branches = BranchHostingFixture()
scanner = ScannerFixture(branches)
branches.setUp()
scanner.setUp()
# make branches in branches.mirror_dir
scanner.trigger_scan()

> This mythical API would have to work in a way that crosses process
> boundaries, possibly by creating yet another temporary directory,
> creating a file in it called launchpad.conf that looks like this:
>
> extends: $previous-value-of-LPCONFIG/launchpad.conf
>
> [supermirror]
> hosted_branches_by_id: file:///path/to/tmp/dir

created by branches.setUp(); found by scanner.setUp()

> and update the LPCONFIG environment variable to point at this directory
> (as well as updating the in-process in-memory config).  It would
> presumably be organized as a fixture to make undoing all this easy.
>
> Does this make sense?

Yes.
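
As a fixture (following the same setUp/cleanUp contract as the
ExternalService sketch below; the name and details are invented) it
might look roughly like:

import os
import tempfile

class ConfigOverrideFixture:
    """Overlay a config fragment so subprocesses pick it up via LPCONFIG.

    Assumes LPCONFIG is already set, and the 'LPCONFIG may be a
    directory path' convention from earlier in the thread.
    """

    def __init__(self, fragment):
        self.fragment = fragment

    def setUp(self):
        self.old_lpconfig = os.environ['LPCONFIG']
        config_dir = tempfile.mkdtemp()
        with open(os.path.join(config_dir, 'launchpad.conf'), 'w') as f:
            f.write('extends: %s/launchpad.conf\n\n' % self.old_lpconfig)
            f.write(self.fragment)
        os.environ['LPCONFIG'] = config_dir
        # (It would also need to update the in-process in-memory
        # config; elided here.)

    def cleanUp(self):
        os.environ['LPCONFIG'] = self.old_lpconfig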

>> Note that in this model there is never a need to do more than 'kill
>> helper-pid' to shut something down: that helper pid will encapsulate
>> all the cleanup logic, kill-9ing, dropdb of temporary dbs etc, and the
>> helper code should be simple and robust. This will help give us a
>> simple, robust interface.
>
> Right, that bit sounds nice.  We should of course work on fixing it, but
> I don't think we can rely on our processes shutting down nicely when
> asked.  Having a wrapper that does that sounds good.
>
>> In the python code, I think something like the following will do:
>>
>> class ExternalService(Fixture):
>>     """An external service used by Launchpad.
>>
>>     :ivar service_config: A config fragment with the variables for this
>>         service.
>>     :ivar service_label: the label for the service. e.g. LIBRARIAN.
>> Used in generating
>>         environment variable names.
>>     """
>>
>>     def setUp(self, use_existing=False, unique_instance=True):
>>         """Setup an external service.
>>
>>         :param use_existing: If True, look for and use an existing instance.
>>         :param unique_instance: If False use the LP_CONFIG service definition.
>>             Otherwise, create a new service definition for this
>> instance, which can
>>             be found on self.service_config
>>         """
>>
>>     def reset(self):
>>         """Ensure the service is running and ready for another client.
>>
>>         Any state accumulated since setUp is discarded or ignored
>> (which is up to the service implementation).
>>         """
>>
>>     def cleanUp(self):
>>         """Shutdown the service and remove any state created as part
>> of setUp."""
>>
>>
>> The wrapper helper becomes something like the following (but with
>> private stdout/stderr to avoid race conditions and console spew).
>>
>> def wrapper_process(service):
>>     pid = os.fork()
>>     if pid:
>>         wait_for_ok(..)
>>         print "LP_%s_CONTROL_PID %d" % (service.service_label, pid)
>>         exit()
>>     service.setUp()
>>     try:
>>         print "LP_%S_CONFIGFILE %s" % (service.service_label,
>> stash_config(service.service_config))
>>         signal.pause()
>>     finally:
>>         service.cleanUp()
>>
>>
>> Note that reset() with persistent test services is obviously a little
>> more limited.
>>
>> What do you think?
>
> I think it by and large makes sense.  The only part that I think might
> have to change is that you talk about environment variables like
> LP_LIBRARIAN_CONFIGFILE, but in fact in Launchpad all services are
> configured from one config file.  So I think we'll end up managing a
> stack of config files, and pushing and popping from the stack will be a
> matter of updating where LPCONFIG points.
>
> Apologies for the slightly rambling reply, I hope it's useful.

It is indeed useful.

One misconception this has shown up is the idea that we have 'one
config file'. We -don't-.

We have many config files, smushed into one big file for testing, but
in production we have *mutually exclusive* values in the configs for
'librarianX' and 'appserverY', due to the difference between 'how a
service runs' and 'how a service is located'.

As such, I think it's best to model that directly, and glue the 'how
to locate a service' bits together just-in-time for the service we're
starting. It's really just dependency injection writ large.

-Rob


