openstack-qa-team team mailing list archive
Message #00035
Incorporating Stress tests into Tempest
(After much cursing at pep8) I am preparing to check in a set of stress
tests that we have developed and wanted to get some feedback before I
attempt to do so.
The problem to be solved is that nova is a distributed, asynchronous
system that is prone to race condition bugs. These bugs will not be
easily found during functional testing but will be encountered by users
in large deployments in a way that is hard to debug. The stress test
tries to cause these bugs to happen in a more controlled environment.
The basic idea of the test is that there are a number of actions,
roughly corresponding to the Compute API, that are fired pseudo-randomly
at a nova cluster as fast as possible. These actions consist of what to
do, how to verify success, and a state filter to make sure that the
operation makes sense. For example, if the action is to reboot a server
and none are active, nothing should be done. A test case is a set of
actions to be performed and the probability that each action should be
selected. There are also parameters controlling the rate of fire and
similar settings. Currently there are only a few actions defined, but
the test has already discovered three bugs with just those, so I want to
check it in as quickly as possible.
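To make the action/test-case idea concrete, here is a minimal sketch in Python. It is not the code being checked in: the names (Action, run_case, the in-memory state dict) are made up for illustration, the "cluster" is just a dictionary, and rate-of-fire control is left out. It only shows the three parts of an action (what to do, how to verify, state filter) and the weighted pseudo-random selection.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical stand-in for cluster state: server name -> status.
State = Dict[str, str]


@dataclass
class Action:
    name: str
    applicable: Callable[[State], bool]  # state filter
    run: Callable[[State], None]         # what to do
    verify: Callable[[State], bool]      # how to verify success


def run_case(actions: List[Action], weights: List[float],
             state: State, steps: int, seed: int = 0) -> List[str]:
    """Fire `steps` weighted pseudo-random actions; return names of those run."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    performed = []
    for _ in range(steps):
        action = rng.choices(actions, weights=weights, k=1)[0]
        if not action.applicable(state):
            continue  # e.g. reboot chosen but no active server: do nothing
        action.run(state)
        if not action.verify(state):
            raise AssertionError(f"{action.name} failed verification")
        performed.append(action.name)
    return performed


def _create(state: State) -> None:
    state[f"server-{len(state)}"] = "ACTIVE"


def _reboot(state: State) -> None:
    name = next(n for n, s in state.items() if s == "ACTIVE")
    state[name] = "ACTIVE"  # the fake reboot completes instantly


CREATE = Action("create", lambda s: True, _create, lambda s: len(s) > 0)
REBOOT = Action("reboot", lambda s: "ACTIVE" in s.values(), _reboot,
                lambda s: "ACTIVE" in s.values())

if __name__ == "__main__":
    # A "test case": the set of actions plus the probability of each.
    print(run_case([CREATE, REBOOT], [0.3, 0.7], {}, steps=20))
```

In a real run the run and verify callables would call the Compute API and poll for the expected server state rather than touch a dictionary.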
I was going to check in a 'stress' directory parallel to the 'tempest'
directory.
This test is not like functional tests in that it can never succeed,
only fail. Ideally it will run for a long time and so cannot really be run
after every checkin. It would be good to run a short-duration case as
part of the functional tests though.
This test requires some new parameters for the environment. For example,
one thing it does is periodically check the nova logs on all cluster
nodes to make sure there are no errors, and it will fail the test if
there are. So it needs the path to the ssh private key for the cluster
nodes. It seems that currently we have getters defined for all of the
config parameters. Should there be a new getter for every kind of config
option that someone adds to Tempest, or should we just provide a method
to get the parameter by string name?
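For comparison, the "get by string name" option could look roughly like the sketch below. This is only an illustration built on Python's standard configparser, not Tempest's actual config code; the class name, the [stress] section, the option names and the key path are all assumptions invented for the example.

```python
import configparser

# Hypothetical config text; section and option names are made up.
SAMPLE = """
[stress]
ssh_private_key_path = /path/to/key
log_check_interval = 30
"""


class StressConfig:
    """One generic lookup instead of a dedicated getter per option."""

    def __init__(self, parser: configparser.ConfigParser,
                 section: str = "stress") -> None:
        self._parser = parser
        self._section = section

    def get(self, name: str, default=None):
        # Values come back as strings; callers convert as needed.
        return self._parser.get(self._section, name, fallback=default)


def load_config(text: str) -> StressConfig:
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return StressConfig(parser)


if __name__ == "__main__":
    cfg = load_config(SAMPLE)
    print(cfg.get("ssh_private_key_path"))
    print(cfg.get("not_defined", "some-default"))
```

The trade-off is the usual one: a generic lookup means new options need no code change, while per-option getters catch misspelled names earlier and give one place to document each parameter.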
This test was developed before the tempest code was available. The "what
to do" part of each action is pretty similar to the tempest methods that
call the API. Would it make sense at some point to extend the tempest
actions to include methods that enable them to participate in a stress test?
Any other comments or issues?
Thanks.
-David