openstack-qa-team mailing list archive
Message #00156
Re: Thoughts on input fuzzing tests
On 06/12/2012 04:00 PM, Jay Pipes wrote:
On 06/12/2012 02:22 PM, Daryl Walleck wrote:
Due to the large number of input fuzzing tests that have been
submitted, I've been thinking of ways to reduce the amount of code
needed to achieve this (whether we should do it or not is a totally
different discussion). Rather than have x number of input tests for,
say, create server, wouldn't it be far easier to have a single create
server fuzz test that is data driven and accepts the desired inputs
and the expected exception? So instead of this (pseudo-coded things up
a bit):
https://gist.github.com/2919066
we could get the same effect with much less code by doing this:
https://gist.github.com/2919177
Regardless of implementation, I think the general idea of moving this
type of testing towards data driven functions would really help cut
down on redundant code.
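Since the gists aren't inline here, a minimal sketch of what Daryl is describing might look like the following. The client and exception names are hypothetical stand-ins, not Tempest's real classes; the point is just that one loop over (input, expected-exception) pairs replaces many near-identical test methods.

```python
# Sketch of a data-driven negative test: one method iterates over
# (kwargs, expected exception) pairs instead of one method per case.
# FakeBadRequest and FakeServersClient are placeholders for illustration.

class FakeBadRequest(Exception):
    """Stand-in for a client's 400-style exception."""

class FakeServersClient:
    """Stand-in for a servers client that rejects bad input."""
    def create_server(self, name, flavor_ref, image_ref):
        if not name or flavor_ref is None or image_ref is None:
            raise FakeBadRequest("invalid input")
        return {"name": name}

FUZZ_CASES = [
    # (kwargs for create_server, expected exception)
    ({"name": "", "flavor_ref": 1, "image_ref": 1}, FakeBadRequest),
    ({"name": "ok", "flavor_ref": None, "image_ref": 1}, FakeBadRequest),
    ({"name": "ok", "flavor_ref": 1, "image_ref": None}, FakeBadRequest),
]

def run_fuzz_cases(client, cases):
    """Return the cases that did NOT fail in the expected way."""
    failures = []
    for kwargs, expected in cases:
        try:
            client.create_server(**kwargs)
            failures.append((kwargs, "no exception raised"))
        except expected:
            pass  # the expected exception was raised: case passes
        except Exception as exc:
            failures.append((kwargs, repr(exc)))
    return failures
```

Adding a new negative case is then a one-line change to the data table rather than a new test method.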
Fuzz testing, I believe, is better done with a tool like randgen [1].
The basic strategy is to have a grammar document that describes the API
and then have a fuzz tester hammer the API with random bad and good
data, recording the responses.
I've cc'd Patrick Crews, who is an expert on randgen and also works on
the OpenStack CI team, to see if he'd be interested in participating in
putting together a randgen grammar for OpenStack components and working
on some fuzz testing stuff in Tempest...
++ to this. I've been thinking about testing in this area since the SF
dev summit : )
As an FYI, randgen is a tool developed for testing database systems.
In order to cover large amounts of ground quickly, it uses
yacc-style grammars to define the 'playground'...the tool then randomly
picks and chooses from the possibilities.
As an example, a grammar file like:
query:
SELECT * FROM _table WHERE column_name comparison_operator value ;
can produce a lot of queries like:
SELECT * FROM table1 WHERE col_int > 9;
SELECT * FROM table99 WHERE col_char <= 'abbazabba';
SELECT * FROM table20 WHERE col_int_key = 'YHGNZ';
The intent is to produce a 'map' that can generate lots of test points.
It is deterministic (the same seed will always produce the same results)
and can be tweaked (one can provide various --seed values to shake
things up).
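To make the mechanism concrete, here is a toy illustration (this is not randgen itself, which is Perl; the grammar below is my own invented example) of how a grammar plus a seeded RNG yields varied but reproducible output:

```python
import random

# Toy grammar-based generator: nonterminals (keys starting with '$')
# map to lists of alternatives; any token that names a nonterminal is
# expanded recursively. A fixed seed reproduces the same "random" queries.
GRAMMAR = {
    "$query":  ["SELECT * FROM $table WHERE $column $op $value ;"],
    "$table":  ["table1", "table20", "table99"],
    "$column": ["col_int", "col_char", "col_int_key"],
    "$op":     ["=", ">", "<="],
    "$value":  ["9", "'abbazabba'", "'YHGNZ'"],
}

def generate(symbol, rng, grammar=GRAMMAR):
    """Expand one nonterminal into a concrete string using rng."""
    choice = rng.choice(grammar[symbol])
    out = []
    for token in choice.split():
        out.append(generate(token, rng, grammar) if token in grammar else token)
    return " ".join(out)

# The same seed always produces the same sequence of queries.
rng = random.Random(42)
queries = [generate("$query", rng) for _ in range(3)]
```

Note that nothing stops the grammar from pairing col_int_key with 'YHGNZ' — mixing type-appropriate and type-inappropriate values is exactly the point of fuzzing.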
At present, we can easily produce a grammar that would generate various
api calls. Execution and validation are another story. To elaborate,
we could quickly produce a grammar that could generate text like:
create_instance(authorized_user, good_user_pw)
create_instance(unauthorized_user, good_user_pw)
delete_instance(instance_id)
Executing those against glance/nova/swift could take some additional
work...either through hacking on the randgen itself or having some other
tool do something with the generated calls. The tool is currently
designed to execute against a database...in the randgen itself, it is
the Executor modules (lib/GenTest/Executor) that we'd likely need to
play with.
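One plausible shape for that glue, sketched below with invented stubs (the real work would be a randgen Executor module or a small driver talking to the actual nova/glance/swift clients): parse each generated call line and dispatch it to a client function.

```python
import re

# Hypothetical sketch of the "execution" step: take the text lines the
# grammar produced and dispatch them to client calls. The two functions
# below are stand-ins for real API clients, returning (status, body).

def create_instance(user, password):
    if user == "unauthorized_user":
        return (403, None)
    return (200, "instance-1")

def delete_instance(instance_id):
    return (204, None)

DISPATCH = {"create_instance": create_instance,
            "delete_instance": delete_instance}
CALL_RE = re.compile(r"(\w+)\((.*)\)")

def execute(line):
    """Parse a generated 'name(arg1, arg2)' line and invoke the stub."""
    m = CALL_RE.match(line.strip())
    name, raw_args = m.group(1), m.group(2)
    args = [a.strip() for a in raw_args.split(",")] if raw_args else []
    return DISPATCH[name](*args)

results = [execute(line) for line in [
    "create_instance(authorized_user, good_user_pw)",
    "create_instance(unauthorized_user, good_user_pw)",
]]
```

The generated text stays tool-agnostic; only the dispatch table needs to know about the services under test.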
Validation is also a question: One of the tricks used in the database
world is to run randomly generated tests against two systems - the
database under test and a reference system (like running against both
MySQL 5.1 and 5.5 for example...or the same version with different
option settings, etc). Validation by hand is too time consuming and
expensive, so having general guidelines for machine validation is the
way to go.
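The differential idea above can be sketched in a few lines: replay the same generated calls against the system under test and a reference system, and flag any call whose responses differ. The two "systems" here are stand-in functions, purely for illustration.

```python
# Minimal sketch of differential (back-to-back) validation: a response
# mismatch between the reference system and the system under test marks
# a call for human investigation; matching responses need no hand-check.

def reference_api(call):
    """Stand-in for the known-good system (e.g. the previous release)."""
    return {"status": 400} if "bad" in call else {"status": 200}

def api_under_test(call):
    """Stand-in for the system under test; a regression shows as a diff."""
    return {"status": 400} if "bad" in call else {"status": 200}

def diff_responses(calls, sys_a, sys_b):
    """Return the generated calls whose responses differ between systems."""
    return [c for c in calls if sys_a(c) != sys_b(c)]

calls = ["create(good)", "create(bad_flavor)", "delete(good)"]
mismatches = diff_responses(calls, reference_api, api_under_test)
```

An empty mismatch list means the run validated itself; only the diffs ever need a human.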
The simple fact is that the more "negative" tests we add to Tempest's
test suite, the longer Tempest takes to run, and we are getting
diminishing returns: the benefit of each added negative test shrinks
compared with the additional time to test. I think a separate
fuzz testing suite that uses a grammar-based approach will likely give us
better negative API test coverage without having to write individual test
methods for every conceivable variation of a bad request to an API.
++ to this. In the database world, Microsoft's SQL Server team threw in
the towel on manual testing as a base - it is too expensive to generate,
validate, and maintain such tests...building a bank of combinations
that can be automatically executed and validated is the way to go when
we have so much ground to cover.
Looking forward to chatting with anyone who is interested in this. Will
be wrapping up some tasks this week and will be able to dig into this
seriously next week. However, anyone can ping me on IRC (pcrews) or by
email if they'd like to discuss / explore this further.
Thanks for bringing this up, Jay!
Cheers,
patrick
Best,
-jay
[1] https://launchpad.net/randgen