
openstack team mailing list archive

Re: Steps that can help stabilize Nova's trunk

 

Hey Jay,

I like what you propose here. I have a couple of comments and questions.

I see the 'smoketests' directory in the nova code. Is anyone running these tests on a regular basis (every commit)? Is this the best place to further build out integration tests?

---

Regarding environment/setup tools: I have been working on an OpenStack VPC toolkit project that we are using in Blacksburg to stage test some things in the cloud. I'm using Chef along with the Anso/Opscode OpenStack cookbooks to set up Rackspace Cloud Servers with the latest trunk PPA packages.

This setup works well; however, I can only test with QEMU (no XenServer) and with network managers that have DHCP (I use FlatDHCPManager since Cloud Servers kernels don't currently have the 'nbd' kernel module, which would support the network injection stuff). Using this setup I'm able to create multi-node installations where instances on different machines can ping each other. While this isn't what I would call a true production setup, it is fully functional and can easily be run in parallel. The only limitations are the limits on your Cloud Servers account.

If you have bare metal, then you can simply swap out the Cloud Servers API layer for something that interfaces with your PXE imaging system. I'm a big fan of slicking the machines between each test run to avoid the buildup of cruft in the system.
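To make the "swap out the API layer" idea concrete, here is a minimal Python sketch. It is not the toolkit's actual code: the `Provisioner` interface, the `FakeCloudProvisioner` stand-in, and `run_test_cycle` are hypothetical names invented for illustration. A PXE-backed class would implement the same two methods, and the machines are always wiped after a run, even when the tests fail.

```python
from abc import ABC, abstractmethod
from typing import Callable, List


class Provisioner(ABC):
    """Hypothetical interface: how test machines are created and wiped."""

    @abstractmethod
    def create_nodes(self, count: int) -> List[str]:
        """Return the hostnames of freshly imaged machines."""

    @abstractmethod
    def destroy_nodes(self, nodes: List[str]) -> None:
        """Wipe or delete the machines so the next run starts clean."""


class FakeCloudProvisioner(Provisioner):
    """Stand-in for a cloud-backed provisioner; makes no real API calls."""

    def __init__(self) -> None:
        self.active: List[str] = []

    def create_nodes(self, count: int) -> List[str]:
        nodes = [f"cloud-node-{i}" for i in range(count)]
        self.active.extend(nodes)
        return nodes

    def destroy_nodes(self, nodes: List[str]) -> None:
        self.active = [n for n in self.active if n not in nodes]


def run_test_cycle(
    provisioner: Provisioner,
    node_count: int,
    run_tests: Callable[[List[str]], int],
) -> int:
    """Provision nodes, run the tests, and always clean up afterwards."""
    nodes = provisioner.create_nodes(node_count)
    try:
        return run_tests(nodes)
    finally:
        # Wipe the machines between runs to avoid a buildup of cruft.
        provisioner.destroy_nodes(nodes)
```

Because the test runner only sees the `Provisioner` interface, the same tests could in principle run against cloud servers or bare metal without changes.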

This would integrate nicely as a Hudson job as well. We've done some similar setups in Rackspace Email and Apps using a single Hudson server. The Hudson server runs a simple Bash script that invokes the toolkit to create the cloud servers, chef them up with the latest PPA packages (or your branch code), and then uses Torque (an HPC-ish resource manager) to schedule test jobs on some of the machines. I use Chef recipes to install Torque along with a REST interface to schedule and monitor the jobs on a "head" node. The Hudson job then waits for the Torque jobs to finish. The last thing the Hudson script does is 'scp' the unit test results XML file back to the Hudson server, where you can use something like the xUnit plugin to display and graph the results over time.
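The "wait for the jobs to finish" step could look roughly like the Python sketch below. It is an illustration only, not the actual script (which is Bash): `wait_for_jobs`, the `get_status` callback, and the state names in `FINISHED_STATES` are assumptions, since the REST interface on the head node is not described here. In practice `get_status` would wrap whatever call reports a job's state.

```python
import time
from typing import Callable, Dict, Iterable

# Assumed state names; a real job monitor may report different values.
FINISHED_STATES = {"completed", "failed"}


def wait_for_jobs(
    job_ids: Iterable[str],
    get_status: Callable[[str], str],
    poll_interval: float = 30.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, str]:
    """Poll each job until all have finished; return the final states.

    Raises TimeoutError if jobs are still pending when the timeout expires.
    """
    deadline = clock() + timeout
    pending = set(job_ids)
    results: Dict[str, str] = {}
    while pending:
        for job_id in sorted(pending):
            state = get_status(job_id)
            if state in FINISHED_STATES:
                results[job_id] = state
        # Stop polling the jobs that have reached a final state.
        pending.difference_update(results)
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"jobs still pending: {sorted(pending)}")
        sleep(poll_interval)
    return results
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real delays; a CI job would leave them at their defaults and fail the build if any returned state is "failed".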

To summarize:

-testing in the cloud provides a low barrier to entry that anyone can use
-testing on bare metal is more expensive but gets us extra coverage (XenServer, etc.)
-we should do both as often as possible
-the same set of tools and tests should work in both environments

Dan

-----Original Message-----
From: "Jay Pipes" <jaypipes@xxxxxxxxx>
Sent: Wednesday, February 16, 2011 5:27pm
To: openstack@xxxxxxxxxxxxxxxxxxx
Subject: [Openstack] Steps that can help stabilize Nova's trunk

Hey all,

It's come to my attention that a number of folks are not happy that
Nova's trunk branch (lp:nova) is, shall we say, "less than stable". :)

First, before going into some suggestions on keeping trunk more
stable, I'd like to point out that trunk is, by nature, an actively
developed source tree. Nobody should have an expectation that they can
simply bzr branch lp:nova and everything will magically work with a)
their existing installations of software packages, b) whatever code
commits they have made locally, or c) whatever specific
hypervisor/volume/network environment that they test their local code
with. The trunk branch is, after all, in active development.

That said, there's *no* reason we can't *improve* the relative
stability of the trunk branch to make life less stressful for
contributors.  Here are a few suggestions on how to keep trunk a bit
more stable for those developers who actively develop from trunk.

1) Participate fully in code reviews. If you suspect a proposed branch
merge will "mess everything up for you", then you should notify
reviewers and developers about your concerns. Be proactive.

2) If you pull trunk and something breaks, don't just complain about
it. Log a bug immediately and talk to the reviewers/approvers of the
patch that broke your environment. Be constructive in your criticism,
and be clear about why the patch should have been more thoroughly or
carefully reviewed. If you don't, we're bound to repeat mistakes.

3) Help us to write functional and integration tests. It's become
increasingly clear from the frequency of breakages in trunk (and other
branches) that our unit tests are nowhere near sufficient to catch a
large portion of bugs. This is to be expected. Our unit tests use
mocks and stubs for virtually everything, and they only really test
code interfaces, and they don't even test that very well. We're
working on adding functional tests to Hudson that will run, as the
unit tests do, before any merge into trunk, with any failure resulting
in a failed merge. However, we need your help to create functional
tests and integration tests (tests that various *real* components work
together properly).  We also need help writing test cases that ensure
software library dependencies and other packaging issues are handled
properly and don't break with minor patches.

4) If you have a specific environment/setup that you use (Rackers and
Anso guys, here...), then we need your assistance to set up test
clusters that will pull trunk into a wiped test environment and ensure
that a series of realistic calls to the Nova API are properly handled.
I know some of you are working on getting hardware ready. We need help
from the software teams to ensure that these environments are
initialized with the exact setups you use.

The more testing we fire off against each potential merge into trunk,
and the more those tests are hitting real-life deployment
environments, the more stable trunk will become and the easier your
life as a contributor will be.

Thanks in advance for your assistance, and please don't hesitate to
expand on any more suggestions you might have to stabilize trunk.

-jay

_______________________________________________
Mailing list: https://launchpad.net/~openstack
Post to     : openstack@xxxxxxxxxxxxxxxxxxx
Unsubscribe : https://launchpad.net/~openstack
More help   : https://help.launchpad.net/ListHelp




