openstack team mailing list archive
-
openstack team
-
Mailing list archive
-
Message #03408
Re: Nova D3 Milestone and the skipped tests
I think, having read through everything that Vish said, and everything you guys have said, the only problem we have here is one of communication.
I'm sure that Trey didn't _want_ to skip these tests, and I know that Vish accepted the patch knowing the downsides. He had to trade off the time taken to fix everything perfectly, versus the further remerging / rebasing that would have been necessary if the patch got stalled for any longer. I agree with Soren that this was not a desirable thing, and I think everyone else would too, but Vish made that decision in full sight of the facts, and I'm happy to support him in that, even though it's my team that's taken the hit in this case.
So what we need to do is to fix the communication. I don't want Citrix to "own" the ESX support (or anyone else to own any other subsystem) because it would be too easy for people to say "that doesn't matter - Citrix will fix it". That way, we'll only ever find out too late about stuff that breaks. However, I _am_ happy for us to talk to people when it comes to complicated cross-cutting features. We very seriously want multi-NIC support on ESX, so we'd have been very happy to help, if only we found out a bit earlier. There will be plenty of things in the future that we want to work on multiple hypervisors, and we're willing to invest the developer time to make that the case.
I don't know if either of them will like this idea, but maybe we could make it Vish's problem or Theirry's problem to think about whether the right communication is happening. By that I mean, when the blueprint is reviewed / accepted for a release, either the PTL or the release manager could take the time to think about whether anyone else should be on the blueprint for notification or involvement.
Thoughts?
Ewan.
From: openstack-bounces+ewan.mellor=citrix.com@xxxxxxxxxxxxxxxxxxx [mailto:openstack-bounces+ewan.mellor=citrix.com@xxxxxxxxxxxxxxxxxxx] On Behalf Of Trey Morris
Sent: 03 August 2011 09:32
To: Soren Hansen
Cc: openstack@xxxxxxxxxxxxxxxxxxx
Subject: Re: [Openstack] Nova D3 Milestone and the skipped tests
On Tue, Aug 2, 2011 at 6:40 AM, Soren Hansen <soren@xxxxxxxxxxx<mailto:soren@xxxxxxxxxxx>> wrote:
2011/7/27 Trey Morris <trey.morris@xxxxxxxxxxxxx<mailto:trey.morris@xxxxxxxxxxxxx>>:
> I'm fairly certain that in this particular case, without support from the
> hypervisor and api lieutenants, and a testing guru or two, I would still be
> trying to fit all the pieces together.
On any reasonably large software project, sometimes making changes to
the code isn't just about making changes to the code. Sometimes it
involves pestering people to help you out if there are things you
can't work out yourself.
But at what cost? How long should we delay progress in one area, in this case networking, to make sure everything works even if it doesn't necessarily need to right now as we're no where near a release? We need to be able to develop in independent areas without having to worry about the entire codebase.
> Merging with broken pieces in this case was necessary.
I absolutely, nonequivocally disagree.
Preposterous!
> Some more thoughts to fuel discussion...
> 2) shims:
> Sandy would say we could use shims here, wait for the other parts to be
> merged, then remove the shims in such a way that nothing is ever broken. I
> think a procedure like this works in the vast majority of cases, and makes
> for smaller merges with trunk. Certainly we don't want shims in place any
> more than we want skipped tests, so how do we get the pieces updated? Who is
> going to do the updating?
Maybe I'm not completely clear on what you mean by "shims". Do you
mean wrappers for backwards compatibility?
Basically yeah, but they don't always need to be wrappers. Another example would be structure added to a model that presents a changed underlying db table in the same manner as it was presented before the change, allowing someone to change a db table without being required to update all of the code that refers to it at the same time. Bits and pieces could be merged over time and once they are all merged, then the model structure (shim) is removed.
> 3) skipped tests:
> Very visible. Everyone sees them. But very bad habits can be formed around
> using them.
a) Not everyone sees them. I rarely sit around and stare at the test
suite run. If it fails, yes, I go back and look, but if it all passes,
I don't care much about its output. This is, in fact, how I discovered
this: I made a change to the test suite that I *knew* should make it
fail, but it didn't. *Then* I went and looked and saw all of these.
Successful test run output:
"----------------------------------------------------------------------
Ran 1049 tests in 206.878s
OK (SKIP=44)"
Even if it all passes, you see the "SKIP=44" at the end. Either way, at this point the cat is over the wall so let's discuss remedy instead of the original reasons for taking this action. How should we handle situations like this in the future?
b) I never for a second assumed that a skipped test meant that it was
my problem to fix it. If someone disables a test, I assume they're
working around the clock(!) to fix the problem.
If the tests are skipped in an area of the code that you are responsible for, find out why and remedy the problem. This was the original purpose for lieutenants, there was no one to communicate these problems to at the time; no one responsible for the work.
> 4) dropped support:
> We've decided to add feature X to Nova. Developer A determines adding X
> requires nontrivial changes to hypervisors Q W E and R (and yes, soren,
> their tests). Developer A, being familiar with Q, updates Q (and tests) to
> work with Nova+X; however, W E and R are still broken. If requests for
> option 1 above fail and cooperation doesn't seem to happen and things drag
> out, we should remove the shims, merge, and drop support for W E and R from
> Nova until updated code (with tests) for W E and R is proposed for merging
> into trunk.
I'm perfectly fine with this. If no-one wants to maintain a particular
hypervisor, it gets dropped. This is perfectly reasonable.
So is this our new route? When code needs to hit trunk and would break an API or a hypervisor, network plugin, etc, we immediately drop support? Should we wait a milestone first? What are the criteria for becoming supported again? I'm fine with this route over skipped tests as well, we just need to make sure we've covered the particulars and tried options 1 and 2 first.
-trey
Follow ups
References