openstack team mailing list archive

Thread
Date

Re: Moving code hosting to GitHub

To: Thomas Goirand <thomas@xxxxxxxxxx>
From: Soren Hansen <soren@xxxxxxxxxxx>
Date: Thu, 21 Apr 2011 23:41:35 +0200
Cc: Elliot Murphy <elliot@xxxxxxxxxxxxx>, Robert Collins <robert.collins@xxxxxxxxxxxxx>, "openstack@xxxxxxxxxxxxxxxxxxx" <openstack@xxxxxxxxxxxxxxxxxxx>
In-reply-to: <4DB08E54.9040005@goirand.fr>

2011/4/21 Thomas Goirand <thomas@xxxxxxxxxx>:
> On 04/19/2011 05:55 AM, Soren Hansen wrote:
>> 2011/4/18 Thomas Goirand <thomas@xxxxxxxxxx>:
>>> Can't you just pull each individual patches that you feel ok with?  Is
>>> it simply not technically possible with bzr?
>> Short answer: no. Longer answer: Of course it's possible to extract
>> individual patches and apply them elsewhere, but it's tedious, manual
>> and throws away history. We bzr users care deeply about history :)
> I don't know bzr enough to be able to tell, but it seems like an area
> of improvement. History, for me, is quite important.
>
> With Git, it's really easy to get a bunch of patches, select the one we
> want, and reject others. To compete with Git, Bzr *must* be able to do
> that, and allowing rebase and merge of patches in order to keep a clean,
> readable, patch history.

Rebasing means ripping things out of their context and putting them
somewhere else. It not only modifies, but it discards history. As much
as you may want to alter history, a patch was written when it was
written and in the context in which it was written. You can rebase all
you want and make the patches appear in whatever order you please, but
the true history remains the same: The patch was written in a different
context at a different time. You are *discarding* the patch history when
you rebase. There's no longer any way to see the revisions the patch
went through. You're reducing your revision control system to a patch
management system. Patch management systems don't care about history of
your work. They care exclusively about the result: The patch that ends
up being accepted.

There are several reasons why I strongly oppose this practice:

There's a lot of wisdom to be found in work that gets rejected, and for
e.g. the Linux kernel, the only way you can see all the patch revisions
that didn't get approved is by wading through mailing lists. I firmly
believe this history belongs in the revision control system.  Example:
Say I'm working on a change to the locking mechanism in a particular
driver. I look at a similar driver to see how its done there and find
that it looks needlessly complicated or defensive and I go and implement
something simpler. Once submitted, it gets torn apart during review
because there are things I didn't take into account. Had I had easy
access to the rejected revisions of the same changes to the other
driver, I could have seen that they tried the same approach and that it
was changed for this or the other reason and avoided making the same
mistakes.

There's no shame in writing a patch that doesn't get accepted in the
first go.  Everyone does it.  Hiding this from the revision history just
makes it more scary for new people to join in the fun because everyone
else's work looks to always be perfect even though reality is much, much
different.

I find the rebasing/cherry-picking practice even worse in the Linux
kernel context due to the patch tagging used there. If I add a
"Signed-off-by: Soren Hansen" to a patch and someone cherry picks that
patch or moves it around as part of a rebase, my patch still shows up as
"Signed-off-by: me" even though I've never signed off on the patch in
its new context. I remember at one point I had a patch that added some
security checks to the end of a function. It got rebased and it applied
cleanly. It took a while before someone noticed that my patches didn't
work anymore. It turned out that my changes were applied to the
preceding function, because the two functions shared a bunch of
boilerplate code at the end (the last 7 lines were actually completely
identical), so the patch applied cleanly and bunch of other changes made
it so that the line numbers in my patch now pointed to this other
function. The problem here is that people were upset, and when they went
and bisected they could see that I had made this change and signed off
on it, and wondered what the heck I'd been thinking, and I can honestly
say I don't blame them.  According to the revision control system, I
signed off on this change in this context.  Except in the real world, I
didn't.

Note that I don't mention git or bzr here at all. All of this could
equally easily happen with bzr or git. Someone did write a rebase plugin
for bzr, and I hate that with the fire of a thousand suns, too. Just
because you're holding a gun aimed squarely at your foot, you don't
*have* to pull the trigger.

Concretely speaking, if you just look at "bzr log"'s default output,
you'll see a perfectly clean flow of the patches that ended up getting
accepted. These merge revisions have their own, unique commit message
that sums up and explains the change.  If you run "bzr log -n0", you'll
see all the gory details of the work leading up to these patches.

I completely agree that bzr should have better mechanics for sharing
working trees between different branches (the loom plugin does some of
this, if you're interested). Apart for when I'm working on the Linux
kernel, I've never really felt the need for it, but I understand that
many people do.

-- 
Soren Hansen        | http://linux2go.dk/
Ubuntu Developer    | http://www.ubuntu.com/
OpenStack Developer | http://www.openstack.org/

Follow ups

Re: Moving code hosting to GitHub
From: Thomas Goirand, 2011-04-22
Re: Moving code hosting to GitHub
From: FUJITA Tomonori, 2011-04-21
Re: Moving code hosting to GitHub
From: Vishvananda Ishaya, 2011-04-21

References

Moving code hosting to GitHub
From: Jay Pipes, 2011-04-08
Re: Moving code hosting to GitHub
From: Thomas Goirand, 2011-04-10
Re: Moving code hosting to GitHub
From: Elliot Murphy, 2011-04-11
Re: Moving code hosting to GitHub
From: Thomas Goirand, 2011-04-21