mugle-dev team mailing list archive

Thread
Date
Proper revision control practices

To: Scott Ritchie <s.ritchie73@xxxxxxxxx>, "Prageeth H. Silva" <p.silva2@xxxxxxxxxxxxxxxxxxxx>
From: Matt Giuca <matt.giuca@xxxxxxxxx>
Date: Mon, 7 Mar 2011 11:30:17 +1100
Cc: mugle-dev@xxxxxxxxxxxxxxxxxxx
Sender: eatmyshortz@xxxxxxxxx
Hi,

So as a follow-up to last night's history editing debacle, I thought I'd
send a couple of pointers on revision control. I've seen some rather bad
things happening in the past, but I have let them slide until now. This is
aimed mostly at Scott and Prageeth who haven't had a lot of experience with
revision control (for this reason, I don't blame you guys, but it still
needs improving). Now I have somewhat of a reputation as a "revision control
nazi" ... I don't expect things to be perfect, but some of these commits are
way out of hand. This is going to be a bit of a yell, but hopefully you'll
get some important advice which should help you in all of your future
projects.

There are several practical reasons to make shorter, more "atomic" commits.
A key one here is that part of my job is to watch the commits and make sure
the project is going OK. I tend to read all of the diffs, one commit at a
time, but I can't understand a 1200 line diff. In general, if your diff (bzr
diff | wc -l) is more than 1000 lines long, something is wrong. Another
important benefit to committing small diffs is that they are much less
likely to create a merge conflict if someone else is working on the same
file (I was working on an open source project recently, and I had publicly
stated that I had been working on a file for a month when another developer
rudely reformatted all the tabs to spaces, causing me to have an entire-file
merge conflict which took hours to resolve).

*What went wrong*

Basically, the commits from both of you are generally far too big. There's
nothing fundamentally wrong with big commits, but they should address just a
single issue. For example, my commit
39<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/39>added
a lot of files, but they were all added at the same time because they
were all part of the GWT template. Similarly, my commit
29<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/29>changes
fourteen files, but it isn't out of hand, because it did exactly the
same thing to each file (add a package import), and therefore, all of those
changes belong in the same diff.

By contrast (and I'm only picking on Prageeth because he's done a lot of
"heavy lifting"), commit
33<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/33>is
a 1265 line diff, with four independent changes mentioned in the log.
Looking at that diff, it does all of the following:

   - Deletes and adds new imports to various files.
   - In Achievement.java, it deletes a huge number of lines, and adds them
   all back again with slightly different indentation. At the same time, it
   adds various new methods.
   - Adds tons of new methods to a wide range of classes.
   - Deletes some methods.
   - Adds new heading comments, tidying up the source code a bit.
   - In Game.java, a bunch of blank lines are deleted, more are added, and
   some blank lines just change the number of spaces/tabs. Notice the number of
   red/green lines in the Launchpad summary.

OK so *most* of those things are helpful changes to the source code. But
none of those activities are in any way related, and therefore they should
not appear in the same commit. The changes which helped to tidy up the code
(moving things around, adding comments, deleting unused imports, etc) are
helpful, but nowhere near as important as adding new features. Therefore,
add new features in one commit, tidy up in another. Then if I see a commit
log "added new features" I will read it. If I see another commit "tidy up
source code" I will ignore it. Do not feel embarrassed to make "a whole
commit" just to clean up some comments -- it is much better to do trivial
tasks in their own commit rather than include them with a bunch of other
changes.

In Achievement.java, we have obviously had tabs converted to spaces or vice
versa. The entire file shows as a huge delete and a huge add. Therefore, the
diff is *completely useless*. I can't possibly tell what you have actually
changed in that file -- I can see that setGame is not in the red, only the
green, so I guess you added that method. But I can't tell that you added
that method from the diff, because the diff shows that you added *everything
*. This is a small-scale example of what happened last night (you deleted
and added a dozen files, *as well* as making around 500 lines of changes to
those files -- I couldn't possibly tell what those 500 line changes were
since all bzr diff showed was a dozen new files being added). So firstly, do
you need to reformat all the tabs into spaces? That isn't something you
should do lightly (as it means you take credit for all the lines of code and
we can't see who really wrote them using bzr blame), but if it is, it is
absolutely something which should take place in a separate commit, "changed
tabs into spaces for consistency." You should *never* reformat code as well
as making actual meaningful changes. Note: I am aware that Eclipse probably
did this without you directly doing it, but I'll address that later.

As for Game.java, there are just tons of lines being added and deleted for
no reason whatsoever. Not as bad as Achievement, but they mess up the diff
and make it very hard to read. These aren't *intentional* changes I assume,
so why are they in the diff?

*How to make better commits*

So here is the bottom line: revision control is not about "it's time to make
a backup" or "I'd better share my work with everyone else." The most
fundamental aspect of revision control is that *you are absolutely in
control of the changes you make*. The most important feature of any revision
control system is "bzr diff". It tells you exactly what changes you are
about to permanently make to the project. So Eclipse might be responsible
for reformatting a bunch of lines of code without your permission, but
*you*are the one responsible for making them permanent by committing
them. "bzr
diff" shows you exactly what changes you/Eclipse has made, "bzr commit"
confirms that you approve of them. You should always run bzr diff
immediately before bzr commit. If there is a single line of the diff which
you did not intend to make or you cannot explain, then you should not be
committing. If you see a random blank line has been added and there is no
reason for it to be there, go back and delete it, so it doesn't pollute the
diff. If a block of code has changed its indentation, go back and fix it so
it is indented *exactly* the way it was before, so that the diff shows only
the changes that are important. If you really want to fix up the
indentation, fix it in a separate commit.

Working like this gives you a huge benefit: you can be (relatively) sure you
are not accidentally breaking some existing code. For example, if you have
put a random debug print somewhere in some existing code, it will stick out
like a sore thumb in the diff, and you can delete it before you commit.
However, if you have a random debug print in a file that has had its
indentation reformatted, then you won't notice it at all, and it may end up
in the final product. Prageeth, I don't blame you for not knowing how to use
bzr mv, but if you had checked the diff before committing r45, you would
have never been able to explain all of those 3000 lines, and therefore
should have known it was a bad idea. Scott, I don't blame you for not
knowing how .bzrignore works, but if you had checked the diff before
committing r46, you would have known that you were about to commit those
lines which you thought were private to yourself. Remember, you are not
committing your directory -- you are committing the diff, and that is what
everybody else will see.

It is very important to not commit bug fixes alongside any other changes.
Each bug fix should have its own individual commit, so that we can test,
yes, before this commit it was exhibiting a bug; after the commit it is not
(and nothing else changed or broke). It's also not good to commit code that
doesn't compile, as some of the commits I fixed last night did not. Please
use 'ant javac' (since ant is very slow, 'ant javac' at least compiles the
server-side stuff) before committing.

You'll note that when I applied Prageeth's changes last night, I did them in
a very atomic way. Revision
45<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/45>is
now
*solely* for moving files. If you ever move files, it is a good idea to do
nothing else in that commit other than the move. Note that you can bzr mv a
directory rather than a lot of individual files -- that makes the log look
much cleaner. Note that in r45, I still modified a *huge* number of files,
but I didn't do anything other than fix up the package imports, which is
necessary for r45 to compile successfully. I committed all of the remaining
changes in revision
46<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/46>,
so it is now possible to see exactly what Prageeth changed in those files
which were previously moved. I also noticed in the changes he emailed me
that Role was moving out of UserGroups, and that was causing quite a few
package imports to change, so I decided to separate that out from the
remaining changes and commit it on its
own<http://bazaar.launchpad.net/%7Emugle-dev/mugle/trunk/revision/50>.
I note that the subsequent 4 commits by Prageeth were perfectly reasonable,
so good job.

*Tools*

Finally, some tools for working with diffs. Now there are often times when
you are working on a big change and you find that you need to make a small
fix and commit that as a tiny commit. "bzr shelve" is incredibly useful for
this; it's how I managed to take Prageeth's huge commit and separate out
different parts. You just type "bzr shelve" and then it will go through the
whole diff and ask you "Shelve this change?" You just hit y or n (yes or no)
for each piece of the diff. For everything you hit 'y' on, it will
temporarily undo that change, reverting it back to the way it was before you
changed it, but it will store the change in the "shelf". So you should hit
'y' on everything you *do not want to commit*, and 'n' for everything you do
want to commit. Then you can commit just the changes you need. Afterwards,
run "bzr unshelve" to bring all the shelved changes back. "bzr shelve
--list" shows everything you have shelved. Importantly, you need to reload
the files in your text editors after using shelve or unshelve, or they will
be out of date and chaos will ensue.

Also you might want to start looking at branches. Branches are a fairly
advanced concept, but they are useful when you are working on something that
is simply too big to commit all in one go, but you can't commit in pieces or
it will break things. (Generally, new features.) Prageeth, you mentioned
that you have a large upcoming database change that won't be ready for a few
days. This scares me, as it implies many thousands of lines will change. It
also implies you will go days without committing, which means lots of
uncommitted changes that have a habit of going missing. What you might want
is a branch. A branch is just like a copy of the repository, where you are
free to break things. In 'trunk' you should never commit code that doesn't
work. But in a branch, you can commit half-implemented code or even code
that doesn't compile, and nobody will yell at you. When you are finished
with the branch, merge it back into trunk. It will show up as a single huge
commit, but it will have a lot of sub-commits showing the individual steps,
so it doesn't break the above guidelines.

To do this, you need to create a new directory for the branch (which is why
I recommend you have a directory "mugle" with a subdirectory "trunk" with
your checkout, then you can put other branches in the "mugle" directory). So
for example, come up with a name for the branch, such as "database-wrapper"
and go to the mugle directory:
mugle$ bzr branch trunk my-branch # (my-branch is the name of the branch)
mugle$ cd my-branch
mugle/my-branch$ bzr push lp:~mugle-dev/mugle/my-branch # (my-branch is the
name of the branch)
mugle/my-branch$ bzr bind

Now you can work in my-branch. Commits will still be public on Launchpad but
they will not interfere with the main trunk. When you are finished
implementing the branch:

mugle$ cd trunk
mugle/trunk$ bzr merge ../my-branch

Now this merges the branch with the trunk and you should deal with any
conflicts and make sure it still compiles. Then simply bzr commit to finish
off the branch -- now the branch changes are integrated with trunk.

I can demonstrate either of these tools in person if you would like to use
them.

One final point, just for house-keeping. If the commit you are making fixes
a bug, please add to the "bzr commit" line --fixes=lp:BUG-ID. For example,
in revision 39, I added --fixes=lp:722957. This helps link the bugs to the
commits so we can see where they were closed off. If you forget, it isn't a
big deal, but it's helpful to do that.

Thanks for reading. As I said, I don't expect you to follow all of the above
guidelines perfectly, and don't let it interfere with your work too much.
But, I don't want to have to read any more 1000-line+ diffs.

Matt
Follow ups

Re: Proper revision control practices
From: Matt Giuca, 2011-03-09