The erroneous Git Merge Commit ...

03 November 2015 - Git

I'm a huge fan of Git, but have certainly found it has a bit of a learning curve when first starting off. Whilst I now feel extremely confident using it, there were quite a few 'aha' moments required to get here. Git gives you a huge amount of power and flexibility - but it is frustrating that unless all developers working on a codebase properly understand it, especially rebasing, the codebase tends to end up with LOTS of erroneous merge commits, making the Git history a complete mess.

What do I mean by erroneous merge commit? Well, take these screenshots as an example ...

Both of these screenshots demonstrate the same work being done by the same people. The first screenshot shows what the Git history would look like if Bob and Anna wrongly used merge instead of rebase. The second screenshot shows what the Git history would look like if they had used rebase instead.

In the first screenshot - notice all the merge commits? Notice how the history isn't linear? Notice how messy it looks? This is completely unnecessary and makes it hard to see what's going on. And this example only has a few commits involving just two people! Imagine a team working on a codebase this way with a long history of commits! Yuck!

Before we get into this - it's important with Git to think of your history as a graph of nodes, not a list of check-ins. Each commit is a node in the graph, with the previous commit being that node's parent. Once you do this, it becomes much easier to visualise what's happening.

So what's happening in the first screenshot above? Where does this merge commit come from?

It happens when a developer has made local commit(s) and, before they've had a chance to push them, someone else has pushed commits of their own, meaning the branch diverges ...

In this example, the commit in green was the latest commit. Then I committed the commit in blue locally. However, before I pushed the commit in blue, Bob pushed his two commits. As Bob doesn't have my blue commit, we now have a divergence where the blue commit and Bob's first commit both have the same parent (the green commit).

At this point, if I just merge and push, then we get this erroneous merge commit. However, if I did a simple rebase before pushing, it would straighten up the commits into a neat straight line.
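
To see this for yourself, the scenario can be reproduced in a throwaway directory. This is only a sketch: the repository names (seed, remote.git, me, bob), the file names and the add_commit helper are all invented for illustration, and a bare repository stands in for the shared remote. It builds the divergence, then tries a merge and a rebase on separate branches so the two results can be compared.

```shell
set -e
cd "$(mktemp -d)"    # work in a scratch directory

# Fixed identity so commits work on any machine (illustrative values).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# add_commit DIR FILE MESSAGE - hypothetical helper: commit one new file in DIR.
add_commit() {
    echo "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

# The green commit, shared by everyone via a bare "remote".
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
add_commit seed green.txt "green: latest shared commit"
git clone -q --bare seed remote.git
git clone -q remote.git me
git clone -q remote.git bob

# I commit locally; Bob commits twice and pushes before I do.
add_commit me blue.txt "blue: my local commit"
add_commit bob bob1.txt "Bob: first commit"
add_commit bob bob2.txt "Bob: second commit"
git -C bob push -q origin master

cd me
git fetch -q origin
# Number of commits only on my master, then only on origin/master.
git rev-list --left-right --count master...origin/master

# Try both approaches on separate branches so they can be compared.
git branch merged master
git branch rebased master

git checkout -q merged
git merge -q --no-edit origin/master
merges_after_merge=$(git rev-list --count --merges merged)

git checkout -q rebased
git rebase -q origin/master
merges_after_rebase=$(git rev-list --count --merges rebased)

echo "merge commits after merge:  $merges_after_merge"
echo "merge commits after rebase: $merges_after_rebase"
git log --oneline --graph merged rebased
```

The merged branch ends up with one merge commit, while the rebased branch has none and simply places the blue commit after Bob's two commits.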

Unfortunately, if you just use pull and push blindly, pull defaults to the merge method.
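
If you'd rather have pull do the rebase for you, Git has a pull.rebase setting, plus a --rebase option on git pull for one-off use. The sketch below sets it for a single scratch repository; the repository names, file names and the add_commit helper are made up for illustration. Note that, depending on your Git version, a plain git pull on diverged branches may stop and ask you to choose a strategy rather than silently merging.

```shell
set -e
cd "$(mktemp -d)"    # work in a scratch directory

# Fixed identity so commits work on any machine (illustrative values).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# add_commit DIR FILE MESSAGE - hypothetical helper: commit one new file in DIR.
add_commit() {
    echo "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

# Shared starting point, a bare "remote", and two clones.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
add_commit seed green.txt "green: latest shared commit"
git clone -q --bare seed remote.git
git clone -q remote.git me
git clone -q remote.git bob

# The branches diverge: I commit locally, Bob pushes first.
add_commit me blue.txt "blue: my local commit"
add_commit bob bob1.txt "Bob: first commit"
git -C bob push -q origin master

cd me
git config pull.rebase true    # this repository only; --global would apply everywhere
git pull -q                    # fetches, then rebases my blue commit on top of Bob's
merges=$(git rev-list --count --merges HEAD)
git push -q origin master      # plain fast-forward push, no merge commit

echo "merge commits in history: $merges"
```

For a single pull without changing any configuration, git pull --rebase does the same thing.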

So what is a rebase?

So how do we fix this? First let's define what rebasing means.

When you rebase, all you're doing is replaying a series of existing commits. By replaying, I mean re-committing those commits. Git does this for you - you just tell it which commit you want to rebase onto, and it'll re-commit your commits on top of it, in order. If you're dealing with just a single local branch, then a non-interactive rebase obviously will not achieve much: you're just replaying the same changes you already have. However, if you're dealing with another branch (even if it's just the remote version of your current branch), then a rebase can be used to tidy up diverging branches. Git works out the commit where the branches diverged, moves to the tip of the other branch, and then replays the commits from your branch on top of it, resulting in a nice, straight, clean history to push!

As a basic rule, the choice of whether to rebase or merge depends on whether the divergence only exists locally. If the divergence has already been pushed, then it's too late to rebase. But if, as in the example, it's not yet been pushed, then use rebase instead.
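
One way to convince yourself that a rebase really does re-commit your work is to compare a commit before and after. In this sketch (the scratch repositories, file names and add_commit helper are invented for illustration), the blue commit comes out of the rebase with a new SHA and a new parent, while git patch-id reports that the change it carries is the same.

```shell
set -e
cd "$(mktemp -d)"    # work in a scratch directory

# Fixed identity so commits work on any machine (illustrative values).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# add_commit DIR FILE MESSAGE - hypothetical helper: commit one new file in DIR.
add_commit() {
    echo "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

# Shared starting point, a bare "remote", and two clones.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
add_commit seed green.txt "green: latest shared commit"
git clone -q --bare seed remote.git
git clone -q remote.git me
git clone -q remote.git bob

add_commit me blue.txt "blue: my local commit"
add_commit bob bob1.txt "Bob: first commit"
git -C bob push -q origin master

cd me
git fetch -q origin

# Record the blue commit's identity and the fingerprint of its change.
old_sha=$(git rev-parse HEAD)
old_patch=$(git show HEAD | git patch-id | cut -d' ' -f1)

git rebase -q origin/master    # replay blue on top of Bob's commit

new_sha=$(git rev-parse HEAD)
new_patch=$(git show HEAD | git patch-id | cut -d' ' -f1)
new_parent=$(git rev-parse HEAD^)

echo "blue before: $old_sha"
echo "blue after:  $new_sha"
echo "parent is now origin/master: $new_parent"
```

The differing SHAs are the reason rebasing is best kept to commits nobody else has seen yet: to Git, the replayed commit is a brand new commit.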

Going back to our earlier example ...

... you can see the divergence where a developer has a local commit they haven't pushed, and someone else has pushed some other changes.

You can clearly see here that the two branches (local and remote) have diverged. If you merge at this point, then Git will bring these two branches back together, creating a merge commit (the red erroneous merge commit in the earlier diagram). If you push this, then it becomes permanent and makes the project history a mess and hard to follow. It's worth noting at this point that if you have created this merge commit but not yet pushed it, you can still do the rebase to tidy it up before pushing. A rebase (which, remember, is just a replay of commits) leaves merge commits out of the replay by default, and because this merge commit doesn't contain any meaningful changes of its own, nothing is lost when it disappears.

So how do we rebase? If you're using a GUI, then you should just be able to right-click on the origin/master commit and choose 'rebase'. If you're using the command line, run git fetch first so that origin/master is up to date, then type git rebase origin/master. It's that simple. This of course presumes you're working on your master branch. If not, replace 'master' with your branch name.

But isn't rebasing dangerous?

A common concern people have is that they might lose their changes when using rebase. You're actually fairly safe with this. You can't rebase if you have pending changes, so that means you must have committed or stashed all of your changes before rebasing. So that rules out your pending changes being lost by doing a rebase.

As for losing local commits that haven't yet been pushed - Git also has a really useful feature called the 'reflog'. This is basically a history of everything you've done in your local repository: each time HEAD moves (a commit, checkout, merge, rebase or reset), Git records where it pointed. If you type git reflog --date=iso into the command prompt, you'll get a list of these changes. The output looks very similar to that of 'git log' ...

It includes commit identifiers (SHAs), which you can then check out as you normally would with git checkout <sha>. This has certainly saved me a few times!
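
As a rough illustration of that safety net, the sketch below "loses" a commit and then finds it again through the reflog. Everything here is an assumption made for the demo: the scratch repository, the file name, and the use of git reset --hard as a stand-in for whatever operation made the commit disappear. It creates a branch at the recovered commit rather than checking the SHA out directly, so the commit stays reachable afterwards.

```shell
set -e
cd "$(mktemp -d)"    # work in a scratch directory

# Fixed identity so commits work on any machine (illustrative values).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q repo
cd repo

echo one > file.txt
git add file.txt
git commit -q -m "first"

echo two >> file.txt
git commit -q -am "second"
lost_sha=$(git rev-parse HEAD)

# Simulate losing the "second" commit: the branch no longer points at it.
git reset -q --hard HEAD~1

# The reflog still remembers where HEAD used to be.
git reflog --date=iso

# HEAD@{1} is where HEAD pointed immediately before the reset.
found_sha=$(git rev-parse 'HEAD@{1}')
git branch recovered "$found_sha"

echo "lost:      $lost_sha"
echo "recovered: $(git rev-parse recovered)"
```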

Should you ever use merge then?

A point to note is that you should never (or not without very good reason) modify commits that have already been pushed and then push them again. You should only rebase unpushed local commits. Git won't actually let you push modified commits that already exist on the remote unless you specify the 'force' flag.

Also, if you use feature branches where multiple people are collaborating on a feature branch, then you will obviously need to push this new branch, so rebasing the branch as a whole would no longer be suitable in this case. Rebasing still applies to your own unpushed commits though - you still want to avoid the erroneous merge commit in your feature branch!

Merging is also useful locally because you can't rebase if you have pending uncommitted changes. So if you want to integrate someone else's changes into your working copy without stashing, rebasing, then unstashing - then there's no harm in merging at this point. It'll create a merge commit - but that's fine as long as you don't push it. Just make sure you tidy up before pushing by rebasing against 'origin/master' once you've committed your local changes. As mentioned earlier, the merge commit will be removed at this point, as it contains no changes.
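
That workflow can be sketched end to end in a scratch directory. The repository names, file names and the add_commit helper are hypothetical, and the example assumes Bob's changes touch different files from my pending edit (otherwise the merge itself would refuse to start).

```shell
set -e
cd "$(mktemp -d)"    # work in a scratch directory

# Fixed identity so commits work on any machine (illustrative values).
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

# add_commit DIR FILE MESSAGE - hypothetical helper: commit one new file in DIR.
add_commit() {
    echo "$3" > "$1/$2"
    git -C "$1" add "$2"
    git -C "$1" commit -q -m "$3"
}

# Shared starting point, a bare "remote", and two clones.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
add_commit seed green.txt "green: latest shared commit"
git clone -q --bare seed remote.git
git clone -q remote.git me
git clone -q remote.git bob

add_commit me blue.txt "blue: my local commit"
add_commit bob bob1.txt "Bob: first commit"
git -C bob push -q origin master

cd me
git fetch -q origin
echo "work in progress" >> blue.txt    # pending, uncommitted change

# Rebase refuses to start while there are uncommitted changes.
if git rebase origin/master 2>/dev/null; then
    rebase_refused=no
else
    rebase_refused=yes
fi

# Merge is allowed, and leaves my pending change in place.
git merge -q --no-edit origin/master
git commit -q -am "finish my change"
merges_before=$(git rev-list --count --merges HEAD)

# Tidy up before pushing: the merge commit is left out of the replay.
git rebase -q origin/master
merges_after=$(git rev-list --count --merges HEAD)

echo "rebase refused with pending changes: $rebase_refused"
echo "merge commits before tidy-up: $merges_before, after: $merges_after"
```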

Summary

I hope this post clears things up a bit for anyone having trouble understanding when to use merge and when to use rebase. If there's anything in this post that's unclear or you think can be improved upon, then please let me know!

Happy rebasing!