Tuesday, September 13, 2016

Git: Rebase VS Merge

Distributed version control (git): Check.
Feature branches: Check.
Rebasing: Ugh, serioiusly.

On multiple projects I've attempted to introduce the concept of rebasing commits.  It rarely goes well (at first).  The initial reaction tends to be reluctance and confusion.  Reluctance, because it adds complexity and appears to give nothing back.  Confusion, because doing it wrong, and intertwangling feature branches in particular, turns out to be way too easy.
This is the first post in a series that covers how and why to make a clean commit history.  In this post I'll cover the history of merging, the difference between merging and rebasing, and the golden rule of rebasing.

In subsequent posts I'll cover the benefits of rebasing, cover common pitfalls, and provide several concrete workflow tips you can implement today to leave your project with a simple, clean, and linear source history with the ability to easily track down newly introduced issues.

The Good Old Days

Remember Source Safe and CVS?  Those were the good old days.  You'd simply check out files you were working on, and no one else could touch them.  No merge conflicts, ever.  It just worked, right?
That was great - right up until someone went on vaction and left a ton of files checked out, leaving the entire team unable to work.  Oh yea, not so great.
But remember subversion?  Those were the good old days.   Optimistic concurrency to the rescue.  Multiple people could work on the same file and 90% of the time it just worked.  And branching was super easy.
Subversion was great.  Right up until you had to merge branches.  Then it was time switch projects, and ASAP.  Merging was always a complete disaster.  Maybe subversion wasn't so great after all.
Git has revolutionized source control, first and foremost by making branching and merging easy.  Distributed version control systems (DVCS) like git allow us to work on features independently, and defer integrating with the rest of the team until the time is right.  Since we can commit locally, we can use small, frequent commits, thus allowing us to break complex features into smallier pieces and allowing our commit history to tell a story.
However, with git, while merging may be much easier than it once was - we still have to choose between two approaches.

To Merge or Not, That is the Question

The easiest option, and what most do by default is git merge.  Say you've started feature.awesomeness and committed "B" and "D" to it.  Except meanwhile back on develop someone committed "C".  
pre-merge.png
No big deal, you merge develop into feature.awesomeness with git merge develop and give it a nice descriptive commit message like "Merge branch 'develop' into feature.awesomeness".
merge-commit1.png
That last commit, a merge commit, is a necessary side-effect, and a teltale sign, of merging.  You'll end up with one of these sketchy looking commits each time you want to integrate with (pull from) another branch.
When the feature is done you can merge it back to develop with git checkout develop and git merge feature.awesomeness. And thanks to the magic of fast-forward you won't have a second commit in develop.
post-merge.png
While there the extra merge commit may be aesthetically unpleasing, and the lines branching and merging in the diagram a little noisy; this merge approach is easy, it's clear what happened, the commits maintain their chronology (A, B, C, D).

Marty McFly: "Yeah, well, history is gonna change"

The alternative is rebasing.  With this technique, you replay all of your local commits on top of another branch (usually develop or master).  This essentially deletes your original commits and does them again in the same order.  The other branch becomes the new "base".  So In the scenario above you'd do git rebase develop and end up with:
rebase1.png
Pretty.  Clean.  You can pull from develop as many times as you want, and you'll never get extra commits.
If you're done, you switch back to develop and with a quick git merge and a free fast-forward you get:
rebase2.png
Notice that the commits are no longer chronological (A, C, B, D rather than A, B, C, D) and that B and D both have the same commit time, 21:07.  Duplicate checkin times are the teltale sign of rebasing.    
Sounds just as easy as merging, looks clean, but rebasing can be dangerous.

Golden Rule of Rebasing

Since rebasing changes history, the most important rule is never to rebase after you're pushed to a remote branch.  Or, more subtly, never rebase if someone might be using using your remote branch (e.g. pure backups are ok).  The reasoning: if you've changed history, you can only push again with a git push -force, and forcing the push will overwrite remote history, making a mess for anyone needing to reconcile your old commits with your new ones.
An important corollary to the golden rule of rebasing is that if you push your local feature branch to origin but don't merge it to origin/develop, you are (usually) giving up the opportunity to rebase ever again on that branch.  In other words, if you want the benefits of rebasing (generally) refrain from pushing until you're ready to merge it back.
For example imagine you have a local unpushed branch like this:
goldenruleB1.png
Now you'd like to share your commit B with the world.  However, you aren't quite ready to merge it back to develop yet, and so you push without merging:
goldenruleB2.png
This may look fine, but you've just lost your ability to rebase.  This can be verified by examining the case where someone subsequently makes a change back on develop (C) while you have unpushed local changes (D):
goldenruleB3.png
Now if you rebase, you'll rewrite B, and that will result in a two commits called B, one locally and one on the server in origin/feature.awesomeness:
goldenruleB4.png
In this case git push -force can sometimes help, but it can also open a whole other can of worms.  The best option is to maintain the ability to rebase isby not pushing until it's absolutely necessary.

Conclusion

Hopefully you've learned a little about the history of merging, what merge commits are, how rebasing can elimiate merge commits, and a little of how to avoid rebasing trouble.  
But what if you're working on a large feature and you want to share it with another branch, but not merge back yet?  Are the benefits of rebasing really worth it?  This post has been more about tactics.  I'll cover strategy in subsequent posts.