Git, beyond version control
I remember the first time I started learning Git. At that time I was an SVN user, and I didn’t actually learn it so much as spend most of my time being mad at it until it clicked. And when it did, I realized it is an awesome tool.
Even today, years later, I can’t consider myself a Git expert; every day I discover new useful commands. But over the years I have shifted from using Git solely as a version control system to also using it as an analysis tool. And I mean analysis of almost everything: your codebase, your team, your teammates… anything.
But before you get to the analysis part, you will first go through a brief intro about Git and some useful commands.
Git in a nutshell
Git is a version control system, pretty much like CVS, SVN, Bazaar, Mercurial, etc. But it has some interesting concepts. I am going to assume that you are familiar with SVN, mainly because of its popularity and because I will use it to help during this brief intro.
- Git is a distributed system (as opposed to SVN, which is centralized). If you remember your days on SVN, your repository lived on a central (probably remote) machine, you checked out a working copy of it and, in order to work with it, you needed to be in constant communication with your remote repo.
In Git, there is no central repository. You probably use it that way because you want to have a source of truth, but it is not necessary. When you check out the repository, you are actually cloning it entirely, history included, and your clone can even serve as the source of truth for other developers later on.
- Git is an offline system (as opposed to SVN, which is online). Remember that when working with SVN, if you wanted to commit something to the central repository, you had to have a connection to it. With Git, this is not necessary. You can work locally and offline as much as you want, commit changes locally and postpone promoting (pushing) those changes until a later time when you have a connection to your main repository.
- In Git, the essential unit of information is a commit (as opposed to a revision in SVN). A Git commit carries a lot of metadata (e.g. the author’s name, the date, etc.), including a link to its previous commit. All that information is hashed into what we know as the commit hash, which will of course be different whenever any of that metadata changes. In SVN a revision can be seen as a snapshot of your repository at a given point in time, so every time you did a commit in SVN, that was (logically) considered a new snapshot. This leads to some crucial differences between Git and SVN, one of them being how branches are represented. A branch in Git is just a label pointing at a commit. In SVN a branch was actually a directory in your tree. Remember that in SVN you had your trunk (the equivalent of the master branch) and some other directories that were your branches.
- Git allows you to rewrite history (as opposed to SVN, where that is not possible). I will not go into many details here, but in Git you can undo a change with a new commit that reverts it OR, in a more clever manner, you can actually change or even delete a commit from the history, as if it never happened. This is mainly because the commit is the logical unit in Git.
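To make the last two points concrete, here is a minimal sketch run against a throwaway repository. The file name, the commit messages and the "Example Dev" identity are made up for illustration. It prints a raw commit object, shows that the branch name resolves to a commit hash, and shows that rewording the message is enough to produce a different hash.

```shell
#!/bin/sh
set -eu

# Throwaway repository; every name below is illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # same branch name whatever the local default is
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "hello" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "world" >> file.txt
git commit -q -am "Second commit"

# The commit object: a tree, a parent link, author, committer and the message.
git cat-file -p HEAD

# A branch is a label that resolves to a commit hash.
branch_hash=$(git rev-parse master)
head_hash=$(git rev-parse HEAD)
echo "master -> $branch_hash"

# Rewording the message changes the metadata, and therefore the hash.
git commit -q --amend -m "Second commit, reworded"
amended_hash=$(git rev-parse HEAD)
echo "before amend: $head_hash"
echo "after amend:  $amended_hash"
```

The amend at the end is also a small taste of history rewriting: the old commit is not edited in place; a new commit with a new hash takes its position on the branch.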
Useful Git-tips and commands
Everybody gets by just fine using the basic Git commands. In this section I want to show a few more commands that will make your life a bit easier.
Rebase over merge
Many people underestimate the power of a clean Git history. I, however, find it a very powerful tool, both to understand the evolution of your codebase and to onboard new people.
Git merge is probably one of the most overused and abused commands, and choosing it over rebase can make the difference between a clean history and a totally messed-up one.
Merging brings two branches together, preserving the commit graph of each. Rebasing, however, unifies the branches by rewriting the commits of the source branch so that they appear as descendants of the destination branch.
Take the example below:
          D - E - F - G - feature
         /
A - B - C - I - J - K - master
Running git merge feature on master would result in the following graph.
          D - E - F - G
         /             \
A - B - C - I - J - K - L - master (merged)
Running git rebase master on feature would result in the following graph.
                      D' - E' - F' - G' - feature
                     /
A - B - C - I - J - K - master
When merge is used, a new commit ‘L’ needs to be created to join both branches. With rebase, by contrast, the commits of the feature branch are rewritten so that they appear as descendants of the master branch. If both branches were merged together at that point, master would be fast-forwarded to feature and the history would look like one continuous line.
Now, you might argue that the merge graph is not that bad and is very understandable but, in reality, when you’re working on a team of a handful of people, all committing new features, that graph gets very gnarly very quickly. If, on the other hand, everybody rebases, the history stays a clean straight line.
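The rebase-then-fast-forward flow can be sketched as follows. This is only an illustration in a throwaway repository: the make_commit helper is hypothetical, and the commit names mirror a shortened version of the graph above (A, B, C and I on master, D and E on feature).

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: one commit per call, each touching its own file.
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit A; make_commit B; make_commit C
git checkout -q -b feature
make_commit D; make_commit E
git checkout -q master
make_commit I

# Replay the feature commits on top of master...
git checkout -q feature
git rebase -q master

# ...so that master can simply be fast-forwarded, with no merge commit.
git checkout -q master
git merge -q --ff-only feature

git log --oneline --graph
merge_commits=$(git rev-list --merges --count HEAD)
history=$(git log --reverse --format=%s | xargs)
echo "merge commits: $merge_commits, history: $history"
```

The --ff-only flag is a deliberate choice here: it makes the merge fail instead of silently creating a merge commit if feature was not rebased first.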
Cherry-pick over duplicate commits
How many times have you been working on a feature branch, gone back to your master branch and realized that you needed that one change you just made in the feature branch? How many times have you duplicated that change in the master branch?
Well, there is a better way: cherry-pick the commit that contains that change.
Imagine you’re in this situation, and you want the change you made in “E” to be part of master.
          D - E - F - feature
         /
A - B - C - I - J - K - master
You can just cherry-pick “E” from the feature branch onto master.
          D - E - F - feature
         /
A - B - C - I - J - K - E' - master
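The cool thing about cherry-picking is that, although E’ and E are effectively different commits (they have different hashes because they have different metadata, such as the parent), they introduce the same change. When you later merge the two branches, Git sees the identical change on both sides and normally merges it cleanly, and a rebase will skip a commit whose patch has already been applied.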
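Here is a small sketch of that situation, again in a throwaway repository with a hypothetical make_commit helper and a shortened history. It picks E onto master, confirms the new commit has a different hash, and then uses git cherry, which marks with a "-" the feature commits whose change already exists on master.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: one commit per call, each touching its own file.
make_commit() {
  echo "$1" > "$1.txt"
  git add "$1.txt"
  git commit -q -m "$1"
}

make_commit A; make_commit B; make_commit C
git checkout -q -b feature
make_commit D; make_commit E; make_commit F
git checkout -q master
make_commit I

# E is the parent of F, the tip of feature.
e_hash=$(git rev-parse feature~1)
git cherry-pick "$e_hash"
picked_hash=$(git rev-parse HEAD)
echo "E  = $e_hash"
echo "E' = $picked_hash"

# "+" means the change is only on feature, "-" means master already has an equivalent.
cherry_out=$(git cherry master feature)
echo "$cherry_out"
```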
Keep it insightful
Sometimes your commit messages turn out not to be good enough, but you can’t change them unless you rewrite the history. And this is not really an option when those commits are already pushed to a remote branch that many of your teammates are working on.
Git has a solution for this problem: git notes.
$ git notes -h
usage: git notes [--ref <notes-ref>] [list [<object>]]
   or: git notes [--ref <notes-ref>] add [-f] [--allow-empty] [-m <msg> | -F <file> | (-c | -C) <object>] [<object>]
   or: git notes [--ref <notes-ref>] copy [-f] <from-object> <to-object>
   or: git notes [--ref <notes-ref>] append [--allow-empty] [-m <msg> | -F <file> | (-c | -C) <object>] [<object>]
   or: git notes [--ref <notes-ref>] edit [--allow-empty] [<object>]
   or: git notes [--ref <notes-ref>] show [<object>]
   or: git notes [--ref <notes-ref>] merge [-v | -q] [-s <strategy>] <notes-ref>
   or: git notes merge --commit [-v | -q]
   or: git notes merge --abort [-v | -q]
   or: git notes [--ref <notes-ref>] remove [<object>...]
   or: git notes [--ref <notes-ref>] prune [-n | -v]
   or: git notes [--ref <notes-ref>] get-ref

    --ref <notes-ref>     use notes from <notes-ref>
You can add notes to any commit and the cool part is that those notes are not part of the commit itself, so the commit hash stays the same… no history shall be rewritten.
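A minimal sketch of that claim, assuming a throwaway repository; the file, the commit message and the note text are invented for the example. It records the hash, attaches a note, and checks that the hash did not move.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

echo "total = price * qty" > invoice.txt
git add invoice.txt
git commit -q -m "fix stuff"   # the unhelpful message we can no longer change

before=$(git rev-parse HEAD)
git notes add -m "Fixes rounding of invoice totals" HEAD
after=$(git rev-parse HEAD)

note=$(git notes show HEAD)
echo "hash before: $before"
echo "hash after:  $after"

# git log prints the note in a separate "Notes:" section under the message.
git log -1
```

One thing to keep in mind: notes live under their own ref (refs/notes/commits by default) and are not pushed along with your branches, so sharing them takes an explicit push of that ref.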
Debug with Git… you read that right, debug
This is probably another hidden feature of Git: you can debug with it. Say you and your team just released code to production and a bug is appearing, but you are sure that the bug was not present in your development branch. Imagine you can reproduce the bug, but you don’t really know where it was first introduced; all you know is one commit that did not have that bug. Sound familiar?
You can track the bug down using git bisect.
The first thing you do is execute ‘git bisect start’ to begin the session, followed by ‘git bisect bad’ to tell Git that the current commit is broken. Next you tell bisect the last known good state using ‘git bisect good [good_commit]’. At this point Git will figure out how many commits exist between the good and the bad commit and will check out the one in the middle. You can now run your tests.
If they fail, the bug was introduced in this middle commit or before it. If they pass, it was introduced after it. Just mark the current commit with ‘git bisect bad’ or ‘git bisect good’ respectively and Git will check out a new middle commit (backward or forward). Repeat until Git reports the commit that introduced the bug, then run ‘git bisect reset’ to return to where you started.
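The whole loop can also be automated with git bisect run, which marks each commit good or bad based on the exit code of a command. The sketch below assumes a throwaway repository with eight commits where the fifth one appends the word "bug" to app.txt; the "test" is just a grep for that word, standing in for your real test suite.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Eight commits; the fifth one introduces the "bug".
for i in 1 2 3 4 5 6 7 8; do
  if [ "$i" -eq 5 ]; then
    echo "bug" >> app.txt
  else
    echo "line $i" >> app.txt
  fi
  git add app.txt
  git commit -q -m "commit $i"
done

good=$(git rev-list --max-parents=0 HEAD)   # the root commit is known to be fine

git bisect start
git bisect bad                # the current commit is broken
git bisect good "$good"       # the last known good state

# Exit code 0 marks a commit as good, 1 marks it as bad.
git bisect run sh -c '! grep -q bug app.txt'

# While the session is open, refs/bisect/bad points at the first bad commit.
culprit=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad commit: $culprit"

git bisect reset
```

With eight commits this takes about three steps instead of checking each commit in turn, which is where bisect pays off on long histories.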
Git-nalyze everything
This is my favourite part. Up until now you have seen a lot of (hopefully) useful Git commands that will help you improve your version control, and you have probably been using Git in this way and for this purpose.
But there is more. Git is an endless source of knowledge. Do you want to know how productive your team has been over any period of time? The impact of that architectural change on the productivity of your team? How your team reacted to that rearrangement? When your team is most productive? The answers to those questions and more are in Git.
And if you think about it, what better place could there be? At the end of the day, Git reflects everything about you as a developer: the habits, the disturbances, everything is in one way or another inside the Git history. You just need to know where and how to look for it.
Going through all the Git commands that give you insight into those things is well beyond the scope of this post, but there are plenty of tools out there that help you with that. One of those is gitstats, but there are plenty more. To see what I am talking about, take a look at the kind of information you can gather in the images below, generated from the Linux kernel (full analysis here).
With tools like this, you can gain great insight into the functioning of your team and react to it. Know things like when in the week your team is more or less productive, or correlate big refactorings, team reorganisations, etc. with team productivity, or how fast you are able to onboard new people, or what have you. All that information is right there, inside your Git history, and you just need to know how to look for it ;)
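If you want a taste of this without installing anything, a few plain Git commands already go a long way. The sketch below is only a starting point, not what gitstats does internally: it builds a tiny throwaway repository with made-up authors (alice, bob) and fixed dates through a hypothetical commit_as helper, then counts commits per author, commits per weekday, and changes per file. Against a real repository you would run just the three counting pipelines.

```shell
#!/bin/sh
set -eu
export LC_ALL=C   # keep weekday abbreviations in English

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example Dev"
git config user.email "dev@example.com"

# Hypothetical helper: commit a change to file $3 as author $1 on date $2.
commit_as() {
  echo "$2" >> "$3"
  git add "$3"
  GIT_AUTHOR_NAME="$1" GIT_AUTHOR_EMAIL="$1@example.com" GIT_AUTHOR_DATE="$2" \
    git commit -q -m "Change $3"
}

commit_as alice "2024-01-01T10:00:00" app.txt      # a Monday
commit_as alice "2024-01-01T15:00:00" app.txt
commit_as bob   "2024-01-02T09:00:00" README.txt   # a Tuesday

# Commits per author, busiest first.
by_author=$(git shortlog -sn HEAD)

# Commits per weekday, based on the author date.
by_weekday=$(git log --format=%ad --date=format:%a | sort | uniq -c | sort -rn)

# Files that change most often.
hot_files=$(git log --pretty=format: --name-only | sed '/^$/d' | sort | uniq -c | sort -rn)

printf '%s\n\n%s\n\n%s\n' "$by_author" "$by_weekday" "$hot_files"
```

Commit counts are a crude proxy for productivity, so treat numbers like these as a prompt for questions rather than as answers.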
If you liked the article, click the 💚 below so other people will see it here on Medium.