A Few git Tips

A screenshot of gitk --all
A screenshot of gitk --all

I owe thanks to Johannes Schindelin, who passed on at least three of these tips to me in the course of working on Fiji :)  Update: this page was written when I was relatively new to git – some of the general remarks about the philosophy of git are better expressed in a short tutorial I wrote recently.

gitk –all

Whenever you’re confused, use gitk –all to work out what’s going on.  The –all parameter is important so that you see all your local and remote-tracking branches. It might take a while to get used to reading the graphic representation of what’s going on, but it very often explains why things aren’t working as you might expect.

Finding text anywhere in your complete history

Particularly if you are in the habit of creating lots of branches (as I suggest below) it is easy to forget where you introduced a particular bit of code.  You can find it again simply with git log -S<search-term> –all.   For example, suppose you know that “Factory” occurred as part of the class name that you’re looking for then git log -SFactory –all will list all commits in your repository (including remote-tracking branches) that mention “Factory” in the change they introduce.  If you want to see the patch as well (which is often helpful) add -p to that command.  Update and correction: note that “git log -S<search-term>” doesn’t quite do what I said there: in fact it finds where there are different numbers of occurrences of the string in a file before and after that commit.  In git version 1.74 and later you can use -G<regex> instead, which has more obvious semantics – it displays commits where the change introduced genuinely adds or removes a line that matches the given regular expression.

If you want to find all the branches that one of those commits is on, see the next tip:

Finding commits by SHA1sum

There are a lot of common situations where it’s useful to find out all the branches that a particular commit is on.  For example, if you’re using submodules, then git submodule update will detach HEAD in each submodule in order to move it to the right commit.  You might, however, want to do some work on the branch that commit is on.  To do this, use git branch –a –contains <SHA1sum of commit>.  For example:

  $ git branch -a --contains 6f2293e7f6428
  * (no branch)
    origin/fiji

(This suggests that you should create a branch that tracks the remote-tracking branch origin/fiji, and you’ll find the commit in the history for that branch.)

git-ps1

Use __git_ps1

It’s very easy to forget which branch you’re working on, particularly once you get used to switching your working tree to different branches with git checkout.  You can use the  $(__git_ps1 ” (%s)”) in your PS1 environment variable to include some useful information about your current branch in your bash prompt.  If you are not currently in a git repository, this evaluates to the empty string.  Another very nice feature about __git_ps1 is that it will remind you if you’re in the middle of a rebase.  It’s easy to abandon a rebase but forget to do git rebase –abort at the end.  git status won’t remind you of this (a bug, IMHO) but if it’s there in your bash prompt, it’s hard to ignore.

The example in the image uses:

  PS1=' \[\033[1;37m\]: \u@\h:\w\[\033[0;37m\]$(__git_ps1 " (%s)")\n '

… which, if I recall correctly, was partly based on Tony Finch‘s.

Create lots of topic branches

In other version control systems branching and merging are awkward compared to git, and this means that people are often reluctant to create new branches.  One of the things that I love about working with git is that it’s quite easy and natural to create a new branch for every idea you have, and just merge them into your master branch when you’re happy with them.  This matches the way that I like to work on things, as well – you can work on one idea until you’re stuck, then switch to something easier for a bit, etc. all in one copy of the repository.

Only keep one copy of a repository per computer

It’s tempting when starting to use git to clone a repository several times and work on different things in each.  In general it’s much better (and more “git-like”) to just have a single repository and get used to switching about your working tree with git checkout <BRANCH NAME>.  If you want to quickly put aside what you’re working on, without having to carefully put together a commit, then use git stash.  (Beware when unstashing some local modifications with git stash appply however – you need to switch back to the right branch before running that command.)

Turn on all git’s coloured output options

There’s a nice summary on this git cheat sheet of the options you need to add to your .gitconfig to turn on git’s coloured output for terminals.

In addition, you might find the –color-words options to git log and git diff useful.  Rather than producing a line-by-line diff, it will coloured added text in green and removed text in red within a single line.  e.g. try git log -p –color-words

View a particular file from a particular commit

Fairly often, I just want to have a quick look at the version of a file in a particular revision.  The easiest way to do this is with the <COMMIT>:<FILENAME> syntax that git show understands.  For example, to see the version of a file util/TestQuantile.java from a couple of commits ago, you can do:

  git show HEAD^^:util/TestQuantile.java

… or to see it from a topic branch called “experiment” do:

  git show experiment:util/TestQuantile.java

Note that the path is always relative to the repository root – even if you’re in the “util” subdirectory, you still have to include “util/” in the <FILENAME> part.  You can use the same syntax to git diff to compare two arbitrary files from arbitrary revisions.  It’s also frequently useful to view a file from where one of your branches was at a particular date, such as:

  git show master@{2.weeks.ago}:util/TestQuantile.java

To be clear, this is where your own master branch was two weeks ago – it uses the reflog for master rather than commit dates to find the commit.

Use the git glossary

The git glossary provides a very succinct and useful summary of a lot of git concepts, and should be your first port of call if you don’t understand something in the manual pages.  Fundamentally (and perhaps contrary to its reputation)  git is built on simple ideas, and most of these can be understood from the definitions there.

Cryptic Crossword – Numpty / 2

This is the second 15×15 cryptic crossword that I’ve tried to set, dating from some time last year.  There are a few things in here that I wouldn’t do nowadays, but rather than rewrite clues for the same words over and over, I think it’s best to just see what people think and move on.  (The feedback I’ve had so far suggests that it’s mostly quite easy.)

Download crossword number 2 [PDF]

I will update this post with a link to the answers in a couple of days. You can find the answers and explanations of the clues in this crossword in this other post.

git: fetch and merge, don’t pull

This is too long and rambling, but to steal a joke from Mark Twain Blaise Pascal I haven’t had time to make it shorter yet.  There is some discussion of this post on the git mailing list, but much of it is tangential to the points I’m trying to make here.

One of the git tips that I find myself frequently passing on to people is:

Don’t use git pull, use git fetch and then git merge.

The problem with git pull is that it has all kinds of helpful magic that means you don’t really have to learn about the different types of branch in git. Mostly things Just Work, but when they don’t it’s often difficult to work out why. What seem like obvious bits of syntax for git pull may have rather surprising results, as even a cursory look through the manual page should convince you.

The other problem is that by both fetching and merging in one command, your working directory is updated without giving you a chance to examine the changes you’ve just brought into your repository. Of course, unless you turn off all the safety checks, the effects of a git pull on your working directory are never going to be catastrophic, but you might prefer to do things more slowly so you don’t have to backtrack.

Branches

Before I explain the advice about git pull any further it’s worth clarifying what a branch is. Branches are often described as being a “line of development”, but I think that’s an unfortunate expression since:

  • If anything, a branch is a “directed acyclic graph of development” rather than a line.
  • It suggests that branches are quite heavyweight objects.

I would suggest that you think of branches in terms of what defines them: they’re a name for a particular commit and all the commits that are ancestors of it, so each branch is completely defined by the SHA1sum of the commit at the tip. This means that manipulating them is a very lightweight operation – you just change that value.

This definition has some perhaps unexpected implications. For example, suppose you have two branches, “stable” and “new-idea”, whose tips are at revisions E and F:

  A-----C----E ("stable")
   \
    B-----D-----F ("new-idea")

So the commits A, C and E are on “stable” and A, B, D and F are on “new-idea”. If you then merge “new-idea” into “stable” with the following commands:

    git checkout stable   # Change to work on the branch "stable"
    git merge new-idea    # Merge in "new-idea"

… then you have the following:

  A-----C----E----G ("stable")
   \             /
    B-----D-----F ("new-idea")

If you carry on committing on “new idea” and on “stable”, you get:

  A-----C----E----G---H ("stable")
   \             /
    B-----D-----F----I ("new-idea")

So now A, B, C, D, E, F, G and H are on “stable”, while A, B, D, F and I are on “new-idea”.

Branches do have some special properties, of course – the most important of these is that if you’re working on a branch and create a new commit, the branch tip will be advanced to that new commit. Hopefully this is what you’d expect. When merging with git merge, you only specify the branch you want to merge into the current one, and only your current branch advances.

Another common situation where this view of branches helps a lot is the following: suppose you’re working on the main branch of a project (called “master”, say) and realise later that what you’ve been doing might have been a bad idea, and you would rather it were on a topic branch. If the commit graph looks like this:

   last version from another repository
      |
      v
  M---N-----O----P---Q ("master")

Then you separate out your work with the following set of commands (where the diagrams show how the state has changed after them):

  git branch dubious-experiment

  M---N-----O----P---Q ("master" and "dubious-experiment")

  git checkout master

  # Be careful with this next command: make sure "git status" is
  # clean, you're definitely on "master" and the
  # "dubious-experiment" branch has the commits you were working
  # on first...

  git reset --hard <SHA1sum of commit N>

       ("master")
  M---N-------------O----P---Q ("dubious-experiment")

  git pull # Or something that updates "master" from
           # somewhere else...

  M--N----R---S ("master")
      \
       O---P---Q ("dubious-experiment")

This is something I seem to end up doing a lot… :)

Types of Branches

The terminology for branches gets pretty confusing, unfortunately, since it has changed over the course of git’s development. I’m going to try to convince you that there are really only two types of branches. These are:

(a) “Local branches”: what you see when you type git branch, e.g. to use an abbreviated example I have here:

       $ git branch
         debian
         server
       * master

(b) “Remote-tracking branches”: what you see when you type git branch -r, e.g.:

       $ git branch -r
       cognac/master
       fruitfly/server
       origin/albert
       origin/ant
       origin/contrib
       origin/cross-compile

The names of tracking branches are made up of the name of a “remote” (e.g. origin, cognac, fruitfly) followed by “/” and then the name of a branch in that remote respository. (“remotes” are just nicknames for other repositories, synonymous with a URL or the path of a local directory – you can set up extra remotes yourself with “git remote”, but “git clone” by default sets up “origin” for you.)

If you’re interested in how these branches are stored locally, look at the files in:

  • .git/refs/heads/ [for local branches]
  • .git/refs/remotes/ [for tracking branches]

Both types of branches are very similar in some respects – they’re all just stored locally as single SHA1 sums representing a commit. (I emphasize “locally” since some people see “origin/master” and assume that in some sense this branch is incomplete without access to the remote server – that isn’t the case.)

Despite this similarity there is one particularly important difference:

  • The safe ways to change remote-tracking branches are with git fetch or as a side-effect of git-push; you can’t work on remote-tracking branches directly. In contrast, you can always switch to local branches and create new commits to move the tip of the branch forward.

So what you mostly do with remote-tracking branches is one of the following:

  • Update them with git fetch
  • Merge from them into your current branch
  • Create new local branches based on them

Creating local branches based on remote-tracking branches

If you want to create a local branch based on a remote-tracking branch (i.e. in order to actually work on it) you can do that with git branch –track or git checkout –track -b, which is similar but it also switches your working tree to the newly created local branch. For example, if you see in git branch -r that there’s a remote-tracking branch called origin/refactored that you want, you would use the command:

    git checkout --track -b refactored origin/refactored

In this example “refactored” is the name of the new branch and “origin/refactored” is the name of existing remote-tracking branch to base it on. (In recent versions of git the “–track” option is actually unnecessary since it’s implied when the final parameter is a remote-tracking branch, as in this example.)

The “–track” option sets up some configuration variables that associate the local branch with the remote-tracking branch. These are useful chiefly for two things:

  • They allow git pull to know what to merge after fetching new remote-tracking branches.
  • If you do git checkout to a local branch which has been set up in this way, it will give you a helpful message such as:
    Your branch and the tracked remote branch 'origin/master'
    have diverged, and respectively have 3 and 384 different
    commit(s) each.

… or:

    Your branch is behind the tracked remote branch
    'origin/master' by 3 commits, and can be fast-forwarded.

The configuration variables that allow this are called “branch.<local-branch-name>.merge” and “branch.<local-branch-name>.remote”, but you probably don’t need to worry about them.

You have probably noticed that after cloning from an established remote repository git branch -r lists many remote-tracking branches, but you only have one local branch. In that case, a variation of the command above is what you need to set up local branches that track those remote-tracking branches.

You might care to note some confusing terminology here: the word “track” in “–track” means tracking of a remote-tracking branch by a local branch, whereas in “remote-tracking branch” it means the tracking of a branch in a remote repository by the remote-tracking branch. Somewhat confusing…

Now, let’s look at an example of how to update from a remote repository, and then how to push changes to a new repository.

Updating from a Remote Repository

So, if I want get changes from the remote repository called “origin” into my local repository I’ll type git fetch origin and you might see some output like this:

  remote: Counting objects: 382, done.
  remote: Compressing objects: 100% (203/203), done.
  remote: Total 278 (delta 177), reused 103 (delta 59)
  Receiving objects: 100% (278/278), 4.89 MiB | 539 KiB/s, done.
  Resolving deltas: 100% (177/177), completed with 40 local objects.
  From ssh://longair@pacific.mpi-cbg.de/srv/git/fiji
     3036acc..9eb5e40  debian-release-20081030 -> origin/debian-release-20081030
   * [new branch]      debian-release-20081112 -> origin/debian-release-20081112
   * [new branch]      debian-release-20081112.1 -> origin/debian-release-20081112.1
     3d619e7..6260626  master     -> origin/master

The most important bits here are the lines like these:

     3036acc..9eb5e40  debian-release-20081030 -> origin/debian-release-20081030
   * [new branch]      debian-release-20081112 -> origin/debian-release-20081112

The first line of these two shows that your remote-tracking branch origin/debian-release-20081030 has been advanced from the commit 3036acc to 9eb5e40. The bit before the arrow is the name of the branch in the remote repository. The second line similarly show that since we last did this, a new remote tracking branch has been created. (git fetch may also fetch new tags if they have appeared in the remote repository.)

The lines before those are git fetch working out exactly which objects it will need to download to our local repository’s pool of objects, so that they will be available locally for anything we want to do with these updated branches and tags.

git fetch doesn’t touch your working tree at all, so gives you a little breathing space to decide what you want to do next. To actually bring the changes from the remote branch into your working tree, you have to do a git merge. So, for instance, if I’m working on “master” (after a git checkout master) then I can merge in the changes that we’ve just got from origin with:

    git merge origin/master

(This might be a fast-forward, if you haven’t created any new commits that aren’t on master in the remote repository, or it might be a more complicated merge.)

If instead you just wanted to see what the differences are between your branch and the remote one, you could do that with:

    git diff master origin/master

This is the nice point about fetching and merging separately: it gives you the chance to examine what you’ve fetched before deciding what to do next. Also, by doing this separately the distinction between when you should use a local branch name and a remote-tracking branch name becomes clear very quickly.

Pushing your changes to a remote repository

How about the other way round? Suppose you’ve made some changes to the branch “experimental” and want to push that to a remote repository called “origin”. This should be as simple as:

    git push origin experimental

You might get an error saying that the remote repository can’t fast-forward the branch, which probably means that someone else has pushed different changes to that branch. So, that case you’ll need to fetch and merge their changes before trying the push again.

Aside

If the branch has a different name in the remote repository (“experiment-by-bob”, say) you’d do this with:

      git push origin experimental:experiment-by-bob

On older versions of git, if “experiment-by-bob” doesn’t already exist, the syntax needs to be:

      git push origin experimental:refs/heads/experiment-by-bob

… to create the remote branch.  However that seems to be no longer the case, at least in git version 1.6.1.2 – see Sitaram’s comment below.

If the branch name is the same locally and remotely then it will be created
automatically without you having to use any special syntax, i.e. you can just do git push origin experimental as normal.

In practice, however, it’s less confusing if you keep the branch names the same. (The <source-name>:<destination-name> syntax there is known as a “refspec”, about which we’ll say no more here.)

An important point here is that this git push doesn’t involve the remote-tracking branch origin/experimental at all – it will only be updated the next time you do git fetch. Correction: as Deskin Miller points out below, your remote-tracking branches will be updated on pushing to the corresponding branches in one of your remotes.

Why not git pull?

Well, git pull is fine most of the time, and particularly if you’re using git in a CVS-like fashion then it’s probably what you want. However, if you want to use git in a more idiomatic way (creating lots of topic branches, rewriting local history whenever you feel like it, and so on) then it helps a lot to get used to doing git fetch and git merge separately.

(occasional miscellanea)