Category Archives: git

The most confusing git terminology

To add my usual disclaimer to the start of these blog posts, I should say that I love git; I think it’s a beautiful and elegant system, and it saves me huge amounts of time in my daily work. However, I think it’s a fair criticism of the system that its terminology is very confusing for newcomers, and in particular those who have come from using CVS or Subversion.

This is a personal list of some of my “favourite” points of confusion, which I’ve seen arise time and time again, both in real life and when answering questions on Stack Overflow. To be fair to all the excellent people who have contributed to git’s development, in most cases it’s clear that they are well aware that these terms can be problematic, and are trying to improve the situation subject to compatibility constraints. The problems that seem most bizarre are those that reuse CVS and Subversion terms for completely different concepts – I speculate a bit about that at the bottom.

“update”

If you’ve used Subversion or CVS, you’re probably used to “update” being a command that goes to the remote repository, and incorporates changes from the remote version into your local copy – this is (very broadly) analogous to “git pull”.  So, when you see the following error message when using git:

foo.c: needs update

You might imagine that this means you need to run “git pull”. However, that’s wrong.  In fact, what “needs update” means is approximately: “there are local modifications to this file, which you should probably commit or stash”.
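Here is a minimal way to reproduce the message. It assumes a throwaway repository created with mktemp; the file name foo.c is just the one from the example above. `git update-index --refresh` is one command that still reports "needs update" for a locally modified file. The exact set of commands that print this message has varied between git versions, so treat this as a sketch.

```shell
# Identity via the environment, so commits work without any global config.
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

# Commit a file, then make a local (uncommitted) modification to it.
printf 'int main(void) { return 0; }\n' > foo.c
git add foo.c
git commit -q -m "Add foo.c"
printf 'int main(void) { return 1; } /* edited */\n' > foo.c

# "needs update" is about this local modification, not about the remote.
# LC_ALL=C keeps the message in English.
refresh_output=$(LC_ALL=C git update-index --refresh || true)
echo "$refresh_output"

# Stashing (or committing) the change makes the message go away.
git stash -q
after_stash=$(LC_ALL=C git update-index --refresh || true)
echo "after stash: '$after_stash'"
```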

“track” and “tracking”

The word “track” is used in git in three senses that I’m aware of. This ambiguity is particularly nasty, because the latter two collide at a point in learning the system where newcomers to git are likely to be baffled anyway. Fortunately, this seems to have been recognized by git’s developers (see below).

1. “track” as in “untracked files”

To say that a file is tracked in the repository appears to mean that it is either present in the index or exists in the commit pointed to by HEAD.  You see this usage most often in the output of “git status”, where it will list “untracked files”:

# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#    .classpath

This sense is relatively intuitive, I think – it was only after complaining for a while about the next two senses of “track” that I even remembered that there was also this one :)
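To see both sides of this distinction, `git ls-files` lists the paths git is tracking (the ones in the index), and `git ls-files --others --exclude-standard` lists the untracked ones that git status reports. Here is a sketch in a throwaway repository. .classpath is borrowed from the output above, and Main.java is a hypothetical file:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q

printf 'class Main {}\n' > Main.java
git add Main.java
git commit -q -m "Add Main.java"
printf '<classpath/>\n' > .classpath    # created, but never added

tracked=$(git ls-files)
untracked=$(git ls-files --others --exclude-standard)
echo "tracked:   $tracked"
echo "untracked: $untracked"
```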

2. “track” as in “remote-tracking branch”

As a bit of background, you can think of a remote-tracking branch as a local cache of the state of a branch in a remote repository.  The most commonly seen example is origin/master, or, to name that ref in full, refs/remotes/origin/master.  Such branches are usually updated by git fetch (and thus also potentially by git pull).  They are also updated by a successful push to the branch in the remote repository that they correspond to.   You can merge from them, examine their history, etc. but you can’t work directly on them.

The sense of “track” in the phrase “remote-tracking branch” indicates that the remote-tracking branch records the state of the branch in the remote repository as of the last time that remote-tracking branch was updated.  So, you might say that refs/remotes/origin/master is tracking the state of the branch master in origin.

The “tracking” here is defined by the refspec in the config variable remote.<remote-name>.fetch and the URL in the config variable remote.<remote-name>.url.
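The following sketch shows where that refspec comes from. Cloning a repository (here a local throwaway one standing in for origin) writes a remote.origin.fetch refspec. That refspec maps each refs/heads/* in the remote to refs/remotes/origin/* locally, and git fetch uses it to update those refs. After a new commit appears in the "remote", fetching moves the remote-tracking branch while the local master stays where it was:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

base=$(mktemp -d)
cd "$base" || exit 1

# A stand-in for the remote repository, with one commit on master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "First commit"

git clone -q upstream clone
cd clone || exit 1
fetch_refspec=$(git config remote.origin.fetch)
echo "remote.origin.fetch = $fetch_refspec"
echo "remote.origin.url   = $(git config remote.origin.url)"

# The remote gains a commit; fetching updates only the remote-tracking branch.
git -C ../upstream commit -q --allow-empty -m "Second commit"
git fetch -q origin
git for-each-ref --format='%(refname) %(objectname:short)' refs/heads refs/remotes/origin/master
```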

3. “track” as in “git branch --track foo origin/bar” and “Branch foo set up to track remote branch bar from origin”

Again, if you want to do some work on a branch from a remote repository, but want to keep your work separate from everything else in your repository, you’ll typically use a command like the following (or one of its many “Do What I Mean” equivalents):

git checkout --track -b foo origin/bar

… which will result in the following messages:

Branch foo set up to track remote branch bar from origin
Switched to a new branch 'foo'

The sense of “track” both in the command and the output is distinct from the previous sense – it means that config options have been set that associate your new local branch with another branch in the remote repository. The documentation sometimes refers to this relationship as making bar in origin “upstream” of foo. This “upstream” association is very useful, in fact: it enables nice features like being able to just type git pull while you’re on branch foo in order to fetch from origin and then merge from origin/bar. It’s also how you get helpful messages about the state of your branch relative to the remote-tracking branch, like “Your branch is behind 'origin/bar' by 24 commits, and can be fast-forwarded”.

The tracking here is defined by config variables branch.<branch-name>.remote and branch.<branch-name>.merge.
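To see those two settings being written, the sketch below clones a throwaway repository that stands in for origin and contains a branch called bar. It then runs the checkout --track command from above and reads the config back. The @{upstream} revision syntax resolves to whatever branch those two settings point at:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

base=$(mktemp -d)
cd "$base" || exit 1

# Stand-in for origin, with branches master and bar.
git init -q origin-repo
git -C origin-repo symbolic-ref HEAD refs/heads/master
git -C origin-repo commit -q --allow-empty -m "Initial commit"
git -C origin-repo branch bar

git clone -q origin-repo work
cd work || exit 1
git checkout -q --track -b foo origin/bar

# The "upstream" association is just these two config variables...
echo "branch.foo.remote = $(git config branch.foo.remote)"
echo "branch.foo.merge  = $(git config branch.foo.merge)"
# ...which @{upstream} turns back into a remote-tracking branch name.
upstream=$(git rev-parse --abbrev-ref 'foo@{upstream}')
echo "foo@{upstream} = $upstream"
```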

“tracking” Summary

Fortunately, the third sense of “tracking” seems to be in the process of being carefully deprecated – for example, one of the possible options for push.default used to be tracking, but this is now deprecated in favour of the option name upstream. The commit message for 53c403116 says:

push.default: Rename ‘tracking’ to ‘upstream’

Users are sometimes confused with two different types of “tracking” behavior in Git: “remote-tracking” branches (e.g. refs/remotes/*/*) versus the merge/rebase relationship between a local branch and its @{upstream} (controlled by branch.foo.remote and branch.foo.merge config settings).

When the push.default is set to ‘tracking’, it specifies that a branch should be pushed to its @{upstream} branch. In other words, setting push.default to ‘tracking’ applies only to the latter of the above two types of “tracking” behavior.

In order to make this more understandable to the user, we rename the push.default == ‘tracking’ option to push.default == ‘upstream’.

push.default == ‘tracking’ is left as a deprecated synonym for ‘upstream’.

“commit”

In CVS and Subversion, “commit” means to send your changes to the remote repository. In git the action of committing (with “git commit”) is entirely local; the closest equivalent of “cvs commit” is “git push”. In addition, the word “commit” in git has different verb and noun senses (although frankly I’ve never found this confusing myself). To quote from the helpful git glossary:

commit

As a noun: A single point in the git history; the entire history of a project is represented as a set of interrelated commits. The word “commit” is often used by git in the same places other revision control systems use the words “revision” or “version”. Also used as a short hand for commit object.

As a verb: The action of storing a new snapshot of the project’s state in the git history, by creating a new commit representing the current state of the index and advancing HEAD to point at the new commit.

Strictly speaking that makes two different noun senses, but as a user I’ve rarely found that confusing.
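The sketch below shows that "git commit" is local and "git push" is what talks to the remote. It works in a clone of a throwaway bare repository, which stands in for the shared server you would cvs commit or svn commit to. The new commit exists only in the clone until it is pushed:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

base=$(mktemp -d)
cd "$base" || exit 1

# Build a one-commit repository and publish it as a bare "server" repository.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed server.git

git clone -q server.git work
cd work || exit 1
git commit -q --allow-empty -m "Local work"
local_tip=$(git rev-parse master)

# The server has not seen the commit until we push it.
before_push=$(git -C ../server.git rev-parse master)
git push -q origin master
after_push=$(git -C ../server.git rev-parse master)
echo "server before push: $before_push"
echo "server after push:  $after_push"
```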

“checkout”

In CVS and Subversion “checkout” creates a new local copy of the source code that is linked to that repository. The closest command in git is “git clone”. However, in git, “git checkout” is used for something completely distinct. In fact, it has two largely distinct modes of operation:

  1. To switch HEAD to point to a new branch or commit, in the usage git checkout <branch>. If <branch> is genuinely a local branch, this will switch to that branch (i.e. HEAD will point to the ref name); if it otherwise resolves to a commit, it will detach HEAD and point it directly to the commit’s object name.
  2. To replace a file or multiple files in the working copy and the index with their content from a particular commit or the index. This is seen in the usages: git checkout -- <paths> (update from the index) and git checkout <tree-ish> -- <paths> (where <tree-ish> is typically a commit).

(git checkout is also frequently used with -b, to create a new branch, but that’s really a sub-case of usage 1.)

In my ideal world, these two modes of operation would have different verbs, and neither of them would be “checkout”.
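The two modes can be seen side by side in a throwaway repository, where the branch name topic and the file notes.txt are hypothetical. The first mode changes what HEAD points to, and the second rewrites file contents without moving HEAD. For what it's worth, git 2.23 and later also offer these as separate commands, git switch and git restore.

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
printf 'first\n' > notes.txt
git add notes.txt
git commit -q -m "First version"
git branch topic

# Mode 1 with a branch: HEAD becomes a symbolic ref to refs/heads/topic.
git checkout -q topic
topic_head=$(git symbolic-ref HEAD)
echo "HEAD -> $topic_head"

# Mode 1 with a bare commit name: HEAD is detached (no longer symbolic).
git checkout -q "$(git rev-parse HEAD)"
if git symbolic-ref -q HEAD > /dev/null; then detached=no; else detached=yes; fi
echo "detached: $detached"
git checkout -q master

# Mode 2: restore a file's content from the index; HEAD does not move.
printf 'scribbled over\n' > notes.txt
git checkout -- notes.txt
restored=$(cat notes.txt)
echo "notes.txt: $restored, HEAD -> $(git symbolic-ref HEAD)"
```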

“HEAD” and “head”

There are usually many “heads” (lower-case) in a git repository – the tip of each branch is a head. However, there is only one HEAD (upper-case), a symbolic ref that points to the current branch or commit.
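In a repository with a few branches, you can list the heads under refs/heads/. On a branch, HEAD (stored in the file .git/HEAD) simply names one of them. Here is a sketch with hypothetical branch names:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "Initial commit"
git branch feature-a
git branch feature-b

# Many heads: one per branch tip.
heads=$(git for-each-ref --format='%(refname)' refs/heads)
echo "$heads"

# One HEAD: here a symbolic ref naming the current branch.
current=$(git symbolic-ref HEAD)
echo "HEAD -> $current"
```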

“fetch” and “pull”

I wasn’t aware of this until Roy Badami pointed it out, but it seems that git and Mercurial have opposite meanings for “fetch” and “pull” – see the top two lines in this table of git / hg equivalences. I think it’s understandable that since git’s and Mercurial’s development were more or less concurrent, such unfortunate clashes in terminology might occur.

“push” and “pull”

“git pull” is not the opposite of “git push”; the closest there is to an opposite of “git push” is “git fetch”.
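One way to see why is to split git pull into its two steps: a fetch, followed by a merge of the fetched upstream branch into your current branch. Only the second step changes your own branch. git fetch, like git push, only transfers commits and updates refs. Here is a sketch with a throwaway "remote"; the merge step uses the @{upstream} shorthand for the remote-tracking branch:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

base=$(mktemp -d)
cd "$base" || exit 1
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "First commit"
git clone -q upstream work
git -C upstream commit -q --allow-empty -m "Second commit"
cd work || exit 1

# Step 1 of a pull: fetch updates only the remote-tracking branch.
git fetch -q origin
after_fetch_local=$(git rev-parse master)
after_fetch_remote=$(git rev-parse origin/master)

# Step 2 of a pull: merge the upstream branch into the current branch.
git merge -q --ff-only '@{upstream}'
after_merge_local=$(git rev-parse master)
echo "master after fetch: $after_fetch_local"
echo "master after merge: $after_merge_local"
```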

“hash”, “SHA1”, “SHA1sum”, “object name” and “object identifier”

These terms are often used synonymously to mean the 40-character hexadecimal strings that uniquely identify objects in git. “object name” seems to be the most official, but the least used in general. Referring to an object name as a SHA1sum is potentially confusing, since the object name for a blob is not the same as the SHA1sum of the file.
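The difference arises because git hashes a short header in front of the file's contents: the word blob, the size in bytes, and a NUL byte. The sketch below checks this with git hash-object and the coreutils sha1sum tool (assumed to be installed; some systems call it shasum). It assumes a repository using the default SHA-1 object format:

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
git init -q
printf 'hello\n' > hello.txt

object_name=$(git hash-object hello.txt)
plain_sha1=$(sha1sum < hello.txt | cut -d' ' -f1)

# Recompute the object name by hand: "blob <size>\0" followed by the contents.
size=$(wc -c < hello.txt | tr -d ' ')
by_hand=$( { printf 'blob %s\0' "$size"; cat hello.txt; } | sha1sum | cut -d' ' -f1 )

echo "git object name: $object_name"
echo "sha1sum of file: $plain_sha1"
echo "by hand:         $by_hand"
```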

“remote branch”

This term is only used occasionally in the git documentation, but it’s one that I would always try to avoid because it tends to be unclear whether you mean “a branch in a remote repository” or “a remote-tracking branch”. Whenever a git beginner uses this phrase, I think it’s worth clarifying this, since it can avoid later confusion.
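The two meanings can even disagree. git ls-remote asks the remote repository what its branches point to right now. git branch -r lists remote-tracking branches, which only change when you fetch (or push). Here is a sketch with a throwaway remote that gains a commit after it has been cloned:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

base=$(mktemp -d)
cd "$base" || exit 1
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m "First commit"
git clone -q upstream work
git -C upstream commit -q --allow-empty -m "Second commit"
cd work || exit 1

# "A branch in a remote repository": ask the remote directly.
in_remote=$(git ls-remote origin refs/heads/master | cut -f1)
# "A remote-tracking branch": our local copy, stale until the next fetch.
tracking=$(git rev-parse origin/master)
echo "master in origin:        $in_remote"
echo "origin/master (cached):  $tracking"
git branch -r
```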

“index”, “staging area” and “cache”

As nouns, these are all synonyms, which all exist for historical reasons. Personally, I like “staging area” the best since it seems to be the easiest concept to understand for git beginners, but the other two are used more commonly in the documentation.

When used as command options, --index and --cached have distinct and consistent meanings, as explained by Junio C. Hamano in this useful blog post.
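Roughly, --cached means "act on the index only", and --index means "act on the index and the working tree". git apply accepts both, which makes for a compact sketch. The file f.txt and its patch are hypothetical:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
printf 'one\n' > f.txt
git add f.txt
git commit -q -m "Initial version"

# Make a patch that changes "one" to "two", then discard the change.
patch_file=$(mktemp)
printf 'two\n' > f.txt
git diff --no-color > "$patch_file"
git checkout -- f.txt

# --cached: the patch is applied to the index only.
git apply --cached "$patch_file"
cached_in_index=$(git show :f.txt)
cached_in_worktree=$(cat f.txt)

git reset -q --hard    # back to "one" in both the index and the working tree

# --index: the patch is applied to the index and the working tree.
git apply --index "$patch_file"
index_in_index=$(git show :f.txt)
index_in_worktree=$(cat f.txt)
echo "--cached: index=$cached_in_index worktree=$cached_in_worktree"
echo "--index:  index=$index_in_index worktree=$index_in_worktree"
```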

Why are there so many of these points of confusion?

I would speculate that the most significant effect that contributed to these terminology confusions is that git was being actively used by an enthusiastic community from very early in its development, which means that early names for concepts have tended to persist for the sake of compatibility and consistency. That doesn’t necessarily account for the many conflicts with CVS / Subversion usage, however.

To be fair to git, thinking up verbs for particular commands in any software is tough, and there have been enough version control systems written that to completely avoid clashes would lead to some convoluted choices. However, it’s hard to see git’s use of CVS / Subversion terminology for completely different concepts as anything but perverse. Linus has made it very clear many times that he hated CVS, and joked that a design principle for git was WWCVSND (What Would CVS Never Do); I’m sympathetic to that, as I’m sure most are, especially after having switched to the DVCS mindset. However, could that attitude have extended to deliberately disregarding concerns about terminology that might make it actively harder for people to migrate to git from CVS / Subversion? I don’t know nearly enough about the early development of git to know. However, it wouldn’t have been tough to find better choices for commit, checkout and update in each of their various senses.

Missing git hooks documentation

One part of git’s documentation that is particularly lacking is that on the subject of hooks.  In particular, that page doesn’t explain:

  • What the current working directory is when the hooks run.
  • Which helpful environment variables are set in the environment when the hooks are run.

These omissions are particularly irritating since the current directory is not consistent across the different hooks, and the setting of the GIT_DIR environment variable can cause some very surprising results in some situations.

So, I did some quick tests to find what the behaviour of git 1.7.1 is for each of these hooks, for both bare and non-bare repositories where appropriate.  (If I were more confident about the details of these, I would try to contribute a documentation patch, but that’s probably best done by someone who knows the code well.)

applypatch-msg
post-applypatch
pre-applypatch

(Tested with “git am” in a working tree – it cannot be used with a bare repository for obvious reasons.)

The current directory is the top level of the working tree.  The following environment variables are set:

  • GIT_AUTHOR_DATE, e.g. set to ‘Sat, 9 Apr 2011 10:13:24 +0200’
  • GIT_AUTHOR_NAME, e.g. set to ‘Ada Lovelace’
  • GIT_AUTHOR_EMAIL, e.g. set to ‘whoever@whereever’
  • GIT_REFLOG_ACTION is set to ‘am’

Note that GIT_DIR is not set.

pre-commit
prepare-commit-msg
commit-msg
post-commit

(Tested with “git commit” in a working tree – it cannot be used with a bare repository.)

The current directory is the top level of the working tree.  The following environment variables are set:

  • GIT_DIR is set to ‘.git’
  • GIT_INDEX_FILE is set to ‘.git/index’

post-checkout

(Tested with “git checkout” in a working tree – it cannot be used with a bare repository.)

The current directory is the top level of the working tree.  The following environment variables are set:

  • GIT_DIR is set to ‘.git’

pre-receive
update
post-receive
post-update

These hooks can be run either in a bare or a non-bare repository.  In both cases, the current working directory will be the git directory.  So, if this is a bare repository called “/src/git/test.git/”, that will be the current working directory – if this is a non-bare repository and the top level of the working tree is “/home/mark/test/” then the current working directory will be “/home/mark/test/.git/”.

In both cases, the following environment variable is set:

  • GIT_DIR is set to ‘.’

With a working tree, this is unexpectedly awkward, as described in Chris Johnsen’s answer that I linked to earlier.  If only GIT_DIR is set then this comment from the git man page applies:

Note: If --git-dir or GIT_DIR are specified but none of --work-tree, GIT_WORK_TREE and core.worktree is specified, the current working directory is regarded as the top directory of your working tree.

In other words, your working tree will also be the current directory (the “.git” directory), which almost certainly isn’t what you want.

pre-auto-gc

(Not tested yet.)

Summary

I think the (quite obvious) lesson from this is just:

  • Always test your hooks carefully, probably starting with a script that just echoes the current working directory and GIT_DIR.

A related tip is that since the rules about GIT_DIR / --git-dir and GIT_WORK_TREE / --work-tree / core.worktree are so complex, I follow the rule of thumb that if you need to set either one, set both, and make sure you set them to absolute paths.

In case you’re interested, the test hook I used for this was just:

#!/bin/bash
# Report which hook is running, the GIT variables it sees, and its working directory.
echo "Running $BASH_SOURCE"
set | grep GIT
echo "PWD is $PWD"
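Following the rule of thumb above, here is a sketch of a hypothetical post-receive hook for a non-bare repository. The hook makes GIT_DIR absolute and sets GIT_WORK_TREE explicitly before running a command that needs the working tree. It assumes the behaviour described above for push-time hooks: the current directory is the .git directory and GIT_DIR may be relative. The surrounding script exists only to push to the repository so that the hook actually runs. It pushes to a branch other than the checked-out one, because pushing to the checked-out branch of a non-bare repository is refused by default.

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

base=$(mktemp -d)
cd "$base" || exit 1
git init -q server
git -C server symbolic-ref HEAD refs/heads/master
git -C server commit -q --allow-empty -m "Initial commit"

mkdir -p server/.git/hooks
cat > server/.git/hooks/post-receive <<'EOF'
#!/bin/sh
cat > /dev/null    # consume the "<old> <new> <ref>" lines on stdin
# Make GIT_DIR absolute and set GIT_WORK_TREE to its parent explicitly,
# instead of relying on whatever the current directory happens to be.
GIT_DIR=$(cd "${GIT_DIR:-.}" && pwd -P)
GIT_WORK_TREE=$(dirname "$GIT_DIR")
export GIT_DIR GIT_WORK_TREE
git rev-parse --show-toplevel > "$GIT_DIR/hook-toplevel.txt"
EOF
chmod +x server/.git/hooks/post-receive

git clone -q server client
git -C client commit -q --allow-empty -m "Work to push"
git -C client push -q origin HEAD:refs/heads/incoming

hook_saw=$(cat server/.git/hook-toplevel.txt)
echo "hook saw working tree: $hook_saw"
```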

git: Too Many Topic Branches

Another couple of git tips, that might conceivably be useful to someone somewhere :)

git makes it so easy to create topic branches that it’s easy to lose track of which branches were for what. Here are a couple of recipes that might help with this:

Order branches by last commit date

I often want to order my branches according to how recently I was working on them. We can approximate that by saying we’d like to order the branches by the commit date of each branch tip, and you can do that with git for-each-ref --sort=committerdate. For example, the shell script:


# Print each local branch with the committer date of its tip, oldest first.
for C in $(git for-each-ref --sort=committerdate refs/heads --format='%(refname)')
do
    git show -s --format="%ci $C" "$C"
done

… produces output like this:

...
2011-02-23 18:57:01 -0500 refs/heads/trainable-seg-gui
2011-03-09 11:25:38 +0100 refs/heads/snt-swing-menus3
2011-03-11 00:44:37 +0100 refs/heads/master
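git for-each-ref can also print the date itself through its --format atoms, which avoids running one git show per branch. Here is a sketch in a throwaway repository. The branch names and dates are made up for the example and are fixed with GIT_COMMITTER_DATE so that the ordering is predictable:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master

# Hypothetical helper: make an empty commit with a given author/committer date.
commit_at() {
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q --allow-empty -m "$2"
}
commit_at '2011-01-10 09:00:00 +0000' 'Initial commit'
git checkout -q -b newer-topic
commit_at '2011-03-09 11:25:38 +0100' 'Newer work'
git checkout -q -b older-topic master
commit_at '2011-02-23 18:57:01 -0500' 'Older work'

# A single command: oldest branch tip first, with its committer date.
git for-each-ref --sort=committerdate refs/heads \
    --format='%(committerdate:iso) %(refname)'
order=$(git for-each-ref --sort=committerdate refs/heads --format='%(refname:short)')
```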

Show branches that introduced changes to particular paths

If that doesn’t turn up the branch I’m looking for, it might be useful to just list those branches that made changes to particular paths (with respect to master). Here’s a similar example of how to do this, in this case looking for all those branches that introduced changes to the path src-plugins/Simple_Neurite_Tracer:


P="src-plugins/Simple_Neurite_Tracer"
# Print each branch whose changes since it forked from master touch $P.
for C in $(git for-each-ref --sort=committerdate refs/heads/ --format='%(refname)')
do
    git diff --quiet master..."$C" -- "$P" || echo "$C"
done

… which produces output like this:

...
refs/heads/sholl-analysis
refs/heads/for-rebasing
refs/heads/sholl-analysis-wip
refs/heads/sholl-analysis-wip2
...
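To check that the loop behaves as described, the sketch below builds a throwaway repository with two hypothetical topic branches, only one of which touches src-plugins/Simple_Neurite_Tracer, and collects the loop's output. The three-dot form master...$C compares the branch tip with its merge base with master. Changes made on master since the branch forked are therefore not counted.

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
mkdir -p src-plugins/Simple_Neurite_Tracer doc
printf 'v1\n' > src-plugins/Simple_Neurite_Tracer/Tracer.java
printf 'v1\n' > doc/README
git add .
git commit -q -m "Initial commit"

# One topic branch changes the tracer, the other only the docs.
git checkout -q -b sholl-analysis
printf 'v2\n' > src-plugins/Simple_Neurite_Tracer/Tracer.java
git commit -q -a -m "Change the tracer"
git checkout -q -b doc-fixes master
printf 'v2\n' > doc/README
git commit -q -a -m "Fix the docs"
git checkout -q master

P="src-plugins/Simple_Neurite_Tracer"
matches=$(
    for C in $(git for-each-ref --sort=committerdate refs/heads/ --format='%(refname)')
    do
        git diff --quiet master..."$C" -- "$P" || echo "$C"
    done
)
echo "$matches"
```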

Ordering branches by the last time the branch was changed

This came up in a Stack Overflow question – sometimes you might want to know when the branch pointer was last changed, not the date of the last commit. Jefromi’s answer provides a nice recipe that you could alter to order the branches in that way.

An asymmetry between git pull and push

Although git is an excellent system, which has certainly changed my way of working for the better, occasionally one comes across an inconsistency that seems bizarre. In case you don’t want to read the whole of this post, the one sentence summary would be, “By default, git push origin will update each branch on the destination with the branch of the same name on the source, instead of using the association defined by git branch --track, which git pull origin would use; the config option push.default can change this behaviour.” However, for a more detailed explanation, read on…

Suppose someone has told you that they’ve pushed a topic branch to GitHub that they’d like you to work on. Let’s say that you’ve set up a remote called github for that repository, and the branch there is called new-feature2.  With a recent git (>= 1.6.1) you can just do git fetch and then:

git checkout -t github/new-feature2

… which will create a branch in your repository called new-feature2 based on github/new-feature2, and set various config options to associate your new-feature2 branch with github/new-feature2.  It will also check out that new branch so that you can start working on it.  However, let’s suppose that you want to give your branch a more helpful name – let’s say that’s “add-menu”.  Then you might instead do:

git checkout -t -b add-menu github/new-feature2

… which has the same effect as the previous command, except for giving the branch a different name locally.  The config options that will have been set by that command are:

branch.add-menu.remote=github
branch.add-menu.merge=refs/heads/new-feature2

The detailed semantics of these config options are given in the branch.<name>.remote and branch.<name>.merge sections of git config’s documentation, but, for the moment, just understand that this sets up an association between your local add-menu branch, and the new-feature2 branch on GitHub.

This association makes various helpful features of git possible – for example, this is how you get this nice information from git status:

$ git status
# On branch add-menu
# Your branch is ahead of 'github/new-feature2' by 5 commits.

It’s also the mechanism by which, when you’re on the add-menu branch, typing:

$ git pull github

… will cause git to run a git fetch, and then merge github/new-feature2 into your add-menu branch.  That’s all very helpful.

So, what happens when you want to push your changes back to the upstream branch?  You might hope that because this association exists in your config, then typing any of the following four commands while you’re on the add-menu branch would work:

  1. git push github add-menu
  2. git push github
  3. git push
  4. git push github HEAD

However, with the default git setup, none of these commands will result in new-feature2 being updated with your new commits on add-menu.  What does happen instead?

1. git push github add-menu

In this case git push parses add-menu as a refspec.  “refspecs” are usually of the form <src>:<dst>, telling git which local branch (src) you’d like to update the remote branch (dst) with.  However, the default behaviour if you don’t add :<dst>, as in this example, is explained here:

If :<dst> is omitted, the same ref as <src> will be updated.

So the command is equivalent to git push github add-menu:add-menu, which will create a new branch called add-menu on GitHub rather than updating new-feature2.
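Here is a sketch that reproduces this, with a local bare repository standing in for GitHub and the remote and branch names from the example. Because the refspec is given explicitly, the push.default setting plays no part here:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

base=$(mktemp -d)
cd "$base" || exit 1

# A local bare repository stands in for GitHub; it has master and new-feature2.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git -C seed branch new-feature2
git clone -q --bare seed github.git

# Clone it with the remote named "github" and set up add-menu as in the post.
git clone -q -o github github.git work
cd work || exit 1
git checkout -q -t -b add-menu github/new-feature2
git commit -q --allow-empty -m "Add a menu"

# Updates refs/heads/add-menu on the remote (creating it), not new-feature2.
git push -q github add-menu
git ls-remote github
```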

2. git push github

In this case, the refspec is omitted.  The documentation for git push again explains what happens in this case:

The special refspec : (or +: to allow non-fast-forward updates) directs git to push “matching” branches: for every branch that exists on the local side, the remote side is updated if a branch of the same name already exists on the remote side. This is the default operation mode if no explicit refspec is found (that is neither on the command line nor in any Push line of the corresponding remotes file—see below).

… so the new commits on your add-menu branch won’t be pushed.  However, the changes for every other branch for which there’s a matching name in your repository on GitHub will be!

3. git push

Again, we can find in the documentation for git push what happens if we miss out the remote as well:

git push: Works like git push <remote>, where <remote> is the current branch’s remote (or origin, if no remote is configured for the current branch).

In our example case, branch.add-menu.remote is set to github, so the behaviour in this case will be the same as in the previous one, i.e. probably not what you want.

4. git push github HEAD

Thanks to David Ongaro for suggesting adding this fourth wrong command. The git push documentation explains that this is:

A handy way to push the current branch to the same name on the remote.

In other words, in this example, that will end up being the same as git push github add-menu:add-menu, again creating an unwanted add-menu branch in the remote repository.

So how should you push?

The simplest option, which will work everywhere, is just to specify both the source and destination parts of the refspec, i.e.:

git push github add-menu:new-feature2

That means that you have to remember what the remote branch should be called, but it’s the least ambiguous way to push a branch, and in any case it’s a good idea to understand how to use refspecs more generally.
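In the same kind of throwaway setup, with a local bare repository standing in for GitHub, this sketch shows that the explicit refspec updates new-feature2 and creates nothing else on the remote:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

base=$(mktemp -d)
cd "$base" || exit 1

# A local bare repository stands in for GitHub; it has master and new-feature2.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git -C seed branch new-feature2
git clone -q --bare seed github.git

git clone -q -o github github.git work
cd work || exit 1
git checkout -q -t -b add-menu github/new-feature2
git commit -q --allow-empty -m "Add a menu"

# Both sides of the refspec given: local add-menu updates remote new-feature2.
git push -q github add-menu:new-feature2
git ls-remote --heads github
```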

However, another alternative (available since git version 1.6.3) is to set the push.default config variable.  The documentation for this in the git config man page is:

push.default: Defines the action git push should take if no refspec is given on the command line, no refspec is configured in the remote, and no refspec is implied by any of the options given on the command line. Possible values are:

  • nothing – do not push anything.
  • matching – push all matching branches. All branches having the same name in both ends are considered to be matching. This is the default.
  • tracking – push the current branch to its upstream branch.
  • current – push the current branch to a branch of the same name.

So if you set push.default to tracking with one of:

$ git config push.default tracking # just for the current repository
$ git config --global push.default tracking # globally for your account

… then when you’re on the add-menu branch, git push github will update new-feature2 on GitHub with your changes in add-menu, and no other branches will be affected.

The commit message that introduced this change suggests that the reason that this option was introduced was exactly to avoid the kind of confusion I’ve described above:

When “git push” is not told what refspecs to push, it pushes all matching branches to the current remote. For some workflows this default is not useful, and surprises new users. Some have even found that this default behaviour is too easy to trigger by accident with unwanted consequences.

Personally, I don’t actually use this option, since I use git on so many different systems it would be more confusing to have different settings for push.default on some of them.  However, I hope it’s useful for some people, and it’s a shame that this behaviour couldn’t reasonably be made the default at this stage.

Update: Thanks to David Ongaro, who points out below that since git 1.7.4.2, the recommended value for the push.default option is upstream rather than tracking, although tracking can still be used as a deprecated synonym. The commit message that describes that change is nice, since it suggests that there is an effort underway to deprecate the term “track” in the context of setting this association with the upstream branch in a remote repository. (The totally different meanings of “track” in git branch --track and “remote-tracking branches” has long irritated me when trying to introduce git to people.)
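Here is a sketch of that configuration in action, using the upstream spelling. It uses the same kind of throwaway setup, with a local bare repository standing in for GitHub. A local commit on master checks that no other branch gets pushed:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

base=$(mktemp -d)
cd "$base" || exit 1

# A local bare repository stands in for GitHub; it has master and new-feature2.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git -C seed branch new-feature2
git clone -q --bare seed github.git

git clone -q -o github github.git work
cd work || exit 1
git commit -q --allow-empty -m "Unrelated local work on master"
git checkout -q -t -b add-menu github/new-feature2
git commit -q --allow-empty -m "Add a menu"

# 'tracking' is the older, deprecated spelling of 'upstream'.
git config push.default upstream
git push -q github
git ls-remote --heads github
```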

Update (2012-07-20): There has been an ongoing discussion in the git world about what the default behaviour for git push should be, given that the default behaviour is so surprising to newcomers. It seems that the decision is to introduce a new value for push.default, called simple and ultimately make that the default. This decision is described in a commit message as follows:

push: introduce new push.default mode “simple”

When calling “git push” without argument, we want to allow Git to do
something simple to explain and safe. push.default=matching is unsafe
when used to push to shared repositories, and hard to explain to
beginners in some contexts. It is debatable whether ‘upstream’ or
‘current’ is the safest or the easiest to explain, so introduce a new
mode called ‘simple’ that is the intersection of them: push to the
upstream branch, but only if it has the same name remotely. If not, give
an error that suggests the right command to push explicitely to
‘upstream’ or ‘current’.

A question is whether to allow pushing when no upstream is configured. An
argument in favor of allowing the push is that it makes the new mode work
in more cases. On the other hand, refusing to push when no upstream is
configured encourages the user to set the upstream, which will be
beneficial on the next pull. Lacking better argument, we chose to deny
the push, because it will be easier to change in the future if someone
shows us wrong.

Original-patch-by: Jeff King

Signed-off-by: Matthieu Moy

This new possible value for push.default is available in 1.7.11, and will be made the default behaviour in the future (but it isn’t in any released version so far).
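In the example from this post, the local branch name (add-menu) differs from its upstream (new-feature2). The following sketch shows how simple treats that case, in the same throwaway setup. As the commit message describes, the push should be refused rather than guessed at, and the remote branch should be left alone:

```shell
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

base=$(mktemp -d)
cd "$base" || exit 1

# A local bare repository stands in for GitHub; it has master and new-feature2.
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git -C seed branch new-feature2
git clone -q --bare seed github.git

git clone -q -o github github.git work
cd work || exit 1
git checkout -q -t -b add-menu github/new-feature2
git commit -q --allow-empty -m "Add a menu"

# With 'simple', an upstream whose name differs from the local branch is an error.
git config push.default simple
if git push -q github 2> /dev/null; then push_result=pushed; else push_result=refused; fi
echo "push was $push_result"
```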

git Submodules Explained

I haven’t actually finished the FAQ bit of this post yet, but since I’m not sure when I’ll have time to do so, I’ll just publish it anyway – please let me know in the comments if this is useful for you, or there’s something else you’d like to see included.

Submodules in git are commonly misunderstood in various ways, and although the explanation in the official manual is clear and pretty easy to understand, I thought that a different treatment here might be useful to someone.

What are submodules?

A submodule in a git repository is like a sub-directory which is really a separate git repository in its own right.  This is a useful feature when you have a project in git which depends on particular versions of other projects.  For example, if you’re developing a new Ruby-on-Rails application, you could add a clearly specified version of the Rails repository as a submodule at the path vendor/rails.  The example I’m going to use in this post, however, called whatdotheyknow, is one of the various mySociety projects that depend on a repository called commonlib, which contains useful code common to more than one project.  In each project the commonlib repository has been added as a submodule.  (I’ll sometimes refer to the whatdotheyknow repository as the super-project, which I hope is clear.)

It’s important to understand that the repository which contains a submodule knows very little about it except for which version it should be and various bits of information about how to update it.  (More on that below.)  If you change directory into the submodule then you’ll find that it doesn’t know anything about the parent project at all, and you can carry out operations in that repository as if it were standalone.

Before you proceed…

… it’s worth checking what version of git you have.  Many actions that you might perform that relate to submodules are done with the git submodule command, but in older versions of git this has two problems that make it very easy to get confused – I think these are important enough that everyone who uses submodules should be aware of them, and ideally upgrade their copy of git to a version that doesn’t have these problems: at least version 1.6.2.

The first of these is that if you had a typo in the name of a submodule listed on the command line, that would be silently ignored.  The second problem, which compounded this, is that if you spelled the submodule name with a trailing slash (as is common with tab-completion) then that did not refer to the submodule, and due to the previous problem would be ignored.  These were fixed in f3670a5749d70 and 496917b721ada.  (As a small point of interest, to find out which tagged releases had these fixes, I cloned git.git and did git tag --contains 496917b.)

Note also that version 1.7.0 and later versions of git have some annoying differences in behaviour, which are noted below.

How are submodules stored?

To answer this you need to understand a little bit about how git stores objects.  If you just want recipes for how to do particular things, then you can skip to “Things You Might Need To Do” below, but I think this section is useful for figuring out problems that might arise.

git’s model of the world is based around objects which are identified by their “object name”, which is the correct term for the SHA1sum hashes you see all over the place.  These objects can be of various types, such as “commit”, “tag”, “blob” (file), “tree” (directory), etc.  Each commit object points to a tree object which represents the state of your source code at that commit.  A tree object in turn consists of a list of objects with some metadata, e.g. as in this example for whatdotheyknow:

$ git ls-tree HEAD^{tree}
100644 blob 1e38e022c1c7d27f6dd9b765793087b59d147ef8    .cvsignore
100644 blob aa5036394edfea0a5dff64e0c53b4e9a026f1beb    .gitignore
100644 blob 4ef4ae8268dcad9b0de371f1aa63bb3ebbeb436a    .gitmodules
100755 blob 44c881fe25b8dc1413d9195677f492121a3789f0    INSTALL.txt
100644 blob 37312d9a1bcc80ac334547f047a2cece38dd24dc    README
100644 blob 3bb0e8592a41ae3185ee32266c860714980dbed7    Rakefile
040000 tree e326ffb3d697e7ac83fa19d93a8a3305120c719e    app
160000 commit fd91ab69279f1e0cfed53353e64811d5aa9c4b5f    commonlib
040000 tree ae93b14ec7ab01ee33053c32eca340a31ce6449f    config
040000 tree 8a7eb4d1552cc2a59fc0528c02fe0fb686d7f562    db
040000 tree 84fae00002a0e834140e2f806978748d50d60c4b    doc
040000 tree eb4089c7989ee846bbd66c97069aeff7853d0064    lib
040000 tree e7bcca0f6d561188730125b228a22a4d7bd68782    public
040000 tree f4e46de68199afa382d53583d83430c691aeb473    script
040000 tree e5772463cfed62ba63cfaf4e0eacecd1dc3895e5    spec
100644 blob bfc265e33e47ffa9796fe7bb7ae7d1fe7e633593    todo.txt
040000 tree 2999c0a790c0033ad93e312c0bc62ecdc9a18f81    vendor

As you can see, typically the types of objects listed in a tree are either blobs or trees, indicating files or subdirectories.  However, if an object of type “commit” is listed (with the mode 160000) that represents a submodule.  The object name (in this case fd91ab69…) is the commit that the submodule’s HEAD should be at.  One implication of this is that that object name usually won’t be known outside the submodule.  This sometimes causes confusion when people do git diff in the super-project and find a difference in the submodule entry, e.g.:

$ git diff
diff --git a/commonlib b/commonlib
index fd91ab6..d6593c6 160000
--- a/commonlib
+++ b/commonlib
@@ -1 +1 @@
-Subproject commit fd91ab69279f1e0cfed53353e64811d5aa9c4b5f
+Subproject commit d6593c6741b29680665b8ae7470e2f80ab9a5977

This output means that the submodule version which is committed in the whatdotheyknow repository is fd91ab69279, but if you change into the commonlib subdirectory, you will find that the HEAD of that repository is at d6593c6741.  Hopefully both of these commits will be known in the commonlib submodule, but neither will be in the whatdotheyknow repository.
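If you want to see both object names for yourself, the following sketch (my own illustration, not part of the whatdotheyknow setup) reproduces the situation with throwaway repositories in a temporary directory. lib and super are hypothetical stand-ins for commonlib and whatdotheyknow, the exported GIT_AUTHOR_*/GIT_COMMITTER_* variables only supply a dummy identity, and -c protocol.file.allow=always is there because recent versions of git refuse to clone a local path as a submodule without it:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

# "lib" stands in for commonlib
git init -q lib
echo one > lib/file
git -C lib add file
git -C lib commit -q -m one

# "super" stands in for whatdotheyknow
git init -q super
cd super
git -c protocol.file.allow=always submodule add "$tmp/lib" lib
git commit -q -m "Add lib as a submodule"

# Move the submodule's HEAD on, without committing anything in the super-project
echo two > lib/file
git -C lib commit -q -am two

# The gitlink entry (mode 160000) is the commit the super-project expects...
recorded=$(git ls-tree HEAD lib | awk '{print $3}')
# ...while the submodule's own HEAD says where it actually is
actual=$(git -C lib rev-parse HEAD)
echo "recorded in super-project: $recorded"
echo "checked out in submodule:  $actual"

# Both commits are objects in the submodule's repository
git -C lib cat-file -e "$recorded" && echo "lib has $recorded"
git -C lib cat-file -e "$actual" && echo "lib has $actual"

# Shows the same kind of "Subproject commit" diff as above
git diff lib
```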

The rest of the information that the super-project holds about the submodule is kept in the .gitmodules file and in config options.

A submodule which is “initialized” will have a config option set to indicate the URL that the submodule should be cloned from if it is missing.  These config options are of the form submodule.<SUBMODULE-NAME>.url, so having initialized the commonlib submodule in whatdotheyknow, I can see the following:

$ git config --list|egrep ^submodule
submodule.commonlib.url=git://git.mysociety.org/commonlib

The .gitmodules file provides sensible default URLs for each submodule, and is committed in the repository like any other versioned file:

$ cat .gitmodules
[submodule "commonlib"]
 path = commonlib
 url = git://git.mysociety.org/commonlib
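To see the difference between the committed default in .gitmodules and the config option that initializing creates, here is a sketch with hypothetical throwaway repositories (lib and super standing in for commonlib and whatdotheyknow) in a temporary directory, a dummy identity, and -c protocol.file.allow=always so that newer versions of git will add a local path as a submodule. git config -f reads a file such as .gitmodules directly:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

git init -q lib
echo one > lib/file
git -C lib add file
git -C lib commit -q -m one
git init -q super
git -C super -c protocol.file.allow=always submodule add "$tmp/lib" lib
git -C super commit -q -m "Add lib as a submodule"

# A fresh clone has the committed .gitmodules, but nothing in .git/config yet
git clone -q super clone
cd clone
echo ".gitmodules default: $(git config -f .gitmodules --get submodule.lib.url)"
git config --get submodule.lib.url || echo "submodule.lib.url not set: not initialized yet"

# 'git submodule init' copies the default URL into .git/config
git submodule init
echo "after init: $(git config --get submodule.lib.url)"
```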

If you’re publishing a repository with the intention that anyone should be able to clone and use it, you should make sure that the URLs specified in .gitmodules are ones that can be publicly accessed – so don’t, for example, use an SSH URL with your user name in it.  Since these URLs are only used when initializing a submodule, which you typically do only rarely, it’s not a great inconvenience that you may have to change them in order to push changes you’ve made in the submodule.

Things You Might Need To Do

This section lists some simple recipes for doing all kinds of things with submodules.  If you think there’s something I should add, please let me know.  For the sake of simplicity, in the examples below, I’m not listing submodule paths explicitly at the end of git submodule commands, which generally means that the action applies to all of the submodules.  (The exception is git submodule add, which of course only applies to a single submodule.)

Get a working submodule version after cloning

If you’ve just cloned a repository which contains submodules, you can initialize and clone all of them with:

git submodule update --init

This is the equivalent of running:

git submodule init
git submodule update

With git 1.6.5 and later, you can do this automatically by cloning the super-project with the --recursive option:

git clone --recursive git://github.com/mysociety/whatdotheyknow.git

See the status of all the submodules

Running git submodule without arguments defaults to running git submodule status, which produces a helpful summary of the status of all your submodules.  Each line begins with a space, a ‘+’ or a ‘-’, which indicate the following things:

+
The version checked out in the submodule is different from that specified in the super-project. The object name shown is that of the commit that the submodule is currently at.  (The meaning of this symbol changed in 1.7.0.)
-
The submodule hasn’t been initialized or there’s no repository at the submodule path (e.g. if you’ve run git submodule init but not git submodule update, or you’ve later deleted the submodule directory from the working tree). The object name shown is the commit that’s specified in the super-project.
[space]
The submodule’s HEAD is at the correct version – the object name shown is that version.

In projects with many submodules this can be a helpful way to see at a glance where all your submodules are at.  For example, here’s some output from a version of the Fiji project that I’m working on:

 bbff1fd4545b3a614b14eb0770ac6028b648746d AutoComplete (bbff1fd)
+16dcf52ef2106cc92ba89c90b6b5f457bc7619ea ImageJA (heads/current-147-g16dcf52)
 5bfc9eb779d39e38c23ce1c3b01b49953ebd8463 RSyntaxTextArea (5bfc9eb)
 b9f11849599d536528c26bc599dbec4609d77dc4 Retrotranslator (remotes/origin/master)
 90287f0250542be256f67ade4e29a618bf6e688f TrakEM2 (0.7m-227-g90287f0)
+f25db2a43b95480c780d865323fce659a1135c2d VIB (tracer-1.4.0-candidate-849-gf25db2a)
 e4d3eb47a8f9d4e62d1f356636652c3ecc739d92 batik (remotes/origin/svn/git-svn@216063-588-ge4d3eb4)
 79de599df2550f2813fd449505b6fa55ca08cbb3 bio-formats (remotes/origin/contrib-380-g79de599)
 e73abece1ebf3a4aba22104ae9452b2b816ab0d7 clojure (remotes/origin/HEAD)
 39618b6d881fb0c3b52de4929aa34134bb32ffdb clojure-contrib (remotes/origin/master)
 9fa7f4d993f57e27e3134b016c7d36fbfd33e34c ij-plugins (9fa7f4d)
 7ffa48359cdbf7a47735b719a605ea322c58d694 java/linux (heads/master)
-cc218f05fdc0bb55f40f904d5d1f804e8751d0d2 java/linux-amd64
-4f3964234f4e6fd78247e5e7fad9c8becad53e8f java/macosx-java3d
-e79c51473df06f00d4ba9c913afe27e675f71d64 java/win32
-54e735c6c9bac65fcc889bc9e833213f19c7458a java/win64
 b362c662f79763c7927a2ba486243ccefa9222a1 junit (obsolete-cvsimport)
-9ae38d4bde196fa6a4595aebed9f218d4ec591bc jython
 c6e929a15d77545f03ea4883bf033e13c632ef12 live-helper (1.0.4-1-43-gc6e929a)
 79d369af87c4412a47f7065938fe18befc0a183e mpicbg (remotes/origin/trakem2-30-g79d369a)
 20ab0539cc248c642982fdf1330325636d8c55c0 tcljava (tcljava-141-2007-06-06-6-g20ab053)
 a7bfed6752ea1aeac73db386411329486e339f94 weka (a7bfed6)
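If you want to post-process that output in a script, the first character of each line is all you need to look at. A rough sketch follows: the repositories are hypothetical throwaway ones in a temporary directory (lib is added twice, as lib and lib2, purely so that one of them can be left uninitialized), with a dummy identity and -c protocol.file.allow=always so that newer versions of git will clone a local path as a submodule:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

git init -q lib
echo one > lib/file
git -C lib add file
git -C lib commit -q -m one

# A super-project with two submodules cloned from the same repository
git init -q super
git -C super -c protocol.file.allow=always submodule add "$tmp/lib" lib
git -C super -c protocol.file.allow=always submodule add "$tmp/lib" lib2
git -C super commit -q -m "Add two submodules"

# In a fresh clone, initialize only "lib", then move its HEAD on
git clone -q super clone
cd clone
git -c protocol.file.allow=always submodule update --init lib
echo two > lib/file
git -C lib commit -q -am two

# Turn the one-character prefix of each status line into a word
summarize() {
  git submodule status | while IFS= read -r line; do
    rest=${line#?}   # drop the prefix character
    case "$line" in
      " "*) echo "ok: $rest" ;;
      "+"*) echo "moved: $rest" ;;
      "-"*) echo "uninitialized: $rest" ;;
      *)    echo "other: $line" ;;   # anything else, e.g. newer markers
    esac
  done
}
summarize
```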

Update submodules to the versions specified in HEAD

If you change the HEAD of your super-project (e.g. with git pull, or by checking out a new branch) you may find that your submodules are now at the wrong versions.  (You can check with git submodule status as shown above.)  If you’re not actively working on the submodules, then the simplest way to move them to the right versions is with:

git submodule update

If any initialized submodules are missing, this will clone them.  For other submodules where the repository exists, this will change into its subdirectory, run git fetch (to make sure all the most recent updates are present) and then git checkout the correct version.  This has the effect of “detaching HEAD” in each submodule, so if you want to work on a branch in any of those subdirectories, you’ll first have to check out a branch there with git checkout.
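A small sketch of the detached HEAD that git submodule update leaves behind, using hypothetical throwaway repositories (lib standing in for commonlib), a dummy identity, and -c protocol.file.allow=always so that newer versions of git will clone the local submodule. my-work is just an illustrative branch name:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

git init -q lib
echo one > lib/file
git -C lib add file
git -C lib commit -q -m one
git init -q super
git -C super -c protocol.file.allow=always submodule add "$tmp/lib" lib
git -C super commit -q -m "Add lib as a submodule"

git clone -q super clone
cd clone
git -c protocol.file.allow=always submodule update --init

# The submodule is at the recorded commit, but not on any branch
if git -C lib symbolic-ref -q HEAD >/dev/null; then state=branch; else state=detached; fi
echo "after update: $state"

# To work there, create (or check out) a branch first
git -C lib checkout -q -b my-work
git -C lib symbolic-ref HEAD
```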

The most frequent errors that you’ll find when running git submodule update are likely to be due to someone having created a commit in the super-project that references a commit in the submodule that they’ve forgotten to push, so check that whenever you get errors about not being able to find particular versions.

Versions of git after 1.6.4 add the --merge and --rebase options to git submodule update to allow more flexible ways of updating your submodules while you’re working on them.

Add a new submodule to a repository

This is nice and easy to do from a URL.  For example, if we wanted to create a new mySociety project called “create robot MP”, and add commonlib to it, we would just use git submodule add:

$ mkdir createrobotmp
$ cd createrobotmp
$ git init
Initialized empty Git repository in /home/mark/tmp/createrobotmp/.git/
$ git submodule add git://git.mysociety.org/commonlib commonlib
Initialized empty Git repository in /home/mark/tmp/createrobotmp/commonlib/.git/
remote: Counting objects: 5240, done.
remote: Compressing objects: 100% (1974/1974), done.
remote: Total 5240 (delta 3311), reused 5038 (delta 3197)
Receiving objects: 100% (5240/5240), 1020.36 KiB | 377 KiB/s, done.
Resolving deltas: 100% (3311/3311), done.
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#    new file:   .gitmodules
#    new file:   commonlib
#

Then you need to stage and commit .gitmodules and commonlib as with any other new files.  Since this puts the URL in the .gitmodules file, you should make this a publicly clonable URL, as mentioned above.

Change the remote for a submodule

If you frequently work in a submodule you might want to change the default remote “origin” to refer to a URL that you can push to, just so you can use one remote for everything.  You can do this by deleting origin and adding it back with a new URL, with e.g.:

$ cd commonlib
$ git remote rm origin
$ git remote add origin ssh://mark@git.mysociety.org/data/git/public/commonlib.git
$ git remote -v
origin    ssh://mark@git.mysociety.org/data/git/public/commonlib.git

However, you’ll find that two helpful config options will have been deleted when removing and adding back origin, so you’ll want to add these back.

$ git config branch.master.remote origin
$ git config branch.master.merge refs/heads/master

These config options set up the helpful defaults for git pull when you’re on master.
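An alternative I’d suggest, which avoids losing those settings in the first place, is git remote set-url, available in newer versions of git: it changes the URL of an existing remote in place. A sketch with hypothetical local repositories in a temporary directory (and a dummy identity), where pushable.git plays the part of the SSH URL you can push to:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

# "upstream" plays the read-only URL, "pushable.git" the one you can push to
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo one > upstream/file
git -C upstream add file
git -C upstream commit -q -m one
git clone -q --bare upstream pushable.git

git clone -q upstream work
cd work
# Change origin's URL in place instead of removing and re-adding the remote
git remote set-url origin "$tmp/pushable.git"
git remote -v
# The branch settings written by the clone are still there
git config branch.master.remote
git config branch.master.merge
```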

If you’re in the habit of deleting whole submodules and then recreating them with git submodule update, then you should also make sure that you change the URL in the super-project’s config settings, e.g.:

$ git config --list|egrep submodule
submodule.commonlib.url=git://git.mysociety.org/commonlib
$ git config submodule.commonlib.url ssh://mark@git.mysociety.org/data/git/public/commonlib.git
$ git config --list|egrep submodule
submodule.commonlib.url=ssh://mark@git.mysociety.org/data/git/public/commonlib.git

Initialize a submodule with a non-standard URL

If you know in advance that you want to clone your submodules from a URL different from that specified in .gitmodules (e.g. with a private SSH URL that you can push to) then after cloning the superproject you can set the appropriate config by hand before running git submodule update.  This takes the place of the git submodule init command, for example:

$ git clone ssh://mark@git.mysociety.org/data/git/public/whatdotheyknow.git
[...]
$ cd whatdotheyknow
$ git config submodule.commonlib.url ssh://mark@git.mysociety.org/data/git/public/commonlib.git
$ git submodule update
Initialized empty Git repository in /home/mark/tmp/whatdotheyknow/commonlib/.git/
remote: Counting objects: 5240, done.
remote: Compressing objects: 100% (1974/1974), done.
remote: Total 5240 (delta 3311), reused 5038 (delta 3197)
Receiving objects: 100% (5240/5240), 1020.36 KiB | 533 KiB/s, done.
Resolving deltas: 100% (3311/3311), done.
Submodule path 'commonlib': checked out 'a901c2a431f7869f5c2eaee5808f8590ca78544e'
$ cd commonlib/
$ git remote show origin
* remote origin
  URL: ssh://mark@git.mysociety.org/data/git/public/commonlib.git
  HEAD branch: master
  Remote branch:
    master tracked
  Local branch configured for 'git pull':
    master merges with remote master
  Local ref configured for 'git push':
    master pushes to master (up to date)
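Here is the same recipe as a self-contained sketch. lib-private.git is a hypothetical bare copy of lib playing the part of the private SSH URL, the repositories live in a temporary directory with a dummy identity, and -c protocol.file.allow=always is only needed because newer versions of git won’t otherwise clone a local path as a submodule. In the versions I’m aware of, having submodule.<name>.url set is what marks a submodule as initialized, so git submodule update picks up the hand-set URL:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

git init -q lib
echo one > lib/file
git -C lib add file
git -C lib commit -q -m one
# A second copy of lib, standing in for the private URL you can push to
git clone -q --bare lib lib-private.git

git init -q super
git -C super -c protocol.file.allow=always submodule add "$tmp/lib" lib
git -C super commit -q -m "Add lib as a submodule"

git clone -q super clone
cd clone
# Set the URL by hand instead of running 'git submodule init'
git config submodule.lib.url "$tmp/lib-private.git"
git -c protocol.file.allow=always submodule update
# The submodule's origin is the overridden URL, not the one in .gitmodules
git -C lib config remote.origin.url
```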

Modified submodules in 1.7.0 and later

Versions 1.7.0 and later of git contain an annoying change in the behaviour of git submodule.  Submodules are now regarded as dirty if they have any modified files or untracked files, whereas previously it would only be the case if HEAD in the submodule pointed to the wrong commit. Why is this annoying?  The following reasons:

  • Firstly, the meaning of the plus sign (+) in the output of git submodule has changed, and the first time that you come across this it takes a little while to figure out what’s going wrong, for example by looking through changelogs or using git bisect on git.git to find the change.  It would have been much kinder to users to introduce a different symbol for “at the specified version, but dirty”.
  • git status is now very slow in projects with several large submodules.  (git status used to be nearly instant in a clone of fiji.git but trying just now with 1.7.0.4 took an incredible 45 seconds.)

This seems like a change that was introduced without considering the surprise and impact that it would have on users.  In any case, I’ve added this note here since if you work with submodules, you may need to be aware of this change in behaviour.

The output of git diff has changed as well, to add “-dirty” to the object name if the working tree of that submodule is dirty:

$ git diff imglib
diff --git a/imglib b/imglib
--- a/imglib
+++ b/imglib
@@ -1 +1 @@
-Subproject commit c5c6bbaf616d64fbd873df7b7feecebb81b5aee7
+Subproject commit c5c6bbaf616d64fbd873df7b7feecebb81b5aee7-dirty

Update: Thanks to VonC, who points out in the comments below that in git 1.7.2 there is now an “--ignore-submodules” option to git status which can restore the old behaviour, and which also provides the useful option (--ignore-submodules=untracked) that only changed files, not untracked files, cause the submodule to be shown as dirty.
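To see the difference, here is a sketch that leaves an untracked file inside a submodule and compares git status with and without --ignore-submodules=untracked (the option also accepts none, dirty and all). The repositories are hypothetical throwaway ones in a temporary directory, with a dummy identity and -c protocol.file.allow=always so that newer versions of git will add a local path as a submodule:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"

git init -q lib
echo one > lib/file
git -C lib add file
git -C lib commit -q -m one
git init -q super
cd super
git -c protocol.file.allow=always submodule add "$tmp/lib" lib
git commit -q -m "Add lib as a submodule"

# Leave an untracked file inside the submodule
echo scratch > lib/notes.txt

echo "default:"
git status --porcelain
echo "with --ignore-submodules=untracked:"
git status --porcelain --ignore-submodules=untracked
```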

Removing a submodule

There are instructions for the several steps required to remove a submodule at the bottom of this page:

http://git.wiki.kernel.org/index.php/GitSubmoduleTutorial

A Few git Tips

A screenshot of gitk --all

I owe thanks to Johannes Schindelin, who passed on at least three of these tips to me in the course of working on Fiji :)  Update: this page was written when I was relatively new to git – some of the general remarks about the philosophy of git are better expressed in a short tutorial I wrote recently.

gitk --all

Whenever you’re confused, use gitk --all to work out what’s going on.  The --all parameter is important so that you see all your local and remote-tracking branches. It might take a while to get used to reading the graphic representation of what’s going on, but it very often explains why things aren’t working as you might expect.

Finding text anywhere in your complete history

Particularly if you are in the habit of creating lots of branches (as I suggest below) it is easy to forget where you introduced a particular bit of code.  You can find it again simply with git log -S<search-term> --all.  For example, suppose you know that “Factory” occurred as part of the class name that you’re looking for; then git log -SFactory --all will list all commits in your repository (including remote-tracking branches) that mention “Factory” in the change they introduce.  If you want to see the patch as well (which is often helpful) add -p to that command.

Update and correction: note that “git log -S<search-term>” doesn’t quite do what I said there: in fact it finds commits where the number of occurrences of the string in a file is different before and after that commit.  In git version 1.7.4 and later you can use -G<regex> instead, which has more obvious semantics – it displays commits where the change introduced genuinely adds or removes a line that matches the given regular expression.
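The difference between -S and -G is easiest to see with a commit that edits a line without changing how many times the string appears. A minimal sketch in a throwaway repository with a dummy identity (the file a.txt and its contents are made up for illustration):

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo

echo 'x = new Factory();' > a.txt
git add a.txt
git commit -q -m "add factory"
# Change the line, but keep exactly one occurrence of "Factory"
echo 'x = new Factory(true);' > a.txt
git commit -q -am "tweak factory"

# -S: only commits that change the number of occurrences ("add factory")
echo "-S:"; git log --all --format=%s -SFactory
# -G: commits whose diff adds or removes a matching line (both of them)
echo "-G:"; git log --all --format=%s -GFactory
```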

If you want to find all the branches that one of those commits is on, see the next tip:

Finding commits by SHA1sum

There are a lot of common situations where it’s useful to find out all the branches that a particular commit is on.  For example, if you’re using submodules, then git submodule update will detach HEAD in each submodule in order to move it to the right commit.  You might, however, want to do some work on the branch that commit is on.  To do this, use git branch -a --contains <SHA1sum of commit>.  For example:

  $ git branch -a --contains 6f2293e7f6428
  * (no branch)
    origin/fiji

(This suggests that you should create a branch that tracks the remote-tracking branch origin/fiji, and you’ll find the commit in the history for that branch.)
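Here is a small sketch of that situation in a throwaway repository with a dummy identity: fiji is a hypothetical local branch name chosen to echo the example, and HEAD is detached by checking out the commit directly, much as git submodule update would:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m base
git checkout -q -b fiji
git commit -q --allow-empty -m "work on fiji"
commit=$(git rev-parse HEAD)

# Detach HEAD at that commit by checking out its object name
git checkout -q master
git checkout -q "$commit"

# Which branches contain it? Only "fiji" (plus the detached HEAD itself)
git branch -a --contains "$commit"
# ...so that is the branch to check out before doing more work
git checkout -q fiji
```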

git-ps1

Use __git_ps1

It’s very easy to forget which branch you’re working on, particularly once you get used to switching your working tree to different branches with git checkout.  You can use $(__git_ps1 " (%s)") in your PS1 environment variable to include some useful information about your current branch in your bash prompt.  (__git_ps1 is a shell function defined in the bash completion and prompt scripts that come with git, so one of those needs to be sourced first.)  If you are not currently in a git repository, this evaluates to the empty string.  Another very nice feature of __git_ps1 is that it will remind you if you’re in the middle of a rebase.  It’s easy to abandon a rebase but forget to do git rebase --abort at the end.  git status won’t remind you of this (a bug, IMHO) but if it’s there in your bash prompt, it’s hard to ignore.

The example in the image uses:

  PS1=' \[\033[1;37m\]: \u@\h:\w\[\033[0;37m\]$(__git_ps1 " (%s)")\n '

… which, if I recall correctly, was partly based on Tony Finch’s.

Create lots of topic branches

In other version control systems branching and merging are awkward compared to git, and this means that people are often reluctant to create new branches.  One of the things that I love about working with git is that it’s quite easy and natural to create a new branch for every idea you have, and just merge them into your master branch when you’re happy with them.  This matches the way that I like to work on things, as well – you can work on one idea until you’re stuck, then switch to something easier for a bit, etc. all in one copy of the repository.

Only keep one copy of a repository per computer

It’s tempting when starting to use git to clone a repository several times and work on different things in each.  In general it’s much better (and more “git-like”) to just have a single repository and get used to switching about your working tree with git checkout <BRANCH NAME>.  If you want to quickly put aside what you’re working on, without having to carefully put together a commit, then use git stash.  (Beware when unstashing some local modifications with git stash apply, however – you need to switch back to the right branch before running that command.)

Turn on all git’s coloured output options

There’s a nice summary on this git cheat sheet of the options you need to add to your .gitconfig to turn on git’s coloured output for terminals.

In addition, you might find the --color-words option to git log and git diff useful.  Rather than producing a line-by-line diff, it colours added text in green and removed text in red within a single line, e.g. try git log -p --color-words

View a particular file from a particular commit

Fairly often, I just want to have a quick look at the version of a file in a particular revision.  The easiest way to do this is with the <COMMIT>:<FILENAME> syntax that git show understands.  For example, to see the version of a file util/TestQuantile.java from a couple of commits ago, you can do:

  git show HEAD^^:util/TestQuantile.java

… or to see it from a topic branch called “experiment” do:

  git show experiment:util/TestQuantile.java

Note that the path is always relative to the repository root – even if you’re in the “util” subdirectory, you still have to include “util/” in the <FILENAME> part.  You can use the same syntax with git diff to compare two arbitrary files from arbitrary revisions.  It’s also frequently useful to view a file from where one of your branches was at a particular date, such as:

  git show master@{2.weeks.ago}:util/TestQuantile.java

To be clear, this is where your own master branch was two weeks ago – it uses the reflog for master rather than commit dates to find the commit.
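A quick sketch of that syntax in a throwaway repository with a dummy identity, using a made-up util/TestQuantile.java whose contents change in each commit. It also shows that the path stays relative to the repository root when run from inside util/, and that git diff accepts the same <COMMIT>:<FILENAME> arguments:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
mkdir util
echo 'version 1' > util/TestQuantile.java
git add util/TestQuantile.java
git commit -q -m v1
echo 'version 2' > util/TestQuantile.java
git commit -q -am v2
echo 'version 3' > util/TestQuantile.java
git commit -q -am v3

# The file as it was two commits ago
git show HEAD^^:util/TestQuantile.java

# From inside util/, the path is still given from the repository root
cd util
git show HEAD^^:util/TestQuantile.java

# Compare two versions of the file directly
git diff HEAD^^:util/TestQuantile.java HEAD:util/TestQuantile.java
```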

Use the git glossary

The git glossary provides a very succinct and useful summary of a lot of git concepts, and should be your first port of call if you don’t understand something in the manual pages.  Fundamentally (and perhaps contrary to its reputation) git is built on simple ideas, and most of these can be understood from the definitions there.

git: fetch and merge, don’t pull

This is too long and rambling, but to steal a joke from Blaise Pascal, I haven’t had time to make it shorter yet.  There is some discussion of this post on the git mailing list, but much of it is tangential to the points I’m trying to make here.

One of the git tips that I find myself frequently passing on to people is:

Don’t use git pull, use git fetch and then git merge.

The problem with git pull is that it has all kinds of helpful magic that means you don’t really have to learn about the different types of branch in git. Mostly things Just Work, but when they don’t it’s often difficult to work out why. What seem like obvious bits of syntax for git pull may have rather surprising results, as even a cursory look through the manual page should convince you.

The other problem is that by both fetching and merging in one command, your working directory is updated without giving you a chance to examine the changes you’ve just brought into your repository. Of course, unless you turn off all the safety checks, the effects of a git pull on your working directory are never going to be catastrophic, but you might prefer to do things more slowly so you don’t have to backtrack.

Branches

Before I explain the advice about git pull any further it’s worth clarifying what a branch is. Branches are often described as being a “line of development”, but I think that’s an unfortunate expression since:

  • If anything, a branch is a “directed acyclic graph of development” rather than a line.
  • It suggests that branches are quite heavyweight objects.

I would suggest that you think of branches in terms of what defines them: they’re a name for a particular commit and all the commits that are ancestors of it, so each branch is completely defined by the SHA1sum of the commit at the tip. This means that manipulating them is a very lightweight operation – you just change that value.

This definition has some perhaps unexpected implications. For example, suppose you have two branches, “stable” and “new-idea”, whose tips are at revisions E and F:

  A-----C----E ("stable")
   \
    B-----D-----F ("new-idea")

So the commits A, C and E are on “stable” and A, B, D and F are on “new-idea”. If you then merge “new-idea” into “stable” with the following commands:

    git checkout stable   # Change to work on the branch "stable"
    git merge new-idea    # Merge in "new-idea"

… then you have the following:

  A-----C----E----G ("stable")
   \             /
    B-----D-----F ("new-idea")

If you carry on committing on “new-idea” and on “stable”, you get:

  A-----C----E----G---H ("stable")
   \             /
    B-----D-----F----I ("new-idea")

So now A, B, C, D, E, F, G and H are on “stable”, while A, B, D, F and I are on “new-idea”.
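To convince yourself of this, you can rebuild that exact graph with empty commits and ask git which commits are on each branch. This is a sketch in a throwaway repository with a dummy identity; each commit is tagged with its letter (the tags exist only for the demonstration), and git merge-base --is-ancestor needs git 1.8.0 or later:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/stable

# c NAME: make an empty commit with message NAME and tag it NAME
c() { git commit -q --allow-empty -m "$1" && git tag "$1"; }

c A
git checkout -q -b new-idea
c B; c D; c F
git checkout -q stable
c C; c E
git merge -q --no-edit -m G new-idea
git tag G
c H
git checkout -q new-idea
c I

# on BRANCH: the messages (letters) of every commit on BRANCH, sorted
on() { git log --format=%s "$1" | LC_ALL=C sort | tr -d '\n'; echo; }
echo "stable:   $(on stable)"     # ABCDEFGH
echo "new-idea: $(on new-idea)"   # ABDFI

# 'git merge-base --is-ancestor X Y' succeeds when X is on branch Y
git merge-base --is-ancestor B stable && echo "B is on stable"
git merge-base --is-ancestor C new-idea || echo "C is not on new-idea"
```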

Branches do have some special properties, of course – the most important of these is that if you’re working on a branch and create a new commit, the branch tip will be advanced to that new commit. Hopefully this is what you’d expect. When merging with git merge, you only specify the branch you want to merge into the current one, and only your current branch advances.

Another common situation where this view of branches helps a lot is the following: suppose you’re working on the main branch of a project (called “master”, say) and realise later that what you’ve been doing might have been a bad idea, and you would rather it were on a topic branch. If the commit graph looks like this:

   last version from another repository
      |
      v
  M---N-----O----P---Q ("master")

Then you separate out your work with the following set of commands (where the diagrams show how the state has changed after them):

  git branch dubious-experiment

  M---N-----O----P---Q ("master" and "dubious-experiment")

  git checkout master

  # Be careful with this next command: make sure "git status" is
  # clean, you're definitely on "master" and the
  # "dubious-experiment" branch has the commits you were working
  # on first...

  git reset --hard <SHA1sum of commit N>

       ("master")
  M---N-------------O----P---Q ("dubious-experiment")

  git pull # Or something that updates "master" from
           # somewhere else...

  M--N----R---S ("master")
      \
       O---P---Q ("dubious-experiment")

This is something I seem to end up doing a lot… :)
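The same recipe as a sketch you can run in a throwaway repository (with a dummy identity), using empty commits named after the letters in the diagram. The “make sure git status is clean” advice becomes a check that aborts the script if the working tree isn’t clean, and the final git pull is left out because there is no other repository here:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git symbolic-ref HEAD refs/heads/master
for m in M N O P Q; do git commit -q --allow-empty -m "$m"; done
n=$(git rev-parse HEAD~3)   # commit N: the last version from elsewhere

git branch dubious-experiment
git checkout -q master
# Abort (via set -e) unless the working tree is clean
test -z "$(git status --porcelain)"
git reset -q --hard "$n"

git log --format=%s master               # N M
git log --format=%s dubious-experiment   # Q P O N M
```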

Types of Branches

The terminology for branches gets pretty confusing, unfortunately, since it has changed over the course of git’s development. I’m going to try to convince you that there are really only two types of branches. These are:

(a) “Local branches”: what you see when you type git branch, e.g. to use an abbreviated example I have here:

       $ git branch
         debian
         server
       * master

(b) “Remote-tracking branches”: what you see when you type git branch -r, e.g.:

       $ git branch -r
       cognac/master
       fruitfly/server
       origin/albert
       origin/ant
       origin/contrib
       origin/cross-compile

The names of remote-tracking branches are made up of the name of a “remote” (e.g. origin, cognac, fruitfly) followed by “/” and then the name of a branch in that remote repository. (“remotes” are just nicknames for other repositories, synonymous with a URL or the path of a local directory – you can set up extra remotes yourself with “git remote”, but “git clone” by default sets up “origin” for you.)

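If you’re interested in how these branches are stored locally, look at the files in:

  • .git/refs/heads/ [for local branches]
  • .git/refs/remotes/ [for remote-tracking branches]

(git may also pack refs into the single file .git/packed-refs, so don’t be surprised if a branch you know exists has no file of its own in those directories.)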

Both types of branches are very similar in some respects – they’re all just stored locally as single SHA1 sums representing a commit. (I emphasize “locally” since some people see “origin/master” and assume that in some sense this branch is incomplete without access to the remote server – that isn’t the case.)
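A quick way to see that both kinds of branch are just local names for commits is git for-each-ref, which also reads refs that git has packed into .git/packed-refs (a freshly cloned repository typically has its refs packed, so the directories can look surprisingly empty). A sketch with a dummy identity and a hypothetical local upstream repository that has an extra branch called refactored:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m one
git -C upstream branch refactored

git clone -q upstream work
cd work
# Local branches and remote-tracking branches: each is one object name
git for-each-ref --format='%(objectname) %(refname)' refs/heads refs/remotes
# No network access is needed to resolve a remote-tracking branch
git rev-parse origin/refactored
```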

Despite this similarity there is one particularly important difference:

  • The safe ways to change remote-tracking branches are with git fetch or as a side-effect of git-push; you can’t work on remote-tracking branches directly. In contrast, you can always switch to local branches and create new commits to move the tip of the branch forward.

So what you mostly do with remote-tracking branches is one of the following:

  • Update them with git fetch
  • Merge from them into your current branch
  • Create new local branches based on them

Creating local branches based on remote-tracking branches

If you want to create a local branch based on a remote-tracking branch (i.e. in order to actually work on it) you can do that with git branch --track or git checkout --track -b, which is similar but also switches your working tree to the newly created local branch. For example, if you see in git branch -r that there’s a remote-tracking branch called origin/refactored that you want, you would use the command:

    git checkout --track -b refactored origin/refactored

In this example “refactored” is the name of the new branch and “origin/refactored” is the name of the existing remote-tracking branch to base it on. (In recent versions of git the “--track” option is actually unnecessary since it’s implied when the final parameter is a remote-tracking branch, as in this example.)

The “--track” option sets up some configuration variables that associate the local branch with the remote-tracking branch. These are useful chiefly for two things:

  • They allow git pull to know what to merge after fetching new remote-tracking branches.
  • If you do git checkout to a local branch which has been set up in this way, it will give you a helpful message such as:
    Your branch and the tracked remote branch 'origin/master'
    have diverged, and respectively have 3 and 384 different
    commit(s) each.

… or:

    Your branch is behind the tracked remote branch
    'origin/master' by 3 commits, and can be fast-forwarded.

The configuration variables that allow this are called “branch.<local-branch-name>.merge” and “branch.<local-branch-name>.remote”, but you probably don’t need to worry about them.

You have probably noticed that after cloning from an established remote repository git branch -r lists many remote-tracking branches, but you only have one local branch. In that case, a variation of the command above is what you need to set up local branches that track those remote-tracking branches.

You might care to note some confusing terminology here: the word “track” in “--track” means tracking of a remote-tracking branch by a local branch, whereas in “remote-tracking branch” it means the tracking of a branch in a remote repository by the remote-tracking branch. Somewhat confusing…
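Putting that together, here is a sketch in throwaway repositories with a dummy identity: upstream is a hypothetical local repository with a refactored branch, and after git checkout --track -b you can read back the two configuration variables it wrote:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m one
git -C upstream branch refactored

git clone -q upstream work
cd work
git branch -r
git checkout -q --track -b refactored origin/refactored

# The configuration variables that --track wrote
git config branch.refactored.remote   # origin
git config branch.refactored.merge    # refs/heads/refactored
```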

Now, let’s look at an example of how to update from a remote repository, and then how to push changes to a new repository.

Updating from a Remote Repository

So, if I want to get changes from the remote repository called “origin” into my local repository, I type git fetch origin and might see some output like this:

  remote: Counting objects: 382, done.
  remote: Compressing objects: 100% (203/203), done.
  remote: Total 278 (delta 177), reused 103 (delta 59)
  Receiving objects: 100% (278/278), 4.89 MiB | 539 KiB/s, done.
  Resolving deltas: 100% (177/177), completed with 40 local objects.
  From ssh://longair@pacific.mpi-cbg.de/srv/git/fiji
     3036acc..9eb5e40  debian-release-20081030 -> origin/debian-release-20081030
   * [new branch]      debian-release-20081112 -> origin/debian-release-20081112
   * [new branch]      debian-release-20081112.1 -> origin/debian-release-20081112.1
     3d619e7..6260626  master     -> origin/master

The most important bits here are the lines like these:

     3036acc..9eb5e40  debian-release-20081030 -> origin/debian-release-20081030
   * [new branch]      debian-release-20081112 -> origin/debian-release-20081112

The first line of these two shows that your remote-tracking branch origin/debian-release-20081030 has been advanced from the commit 3036acc to 9eb5e40. The bit before the arrow is the name of the branch in the remote repository. The second line similarly shows that, since we last did this, a new remote-tracking branch has been created. (git fetch may also fetch new tags if they have appeared in the remote repository.)

The lines before those are git fetch working out exactly which objects it will need to download to our local repository’s pool of objects, so that they will be available locally for anything we want to do with these updated branches and tags.

git fetch doesn’t touch your working tree at all, so gives you a little breathing space to decide what you want to do next. To actually bring the changes from the remote branch into your working tree, you have to do a git merge. So, for instance, if I’m working on “master” (after a git checkout master) then I can merge in the changes that we’ve just got from origin with:

    git merge origin/master

(This might be a fast-forward, if you haven’t created any new commits that aren’t on master in the remote repository, or it might be a more complicated merge.)

If instead you just wanted to see what the differences are between your branch and the remote one, you could do that with:

    git diff master origin/master

This is the nice point about fetching and merging separately: it gives you the chance to examine what you’ve fetched before deciding what to do next. Also, by doing this separately the distinction between when you should use a local branch name and a remote-tracking branch name becomes clear very quickly.
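The whole fetch, inspect, merge cycle as a sketch with hypothetical local repositories and a dummy identity, where a commit made directly in upstream stands in for someone else’s work. I’ve used git merge --ff-only so that the merge refuses to do anything other than a fast-forward; that’s my choice for the sketch, not something the workflow requires:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo one > upstream/file
git -C upstream add file
git -C upstream commit -q -m one
git clone -q upstream work

# Someone else commits upstream
echo two > upstream/file
git -C upstream commit -q -am two

cd work
git fetch -q origin
# The working tree is untouched so far; look before merging
git log --oneline master..origin/master
git diff --stat master origin/master
# Now bring the changes in
git merge -q --ff-only origin/master
cat file
```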

Pushing your changes to a remote repository

How about the other way round? Suppose you’ve made some changes to the branch “experimental” and want to push that to a remote repository called “origin”. This should be as simple as:

    git push origin experimental

You might get an error saying that the remote repository can’t fast-forward the branch, which probably means that someone else has pushed different changes to that branch. In that case you’ll need to fetch and merge their changes before trying the push again.

Aside

If the branch has a different name in the remote repository (“experiment-by-bob”, say) you’d do this with:

      git push origin experimental:experiment-by-bob

On older versions of git, if “experiment-by-bob” doesn’t already exist, the syntax needs to be:

      git push origin experimental:refs/heads/experiment-by-bob

… to create the remote branch.  However that seems to be no longer the case, at least in git version 1.6.1.2 – see Sitaram’s comment below.

If the branch name is the same locally and remotely then it will be created automatically without you having to use any special syntax, i.e. you can just do git push origin experimental as normal.

In practice, however, it’s less confusing if you keep the branch names the same. (The <source-name>:<destination-name> syntax there is known as a “refspec”, about which we’ll say no more here.)

An important point here is that this git push doesn’t involve the remote-tracking branch origin/experimental at all – it will only be updated the next time you do git fetch. Correction: as Deskin Miller points out below, your remote-tracking branches will be updated on pushing to the corresponding branches in one of your remotes.
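A sketch you can run to check this, using a hypothetical bare repository shared.git in a temporary directory as origin and a dummy identity. Both pushes, including the one to a differently named branch, move the corresponding remote-tracking branches:

```shell
set -e
# Dummy identity so the throwaway commits work without any global git config
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare shared.git
git clone -q shared.git work 2>/dev/null   # silences the "empty repository" warning
cd work
git symbolic-ref HEAD refs/heads/experimental
git commit -q --allow-empty -m "experimental work"

git push -q origin experimental
git push -q origin experimental:experiment-by-bob

# Both pushes also updated the matching remote-tracking branches
git branch -r
```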

Why not git pull?

Well, git pull is fine most of the time, and particularly if you’re using git in a CVS-like fashion then it’s probably what you want. However, if you want to use git in a more idiomatic way (creating lots of topic branches, rewriting local history whenever you feel like it, and so on) then it helps a lot to get used to doing git fetch and git merge separately.