The most confusing git terminology

To add my usual disclaimer to the start of these blog posts, I should say that I love git; I think it’s a beautiful and elegant system, and it saves me huge amounts of time in my daily work. However, I think it’s a fair criticism of the system that its terminology is very confusing for newcomers, and in particular those who have come from using CVS or Subversion.

This is a personal list of some of my “favourite” points of confusion, which I’ve seen arise time and time again, both in real life and when answering questions on Stack Overflow. To be fair to all the excellent people who have contributed to git’s development, in most cases it’s clear that they are well aware that these terms can be problematic, and are trying to improve the situation subject to compatibility constraints. The problems that seem most bizarre are those that reuse CVS and Subversion terms for completely different concepts – I speculate a bit about that at the bottom.

“update”

If you’ve used Subversion or CVS, you’re probably used to “update” being a command that goes to the remote repository, and incorporates changes from the remote version into your local copy – this is (very broadly) analogous to “git pull”.  So, when you see the following error message when using git:

foo.c: needs update

You might imagine that this means you need to run “git pull”. However, that’s wrong.  In fact, what “needs update” means is approximately: “there are local modifications to this file, which you should probably commit or stash”.

“track” and “tracking”

The word “track” is used in git in three senses that I’m aware of. This ambiguity is particularly nasty, because the latter two collide at a point in learning the system where newcomers to git are likely to be baffled anyway. Fortunately, this seems to have been recognized by git’s developers (see below).

1. “track” as in “untracked files”

To say that a file is tracked in the repository appears to mean that it is either present in the index or exists in the commit pointed to by HEAD.  You see this usage most often in the output of “git status”, where it will list “untracked files”:

# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#    .classpath

This sense is relatively intuitive, I think – it was only after complaining for a while about the next two senses of “track” that I even remembered that there was also this one :)

2. “track” as in “remote-tracking branch”

As a bit of background, you can think of a remote-tracking branch as a local cache of the state of a branch in a remote repository.  The most commonly seen example is origin/master, or, to name that ref in full, refs/remotes/origin/master.  Such branches are usually updated by git fetch (and thus also potentially by git pull).  They are also updated by a successful push to the branch in the remote repository that they correspond to.   You can merge from them, examine their history, etc. but you can’t work directly on them.

The sense of “track” in the phrase “remote-tracking branch” is indicating that the remote-tracking branch is tracking the state of the branch in the remote repository the last time that remote-tracking branch was updated.  So, you might say that refs/remotes/origin/master is tracking the state of the branch master in origin.

The “tracking” here is defined by the refspec in the config variable remote.<remote-name>.fetch and the URL in the config variable remote.<remote-name>.url.

3. “track” as in “git branch –track foo origin/bar” and “Branch foo set up to track remote branch bar from origin”

Again, if you want to do some work on a branch from a remote repository, but want to keep your work separate from everything else in your repository, you’ll typically use a command like the following (or one of its many “Do What I Mean” equivalents):

git checkout --track -b foo origin/bar

… which will result in the following messages:

Branch foo set up to track remote branch bar from origin
Switched to a new branch 'bar'

The sense of “track” both in the command and the output is distinct from the previous sense – it means that config options have been set that associate your new local branch with another branch in the remote repository. The documentation sometimes refers to this relationship as making bar in origin “upstream” of foo. This “upstream” association is very useful, in fact: it enables nice features like being able to just type git pull while you’re on branch foo in order to fetch from origin and then merge from origin/bar. It’s also how you get helpful messages about the state of your branch relative to the remote-tracking branch, like “Your branch foo is 24 commits ahead of origin/bar and can be fast-forwarded”.

The tracking here is defined by config variables branch.<branch-name>.remote and branch.<branch-name>.merge.

“tracking” Summary

Fortunately, the third sense of “tracking” seems to be being carefully deprecated – for example, one of the possible options for push.default used to be tracking, but this is now deprecated in favour of the option name upstream. The commit message for 53c403116 says:

push.default: Rename ‘tracking’ to ‘upstream’

Users are sometimes confused with two different types of “tracking” behavior in Git: “remote-tracking” branches (e.g. refs/remotes/*/*) versus the merge/rebase relationship between a local branch and its @{upstream} (controlled by branch.foo.remote and branch.foo.merge config settings).

When the push.default is set to ‘tracking’, it specifies that a branch should be pushed to its @{upstream} branch. In other words, setting push.default to ‘tracking’ applies only to the latter of the above two types of “tracking” behavior.

In order to make this more understandable to the user, we rename the push.default == ‘tracking’ option to push.default == ‘upstream’.

push.default == ‘tracking’ is left as a deprecated synonym for ‘upstream’.

“commit”

In CVS and Subversion, “commit” means to send your changes to the remote repository. In git the action of committing (with “git commit”) is entirely local; the closest equivalent of “cvs commit” is “git push”. In addition, the word “commit” in git has different verb and noun senses (although frankly I’ve never found this confusing myself). To quote from the helpful git glossary:

commit

As a noun: A single point in the git history; the entire history of a project is represented as a set of interrelated commits. The word “commit” is often used by git in the same places other revision control systems use the words “revision” or “version”. Also used as a short hand for commit object.

As a verb: The action of storing a new snapshot of the project’s state in the git history, by creating a new commit representing the current state of the index and advancing HEAD to point at the new commit.

Strictly speaking that makes two different noun senses, but as a user I’ve rarely found that confusing.

“checkout”

In CVS and Subversion “checkout” creates a new local copy of the source code that is linked to that repository. The closest command in git is “git clone”. However, in git, “git checkout” is used for something completely distinct. In fact, it has two largely distinct modes of operation:

  1. To switch HEAD to point to a new branch or commit, in the usage git checkout <branch>. If <branch> is genuinely a local branch, this will switch to that branch (i.e. HEAD will point to the ref name) or if it otherwise resolves to a commit will detach HEAD and point it directly to the commit’s object name.
  2. To replace a file or multiple files in the working copy and the index with their content from a particular commit or the index. This is seen in the usages: git checkout -- (update from the index) and git checkout <tree-ish> -- (where <tree-ish> is typically a commit).

(git checkout is also frequently used with -b, to create a new branch, but that’s really a sub-case of usage 1.)

In my ideal world, these two modes of operation would have different verbs, and neither of them would be “checkout”.

“HEAD” and “head”

There are usually many “heads” (lower-case) in a git repository – the tip of each branch is a head. However, there is only one HEAD (upper-case) which is a symbolic ref which points to the current branch or commit.

“fetch” and “pull”

I wasn’t aware of this until Roy Badami pointed it out, but it seems that git and Mercurial have opposite meanings for “fetch” and “pull” – see the top two lines in this table of git / hg equivalences. I think it’s understandable that since git’s and Mercurial’s development were more or less concurrent, such unfortunate clashes in terminology might occur.

“push” and “pull”

“git pull” is not the opposite of “git push”; the closest there is to an opposite of “git push” is “git fetch”.

“hash”, “SHA1″, “SHA1sum”, “object name” and “object identifier”

These terms are often used synonymously to mean the 40 characters hexadecimal strings that uniquely identify objects in git. “object name” seems to be the most official, but the least used in general. Referring to an object name as a SHA1sum is potentially confusing, since the object name for a blob is not the same as the SHA1sum of the file.

“remote branch”

This term is only used occasionally in the git documentation, but it’s one that I would always try to avoid because it tends to be unclear whether you mean “a branch in a remote repository” or “a remote-tracking branch”. Whenever a git beginner uses this phrase, I think it’s worth clarifying this, since it can avoid later confusion.

“index”, “staging area” and “cache”

As nouns, these are all synonyms, which all exist for historical reasons. Personally, I like “staging area” the best since it seems to be the easiest concept to understand for git beginners, but the other two are used more commonly in the documentation.

When used as command options, --index and --cached have distinct and consistent meanings, as explained by Junio C. Hamano in this useful blog post.

Why are there so many of these points of confusion?

I would speculate that the most significant effect that contributed to these terminology confusions is that git was being actively used by an enthusiastic community from very early in its development, which means that early names for concepts have tended to persist for the sake of compatibility and consistency. That doesn’t necessarily account for the many conflicts with CVS / Subversion usage, however.

To be fair to git, thinking up verbs for particular commands in any software is tough, and there have been enough version control systems written that to completely avoid clashes would lead to some convoluted choices. However, it’s hard to see git’s use of CVS / Subversion terminology for completely different concepts as anything but perverse. Linus has made it very clear many times that he hated CVS, and joked that a design principle for git was WWCVSND (What Would CVS Never Do); I’m sympathetic to that, as I’m sure most are, especially after having switched to the DVCS mindset. However, could that attitude have extended to deliberately disregarding concerns about terminology that might make it actively harder for people to migrate to git from CVS / Subversion? I don’t know nearly enough about the early development of git to know. However, it wouldn’t have been tough to find better choices for commit, checkout and update in each of their various senses.

9 thoughts on “The most confusing git terminology”

  1. Hi, Mark.

    Thanks for yor excellent explanation of “tracking” branches! This is exactly what i searched for.

    Also, i think, here is typo:
    You wrote “The documentation sometimes refers to this relationship as making foo in origin “upstream” of bar.”, but it should be “bar in origin upstream of foo”.

  2. Really good article – a lot of these terms are thrown about in git documentation/tutorials without much explanation and can certainly trip you up.

  3. “Why are there so many of these points of confusion?”

    Probably beacuse Linus is a dictator and he probaby didn’t give shit about others’ opinions and past works for the terminology.

  4. I find the term upstream confusing. You mention one usage in this article. However, GitHub help recommends To keep track of the original repo [you forked from], you need to add another remote named upstream. So when someone says to push upstream, it’s ambiguous. I’m not sure if this usage is typical of Git in general or just GitHub, but it sure left me confused for a while.


  5. “remote branch”

    This term is only used occasionally in the git documentation, but it’s one that I would always try to avoid because it tends to be unclear whether you mean “a branch in a remote repository” or “a remote-tracking branch”. Whenever a git beginner uses this phrase, I think it’s worth clarifying this, since it can avoid later confusion.

    Doh! That’s exactly what I’m trying to find out, but then you don’t actually give us the answer!

    1. Hi Ian, That’s the thing – you often can’t tell without context. Where did you see “remote branch” used, or in what context? (“a branch in a remote repository” is the more accurate interpretation, I think, but people who don’t understand remote-tracking branches might use the term differently.)

  6. Thanks for the helpful git articles. I’m a git noob, and the more different sources I read, the closer my understanding converges on what’s really going on. It’s just a shame that numerous git commands have ended up being labels where one must simply memorize what they really do — might as well have called them things like “abc” and “xyz” instead of terms that initially lead one astray.

    If you’re motivated to write another one, here’s a suggestion. Maybe you could review all the places that git stores stuff, like working repository, index (aka cache or staging area), remote repository, local copy of remote repository (remote tracking branch?), stash (and stash versions), etc., and list the commands that transfer data in various ways among these places. I’ve had to gradually formulate and revise these ideas in my own mind as I’ve been reading about git, and if I had an up-front explanation of this at the outset, it would’ve saved me lots of time.

    1. Thanks for the kind words and your suggestion Oleh. I’ll certainly consider it – there are quite a few good diagrammatic representations of the commands that change what’s stored in the index, working tree, etc., but maybe there’s something more to do. (You can find some good examples with an image search for git index diagram.)

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>