To add my usual disclaimer to the start of these blog posts, I should say that I love git; I think it’s a beautiful and elegant system, and it saves me huge amounts of time in my daily work. However, I think it’s a fair criticism of the system that its terminology is very confusing for newcomers, and in particular those who have come from using CVS or Subversion.
This is a personal list of some of my “favourite” points of confusion, which I’ve seen arise time and time again, both in real life and when answering questions on Stack Overflow. To be fair to all the excellent people who have contributed to git’s development, in most cases it’s clear that they are well aware that these terms can be problematic, and are trying to improve the situation subject to compatibility constraints. The problems that seem most bizarre are those that reuse CVS and Subversion terms for completely different concepts – I speculate a bit about that at the bottom.
“update”
If you’ve used Subversion or CVS, you’re probably used to “update” being a command that goes to the remote repository, and incorporates changes from the remote version into your local copy – this is (very broadly) analogous to “git pull”. So, when you see the following error message when using git:
foo.c: needs update
You might imagine that this means you need to run “git pull”. However, that’s wrong. In fact, what “needs update” means is approximately: “there are local modifications to this file, which you should probably commit or stash”.
“track” and “tracking”
The word “track” is used in git in three senses that I’m aware of. This ambiguity is particularly nasty, because the latter two collide at a point in learning the system where newcomers to git are likely to be baffled anyway. Fortunately, this seems to have been recognized by git’s developers (see below).
1. “track” as in “untracked files”
To say that a file is tracked in the repository appears to mean that it is either present in the index or exists in the commit pointed to by HEAD. You see this usage most often in the output of “git status”, where it will list “untracked files”:
# On branch master # Untracked files: # (use "git add <file>..." to include in what will be committed) # # .classpath
This sense is relatively intuitive, I think – it was only after complaining for a while about the next two senses of “track” that I even remembered that there was also this one :)
2. “track” as in “remote-tracking branch”
As a bit of background, you can think of a remote-tracking branch as a local cache of the state of a branch in a remote repository. The most commonly seen example is origin/master, or, to name that ref in full, refs/remotes/origin/master. Such branches are usually updated by git fetch (and thus also potentially by git pull). They are also updated by a successful push to the branch in the remote repository that they correspond to. You can merge from them, examine their history, etc. but you can’t work directly on them.
The sense of “track” in the phrase “remote-tracking branch” is indicating that the remote-tracking branch is tracking the state of the branch in the remote repository the last time that remote-tracking branch was updated. So, you might say that refs/remotes/origin/master is tracking the state of the branch master in origin.
The “tracking” here is defined by the refspec in the config variable remote.<remote-name>.fetch and the URL in the config variable remote.<remote-name>.url.
3. “track” as in “git branch –track foo origin/bar” and “Branch foo set up to track remote branch bar from origin”
Again, if you want to do some work on a branch from a remote repository, but want to keep your work separate from everything else in your repository, you’ll typically use a command like the following (or one of its many “Do What I Mean” equivalents):
git checkout --track -b foo origin/bar
… which will result in the following messages:
Branch foo set up to track remote branch bar from origin Switched to a new branch 'foo'
The sense of “track” both in the command and the output is distinct from the previous sense – it means that config options have been set that associate your new local branch with another branch in the remote repository. The documentation sometimes refers to this relationship as making bar in origin “upstream” of foo. This “upstream” association is very useful, in fact: it enables nice features like being able to just type git pull while you’re on branch foo in order to fetch from origin and then merge from origin/bar. It’s also how you get helpful messages about the state of your branch relative to the remote-tracking branch, like “Your branch foo is 24 commits ahead of origin/bar and can be fast-forwarded”.
The tracking here is defined by config variables branch.<branch-name>.remote and branch.<branch-name>.merge.
“tracking” Summary
Fortunately, the third sense of “tracking” seems to be being carefully deprecated – for example, one of the possible options for push.default used to be tracking, but this is now deprecated in favour of the option name upstream. The commit message for 53c403116 says:
push.default: Rename ‘tracking’ to ‘upstream’
Users are sometimes confused with two different types of “tracking” behavior in Git: “remote-tracking” branches (e.g. refs/remotes/*/*) versus the merge/rebase relationship between a local branch and its @{upstream} (controlled by branch.foo.remote and branch.foo.merge config settings).
When the push.default is set to ‘tracking’, it specifies that a branch should be pushed to its @{upstream} branch. In other words, setting push.default to ‘tracking’ applies only to the latter of the above two types of “tracking” behavior.
In order to make this more understandable to the user, we rename the push.default == ‘tracking’ option to push.default == ‘upstream’.
push.default == ‘tracking’ is left as a deprecated synonym for ‘upstream’.
“commit”
In CVS and Subversion, “commit” means to send your changes to the remote repository. In git the action of committing (with “git commit”) is entirely local; the closest equivalent of “cvs commit” is “git push”. In addition, the word “commit” in git has different verb and noun senses (although frankly I’ve never found this confusing myself). To quote from the helpful git glossary:
commit
As a noun: A single point in the git history; the entire history of a project is represented as a set of interrelated commits. The word “commit” is often used by git in the same places other revision control systems use the words “revision” or “version”. Also used as a short hand for commit object.
As a verb: The action of storing a new snapshot of the project’s state in the git history, by creating a new commit representing the current state of the index and advancing HEAD to point at the new commit.
Strictly speaking that makes two different noun senses, but as a user I’ve rarely found that confusing.
“checkout”
In CVS and Subversion “checkout” creates a new local copy of the source code that is linked to that repository. The closest command in git is “git clone”. However, in git, “git checkout” is used for something completely distinct. In fact, it has two largely distinct modes of operation:
- To switch HEAD to point to a new branch or commit, in the usage git checkout <branch>. If <branch> is genuinely a local branch, this will switch to that branch (i.e. HEAD will point to the ref name) or if it otherwise resolves to a commit will detach HEAD and point it directly to the commit’s object name.
- To replace a file or multiple files in the working copy and the index with their content from a particular commit or the index. This is seen in the usages: git checkout -- (update from the index) and git checkout <tree-ish> -- (where <tree-ish> is typically a commit).
(git checkout is also frequently used with -b, to create a new branch, but that’s really a sub-case of usage 1.)
In my ideal world, these two modes of operation would have different verbs, and neither of them would be “checkout”.
“HEAD” and “head”
There are usually many “heads” (lower-case) in a git repository – the tip of each branch is a head. However, there is only one HEAD (upper-case) which is a symbolic ref which points to the current branch or commit.
“fetch” and “pull”
I wasn’t aware of this until Roy Badami pointed it out, but it seems that git and Mercurial have opposite meanings for “fetch” and “pull” – see the top two lines in this table of git / hg equivalences. I think it’s understandable that since git’s and Mercurial’s development were more or less concurrent, such unfortunate clashes in terminology might occur.
“push” and “pull”
“git pull” is not the opposite of “git push”; the closest there is to an opposite of “git push” is “git fetch”.
“hash”, “SHA1”, “SHA1sum”, “object name” and “object identifier”
These terms are often used synonymously to mean the 40 characters hexadecimal strings that uniquely identify objects in git. “object name” seems to be the most official, but the least used in general. Referring to an object name as a SHA1sum is potentially confusing, since the object name for a blob is not the same as the SHA1sum of the file.
“remote branch”
This term is only used occasionally in the git documentation, but it’s one that I would always try to avoid because it tends to be unclear whether you mean “a branch in a remote repository” or “a remote-tracking branch”. Whenever a git beginner uses this phrase, I think it’s worth clarifying this, since it can avoid later confusion.
“index”, “staging area” and “cache”
As nouns, these are all synonyms, which all exist for historical reasons. Personally, I like “staging area” the best since it seems to be the easiest concept to understand for git beginners, but the other two are used more commonly in the documentation.
When used as command options, --index and --cached have distinct and consistent meanings, as explained by Junio C. Hamano in this useful blog post.
Why are there so many of these points of confusion?
I would speculate that the most significant effect that contributed to these terminology confusions is that git was being actively used by an enthusiastic community from very early in its development, which means that early names for concepts have tended to persist for the sake of compatibility and consistency. That doesn’t necessarily account for the many conflicts with CVS / Subversion usage, however.
To be fair to git, thinking up verbs for particular commands in any software is tough, and there have been enough version control systems written that to completely avoid clashes would lead to some convoluted choices. However, it’s hard to see git’s use of CVS / Subversion terminology for completely different concepts as anything but perverse. Linus has made it very clear many times that he hated CVS, and joked that a design principle for git was WWCVSND (What Would CVS Never Do); I’m sympathetic to that, as I’m sure most are, especially after having switched to the DVCS mindset. However, could that attitude have extended to deliberately disregarding concerns about terminology that might make it actively harder for people to migrate to git from CVS / Subversion? I don’t know nearly enough about the early development of git to know. However, it wouldn’t have been tough to find better choices for commit, checkout and update in each of their various senses.
Leave a Reply