This is too long and rambling, but to steal a joke from Blaise Pascal, I haven’t had time to make it shorter yet. There is some discussion of this post on the git mailing list, but much of it is tangential to the points I’m trying to make here.
One of the git tips that I find myself frequently passing on to people is:
Don’t use git pull, use git fetch and then git merge.
The problem with git pull is that it has all kinds of helpful magic that means you don’t really have to learn about the different types of branch in git. Mostly things Just Work, but when they don’t it’s often difficult to work out why. What seem like obvious bits of syntax for git pull may have rather surprising results, as even a cursory look through the manual page should convince you.
The other problem is that by both fetching and merging in one command, your working directory is updated without giving you a chance to examine the changes you’ve just brought into your repository. Of course, unless you turn off all the safety checks, the effects of a git pull on your working directory are never going to be catastrophic, but you might prefer to do things more slowly so you don’t have to backtrack.
Branches
Before I explain the advice about git pull any further it’s worth clarifying what a branch is. Branches are often described as being a “line of development”, but I think that’s an unfortunate expression since:
- If anything, a branch is a “directed acyclic graph of development” rather than a line.
- It suggests that branches are quite heavyweight objects.
I would suggest that you think of branches in terms of what defines them: they’re a name for a particular commit and all the commits that are ancestors of it, so each branch is completely defined by the SHA1sum of the commit at the tip. This means that manipulating them is a very lightweight operation – you just change that value.
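To make that concrete, here is a minimal sketch that builds a throwaway repository and shows that a branch name simply resolves to one commit ID. The temporary directory, the repository name "demo", the example identity and the empty commits are all assumptions made for illustration; nothing here touches an existing repository.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so that "git commit" works in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/stable   # call the first branch "stable"
git commit -q --allow-empty -m "A"
git commit -q --allow-empty -m "C"

# The branch "stable" resolves to a single commit ID: its tip.
tip=$(git rev-parse stable)
echo "stable   -> $tip"

# Creating another branch only records a second name for that commit.
git branch new-idea
idea=$(git rev-parse new-idea)
echo "new-idea -> $idea"

# Moving a branch is just a matter of changing that value.
git branch -f new-idea stable~1
moved=$(git rev-parse new-idea)
parent=$(git rev-parse stable~1)
echo "new-idea -> $moved (after moving it back one commit)"
```

No commits are created or destroyed by the last two commands; only the value that the name "new-idea" stands for changes.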
This definition has some perhaps unexpected implications. For example, suppose you have two branches, “stable” and “new-idea”, whose tips are at revisions E and F:
A-----C----E ("stable")
 \
  B-----D-----F ("new-idea")
So the commits A, C and E are on “stable” and A, B, D and F are on “new-idea”. If you then merge “new-idea” onto “stable” with the following commands:
git checkout stable # Change to work on the branch "stable"
git merge new-idea # Merge in "new-idea"
… then you have the following:
A-----C----E----G ("stable")
 \             /
  B-----D-----F ("new-idea")
If you carry on committing on “new-idea” and on “stable”, you get:
A-----C----E----G---H ("stable")
 \             /
  B-----D-----F----I ("new-idea")
So now A, B, C, D, E, F, G and H are on “stable”, while A, B, D, F and I are on “new-idea”.
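If you would like to check this for yourself, the following sketch rebuilds the graph above in a throwaway repository and lists which commits are reachable from each branch tip. The temporary directory, the "commit" helper function, the example identity and the use of empty commits named after the letters in the diagram are assumptions for illustration only.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so that "git commit" works in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/stable

# Helper (hypothetical): an empty commit whose message is a single letter.
commit() { git commit -q --allow-empty -m "$1"; }

commit A
git branch new-idea                  # both branches start at A
commit C; commit E                   # stable:   A-C-E
git checkout -q new-idea
commit B; commit D; commit F         # new-idea: A-B-D-F
git checkout -q stable
git merge -q --no-ff -m G new-idea   # G is the merge commit
commit H
git checkout -q new-idea
commit I

# Every commit reachable from a branch tip is "on" that branch.
on_stable=$(git log --format=%s stable | LC_ALL=C sort | tr -d '\n')
on_new_idea=$(git log --format=%s new-idea | LC_ALL=C sort | tr -d '\n')
echo "stable:   $on_stable"      # stable:   ABCDEFGH
echo "new-idea: $on_new_idea"    # new-idea: ABDFI
```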
Branches do have some special properties, of course – the most important of these is that if you’re working on a branch and create a new commit, the branch tip will be advanced to that new commit. Hopefully this is what you’d expect. When merging with git merge, you only specify the branch you want to merge into the current one, and only your current branch advances.
Another common situation where this view of branches helps a lot is the following: suppose you’re working on the main branch of a project (called “master”, say) and realise later that what you’ve been doing might have been a bad idea, and you would rather it were on a topic branch. If the commit graph looks like this:
last version from another repository
    |
    v
M---N-----O----P---Q ("master")
Then you separate out your work with the following set of commands (where the diagrams show how the state has changed after them):
git branch dubious-experiment
M---N-----O----P---Q ("master" and "dubious-experiment")
git checkout master
# Be careful with this next command: make sure "git status" is
# clean, you're definitely on "master" and the
# "dubious-experiment" branch has the commits you were working
# on first...
git reset --hard <SHA1sum of commit N>
    ("master")
M---N-------------O----P---Q ("dubious-experiment")
git pull # Or something that updates "master" from
# somewhere else...
M--N----R---S ("master")
    \
     O---P---Q ("dubious-experiment")
This is something I seem to end up doing a lot… :)
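The first two steps of that recipe can be tried safely in a scratch repository. This is only a sketch under stated assumptions: the temporary directory, the example identity and the empty commits named M to Q are made up for the demonstration, and the final git pull step is left out because there is no upstream here.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so that "git commit" works in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

cd "$work"
git init -q demo
cd demo
git symbolic-ref HEAD refs/heads/master
for name in M N O P Q; do
    git commit -q --allow-empty -m "$name"
done
n=$(git rev-parse master~3)          # commit N, the last upstream version

git branch dubious-experiment        # a second name for Q
# Stop here if there are uncommitted changes that --hard would destroy.
test -z "$(git status --porcelain)"
git reset -q --hard "$n"             # move "master" back to N

master_tip=$(git log -1 --format=%s master)
experiment_tip=$(git log -1 --format=%s dubious-experiment)
echo "master:             $master_tip"       # master:             N
echo "dubious-experiment: $experiment_tip"   # dubious-experiment: Q
```

The commits O, P and Q are untouched; they are simply no longer reachable from "master".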
Types of Branches
The terminology for branches gets pretty confusing, unfortunately, since it has changed over the course of git’s development. I’m going to try to convince you that there are really only two types of branches. These are:
(a) “Local branches”: what you see when you type git branch, e.g. to use an abbreviated example I have here:
$ git branch
debian
server
* master
(b) “Remote-tracking branches”: what you see when you type git branch -r, e.g.:
$ git branch -r
cognac/master
fruitfly/server
origin/albert
origin/ant
origin/contrib
origin/cross-compile
The names of remote-tracking branches are made up of the name of a “remote” (e.g. origin, cognac, fruitfly) followed by “/” and then the name of a branch in that remote repository. (“remotes” are just nicknames for other repositories, synonymous with a URL or the path of a local directory – you can set up extra remotes yourself with “git remote”, but “git clone” by default sets up “origin” for you.)
If you’re interested in how these branches are stored locally, look at the files in:
- .git/refs/heads/ [for local branches]
- .git/refs/remotes/ [for tracking branches]
Both types of branches are very similar in some respects – they’re all just stored locally as single SHA1 sums representing a commit. (I emphasize “locally” since some people see “origin/master” and assume that in some sense this branch is incomplete without access to the remote server – that isn’t the case.)
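A quick way to see both kinds of branch side by side is git for-each-ref, which lists refs wherever git has stored them (they may also be packed into .git/packed-refs rather than kept as individual files). The sketch below is an illustration under assumptions: the repositories "seed" and "clone", the branch "server" and the example identity are invented, and the "remote" is deleted on purpose to show that the remote-tracking branches still resolve locally.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so that "git commit" works in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

cd "$work"
git init -q seed                 # stands in for the remote repository
cd seed
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
git branch server
cd ..
git clone -q seed clone
rm -rf seed                      # the "remote" is now unreachable
cd clone

# Local branches live under refs/heads, remote-tracking ones under refs/remotes.
local_refs=$(git for-each-ref --format='%(refname)' refs/heads)
remote_refs=$(git for-each-ref --format='%(refname)' refs/remotes)
echo "local branches:"
echo "$local_refs"
echo "remote-tracking branches:"
echo "$remote_refs"

# origin/server still resolves to a commit without any access to the remote.
tip=$(git rev-parse origin/server)
echo "origin/server -> $tip"
```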
Despite this similarity there is one particularly important difference:
- The safe ways to change remote-tracking branches are with git fetch or as a side-effect of git push; you can’t work on remote-tracking branches directly. In contrast, you can always switch to local branches and create new commits to move the tip of the branch forward.
So what you mostly do with remote-tracking branches is one of the following:
- Update them with git fetch
- Merge from them into your current branch
- Create new local branches based on them
Creating local branches based on remote-tracking branches
If you want to create a local branch based on a remote-tracking branch (i.e. in order to actually work on it) you can do that with git branch --track or git checkout --track -b; the latter is similar, but it also switches your working tree to the newly created local branch. For example, if you see in git branch -r that there’s a remote-tracking branch called origin/refactored that you want, you would use the command:
git checkout --track -b refactored origin/refactored
In this example “refactored” is the name of the new branch and “origin/refactored” is the name of the existing remote-tracking branch to base it on. (In recent versions of git the “--track” option is actually unnecessary since it’s implied when the final parameter is a remote-tracking branch, as in this example.)
The “--track” option sets up some configuration variables that associate the local branch with the remote-tracking branch. These are useful chiefly for two things:
- They allow git pull to know what to merge after fetching new remote-tracking branches.
- If you do git checkout to a local branch which has been set up in this way, it will give you a helpful message such as:
Your branch and the tracked remote branch 'origin/master'
have diverged, and respectively have 3 and 384 different
commit(s) each.
… or:
Your branch is behind the tracked remote branch
'origin/master' by 3 commits, and can be fast-forwarded.
The configuration variables that allow this are called “branch.<local-branch-name>.merge” and “branch.<local-branch-name>.remote”, but you probably don’t need to worry about them.
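If you are curious, you can look at those two variables with git config. The following sketch sets up a scratch "remote" with a branch called "refactored", clones it, and prints what --track recorded; the repository names "seed" and "clone", the temporary directory and the example identity are assumptions for the demonstration.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so that "git commit" works in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

cd "$work"
git init -q seed                 # stands in for the remote repository
cd seed
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
git branch refactored
cd ..
git clone -q seed clone
cd clone

# Create a local branch that tracks the remote-tracking branch.
git checkout -q --track -b refactored origin/refactored

remote=$(git config branch.refactored.remote)
merge=$(git config branch.refactored.merge)
echo "branch.refactored.remote = $remote"   # branch.refactored.remote = origin
echo "branch.refactored.merge  = $merge"    # branch.refactored.merge  = refs/heads/refactored
```

Note that the "merge" value names the branch as it is called in the remote repository (refs/heads/refactored), not the remote-tracking branch.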
You have probably noticed that after cloning from an established remote repository git branch -r lists many remote-tracking branches, but you only have one local branch. In that case, a variation of the command above is what you need to set up local branches that track those remote-tracking branches.
Note some confusing terminology here: the word “track” in “--track” means tracking of a remote-tracking branch by a local branch, whereas in “remote-tracking branch” it means the tracking of a branch in a remote repository by the remote-tracking branch.
Now, let’s look at an example of how to update from a remote repository, and then how to push changes to a new repository.
Updating from a Remote Repository
So, if I want to get changes from the remote repository called “origin” into my local repository, I’ll type git fetch origin, which might produce output like this:
remote: Counting objects: 382, done.
remote: Compressing objects: 100% (203/203), done.
remote: Total 278 (delta 177), reused 103 (delta 59)
Receiving objects: 100% (278/278), 4.89 MiB | 539 KiB/s, done.
Resolving deltas: 100% (177/177), completed with 40 local objects.
From ssh://longair@pacific.mpi-cbg.de/srv/git/fiji
3036acc..9eb5e40 debian-release-20081030 -> origin/debian-release-20081030
* [new branch] debian-release-20081112 -> origin/debian-release-20081112
* [new branch] debian-release-20081112.1 -> origin/debian-release-20081112.1
3d619e7..6260626 master -> origin/master
The most important bits here are the lines like these:
3036acc..9eb5e40 debian-release-20081030 -> origin/debian-release-20081030
* [new branch] debian-release-20081112 -> origin/debian-release-20081112
The first line of these two shows that your remote-tracking branch origin/debian-release-20081030 has been advanced from the commit 3036acc to 9eb5e40. The bit before the arrow is the name of the branch in the remote repository. The second line similarly shows that since we last did this, a new remote-tracking branch has been created. (git fetch may also fetch new tags if they have appeared in the remote repository.)
The lines before those are git fetch working out exactly which objects it will need to download to our local repository’s pool of objects, so that they will be available locally for anything we want to do with these updated branches and tags.
git fetch doesn’t touch your working tree at all, so gives you a little breathing space to decide what you want to do next. To actually bring the changes from the remote branch into your working tree, you have to do a git merge. So, for instance, if I’m working on “master” (after a git checkout master) then I can merge in the changes that we’ve just got from origin with:
git merge origin/master
(This might be a fast-forward, if you haven’t created any new commits that aren’t on master in the remote repository, or it might be a more complicated merge.)
If instead you just wanted to see what the differences are between your branch and the remote one, you could do that with:
git diff master origin/master
This is the nice point about fetching and merging separately: it gives you the chance to examine what you’ve fetched before deciding what to do next. Also, by doing this separately the distinction between when you should use a local branch name and a remote-tracking branch name becomes clear very quickly.
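Here is a small end-to-end sketch of that fetch, inspect, merge sequence in scratch repositories. The names "seed" and "clone", the file notes.txt and the example identity are assumptions, and using --ff-only for the merge is my choice for the demonstration (it refuses anything that isn't a fast-forward) rather than something the workflow requires.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so that "git commit" works in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

cd "$work"
git init -q seed                 # stands in for the remote repository
cd seed
git symbolic-ref HEAD refs/heads/master
echo "one" > notes.txt
git add notes.txt
git commit -q -m "first"
cd ..
git clone -q seed clone

# The remote gains a commit after the clone was made.
cd seed
echo "two" >> notes.txt
git commit -q -am "upstream change"
cd ../clone

# Step 1: fetch. Only origin/master moves; master and the working tree do not.
git fetch -q origin

# Step 2: look at what arrived before deciding what to do.
incoming=$(git log --format=%s master..origin/master)
echo "incoming commits: $incoming"      # incoming commits: upstream change
git diff --stat master origin/master

# Step 3: merge, here only if it is a simple fast-forward.
git merge -q --ff-only origin/master
```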
Pushing your changes to a remote repository
How about the other way round? Suppose you’ve made some changes to the branch “experimental” and want to push that to a remote repository called “origin”. This should be as simple as:
git push origin experimental
You might get an error saying that the remote repository can’t fast-forward the branch, which probably means that someone else has pushed different changes to that branch. In that case you’ll need to fetch and merge their changes before trying the push again.
Aside
If the branch has a different name in the remote repository (“experiment-by-bob”, say) you’d do this with:
git push origin experimental:experiment-by-bob
On older versions of git, if “experiment-by-bob” doesn’t already exist, the syntax needs to be:
git push origin experimental:refs/heads/experiment-by-bob
… to create the remote branch. However that seems to be no longer the case, at least in git version 1.6.1.2 – see Sitaram’s comment below.
If the branch name is the same locally and remotely then it will be created automatically without you having to use any special syntax, i.e. you can just do git push origin experimental as normal.
In practice, however, it’s less confusing if you keep the branch names the same. (The <source-name>:<destination-name> syntax there is known as a “refspec”, about which we’ll say no more here.)
An earlier version of this post claimed that this git push doesn’t involve the remote-tracking branch origin/experimental at all, and that it would only be updated the next time you do git fetch. That was wrong: as Deskin Miller points out below, your remote-tracking branches will be updated on pushing to the corresponding branches in one of your remotes.
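You can watch the remote-tracking branch being updated by a push in a scratch setup like the one below. This is a sketch under assumptions: the bare repository "origin.git", the working repository "local", the temporary directory and the example identity are all invented for the demonstration.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so that "git commit" works in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

cd "$work"
git init -q --bare origin.git    # stands in for the remote repository
git init -q local
cd local
git symbolic-ref HEAD refs/heads/experimental
git commit -q --allow-empty -m "experiment"
git remote add origin "$work/origin.git"

# Same name on both sides: the remote branch is created by the push.
git push -q origin experimental

# No git fetch has been run, yet origin/experimental already exists locally.
tracked=$(git rev-parse origin/experimental)
echo "origin/experimental -> $tracked"

# Different name on the remote side, using a refspec.
git push -q origin experimental:experiment-by-bob
bob=$(git --git-dir="$work/origin.git" rev-parse refs/heads/experiment-by-bob)
echo "experiment-by-bob (in the remote) -> $bob"
```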
Why not git pull?
Well, git pull is fine most of the time, and particularly if you’re using git in a CVS-like fashion then it’s probably what you want. However, if you want to use git in a more idiomatic way (creating lots of topic branches, rewriting local history whenever you feel like it, and so on) then it helps a lot to get used to doing git fetch and git merge separately.
“An important point here is that this git push doesn’t involve the remote-tracking branch origin/experimental at all – it will only be updated the next time you do git fetch”
This is untrue: when pushing to a named remote, if the remote branch you’re pushing to is one which would be tracked according to your settings for that remote, your remote-tracking branch will be updated appropriately. This is the case even when pushing multiple branches.
Excellent post overall!
Deskin: thanks, and thank-you for catching that error; I’ve corrected the appropriate bits.
inside the little breakout box starting with “aside…”, it says:
… or if “experiment-by-bob” doesn’t already exist, the syntax needs to be:
git push origin experimental:refs/heads/experiment-by-bob
I have just tested it and it is not necessary; I can make a brand new branch and push it to a *different* new name to origin without needing that “refs/heads” part.
This is on Git 1.6.2; I did not check older versions but I do not remember ever having to do this.
Sitaram: thanks for bringing that to my attention. I’ve just tested on one of the systems I use which still has git 1.5.3.5 and it does give an error in that case:
$ git push origin topic:topic-new
error: dst refspec topic-new does not match any existing ref on the remote and does not start with refs/.
fatal: The remote end hung up unexpectedly
error: failed to push to '/home/mark/tmp/foo/.git'
… but the same test on git 1.6.1.2 works OK:
$ git push origin topic:new-topic
Total 0 (delta 0), reused 0 (delta 0)
To /home/mark/tmp/quuz/
* [new branch] topic -> new-topic
At least I wasn’t just making it up :) I’ll update the post with a note that that only applies to older versions.
In the first place, thank you for the article.
As a side question, after a fetch, how can I merge all tracked remote branches with the local ones in a single command? (besides creating a script to merge each branch in turn)
I don’t think there’s a single command that will do that. You’d have to be quite careful about writing such a script, since you would need to safely checkout each branch (i.e. check that the working tree is clean before you switch) and check that the merge will be a fast-forward.
I can’t really imagine wanting to do that, myself, because in practice the warning you get on checking out a branch (i.e. the one about the state of that branch with regard to the remote-tracking branch it tracks) stops me from forgetting to merge before carrying on work.
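For anyone who does want such a script, here is one cautious sketch of the idea rather than a recommended tool. It sidesteps the checkout problem by leaving the currently checked-out branch alone, and it only moves a branch when its tip is an ancestor of the branch it tracks, i.e. when the update would be a fast-forward. The repositories "seed" and "clone", the branch "server" and the example identity are assumptions for the demonstration.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so that "git commit" works in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

cd "$work"
git init -q seed                 # stands in for the remote repository
cd seed
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
git branch server
cd ..
git clone -q seed clone
cd clone
git branch -q --track server origin/server

# The remote's "server" branch gains a commit, which we then fetch.
cd ../seed
git checkout -q server
git commit -q --allow-empty -m "server work"
cd ../clone
git fetch -q origin

current=$(git symbolic-ref --short HEAD)
git for-each-ref --format='%(refname:short) %(upstream:short)' refs/heads |
while read -r branch upstream; do
    # Skip branches that track nothing, and the checked-out branch
    # (updating that one needs a real merge in the working tree).
    if [ -z "$upstream" ] || [ "$branch" = "$current" ]; then
        continue
    fi
    if git merge-base --is-ancestor "$branch" "$upstream"; then
        git branch -q -f "$branch" "$upstream"
        echo "fast-forwarded $branch to $upstream"
    else
        echo "skipped $branch: not a fast-forward"
    fi
done
```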
a very useful and descriptive post, thank you
the only thing missing, in my opinion, is an example of updating a local --track branch with a remote-tracking one.
Thanks for your comment, chhh. Do you mean something like the bit where I talk about doing “git merge origin/master”? I suppose it could go into more detail about the different options for doing that (e.g. “git reset --hard origin/master” to throw away your changes, “git rebase”, etc.) for various situations, but I’m a bit worried that the post is already overlong.
Well, it’s your article, so you decide.
Usually I just scan through pages when searching for a solution, but I read this whole text.
Yes, the “git merge origin/master” part may be confusing for many people who are only beginning to use git. In my opinion it would have been very helpful if you had mentioned ways to get out of merge conflicts (not in full-blown detail, just brief directions like git reset --merge or git reset --hard HEAD or something)
Excellent post! Thanks so much for taking the time to write this up, it has helped a lot. I’m pretty comfortable with SVN and some other proprietary SCMs, but I’m just learning Git. Some of the principles are quite tricky! :p
Fantastic article. Thanks!
Thank you for the post. It helps a lot ..
Hi,
thanks a lot for this in-depth yet very easy to understand explanation of git internals. Not only did it help me to fetch and merge branches, it also enabled me to understand a lot of other git commands. I’ve never thought of git as a database containing simple graphs! This article should definitely make it into some git book – it helps to visualize git internals and thus use git more effectively. Thanks again!
Just read this article, thought gosh how informative, and then laughed when I saw the author photo at the end :)
Hey Mark, cool post ! Guess I’d better follow the RSS now I’ve found ya…
Please do! You would be my third reader, as far as I know ;) I’m glad to hear the post was useful.
Really useful post, definitely helped me!
Cheers :)
Great article! It’s cleared up at least one thing that I’ve been wondering about for a while, and that’s merging from a remote to a local branch that you don’t have checked out.
i.e. I’m working on a feature branch, and have discovered a change in the master that I really want included in my feature branch. What I would end up doing is
$ git pull # from the feature branch, just to make sure I’m up to date
$ git stash # because I probably have wip that I’m not ready to commit
$ git checkout master # hopefully helpful message about being behind, and can fast-forward
$ git pull # I should probably do a git merge, since I’ve already fetched from the previous pull
$ git checkout feature
$ git merge master # and that merges all changes from the master since the branch, into feature
$ git stash pop # restore the wip
For cases where it’s not convenient to merge all of master in to the feature, I’ve used cherry-pick.
Seems a bit long winded, doesn’t it, and I’ve wondered if there’s a better way.
For a while, I was using git pull origin/master from my feature branch, to fetch and merge those latest master changes into my feature. From this article, it might be better to use a bit more caution and git fetch origin/master, before merging with git merge origin/master.
Does that seem like a better way of grabbing changes from a remote master into a local feature?
Well, you can just do:
git fetch origin # updates all remote-tracking branches for origin
git checkout feature
git merge origin/master
(Of course, that leaves your master where it was – you could checkout master and merge origin/master into that as well, though.)
Great post.
Thanks!
Great article. A light bulb definitely went on when I read the “remote-tracking branch name becomes clear very quickly” section. I was never sure what git considered “proper” addresses (or locations or names). The “refs/heads/master” naming convention is definitely a different way of naming locations than the “origin/branch” naming convention and confused me until I read your post. To check branch names I always did a “git branch -a” and thought that “remotes/origin/branchname” was an actual location to use on the command line rather than just “origin/branchname”.
A point that still needs clarification for me is the definition of “track”. What exactly is tracked? Is it a connection of some sort? With all the git commands what happens automatically with this “connection” and what do I have to do manually. To me it seems a better name would be “reference” because nothing seems to happen automatically unless you do a “git pull”. I guess “tracking” implies (to me) an automatic updated pointer or something. I’m guessing the tracked thing is just the parent lineage (SHA1 sum) number?
It seems from a branch maintenance point of view that the “tracking” feature of git falls short. I still find myself in deep development trees trying to find out who’s the ancestor of who and who is behind who. I’ve seen some gui tools that help visualize it but not that well (Tower).
Thanks again for helping me see the light.
The two different meanings of “track” in git are, indeed, rather confusing. I tried to address that in this Stack Overflow answer:
http://stackoverflow.com/questions/6631337/are-there-different-meanings-to-the-concept-of-tracking-in-git/6631524#6631524
In the “remote-tracking branch” sense, the “tracking” is defined in the configuration variables that define the remote. For example, in a clone of git’s source code that I have, the following configuration defines the remote:
remote.origin.url=git://git.kernel.org/pub/scm/git/git.git
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
The first option defines the URL of the repository that the remote refers to, and the second describes how the branch names are mapped when fetching from that remote – in this case, take all the branches under refs/heads in the remote repository, and update the remote-tracking branch with the same name under refs/remotes/origin/.
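You can inspect these settings in any clone. The sketch below clones a scratch repository and prints the remote's configuration next to the branch names on each side of the mapping; the repository names "seed" and "clone", the temporary directory and the example identity are assumptions for the demonstration.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so that "git commit" works in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

cd "$work"
git init -q seed                 # stands in for the remote repository
cd seed
git symbolic-ref HEAD refs/heads/master
git commit -q --allow-empty -m "first"
cd ..
git clone -q seed clone
cd clone

# Everything git knows about the remote called "origin".
git config --get-regexp '^remote\.origin\.'
fetchspec=$(git config remote.origin.fetch)

# Left-hand side of the mapping: branch names as the remote sees them.
remote_heads=$(git ls-remote --heads origin | cut -f2)
echo "in the remote: $remote_heads"     # in the remote: refs/heads/master

# Right-hand side: the remote-tracking branches stored in this clone.
git for-each-ref --format='%(refname)' refs/remotes/origin
```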
Thanks for this post, it’s a very useful resource. When I teach people Git, this is one of the things I explain in some detail, much as you’ve done here. It helps so much not only with their ability to pull safely, but also with equipping them with the understanding to answer their own Git questions in the future.
Thank you for your post. Being fairly new to git, is there a tutorial or book to help demonstrate the concepts made in this article?
Personally, I like the book Pro Git by Scott Chacon, which is also readable online. There are plenty of good introductions to git concepts on the web – I particularly like git for computer scientists, but obviously that’s got a particular target audience. I tried to write a brief tutorial myself.
This is a great article, but I do agree that ‘git pull’ is perfectly fine most of the time.
I can see the benefit of ‘git fetch’ and ‘git merge’ when you’re working on branches for debian or the linux kernel, but I think it would hardly ever be necessary for projects with only a few contributors.
The updating local from remote part of the article, though simple enough to understand, is where I’m having difficulty. After doing the fetch, git diff shows a difference, but the git merge reports ‘already up to date’. Must be something simple, am a beginner with git.
git fetch doesn’t touch your working tree at all, so gives you a little breathing space to decide what you want to do next. To actually bring the changes from the remote branch into your working tree, you have to do a git merge. So, for instance, if I’m working on “master” (after a git checkout master) then I can merge in the changes that we’ve just got from origin with:
git merge origin/master
(This might be a fast-forward, if you haven’t created any new commits that aren’t on master in the remote repository, or it might be a more complicated merge.)
If instead you just wanted to see what the differences are between your branch and the remote one, you could do that with:
git diff master origin/master
Hi Jonathan: if git is reporting “already up to date” when you try to merge origin/master into your master branch, that means that you already have the complete history of origin/master included in your master branch. If git diff master origin/master then shows a difference between the two, that means that your master branch is ahead of origin/master.
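That situation is easy to reproduce, and git rev-list can tell you which side is ahead. The sketch below is an illustration under assumptions: the repositories "seed" and "clone", the file notes.txt and the example identity are invented, and the exact wording of the "already up to date" message differs between git versions.

```shell
set -eu
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
# Hypothetical identity so that "git commit" works in a clean environment.
export GIT_AUTHOR_NAME=Example GIT_AUTHOR_EMAIL=example@example.invalid
export GIT_COMMITTER_NAME=Example GIT_COMMITTER_EMAIL=example@example.invalid

cd "$work"
git init -q seed                 # stands in for the remote repository
cd seed
git symbolic-ref HEAD refs/heads/master
echo "one" > notes.txt
git add notes.txt
git commit -q -m "first"
cd ..
git clone -q seed clone
cd clone

# A local commit that the remote does not have.
echo "local" >> notes.txt
git commit -q -am "local work"

git fetch -q origin
git merge origin/master          # nothing to merge: master already contains it

# Count commits only on master (left) and only on origin/master (right).
counts=$(git rev-list --left-right --count master...origin/master)
set -- $counts
ahead=$1
behind=$2
echo "ahead by $ahead, behind by $behind"   # ahead by 1, behind by 0

# The diff is non-empty because of the local commit, not because of the remote.
git diff --stat origin/master master
```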
Thanks for the quick response Mark. From your response I see exactly what is going on; I do indeed have master ahead of origin/master and so I may have misunderstood the usage of merge. In trying to learn, I had created a test whereby I was trying to ‘undo’ a local commit by fetching/merging from origin/master but I suspect this probably isn’t an intended behaviour: it must suggest to use git reset or similar on the master. I read half the Loeliger book without understanding it correctly. Thanks again!
excellent article thanks for creating it !!
Mark,
very useful article; got me unblocked. Delighted to see it was you who wrote it!
Hi Martin, I’m glad to hear the post was of use — you might find some of the other things I’ve written here that are in the git category useful as well. I hope you’re well.
Got my fundas straight. Great article.
I have this annoying habit of looking up words I don’t understand and ‘acylic’ is one of those. But can’t find it anywhere. Is ‘acyclic’ the word intended?
Yes, acyclic as in “has no cycles”. In other words, if you start at any particular commit, you can never get back to the one you were at originally by following links to parent commits.
Very nice article! Thank you!
Thanks Mark – this topic has confused me for many months. I knew that fetch-merge was a safer option than pull from many other posts I’d read, but I now have an understanding why.
My friend Randy Fay also has a good article, which explains what can go wrong with multiple people doing multiple commits, where everyone is doing pull and push. He has a newer posting to help if you forget to do fetch-merge or insist on always pulling: .
Keep up the good work!
(I’ve edited your original comment and removed the two followups, which I trust is what you’d want.)
Thanks for the comments. Personally, I’ve always avoided the automatic rebase feature of git, since I like to decide just before pushing whether to rebase or keep any merge commits there – there’s nothing to stop one from pulling normally and rebasing later, after all.
Good to know, I can finally understand how to use a GUI to git.
Thank you for spelling out the differences between pull and fetch.
But I think it’s still lacking the explanation that git fetch origin master will download the content of the master branch stored remotely on the origin server into your LOCAL origin/master branch, thus leaving your LOCAL master branch untouched! You can then merge or rebase the LOCAL origin/master branch onto your local master branch.
But with the examples and explanation, I figured that out. (didn’t know how to handle the git svn rebase workflow)
If I may you didn’t get it: there is a typo in the article, you wrote acylic instead of acyclic (second c is forgotten :-).
Anyway great article!
Oops – I’ve corrected that now. Thanks for pointing out that (and that I’d failed to understand the previous comment mentioning the same error…)
excellent lesson, and lessons that can be used to help my lecture today. thank you
I’m still just reading about git – and I’m not sure I’ve got my head around its terminology yet which (I get the impression) is deliberately and self-indulgently incompatible with the terminology of other popular version control systems, so I may be completely confused, but…
To me, a merge is when you take two different sets of edits to the same file, and try to combine the intent of the edits to produce a single resulting file. It is a process that can in some cases be performed automatically, and in other cases is impossible without (sometimes significant) manual work.
Are you saying that a “git pull” (unlike an “hg pull”) implicitly performs a merge (in the above sense) without it being explicitly requested? That sounds… surprising… Either I don’t get git yet, or I don’t get git yet. (Or maybe both. Which is fine. It took me quite a while to even start to get Hg, too, and I’m still only in the “reading about” phase with git.)
roy
Hi Roy – it’s nice to hear from you. You’re quite right about the terminology – I’ve got a blog post in mind about the most confusing terms in git (e.g. the three different senses of “track” – aargh!)
I hadn’t realized this one before, not being familiar with Mercurial, but, indeed, it seems that “hg pull” is closest to “git fetch”, while “hg pull -u” (or “hg fetch” – aargh again!) is closest to “git pull”.
With regard to explicit / implicit, I dunno. I’d probably argue that using “git pull” is to explicitly request a merge as well as a fetch, since the summary in its man page says “Fetch from and merge with another repository or a local branch” (my emphasis). However, part of the point of this blog post is that a lot of newcomers to git don’t realize that you can do “git fetch” instead of “git pull” to avoid the automatic merge.
The thing that I might change in your definition of a merge to make it more “git worldview” is that (usually) merges in git are between two commits (and thus the state of two complete source code trees) rather than a file-by-file operation. (I say “usually” because you can merge between more than two commits (octopus merges), there are subtree merges, etc. etc.)
Oh, Hi Mark, didn’t realise it was you!
I’m probably being unfair in my complaint about implicit merges – all source code control systems implicitly merge into the working copy all the time.
But in source code control systems I have some familiarity with (SVN, a little Hg) the only merges that I can think of that happen without typing the command ‘merge’ happen when there are uncommitted changes in the working copy, and an operation is performed that updates the working copy (and hence has to merge those changes in). And AIUI that’s not what we’re talking about here.
There’s also something odd going on with command naming – why is it that a pull is not the opposite of a push? To have push and fetch be the actual direct counterparts, with pull being something different, is… unnatural to say the least…
roy
Oh, and I don’t think (although I haven’t used it) that “hg pull -u” will ever perform a merge unless there are any uncommitted changes in the working directory – everything I can see just says it’s equivalent to “hg pull; hg update” which will just pull changes in, and then update the working copy to the latest revision in the current branch.
But whoa!, I didn’t know about hg fetch. Talk about implicit, not only does it implicitly merge, it even implicitly commits the results of the merge. (Ok, I imagine you’ll get prompted for a commit message, so you’re unlikely to do so by accident, but still… IMHO something as wacky as fetch should be an Hg extension so you have to explicitly enable it….)
roy