<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mark's Blog &#187; git</title>
	<atom:link href="http://longair.net/blog/category/git/feed/" rel="self" type="application/rss+xml" />
	<link>http://longair.net/blog</link>
	<description>(occasional miscellanea)</description>
	<lastBuildDate>Tue, 03 Aug 2010 09:59:58 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>git Submodules Explained</title>
		<link>http://longair.net/blog/2010/06/02/git-submodules-explained/</link>
		<comments>http://longair.net/blog/2010/06/02/git-submodules-explained/#comments</comments>
		<pubDate>Wed, 02 Jun 2010 13:13:13 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[git]]></category>

		<guid isPermaLink="false">http://longair.net/blog/?p=446</guid>
		<description><![CDATA[<p>I haven&#8217;t actually finished the FAQ bit of this post yet, but since I&#8217;m not sure when I&#8217;ll have time to do so, I&#8217;ll just publish it anyway &#8211; please let me know in the comments if this is useful for you, or there&#8217;s something else you&#8217;d like to see included. </p> <p>Submodules in [...]]]></description>
			<content:encoded><![CDATA[<p><em>I haven&#8217;t actually finished the FAQ bit of this post yet, but since I&#8217;m not sure when I&#8217;ll have time to do so, I&#8217;ll just publish it anyway &#8211; please let me know in the comments if this is useful for you, or there&#8217;s something else you&#8217;d like to see included.<br />
</em></p>
<p>Submodules in git are commonly misunderstood in various ways, and although the <a href="http://kernel.org/pub/software/scm/git-core/docs/user-manual.html#submodules">explanation in the official manual</a> is clear and pretty easy to understand, I thought that a different treatment here might be useful to someone.</p>
<h2>What are submodules?</h2>
<p>A submodule in a git repository is like a sub-directory which is really a separate git repository in its own right.  This is a useful feature when you have a project in git which depends on particular versions of other projects.  For example, if you&#8217;re developing a new Ruby-on-Rails application, you could add a clearly specified version of the Rails repository as a submodule at the path vendor/rails.  The example I&#8217;m going to use in this post, however, called <a href="http://github.com/mysociety/whatdotheyknow">whatdotheyknow</a>, is one of the various <a href="http://www.mysociety.org/">mySociety</a> projects that depend on a repository called <a href="http://github.com/mysociety/commonlib">commonlib</a>, which contains useful code common to at least one project.  In each project the commonlib repository has been added as a submodule.  (I&#8217;ll sometimes refer to the whatdotheyknow repository as the super-project, which I hope is clear.)</p>
<p>It&#8217;s important to understand that the repository which contains a submodule knows very little about it except for which version it should be and various bits of information about how to update it.  (More on that below.)  If you change directory into the submodule then you&#8217;ll find that it doesn&#8217;t know anything about the parent project at all, and you can carry out operations in that repository as if it were standalone.</p>
<h2>Before you proceed&#8230;</h2>
<p>&#8230; it&#8217;s worth checking what version of git you have.  Many actions that you might perform that relate to submodules are done with the <tt>git submodule</tt> command, but in older versions of git this has two problems that make it <em>very</em> easy to get confused &#8211; I think these are important enough that everyone who uses submodules should be aware of them, and ideally upgrade their copy of git to a version that doesn&#8217;t have these problems: at least version 1.6.2.</p>
<p>The first of these is that if you had a typo in the name of a submodule listed on the command line, that would be silently ignored.  The second problem, which compounded this, is that if you spelled the submodule name with a trailing slash (as is common with tab-completion) then that did not refer to the submodule, and due to the previous problem would be ignored.  These were fixed in <a href="http://git.kernel.org/?p=git/git.git;a=commit;h=f3670a5749d704fe1edee4201f9b23adbf0bf967">f3670a5749d70</a> and <a href="http://git.kernel.org/?p=git/git.git;a=commit;h=496917b721adae11e596cd44b13cb8a49c388de7">496917b721ada</a>.  (As a small point of interest, to find out which tagged releases had these fixes, I <a href="http://git-scm.com/download">cloned git.git</a> and did <tt>git tag --contains 496917b</tt>.)</p>
<p>Note also that version 1.7.0 and later versions of git have some annoying differences in behaviour, which are <a href="#changes-in-1.7.0">noted below</a>.</p>
<h2>How are submodules stored?</h2>
<p>To answer this you need to understand a little bit about how git stores objects.  If you just want recipes for how to do particular things, then you can skip to &#8220;Things You Might Need To Do&#8221; below, but I think this section is useful for figuring out problems that might arise.</p>
<p>git&#8217;s <a href="http://book.git-scm.com/1_the_git_object_model.html">model of the world</a> is based around objects which are identified by their &#8220;object name&#8221;, which is the correct term for the SHA1sum hashes you see all over the place.  These objects can be of various types, such as &#8220;commit&#8221;, &#8220;tag&#8221;, &#8220;blob&#8221; (file), &#8220;tree&#8221; (directory), etc.  Each commit object points to a tree object which represents the state of your source code at that commit.  A tree object in turn consists of a list of objects with some metadata, e.g. as in this example for whatdotheyknow:</p>
<pre><strong>$ git ls-tree HEAD^{tree}</strong>
100644 blob 1e38e022c1c7d27f6dd9b765793087b59d147ef8    .cvsignore
100644 blob aa5036394edfea0a5dff64e0c53b4e9a026f1beb    .gitignore
100644 blob 4ef4ae8268dcad9b0de371f1aa63bb3ebbeb436a    .gitmodules
100755 blob 44c881fe25b8dc1413d9195677f492121a3789f0    INSTALL.txt
100644 blob 37312d9a1bcc80ac334547f047a2cece38dd24dc    README
100644 blob 3bb0e8592a41ae3185ee32266c860714980dbed7    Rakefile
040000 tree e326ffb3d697e7ac83fa19d93a8a3305120c719e    app
160000 commit fd91ab69279f1e0cfed53353e64811d5aa9c4b5f    commonlib
040000 tree ae93b14ec7ab01ee33053c32eca340a31ce6449f    config
040000 tree 8a7eb4d1552cc2a59fc0528c02fe0fb686d7f562    db
040000 tree 84fae00002a0e834140e2f806978748d50d60c4b    doc
040000 tree eb4089c7989ee846bbd66c97069aeff7853d0064    lib
040000 tree e7bcca0f6d561188730125b228a22a4d7bd68782    public
040000 tree f4e46de68199afa382d53583d83430c691aeb473    script
040000 tree e5772463cfed62ba63cfaf4e0eacecd1dc3895e5    spec
100644 blob bfc265e33e47ffa9796fe7bb7ae7d1fe7e633593    todo.txt
040000 tree 2999c0a790c0033ad93e312c0bc62ecdc9a18f81    vendor</pre>
<p>As you can see, typically the types of objects listed in a tree are either blobs or trees, indicating files or subdirectories.  However, if an object of type &#8220;commit&#8221; is listed (with the mode 160000) that represents a submodule.  The object name (in this case fd91ab69&#8230;) is the commit that the submodule&#8217;s HEAD should be at.  One implication of this is that that object name usually won&#8217;t be known outside the submodule.  This sometimes causes confusion when people do <tt>git diff</tt> in the super-project and find a difference in the submodule entry, e.g.:</p>
<pre><strong>$ git diff</strong>
diff --git a/commonlib b/commonlib
index fd91ab6..d6593c6 160000
--- a/commonlib
+++ b/commonlib
@@ -1 +1 @@
<span style="color: #ff0000;">-Subproject commit fd91ab69279f1e0cfed53353e64811d5aa9c4b5f</span>
<span style="color: #00ff00;">+Subproject commit d6593c6741b29680665b8ae7470e2f80ab9a5977</span></pre>
<p>This output means that the submodule version which is committed in the whatdotheyknow repository is fd91ab69279, but if you change into the commonlib subdirectory, you will find that the HEAD of that repository is at d6593c6741.  Hopefully both of these commits will be known in the commonlib submodule, but neither will be in the whatdotheyknow repository.</p>
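<p>If you want to see this for yourself, here is a minimal sketch that builds such a tree entry by hand in a throwaway directory.  The repository names <tt>sub</tt> and <tt>super</tt> and the &#8220;demo&#8221; identity are made up for illustration, and the plumbing command <tt>git update-index</tt> is only standing in for what <tt>git submodule add</tt> would normally do for you:</p>
<pre>set -e
# Everything happens in a throwaway directory; all names here are illustrative.
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A tiny repository to play the part of the submodule.
git init -q sub
git -C sub commit -q --allow-empty -m "first commit in sub"
sub_head=$(git -C sub rev-parse HEAD)

# A separate repository to play the part of the super-project.
git init -q super
cd super
# Record sub's commit as a tree entry with mode 160000 at the path "sub".
git update-index --add --cacheinfo 160000 "$sub_head" sub
git commit -q -m "record sub as a submodule entry"

# The tree lists a "commit" entry, like commonlib in the listing above.
git ls-tree HEAD
# The object name recorded for the path is the HEAD of sub...
git rev-parse HEAD:sub
# ...but the super-project has no such object of its own.
if ! git cat-file -e "$sub_head"; then
    echo "$sub_head is not an object in the super-project"
fi</pre>
<p>In a real project, <tt>git rev-parse HEAD:commonlib</tt> run in the super-project and <tt>git rev-parse HEAD</tt> run inside commonlib give you the two object names that <tt>git diff</tt> is comparing.</p>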
<p>The other information about the submodule which is stored in the super-project is stored in the .gitmodules file and in config options.</p>
<p>A submodule which is &#8220;initialized&#8221; will have a config option set to indicate the URL that the submodule should be cloned from if it is missing.  These config options are of the form submodule.&lt;SUBMODULE-NAME&gt;.url, so having initialized the commonlib submodule in whatdotheyknow, I can see the following:</p>
<pre><strong>$ git config --list|egrep ^submodule</strong>
submodule.commonlib.url=git://git.mysociety.org/commonlib</pre>
<p>The .gitmodules file provides sensible default URLs for each submodule, and is committed in the repository like any other versioned file:</p>
<pre><strong>$ cat .gitmodules</strong>
[submodule "commonlib"]
 path = commonlib
 url = git://git.mysociety.org/commonlib</pre>
<p>If you&#8217;re publishing a repository with the intention that anyone should be able to clone and use it, you should make sure that the URLs specified in .gitmodules are ones that can be publicly accessed &#8211; so don&#8217;t, for example, use an SSH URL with your user name in it.  Since these URLs are only used when initializing a submodule, which you typically do only rarely, it&#8217;s not a great inconvenience that you may have to change them in order to push changes you&#8217;ve made in the submodule.</p>
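<p>To make the difference between the two places concrete, here is a small sketch that reads the URL from each of them with <tt>git config</tt>.  It runs in an empty throwaway repository and writes the entries by hand, which is an assumption for the sake of a self-contained example; in a real clone <tt>git submodule init</tt> copies the URL into your config for you:</p>
<pre>set -e
cd "$(mktemp -d)"
git init -q

# Recreate the .gitmodules entries shown above.
git config -f .gitmodules submodule.commonlib.path commonlib
git config -f .gitmodules submodule.commonlib.url git://git.mysociety.org/commonlib

# The default URL, as committed in .gitmodules.
git config -f .gitmodules --get submodule.commonlib.url

# A private URL set only in this clone's own config.
git config submodule.commonlib.url ssh://mark@git.mysociety.org/data/git/public/commonlib.git
git config --get submodule.commonlib.url

# Everything .gitmodules knows about submodules.
git config -f .gitmodules --get-regexp '^submodule\.'</pre>
<p>The first lookup prints the public git:// URL and the second prints the SSH one, so changing your own config never alters what other people get when they clone.</p>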
<h2>Things You Might Need To Do</h2>
<p>This section lists some simple recipes for doing all kinds of things with submodules.  If you think there&#8217;s something I should add, please let me know.  For the sake of simplicity, in the examples below, I&#8217;m not listing submodule paths explicitly at the end of <tt>git submodule</tt> commands, which generally means that the action applies to all of the submodules.  (The exception is <tt>git submodule add</tt>, which of course only applies to a single submodule.)</p>
<h3>Get a working submodule version after cloning</h3>
<p>If you&#8217;ve just cloned a repository which contains submodules, you can initialize and clone all of them with:</p>
<pre>git submodule update --init</pre>
<p>This is the equivalent of running:</p>
<pre>git submodule init
git submodule update</pre>
<p>With version 1.6.5 of git and later, you can do this automatically by cloning the super-project with the <tt>--recursive</tt> option:</p>
<pre>git clone --recursive git://github.com/mysociety/whatdotheyknow.git</pre>
<h3>See the status of all the submodules</h3>
<p>Running <tt>git submodule</tt> without arguments defaults to running <tt>git submodule status</tt>, which produces a helpful summary of the status of all your submodules.  Each line begins with a space, a &#8216;+&#8217; or a &#8216;-&#8217; which indicate the following things:</p>
<dl>
<dt>+</dt>
<dd>The version checked out in the submodule is different from that specified in the super-project.  The object name shown is that of the commit that the submodule is currently at.  (The meaning of this symbol <a href="#changes-in-1.7.0">changed in 1.7.0</a>.) </dd>
<dt>-</dt>
<dd>The submodule hasn&#8217;t been initialized or there&#8217;s no repository at the submodule path (e.g. if you&#8217;ve run <tt>git submodule init</tt> but not <tt>git submodule update</tt>, or you&#8217;ve later deleted the submodule directory from the working tree).  The object name shown is the commit that&#8217;s specified in the super-project.</dd>
<dt>[space]</dt>
<dd>The submodule&#8217;s HEAD is at the correct version &#8211; the object name shown is that version.</dd>
</dl>
<p>In projects with many submodules this can be a helpful way to see at a glance where all your submodules are at.  For example, here&#8217;s some output from a version of the <a href="http://pacific.mpi-cbg.de/">Fiji project</a> that I&#8217;m working on:</p>
<pre> bbff1fd4545b3a614b14eb0770ac6028b648746d AutoComplete (bbff1fd)
+16dcf52ef2106cc92ba89c90b6b5f457bc7619ea ImageJA (heads/current-147-g16dcf52)
 5bfc9eb779d39e38c23ce1c3b01b49953ebd8463 RSyntaxTextArea (5bfc9eb)
 b9f11849599d536528c26bc599dbec4609d77dc4 Retrotranslator (remotes/origin/master)
 90287f0250542be256f67ade4e29a618bf6e688f TrakEM2 (0.7m-227-g90287f0)
+f25db2a43b95480c780d865323fce659a1135c2d VIB (tracer-1.4.0-candidate-849-gf25db2a)
 e4d3eb47a8f9d4e62d1f356636652c3ecc739d92 batik (remotes/origin/svn/git-svn@216063-588-ge4d3eb4)
 79de599df2550f2813fd449505b6fa55ca08cbb3 bio-formats (remotes/origin/contrib-380-g79de599)
 e73abece1ebf3a4aba22104ae9452b2b816ab0d7 clojure (remotes/origin/HEAD)
 39618b6d881fb0c3b52de4929aa34134bb32ffdb clojure-contrib (remotes/origin/master)
 9fa7f4d993f57e27e3134b016c7d36fbfd33e34c ij-plugins (9fa7f4d)
 7ffa48359cdbf7a47735b719a605ea322c58d694 java/linux (heads/master)
-cc218f05fdc0bb55f40f904d5d1f804e8751d0d2 java/linux-amd64
-4f3964234f4e6fd78247e5e7fad9c8becad53e8f java/macosx-java3d
-e79c51473df06f00d4ba9c913afe27e675f71d64 java/win32
-54e735c6c9bac65fcc889bc9e833213f19c7458a java/win64
 b362c662f79763c7927a2ba486243ccefa9222a1 junit (obsolete-cvsimport)
-9ae38d4bde196fa6a4595aebed9f218d4ec591bc jython
 c6e929a15d77545f03ea4883bf033e13c632ef12 live-helper (1.0.4-1-43-gc6e929a)
 79d369af87c4412a47f7065938fe18befc0a183e mpicbg (remotes/origin/trakem2-30-g79d369a)
 20ab0539cc248c642982fdf1330325636d8c55c0 tcljava (tcljava-141-2007-06-06-6-g20ab053)
 a7bfed6752ea1aeac73db386411329486e339f94 weka (a7bfed6)</pre>
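<p>With that many submodules you may only care about the ones that need attention.  As a rough sketch, a small shell function can keep just the lines that start with &#8216;+&#8217; or &#8216;-&#8217;; the name <tt>needs_attention</tt> is made up, and the sample input is three lines copied from the output above:</p>
<pre># Keep only status lines whose first character is '+' or '-'.
needs_attention() {
    grep '^[-+]' || true
}

# Try it on three lines taken from the listing above.
printf '%s\n' \
    ' bbff1fd4545b3a614b14eb0770ac6028b648746d AutoComplete (bbff1fd)' \
    '+16dcf52ef2106cc92ba89c90b6b5f457bc7619ea ImageJA (heads/current-147-g16dcf52)' \
    '-cc218f05fdc0bb55f40f904d5d1f804e8751d0d2 java/linux-amd64' \
    | needs_attention</pre>
<p>In a real super-project you would run <tt>git submodule status | needs_attention</tt> instead.</p>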
<h3>Update submodules to the versions specified in HEAD</h3>
<p>If you change the HEAD of your super-project (e.g. with git pull, or by checking out a new branch) you may find that your submodules are now at the wrong versions.  (You can check with <tt>git submodule status</tt> as shown above.)  If you&#8217;re not actively working on the submodules, then the simplest way to move them to the right versions is with:</p>
<pre>git submodule update</pre>
<p>If any initialized submodules are missing, this will clone them.  For other submodules where the repository exists, this will change into its subdirectory,  run <tt>git fetch</tt> (to make sure  all the most recent updates are present) and then <tt>git checkout</tt> the correct version.  This has the effect of &#8220;detaching HEAD&#8221; in each submodule, so if you want to work on a branch in any of those subdirectories, you&#8217;ll have to <tt>git checkout</tt> to a branch.</p>
<p>The most frequent errors that you&#8217;ll find when running <tt>git submodule update</tt> are likely to be due to someone having created a commit in the super-project that references a commit in the submodule that they&#8217;ve forgotten to push, so check that whenever you get errors about not being able to find particular versions.</p>
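<p>One way to check for this before committing in the super-project is to ask, inside the submodule, which remote-tracking branches contain the commit you&#8217;re about to reference.  The following is only a sketch in a throwaway directory: <tt>upstream</tt> and <tt>work</tt> are made-up names standing in for the published submodule repository and your local copy of it:</p>
<pre>set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "upstream" plays the part of the published submodule repository.
git init -q upstream
git -C upstream commit -q --allow-empty -m "published commit"

# "work" plays the part of your local copy of the submodule.
git clone -q upstream work
cd work
git commit -q --allow-empty -m "local commit, not pushed"

# Prints nothing: no remote-tracking branch contains the new commit.
git branch -r --contains HEAD
# Lists a branch from origin: the parent commit came from upstream.
git branch -r --contains HEAD^</pre>
<p>An empty result only reflects what you last fetched, so run <tt>git fetch</tt> first if you think someone else may already have pushed the commit.</p>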
<p>Versions of git after 1.6.4 add the <tt>--merge</tt> and <tt>--rebase</tt> options to <tt>git submodule update</tt> to allow more flexible ways of updating your submodules while you&#8217;re working on them.</p>
<h3>Add a new submodule to a repository</h3>
<p>This is nice and easy to do from a URL.  For example, if we wanted to create a new mySociety project called &#8220;create robot MP&#8221;, and add commonlib to it, you would just use <tt>git submodule add</tt>:</p>
<pre><strong>$ mkdir createrobotmp
$ cd createrobotmp
$ git init</strong>
Initialized empty Git repository in /home/mark/tmp/createrobotmp/.git/
<strong>$ git submodule add git://git.mysociety.org/commonlib commonlib</strong>
Initialized empty Git repository in /home/mark/tmp/createrobotmp/commonlib/.git/
remote: Counting objects: 5240, done.
remote: Compressing objects: 100% (1974/1974), done.
remote: Total 5240 (delta 3311), reused 5038 (delta 3197)
Receiving objects: 100% (5240/5240), 1020.36 KiB | 377 KiB/s, done.
Resolving deltas: 100% (3311/3311), done.
<strong>$ git status</strong>
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached &lt;file&gt;..." to unstage)
#
#    new file:   .gitmodules
#    new file:   commonlib
#</pre>
<p>Then you need to stage and commit .gitmodules and commonlib as with any other new files.  Since this puts the URL in the .gitmodules file, you should make this a publicly clonable URL, as mentioned above.</p>
<h3>Change the remote for a submodule</h3>
<p>If you frequently work in a submodule you might want to change the default remote &#8220;origin&#8221; to refer to a URL that you can push to, just so you can use one remote for everything.  You can do this by deleting origin and adding it back with a new URL, with e.g.:</p>
<pre><strong>$ cd commonlib</strong><strong>
$ git remote rm origin</strong>
<strong>$ git remote add origin ssh://mark@git.mysociety.org/data/git/public/commonlib.git
$ git remote -v
</strong>origin    ssh://mark@git.mysociety.org/data/git/public/commonlib.git</pre>
<p>However, you&#8217;ll find that two helpful config options will have been deleted when removing and adding back origin, so you&#8217;ll want to add these back.</p>
<pre><strong><strong>$ git config branch.master.remote origin</strong></strong><strong>
$ git config branch.master.merge refs/heads/master</strong></pre>
<p>These config options set up the helpful defaults for <tt>git pull</tt> when you&#8217;re on master.</p>
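<p>As an alternative that isn&#8217;t part of the recipe above, git 1.7.0 and later have <tt>git remote set-url</tt>, which changes the URL in place and so leaves those two options alone.  A minimal sketch in an empty throwaway repository, with the branch options set by hand so there is something to preserve:</p>
<pre>set -e
cd "$(mktemp -d)"
git init -q

# Start from the public URL, with the usual defaults for master.
git remote add origin git://git.mysociety.org/commonlib
git config branch.master.remote origin
git config branch.master.merge refs/heads/master

# Point origin at a URL you can push to, without removing it.
git remote set-url origin ssh://mark@git.mysociety.org/data/git/public/commonlib.git

# The URL has changed...
git remote -v
# ...and the defaults for "git pull" are still there.
git config --get branch.master.remote
git config --get branch.master.merge</pre>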
<p>If you&#8217;re in the habit of deleting whole submodules, and then recreating them with <tt>git submodule update</tt> then you should also make sure that you change the URL in the super-project&#8217;s config settings, e.g.:</p>
<pre><strong>$ git config --list|egrep submodule</strong>
submodule.commonlib.url=git://git.mysociety.org/commonlib
<strong>$ git config submodule.commonlib.url ssh://mark@git.mysociety.org/data/git/public/commonlib.git
$ git config --list|egrep submodule</strong>
submodule.commonlib.url=ssh://mark@git.mysociety.org/data/git/public/commonlib.git</pre>
<h3>Initialize a submodule with a non-standard URL</h3>
<p>If you know in advance that you want to clone your submodules from a URL different from that specified in .gitmodules (e.g. with a private SSH URL that you can push to) then after cloning the superproject you can set the appropriate config by hand before running <tt>git submodule update</tt>.  This takes the place of the <tt>git submodule init</tt> command, for example:</p>
<pre><strong>$ git clone ssh://mark@git.mysociety.org/data/git/public/whatdotheyknow.git</strong>
[...]
<strong>$ cd whatdotheyknow</strong>
<strong>$ git config submodule.commonlib.url ssh://mark@git.mysociety.org/data/git/public/commonlib.git</strong>
<strong>$ git submodule update</strong>
Initialized empty Git repository in /home/mark/tmp/whatdotheyknow/commonlib/.git/
remote: Counting objects: 5240, done.
remote: Compressing objects: 100% (1974/1974), done.
remote: Total 5240 (delta 3311), reused 5038 (delta 3197)
Receiving objects: 100% (5240/5240), 1020.36 KiB | 533 KiB/s, done.
Resolving deltas: 100% (3311/3311), done.
Submodule path 'commonlib': checked out 'a901c2a431f7869f5c2eaee5808f8590ca78544e'
<strong>$ cd commonlib/</strong>
<strong>$ git remote show origin</strong>
* remote origin
 URL: ssh://mark@git.mysociety.org/data/git/public/commonlib.git
 HEAD branch: master
 Remote branch:
 master tracked
 Local branch configured for 'git pull':
 master merges with remote master
 Local ref configured for 'git push':
 master pushes to master (up to date)</pre>
<h3 id="changes-in-1.7.0">Modified submodules in 1.7.0 and later</h3>
<p>Versions 1.7.0 and later of git contain <a href="http://github.com/git/git/commit/ee6fc514f2df821c2719cc49499a56ef2fb136b0">an annoying change</a> in the behaviour of <tt>git submodule</tt>.  Submodules are now regarded as dirty if they have any modified files or untracked files, whereas previously it would only be the case if HEAD in the submodule pointed to the wrong commit. Why is this annoying?  The following reasons:</p>
<ul>
<li>Firstly, the meaning of the plus sign (+) in the output of <tt>git submodule</tt> has changed, and the first time that you come across this it takes a little while to figure out what&#8217;s going wrong, for example by looking through changelogs or using <tt>git bisect</tt> on git.git to find the change.  It would have been much kinder to users to introduce a different symbol for &#8220;at the specified version, but dirty&#8221;.</li>
<li><tt>git status</tt> is now very slow in projects with several large submodules.  (<tt>git status</tt> used to be nearly instant in a clone of <a href="http://pacific.mpi-cbg.de/cgi-bin/gitweb.cgi?p=fiji.git;a=summary">fiji.git</a> but trying just now with 1.7.0.4 took an incredible 45 seconds.)</li>
</ul>
<p>This seems like a change that was introduced without considering the surprise and impact that it would have on users.  In any case, I&#8217;ve added this note here since if you work with submodules, you may need to be aware of this change in behaviour.</p>
<p>The output of <tt>git diff</tt> has changed as well, to add &#8220;-dirty&#8221; to the object name if the working tree of that submodule is dirty:</p>
<pre><strong>$ git diff imglib</strong>
diff --git a/imglib b/imglib
--- a/imglib
+++ b/imglib
<span style="color: #ff00ff;">@@ -1 +1 @@</span>
<span style="color: #ff0000;">-Subproject commit c5c6bbaf616d64fbd873df7b7feecebb81b5aee7</span>
<span style="color: #00ff00;">+Subproject commit c5c6bbaf616d64fbd873df7b7feecebb81b5aee7-dirty
</span></pre>
<p><em>Update: Thanks to VonC, who points out in the comments below that in git 1.7.2 there is now a <tt>--ignore-submodules</tt> option to git status which can restore the old behaviour and also provides the useful option that only changed files (not untracked files) cause the submodule to be shown as dirty.</em></p>
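<p>Here is a rough sketch of that option in action, assuming a reasonably recent git.  It uses a throwaway super-project with a made-up nested repository called <tt>lib</tt>, which is added directly with <tt>git add</tt> rather than <tt>git submodule add</tt> to keep the example self-contained:</p>
<pre>set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q super
cd super
# A nested repository to play the part of a submodule.
git init -q lib
git -C lib commit -q --allow-empty -m "a commit in lib"
# Adding a directory that is itself a repository records its HEAD commit
# (git may print a warning about an embedded repository here).
git add lib
git commit -q -m "record lib"

# Leave an untracked file lying around inside lib.
touch lib/scratch.txt

# Default behaviour in 1.7.0 and later: lib is listed because of the untracked file.
git status --short
# Untracked files inside submodules no longer count, so lib is not listed.
git status --short --ignore-submodules=untracked</pre>
<p>The other values the option accepts are <tt>none</tt>, <tt>dirty</tt> and <tt>all</tt>; <tt>dirty</tt> is the one that restores the pre-1.7.0 behaviour of only looking at the submodule&#8217;s HEAD.</p>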
<h3>Removing a submodule</h3>
<p>There are instructions for the several steps required to remove a submodule at the bottom of this page:</p>
<p><a href="http://git.wiki.kernel.org/index.php/GitSubmoduleTutorial">http://git.wiki.kernel.org/index.php/GitSubmoduleTutorial</a></p>
]]></content:encoded>
			<wfw:commentRss>http://longair.net/blog/2010/06/02/git-submodules-explained/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>A Few git Tips</title>
		<link>http://longair.net/blog/2009/04/25/a-few-git-tips/</link>
		<comments>http://longair.net/blog/2009/04/25/a-few-git-tips/#comments</comments>
		<pubDate>Sat, 25 Apr 2009 13:50:52 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[git]]></category>

		<guid isPermaLink="false">http://longair.net/blog/?p=59</guid>
		<description><![CDATA[<p class="wp-caption-text">A screenshot of gitk --all</p> <p>I owe thanks to Johannes Schindelin, who passed on at least three of these tips to me in the course of working on Fiji :) </p> gitk &#8211;all <p>Whenever you&#8217;re confused, use gitk &#8211;all to work out what&#8217;s going on.  The &#8211;all parameter is important so that you [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_64" class="wp-caption alignright" style="width: 160px"><a href="http://longair.net/blog/wp-content/uploads/2009/04/gitk-fiji.png"><img class="size-thumbnail wp-image-64" title="A screenshot of gitk --all" src="http://longair.net/blog/wp-content/uploads/2009/04/gitk-fiji-150x150.png" alt="A screenshot of gitk --all" width="150" height="150" /></a><p class="wp-caption-text">A screenshot of gitk --all</p></div>
<p><em>I owe thanks to Johannes Schindelin, who passed on at least three of these tips to me in the course of working on <a href="http://pacific.mpi-cbg.de/">Fiji</a> :)<br />
</em></p>
<h3>gitk --all</h3>
<p>Whenever you&#8217;re confused, use <em>gitk --all</em> to work out what&#8217;s going on.  The <em>--all</em> parameter is important so that you see all your local and remote-tracking branches. It might take a while to get used to reading the graphic representation of what&#8217;s going on, but it very often explains why things aren&#8217;t working as you might expect.</p>
<h3>Finding text anywhere in your complete history</h3>
<p>Particularly if you are in the habit of creating lots of branches (as I suggest below) it is easy to forget where you introduced a particular bit of code.  You can find it again simply with <em>git log -S&lt;search-term&gt; --all</em>.   For example, suppose you know that &#8220;Factory&#8221; occurred as part of the class name that you&#8217;re looking for then <em>git log -SFactory --all</em> will list all commits in your repository (including remote-tracking branches) that mention &#8220;Factory&#8221; in the change they introduce.  If you want to see the patch as well (which is often helpful) add <em>-p</em> to that command.   If you want to find all the branches that one of those commits is on, see the next tip:</p>
<h3>Finding commits by SHA1sum</h3>
<p>There are a lot of common situations where it&#8217;s useful to find out all the branches that a particular commit is on.  For example, if you&#8217;re using submodules then <em>git submodule update</em> will detach the head in each submodule to move it to the right commit.  You might want to work more on the branch that commit is on.  To do this, use <em>git branch -a --contains &lt;SHA1sum of commit&gt;</em>.  For example:</p>
<pre>  $ git branch -a --contains 6f2293e7f6428
  * (no branch)
    origin/fiji</pre>
<p>(This suggests that you should create a branch that tracks the remote-tracking branch origin/fiji, and you&#8217;ll find the commit in the history for that branch.)</p>
<p><a href="http://longair.net/blog/wp-content/uploads/2009/04/git-ps1.png"><img class="size-full wp-image-69 alignright" title="Example of __git_ps1" src="http://longair.net/blog/wp-content/uploads/2009/04/git-ps1.png" alt="git-ps1" width="306" height="95" /></a></p>
<h3>Use __git_ps1</h3>
<p>It&#8217;s very easy to forget which branch you&#8217;re working on, particularly once you get used to switching your working tree to different branches with <em>git checkout</em>.  You can use <em>$(__git_ps1 " (%s)")</em> in your PS1 environment variable to include some useful information about your current branch in your bash prompt.  If you are not currently in a git repository, this evaluates to the empty string.  Another very nice feature about <em>__git_ps1</em> is that it will remind you if you&#8217;re in the middle of a rebase.  It&#8217;s easy to abandon a rebase but forget to do <em>git rebase --abort</em> at the end.  <em>git status</em> won&#8217;t remind you of this (a bug, IMHO) but if it&#8217;s there in your bash prompt, it&#8217;s hard to ignore.</p>
<p>The example in the image uses:</p>
<pre>  PS1=' \[\033[1;37m\]: \u@\h:\w\[\033[0;37m\]$(__git_ps1 " (%s)")\n '</pre>
<p>&#8230; which, if I recall correctly, was partly based on Tony Finch&#8217;s.</p>
<h3>Create lots of topic branches</h3>
<p>In other version control systems branching and merging are awkward compared to git, and this means that people are often reluctant to create new branches.  One of the things that I love about working with git is that it&#8217;s quite easy and natural to create a new branch for every idea you have, and just merge them into your master branch when you&#8217;re happy with them.  This matches the way that I like to work on things, as well &#8211; you can work on one idea until you&#8217;re stuck, then switch to something easier for a bit, etc. all in one copy of the repository.</p>
<h3>Only keep one copy of a repository per computer</h3>
<p>It&#8217;s tempting when starting to use git to clone a repository several times and work on different things in each.  In general it&#8217;s much better (and more &#8220;git-like&#8221;) to just have a single repository and get used to switching about your working tree with <em>git checkout &lt;BRANCH NAME&gt;</em>.  If you want to quickly put aside what you&#8217;re working on, without having to carefully put together a commit, then use <em>git stash</em>.  (Beware when unstashing some local modifications with <em>git stash apply</em> however &#8211; you need to switch back to the right branch before running that command.)</p>
<h3>Turn on all git&#8217;s coloured output options</h3>
<p>There&#8217;s a nice summary on this <a href="http://cheat.errtheblog.com/s/git">git cheat sheet</a> of the options you need to add to your .gitconfig to turn on git&#8217;s coloured output for terminals.</p>
<p>In addition, you might find the <em>--color-words</em> option to <em>git log</em> and <em>git diff</em> useful.  Rather than producing a line-by-line diff, it will colour added text in green and removed text in red within a single line.  e.g. try <em>git log -p --color-words</em></p>
<h3>View a particular file from a particular commit</h3>
<p>Fairly often, I just want to have a quick look at the version of a file in a particular revision.  The easiest way to do this is with the <em>&lt;COMMIT&gt;:&lt;FILENAME&gt;</em> syntax that <em>git show</em> understands.  For example, to see the version of a file <em>util/TestQuantile.java</em> from a couple of commits ago, you can do:</p>
<pre>  git show HEAD^^:util/TestQuantile.java</pre>
<p>&#8230; or to see it from a topic branch called &#8220;experiment&#8221; do:</p>
<pre>  git show experiment:util/TestQuantile.java</pre>
<p>Note that the path is always relative to the repository root &#8211; even if you&#8217;re in the &#8220;util&#8221; subdirectory, you still have to include &#8220;util/&#8221; in the <em>&lt;FILENAME&gt;</em> part.  You can use the same syntax to <em>git diff</em> to compare two arbitrary files from arbitrary revisions.  It&#8217;s also frequently useful to view a file at a particular date, where the usual syntax for a revision at a particular date will work:</p>
<pre>  git show HEAD@{2.weeks.ago}:util/TestQuantile.java</pre>
<h3>Use the git glossary</h3>
<p>The <a href="http://www.kernel.org/pub/software/scm/git/docs/gitglossary.html">git glossary</a> provides a very succinct and useful summary of a lot of git concepts, and should be your first port of call if you don&#8217;t understand something in the manual pages.  Fundamentally (and perhaps contrary to its reputation)  git is built on simple ideas, and most of these can be understood from the definitions there.</p>
]]></content:encoded>
			<wfw:commentRss>http://longair.net/blog/2009/04/25/a-few-git-tips/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>git: fetch and merge, don&#8217;t pull</title>
		<link>http://longair.net/blog/2009/04/16/git-fetch-and-merge/</link>
		<comments>http://longair.net/blog/2009/04/16/git-fetch-and-merge/#comments</comments>
		<pubDate>Thu, 16 Apr 2009 13:02:00 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[git]]></category>

		<guid isPermaLink="false">http://longair.net/blog/?p=6</guid>
		<description><![CDATA[<p>This is too long and rambling, but to steal a joke from Mark Twain Blaise Pascal I haven&#8217;t had time to make it shorter yet.  There is some discussion of this post on the git mailing list, but much of it is tangential to the points I&#8217;m trying to make here. </p> <p>One of [...]]]></description>
			<content:encoded><![CDATA[<p><em>This is too long and rambling, but to steal a joke from <del datetime="2009-04-16T12:35:43+00:00">Mark Twain</del> <a href="http://en.wikiquote.org/wiki/Blaise_Pascal">Blaise Pascal</a> I haven&#8217;t had time to make it shorter yet.  There is <a href="http://thread.gmane.org/gmane.comp.version-control.git/116903/">some discussion of this post on the git mailing list</a>, but much of it is tangential to the points I&#8217;m trying to make here.<br />
</em></p>
<p>One of the git tips that I find myself frequently passing on to people is:</p>
<blockquote><p>Don&#8217;t use <em>git pull</em>, use <em>git fetch </em>and then <em>git merge</em>.</p></blockquote>
<p>The problem with <em>git pull</em> is that it has all kinds of helpful magic that means you don&#8217;t really have to learn about the different types of branch in git.  Mostly things Just Work, but when they don&#8217;t it&#8217;s often difficult to work out why.  What seem like obvious bits of syntax for <em>git pull</em> may have rather surprising results, as even a cursory look through the manual page should convince you.</p>
<p>The other problem is that by both fetching and merging in one command, your working directory is updated without giving you a chance to examine the changes you&#8217;ve just brought into your repository.  Of course, unless you turn off all the safety checks, the effects of a <em>git pull</em> on your working directory are never going to be catastrophic, but you might prefer to do things more slowly so you don&#8217;t have to backtrack.</p>
<h3>Branches</h3>
<p>Before I explain the advice about <em>git pull</em> any further it&#8217;s worth clarifying what a branch is.  Branches are often described as being a &#8220;line of development&#8221;, but I think that&#8217;s an unfortunate expression since:</p>
<ul>
<li>If anything, a branch is a &#8220;directed acyclic graph of development&#8221; rather than a line.</li>
<li>It suggests that branches are quite heavyweight objects.</li>
</ul>
<p>I would suggest that you think of branches in terms of what defines them: they&#8217;re a name for a particular commit and all the commits that are ancestors of it, so each branch is completely defined by the SHA1sum of the commit at the tip.  This means that manipulating them is a very lightweight operation &#8211; you just change that value.</p>
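<p>As a rough sketch of how lightweight this is, the following script builds a throwaway repository and shows that creating or moving a branch only records a different commit ID.  The temporary directory, the &#8220;demo&#8221; identity and the empty commits are assumptions made purely for illustration:</p>

```shell
set -e
# Throwaway identity so that commits work without any global configuration.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/stable   # name the initial branch "stable"

git commit -q --allow-empty -m A
first=$(git rev-parse stable)             # the tip of "stable" is just a commit ID
git commit -q --allow-empty -m C
second=$(git rev-parse stable)            # committing advanced the tip

# Creating a branch only records which commit the name stands for...
git branch new-idea "$first"
# ...and moving it only changes that recorded value.
git branch -f new-idea "$second"
moved=$(git rev-parse new-idea)
echo "new-idea now points at $moved"
```

<p>Neither of the two <em>git branch</em> commands creates, copies or deletes any commits; only the value that the name &#8220;new-idea&#8221; stands for changes.</p>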
<p>This definition has some perhaps unexpected implications.  For example, suppose you have two branches, &#8220;stable&#8221; and &#8220;new-idea&#8221;, whose tips are at revisions E and F:</p>
<pre>  A-----C----E ("stable")
   \
    B-----D-----F ("new-idea")</pre>
<p>So the commits A, C and E are on &#8220;stable&#8221; and A, B, D and F are on &#8220;new-idea&#8221;.  If you then merge &#8220;new-idea&#8221; onto &#8220;stable&#8221; with the following commands:</p>
<pre>    git checkout stable   # Change to work on the branch "stable"
    git merge new-idea    # Merge in "new-idea"</pre>
<p>&#8230; then you have the following:</p>
<pre>  A-----C----E----G ("stable")
   \             /
    B-----D-----F ("new-idea")</pre>
<p>If you carry on committing on &#8220;new-idea&#8221; and on &#8220;stable&#8221;, you get:</p>
<pre>  A-----C----E----G---H ("stable")
   \             /
    B-----D-----F----I ("new-idea")</pre>
<p>So now  A, B, C, D, E, F, G and H are on &#8220;stable&#8221;, while A, B, D, F and I are on &#8220;new-idea&#8221;.</p>
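<p>One way to check this for yourself is to rebuild the graph above in a scratch repository and count the commits reachable from each branch tip.  This is only a sketch: the empty commits (whose messages reuse the letters from the diagrams), the &#8220;demo&#8221; identity and the temporary directory are assumptions for illustration:</p>

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/stable

git commit -q --allow-empty -m A
git branch new-idea                       # both branches start at A
git commit -q --allow-empty -m C
git commit -q --allow-empty -m E

git checkout -q new-idea
git commit -q --allow-empty -m B
git commit -q --allow-empty -m D
git commit -q --allow-empty -m F

git checkout -q stable
git merge -q -m G new-idea                # G is the merge commit on "stable"
git commit -q --allow-empty -m H

git checkout -q new-idea
git commit -q --allow-empty -m I

# Count every commit reachable from each branch tip.
on_stable=$(git rev-list --count stable)
on_new_idea=$(git rev-list --count new-idea)
echo "stable: $on_stable commits, new-idea: $on_new_idea commits"
# prints "stable: 8 commits, new-idea: 5 commits"
```

<p>The merge only moved &#8220;stable&#8221;, so &#8220;new-idea&#8221; gains nothing from C, E, G or H.</p>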
<p>Branches do have some special properties, of course &#8211; the most important of these is that if you&#8217;re working on a branch and create a new commit, the branch tip will be advanced to that new commit.  Hopefully this is what you&#8217;d expect.  When merging with <em>git merge</em>, you only specify the branch you want to merge into the current one, and only your current branch advances.</p>
<p>Another common situation where this view of branches helps a lot is the following: suppose you&#8217;re working on the main branch of a project (called &#8220;master&#8221;, say) and realise later that what you&#8217;ve been doing might have been a bad idea, and you would rather it were on a topic branch.  If the commit graph looks like this:</p>
<pre>   last version from another repository
      |
      v
  M---N-----O----P---Q ("master")</pre>
<p>Then you separate out your work with the following set of commands (where the diagrams show how the state has changed after them):</p>
<pre>  git branch dubious-experiment

  M---N-----O----P---Q ("master" and "dubious-experiment")

  git checkout master

  # Be careful with this next command: make sure "git status" is
  # clean, you're definitely on "master" and the
  # "dubious-experiment" branch has the commits you were working
  # on first...

  git reset --hard &lt;SHA1sum of commit N&gt;

       ("master")
  M---N-------------O----P---Q ("dubious-experiment")

  git pull # Or something that updates "master" from
           # somewhere else...

  M--N----R---S ("master")
      \
       O---P---Q ("dubious-experiment")</pre>
<p>This is something I seem to end up doing a lot&#8230;  :)</p>
<h3>Types of Branches</h3>
<p>The terminology for branches gets pretty confusing, unfortunately, since it has changed over the course of git&#8217;s development.  I&#8217;m going to try to convince you that there are really only two types of branches.  These are:</p>
<p>(a) &#8220;Local branches&#8221;: what you see when you type <em>git branch</em>, e.g. to use an abbreviated example I have here:</p>
<pre>       $ git branch
         debian
         server
       * master</pre>
<p>(b) &#8220;Remote-tracking branches&#8221;: what you see when you type <em>git branch -r</em>, e.g.:</p>
<pre>       $ git branch -r
       cognac/master
       fruitfly/server
       origin/albert
       origin/ant
       origin/contrib
       origin/cross-compile</pre>
<p>The names of remote-tracking branches are made up of the name of a &#8220;remote&#8221; (e.g. origin, cognac, fruitfly) followed by &#8220;/&#8221; and then the name of a branch in that remote repository. (&#8220;Remotes&#8221; are just nicknames for other repositories, synonymous with a URL or the path of a local directory &#8211; you can set up extra remotes yourself with &#8220;git remote&#8221;, but &#8220;git clone&#8221; by default sets up &#8220;origin&#8221; for you.)</p>
<p>If you&#8217;re interested in how these branches are stored locally, look at the files in:</p>
<ul>
<li><tt>.git/refs/heads/</tt> [for local branches]</li>
<li><tt>.git/refs/remotes/</tt> [for remote-tracking branches]</li>
</ul>
<p>Both types of branches are very similar in some respects &#8211; they&#8217;re all just stored <strong>locally</strong> as single SHA1 sums representing a commit.  (I emphasize &#8220;locally&#8221; since some people see &#8220;origin/master&#8221; and assume that in some sense this branch is incomplete without access to the remote server &#8211; that isn&#8217;t the case.)</p>
<p>Despite this similarity there is one particularly important difference:</p>
<ul>
<li>The safe ways to change remote-tracking branches are with <em>git fetch</em> or as a side-effect of <em>git push</em>; you can&#8217;t work on remote-tracking branches directly.  In contrast, you can always switch to local branches and create new commits to move the tip of the branch forward.</li>
</ul>
<p>So what you mostly do with remote-tracking branches is one of the following:</p>
<ul>
<li>Update them with <em>git fetch</em></li>
<li>Merge from them into your current branch</li>
<li>Create new local branches based on them</li>
</ul>
<h3>Creating local branches based on remote-tracking branches</h3>
<p>If you want to create a local branch based on a remote-tracking branch (i.e. in order to actually work on it) you can do that with <em>git branch --track</em> or <em>git checkout --track -b</em>, which is similar but also switches your working tree to the newly created local branch.  For example, if you see in <em>git branch -r</em> that there&#8217;s a remote-tracking branch called origin/refactored that you want, you would use the command:</p>
<pre>    git checkout --track -b refactored origin/refactored</pre>
<p>In this example &#8220;refactored&#8221; is the name of the new branch and &#8220;origin/refactored&#8221; is the name of the existing remote-tracking branch to base it on.  (In recent versions of git the &#8220;--track&#8221; option is actually unnecessary since it&#8217;s implied when the final parameter is a remote-tracking branch, as in this example.)</p>
<p>The &#8220;--track&#8221; option sets up some configuration variables that associate the local branch with the remote-tracking branch. These are useful chiefly for two things:</p>
<ul>
<li>They allow <em>git pull</em> to know what to merge after fetching new remote-tracking branches.</li>
<li>If you do <em>git checkout</em> to a local branch which has been set up in this way, it will give you a helpful message such as:</li>
</ul>
<pre>    Your branch and the tracked remote branch 'origin/master'
    have diverged, and respectively have 3 and 384 different
    commit(s) each.</pre>
<p>&#8230; or:</p>
<pre>    Your branch is behind the tracked remote branch
    'origin/master' by 3 commits, and can be fast-forwarded.</pre>
<p>The configuration variables that allow this are called &#8220;branch.&lt;local-branch-name&gt;.merge&#8221; and &#8220;branch.&lt;local-branch-name&gt;.remote&#8221;, but you probably don&#8217;t need to worry about them.</p>
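<p>If you are curious, you can inspect these variables with <em>git config</em>.  The sketch below assumes a scratch &#8220;upstream&#8221; repository with a branch called &#8220;refactored&#8221;, cloned locally; all the paths and names are made up for the demonstration:</p>

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# A scratch repository standing in for the remote.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "initial"
git -C "$work/upstream" branch refactored

# Cloning sets up the remote "origin" and its remote-tracking branches.
git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"

# Create a local branch that tracks origin/refactored.
git checkout -q --track -b refactored origin/refactored

# These are the two variables that record the association.
remote=$(git config branch.refactored.remote)
merge=$(git config branch.refactored.merge)
echo "remote=$remote merge=$merge"
# prints "remote=origin merge=refs/heads/refactored"
```

<p>Note that the &#8220;merge&#8221; value names the branch as it is called in the remote repository, not the remote-tracking branch.</p>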
<p>You have probably noticed that after cloning from an established remote repository <em>git branch -r</em> lists many remote-tracking branches, but you only have one local branch.  In that case, a variation of the command above is what you need to set up local branches that track those remote-tracking branches.</p>
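<p>One possible way to do this for every branch at once is a small shell loop; treat it as a sketch rather than a recommendation.  It assumes the remote is called &#8220;origin&#8221; and that branch names contain no whitespace, and it runs against a scratch repository whose branch names (&#8220;albert&#8221; and &#8220;contrib&#8221;) are only illustrative:</p>

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# A scratch repository with a few branches, standing in for the remote.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "initial"
git -C "$work/upstream" branch albert
git -C "$work/upstream" branch contrib

git clone -q "$work/upstream" "$work/clone"
cd "$work/clone"

# For each remote-tracking branch of "origin", create a local branch
# that tracks it, unless a local branch of that name already exists.
for ref in $(git for-each-ref --format='%(refname)' refs/remotes/origin); do
    name=${ref#refs/remotes/origin/}
    if [ "$name" = HEAD ]; then continue; fi     # origin/HEAD is not a real branch
    if git show-ref --verify --quiet "refs/heads/$name"; then continue; fi
    git branch -q --track "$name" "origin/$name"
done

git branch
```

<p>After the loop, <em>git branch</em> should list a local branch for each of the remote-tracking branches, without having switched away from the one you were on.</p>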
<blockquote><p>You might care to note some confusing terminology here: the word &#8220;track&#8221; in &#8220;--track&#8221; means tracking of a remote-tracking branch by a local branch, whereas in &#8220;remote-tracking branch&#8221; it means the tracking of a branch in a remote repository by the remote-tracking branch.</p></blockquote>
<p>Now, let&#8217;s look at an example of how to update from a remote repository, and then how to push changes to a new repository.</p>
<h3>Updating from a Remote Repository</h3>
<p>So, if I want to get changes from the remote repository called &#8220;origin&#8221; into my local repository I&#8217;ll type <em>git fetch origin</em> and might see some output like this:</p>
<pre>  remote: Counting objects: 382, done.
  remote: Compressing objects: 100% (203/203), done.
  remote: Total 278 (delta 177), reused 103 (delta 59)
  Receiving objects: 100% (278/278), 4.89 MiB | 539 KiB/s, done.
  Resolving deltas: 100% (177/177), completed with 40 local objects.
  From ssh://longair@pacific.mpi-cbg.de/srv/git/fiji
     3036acc..9eb5e40  debian-release-20081030 -&gt; origin/debian-release-20081030
   * [new branch]      debian-release-20081112 -&gt; origin/debian-release-20081112
   * [new branch]      debian-release-20081112.1 -&gt; origin/debian-release-20081112.1
     3d619e7..6260626  master     -&gt; origin/master</pre>
<p>The most important bits here are the lines like these:</p>
<pre>     3036acc..9eb5e40  debian-release-20081030 -&gt; origin/debian-release-20081030
   * [new branch]      debian-release-20081112 -&gt; origin/debian-release-20081112</pre>
<p>The first line of these two shows that your remote-tracking branch origin/debian-release-20081030 has been advanced from the commit 3036acc to 9eb5e40.  The bit before the arrow is the name of the branch in the remote repository.  The second line similarly shows that since we last did this, a new remote-tracking branch has been created.  (<em>git fetch</em> may also fetch new tags if they have appeared in the remote repository.)</p>
<p>The lines before those are <em>git fetch</em> working out exactly which objects it will need to download to our local repository&#8217;s pool of objects, so that they will be available locally for anything we want to do with these updated branches and tags.</p>
<p><em>git fetch</em> doesn&#8217;t touch your working tree at all, so it gives you a little breathing space to decide what you want to do next.  To actually bring the changes from the remote branch into your working tree, you have to do a <em>git merge</em>.  So, for instance, if I&#8217;m working on &#8220;master&#8221; (after a <em>git checkout master</em>) then I can merge in the changes that we&#8217;ve just got from origin with:</p>
<pre>    git merge origin/master</pre>
<p>(This might be a fast-forward, if you haven&#8217;t created any new commits that aren&#8217;t on master in the remote repository, or it might be a more complicated merge.)</p>
<p>If instead you just wanted to see what the differences are between your branch and the remote one, you could do that with:</p>
<pre>    git diff master origin/master</pre>
<p>This is the nice point about fetching and merging separately: it gives you the chance to examine what you&#8217;ve fetched before deciding what to do next.  Also, by doing this separately the distinction between when you should use a local branch name and a remote-tracking branch name becomes clear very quickly.</p>
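<p>To see exactly which commits a merge would bring in, you can also ask for the commits that are on the remote-tracking branch but not yet on your local branch.  A minimal sketch, assuming scratch repositories and a &#8220;demo&#8221; identity created just for the demonstration:</p>

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# A scratch repository standing in for the remote, and a clone of it.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/master
git -C "$work/upstream" commit -q --allow-empty -m "initial"
git clone -q "$work/upstream" "$work/clone"

# Two new commits appear upstream after we cloned.
git -C "$work/upstream" commit -q --allow-empty -m "upstream change 1"
git -C "$work/upstream" commit -q --allow-empty -m "upstream change 2"

cd "$work/clone"
git fetch -q origin                       # updates origin/master only

# Commits reachable from origin/master but not from master.
behind=$(git rev-list --count master..origin/master)
git log --oneline master..origin/master

# Happy with what we saw, so merge (a fast-forward in this case).
git merge -q --ff-only origin/master
after=$(git rev-list --count master..origin/master)
echo "before merge: $behind to bring in, after merge: $after"
# prints "before merge: 2 to bring in, after merge: 0"
```

<p>Until the <em>git merge</em>, the local &#8220;master&#8221; and the working tree are exactly as they were before the fetch.</p>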
<h3>Pushing your changes to a remote repository</h3>
<p>How about the other way round?  Suppose you&#8217;ve made some changes to the branch &#8220;experimental&#8221; and want to push that to a remote repository called &#8220;origin&#8221;.  This should be as simple as:</p>
<pre>    git push origin experimental</pre>
<p>You might get an error saying that the remote repository can&#8217;t fast-forward the branch, which probably means that someone else has pushed different changes to that branch.  So, in that case you&#8217;ll need to fetch and merge their changes before trying the push again.</p>
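<p>The whole sequence can be sketched with a bare scratch repository standing in for &#8220;origin&#8221; and two clones standing in for you and a colleague.  The directory names (&#8220;mine&#8221;, &#8220;theirs&#8221;), the commit messages and the &#8220;demo&#8221; identity are assumptions for illustration only:</p>

```shell
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

work=$(mktemp -d)

# A bare repository plays the part of "origin".
git init -q --bare "$work/remote.git"
git -C "$work/remote.git" symbolic-ref HEAD refs/heads/experimental

# Your clone: create "experimental" and publish it.
git clone -q "$work/remote.git" "$work/mine"
cd "$work/mine"
git symbolic-ref HEAD refs/heads/experimental
git commit -q --allow-empty -m "start experiment"
git push -q origin experimental

# Someone else clones, commits and pushes before you do.
git clone -q "$work/remote.git" "$work/theirs"
git -C "$work/theirs" commit -q --allow-empty -m "their change"
git -C "$work/theirs" push -q origin experimental

# Your next push would not be a fast-forward, so it is refused.
git commit -q --allow-empty -m "my change"
if git push -q origin experimental; then
    first_push=accepted
else
    first_push=rejected
fi

# Fetch and merge their work, then try the push again.
git fetch -q origin
git merge -q -m "Merge origin/experimental" origin/experimental
git push -q origin experimental
echo "first push: $first_push"
# prints "first push: rejected"
```

<p>The second push succeeds because your merge commit has their commit as an ancestor, so the remote branch can be fast-forwarded to it.</p>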
<blockquote><p><em>Aside</em></p>
<p>If the branch has a different name in the remote repository (&#8220;experiment-by-bob&#8221;, say) you&#8217;d do this with:</p>
<pre>      git push origin experimental:experiment-by-bob</pre>
<p>On older versions of git, if &#8220;experiment-by-bob&#8221; doesn&#8217;t already exist, the syntax needs to be:</p>
<pre>      git push origin experimental:refs/heads/experiment-by-bob</pre>
<p>&#8230; to create the remote branch.  However that seems to be no longer the case, at least in git version 1.6.1.2 &#8211; see Sitaram&#8217;s comment below.</p>
<p>If the branch name is the same locally and remotely then it will be created automatically without you having to use any special syntax, i.e. you can just do <em>git push origin experimental</em> as normal.</p>
<p>In practice, however, it&#8217;s less confusing if you keep the branch names the same.  (The &lt;source-name&gt;:&lt;destination-name&gt; syntax there is known as a &#8220;refspec&#8221;, about which we&#8217;ll say no more here.)</p></blockquote>
<p><span style="text-decoration: line-through;">An important point here is that this <em>git push</em> doesn&#8217;t involve the remote-tracking branch origin/experimental at all &#8211; it will only be updated the next time you do <em>git fetch</em>.</span> <em>Correction: as Deskin Miller points out below, your remote-tracking branches will be updated on pushing to the corresponding branches in one of your remotes.</em></p>
<h3>Why not git pull?</h3>
<p>Well, <em>git pull</em> is fine most of the time, and particularly if you&#8217;re using git in a CVS-like fashion then it&#8217;s probably what you want.  However, if you want to use git in a more idiomatic way (creating lots of topic branches, rewriting local history whenever you feel like it, and so on) then it helps a lot to get used to doing  <em>git fetch</em> and <em>git merge</em> separately.</p>
]]></content:encoded>
			<wfw:commentRss>http://longair.net/blog/2009/04/16/git-fetch-and-merge/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
	</channel>
</rss>
