git Submodules Explained

I haven’t actually finished the FAQ bit of this post yet, but since I’m not sure when I’ll have time to do so, I’ll just publish it anyway – please let me know in the comments if this is useful for you, or there’s something else you’d like to see included.

Submodules in git are commonly misunderstood in various ways, and although the explanation in the official manual is clear and pretty easy to understand, I thought that a different treatment here might be useful to someone.

What are submodules?

A submodule in a git repository is like a sub-directory which is really a separate git repository in its own right.  This is a useful feature when you have a project in git which depends on particular versions of other projects.  For example, if you’re developing a new Ruby-on-Rails application, you could add a clearly specified version of the Rails repository as a submodule at the path vendor/rails.  The example I’m going to use in this post however, called whatdotheyknow, is one of the various mySociety projects that depend on a repository called commonlib, which contains useful code common to at least one project.  In each project the commonlib repository has been added as a submodule.  (I’ll sometimes refer to the whatdotheyknow repository as the super-project, which I hope is clear.)

It’s important to understand that the repository which contains a submodule knows very little about it except for which version it should be and various bits of information about how to update it.  (More on that below.)  If you change directory into the submodule then you’ll find that it doesn’t know anything about the parent project at all, and you can carry out operations in that repository as if it were standalone.

Before you proceed…

… it’s worth checking what version of git you have.  Many actions that you might perform that relate to submodules are done with the git submodule command, but in older versions of git this has two problems that make it very easy to get confused – I think these are important enough that everyone who uses submodules should be aware of them, and ideally upgrade their copy of git to a version that doesn’t have these problems: at least version 1.6.2.

The first of these is that if you had a typo in the name of a submodule listed on the command line, that would be silently ignored.  The second problem, which compounded this, is that if you spelled the submodule name with a trailing slash (as is common with tab-completion) then that did not refer to the submodule, and due to the previous problem would be ignored.  These were fixed in f3670a5749d70 and 496917b721ada.  (As a small point of interest, to find out which tagged releases had these fixes, I cloned git.git and did git tag --contains 496917b.)

Note also that version 1.7.0 and later versions of git have some annoying differences in behaviour, which are noted below.

How are submodules stored?

To answer this you need to understand a little bit about how git stores objects.  If you just want recipes for how to do particular things, then you can skip to “Things You Might Need To Do” below, but I think this section is useful for figuring out problems that might arise.

git’s model of the world is based around objects which are identified by their “object name”, which is the correct term for the SHA1sum hashes you see all over the place.  These objects can be of various types, such as “commit”, “tag”, “blob” (file), “tree” (directory), etc.  Each commit object points to a tree object which represents the state of your source code at that commit.  A tree object in turn consists of a list of objects with some metadata, e.g. as in this example for whatdotheyknow:

$ git ls-tree HEAD^{tree}
100644 blob 1e38e022c1c7d27f6dd9b765793087b59d147ef8    .cvsignore
100644 blob aa5036394edfea0a5dff64e0c53b4e9a026f1beb    .gitignore
100644 blob 4ef4ae8268dcad9b0de371f1aa63bb3ebbeb436a    .gitmodules
100755 blob 44c881fe25b8dc1413d9195677f492121a3789f0    INSTALL.txt
100644 blob 37312d9a1bcc80ac334547f047a2cece38dd24dc    README
100644 blob 3bb0e8592a41ae3185ee32266c860714980dbed7    Rakefile
040000 tree e326ffb3d697e7ac83fa19d93a8a3305120c719e    app
160000 commit fd91ab69279f1e0cfed53353e64811d5aa9c4b5f    commonlib
040000 tree ae93b14ec7ab01ee33053c32eca340a31ce6449f    config
040000 tree 8a7eb4d1552cc2a59fc0528c02fe0fb686d7f562    db
040000 tree 84fae00002a0e834140e2f806978748d50d60c4b    doc
040000 tree eb4089c7989ee846bbd66c97069aeff7853d0064    lib
040000 tree e7bcca0f6d561188730125b228a22a4d7bd68782    public
040000 tree f4e46de68199afa382d53583d83430c691aeb473    script
040000 tree e5772463cfed62ba63cfaf4e0eacecd1dc3895e5    spec
100644 blob bfc265e33e47ffa9796fe7bb7ae7d1fe7e633593    todo.txt
040000 tree 2999c0a790c0033ad93e312c0bc62ecdc9a18f81    vendor

As you can see, typically the types of objects listed in a tree are either blobs or trees, indicating files or subdirectories.  However, if an object of type “commit” is listed (with the mode 160000) that represents a submodule.  The object name (in this case fd91ab69…) is the commit that the submodule’s HEAD should be at.  One implication of this is that that object name usually won’t be known outside the submodule.  This sometimes causes confusion when people do git diff in the super-project and find a difference in the submodule entry, e.g.:

$ git diff
diff --git a/commonlib b/commonlib
index fd91ab6..d6593c6 160000
--- a/commonlib
+++ b/commonlib
@@ -1 +1 @@
-Subproject commit fd91ab69279f1e0cfed53353e64811d5aa9c4b5f
+Subproject commit d6593c6741b29680665b8ae7470e2f80ab9a5977

This output means that the submodule version which is committed in the whatdotheyknow repository is fd91ab69279, but if you change into the commonlib subdirectory, you will find that the HEAD of that repository is at d6593c6741.  Hopefully both of these commits will be known in the commonlib submodule, but neither will be in the whatdotheyknow repository.
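
If you’d like to see a submodule entry like this for yourself without touching a real project, here is a minimal sketch that builds two throwaway repositories in a temporary directory.  The names lib and super, the demo identity and the protocol.file.allow override (which recent versions of git need before they will clone a submodule from a local path) are my assumptions for illustration, and are nothing to do with the whatdotheyknow setup.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so that commits succeed in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for commonlib with a single commit.
git init -q lib
git -C lib commit -q --allow-empty -m "first lib commit"

# A stand-in super-project that adds lib as a submodule.
git init -q super
cd super
git -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git commit -q -m "add lib as a submodule"

# The tree entry for the submodule: mode, type, object name, path.
entry=$(git ls-tree HEAD lib)
mode=$(echo "$entry" | awk '{print $1}')
type=$(echo "$entry" | awk '{print $2}')
recorded=$(echo "$entry" | awk '{print $3}')
sub_head=$(git -C lib rev-parse HEAD)
echo "$entry"

# The recorded commit is an object in the submodule, not in the super-project.
if git cat-file -e "$recorded" 2>/dev/null; then known=yes; else known=no; fi
echo "known in super-project: $known"
```

The entry should have mode 160000 and type commit, and the object name should match HEAD in the submodule while being unknown to the super-project itself.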

The other information about the submodule which is stored in the super-project is stored in the .gitmodules file and in config options.

A submodule which is “initialized” will have a config option set to indicate the URL that the submodule should be cloned from if it is missing.  These config options are of the form submodule.<SUBMODULE-NAME>.url, so having initialized the commonlib submodule in whatdotheyknow, I can see the following:

$ git config --list|egrep ^submodule
submodule.commonlib.url=git://git.mysociety.org/commonlib

The .gitmodules file provides sensible default URLs for each submodule, and is committed in the repository like any other versioned file:

$ cat .gitmodules
[submodule "commonlib"]
 path = commonlib
 url = git://git.mysociety.org/commonlib

If you’re publishing a repository with the intention that anyone should be able to clone and use it, you should make sure that the URLs specified in .gitmodules are ones that can be publicly accessed – so don’t, for example, use an SSH URL with your user name in it.  Since these URLs are only used when initializing a submodule, which you typically do only rarely, it’s not a great inconvenience that you may have to change them in order to push changes you’ve made in the submodule.
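
The relationship between the committed defaults in .gitmodules and the per-clone config option can be sketched as follows.  Again this uses throwaway repositories with hypothetical names (lib, super, clone) rather than the real commonlib, and it assumes a reasonably recent git.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so that commits succeed in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
git -C lib commit -q --allow-empty -m "first lib commit"
git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git -C super commit -q -m "add lib as a submodule"

# A fresh clone has the committed .gitmodules but no submodule URL in its config yet.
git clone -q super clone
cd clone
default_url=$(git config -f .gitmodules --get submodule.lib.url)
before=$(git config --get submodule.lib.url || true)

# "git submodule init" copies the default URL into the clone's config.
git submodule --quiet init
after=$(git config --get submodule.lib.url)
echo "default=$default_url before=$before after=$after"
```

Before git submodule init the config option is unset, and afterwards it holds the same URL as .gitmodules, which you are then free to change locally.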

Things You Might Need To Do

This section lists some simple recipes for doing all kinds of things with submodules.  If you think there’s something I should add, please let me know.  For the sake of simplicity, in the examples below, I’m not listing submodule paths explicitly at the end of git submodule commands, which generally means that the action applies to all of the submodules.  (The exception is git submodule add, which of course only applies to a single submodule.)

Get a working submodule version after cloning

If you’ve just cloned a repository which contains submodules, you can initialize and clone all of them with:

git submodule update --init

This is the equivalent of running:

git submodule init
git submodule update

With version 1.6.5 of git and later, you can do this automatically by cloning the super-project with the --recursive option:

git clone --recursive git://github.com/mysociety/whatdotheyknow.git

See the status of all the submodules

Running git submodule without arguments defaults to running git submodule status, which produces a helpful summary of the status of all your submodules.  Each line begins with a space, a ‘+’ or a ‘-’ which indicate the following things:

+
The version checked out in the submodule is different from that specified in the super-project. The object name shown is that of the commit that the submodule is currently at.  (The meaning of this symbol changed in 1.7.0.)
-
The submodule hasn’t been initialized or there’s no repository at the submodule path (e.g. if you’ve run git submodule init but not git submodule update, or you’ve later deleted the submodule directory from the working tree). The object name shown is the commit that’s specified in the super-project.
[space]
The submodule’s HEAD is at the correct version – the object name shown is that version.

In projects with many submodules this can be a helpful way to see at a glance where all your submodules are at.  For example, here’s some output from a version of the Fiji project that I’m working on:

 bbff1fd4545b3a614b14eb0770ac6028b648746d AutoComplete (bbff1fd)
+16dcf52ef2106cc92ba89c90b6b5f457bc7619ea ImageJA (heads/current-147-g16dcf52)
 5bfc9eb779d39e38c23ce1c3b01b49953ebd8463 RSyntaxTextArea (5bfc9eb)
 b9f11849599d536528c26bc599dbec4609d77dc4 Retrotranslator (remotes/origin/master)
 90287f0250542be256f67ade4e29a618bf6e688f TrakEM2 (0.7m-227-g90287f0)
+f25db2a43b95480c780d865323fce659a1135c2d VIB (tracer-1.4.0-candidate-849-gf25db2a)
 e4d3eb47a8f9d4e62d1f356636652c3ecc739d92 batik (remotes/origin/svn/git-svn@216063-588-ge4d3eb4)
 79de599df2550f2813fd449505b6fa55ca08cbb3 bio-formats (remotes/origin/contrib-380-g79de599)
 e73abece1ebf3a4aba22104ae9452b2b816ab0d7 clojure (remotes/origin/HEAD)
 39618b6d881fb0c3b52de4929aa34134bb32ffdb clojure-contrib (remotes/origin/master)
 9fa7f4d993f57e27e3134b016c7d36fbfd33e34c ij-plugins (9fa7f4d)
 7ffa48359cdbf7a47735b719a605ea322c58d694 java/linux (heads/master)
-cc218f05fdc0bb55f40f904d5d1f804e8751d0d2 java/linux-amd64
-4f3964234f4e6fd78247e5e7fad9c8becad53e8f java/macosx-java3d
-e79c51473df06f00d4ba9c913afe27e675f71d64 java/win32
-54e735c6c9bac65fcc889bc9e833213f19c7458a java/win64
 b362c662f79763c7927a2ba486243ccefa9222a1 junit (obsolete-cvsimport)
-9ae38d4bde196fa6a4595aebed9f218d4ec591bc jython
 c6e929a15d77545f03ea4883bf033e13c632ef12 live-helper (1.0.4-1-43-gc6e929a)
 79d369af87c4412a47f7065938fe18befc0a183e mpicbg (remotes/origin/trakem2-30-g79d369a)
 20ab0539cc248c642982fdf1330325636d8c55c0 tcljava (tcljava-141-2007-06-06-6-g20ab053)
 a7bfed6752ea1aeac73db386411329486e339f94 weka (a7bfed6)
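
To watch a single submodule move through all three prefixes, here is a small sketch using throwaway repositories.  The names lib, super and clone, the prefix helper function and the protocol.file.allow override for local-path clones are all assumptions made for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so that commits succeed in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
git -C lib commit -q --allow-empty -m "first lib commit"
git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git -C super commit -q -m "add lib as a submodule"
git clone -q super clone
cd clone

# First character of the status line for the lib submodule.
prefix() { git submodule status lib | cut -c1; }

uninit=$(prefix)    # freshly cloned super-project: submodule not initialized
git -c protocol.file.allow=always submodule --quiet update --init
clean=$(prefix)     # submodule HEAD is at the commit recorded in the super-project
git -C lib commit -q --allow-empty -m "local work in the submodule"
moved=$(prefix)     # submodule HEAD has moved away from the recorded commit
printf 'uninit=[%s] clean=[%s] moved=[%s]\n' "$uninit" "$clean" "$moved"
```

The three values should come out as ‘-’, a space and ‘+’ in that order.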

Update submodules to the versions specified in HEAD

If you change the HEAD of your super-project (e.g. with git pull, or by checking out a new branch) you may find that your submodules are now at the wrong versions.  (You can check with git submodule status as shown above.)  If you’re not actively working on the submodules, then the simplest way to move them to the right versions is with:

git submodule update

If any initialized submodules are missing, this will clone them.  For other submodules where the repository exists, this will change into its subdirectory, run git fetch (to make sure all the most recent updates are present) and then git checkout the correct version.  This has the effect of “detaching HEAD” in each submodule, so if you want to work on a branch in any of those subdirectories, you’ll have to git checkout to a branch.

The most frequent errors that you’ll find when running git submodule update are likely to be due to someone having created a commit in the super-project that references a commit in the submodule that they’ve forgotten to push, so check that whenever you get errors about not being able to find particular versions.

Versions of git after 1.6.4 add the --merge and --rebase options to git submodule update to allow more flexible ways of updating your submodules while you’re working on them.
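
The “detached HEAD” effect of git submodule update is easy to check with git symbolic-ref, which fails when HEAD doesn’t point at a branch.  This is only a sketch with throwaway repositories: the names lib, super and clone are hypothetical, and the branch name is looked up rather than assumed, since the default branch name depends on your configuration.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so that commits succeed in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
git -C lib commit -q --allow-empty -m "first lib commit"
git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git -C super commit -q -m "add lib as a submodule"
git clone -q super clone
cd clone
git -c protocol.file.allow=always submodule --quiet update --init

# symbolic-ref -q exits non-zero when HEAD is detached.
if git -C lib symbolic-ref -q HEAD >/dev/null; then state=on-branch; else state=detached; fi
echo "after update: $state"

# To work in the submodule, check out a local branch first.
branch=$(git -C lib for-each-ref --format='%(refname:short)' refs/heads | head -n 1)
git -C lib checkout -q "$branch"
if git -C lib symbolic-ref -q HEAD >/dev/null; then state2=on-branch; else state2=detached; fi
echo "after checkout of $branch: $state2"
```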

Add a new submodule to a repository

This is nice and easy to do from a URL.  For example, if we wanted to create a new mySociety project called “create robot MP”, and add commonlib to it, we would just use git submodule add:

$ mkdir createrobotmp
$ cd createrobotmp
$ git init
Initialized empty Git repository in /home/mark/tmp/createrobotmp/.git/
$ git submodule add git://git.mysociety.org/commonlib commonlib
Initialized empty Git repository in /home/mark/tmp/createrobotmp/commonlib/.git/
remote: Counting objects: 5240, done.
remote: Compressing objects: 100% (1974/1974), done.
remote: Total 5240 (delta 3311), reused 5038 (delta 3197)
Receiving objects: 100% (5240/5240), 1020.36 KiB | 377 KiB/s, done.
Resolving deltas: 100% (3311/3311), done.
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
#   (use "git rm --cached <file>..." to unstage)
#
#    new file:   .gitmodules
#    new file:   commonlib
#

Then you need to stage and commit .gitmodules and commonlib as with any other new files.  Since this puts the URL in the .gitmodules file, you should make this a publicly clonable URL, as mentioned above.

Change the remote for a submodule

If you frequently work in a submodule you might want to change the default remote “origin” to refer to a URL that you can push to, just so you can use one remote for everything.  You can do this by deleting origin and adding it back with a new URL, with e.g.:

$ cd commonlib
$ git remote rm origin
$ git remote add origin ssh://mark@git.mysociety.org/data/git/public/commonlib.git
$ git remote -v
origin    ssh://mark@git.mysociety.org/data/git/public/commonlib.git

However, you’ll find that two helpful config options will have been deleted when removing and adding back origin, so you’ll want to add these back.

$ git config branch.master.remote origin
$ git config branch.master.merge refs/heads/master

These config options set up the helpful defaults for git pull when you’re on master.

If you’re in the habit of deleting whole submodules, and then recreating them with git submodule update then you should also make sure that you change the URL in the super-project’s config settings, e.g.:

$ git config --list|egrep submodule
submodule.commonlib.url=git://git.mysociety.org/commonlib
$ git config submodule.commonlib.url ssh://mark@git.mysociety.org/data/git/public/commonlib.git
$ git config --list|egrep submodule
submodule.commonlib.url=ssh://mark@git.mysociety.org/data/git/public/commonlib.git
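
If your version of git has git remote set-url (I believe it arrived in 1.7.0), you can change the URL of origin in place instead of removing and re-adding the remote, which leaves the branch config options alone.  The sketch below shows this on a throwaway clone; the repository names and the SSH URL are hypothetical, and the URL is only stored as a string, so nothing is contacted.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so that commits succeed in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in upstream repository and a clone of it.
git init -q upstream
git -C upstream commit -q --allow-empty -m "first commit"
git clone -q upstream checkout
cd checkout
branch=$(git symbolic-ref --short HEAD)

# Hypothetical push URL; set-url only rewrites remote.origin.url.
new_url=ssh://user@git.example.org/path/to/commonlib.git
git remote set-url origin "$new_url"

url=$(git config --get remote.origin.url)
tracking=$(git config --get "branch.$branch.remote")
merge=$(git config --get "branch.$branch.merge")
echo "url=$url remote=$tracking merge=$merge"
```

The branch’s remote and merge settings survive the change, so there is nothing to add back afterwards.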

Initialize a submodule with a non-standard URL

If you know in advance that you want to clone your submodules from a URL different from that specified in .gitmodules (e.g. with a private SSH URL that you can push to) then after cloning the superproject you can set the appropriate config by hand before running git submodule update.  This takes the place of the git submodule init command, for example:

$ git clone ssh://mark@git.mysociety.org/data/git/public/whatdotheyknow.git
[...]
$ cd whatdotheyknow
$ git config submodule.commonlib.url ssh://mark@git.mysociety.org/data/git/public/commonlib.git
$ git submodule update
Initialized empty Git repository in /home/mark/tmp/whatdotheyknow/commonlib/.git/
remote: Counting objects: 5240, done.
remote: Compressing objects: 100% (1974/1974), done.
remote: Total 5240 (delta 3311), reused 5038 (delta 3197)
Receiving objects: 100% (5240/5240), 1020.36 KiB | 533 KiB/s, done.
Resolving deltas: 100% (3311/3311), done.
Submodule path 'commonlib': checked out 'a901c2a431f7869f5c2eaee5808f8590ca78544e'
$ cd commonlib/
$ git remote show origin
* remote origin
 URL: ssh://mark@git.mysociety.org/data/git/public/commonlib.git
 HEAD branch: master
 Remote branch:
 master tracked
 Local branch configured for 'git pull':
 master merges with remote master
 Local ref configured for 'git push':
 master pushes to master (up to date)

Modified submodules in 1.7.0 and later

Versions 1.7.0 and later of git contain an annoying change in the behaviour of git submodule.  Submodules are now regarded as dirty if they have any modified files or untracked files, whereas previously it would only be the case if HEAD in the submodule pointed to the wrong commit. Why is this annoying?  The following reasons:

  • Firstly, the meaning of the plus sign (+) in the output of git submodule has changed, and the first time that you come across this it takes a little while to figure out what’s going wrong, for example by looking through changelogs or using git bisect on git.git to find the change.  It would have been much kinder to users to introduce a different symbol for “at the specified version, but dirty”.
  • git status is now very slow in projects with several large submodules.  (git status used to be nearly instant in a clone of fiji.git but trying just now with 1.7.0.4 took an incredible 45 seconds.)

This seems like a change that was introduced without considering the surprise and impact that it would have on users.  In any case, I’ve added this note here since if you work with submodules, you may need to be aware of this change in behaviour.

The output of git diff has changed as well, to add “-dirty” to the object name if the working tree of that submodule is dirty:

$ git diff imglib
diff --git a/imglib b/imglib
--- a/imglib
+++ b/imglib
@@ -1 +1 @@
-Subproject commit c5c6bbaf616d64fbd873df7b7feecebb81b5aee7
+Subproject commit c5c6bbaf616d64fbd873df7b7feecebb81b5aee7-dirty

Update: Thanks to VonC, who points out in the comments below that in git 1.7.2 there is now a “--ignore-submodules” option to git status which can restore the old behaviour and also provides the useful option that only changed files (not untracked files) cause the submodule to be shown as dirty.
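
Here is a rough sketch of how the different --ignore-submodules values affect git status, using the machine-readable --porcelain output so that an empty result means “nothing to report”.  The repositories and file names (lib, super, clone, README, untracked.txt) are made up for the example.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so that commits succeed in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
echo hello > lib/README
git -C lib add README
git -C lib commit -q -m "first lib commit"
git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git -C super commit -q -m "add lib as a submodule"
git clone -q super clone
cd clone
git -c protocol.file.allow=always submodule --quiet update --init

# An untracked file in the submodule makes it show up by default...
echo scratch > lib/untracked.txt
default_view=$(git status --porcelain)
# ...but not when untracked files in submodules are ignored.
no_untracked=$(git status --porcelain --ignore-submodules=untracked)

# A change to a tracked file still shows up with "untracked"...
echo changed >> lib/README
tracked_change=$(git status --porcelain --ignore-submodules=untracked)
# ...while "dirty" only reports a submodule whose HEAD has moved.
old_behaviour=$(git status --porcelain --ignore-submodules=dirty)
echo "default=[$default_view] untracked=[$no_untracked]"
echo "tracked=[$tracked_change] dirty=[$old_behaviour]"
```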

Removing a submodule

There are instructions for the several steps required to remove a submodule at the bottom of this page:

http://git.wiki.kernel.org/index.php/GitSubmoduleTutorial
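
With versions of git much newer than the ones discussed in this post, the removal can be sketched as below.  This assumes that git submodule deinit is available and that git rm also tidies up .gitmodules, both of which are true of recent releases as far as I know; the repositories named lib, super and clone are throwaway examples, so please check the documentation for your version before running anything like this on a real project.

```shell
set -eu
work=$(mktemp -d)
cd "$work"
# Hypothetical identity so that commits succeed in a clean environment.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q lib
git -C lib commit -q --allow-empty -m "first lib commit"
git init -q super
git -C super -c protocol.file.allow=always submodule --quiet add "$work/lib" lib
git -C super commit -q -m "add lib as a submodule"
git clone -q super clone
cd clone
git -c protocol.file.allow=always submodule --quiet update --init

# 1. Drop the submodule's section from .git/config and empty its working tree.
git submodule --quiet deinit -f lib
# 2. Remove the submodule entry from the index (and its section in .gitmodules).
git rm -q lib
# 3. Delete the submodule's repository that is kept inside the super-project.
rm -rf .git/modules/lib
# 4. Record the removal.
git commit -q -m "remove the lib submodule"

remaining=$(git ls-tree HEAD lib)
url_left=$(git config --get submodule.lib.url || true)
echo "tree entry=[$remaining] config url=[$url_left]"
```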


Comments

18 responses to “git Submodules Explained”

  1. William

    Git is awesome for managing a tree, but when you start to try to manage a forest with submodules, things get ugly (to my taste anyway).

    I like google’s repo:
    http://google-opensource.blogspot.com/2008/11/gerrit-and-repo-android-source.html

    Repo isn’t used to replace git. You use git to manage your tree, and repo to manage your forest.

    1. mark

      Thanks for the pointer, William – I hadn’t looked properly at repo, but I certainly shall.

  2. VonC

    Git 1.7.2 (http://www.kernel.org/pub/software/scm/git/docs/RelNotes-1.7.2.txt) seems to want to alleviate the ‘git status’ issue, by adding the “--ignore-submodules” option.

    1. mark

      VonC: thanks for pointing that out – I hadn’t realized that those changes were in a released version yet. I will update the post.

      It’s a shame that there doesn’t (yet?) seem to be a config option that can set a particular --ignore-submodules setting as a default.

      1. mark

        There should be a config option for that soon:

        http://www.spinics.net/lists/git/msg136891.html

  3. szg

    I actually prefer the git 1.7 submodule behaviour. With the older git I’ve forgotten to commit in a submodule for nearly a year in a project. Now both git status and git diff show me I’ve done some changes there. The git 1.7 release notes refer to such cases as a justification. I think that the former lightning-fast git-status in a big project with the bulk of the stuff in submodules was just an ILLUSION, both in terms of speed and output. I’m sure you never ever forget about changes in your submodules, but as for me, I’m very very stupid. Git 1.7 is my friend.

    1. mark

      szg: yes, I know what you mean – in fact, I’m finding the new behaviour increasingly helpful as I get used to it, although the time taken for “git status” is often infuriating. I suppose I mostly object to the way that such a change was introduced: I think it was unfriendly to the users, especially since the options to revert to the existing behaviour are only appearing in later versions now…

  4. An M

    I love 1.7’s git st

  5. Ben Turner (@phantomwhale)

    Once more I have git knowledge gaps, and once more I find myself hitting your blog via the Google ! Another great post, dude, not only answering my question, but then the next four or five questions that immediately followed ;)

  6. Henning Glatter-Gotz

    Mark, nice post that helped me sort out some things while moving from SVN to git. The link referenced at the bottom of the post has a 404 (http://git.wiki.kernel.org/index.php/GitSubmoduleTutorial)

  7. David Alsbury

    I am another person trying to make the switch from SVN to Git. I found your article easy to understand and very helpful. Thanks.

  8. don b

    thank you for this.

  9. Joe Elizondo

    I’m getting the diff submodule entry as a text file that comes along with my commit and push. I probably should have updated the submodule before I committed, but is it really worth redoing the commit when the only extra piece is a small text snippet that looks like this?

    diff --git a/commonlib b/commonlib
    index fd91ab6..d6593c6 160000
    --- a/commonlib
    +++ b/commonlib
    @@ -1 +1 @@
    -Subproject commit fd91ab69279f1e0cfed53353e64811d5aa9c4b5f
    +Subproject commit d6593c6741b29680665b8ae7470e2f80ab9a5977

    Will this cause a merge conflict or change the submodule version on say, the master branch that I’m merging into? Will I be given the chance to use the desired submodule on a merge? Just curious how this snippet showed up in my commit and if it will have messy consequences.

  10. Inner

    -Subproject commit c5c6bbaf616d64fbd873df7b7feecebb81b5aee7
    +Subproject commit c5c6bbaf616d64fbd873df7b7feecebb81b5aee7-dirty

    How to commit the ‘c5c6bbaf616d64fbd873df7b7feecebb81b5aee7-dirty’

    $ git status
    On branch master
    Your branch is up-to-date with ‘origin/master’.
    Changes not staged for commit:
    (use “git add …” to update what will be committed)
    (use “git checkout -- …” to discard changes in working directory)
    (commit or discard the untracked or modified content in submodules)

    modified: lib (modified content)

    no changes added to commit (use “git add” and/or “git commit -a”)

  11. Iaraterezinhadeazambujarocha

    I do neither. I wrote a simple script for me to quickly run after cloning a repo that takes care of all of this for me. You can find the script . Note: It is written in Perl. The main reason that I wrote this is that I have submodules that also have submodules. Remembering which submodules I had to switch to to do subsequent inits and updates on was a pain, so I wrote this to handle it all for me. It works well for non-nested submodules as well.

  12. Tom

    Here’s another case I’m running into, which I haven’t found a good answer to:

    I’ve added (and committed, and pushed) my submodules to .gitmodules. But one time when I cloned it, somehow, they didn’t get added to .git/config, even though they’re in .gitmodules.

    Commands like “git submodule init” and “git submodule update” don’t do anything, probably because they operate on .git/config rather than .gitmodules.

    How do I re-initialize .git/config with .gitmodules?

  13. kawal bhatti

    Nice blog ……thanks

  14. John

    Your post dates back already some years, so not surprising that links change. So actually the very first one in the post “explanation in the official manual” ends up in a 404
