I’ve resisted getting an all-in-one printer / scanner / copier device in the past, largely due to worrying about the driver situation on Linux, but when I found out that my scanner hadn’t survived the trip to Zürich and we were also without a printer, we risked it and bought a Canon PIXMA MP560.
I say “risked”, because the noise levels in the search results when trying to find out whether this printer would actually work were very high. In the hope that they will be of use to anyone similarly searching in the future, I thought I’d add the following notes about things that definitely work for us. (There are lots of things that we simply haven’t tried yet, e.g. printing via USB or scanning via USB, so I can’t comment on what’s involved in getting those to work.)
Update: with these drivers the printer does seem to have trouble printing more complex pages – it’ll only print out the top eighth of the page and then give up. As an example, try printing out the English rules for Tobago (PDF linked from that page). Rather annoying, and I haven’t had time to report it to Canon yet.
Obviously I can’t provide support if you’re having problems with this printer, so this is just to describe what worked for us on Ubuntu 9.04 (Jaunty Jackalope) and 9.10 (Lucid Lynx). I’m pretty happy with this printer, and it’s great to see that Canon are supporting Linux users by providing drivers.
Printing over Wireless
The instructions supplied with the printer explain how to get it to connect to your wireless network, which worked fine with WPA2 (AES).
Then you need to go to the drivers page for the PIXMA MP560 on Canon’s website, select “Linux” and “English”, and download “Debian Linux Print Drivers (3.2)”. That’ll give you a tar file, which you should unpack in a new directory. This in turn contains a tar.gz archive called cnijfilter-mp560series-3.20-1-i386-deb.tar.gz. Unpack that and you’ll get the following:
The “install.sh” script will fail if your setup is like either of ours, with the error message:
==================================================
Canon Inkjet Printer Driver Ver.3.20-1 for Linux
Copyright CANON INC. 2001-2009
All Rights Reserved.
==================================================
Error! Cannot specify package management system.
This error arises when both “dpkg” and “rpm” exist in your path, so you need to edit install.sh to cause the test for “rpm” to fail – find these lines:
## Judge is the distribution supporting rpm? ##
rpm --version 1> /dev/null 2>&1
c_system_rpm=$?
… and change “rpm --version” to “rpm-no-thanks --version”, or something. If you re-run ./install.sh then it should all work OK. (I’ve put in bold the bits where you need user interaction.)
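If you’d rather not edit the file by hand, the same change can be scripted. This is only a sketch: it runs sed against a scratch copy containing the three lines quoted above, and rpm-no-thanks is just a made-up command name that shouldn’t exist on your path.

```shell
# Work on a scratch copy so that the sketch is safe to run anywhere.
workdir=$(mktemp -d)
cat > "$workdir/install.sh" <<'EOF'
## Judge is the distribution supporting rpm? ##
rpm --version 1> /dev/null 2>&1
c_system_rpm=$?
EOF

# Replace the rpm probe with a command that does not exist, so the test
# for rpm fails. The .bak suffix keeps a backup of the original file.
sed -i.bak 's/^rpm --version/rpm-no-thanks --version/' "$workdir/install.sh"

patched_line=$(sed -n '2p' "$workdir/install.sh")
echo "$patched_line"
```

Pointing the same sed command at the real install.sh should have the same effect, after which re-running it proceeds as in the transcript that follows.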
: mark@cava:~/Desktop/canon-drivers/cnijfilter-mp560series-3.20-1-i386-deb (master)
./install.sh
==================================================
Canon Inkjet Printer Driver Ver.3.20-1 for Linux
Copyright CANON INC. 2001-2009
All Rights Reserved.
==================================================
Execution command = sudo dpkg -iG ./packages/cnijfilter-common_3.20-1_i386.deb
[sudo] password for mark:
(Reading database ... 490522 files and directories currently installed.)
Preparing to replace cnijfilter-common 3.20-1 (using .../cnijfilter-common_3.20-1_i386.deb) ...
Unpacking replacement cnijfilter-common ...
Setting up cnijfilter-common (3.20-1) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place
/sbin/ldconfig.real: /usr/local/lib/ is not a symbolic link
Execution command = sudo dpkg -iG ./packages/cnijfilter-mp560series_3.20-1_i386.deb
(Reading database ... 490522 files and directories currently installed.)
Preparing to replace cnijfilter-mp560series 3.20-1 (using .../cnijfilter-mp560series_3.20-1_i386.deb) ...
Unpacking replacement cnijfilter-mp560series ...
Setting up cnijfilter-mp560series (3.20-1) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place
/sbin/ldconfig.real: /usr/local/lib/ is not a symbolic link
## Driver packages installed. ##
#=========================================================#
# Register Printer
#=========================================================#
Next, register the printer to the computer.
Connect the printer, and then turn on the power.
To use the printer on the network, connect the printer to the network.
When the printer is ready, press the Enter key.
>
#=========================================================#
# Connection Method
#=========================================================#
1) USB
2) Network
Select the connection method.[1]2
Searching for printers...
#=========================================================#
# Select Printer
#=========================================================#
Select the printer.
If the printer you want to use is not listed, select Update [0] to search again.
To cancel the process, enter [Q].
-----------------------------------------------------------
0) Update
-----------------------------------------------------------
Could not detect the target printer.
-----------------------------------------------------------
(Currently selected:[0](Update) )
[At this point I remembered to turn on the printer...]
Enter the value. [0]0
Connect the printer, and then turn on the power.
To use the printer on the network, connect the printer to the network.
When the printer is ready, press the Enter key.
>
Searching for printers..
#=========================================================#
# Select Printer
#=========================================================#
Select the printer.
If the printer you want to use is not listed, select Update [0] to search again.
To cancel the process, enter [Q].
-----------------------------------------------------------
0) Update
-----------------------------------------------------------
Target printers detected (MAC address IP address)
1) Canon MP560 series (00-1E-8F-64-60-96 192.168.1.5)
-----------------------------------------------------------
(Currently selected:[1]Canon MP560 series (00-1E-8F-64-60-96 192.168.1.5))
Enter the value. [1]
#=========================================================#
# Printer Name
#=========================================================#
Enter the printer name.[MP560LAN]mp560-again
Execution command = sudo /usr/sbin/lpadmin -p mp560-again -m canonmp560.ppd -v cnijnet:/00-1E-8F-64-60-96 -E
#=========================================================#
# Set as Default Printer
#=========================================================#
Do you want to set this printer as the default printer? (yes/no) [yes]no
#=========================================================#
Installation has been completed.
Printer Name : mp560-again
Select this printer name for printing.
#=========================================================#
Then the printer should be set up correctly, and you can test it by going to System -> Administration -> Printing, right clicking on the new printer, selecting “Properties” and clicking on “Print Test Page”.
Scanning to a USB stick
This has turned out to be convenient enough that I haven’t bothered to get scanning over USB from my netbook to work yet, and of course it works fine – just follow the prompts on the display. It only offers scanning to PDF or JPEG, unfortunately – it would be nice if PNG was another option available when scanning to a USB mass storage device.
I haven’t actually finished the FAQ bit of this post yet, but since I’m not sure when I’ll have time to do so, I’ll just publish it anyway – please let me know in the comments if this is useful for you, or there’s something else you’d like to see included.
Submodules in git are commonly misunderstood in various ways, and although the explanation in the official manual is clear and pretty easy to understand, I thought that a different treatment here might be useful to someone.
What are submodules?
A submodule in a git repository is like a sub-directory which is really a separate git repository in its own right. This is a useful feature when you have a project in git which depends on particular versions of other projects. For example, if you’re developing a new Ruby-on-Rails application, you could add a clearly specified version of the Rails repository as a submodule at the path vendor/rails. The example I’m going to use in this post however, called whatdotheyknow, is one of the various mySociety projects that depend on a repository called commonlib, which contains useful code common to at least one project. In each project the commonlib repository has been added as a submodule. (I’ll sometimes refer to the whatdotheyknow repository as the super-project, which I hope is clear.)
It’s important to understand that the repository which contains a submodule knows very little about it except for which version it should be and various bits of information about how to update it. (More on that below.) If you change directory into the submodule then you’ll find that it doesn’t know anything about the parent project at all, and you can carry out operations in that repository as if it were standalone.
Before you proceed…
… it’s worth checking what version of git you have. Many actions that you might perform that relate to submodules are done with the git submodule command, but in older versions of git this has two problems that make it very easy to get confused – I think these are important enough that everyone who uses submodules should be aware of them, and ideally upgrade their copy of git to a version that doesn’t have these problems: at least version 1.6.2.
The first of these is that if you had a typo in the name of a submodule listed on the command line, that would be silently ignored. The second problem, which compounded this, is that if you spelled the submodule name with a trailing slash (as is common with tab-completion) then that did not refer to the submodule, and due to the previous problem would be ignored. These were fixed in f3670a5749d70 and 496917b721ada. (As a small point of interest, to find out which tagged releases had these fixes, I cloned git.git and did git tag --contains 496917b.)
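If you haven’t used git tag --contains before, here’s a rough sketch of what it does, using a throwaway repository rather than git.git. The tag names, commit messages and the demo identity are all invented for the example.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q

# Hypothetical identity, so that the commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m "before the fix"
git tag v1
git commit -q --allow-empty -m "the fix"
fix=$(git rev-parse HEAD)
git commit -q --allow-empty -m "later work"
git tag v2

# Only tags whose history includes the fix are listed, so v1 is left out.
tags_with_fix=$(git tag --contains "$fix")
echo "$tags_with_fix"
```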
Note also that version 1.7.0 and later versions of git have some annoying differences in behaviour, which are noted below.
How are submodules stored?
To answer this you need to understand a little bit about how git stores objects. If you just want recipes for how to do particular things, then you can skip to “Things You Might Need To Do” below, but I think this section is useful for figuring out problems that might arise.
git’s model of the world is based around objects which are identified by their “object name”, which is the correct term for the SHA1sum hashes you see all over the place. These objects can be of various types, such as “commit”, “tag”, “blob” (file), “tree” (directory), etc. Each commit object points to a tree object which represents the state of your source code at that commit. A tree object in turn consists of a list of objects with some metadata, e.g. as in this example for whatdotheyknow:
$ git ls-tree HEAD^{tree}
100644 blob 1e38e022c1c7d27f6dd9b765793087b59d147ef8 .cvsignore
100644 blob aa5036394edfea0a5dff64e0c53b4e9a026f1beb .gitignore
100644 blob 4ef4ae8268dcad9b0de371f1aa63bb3ebbeb436a .gitmodules
100755 blob 44c881fe25b8dc1413d9195677f492121a3789f0 INSTALL.txt
100644 blob 37312d9a1bcc80ac334547f047a2cece38dd24dc README
100644 blob 3bb0e8592a41ae3185ee32266c860714980dbed7 Rakefile
040000 tree e326ffb3d697e7ac83fa19d93a8a3305120c719e app
160000 commit fd91ab69279f1e0cfed53353e64811d5aa9c4b5f commonlib
040000 tree ae93b14ec7ab01ee33053c32eca340a31ce6449f config
040000 tree 8a7eb4d1552cc2a59fc0528c02fe0fb686d7f562 db
040000 tree 84fae00002a0e834140e2f806978748d50d60c4b doc
040000 tree eb4089c7989ee846bbd66c97069aeff7853d0064 lib
040000 tree e7bcca0f6d561188730125b228a22a4d7bd68782 public
040000 tree f4e46de68199afa382d53583d83430c691aeb473 script
040000 tree e5772463cfed62ba63cfaf4e0eacecd1dc3895e5 spec
100644 blob bfc265e33e47ffa9796fe7bb7ae7d1fe7e633593 todo.txt
040000 tree 2999c0a790c0033ad93e312c0bc62ecdc9a18f81 vendor
As you can see, typically the types of objects listed in a tree are either blobs or trees, indicating files or subdirectories. However, if an object of type “commit” is listed (with the mode 160000) that represents a submodule. The object name (in this case fd91ab69…) is the commit that the submodule’s HEAD should be at. One implication of this is that that object name usually won’t be known outside the submodule. This sometimes causes confusion when people do git diff in the super-project and find a difference in the submodule entry, e.g.:
This output means that the submodule version which is committed in the whatdotheyknow repository is fd91ab69279, but if you change into the commonlib subdirectory, you will find that the HEAD of that repository is at d6593c6741. Hopefully both of these commits will be known in the commonlib submodule, but neither will be in the whatdotheyknow repository.
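To see for yourself that a submodule is nothing more than a “commit” entry in a tree, here’s a sketch that creates such an entry by hand in a scratch repository. It uses the plumbing command git update-index purely for illustration (normally git submodule add does this for you), and it reuses the object name from the listing above even though no such commit exists in the scratch repository, which also shows that the super-project doesn’t need to know that object. The demo identity is invented.

```shell
repo=$(mktemp -d)
cd "$repo" && git init -q

# Hypothetical identity, so that the commit works on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Record an entry with mode 160000 for the path "commonlib" directly in the index.
git update-index --add --cacheinfo 160000 fd91ab69279f1e0cfed53353e64811d5aa9c4b5f commonlib
git commit -q -m "Add a hand-made submodule entry"

# The committed tree should now list commonlib with type "commit".
entry=$(git ls-tree HEAD commonlib)
echo "$entry"
```

The line printed should begin with the mode 160000 and the type “commit”, just like the commonlib line in the whatdotheyknow listing.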
The other information about the submodule which is stored in the super-project is stored in the .gitmodules file and in config options.
A submodule which is “initialized” will have a config option set to indicate the URL that the submodule should be cloned from if it is missing. These config options are of the form submodule.<SUBMODULE-NAME>.url, so having initialized the commonlib submodule in whatdotheyknow, I can see the following:
If you’re publishing a repository with the intention that anyone should be able to clone and use it, you should make sure that the URLs specified in .gitmodules are ones that can be publicly accessed – so don’t, for example, use an SSH URL with your user name in it. Since these URLs are only used when initializing a submodule, which you typically do only rarely, it’s not a great inconvenience that you may have to change them in order to push changes you’ve made in the submodule.
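As a rough illustration of where the URL lives, this sketch writes a .gitmodules file in a scratch directory with git config -f and reads the URL back. The path and URL are the commonlib ones used elsewhere in this post; the scratch directory is just for the example.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# .gitmodules is an ordinary git config file, so "git config -f" can write it.
git config -f .gitmodules submodule.commonlib.path commonlib
git config -f .gitmodules submodule.commonlib.url git://git.mysociety.org/commonlib
cat .gitmodules

# Read the URL back. After "git submodule init" the same key also appears in
# the super-project's own config (git config --get submodule.commonlib.url).
gitmodules_url=$(git config -f .gitmodules --get submodule.commonlib.url)
echo "$gitmodules_url"
```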
Things You Might Need To Do
This section lists some simple recipes for doing all kinds of things with submodules. If you think there’s something I should add, please let me know. For the sake of simplicity, in the examples below, I’m not listing submodule paths explicitly at the end of git submodule commands, which generally means that the action applies to all of the submodules. (The exception is git submodule add, which of course only applies to a single submodule.)
Get a working submodule version after cloning
If you’ve just cloned a repository which contains submodules, you can initialize and clone all of them with:
git submodule update --init
This is the equivalent of running:
git submodule init
git submodule update
With version 1.6.5 of git and later, you can do this automatically by cloning the super-project with the --recursive option.
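Here’s a sketch of a recursive clone that you can try without a network connection, using two throwaway local repositories as stand-ins for commonlib and the super-project. The repository contents and the demo identity are invented. The -c protocol.file.allow=always setting is only there because recent versions of git refuse to clone submodules from local paths by default; you shouldn’t need it with a real URL.

```shell
tmp=$(mktemp -d)

# Hypothetical identity, so that the commits work on an unconfigured machine.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# A stand-in for commonlib with a single commit.
git init -q "$tmp/commonlib"
(cd "$tmp/commonlib" &&
  echo "shared code" > README &&
  git add README &&
  git commit -q -m "Initial commonlib")

# A stand-in super-project that includes it as a submodule.
git init -q "$tmp/super"
(cd "$tmp/super" &&
  git -c protocol.file.allow=always submodule --quiet add "$tmp/commonlib" commonlib &&
  git commit -q -m "Add commonlib as a submodule")

# Clone the super-project and initialize and clone its submodules in one step.
git -c protocol.file.allow=always clone -q --recursive "$tmp/super" "$tmp/copy"
ls "$tmp/copy/commonlib"
```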
Running git submodule without arguments defaults to running git submodule status, which produces a helpful summary of the status of all your submodules. Each line begins with a space, a ‘+’ or a ‘-’ which indicate the following things:
+
The version checked out in the submodule is different from that specified in the super-project. The object name shown is that of the commit that the submodule is currently at. (The meaning of this symbol changed in 1.7.0.)
-
The submodule hasn’t been initialized or there’s no repository at the submodule path (e.g. if you’ve run git submodule init but not git submodule update, or you’ve later deleted the submodule directory from the working tree). The object name shown is the commit that’s specified in the super-project.
[space]
The submodule’s HEAD is at the correct version – the object name shown is that version.
In projects with many submodules this can be a helpful way to see at a glance where all your submodules are at. For example, here’s some output from a version of the Fiji project that I’m working on:
Update submodules to the versions specified in HEAD
If you change the HEAD of your super-project (e.g. with git pull, or by checking out a new branch) you may find that your submodules are now at the wrong versions. (You can check with git submodule status as shown above.) If you’re not actively working on the submodules, then the simplest way to move them to the right versions is with:
git submodule update
If any initialized submodules are missing, this will clone them. For other submodules where the repository exists, this will change into its subdirectory, run git fetch (to make sure all the most recent updates are present) and then git checkout the correct version. This has the effect of “detaching HEAD” in each submodule, so if you want to work on a branch in any of those subdirectories, you’ll have to git checkout to a branch.
The most frequent errors that you’ll find when running git submodule update are likely to be due to someone having created a commit in the super-project that references a commit in the submodule that they’ve forgotten to push, so check that whenever you get errors about not being able to find particular versions.
Versions of git after 1.6.4 add the --merge and --rebase options to git submodule update to allow more flexible ways of updating your submodules while you’re working on them.
Add a new submodule to a repository
This is nice and easy to do from a URL. For example, if we wanted to create a new mySociety project called “create robot MP”, and add commonlib to it, we would just use git submodule add:
$ mkdir createrobotmp
$ cd createrobotmp
$ git init
Initialized empty Git repository in /home/mark/tmp/createrobotmp/.git/
$ git submodule add git://git.mysociety.org/commonlib commonlib
Initialized empty Git repository in /home/mark/tmp/createrobotmp/commonlib/.git/
remote: Counting objects: 5240, done.
remote: Compressing objects: 100% (1974/1974), done.
remote: Total 5240 (delta 3311), reused 5038 (delta 3197)
Receiving objects: 100% (5240/5240), 1020.36 KiB | 377 KiB/s, done.
Resolving deltas: 100% (3311/3311), done.
$ git status
# On branch master
#
# Initial commit
#
# Changes to be committed:
# (use "git rm --cached <file>..." to unstage)
#
# new file: .gitmodules
# new file: commonlib
#
Then you need to stage and commit .gitmodules and commonlib as with any other new files. Since this puts the URL in the .gitmodules file, you should make this a publicly clonable URL, as mentioned above.
Change the remote for a submodule
If you frequently work in a submodule you might want to change the default remote “origin” to refer to a URL that you can push to, just so you can use one remote for everything. You can do this by deleting origin and adding it back with a new URL, with e.g.:
These config options set up the helpful defaults for git pull when you’re on master.
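Since it’s easy to get these steps slightly wrong, here’s a sketch of the whole sequence, run in a scratch repository standing in for the submodule. The two URLs are the commonlib ones that appear elsewhere in this post, and branch.master.remote and branch.master.merge are the standard config keys that git pull consults.

```shell
sub=$(mktemp -d)
cd "$sub" && git init -q
git remote add origin git://git.mysociety.org/commonlib

# Swap the read-only origin for one that can be pushed to.
git remote rm origin
git remote add origin ssh://mark@git.mysociety.org/data/git/public/commonlib.git

# Set the defaults that "git pull" uses when you're on master.
git config branch.master.remote origin
git config branch.master.merge refs/heads/master

new_origin=$(git config --get remote.origin.url)
echo "$new_origin"
```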
If you’re in the habit of deleting whole submodules, and then recreating them with git submodule update then you should also make sure that you change the URL in the super-project’s config settings, e.g.:
If you know in advance that you want to clone your submodules from a URL different from that specified in .gitmodules (e.g. with a private SSH URL that you can push to) then after cloning the superproject you can set the appropriate config by hand before running git submodule update. This takes the place of the git submodule init command, for example:
$ git clone ssh://mark@git.mysociety.org/data/git/public/whatdotheyknow.git
[...]
$ cd whatdotheyknow
$ git config submodule.commonlib.url ssh://mark@git.mysociety.org/data/git/public/commonlib.git
$ git submodule update
Initialized empty Git repository in /home/mark/tmp/whatdotheyknow/commonlib/.git/
remote: Counting objects: 5240, done.
remote: Compressing objects: 100% (1974/1974), done.
remote: Total 5240 (delta 3311), reused 5038 (delta 3197)
Receiving objects: 100% (5240/5240), 1020.36 KiB | 533 KiB/s, done.
Resolving deltas: 100% (3311/3311), done.
Submodule path 'commonlib': checked out 'a901c2a431f7869f5c2eaee5808f8590ca78544e'
$ cd commonlib/
$ git remote show origin
* remote origin
URL: ssh://mark@git.mysociety.org/data/git/public/commonlib.git
HEAD branch: master
Remote branch:
master tracked
Local branch configured for 'git pull':
master merges with remote master
Local ref configured for 'git push':
master pushes to master (up to date)
Modified submodules in 1.7.0 and later
Versions 1.7.0 and later of git contain an annoying change in the behaviour of git submodule. Submodules are now regarded as dirty if they have any modified files or untracked files, whereas previously it would only be the case if HEAD in the submodule pointed to the wrong commit. Why is this annoying? The following reasons:
Firstly, the meaning of the plus sign (+) in the output of git submodule has changed, and the first time that you come across this it takes a little while to figure out what’s going wrong, for example by looking through changelogs or using git bisect on git.git to find the change. It would have been much kinder to users to introduce a different symbol for “at the specified version, but dirty”.
Secondly, git status is now very slow in projects with several large submodules. (git status used to be nearly instant in a clone of fiji.git but trying just now with 1.7.0.4 took an incredible 45 seconds.)
This seems like a change that was introduced without considering the surprise and impact that it would have on users. In any case, I’ve added this note here since if you work with submodules, you may need to be aware of this change in behaviour.
The output of git diff has changed as well, to add “-dirty” to the object name if the working tree of that submodule is dirty:
This is really a post about adapting Helvetireader for netbook-sized screens, but I can’t resist adding a short rant about online RSS aggregators first…
Why Google Reader?
Once upon a time, on the recommendation of Need To Know, I started using Bloglines to keep track of blogs and anything else that published an RSS feed. It worked pretty well, but had a number of usability problems that were fixed by the excellent Bloglines Beta. Well, it was excellent apart from one thing: at some point (one or two years ago?) feeds stopped being updated in a timely fashion. I reported this as a bug but, in contrast to all my previous bug reports to them, I never even received a confirmation email back. Problems of this type have been widely reported but as far as I know this is still unfixed. Unfortunately, while I loved the interface of Bloglines Beta this lack of updates meant that it had effectively become useless, so I’ve switched to Google Reader – the other reasonable alternative when I was considering a switch seemed to be “NewsGator Online”, but that failed to import my rather large list of feeds and the service has now shut down anyway.
It’s very sad that Bloglines’s previously excellent service seems to be now so unloved by its current owners and maintainers.
Google Reader’s Shortcomings
(Update: originally this paragraph had a short complaint about how Google Reader couldn’t be configured in a way that I was happy with, but then Jenny pointed out that in fact it could :) In short, the following settings are the ones that I like:
Select “New Items”, which is a toggle for the view, applying to everything until you change it back to “All Items”.
(Tedious) Go to every feed and folder in the subscription list on the left and select from the drop-down menus “Sort by oldest”.
Annoyingly the “Sort by oldest” option then only shows the unread items in the last 30 days, ordered chronologically – but this is still much better than the default configuration, in my opinion.)
There are also some superficial annoyances that fortunately can be easily fixed:
The appearance of the page is noisy and inelegant.
The layout works badly on small screens, such as my netbook.
At some point I was told about Jon Hicks’s “Helvetireader” redesign of the CSS for Google Reader. This is a vast improvement in the appearance of the application, I think, and it hides much of the noise in the interface that I don’t care about – for example, here are some “before and after” screenshots:
And with Helvetireader:
However, this still uses up a lot of screen real estate at the top of the page for elements that I don’t care about, such as the search box, login details, social features, etc. Also, on my netbook the pane on the left is so large that you can’t get a single Dinosaur Comics visible in the right :)
So I added a few more instances of display: none to the Helvetireader CSS and adjusted the sizes of the main panes. The only non-obvious bit about this was that there’s some bit of Google’s JavaScript that sets the display property of the search box back to block – fortunately, if you use Greasemonkey to change the ID of this element to something else, then it can’t be found by ID and changed back. This is a bit horrible, but doesn’t seem to obviously break anything else in the interface…
If you want to try this out, then the Greasemonkey script is here, and that points to the CSS here. Those are tracked in a github gist in case you want to see all the changes back to the original Helvetireader. This works well in Chromium (so presumably Google Chrome too) and is OK in Firefox, but for an odd bug where the window doesn’t redraw correctly after changing the CSS – you can resize the window to force a redraw, though.
Update: the screenshot there shows a slightly older version of my customization – since then I’ve added back some options for feed settings, etc.
I rather like this approach to changing the appearance of websites to match your needs. You can find many inventive examples of this idea at userstyles.org.
I used to host my photos with a simple set of CGI scripts that basically worked well enough for my simple requirements. Such web applications are easy and fun to write, but in the end I decided that it wasn’t worth it because:
Hosting large amounts of data on a generic shell account is typically quite expensive. Flickr’s “pro” account subscription is a very good deal in comparison: as long as each photo is beneath 20 megabytes in size, you can upload as many as you like for $24.95 a year.
The community aspect of sites like Flickr is very encouraging – it’s lovely to have random people say nice things about your photographs, and occasionally have people use them in articles, etc.
(Some people are put off from using Flickr by the appearance of the site, but its API means that there are plenty of alternative front-ends for viewing or presenting your photos, such as flickriver.)
The slight problem with switching to hosting on Flickr was that previously I’d indexed all my photos by the MD5sum of the original image, so several of my pages had links or inline images that pointed to an MD5sum-based URL on the old site. It occurred to me that it might be useful in general to have “machine tags” on each photo with a hash or checksum of the image, so that, for example:
You can simply check which photos have already been uploaded.
You can find URLs for all the different image sizes, etc. based on the content of the file.
Unfortunately, I hadn’t done this when uploading the files in the first place, so had to write a script (flickr-checksum-tags.py) which takes the slightly extraordinary step of downloading the original version of every photo that doesn’t have the checksum tags to a temporary file, hashing each file, adding the tags and deleting the temporary file. This adds tags for the MD5sum and the SHA1sum, using a namespace and keys suggested in this discussion, where someone suggests taking the same approach. These tags are of the form:
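The hashing step itself is simple. As a sketch (this is not the code from flickr-checksum-tags.py, and the checksum:md5 and checksum:sha1 tag names below are placeholders that I’m assuming purely for illustration), the two values could be computed for a downloaded file with the standard md5sum and sha1sum tools:

```shell
# Stand-in for the temporary file holding a downloaded original photo.
photo=$(mktemp)
printf 'not really a JPEG\n' > "$photo"

# Both tools print "<hash>  <filename>", so keep only the first field.
md5=$(md5sum "$photo" | cut -d' ' -f1)
sha1=$(sha1sum "$photo" | cut -d' ' -f1)

# Placeholder machine tags in namespace:key=value form (names are assumed).
md5_tag="checksum:md5=$md5"
sha1_tag="checksum:sha1=$sha1"
echo "$md5_tag"
echo "$sha1_tag"

# Delete the temporary file once the tags have been worked out.
rm -f "$photo"
```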
I contribute to a collaborative blog called fifteensquared (under the name mhl) where each day people explain the clues for the Guardian, Observer, Independent, and Financial Times crosswords, as well as a number of more difficult and specialized crosswords. This is a great way to improve at cryptic crosswords – each day you have your best go at the puzzle and then can find out from the blog what you were missing and why. One interesting aspect of this is that the posts on certain crosswords generate far more comments than others. The Guardian week-day crosswords have substantially more comments than any other category, so all of these examples are from these posts. I wouldn’t take this measure of “controversy” all that seriously, since I’m not compensating for the overall variation of the number of comments: there used to be very few comments on any day, and there have been various periods where off-topic chatter has been more strongly discouraged. Nevertheless, I thought it might be interesting to do a post on what it was that made these crosswords particularly “controversial”. I’ve started with those that got 53 or more responses, which is an arbitrary threshold designed to include the memorable Auster crossword that had the answer “HUMP THE BLUEY” :)
Of course, the crosswords that people like most tend not to get nearly so many comments as those where there’s ambiguity in interpretation, or disagreements over the fairness – it would be nice if the site had a simple mechanism for rating crosswords so that it would be easy to pick out the really great ones.
It has been fun to read through these posts again, and be reminded that while it can be tiresome to read lots of complaints about a particular crossword, there are plenty of commenters on fifteensquared who consistently add lots of interest and fun to doing the puzzles that I would otherwise miss.
I still think that the reaction to this crossword was over the top, largely because of two clues:
“What Aussie swagmen do to obey Jesus’ instruction? (John 5:8) (4,3,5)” => HUMP THE BLUEY. To quote from my post, this is a ‘[d]ouble definition, the first of which is rather difficult: Chambers defines “hump the bluey” as “(Aust) to travel on foot, carrying a bundle of possessions”, and Jesus’s instruction in John 5:8 is “Rise, take up thy bed, and walk” (King James Version)’
“Place where night finally gives way to day in a line between the poles (7)” => EQUADOR. It seems that this was an error, since it was later changed in the online version to “Place where Queen is replaced by Charlie and night finally gives way to day in a line between the poles (7)” => ECUADOR.
Otherwise the puzzle was rather easy, with a very high number of anagrams. The more I think about the former clue, the more it makes me smile, so even though I couldn’t solve it unaided I’m glad it was in the crossword.
This puzzle, which the majority seemed to enjoy, had a theme of two bicentenaries: the births of Charles Darwin and Abraham Lincoln. It’s unusual in this list because there were very few criticisms of the puzzle, but still many comments. (Quite a lot of the comments were rather off-topic, but not in a way that I thought was inappropriate.)
This crossword had a couple of mini-themes relating to euphemisms and a scattering of biblical clues. The most discussed clues in this one were:
“7 for what Saul did in Engedi (8,3,4)”, where the answer to 7 is EUPHEMISM => COVERING ONES FEET. A very difficult clue, where the answer is an obscure euphemism for defecating taken from the story of Saul relieving himself in 1 Samuel 24 verse 3 – the Hebrew is literally “covered his feet”.
“Gent’s son, perhaps is (7)” => BELGIAN. A very tough cryptic definition, where you need to know that “Gent” is the native spelling of Ghent, and the “[blah]’s son” or “son of [blah]” expression to mean “someone from [blah]”.
The clue for 15 across was missing in some versions.
“Aramaic skull by barbarian in gaolbreak” => GOLGOTHA. There’s some discussion of which languages Golgotha means “place of the skull” in, and “gaolbreak” for (GAOL)* is, as you would expect, regarded by some as unfair.
There were a number of answers with difficult vocabulary, in particular DIABASE, LARBOARD and ANIMADVERT.
I defended a few of these in the comments at the time, but in retrospect I think this was too tough for a daily puzzle – it probably justified the number of comments.
This was considered easy by most, but with many clues that people felt were either unsound or unsatisfactory in some way, e.g.:
“Man from Naples is rescued from a riot (5)” => MARIO. (Unless “from” is doing double duty, there’s no indication that this is a hidden answer – even if it’s just “‘s” or “of” there needs to be something extra there…)
Other complaints were of clues that weren’t cryptic enough or double definitions where both parts were very similar.
The large number of comments here were mostly genuine discussions of how to parse the clues, e.g. in what sense “of the French” can be DE, or whether ALBAT sounds the same as “Albert”. This was just a difficult puzzle for a weekday, I think, with some typically Araucarian touches in the cluing.
A quite do-able crossword, and most of the discussion is taken up with the question of whether particular abbreviations are reasonable, in particular:
whether “relation” to give PI should be allowed
the (rarely seen) T for “Troy”, referring to an abbreviation for the unit of weight sometimes used for precious metals
The former question comes up quite frequently, and unfortunately tends to provoke tedious discussion. In the online archive it’s only been used by Chifonie, as far as I can tell.
This puzzle was very well received by the regular contributors for its humour and a couple of entertaining liberties. The controversy here was generated by a comment from don which rewrote negative comments from the previous day’s blog post to apply to this puzzle, to make the point, as I understand it, that reactions to puzzles are strongly biased by the name of the setter.
The comments consist largely of off-topic banter (e.g. about Paul’s clue competition), except for these issues:
“Archbishop hit little girl (7)” => LAMBETH: it turns out that LAMBETH being synonymous with “the Archbishop of Canterbury” is supported by Collins and the OED.
The clue “Ring student beset by siren (5)” => CIRCE provoked comments that Circe was not a Siren, but it was later argued that she was still a siren in the less specific sense :)
A puzzle with a mini-theme of literature, which generated a bit of generic outrage. I think that two of the clues which upset people were genuinely sub-standard, though:
“Poem cut and edited to be on standby (4)” => IDLY, which is IDYL[l] then “edited” to rearrange L and Y. As well as that rather indirect construction, I don’t think one can substitute IDLY for “standby”, “on standby” or “to be on standby” in a sentence.
“Cheat to ask for oil over the water, say (7)” => BEGUILE. The most convincing two options were BEG = “ask” + UILE = sounds like the French for oil (“huile”) or an Irish pronunciation of “oil”. In either case “over the water … say” would indicate a non-mainland homophone.
The controversy here mostly arose from some of the answers and constructions being very hard, e.g. it contained the words COLOSTRUM, RELIEVO, MONOPHTHONG, EREMITE and ELEMI and several more that are less than obvious. On the other hand, many of the clues had Enigmatist’s characteristic humorous touches – I remember laughing at several of them. The constructions that provoked the most discussion were:
“River spot in which I’ll get lost, say. No circumnavigating Backs (7)” => YANGTZE. C. G. Rishikesh explains this as EG = “say” + NAY = “no” around (“circumnavigating”) Z[i]T = “spot in which I’ll get lost”, with “Backs” indicating reversal of everything.
“Man by joiner in a whirl? The reverse (9)” => ALEXANDER, which IanN14 explains as A + REEL = “whirl” reversed around X = “by” + AND = “joiner”.
Unfortunately the end of the discussion degenerated into some bad-tempered back-and-forth, tangentially related to the frequently seen “cattle” meaning of “neat”.
Everyone seemed to find this pretty easy, and it was uncontroversial apart from the odd instance of an adjectival phrase defining a noun and the use of a number of words in the clue directly in the answer. The mostly off-topic comments include some discussion of whether accents should matter in crosswords, how strict cluing should be, and regrettably some trolling.
A very tricky crossword themed around a quotation from Thomas Babington Macaulay’s tribute to John Milton, on the occasion of the 400th anniversary of Milton’s birth. Of the many clues that caused problems, there were:
“Parrot on top of one of the Roses? (4)” => LORY. “Lancaster OR York” were the Roses in the Wars of the Roses.
“Ross’s leader following his follower, mostly one that suffers (6)” => MARTYR. Ross’s follower is Cromarty (as in Ross and Cromarty), so “mostly” might give you [cro]MARTY followed by Ross’s leader (R).
“Forties cry that’s on the up when winning first gold (6)” => EXCELS. I think the definition, somewhat bizarrely, is a homophone for XLs (“Forties” in Roman numerals). Geoff Moss explained that the subsidiary is that EXCELSIOR = “on the up” can be obtained by adding I = “first” + OR = “gold”.
‘Shocking omission of “the plural of mou_e (7)”‘ => SEISMIC. A nice clue, I thought – if you insert SEISMIC into “the plural of mou_e” you get “the plural of mouSE IS MICe”, the definition being “shocking”.
“Bird on pole, no friend to our friend (5)” => CROWN. CROW = “bird” + N = “pole”. The definition refers to Milton’s opposition to the monarchy.
Some difficult answers as well (MONDAY CLUB, INANITION for me) made this all round a serious puzzle, and justified the large number of comments on it. I remember really liking Geoff Moss’s comment about how to consider a crossword after finishing it.
This puzzle was themed after capital cities, all of which were missing the definition part – 13 were hidden in the grid, all but one being 6 letters long. (The rubric gave quite a big hint, in fact: “Thirteen solutions are of a set, one of which is here translated into its own language. None of these is further defined.”) The most difficult clue for people seemed to be:
“1 in 2 (4)” => WIEN. 2 was LONDON, London is sometimes known as “The Great WEN” and WIEN is the German for Vienna. (This was the solution “translated into its own language” referred to in the clue.) Undoubtedly a very difficult clue.
The large number of comments on this crossword were partly due to it starting the (rather frequent) debate on whether Araucaria deserves the high standing in which he is held. My feeling is that fifteensquared would be better off without these debates, since people’s preferences are so personal, but that would be rather hard to enforce. I think Eileen’s comment on this one sums up how I feel about difficult but fair crosswords. However, I think it’s clear from the irritation that people express when obscure words or constructions come up in the daily crosswords that lots of people do take it much more personally.
And the winner of the grand prize is Rover! The problematic clues in this crossword were:
“It’s pretty to behold what Platonic friends discuss at leisure (4-2-8)” => LOVE-IN-IDLENESS. Love-in-Idleness is a flower, so “It’s pretty to behold” is the somewhat weak definition – LOVE is what friends discuss in Plato’s Symposium, and IN-IDLENESS is “at leisure”.
‘Translator of “The German Eating Fish” (7)’ => DECODER. People generally assumed this was a mistake (COD = “Fish” in DER = “The German”) but perhaps it was meant to be CODE = “Fish” (a cipher used in WWII) in DER = “The German”. However, that latter interpretation would need the clue to have “Fish, Say” or “Fish, Perhaps”.
“Almost general tutorials (7)” => CLASSES. This should be read as “genera” (“Almost general”). In retrospect, I don’t think this should have caused so many problems.
Again, it’s worth noting that despite provoking lots of discussion and criticism, there aren’t that many really problematic clues, as Sil van den Hoek points out. If I were a setter for a national newspaper, I’m not sure I would cope well with reading these discussions, given how tough it is to write a single good and original clue.
Eventually, I reached a point with my long-suffering and much-repaired iPod where it didn’t seem to be worth continuing to pay to get it fixed up again, especially since I could reasonably switch to a device with solid-state storage instead of a hard disk. Since I’m trying to avoid using Apple products because of both ideological and pragmatic concerns, I went for the strategy of trying to buy the cheapest option in Dixons (sorry, “Currys.digital”) when I was running for a train. (This approach minimised the amount of time I could spend on the decision and so created extra happiness in itself.) The device I came away with was an ex-display model of the Sandisk Sansa Clip, with an extra discount because it was actually missing the clip bit. That came to £30 for an 8 gigabyte device, whereas an iPod nano with the same storage would have been about £140. (Of course, it doesn’t have the gorgeous screen of the nano and can’t play video – on the other hand, 8GB would get filled up fast if I put any video on it.) The biggest advantage of this cute little device over the iPod, of course, is that it’s much less likely that the manufacturer will deliberately prevent the device from working with my computer, as Apple have repeatedly done to GNU/Linux users who’ve bought their devices.
Anyway, I’m basically very happy with this replacement. There are a few small user interface problems, e.g. I miss the click-wheel for seeking within a track – fast-forward and rewind on the Clip are a bit sluggish initially. Additionally it takes a couple of seconds to wake up after it’s been paused for a while, which is surprisingly irritating. On the other hand:
It plays Ogg Vorbis files!
It plays FLAC files!
The little screen is bright and clear
It’s really small and light
It has a “sleep” function
You can use it to take voice notes (although annoyingly you don’t seem to be able to set the time and date, so the timestamps are useless)
The sound is good
The UI is generally very responsive, so the small screen isn’t too bad for searching for music
It remembers the point at which you stopped listening to an audiobook, and offers to resume from that point when you return to it later
Sadly, gapless playback doesn’t work, which is basically par-for-the-course for MP3 files (even when they have correct delay and padding in the metadata), but rather surprising in the case of FLAC and Ogg Vorbis files.
The only other problem was getting it to present podcasts in the way that I like, which is probably due to my odd preferences rather than the device itself – the rest of this post is about that.
Podcasts
This may be unusual, but I find the most useful way to go through podcasts on any audio player is to have all of the most recently downloaded episodes from any source in one playlist, ordered from oldest to newest. With gtkpod and my old iPod, you could easily set up a “smart playlist” to do this, but I had to write a short script to create this on the Clip. It’s simple enough, apart from one point:
I found that very few episodes were actually appearing in the playlist, and it turns out that this was because the Clip cares deeply about the TCON ID3v2 tag: if this contains “Podcast”, the file won’t appear in any playlist under the “Music” menu – it’ll only appear in the “Podcasts” menu. So, there’s an option (which I always use) to wipe out that tag if it contains “podcast”.
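To illustrate just the ordering part of that idea, here is a minimal Python sketch; it is not the script referred to in this post. It assumes that file modification time is a good enough stand-in for “most recently downloaded”, that episodes are MP3 files, and that a plain M3U file is an acceptable playlist format; the function name `build_playlist` and the `limit` parameter are made up for the example. Clearing the TCON tag is not shown, since that needs an ID3 tagging library.

```python
from pathlib import Path


def build_playlist(podcast_dir, playlist_path, limit=50):
    """Write an M3U playlist of the `limit` newest MP3 files, oldest first.

    Modification time is used as a proxy for download time (an assumption).
    Returns the list of paths that were written to the playlist.
    """
    # Sort every episode from oldest to newest by modification time.
    episodes = sorted(
        Path(podcast_dir).rglob("*.mp3"),
        key=lambda p: p.stat().st_mtime,
    )
    # Keep only the most recent `limit` episodes, still oldest first.
    recent = episodes[-limit:]
    lines = ["#EXTM3U"] + [str(p) for p in recent]
    Path(playlist_path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return recent
```

Whether the paths in the playlist need to be relative to the device’s root will depend on the player, so treat the path handling here as a placeholder.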
(Incidentally, I heartily recommend hpodder as a podcatcher – it’s the only one I’ve found that does what I want in its default configuration.)
In case it’s of any use to you, the script is included below – it’s hosted as a gist on github, so you can clone it (or download the raw file) from the links below the file.
LyX is a lovely bit of software for preparing beautiful documents – you get the high quality output of LaTeX and the advantages of logical document description in a usable interface and without having to remember TeX syntax. There are a few aspects of using LyX that puzzled me while writing a certain large document, however – many of these are dealt with in the LyX FAQ, but I thought it would be worth collecting those that were most useful to me here.
Use pdflatex for Output
There are various different options for generating PDF output in LyX, but it will save you trouble if you do everything using pdflatex in the first place. (I think this is the upshot of the slightly unclear advice in the FAQ on the subject.) This turned out to be particularly important because when your document is 50000 words long and has over 100 figures, the other methods take over 10 minutes to generate a PDF; pdflatex would finish in a couple of seconds. If you take this advice then you have to change the Document -> Settings -> Document Class option to pdfTeX, or you get some surprising errors. Also, I would strongly recommend that you only use PNG files for bitmap images and PDF for vector graphics. (PNG is obviously sensible, but in the case of vector graphics I found PS and EPS files unexpectedly awkward in terms of getting the orientation and clipping right.)
Incorrect Colours in Bitmap Graphics
I came across a bizarre problem where the colours would be slightly wrong for certain PNG files that I included in the document. (I suppose I should say “colors” too, just for the sake of searchers using American English.) This turned out to be a problem with full-colour PNG images with transparency (i.e. RGBA images), which my notes say is discussed further in these posts. Setting the PDF version as suggested in the first of those posts didn’t help me at all, so I had to convert all my RGBA PNG files to RGB. If you want to check for these files you can use file(1), something like:
… and I fixed them by feeding the filenames (one per line) to a script like:
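#!/bin/sh
# Convert each PNG named on the command line to 24-bit RGB, in place.
set -e
for f in "$@"
do
    t=$(mktemp)
    # Write to a temporary file first, so a failed conversion leaves the original intact.
    convert "$f" png24:"$t" && mv "$t" "$f"
done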
Obviously you need imagemagick installed for the “convert” command.
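If you would rather do the check for RGBA files without file(1), the following is a minimal Python sketch that reads the colour type straight from the PNG header. It relies on the PNG layout of an 8-byte signature followed by the IHDR chunk, in which the colour type is the byte at offset 25 and the value 6 means RGB with alpha. The function names are made up for this example, and it only inspects the header, so it says nothing about whether the rest of the file is valid.

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOUR_TYPE_RGBA = 6  # truecolour with alpha, per the PNG specification


def png_colour_type(header):
    """Return the colour type from the first 26 bytes of a PNG file.

    Returns None if the bytes do not look like the start of a PNG.
    """
    if len(header) < 26:
        return None
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        return None
    # IHDR data starts at offset 16: width (4), height (4), bit depth (1),
    # then the colour type at offset 25.
    return header[25]


def is_rgba_png(path):
    """True if the file at `path` is a PNG whose colour type is RGBA."""
    with open(path, "rb") as f:
        return png_colour_type(f.read(26)) == COLOUR_TYPE_RGBA
```

Calling `is_rgba_png` on each figure and printing the names for which it returns true gives a list of files that can then be passed to the conversion script.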
Footnotes
By default there is no extra vertical space to separate footnotes, but I much prefer there to be a small gap. To do that, add the following line to the document preamble:
\setlength{\footnotesep}{12pt}
Captions
By default, the formatting of caption text in floated figures looks very similar to the main body text. Somewhere on the web I found the recommendation to use the “caption” package to change this, e.g. by adding the following to the preamble:
Making tables fit on the page is annoying – just changing the text size often doesn’t reduce the overall size or causes a horrible font to be used. Resizing the whole table is the best way I found. Before the table (either in a float or in the normal flow of the document) I added the following in ERT:
\resizebox{\textwidth}{!}{%
and then immediately after the table added, again in ERT:
}
This scales the table such that the width of the table fits the page width.
Suppressing Page Numbers For Full Page Figures
If you want to use a whole page for a floated figure, the page number can overlap with the figure or just look odd. However, the second tip here: http://wiki.lyx.org/FAQ/UnfloatingFigureOnEmptyPage works well for removing page numbers from full-page figures. To summarize, add the following to the preamble:
\usepackage{floatpag}
\floatpagestyle{plain} % Default page style for pages with only floats
Then, in ERT before the figure (but still in the float) add:
\thisfloatpagestyle{empty}
\vspace{-\headsep}
… and similarly, after the figure but above the caption add:
\vspace{0.3cm}
… or you may find the caption too close to the graphics.
Better On-Screen Fonts in PDFs
As explained in the second question in this mini-FAQ on generating PDFs from LyX you should use the outline font version of Computer Modern instead of the bitmapped versions. For me, this boiled down to going to Document -> Settings -> Fonts and setting the Roman font option to “AE (Almost European)”.
You can further improve the rendering of text in your output by using microtype. Just add
\usepackage{microtype}
… to the preamble. (These suggestions only apply if you’re using the pdflatex workflow as suggested above.)
Adding Footnotes in Tables
You might notice that adding footnotes in table cells doesn’t work. One answer to this is to add them manually in ERT with \footnotemark and \footnotetext, possibly adjusting the counter as described in that FAQ entry.
I submitted my PhD thesis over a month ago now (on the 11th of September) and I’ve still not recovered properly from the experience. Perhaps that’s to be expected after 5 years of it. At some point I’ll have to try to write something coherent about what it has been like, but all I can really say at the moment is that I still stand by my advice that embarking on PhD research is a bad idea for almost everyone. Anyway, as a way of trying to put it all into perspective I wrote a few scripts to visualize my thesis and the process of writing it, so I’ve collected a few of these here.
The first of these is pretty simple to do, since I just collected some word frequency data and fed that into Wordle:
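The word counting step can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it treats a “word” as a run of lowercase letters and apostrophes, does no stop-word filtering, and the function name `word_frequencies` is made up for the example.

```python
import re
from collections import Counter


def word_frequencies(text, top=100):
    """Return the `top` most common words in `text` as (word, count) pairs."""
    # Lowercase everything so that "The" and "the" are counted together.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(top)
```

The resulting pairs can then be written out in whatever form the visualisation tool expects.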
This next graph shows how the number of lines in my thesis document slowly increased over time. The flat period for a year at the beginning really represents starting small bits of chapters and then realizing that much of the work and analysis would have to be redone:
(In case you’re wondering, the thesis was about 50000 words in the end, which corresponds to about 40000 lines of the LyX document, since the LyX format is very verbose – it does roughly correspond to how the thesis as a whole progressed, though.)
Throughout writing the thesis I wondered what the graph of citations would look like, but didn’t have time to do anything about it until after submitting. I was hoping I could use Google Scholar (or some similar online archive) to discover the “A cites B” relationship, but there isn’t an API for it at the moment, and I didn’t think webscraping these data would be worth it. However, I kept all the papers I could find in PDF format in my thesis git repository, consistently named as papers/[BIBTEX-KEY].pdf, so it was simple to write a short Python script which searched for each paper’s title in the text of every other paper. This means that it will miss quite a lot of relationships, since pdftotext doesn’t work satisfactorily on many of the papers, some have OCR errors, etc. etc. but I’m pleased that it seems to have extracted so many of them:
The colours indicate how recently the paper was published, from purple (1967) to red (2009). The script outputs the relationships in graphviz’s dot format, and that image was rendered with “neato”. I excluded any apparently unconnected papers. In case you’re interested in the rather shoddy script, I’ve put it online.
Finally, I thought it might be nice to include a section of one of the images from my thesis to add a flavour of what I’ve been doing – this shows the primary paths of some neurons which were traced with my “Simple Neurite Tracer” tool and registered with CMTK:
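The title-matching idea can be sketched roughly as follows. This is not the script mentioned above, just a minimal illustration that assumes the text of each PDF has already been extracted (e.g. with pdftotext) into a dictionary keyed by BibTeX key. The normalisation step and the function names `find_citations` and `to_dot` are my own choices for the example.

```python
import re


def normalise(s):
    """Lowercase and collapse punctuation and whitespace to single spaces."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", s.lower()).split())


def find_citations(titles, texts):
    """Return (citing_key, cited_key) pairs.

    titles: BibTeX key -> paper title
    texts:  BibTeX key -> full text extracted from that paper's PDF
    """
    norm_titles = {key: normalise(title) for key, title in titles.items()}
    edges = []
    for citing, text in texts.items():
        body = normalise(text)
        for cited, title in norm_titles.items():
            # A paper always contains its own title, so skip self-matches.
            if cited != citing and title and title in body:
                edges.append((citing, cited))
    return edges


def to_dot(edges):
    """Render the edges as a directed graph in graphviz's dot format."""
    lines = ["digraph citations {"]
    lines += ['    "%s" -> "%s";' % edge for edge in sorted(edges)]
    lines.append("}")
    return "\n".join(lines)
```

A plain substring match like this will miss any citation where the extracted text is damaged, and short titles may produce false positives.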
I recently made a change to the Scottish Parliament parser in ukparse so that it would preserve as accurately as possible where the timestamps occur within the text of the Official Report. (These are now in <placeholder> tags throughout the XML.) One of the things this lets us do is get a rough estimate of how fast each MSP talks. This isn’t meant to be taken particularly seriously, for the various reasons given below, but it’s perhaps a nice example of how you can use the structured data version of Scottish Parliament that I created for They Work For You Scotland for simple analyses of what’s being said in parliament.
Top 25 Fastest-Talking MSPs
This league table was very quickly put together, so I apologise for any errors – it’s only really meant as a demonstration anyway. I’ve only included the top 25, since the slower end of the table tends to be distorted by non-speech actions in the parliament falling in between timestamps and appearing to make a speech slower than it actually was. On the other hand, there is no converse effect (speeches appearing to be faster than they were) that can arise except through errors in the timestamps in the Official Report.
It’s worth noting that since speeches are typically time-limited by the Presiding Officer, there is an incentive to talk fast. Also the variance in words-per-minute in this top 25 is not large, and having watched videos of these MSPs’ speeches, it’s clear that there isn’t an obvious relationship between clarity and average words-per-minute.
The script only takes notice of a speech when there is a single speaker between two consecutive timepoints. Unfortunately, this doesn’t happen as often as I would like, so you only get a small sample of each MSP’s speeches represented here – another reason to be suspicious of the results.
The script ignores timestamps that have the same time as the previous one and any speeches that contain text like “meeting suspended”, which often indicates that the next timestamp falls at the end of the break in proceedings. It also ignores any speech less than 2 minutes in length, since these have the highest error. “Words” are just considered to be anything in the speech that’s separated by whitespace.
I’ve also excluded the following speakers:
Anyone who has spoken in the Official Report who isn’t an MSP, such as Her Majesty the Queen and the speakers at Time for Reflection.
Anyone who has presided over the parliament, in particular the Presiding Officer and Deputy Presiding Officers. Whoever is chairing proceedings often has to introduce breaks, divisions or other actions which have no reported speech attached to them but nonetheless take up time between the placeholders, so their apparent speaking speed is much too low.
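The filtering rules described above can be sketched in Python along these lines. This is a hedged illustration rather than the script that produced the table: it assumes the report has already been split into segments between consecutive timestamps, each given as a hypothetical `(start, end, speakers, text)` tuple with times in `HH:MM` form, and it combines a speaker’s speeches by dividing total words by total minutes, which may not be how the table was actually calculated. Excluding non-MSPs and presiding officers is left out.

```python
from datetime import datetime


def words_per_minute(segments, min_minutes=2.0):
    """Estimate words per minute for each speaker.

    segments: list of (start, end, speakers, text) tuples, one per gap
    between consecutive timestamps, with times as "HH:MM" strings.
    Returns a dict mapping speaker name to overall words per minute.
    """
    totals = {}
    for start, end, speakers, text in segments:
        elapsed = datetime.strptime(end, "%H:%M") - datetime.strptime(start, "%H:%M")
        minutes = elapsed.total_seconds() / 60
        # Skip short speeches; this also drops repeated identical timestamps.
        if minutes < min_minutes:
            continue
        # Only count stretches where exactly one person spoke.
        if len(speakers) != 1:
            continue
        # A suspension means the next timestamp falls after a break.
        if "meeting suspended" in text.lower():
            continue
        words, mins = totals.get(speakers[0], (0, 0.0))
        # "Words" are anything separated by whitespace.
        totals[speakers[0]] = (words + len(text.split()), mins + minutes)
    return {name: words / mins for name, (words, mins) in totals.items()}
```

Sorting the returned dictionary by value in descending order and taking the first 25 entries would give a league table of the kind shown here.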
This post discusses a script that converts one frequently used crossword file format into another one with the advantage that it can be loaded into a free software crossword client.
The brilliant cryptic crossword in The Independent is available for free online, but only in the form of a Java applet, generated with the non-free software Crossword Compiler. It’s wonderful that the crossword is published online now, but the way it’s published is irritating for me because of the following things:
Unbelievably, given how long they’ve been around, Java applets still have terrible usability problems: in particular they’re slow to load and hang Firefox while they’re doing so.
If you accidentally navigate away from the page you lose everything you’ve done in the crossword so far.
The applet is fiddly to use – it has to be carefully arranged to fit in the browser window on my netbook and then you can only see a couple of clues at a time. The tiny scrollbar buttons are awkward to hit.
There’s no simple way to print the crossword from the applet.
An easy solution to these problems was suggested to me by a post on Dafydd’s livejournal. I hadn’t heard of xword before, but it’s a pretty nice gtk interface for doing crosswords and it reads the AcrossLite .PUZ format appropriately liberally, e.g. ignoring the checksums that AcrossLite itself requires. So, this Python 3 script, called ccj-parse.py, will parse the .bin or .ccj file that the applet loads and generate a .puz file which is acceptable to xword. Example usage:
And here’s an example screenshot showing xword with The Independent crossword from a couple of days ago:
Independent 7103 / Merlin in xword
This solution works pretty well for me. xword isn’t perfect by any means, but has the advantage that it works well on my netbook’s small screen, and, perhaps most importantly, autosaves your progress. I also like that if you’ve used the “Solve Word” (or “Cheat”) button, the lights are marked with a red triangle in the corner to indicate which ones you had to give up on.
I’ve tested this script on lots of The Independent’s cryptic crosswords, and quickly checked that it works on one from the Glasgow Herald, but I’ve no idea how generally it will work.
Someone created a sourceforge project called ccj2puz that suggests it would do the same, but it’s never had any source code uploaded, as far as I can tell, and the author hasn’t replied to the message I sent asking about it. The basic file format is quite easily guessable from the output of hexdump -C, so I don’t think this is a particularly big deal.