What does the poster look like?
See the reduced versions in the section Layout Alternatives below.
Legibility Issues
The chief problem with this poster is that there are so many characters that, even when it is printed on A0, you basically need a magnifying glass to see anything. The image above and to the right is one square inch of the final poster (reduced to 300dpi). I've played around with different ways of formatting the poster and separating blocks - my current favourite is the "inline" version described below. If you want to get an idea of how large the characters will be on various numbers of A0 sheets, the images below show the top left 1x2 inches, each reduced to 72dpi (a typical screen resolution).


One option for printing out the poster is to divide it up with pamdice into 16 A4 sheets and stick them together. The advantage of this is that you can get a good idea of what it looks like using just a normal printer (my absurdly-cheap-but-good Samsung ML-1610 had no problem printing the 600dpi A4 bitmaps), but of course the ideal is to print it out on the highest-definition A0 plotter you can find.
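To get a feel for the numbers involved in dicing, here is a rough sketch in Python (not the Ruby from my scripts) that works out the pixel size of an A0 bitmap at 600dpi and the boxes for a 4x4 grid of tiles. The names `pixels` and `tile_boxes` are made up for this illustration, and the A0 dimensions are the nominal ISO 216 ones (841mm x 1189mm); a real printer's printable area will be a little smaller.

```python
import math

MM_PER_INCH = 25.4
A0_MM = (841, 1189)  # nominal ISO 216 A0 in portrait: (width, height)


def pixels(mm, dpi):
    """Convert a length in millimetres to a whole number of pixels."""
    return round(mm / MM_PER_INCH * dpi)


def tile_boxes(width, height, cols, rows):
    """Return (left, top, tile_width, tile_height) for each tile, row by row.

    Tiles in the last column and last row may be slightly smaller when the
    image size is not an exact multiple of the tile size.
    """
    tile_w = math.ceil(width / cols)
    tile_h = math.ceil(height / rows)
    boxes = []
    for r in range(rows):
        for c in range(cols):
            left, top = c * tile_w, r * tile_h
            boxes.append((left, top,
                          min(tile_w, width - left),
                          min(tile_h, height - top)))
    return boxes


a0_w, a0_h = pixels(A0_MM[0], 600), pixels(A0_MM[1], 600)
boxes = tile_boxes(a0_w, a0_h, cols=4, rows=4)
print(a0_w, a0_h, len(boxes), boxes[0])
```

Each of the 16 boxes is close to an A4 sheet at the same resolution, which is why the tiles print comfortably on an ordinary A4 printer.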
Downloadable Versions
Sadly, I don't believe that I'm allowed to provide downloadable versions of the final poster. Each of the code charts has the following text in it:
“You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts.
“The fonts and font data used in production of these Code Charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s).”
I may write to them to clarify their position on such posters, but I suspect the answer will be disappointing. ☹ I can't see anything that prohibits me from using the charts for this purpose, which is clearly just personal use, I think.
How was the poster generated?
If you want to see all the gory details of how to generate these posters, you can download the scripts from github - the repository is here:
http://github.com/mhl/unicode-poster/tree/master
I don't necessarily recommend trying this yourself - you'll need 30GiB of free disk space and lots of patience, and you'll have to be happy editing hastily-written Ruby scripts to do what you want.
(In fact, there's a much better way of doing this that James pointed out. All the code charts have fonts for the character grids embedded in the PDFs (you can see them with pdffonts), and these can simply be extracted. So, you could create a much more elegant PostScript version with all the fonts embedded which would be smaller, faster, better, etc. Of course, this approach is prohibited by the terms of use, so a better approach would be to use some of the many genuinely free TrueType fonts covering much of Unicode.)
I roughly followed the method suggested by Ian Albert, but with a hotch-potch of Ruby scripts. One lesson that became clear early on is that ImageMagick is basically hopeless for this kind of task. For example, "convert" uses Ghostscript to convert from PDFs to PNGs, but in such a way that it takes forever (and All The Memory) on larger files. Just invoking gs directly avoids this. In addition, "montage" just doesn't work for joining very large images. GIMP and recent versions of the netpbm tools, on the other hand, cope very well with the large images.
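As an illustration of what "invoking gs directly" can look like, here is a minimal Python sketch that builds a Ghostscript command line to render every page of a PDF to a greyscale PNG. The function names, the choice of the `pnggray` device, the 600dpi default and the output file pattern are all assumptions made for this example; the actual scripts are in Ruby and may use different options.

```python
import subprocess


def gs_command(pdf_path, out_pattern, dpi=600):
    """Build a Ghostscript command that renders each page to a grey PNG.

    out_pattern should contain a printf-style page counter such as %03d,
    so that every page goes to its own file.
    """
    return [
        "gs",
        "-dBATCH", "-dNOPAUSE",    # run non-interactively and exit at the end
        "-dSAFER", "-q",           # restrict file access, suppress banner
        "-sDEVICE=pnggray",        # 8-bit greyscale PNG output (assumed)
        f"-r{dpi}",                # rendering resolution in dpi
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


def render(pdf_path, out_pattern, dpi=600):
    """Run Ghostscript; raises CalledProcessError if gs reports failure."""
    subprocess.run(gs_command(pdf_path, out_pattern, dpi), check=True)


# Hypothetical file names, just to show the shape of the command.
print(" ".join(gs_command("U0000.pdf", "U0000-%03d.png")))
```

Note that `check=True` would stop at the first failing file, so a real script would have to tolerate the failures on later pages mentioned in the steps that follow.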
The scripts will basically:
- Download all the PDF charts.
- Use gs to convert every page of them into PNGs. (In fact, it fails for the later pages in some files, but always after the code charts.)
- Find the grids in each PNG file to detect where the corners of each cell are. This works for almost every block that we're interested in, but there are a few special cases which are dealt with by hand in the script. A trivially smarter version of that program wouldn't need these special cases.
- Crop out each cell and save as a PNG.
- Divide each cell into the bottom part, with the codepoint text, and the top part, with the glyph.
- Exclude any top parts that appear to be empty or bottom parts that seem to be hashed out.
- Run ocrad on all the bottom parts to find which codepoint each is, and store that in codepoints.yaml. (I didn't think of doing this myself - I got the idea from Peter Harkins who does exactly that.)
- Find the sizes of every top part (i.e. just the glyphs) and store them in top-sizes.yaml
- Use a binary search to find how wide the page should be to match ISO A series paper proportions once the glyphs are arranged onto n posters.
- Draw a series of strips which are each the height of a codepoint glyph.
- Convert these strips to PGM and concatenate them into gigantic PGM files, one for each page of the poster.
- Scale this down to 600dpi (or whatever resolution).
- Try to find someone to print this bitmap out onto A0 paper :)
- Optionally (by hand) correct the mis-OCRed codepoints to create images for every glyph which are named with the codepoint.
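The binary search step in the list above is probably the least obvious one, so here is a simplified Python sketch of the idea. It assumes a layout model that is much cruder than the real scripts: every glyph cell has the same height, cells are flowed left to right into rows in order, and a page is in portrait orientation with height equal to width times the square root of two. The function names are invented for this example, and block headings and margins are ignored.

```python
import math


def rows_needed(widths, page_width):
    """Count the rows used when cells are flowed left to right in order."""
    rows, used = 1, 0
    for w in widths:
        if used + w > page_width:  # cell does not fit: start a new row
            rows += 1
            used = 0
        used += w
    return rows


def fits(widths, cell_height, page_width, n_pages):
    """True if all cells fit on n_pages portrait pages of this width."""
    page_height = page_width * math.sqrt(2)  # ISO A series aspect ratio
    rows_per_page = int(page_height // cell_height)
    return rows_needed(widths, page_width) <= rows_per_page * n_pages


def find_page_width(widths, cell_height, n_pages):
    """Binary search for the narrowest page width (in pixels) that fits."""
    lo = max(widths)  # any narrower and the widest cell cannot be placed
    hi = lo
    while not fits(widths, cell_height, hi, n_pages):
        hi *= 2       # grow the upper bound until the layout fits
    while lo < hi:
        mid = (lo + hi) // 2
        if fits(widths, cell_height, mid, n_pages):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Made-up data: 1000 square cells of 10x10 pixels on one and on four pages.
widths = [10] * 1000
print(find_page_width(widths, 10, 1), find_page_width(widths, 10, 4))
```

The search works because making the page wider never increases the number of rows needed and never decreases the number of rows available, so once a width fits, every larger width fits too.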
Printing Issues
Bitmaps of this size are a real pain to print. If you have access to an HP DesignJet without an annoying print server in the way, then you're in luck. You can send HP RTL to it, and sending TIFF files might work too. If you must go via PostScript (as is enforced by the informatics print server, for example) then you're probably not going to be able to print it out losslessly at the full resolution of the printer. The current version on my wall was produced by using Photoshop and the HP DesignJet printer driver to print-to-file, then sending that to the printer. Unfortunately, this only seems to work if (aarrgh) you check the JPEG encoding box, and I suspect that the version I have is only 300dpi. Later versions of the poster were printed by converting the 600dpi PGM file to PCL3GUI, wrapping it in PJL and sending it directly to an HP DesignJet (thanks to code supplied by James, who also kindly printed a couple out for me).
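For the curious, "wrapped in PJL" means roughly the following. This Python sketch is not James's code, and it only shows the bare minimum as I understand it: the Universal Exit Language escape sequence, a single `@PJL ENTER LANGUAGE` line, the already-encoded page data, and a closing escape sequence. A real job would probably set further PJL options (paper size, quality and so on), which are omitted here, and the function name is made up.

```python
UEL = b"\x1b%-12345X"  # PJL "Universal Exit Language" escape sequence


def wrap_in_pjl(payload, language="PCL3GUI"):
    """Wrap already-encoded printer data in a minimal PJL envelope.

    payload is assumed to be a complete stream in the named printer
    language; this function does not produce or validate it.
    """
    header = UEL + b"@PJL ENTER LANGUAGE=" + language.encode("ascii") + b"\n"
    return header + payload + UEL


# Placeholder bytes standing in for real PCL3GUI data.
job = wrap_in_pjl(b"...encoded page data...")
print(len(job))
```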
Layout Alternatives
I've played around with a few different layout strategies for these posters, and settled on two which I think are best. The first has quite a bit of whitespace to make the separation into blocks clear. The other one just puts all the block headings and characters inline so that the characters are as large as possible.
The layout with lots of whitespace is shown below, but with all the characters fitted onto two A4 pages (at 300dpi) - these are useless for posters, of course, but will show you what the layout of the A0 posters would be. (Note that this is for version 4.1 of the standard, which has many fewer characters, so comparing the layouts directly with the inline version is difficult.)
The layout with everything inline is similarly shown below - this is for version 5.0, though, which has many more characters.
This is the inline layout again, but squeezed onto one A4 sheet at 300dpi. For some indication of how legible this is, see the photos below:
The following two images are two characters cropped out of a 300dpi single-sheet A0 bitmap, and a microscope image of those characters on a test printing of the same bitmap. (The microscope images were taken using the Proscope ("As seen on the hit CBS TV series CSI & CSI Miami!"!!?!!1111). Sadly, I couldn't get it to work on Linux, so those were taken on a Mac.)


And similarly with two other characters:


The DesignJet printers understand 24bit data at 600dpi, regardless of what the actual output looks like, so I could try doing some more versions at twice the resolution to see what they look like. They're very pricy to print, though, so I'll have to wait for some further funding :(