LyX Tips for Thesis Writing

LyX is a lovely bit of software for preparing beautiful documents – you get the high quality output of LaTeX and the advantages of logical document description in a usable interface and without having to remember TeX syntax.  There are a few aspects of using LyX that puzzled me while writing a certain large document, however – many of these are dealt with in the LyX FAQ, but I thought it would be worth collecting those that were most useful to me here.

Use pdflatex for Output

There are various options for generating PDF output in LyX, but it will save you trouble if you do everything with pdflatex from the start. (I think this is the upshot of the slightly unclear advice in the FAQ on the subject.) This turned out to be particularly important because, once your document is 50000 words long and has over 100 figures, the other methods take over 10 minutes to generate a PDF, whereas pdflatex finishes in a couple of seconds. If you take this advice then you have to change the Document -> Settings -> Document Class option to pdfTeX, or you get some surprising errors. I would also strongly recommend using only PNG files for bitmap images and PDF for vector graphics. (PNG is obviously sensible, but for vector graphics I found PS and EPS files unexpectedly awkward to get right in terms of orientation and clipping.)
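If you also want to build from a terminal (for example from a Makefile), LyX can export without going through the menus. This is a minimal sketch that assumes `pdf2` is the name LyX uses for its "PDF (pdflatex)" export format; check that against the export formats listed in your LyX version's preferences. `export_pdflatex` is just a hypothetical helper name.

```sh
#!/bin/sh
# Minimal sketch: export LyX documents to PDF using pdflatex.
# Assumption: "pdf2" is LyX's name for the "PDF (pdflatex)" export format.
export_pdflatex() {
    for doc in "$@"; do
        printf 'Exporting %s via pdflatex\n' "$doc"
        # Stop at the first document that fails to export
        lyx --export pdf2 "$doc" || return 1
    done
}
```

After sourcing the file, something like `export_pdflatex thesis.lyx` (a hypothetical file name) should leave the PDF next to the .lyx file.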

Incorrect Colours in Bitmap Graphics

I came across a bizarre problem where the colours would be slightly wrong for certain PNG files that I included in the document. (I suppose I should say “colors” too, just for the sake of searchers using American English.) This turned out to be a problem with full-colour PNG images with transparency (i.e. RGBA images), which my notes say is discussed further in these posts. Setting the PDF version as suggested in the first of those posts didn’t help me at all, so I had to convert all my RGBA PNG files to RGB. To check for these files you can use file(1), with something like:

find . -type f -iname '*.png' -print0 | xargs -0 file | grep RGBA

… and I fixed them by passing the filenames as arguments to a script like:

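#!/bin/sh
# Rewrite each PNG named on the command line in ImageMagick's PNG24
# (24-bit RGB) format, so that it is no longer RGBA.
set -e

for f in "$@"
do
 # Convert into a temporary file so a failed conversion can't clobber
 # the original; then move the result into place.
 t=$(mktemp)
 convert "$f" png24:"$t" || { rm -f "$t"; exit 1; }
 mv "$t" "$f"
done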

Obviously you need ImageMagick installed for the “convert” command.
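If file(1) isn't to hand, the same check can be made by reading the PNG header directly: the colour type is stored in a fixed position in the IHDR chunk, and 6 means truecolour with alpha (RGBA). This is a minimal sketch; the helper names are hypothetical, it assumes filenames without embedded newlines, and it also lists greyscale-with-alpha files (colour type 4), which the RGBA grep above would miss. Whether those suffer from the same colour shift is not something I checked.

```sh
#!/bin/sh
# Sketch: list PNG files whose IHDR colour type includes an alpha channel.
# The colour type is the byte at offset 25 (8-byte signature, 4-byte chunk
# length, "IHDR", 4-byte width, 4-byte height, 1-byte bit depth).
# 6 = truecolour + alpha (RGBA), 4 = greyscale + alpha.
png_colour_type() {
    # Print that single byte as an unsigned decimal, without padding.
    od -An -t u1 -j 25 -N 1 "$1" | tr -d ' \n'
}

list_alpha_pngs() {
    find "${1:-.}" -type f -iname '*.png' | while IFS= read -r f; do
        case $(png_colour_type "$f") in
            4|6) printf '%s\n' "$f" ;;
        esac
    done
}
```

The list can then be handed to the conversion script, e.g. `list_alpha_pngs . | tr '\n' '\0' | xargs -0 sh fix-png.sh`, where fix-png.sh is a hypothetical name for wherever you saved it.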

Footnotes

By default there is no extra vertical space to separate footnotes, but I much prefer there to be a small gap. To get one, add the following line to the document preamble:

\setlength{\footnotesep}{12pt}

Captions

By default, the formatting of caption text in floated figures looks very similar to the main body text. Somewhere on the web I found the recommendation to use the “caption” package to change this, e.g. by adding the following to the preamble:

\usepackage[margin=10pt,font=small,labelfont=bf,labelsep=endash]{caption}

Fitting Tables Onto Pages

Making tables fit on the page is annoying – just changing the text size often doesn’t reduce the overall size or causes a horrible font to be used. Resizing the whole table is the best way I found. Before the table (either in a float or in the normal flow of the document) I added the following in ERT:

 \resizebox{\textwidth}{!}{%

and then immediately after the table added, again in ERT:

 }

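This scales the table so that its width matches the text width. (\resizebox comes from the graphicx package, which LyX loads automatically once the document contains any graphics; otherwise add \usepackage{graphicx} to the preamble.)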

Suppressing Page Numbers For Full-Page Figures

If you use a whole page for a floated figure, the page number can overlap the figure or just look odd. The second tip at http://wiki.lyx.org/FAQ/UnfloatingFigureOnEmptyPage works well for removing page numbers from full-page figures. In summary, add the following to the preamble:

\usepackage{floatpag}
\floatpagestyle{plain} % Default page style for pages with only floats

Then, in ERT before the figure (but still in the float) add:

\thisfloatpagestyle{empty}
\vspace{-\headsep}

… and similarly, after the figure but above the caption add:

\vspace{0.3cm}

… or you may find the caption too close to the graphics.

Better On-Screen Fonts in PDFs

As explained in the second question in this mini-FAQ on generating PDFs from LyX, you should use the outline font version of Computer Modern instead of the bitmapped versions. For me, this boiled down to going to Document -> Settings -> Fonts and setting the Roman font option to “AE (Almost European)”.
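One way to confirm that the switch worked is to look at the font types embedded in the output PDF. In pdflatex output, Type 3 fonts are usually the bitmapped versions, so seeing any suggests the outline fonts aren't being used. A minimal sketch, assuming pdffonts (from poppler-utils) is installed; `check_pdf_fonts` is a hypothetical helper that returns 1 if Type 3 fonts are found and 2 if pdffonts fails.

```sh
#!/bin/sh
# Sketch: report Type 3 fonts (usually bitmapped with pdflatex) in a PDF.
check_pdf_fonts() {
    fonts=$(pdffonts "$1") || return 2
    # pdffonts lists one font per line; the type column reads "Type 3"
    # for these fonts. "|| true" keeps a no-match from tripping set -e.
    type3=$(printf '%s\n' "$fonts" | grep 'Type 3' || true)
    if [ -n "$type3" ]; then
        printf '%s: Type 3 fonts found:\n%s\n' "$1" "$type3"
        return 1
    fi
    printf '%s: no Type 3 fonts\n' "$1"
}
```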

You can further improve the rendering of text in your output by using microtype. Just add

\usepackage{microtype}

… to the preamble. (These suggestions only apply if you’re using the pdflatex workflow as suggested above.)

Adding Footnotes in Tables

You might notice that adding footnotes in table cells doesn’t work. One answer is to add them manually in ERT with \footnotemark and \footnotetext, possibly adjusting the counter as described in that FAQ entry.

Thesis Visualization

I submitted my PhD thesis over a month ago now (on the 11th of September) and I’ve still not recovered properly from the experience.  Perhaps that’s to be expected after 5 years of it.  At some point I’ll have to try to write something coherent about what it has been like, but all I can really say at the moment is that I still stand by my advice that embarking on PhD research is a bad idea for almost everyone. Anyway, as a way of trying to put it all into perspective I wrote a few scripts to visualize my thesis and the process of writing it, so I’ve collected a few of these here.

The first of these is pretty simple to do, since I just collected some word frequency data and fed that into Wordle:
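Word counts like these can be produced with standard tools. A minimal sketch, assuming a plain-text version of the thesis (thesis.txt is a hypothetical name) and a word-cloud tool that accepts weighted `word:count` lines; adjust the final awk if yours wants another format. Common words such as "the" will dominate unless you filter out a stop-word list.

```sh
#!/bin/sh
# Sketch: print the most frequent words in a text file as "word:count".
# $1: input file; $2: how many words to keep (default 100).
word_freq() {
    # Turn every run of non-letters into a newline, lowercase everything,
    # drop empty lines, then count and sort by descending frequency.
    tr -cs 'A-Za-z' '\n' < "$1" |
        tr 'A-Z' 'a-z' |
        grep -v '^$' |
        sort | uniq -c | sort -rn |
        head -n "${2:-100}" |
        awk '{ print $2 ":" $1 }'
}
```

For example, `word_freq thesis.txt 200 > words.txt` keeps the 200 most common words.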

This next graph shows how the number of lines in my thesis document slowly increased over time. The flat period for a year at the beginning really represents starting small bits of chapters and then realizing that much of the work and analysis would have to be redone:

Thesis-Writing Progress

(In case you’re wondering, the thesis was about 50000 words in the end, which corresponds to about 40000 lines of the LyX document, since the LyX format is very verbose. The line count does still roughly track how the thesis as a whole progressed, though.)
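If the .lyx file is tracked in git, the history is enough to reconstruct this kind of graph. This is a minimal sketch rather than necessarily how the graph above was made; `lines_over_time` and thesis.lyx are hypothetical names, and it should be run from the repository's top-level directory because `git show <commit>:<path>` resolves the path from the repository root.

```sh
#!/bin/sh
# Sketch: print "<unix-time> <line-count>" for a file at every commit
# that touched it, oldest first.
lines_over_time() {
    git log --reverse --format='%H %ct' -- "$1" |
    while read -r hash when; do
        # If the file is missing at that commit, git show fails and the
        # count comes out as 0; tr strips the padding some wc versions add.
        n=$(git show "$hash:$1" 2>/dev/null | wc -l | tr -d ' ')
        printf '%s %s\n' "$when" "$n"
    done
}
```

Running `lines_over_time thesis.lyx > lines.dat` gives a two-column file that gnuplot or a spreadsheet can plot.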

Throughout writing the thesis I wondered what the graph of citations would look like, but didn’t have time to do anything about it until after submitting. I was hoping I could use Google Scholar (or some similar online archive) to discover the “A cites B” relationships, but there isn’t an API for it at the moment, and I didn’t think scraping these data from web pages would be worth it. However, I kept all the papers I could find in PDF format in my thesis git repository, consistently named as papers/[BIBTEX-KEY].pdf, so it was simple to write a short Python script that searched for each paper’s title in the text of every other paper. This means it will miss quite a lot of relationships, since pdftotext doesn’t work satisfactorily on many of the papers, some have OCR errors, and so on; still, I’m pleased that it seems to have extracted so many of them:

The colours indicate how recently the paper was published, from purple (1967) to red (2009). The script outputs the relationships in graphviz’s dot format, and that image was rendered with “neato”. I excluded any apparently unconnected papers. In case you’re interested in the rather shoddy script, I’ve put it online.
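The title-matching idea is easy to sketch. This is a shell re-sketch written to match the other snippets here, not the Python script mentioned above, and it leaves out the colouring by year. It assumes a hypothetical titles.tsv with one `<bibtex-key><TAB><title>` line per paper (producing that from the BibTeX file is not shown) and a directory of `<bibtex-key>.txt` files extracted with pdftotext. Because only matched pairs become edges, unconnected papers never appear in the output.

```sh
#!/bin/sh
# Sketch: emit a Graphviz digraph with an edge "A" -> "B" whenever the
# title of paper B appears (case-insensitively) in the text of paper A.
# $1: tab-separated "<key>\t<title>" file; $2: directory of <key>.txt files.
citation_graph() {
    tab=$(printf '\t')
    echo 'digraph citations {'
    while IFS=$tab read -r key title; do
        [ -n "$title" ] || continue
        for txt in "$2"/*.txt; do
            [ -f "$txt" ] || continue
            citer=$(basename "$txt" .txt)
            [ "$citer" = "$key" ] && continue
            # Collapse whitespace so titles wrapped across lines still match
            if tr -s '[:space:]' ' ' < "$txt" | grep -qiF -- "$title"; then
                printf '  "%s" -> "%s";\n' "$citer" "$key"
            fi
        done
    done < "$1"
    echo '}'
}
```

For example: `for p in papers/*.pdf; do pdftotext "$p" "${p%.pdf}.txt"; done`, then `citation_graph titles.tsv papers > citations.dot` and `neato -Tpng citations.dot -o citations.png`.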

Finally, I thought it might be nice to include a section of one of the images from my thesis to add a flavour of what I’ve been doing. This shows the primary paths of some neurons which were traced with my “Simple Neurite Tracer” tool and registered with CMTK:

Traced and registered Fm neurons