<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Mark's Blog &#187; thesis phd visualization graphs</title>
	<atom:link href="http://longair.net/blog/tag/thesis-phd-visualization-graphs/feed/" rel="self" type="application/rss+xml" />
	<link>http://longair.net/blog</link>
	<description>(occasional miscellanea)</description>
	<lastBuildDate>Tue, 08 May 2012 15:59:31 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
		<item>
		<title>Thesis Visualization</title>
		<link>http://longair.net/blog/2009/10/21/thesis-visualization/</link>
		<comments>http://longair.net/blog/2009/10/21/thesis-visualization/#comments</comments>
		<pubDate>Wed, 21 Oct 2009 12:21:31 +0000</pubDate>
		<dc:creator>mark</dc:creator>
				<category><![CDATA[PhD Work]]></category>
		<category><![CDATA[thesis phd visualization graphs]]></category>

		<guid isPermaLink="false">http://longair.net/blog/?p=290</guid>
		<description><![CDATA[<p>I submitted my PhD thesis over a month ago now (on the 11th of September) and I&#8217;ve still not recovered properly from the experience.  Perhaps that&#8217;s to be expected after 5 years of it.  At some point I&#8217;ll have to try to write something coherent about what it has been like, but all I can [...]]]></description>
			<content:encoded><![CDATA[<p>I submitted my PhD thesis over a month ago now (on the 11th of September) and I&#8217;ve still not recovered properly from the experience.  Perhaps that&#8217;s to be expected after 5 years of it.  At some point I&#8217;ll have to try to write something coherent about what it has been like, but all I can really say at the moment is that I still stand by my advice that embarking on PhD research is a bad idea for almost everyone. Anyway, as a way of trying to put it all into perspective I wrote a few scripts to visualize my thesis and the process of writing it, so I&#8217;ve collected a few of these here.</p>
<p>The first of these was simple to make, since I just collected some word frequency data and fed it into <a href="http://www.wordle.net">Wordle</a>:</p>
<p style="text-align: center;"><a href="http://www.flickr.com/photos/mhl20/3911477087/sizes/o/"><img class="aligncenter" src="http://farm3.static.flickr.com/2599/3911477087_1c712d6ac0.jpg" alt="" width="500" height="281" /></a></p>
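<p>Here is a minimal Python sketch of the word-counting step. It assumes the thesis has already been exported to a plain-text file. Several details are illustrative assumptions and do not describe how the counts above were actually produced: the small stop-word list, the letters-only tokenising regular expression, and the hypothetical <tt>print_word_weights</tt> helper with its <tt>word:count</tt> output lines.</p>

```python
import re
from collections import Counter

# A tiny illustrative stop-word list (an assumption; a real one would be much longer).
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "for", "with"}


def word_frequencies(text, top=100):
    """Return the `top` most common words in `text` as (word, count) pairs.

    Words are runs of letters, compared case-insensitively; stop words and
    single letters are skipped.
    """
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS and len(w) != 1)
    return counts.most_common(top)


def print_word_weights(path, top=100):
    """Print one "word:count" line per word for the plain-text file at `path`.

    Hypothetical helper: the output format is an assumption for illustration.
    """
    with open(path, encoding="utf-8") as f:
        for word, count in word_frequencies(f.read(), top):
            print(f"{word}:{count}")
```

<p>For example, <tt>print_word_weights("thesis.txt")</tt> (a hypothetical file name) prints the 100 most frequent remaining words. Without a proper stop-word list the most common English words would crowd out the interesting ones.</p>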
<p>This next graph shows how the number of lines in my thesis document slowly increased over time. The year-long flat period at the beginning represents writing small parts of chapters and then realizing that much of the work and analysis would have to be redone:</p>
<p><a href="http://longair.net/blog/wp-content/uploads/2009/10/thesis-writing-progress.png"><img class="aligncenter size-medium wp-image-292" title="Thesis-Writing Progress" src="http://longair.net/blog/wp-content/uploads/2009/10/thesis-writing-progress-300x225.png" alt="Thesis-Writing Progress" width="300" height="225" /></a>(In case you&#8217;re wondering, the finished thesis was about 50000 words but about 40000 lines of LyX source, since the LyX format is very verbose; the line count still roughly tracks how the thesis as a whole progressed.)</p>
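<p>A graph like this can be produced by counting the document's lines at every commit that touched it. The sketch below is not the script behind the plot above. It assumes the LyX file is tracked in git at a path relative to the repository root (<tt>thesis.lyx</tt> below is a hypothetical name). It uses only standard <tt>git log</tt> and <tt>git show</tt> invocations, and the function names are hypothetical.</p>

```python
import subprocess


def parse_log(out):
    """Parse "TIMESTAMP HASH" lines, as printed by git log --format='%ct %H'."""
    pairs = []
    for line in out.splitlines():
        if line.strip():
            timestamp, commit = line.split()
            pairs.append((int(timestamp), commit))
    return pairs


def commits_touching(repo, path):
    """Return (unix_time, commit_hash) pairs, oldest first, for commits that changed path."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--reverse", "--format=%ct %H", "--", path],
        check=True, capture_output=True, encoding="utf-8",
    ).stdout
    return parse_log(out)


def line_counts(repo, path):
    """Return (unix_time, line_count) pairs for path at each commit that changed it."""
    series = []
    for timestamp, commit in commits_touching(repo, path):
        # "git show COMMIT:PATH" prints the file's contents as of that commit.
        blob = subprocess.run(
            ["git", "-C", repo, "show", f"{commit}:{path}"],
            check=True, capture_output=True, encoding="utf-8",
        ).stdout
        series.append((timestamp, len(blob.splitlines())))
    return series
```

<p>Calling <tt>line_counts(".", "thesis.lyx")</tt> returns (timestamp, line count) pairs that any graphing tool can plot. The sketch assumes the file exists at every commit that touched it; a commit that deleted it would make <tt>git show</tt> fail.</p>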
<p>Throughout writing the thesis I wondered what the graph of citations would look like, but didn&#8217;t have time to do anything about it until after submitting.  I was hoping I could use Google Scholar (or some similar online archive) to discover the &#8220;A cites B&#8221; relationship, but there isn&#8217;t an API for it at the moment, and I didn&#8217;t think scraping these data from the web would be worth it. However, I kept all the papers I could find in PDF format in my thesis git repository, consistently named as <tt>papers/[BIBTEX-KEY].pdf</tt>, so it was simple to write a short Python script which searched for each paper&#8217;s title in the text of every other paper. This approach will miss quite a lot of relationships, since <tt>pdftotext</tt> doesn&#8217;t work satisfactorily on many of the papers, some have OCR errors, and so on, but I&#8217;m pleased that it seems to have extracted so many of them:</p>
<p style="text-align: center;"><a href="http://www.flickr.com/photos/mhl20/4031094433/sizes/l/"><img class="aligncenter" src="http://farm3.static.flickr.com/2534/4031094433_a21a55215a.jpg" alt="" width="500" height="403" /></a></p>
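<p>The title-matching idea described above can be sketched in a few lines of Python. This is an independent illustration, not the script linked below. It assumes <tt>pdftotext</tt> is on the <tt>PATH</tt> and that the titles have already been read from the BibTeX file (that parsing is not shown); the helper names are hypothetical. Titles and texts are lower-cased and every run of punctuation or whitespace is collapsed to a single space, so a title broken across lines in the PDF can still match.</p>

```python
import re
import subprocess


def normalise(text):
    """Lower-case and collapse every run of non-alphanumeric characters to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def pdf_text(key, directory="papers"):
    """Extract the text of papers/KEY.pdf with pdftotext ("-" means write to stdout)."""
    result = subprocess.run(
        ["pdftotext", f"{directory}/{key}.pdf", "-"],
        capture_output=True, encoding="utf-8", errors="replace",
    )
    # pdftotext fails on some PDFs; treat those papers as having no text.
    return result.stdout if result.returncode == 0 else ""


def find_citations(titles, texts):
    """Return (citing_key, cited_key) pairs where the cited title occurs in the citing text.

    `titles` maps each BibTeX key to its title; `texts` maps each key to extracted text.
    """
    norm_titles = {key: normalise(title) for key, title in titles.items()}
    edges = []
    for citing, text in texts.items():
        # Pad with spaces so titles only match on whole-word boundaries.
        haystack = " " + normalise(text) + " "
        for cited, title in norm_titles.items():
            if cited != citing and title and f" {title} " in haystack:
                edges.append((citing, cited))
    return edges
```

<p>With <tt>texts = {key: pdf_text(key) for key in titles}</tt>, <tt>find_citations(titles, texts)</tt> returns one (citing, cited) pair per detected citation. Titles split by hyphenation or garbled by OCR will still be missed, as noted above.</p>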
<p>The colours indicate how recently each paper was published, from purple (1967) to red (2009). The script outputs the relationships in <a href="http://www.graphviz.org/">graphviz</a>&#8217;s dot format, and that image was rendered with &#8220;neato&#8221;.  I excluded any apparently unconnected papers. In case you&#8217;re interested in the rather shoddy script, <a href="http://github.com/mhl/draw-citation-graph/blob/master/draw-citation-graph.py">I&#8217;ve put it online</a>.</p>
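<p>Writing the dot file and colouring each node by year might look something like the sketch below. It takes the (citing, cited) pairs from a title search like the one described above, plus a dictionary mapping each BibTeX key to its publication year. The function names and the choice of 0.8 as the &#8220;purple&#8221; hue are assumptions. Graphviz accepts a colour as three hue, saturation and value numbers between 0 and 1, so here the year is mapped linearly onto the hue.</p>

```python
def year_colour(year, first=1967, last=2009):
    """Map a year linearly onto an HSV hue, from purple (first) to red (last).

    Returns a Graphviz "H S V" colour string; the purple hue of 0.8 is an assumption.
    """
    span = max(last - first, 1)
    fraction = min(max((year - first) / span, 0.0), 1.0)
    hue = 0.8 * (1.0 - fraction)
    return f"{hue:.3f} 1.000 1.000"


def to_dot(edges, years):
    """Render (citing, cited) edges as a Graphviz digraph.

    Only papers appearing in at least one edge become nodes, so unconnected
    papers are left out; `years` must contain a year for every such key.
    """
    connected = sorted({key for edge in edges for key in edge})
    lines = ["digraph citations {", "  node [style=filled];"]
    for key in connected:
        lines.append(f'  "{key}" [fillcolor="{year_colour(years[key])}"];')
    for citing, cited in edges:
        lines.append(f'  "{citing}" -> "{cited}";')
    lines.append("}")
    return "\n".join(lines)
```

<p>Here, leaving out unconnected papers falls out naturally, because only keys that appear in some edge become nodes. If the output is saved as <tt>citations.dot</tt>, it could be laid out with <tt>neato -Tpng citations.dot -o citations.png</tt>.</p>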
<p>Finally, I thought it might be nice to include a section of one of the images from my thesis to give a flavour of what I&#8217;ve been doing &#8211; this shows the primary paths of some neurons which were traced with my <a href="http://pacific.mpi-cbg.de/wiki/index.php/Simple_Neurite_Tracer">&#8220;Simple Neurite Tracer&#8221;</a> tool and registered with <a href="http://www.nitrc.org/projects/cmtk/">CMTK</a>:</p>
<p><a href="http://longair.net/blog/wp-content/uploads/2009/10/traced-registered-Fm-neurons.png"><img class="aligncenter size-medium wp-image-295" title="Traced and registered Fm neurons" src="http://longair.net/blog/wp-content/uploads/2009/10/traced-registered-Fm-neurons-300x166.png" alt="Traced and registered Fm neurons" width="300" height="166" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://longair.net/blog/2009/10/21/thesis-visualization/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

