Approximate UK postcode boundaries from the Voronoi diagram of ONSPD

This work has been superceded by a new dataset of postcode boundaries I’ve since made – this post is here largely for historical interest as a result, but the data may still be useful because it includes some (very approximate) postcode polygons for Northern Ireland, which I didn’t have data for in the new approach.

TL;DR: you can try entering a postcode here and click through to see the very approximate boundaries of the postcode unit, sector, district and area around there.

This project started from me being curious about some simple questions – what does the boundary look like of all the houses with the same postcode as us? How much of our street does it cover? How much bigger would the boundary of your postcode be if you lived somewhere much more rural?

The problem with answering these questions properly in general (i.e. make it easy for anyone to find out) is that it would be incredibly expensive to do so. There are many underlying reasons for this, but essentially it comes to down to the fact that you need the latitude, longitude and postcode of every building in the UK. The only dataset which has this information is Ordnance Survey’s Address-Base product, which has wretched licensing terms. Even if I had £130,000 a year (pricing here) to spend on this, I wouldn’t be able to share the results with people due to the licensing, which is much of the point.

(Although the reasons I was interested in this originally are a bit frivolous, it really is a long-running scandal that this address database isn’t open – Sym wrote about one of the reasons why this matters on the Democracy Club blog and this case study about the Open Addresses project from the ODI gives you lots of good background about the problem, including the huge economic benefits for the country you could expect from this data being made open.)

Anyway, unfortunately, this means that I’ll have to settle for answering these questions imprecisely, using data that is open. A good starting point for this is the ONSPD (the Office of National Statistics Postcode Directory), which contains a CSV file with every postcode in the UK, and, for most of them, it has the latitude and longitude of the centroid of that postcode.

What I wanted to do, essentially, is to find, for each postcode centroid, the boundary of all the points that are closer to that centroid than the centroid of any other postcode. In mathematical terms, we want the Voronoi diagram of the postcode centroids, and we can calculate that with Python’s matplotlib by generating the Delaunay Triangulation of the points with matplotlib.delaunay. (Delaunay triangulation is a closely related geometric concept, from which you can derive the Voronoi diagram.)

That’s not the whole story, however, since we have to think about what happens around the edges of this cloud of postcode points. For example, here is the Voronoi diagram of just the postcode centroids in TD15 (Berwick-upon-Tweed):

The most obvious features there are probably the big spikes out to sea and to the south-east, but they actually aren’t anything to worry about: it’s just that the outermost postcode centroids around the coast at Berwick-upon-Tweed are concave, which produces big circumcircles from the Delaunay triangulation and so large triangles in the Voronoi diagram. Instead, the problem is the postcode centroid I’ve highlighted with a red arrow in the south. This point isn’t actually contained in any of the polygons in the generated Voronoi diagram. I don’t want to risk this happening around the edges of the cloud of postcode points, so before calculating the Voronoi diagram I’ve added 200 points in a very large diameter circle around the true postcode points. These “points at infinity” mean that each point around the edge is definitely contained in a polygon in the Voronoi diagram. For example, if we do the same with the Berwick-upon-Tweed example, you instead get a diagram like this:

I’ve highlighted the same postcode centroid with a red arrow, and you can see that this means it’s now contained in a polygon.

Here’s an example of the polygons you get for the postcodes in SW2 1 which also shows includes these “points at infinity”:

This does mean that when you run this process for the whole UK, you might end up with massive postcode polygons around the coasts, so the script checks if any of the polygons points might lie outside the boundary of the UK (taken from OpenStreetMap) and if so clips that polygon to that boundary. (That boundary is a little way out to sea—you can see it as the purple line in the picture above—but it’s the most readily available good boundary data for the whole country I could find.)

Another inconvenience we have to deal with is that there are multiple postcodes for a single point in some cases. (One of the most famous examples is the DVLA in Swansea, which has 11 postcodes for one building. That pales in comparison to the Royal Mail Birmingham Mail Centre, though, that appears to have 411 postcodes.) We can’t have duplicate points when constructing the Voronoi diagram, so the script preserves a mapping from point to postcodes so we can work out later which polygon corresponds to which postcodes.

One other thing that made this slightly more complicated is that the latitudes and longitudes in ONSPD are for the WGS 84 coordinate system¹ – if you generate the Voronoi diagram from these coordinates directly,  you end up with polygons that are elongated, since lines of longitude converge as you go further north and we’re far from the equator. To avoid this, the script transforms coordinates onto the Ordnance Survey National Grid before calculating the Voronoi diagram and transforms the coordinates back to WGS 84 before generating the KML file. This reduces that distortion a lot, although the grid of course is rather off for the western parts of Northern Ireland.

¹ Correction: Matthew pointed out that ONSPD does have columns with eastings and northings as well as WGS 84 coordinates, so I could have avoided the first transformation.

A last stage of post-processing is to take the shapes around these individual postcode centroids and agglomerate them into higher level groupings of postcodes, in particular:

  • The postcode area, e.g. EH
  • The postcode district, e.g. EH8
  • The postcode sector, e.g. EH8 9

It’s nice to be able to see these higher level areas too, so this extra step seemed worth doing, e.g. here are the postcode areas for the whole country:

Anyway, my script for generating the original postcode polygons takes a few hours (there are about 2.6 million postcodes in the ONSPD database), which could certainly be sped up a lot, but doesn’t bother me too much since I’d only really need to update it on a new release of ONSPD. And this was just a fun thing to do anyway. (It was meant to be a “quick hack”, but I can’t really call it that given the amount of time it’s taken.)

One small note about that is that I hadn’t really used Python’s multiprocessing library before, but it made it very easy to create a pool of as many worker processes as I have cores on my laptop to speed up the generation. This can be nicely combined with the tqdm package for generating progress meters, as you can see here.

The results are somehow very satisfying to me – seeing the tessellation of the polygons, particularly in areas with widely varying population density is rather beautiful.

Get this data

If you just want to look up a single postcode, you can put your postcode into a MapIt instance I set up with all these boundaries loaded into it.

That MapIt instance also provides a helpful API for getting the data – see the details on the front page.

If you just want an archive with all 1.7 million KML files, you can also download that.

Please let me know in the comments if you make anything fun with these!

Related projects

Geolytix

Of course, I’m not the only person to have done this. I found out after working on this on and off in my spare time for a while that there’s a company called Geolytix who have taken a similar approach to generating postcode sector boundaries, but who used real-world features (e.g. roads, railways) close to the inferred boundaries and adjusted the boundaries to match. There’s more about that in this blog post of theirs:

https://blog.geolytix.net/2012/11/20/postal-sector-boundaries-by-geolytix/

They’ve released those boundaries (down to the postcode sector level) as open data as a snapshot in 2012, but are charging for updates, as explained there.

The results look rather good! Personally, I’m more interested in the fine grained postcode boundaries (below the postcode sector level), which aren’t included in that output, but it’s nice to see this approach being taken a big step further than I have.

OS Code-Point Polygons

The Ordnance Survey, every civic tech developer’s favourite vampire squid, do sell postcode polygons that are as presumably as good as you can get. (It sounds from the Geolytix blog post above that they are generated as the Voronoi diagram of Address-Base points, which is what I’d ideally like to do myself – i.e. just run this script on all the addresses in Address-Base.) You can see that here:

https://www.ordnancesurvey.co.uk/business-and-government/products/code-point-with-polygons.html

Naturally, because it’s the Ordnance Survey, this product is also very expensive and has crappy licensing terms.

What’s next for this project?

I’m not sure if I’ll do any more on this, since it’s been much more of a time sink than I’d hoped already, but if I were to take it further then things I have in mind are:

  • Get a better coastline polygon for the UK. The OSM administrative boundary for the UK was very convenient to use, but because it extends out into the sea some distance, it makes areas around the coast bulge out and that means that you can’t really compare the areas of postcode shapes, which is one of the things I was interested in. You could create a better polygon as a union of some of the shapes in Boundary-Line, for example.
  • Adding other sources of postcode points, e.g. OpenAddresses – although the project is in hibernation, I hoped I’d be able to use the addresses they’d already collected, but the bulk downloads don’t seem to include coordinates. I might be missing something obvious, but I haven’t heard back from emailing them about that.
  • It would be nice to make a Leaflet-based viewer for scrolling around the country and exploring the data. (For viewing multiple areas I’ve been loading them into QGIS, but it would be nice to have a web-based alternative.)

Source code and recreating the data

Source code

You can clone the source for this project from:

https://github.com/mhl/postcodes-mapit

If you want to generate this data yourself you can follow the instructions in the README.


Posted

in

by

Tags:

Comments

10 responses to “Approximate UK postcode boundaries from the Voronoi diagram of ONSPD”

  1. Alistair Turnbull Avatar
    Alistair Turnbull

    Mark, this is heroic! Good on you.

    1. mark Avatar

      Thanks, Alistair :)

  2. Bob Harper Avatar

    Massive bit of work, mad props Mark.

    1. mark Avatar

      Thanks, Bob – it’s very kind of you to say so.

  3. Beth Avatar

    Great article !,

    Do you know where I can download a Shapefile for all Posta District boundaries? We found a nice Google Table site: https://fusiontables.google.com/DataSource?docid=16RlvT0U2gcuENoydpohUSel2BEqmbpYfyfrMNWk

    However, the above link does not contain the postal district boundary outline, for example, Chiswick, W4?

    Our site:
    https://market.mashape.com/VanitySoft/uk-boundaries-io

  4. Gaurav Agarwal Avatar

    Hey Hi Mark,

    Thanks for your amazing article! Your research and time will not be wasted :)

    I wanted to know if we can get a database of latitude longitude coordinates of every (well as many as possible) of the addresses within a postcode ?

    I would want to talk to you.

    Looking forward to talking to you.

    Best
    Gaurav

    1. mark Avatar

      I’m afraid that you would have to buy AddressBase to get the latitude and longitude of every address within a postcode. (This blog post is about trying to get an approximation of postcode boundaries _without_ that information.)

      1. Gaurav Agarwal Avatar

        Thank you Mark. I thought as part if your research you must have gathered more details.

        But I’m good now. Thanks again!

  5. Gaurav Agarwal Avatar

    Hi Mark

    Is it possible to talk to you to discuss above?

    My email ID is 1gaurav6400@gmail.com

    Thanks,
    Gaurav

  6. zestate Avatar

    This is really awesome mate, I downloaded the KML file will be implementing it in our site soon. Can’t thank you enough.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.