
Open Data GB postcode unit boundaries

Summary

In October 2020 there was big news for postcode geography geeks like me: the ONS added a postcode column to their NSUL dataset, which gave us for the first time a coordinate and postcode for each property in Great Britain as open data. I’d previously tried to generate polygons for the boundary of each postcode based on the single coordinate per postcode found in ONSPD, but the results were very approximate. With this new data I could use a very similar technique, but this time with coordinates for all the postal addresses in each postcode to get much more accurate postcode unit polygons.

This turned out to be quite a lot of work, but the results are pretty satisfying, I think. The two biggest problems with the end results are:

  1. NSUL only covers Great Britain; I don’t know of any similar dataset for Northern Ireland.
  2. NSUL doesn’t just contain UPRNs that represent deliverable postal addresses; it also has coordinates and postcodes for things like whole streets, pollution monitoring devices and verges, as well as historic UPRNs. Since I can’t filter these out with any open data that I know of, these points distort some of the boundaries. See the section “How does this compare to ‘Code-Point with Polygons’?” below.

However, as far as I know these are the best open data boundaries of GB postcode units available for free, and they could still be improved by some work on excluding UPRNs that aren’t current postal addresses. (See “What could be improved?” below for some suggestions.)

What is this dataset useful for?

I suppose that the main use-case for these boundaries is that you have some data where values are associated with postcodes, and you want to visualize them on a map.

For me personally, it also just sates my curiosity about the notional shapes of postcodes, e.g. where they just run down one side of a street, how new developments break up existing patterns, etc. etc.

Get the data

You can download GeoJSON for the postcode unit, sector, district and area polygons from here:

Download the data as GeoJSON files (approx. 1GB)

In case you just want to explore the data a bit, you can search for a postcode or location in this MapIt instance and click through to see the postcode unit, sector, district or area boundaries. Please don’t rely on this MapIt as an API, since it costs a not insignificant amount to run, and unless there’s a lot of interest in this I’m likely to take it down at some point. If you are interested in using this on an ongoing basis, please get in touch with me.

Future maintenance

I’m publishing this partly because it’s been a fun project and I find the results very pleasing, and partly to gauge interest in this dataset. It’s worth making clear that I’m not committing to making regular releases as NSUL is updated. It’s rather expensive in EC2 time to regenerate this data, and I’m not sure that an individual with very little free time is the right person to be doing that anyway.

If you would be interested in getting regular updates to this data (or working on improving it) or you represent an institution that might have the resources and longevity to take over maintenance of this, please do get in touch with me.

Caveats

  • This data is only for Great Britain, not Northern Ireland. As far as I know there is no dataset equivalent to NSUL that would let me generate postcode polygons of similar accuracy for Northern Ireland. If you know of one, please let me know! The crucial thing is to have the coordinates and postcode for each UPRN; NSUL provides this for Great Britain but not for Northern Ireland.
  • You’ll find that some of the postcode polygons overlap. There
    are broadly two cases where this happens:

    • Genuine “vertical streets”, like the DVLA building in
      Swansea, where there are multiple postcodes for a single
      building.
    • Some UPRNs in the NSUL dataset represent historic addresses, where a building previously had a different postcode. Since I can’t filter out these historic UPRNs with open data, you get cases like the one shown below, where SW2 1EP and SW2 1BX overlap because there’s a historic UPRN for a building with postcode SW2 1BX that was knocked down and replaced by a block of flats, each of which has the postcode SW2 1EP.

      An image of postcode polygons in Brixton, showing that SW2 1EP and SW2 1BX overlap due to a single point with multiple UPRNs, one of them historic
      Click through for the animated GIF – this shows the overlap between the polygons I’ve generated for SW2 1BX and SW2 1EP
  • Because of memory constraints, I couldn’t calculate the Voronoi diagram for every point in NSUL in one pass, so I calculated it for each region of Great Britain separately. This means that the postcode units that cross from one region to another (e.g. TD9 0TU or CH4 0DF) aren’t as neat as they might be.
  • This data is based on the October 2020 release of NSUL, which is fairly old now.

How does this compare to ‘Code-Point with Polygons’?

This dataset is similar to the Ordnance Survey’s Code-Point with Polygons product, which is very much not Open Data. (I’m not even sure how much it costs because their pricing is so opaque.)

Going by the description of Code-Point with Polygons on the Ordnance Survey’s website, it sounds like it’s generated with a very similar method to the one I’m using, but unlike me they are able to filter out all the points that don’t represent current deliverable postal addresses, which means that their polygons are higher quality.

We can look at a small example of how the differences between these datasets arise, since the user guide for Code-Point with Polygons contains some images showing the postal addresses used as the basis of the data. Here’s a small part of Portsmouth from that document:

An example of Code-Point with Polygons source points and polygons from the user guide.

Here’s the same region from the data I’ve generated from NSUL, with a few of the extraneous points labelled:

Source points from NSUL and postcode unit polygons generated from them (i.e. part of the dataset I’m publishing with this post)

You can see by comparing the points in these images how many additional ones there are in the NSUL dataset compared to the one that the OS are using. I’ve labelled a few of the extra points in the latter image for the sake of showing what kind of things they represent. (I discovered what these are by looking them up in FindMyAddress, but that site’s license agreement is so restrictive that it’s not useful for anything more than an occasional lookup.)

Hopefully this example makes clear the kind of differences you should expect to see between those datasets. In addition to the extra points I can’t filter out, it also seems that the OS polygons have been snapped to the main road. This is something that one theoretically could do with open data as well. (See “What could be improved?” below.)

So, in summary, if you want or need the best quality postcode polygons for Great Britain, you should buy Code-Point with Polygons. I suspect that for a fair proportion of use cases, though, the data I’ve generated might be good enough and is free to use under the Open Government Licence.

What could be improved?

It’s rather unlikely that I’ll have time to do any of these any time soon, but here are some ideas for how to improve this data:

  • Robin Houston pointed out to me that Output Area boundaries are open data, and each Output Area is a union of a few postcode units. (The mapping from postcode unit to Output Area is readily available.) This means that you could filter out some proportion of irrelevant UPRNs by looking for coordinates that fall outside the Output Area that they should lie in and adding them to an exclude-list. I think this would be worth doing at some point, although you would have to be careful how you did it – e.g. there are plenty of postcodes that didn’t exist at the time of the last census, so you don’t want to accidentally exclude all of the points from those postcodes! Possibly the best time to do this would be when the Output Areas for the 2021 census are published, which is likely to be in April / May 2022.
  • One could pretty easily set up a tool to crowdsource finding points that are clearly outside a postcode. In many cases it is very obvious just by overlaying the positions of UPRNs over OpenStreetMap tiles that some are in the middle of a road, or miles away from the rest of the points in that postcode. You’d have to make it very clear to contributors that they have to do this by local knowledge or common sense, rather than referring to any AddressBase-derived data set, but I think something like this could work pretty well.
  • As I mentioned elsewhere in this post, lots of postcode unit boundaries divide one side of a road from another or appear to run down a river, say, so I can conceive that you could write some code to try to adjust the boundaries of these polygons to features like roads in OpenStreetMap, so long as moving that boundary wouldn’t include points not previously in the postcode boundary. I’m not sure where I’d start with this, though.
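The Output Area idea in the first suggestion above could be sketched roughly as follows. This is only an illustration under simplifying assumptions, not code from the project: the function names, the tuple layout of the UPRN rows and the two lookup dictionaries are all hypothetical, and each Output Area is treated as a single simple polygon without holes (real boundaries may need a proper GIS library). UPRNs whose postcode has no known Output Area are deliberately kept, to avoid wiping out postcodes created since the census.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    `polygon` is a list of (x, y) vertices of a simple polygon without holes.
    """
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[i - 1]  # previous vertex (wraps around at i == 0)
        # Does this edge cross the horizontal line through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def find_uprns_to_exclude(uprns, postcode_to_oa, oa_polygons):
    """Return the UPRNs that lie outside the Output Area of their postcode.

    `uprns` is an iterable of (uprn, postcode, x, y) tuples (hypothetical layout).
    """
    excluded = []
    for uprn, postcode, x, y in uprns:
        oa = postcode_to_oa.get(postcode)
        if oa is None or oa not in oa_polygons:
            # Unknown postcode (e.g. newer than the census): keep the point.
            continue
        if not point_in_polygon(x, y, oa_polygons[oa]):
            excluded.append(uprn)
    return excluded
```

For example, with a single square Output Area, a point well outside the square would be added to the exclude-list, while a point belonging to a postcode missing from the lookup would be left alone.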

One final point: I suspect this won’t be directly helpful, but it’s worth following this FOI request from Robert Whittaker, which asks for a list of historic and parent UPRNs; these are a big chunk of the UPRNs I’d want to filter out of NSUL for this project. Even if the request is eventually successful, the list wouldn’t help a project like this: my aim is to create open data, and a list of historic and parent UPRNs released under FOI wouldn’t be usable as open data. However, it’s interesting to read the refusals as a clear demonstration of how obviously broken the Ordnance Survey’s incentives and obligations are.

Lessons (re-)learned

  • When you run out of inodes on your filesystem, that’s a clear sign that you should be using a database instead of the filesystem.
  • Python’s multiprocessing package, and the Pool class in particular, are pretty easy to use and very effective.
  • As ever, PostgreSQL is just an amazing bit of software. (Also PostGIS and GeoDjango.)
  • QGIS is so powerful, and so fun to use!
  • AWS billing alerts are harder to set up than they should be, but it’s very important to do that for personal projects where you need a lot of CPU – accidentally leaving a powerful EC2 instance running can cost you a lot.

Licensing

You may use this data under the Open Government Licence v3.0, and must include the following copyright notices:

  • Contains OS data © Crown copyright and database right 2020
  • Contains Royal Mail data © Royal Mail copyright and database right 2020
  • Source: Office for National Statistics licensed under the Open Government Licence v.3.0

Source code

The code to generate this data can be found on GitHub.

Changelog

You can find a list of changes in versions of this data on GitHub.

Approximate UK postcode boundaries from the Voronoi diagram of ONSPD

This work has been superseded by a new dataset of postcode boundaries I’ve since made – this post is here largely for historical interest as a result, but the data may still be useful because it includes some (very approximate) postcode polygons for Northern Ireland, which I didn’t have data for in the new approach.

TL;DR: you can try entering a postcode here and click through to see the very approximate boundaries of the postcode unit, sector, district and area around there.

This project started from me being curious about some simple questions – what does the boundary look like of all the houses with the same postcode as us? How much of our street does it cover? How much bigger would the boundary of your postcode be if you lived somewhere much more rural?

The problem with answering these questions properly in general (i.e. make it easy for anyone to find out) is that it would be incredibly expensive to do so. There are many underlying reasons for this, but essentially it comes down to the fact that you need the latitude, longitude and postcode of every building in the UK. The only dataset which has this information is Ordnance Survey’s AddressBase product, which has wretched licensing terms. Even if I had £130,000 a year (pricing here) to spend on this, I wouldn’t be able to share the results with people due to the licensing, which is much of the point.

(Although the reasons I was interested in this originally are a bit frivolous, it really is a long-running scandal that this address database isn’t open – Sym wrote about one of the reasons why this matters on the Democracy Club blog and this case study about the Open Addresses project from the ODI gives you lots of good background about the problem, including the huge economic benefits for the country you could expect from this data being made open.)

Anyway, unfortunately, this means that I’ll have to settle for answering these questions imprecisely, using data that is open. A good starting point for this is the ONSPD (the Office for National Statistics Postcode Directory), which contains a CSV file listing every postcode in the UK along with, for most of them, the latitude and longitude of the centroid of that postcode.

What I wanted to do, essentially, is to find, for each postcode centroid, the boundary of all the points that are closer to that centroid than the centroid of any other postcode. In mathematical terms, we want the Voronoi diagram of the postcode centroids, and we can calculate that with Python’s matplotlib by generating the Delaunay Triangulation of the points with matplotlib.delaunay. (Delaunay triangulation is a closely related geometric concept, from which you can derive the Voronoi diagram.)
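The defining property of a Voronoi cell can be illustrated without any geometry library: a location belongs to the cell of whichever centroid is nearest to it. The sketch below shows only that property, not how the script builds the actual polygons (that goes via the Delaunay triangulation). The postcodes and coordinates are made up, and the coordinates are assumed to be planar (e.g. eastings and northings in metres).

```python
import math

# Made-up postcode centroids as (easting, northing) pairs in metres.
CENTROIDS = {
    "XX1 1AA": (0.0, 0.0),
    "XX1 1AB": (100.0, 0.0),
    "XX1 1AC": (0.0, 100.0),
}


def nearest_postcode(x, y, centroids=CENTROIDS):
    """Return the postcode whose centroid is closest to (x, y).

    Every location for which this returns the same postcode lies in
    that postcode's Voronoi cell.
    """
    return min(centroids, key=lambda pc: math.dist((x, y), centroids[pc]))
```

The Voronoi diagram is just the set of boundaries between the regions where this function's answer changes.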

That’s not the whole story, however, since we have to think about what happens around the edges of this cloud of postcode points. For example, here is the Voronoi diagram of just the postcode centroids in TD15 (Berwick-upon-Tweed):

The most obvious features there are probably the big spikes out to sea and to the south-east, but they actually aren’t anything to worry about: it’s just that the outermost postcode centroids around the coast at Berwick-upon-Tweed are concave, which produces big circumcircles from the Delaunay triangulation and so large triangles in the Voronoi diagram. Instead, the problem is the postcode centroid I’ve highlighted with a red arrow in the south. This point isn’t actually contained in any of the polygons in the generated Voronoi diagram. I don’t want to risk this happening around the edges of the cloud of postcode points, so before calculating the Voronoi diagram I’ve added 200 points in a very large diameter circle around the true postcode points. These “points at infinity” mean that each point around the edge is definitely contained in a polygon in the Voronoi diagram. For example, if we do the same with the Berwick-upon-Tweed example, you instead get a diagram like this:

I’ve highlighted the same postcode centroid with a red arrow, and you can see that this means it’s now contained in a polygon.
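Generating the ring of “points at infinity” might look something like the following. This is a sketch rather than the script’s own code: the function name is mine, and the centre and radius are left as parameters because the post doesn’t say what values were used; only the count of 200 comes from the description above.

```python
import math


def ring_of_points(centre_x, centre_y, radius, count=200):
    """Return `count` points evenly spaced on a circle around the centre."""
    return [
        (
            centre_x + radius * math.cos(2 * math.pi * i / count),
            centre_y + radius * math.sin(2 * math.pi * i / count),
        )
        for i in range(count)
    ]
```

The idea would then be to append these points to the real postcode points before computing the Voronoi diagram, and to discard the polygons belonging to the ring afterwards.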

Here’s an example of the polygons you get for the postcodes in SW2 1 which also shows these “points at infinity”:

This does mean that when you run this process for the whole UK, you might end up with massive postcode polygons around the coasts, so the script checks if any of the polygon’s points might lie outside the boundary of the UK (taken from OpenStreetMap) and if so clips that polygon to that boundary. (That boundary is a little way out to sea, as you can see from the purple line in the picture above, but it’s the most readily available good boundary data for the whole country I could find.)

Another inconvenience we have to deal with is that there are multiple postcodes for a single point in some cases. (One of the most famous examples is the DVLA in Swansea, which has 11 postcodes for one building. That pales in comparison to the Royal Mail Birmingham Mail Centre, though, which appears to have 411 postcodes.) We can’t have duplicate points when constructing the Voronoi diagram, so the script preserves a mapping from point to postcodes so we can work out later which polygon corresponds to which postcodes.
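A mapping like that can be kept with a dictionary keyed on the coordinates. The following is a minimal sketch, assuming the input is a sequence of (postcode, x, y) rows and that duplicate points have exactly equal coordinates; the function name and the sample rows are made up for illustration.

```python
from collections import defaultdict


def group_postcodes_by_point(rows):
    """Map each distinct (x, y) coordinate to the postcodes found there."""
    postcodes_at = defaultdict(list)
    for postcode, x, y in rows:
        postcodes_at[(x, y)].append(postcode)
    return dict(postcodes_at)


# Made-up rows: the first two postcodes share a single point.
rows = [
    ("XX1 1AA", 100, 200),
    ("XX1 1AB", 100, 200),
    ("XX1 1AC", 150, 250),
]
mapping = group_postcodes_by_point(rows)
unique_points = list(mapping)  # these are what the Voronoi step would receive
```

Each polygon that comes back can then be looked up by its generating point to find all the postcodes it stands for.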

One other thing that made this slightly more complicated is that the latitudes and longitudes in ONSPD are for the WGS 84 coordinate system¹ – if you generate the Voronoi diagram from these coordinates directly, you end up with polygons that are elongated, since lines of longitude converge as you go further north and we’re far from the equator. To avoid this, the script transforms coordinates onto the Ordnance Survey National Grid before calculating the Voronoi diagram and transforms the coordinates back to WGS 84 before generating the KML file. This reduces that distortion a lot, although the grid of course is rather off for the western parts of Northern Ireland.

¹ Correction: Matthew pointed out that ONSPD does have columns with eastings and northings as well as WGS 84 coordinates, so I could have avoided the first transformation.
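To get a feel for how large the distortion from using raw latitudes and longitudes is, here is a rough worked calculation. It treats the Earth as a sphere, which is only an approximation, and the function name is mine: on a sphere, one degree of longitude covers cos(latitude) times the ground distance of one degree of latitude.

```python
import math


def longitude_scale(latitude_degrees):
    """Ground length of 1° of longitude relative to 1° of latitude.

    Spherical approximation: the ratio is simply cos(latitude).
    """
    return math.cos(math.radians(latitude_degrees))


# At 55°N the ratio is roughly 0.57; at 60°N it is 0.5.
ratio_at_55 = longitude_scale(55)
ratio_at_60 = longitude_scale(60)
```

So at British latitudes, treating a degree of longitude as equal to a degree of latitude misjudges east–west distances by something approaching a factor of two, which is why working in a projected grid helps.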

A last stage of post-processing is to take the shapes around these individual postcode centroids and agglomerate them into higher level groupings of postcodes, in particular:

  • The postcode area, e.g. EH
  • The postcode district, e.g. EH8
  • The postcode sector, e.g. EH8 9
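The grouping key for each of these levels can be derived from the postcode string itself. This is a minimal sketch with function names of my own choosing, not taken from the script; it relies on the inward part of a UK postcode always being three characters long. Merging the polygons within each group is left out, since that needs a geometry library.

```python
from collections import defaultdict
from itertools import takewhile


def postcode_levels(postcode):
    """Return (area, district, sector) for a postcode unit."""
    compact = postcode.replace(" ", "").upper()
    outward, inward = compact[:-3], compact[-3:]
    # The area is the leading run of letters in the outward code.
    area = "".join(takewhile(str.isalpha, outward))
    return area, outward, f"{outward} {inward[0]}"


def group_by_sector(postcodes):
    """Group postcode units by the sector they belong to."""
    sectors = defaultdict(list)
    for postcode in postcodes:
        sectors[postcode_levels(postcode)[2]].append(postcode)
    return dict(sectors)
```

For instance, postcode_levels("EH8 9YL") gives ("EH", "EH8", "EH8 9"), matching the three examples in the list above.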

It’s nice to be able to see these higher level areas too, so this extra step seemed worth doing, e.g. here are the postcode areas for the whole country:

Anyway, my script for generating the original postcode polygons takes a few hours (there are about 2.6 million postcodes in the ONSPD database), which could certainly be sped up a lot, but doesn’t bother me too much since I’d only really need to update it on a new release of ONSPD. And this was just a fun thing to do anyway. (It was meant to be a “quick hack”, but I can’t really call it that given the amount of time it’s taken.)

One small note about that is that I hadn’t really used Python’s multiprocessing library before, but it made it very easy to create a pool of as many worker processes as I have cores on my laptop to speed up the generation. This can be nicely combined with the tqdm package for generating progress meters, as you can see here.

The results are somehow very satisfying to me – seeing the tessellation of the polygons, particularly in areas with widely varying population density, is rather beautiful.

Get this data

If you just want to look up a single postcode, you can put your postcode into a MapIt instance I set up with all these boundaries loaded into it.

That MapIt instance also provides a helpful API for getting the data – see the details on the front page.

If you just want an archive with all 1.7 million KML files, you can also download that.

Please let me know in the comments if you make anything fun with these!

Related projects

Geolytix

Of course, I’m not the only person to have done this. I found out after working on this on and off in my spare time for a while that there’s a company called Geolytix who have taken a similar approach to generating postcode sector boundaries, but who used real-world features (e.g. roads, railways) close to the inferred boundaries and adjusted the boundaries to match. There’s more about that in this blog post of theirs:

Postal Sector Boundaries by Geolytix

They’ve released those boundaries (down to the postcode sector level) as open data as a snapshot in 2012, but are charging for updates, as explained there.

The results look rather good! Personally, I’m more interested in the fine grained postcode boundaries (below the postcode sector level), which aren’t included in that output, but it’s nice to see this approach being taken a big step further than I have.

OS Code-Point Polygons

The Ordnance Survey, every civic tech developer’s favourite vampire squid, do sell postcode polygons that are presumably as good as you can get. (It sounds from the Geolytix blog post above that they are generated as the Voronoi diagram of AddressBase points, which is what I’d ideally like to do myself – i.e. just run this script on all the addresses in AddressBase.) You can see that here:

https://www.ordnancesurvey.co.uk/business-and-government/products/code-point-with-polygons.html

Naturally, because it’s the Ordnance Survey, this product is also very expensive and has crappy licensing terms.

What’s next for this project?

I’m not sure if I’ll do any more on this, since it’s been much more of a time sink than I’d hoped already, but if I were to take it further then things I have in mind are:

  • Get a better coastline polygon for the UK. The OSM administrative boundary for the UK was very convenient to use, but because it extends out into the sea some distance, it makes areas around the coast bulge out and that means that you can’t really compare the areas of postcode shapes, which is one of the things I was interested in. You could create a better polygon as a union of some of the shapes in Boundary-Line, for example.
  • Adding other sources of postcode points, e.g. OpenAddresses – although the project is in hibernation, I hoped I’d be able to use the addresses they’d already collected, but the bulk downloads don’t seem to include coordinates. I might be missing something obvious, but I haven’t heard back from emailing them about that.
  • It would be nice to make a Leaflet-based viewer for scrolling around the country and exploring the data. (For viewing multiple areas I’ve been loading them into QGIS, but it would be nice to have a web-based alternative.)

Source code and recreating the data

Source code

You can clone the source for this project from:

https://github.com/mhl/postcodes-mapit

If you want to generate this data yourself you can follow the instructions in the README.