In October 2020 there was big news for postcode geography geeks like me: the ONS added a postcode column to their NSUL dataset, which gave us for the first time a coordinate and postcode for each property in Great Britain as open data. I’d previously tried to generate polygons for the boundary of each postcode based on the single coordinate per postcode found in ONSPD, but the results were very approximate. With this new data I could use a very similar technique, but this time with coordinates for all the postal addresses in each postcode to get much more accurate postcode unit polygons.
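The caveats further down say that this technique is a Voronoi diagram. Every location is assigned to its nearest UPRN, and the cells of all the UPRNs that share a postcode together make up that postcode's polygon. Below is a minimal sketch of the idea using only the standard library. It is a discrete Voronoi diagram: it labels the centres of a regular grid instead of computing exact polygon edges. Several things are my own assumptions for illustration: the function name, the (x, y, postcode) input format, the use of planar easting/northing coordinates, and the brute-force nearest-point search. This is not the code that generated the dataset.

```python
from collections import defaultdict
from math import dist


def rasterised_postcode_cells(points, x_range, y_range, step):
    """Approximate postcode regions with a discrete Voronoi diagram.

    points:  list of (x, y, postcode) tuples, one per UPRN, in planar
             coordinates (e.g. British National Grid metres; assumed).
    x_range, y_range: (min, max) extent of the grid to label.
    step:    grid cell size, in the same units as the coordinates.

    Returns {postcode: set of (x, y) grid-cell centres}, where each centre
    is assigned the postcode of the nearest input point.
    """
    cells = defaultdict(set)
    x_min, x_max = x_range
    y_min, y_max = y_range
    columns = int((x_max - x_min) / step)
    rows = int((y_max - y_min) / step)
    for i in range(columns):
        for j in range(rows):
            centre = (x_min + (i + 0.5) * step, y_min + (j + 0.5) * step)
            # Brute-force nearest neighbour: fine for a toy example, but a
            # spatial index would be needed for millions of UPRNs.
            _, _, postcode = min(points, key=lambda p: dist(centre, (p[0], p[1])))
            cells[postcode].add(centre)
    return dict(cells)


# Made-up postcodes: two UPRNs in one postcode, one in another.
example_points = [(0, 0, "XX1 1AA"), (2, 0, "XX1 1AA"), (10, 0, "XX1 1AB")]
example_cells = rasterised_postcode_cells(example_points, (0, 11), (0, 2), 1)
# len(example_cells["XX1 1AA"]) == 12 (x = 0.5 to 5.5), len(example_cells["XX1 1AB"]) == 10
```

The union of each postcode's grid cells approximates its Voronoi-based polygon. A finer `step` gives smoother edges at a quadratic cost.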
This turned out to be quite a lot of work, but the results are pretty satisfying, I think. The two biggest problems with the end results are:
- NSUL only covers Great Britain; I don’t know of any similar dataset for Northern Ireland.
- NSUL doesn’t just contain UPRNs that represent deliverable postal addresses; it also has coordinates and postcodes for things like whole streets, pollution monitoring devices, verges and also historic UPRNs and so on. Since I can’t filter these out with any open data that I know of, these points distort some of the boundaries. See the section “How does this compare to ‘Code-Point with Polygons’?” below.
However, as far as I know these are the best open data boundaries of GB postcode units available for free, and they could still be improved by some work on excluding UPRNs that aren’t current postal addresses. (See “What could be improved?” below for some suggestions.)
What is this dataset useful for?
I suppose that the main use-case for these boundaries is that you have some data where values are associated with postcodes, and you want to visualize them on a map.
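As a sketch of that use-case, here is one way to attach your own per-postcode values to the features in one of the GeoJSON files, so that a tool like QGIS can colour them. I'm assuming that each feature stores its postcode in a property called "postcode", so check the real files for the actual property name. Postcodes in other data are often formatted inconsistently. For that reason, both sides are first normalised to the usual "outward code, space, inward code" form, which works because the inward code is always the last three characters.

```python
import json


def normalise_postcode(postcode):
    """'sw21ep' or ' SW2  1EP' -> 'SW2 1EP' (the inward code is the last 3 characters)."""
    compact = "".join(postcode.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}"


def attach_values(geojson_text, values, key="postcode", prop="value"):
    """Copy values[postcode] into each matching feature's properties.

    key:  name of the feature property holding the postcode (an assumption).
    prop: name of the new property to write.
    Returns (feature_collection, postcodes_with_no_matching_feature).
    """
    wanted = {normalise_postcode(pc): v for pc, v in values.items()}
    unmatched = set(wanted)
    collection = json.loads(geojson_text)
    for feature in collection["features"]:
        postcode = feature["properties"].get(key)
        if postcode is None:
            continue
        postcode = normalise_postcode(postcode)
        if postcode in wanted:
            feature["properties"][prop] = wanted[postcode]
            unmatched.discard(postcode)
    return collection, unmatched
```

Checking the returned `unmatched` set is worthwhile. It lists postcodes in your data that have no polygon, for example Northern Ireland postcodes or ones created after the NSUL release.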
For me personally, it also just sates my curiosity about the notional shapes of postcodes, e.g. where they just run down one side of a street, how new developments break up existing patterns, etc. etc.
Get the data
You can download GeoJSON for the postcode unit, sector, district and area polygons from here:
Download the data as GeoJSON files (approx. 1GB)
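The four levels nest: the unit SW2 1EP sits in sector SW2 1, district SW2 and area SW. Each level is a prefix of the unit postcode, so one way to build the coarser polygons is to group unit polygons by that prefix and dissolve (union) each group. I'm assuming, not claiming, that this is how the sector, district and area files relate to the unit file. The sketch below only handles the string side of that grouping. The regex is a simplified pattern of my own for ordinary postcodes, and it ignores special cases like GIR 0AA.

```python
import re
from collections import defaultdict

# Simplified pattern: area letters, district digit(s) with optional letter,
# optional space, sector digit, two unit letters.
POSTCODE_RE = re.compile(r"^([A-Z]{1,2})(\d[A-Z\d]?) ?(\d)([A-Z]{2})$")


def postcode_levels(postcode):
    """Split a unit postcode into its area, district, sector and unit."""
    match = POSTCODE_RE.match(postcode.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised postcode: {postcode!r}")
    area, district_suffix, sector_digit, unit_letters = match.groups()
    district = area + district_suffix
    sector = f"{district} {sector_digit}"
    return {
        "area": area,
        "district": district,
        "sector": sector,
        "unit": sector + unit_letters,
    }


def group_units(unit_postcodes, level):
    """Group unit postcodes by 'area', 'district' or 'sector', ready for dissolving."""
    groups = defaultdict(list)
    for unit in unit_postcodes:
        groups[postcode_levels(unit)[level]].append(unit)
    return dict(groups)
```

For example, `group_units(["SW2 1EP", "SW2 1BX", "CH4 0DF"], "district")` puts the two SW2 units together and CH4 0DF on its own.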
In case you just want to explore the data a bit, you can search for a postcode or location in this MapIt instance and click through to see the postcode unit, sector, district or area boundaries. Please don’t rely on this MapIt as an API, since it costs a not insignificant amount to run, and unless there’s a lot of interest in this I’m likely to take it down at some point. If you are interested in using this on an ongoing basis, please get in touch with me.
I’m publishing this partly because it’s been a fun project and I find the results very pleasing, and partly to gauge interest in this dataset. It’s worth making clear that I’m not committing to making regular releases as NSUL is updated. It’s rather expensive in EC2 time to regenerate this data, and I’m not sure that an individual with very little free time is the right person to be doing that anyway.
If you would be interested in getting regular updates to this data (or working on improving it) or you represent an institution that might have the resources and longevity to take over maintenance of this, please do get in touch with me.
- This data is only for Great Britain, not Northern Ireland. As far as I know there is no equivalent dataset to NSUL to enable me to generate postcode polygons of similar accuracy to the rest of this dataset for Northern Ireland. If you know of one, please let me know! The crucial thing is to have the coordinates and postcode for each UPRN in Northern Ireland. NSUL provides this for Great Britain but not for Northern Ireland.
- You’ll find that some of the postcode polygons overlap. There are broadly two cases where this happens:
  - Genuine “vertical streets”, like the DVLA building in Swansea, where there are multiple postcodes for a single building.
  - Some UPRNs in the NSUL dataset represent historic addresses, where a building previously had a different postcode. Since I can’t filter out these historic UPRNs with open data, you get cases like the one shown below, where SW2 1EP and SW2 1BX overlap because there’s a historic UPRN for a building with postcode SW2 1BX that was knocked down and replaced by a block of flats, each of which has the postcode SW2 1EP.
- Because of memory constraints, I couldn’t calculate the Voronoi diagram for every point in NSUL in one pass, so I calculated it for each region of Great Britain separately. This means that the postcode units that cross from one region to another (e.g. TD9 0TU or CH4 0DF) aren’t as neat as they might be.
- This data is based on the October 2020 release of NSUL, which is fairly old now.
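My assumption is that the overlapping polygons described above come from points that share (or nearly share) a location but carry different postcodes. A Voronoi diagram can't separate coincident points, so the resulting polygons overlap. A cheap way to list such locations is to group points by rounded coordinate. The function name, input format and rounding tolerance are my own choices, and the postcodes are made up.

```python
from collections import defaultdict


def shared_coordinates(points, precision=1):
    """Return {(x, y): sorted postcodes} for locations used by more than one postcode.

    points: iterable of (x, y, postcode) tuples. Coordinates are rounded to
    `precision` decimal places before grouping (an assumed tolerance).
    """
    by_location = defaultdict(set)
    for x, y, postcode in points:
        by_location[(round(x, precision), round(y, precision))].add(postcode)
    return {loc: sorted(pcs) for loc, pcs in by_location.items() if len(pcs) > 1}
```

This won't tell you which of the two cases a location belongs to, but it gives a short list to check by eye.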
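The per-region computation offers one plausible reason why the cross-region postcodes come out less neatly. If each region's Voronoi diagram only sees that region's points, a location near a region border gets assigned to the nearest point in its own region, even when a closer point exists just across the border. The sketch below shows that effect, using a nearest-point lookup as a stand-in for the full Voronoi computation. It is my illustration, not the actual pipeline, and all names are hypothetical. Because the function takes a `map_fn` argument, `multiprocessing.Pool.map` could be passed in to process regions in parallel. That works because `label_region` is a top-level function and so can be pickled.

```python
from math import dist


def nearest_postcode(location, points):
    """Postcode of the point in `points` (x, y, postcode) closest to `location`."""
    return min(points, key=lambda p: dist(location, (p[0], p[1])))[2]


def label_region(task):
    """Label each location using only its own region's points."""
    region_points, locations = task
    return [(location, nearest_postcode(location, region_points)) for location in locations]


def label_by_region(points_by_region, locations_by_region, map_fn=map):
    """Process regions independently; map_fn could be multiprocessing.Pool.map."""
    tasks = [
        (points_by_region[region], locations)
        for region, locations in locations_by_region.items()
    ]
    labels = {}
    for region_labels in map_fn(label_region, tasks):
        labels.update(region_labels)
    return labels


# Made-up example: the border between regions A and B is at x = 5.
points_by_region = {"A": [(4.0, 0.0, "XX1 1AA")], "B": [(10.0, 0.0, "XX1 1AB")]}
locations_by_region = {"A": [(1.0, 0.0)], "B": [(5.5, 0.0)]}
per_region = label_by_region(points_by_region, locations_by_region)
everything = points_by_region["A"] + points_by_region["B"]
# Per region, (5.5, 0) only sees region B's point and is labelled XX1 1AB,
# even though the nearest point overall (at x = 4) belongs to XX1 1AA.
```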
How does this compare to ‘Code-Point with Polygons’?
This dataset is similar to the Ordnance Survey’s Code-Point with Polygons product, which is very much not Open Data. (I’m not even sure how much it costs because their pricing is so opaque.)
Going by the description of Code-Point with Polygons on the Ordnance Survey’s website, it sounds like it’s generated with a very similar method to the one I’m using, but unlike me they are able to filter out all the points that don’t represent current deliverable postal addresses, which means that their polygons are higher quality.
We can look at a small example of how the differences between these datasets arise, since the user guide for Code-Point with Polygons contains some images showing the postal addresses used as the basis of the data. Here’s a small part of Portsmouth from that document:
Here’s the same region from the data I’ve generated from NSUL, with a few of the extraneous points labelled:
Comparing the points in these two images shows how many more there are in the NSUL dataset than in the one that the OS are using. I’ve labelled a few of the extra points in the latter image to show what kinds of things they represent. (I discovered what these are by looking them up in FindMyAddress, but that site’s licence agreement is so restrictive that it’s not useful for anything more than an occasional lookup.)
Hopefully this example makes clear the kind of differences you should expect to see between those datasets. In addition to the extra points I can’t filter out, it also seems that the OS polygons have been snapped to the main road. This is something that one theoretically could do with open data as well. (See “What could be improved?” below.)
So, in summary, if you want or need the best quality postcode polygons for Great Britain, you should buy Code-Point with Polygons. I suspect that for a fair proportion of use cases, though, the data I’ve generated might be good enough, and it is free to use under the Open Government Licence.
What could be improved?
It’s rather unlikely that I’ll have time to do any of these any time soon, but here are some ideas for how to improve this data:
- Robin Houston pointed out to me that Output Area boundaries are open data, and each Output Area is a union of a few postcode units. (The mapping from postcode unit to Output Area is readily available.) This means that you could filter out some proportion of irrelevant UPRNs by looking for coordinates that fall outside the Output Area that they should lie in and adding them to an exclude-list. I think this would be worth doing at some point, although you would have to be careful how you did it – e.g. there are plenty of postcodes that didn’t exist at the time of the last census, so you don’t want to accidentally exclude all of the points from those postcodes! Possibly the best time to do this would be when the Output Areas for the 2021 census are published, which is likely to be in April / May 2022.
- One could pretty easily set up a tool to crowdsource finding points that are clearly outside a postcode. In many cases it is very obvious just by overlaying the positions of UPRNs over OpenStreetMap tiles that some are in the middle of a road, or miles away from the rest of the points in that postcode. You’d have to make it very clear to contributors that they have to do this by local knowledge or common sense, rather than referring to any AddressBase-derived data set, but I think something like this could work pretty well.
- As I mentioned elsewhere in this post, lots of postcode unit boundaries divide one side of a road from another or appear to run down a river, say, so I can conceive that you could write some code to try to adjust the boundaries of these polygons to features like roads in OpenStreetMap, so long as moving that boundary wouldn’t include points not previously in the postcode boundary. I’m not sure where I’d start with this, though.
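Here's a rough sketch of the Output Area idea from the first bullet. It looks up the Output Area that each UPRN's postcode should be in, and flags the UPRN if its coordinate falls outside that Output Area's boundary. Postcodes missing from the lookup, such as ones created since the census, are skipped rather than flagged, to avoid the problem mentioned there. The input formats and function names are hypothetical. The ray-casting point-in-polygon test also assumes a single simple ring with no holes. Real Output Area boundaries can be multipolygons, and for those a library like Shapely (or a query in PostGIS) would be a better fit.

```python
def point_in_polygon(x, y, ring):
    """Ray-casting test; ring is a list of (x, y) vertices of a simple polygon."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def uprns_to_exclude(uprns, postcode_to_oa, oa_boundaries):
    """Return the UPRNs that lie outside their postcode's Output Area.

    uprns:          iterable of (uprn, x, y, postcode)
    postcode_to_oa: {postcode: output_area_code}
    oa_boundaries:  {output_area_code: list of (x, y) vertices}
    """
    exclude = []
    for uprn, x, y, postcode in uprns:
        oa = postcode_to_oa.get(postcode)
        if oa is None or oa not in oa_boundaries:
            continue  # e.g. a postcode newer than the census: don't judge it
        if not point_in_polygon(x, y, oa_boundaries[oa]):
            exclude.append(uprn)
    return exclude
```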
One final point: I suspect this won’t be directly helpful, but it’s worth following this FOI request from Robert Whittaker, which asks for a list of historic and parent UPRNs; these are a big chunk of the UPRNs I’d want to filter out of NSUL for this project. Even if the request eventually succeeds, the result wouldn’t be usable in a project like this: my aim is to create open data, and a list released under FOI wouldn’t be usable as open data. However, it’s interesting to read the refusals as a clear demonstration of how broken the Ordnance Survey’s incentives and obligations are.
- When you run out of inodes on your filesystem, that’s a clear sign that you should be using a database instead of the filesystem.
- Python’s multiprocessing package, and the Pool class in particular, are pretty easy to use and very effective.
- As ever, PostgreSQL is just an amazing bit of software. (Also PostGIS and GeoDjango.)
- QGIS is so powerful, and so fun to use!
- AWS billing alerts are harder to set up than they should be, but it’s very important to do that for personal projects where you need a lot of CPU – accidentally leaving a powerful EC2 instance running can cost you a lot.
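On the inodes point: the post doesn't say what the files contained, so the following is an illustration under assumptions. If intermediate results are stored as one small file per postcode, a single table keyed by postcode avoids the inode problem. The post mentions PostgreSQL, but to keep the sketch self-contained it uses SQLite from Python's standard library instead. The table layout and function names are hypothetical, and storing one GeoJSON geometry per postcode is my assumption.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a database with one row per postcode polygon."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS polygons ("
        " postcode TEXT PRIMARY KEY,"
        " geojson TEXT NOT NULL)"
    )
    return conn


def save_polygons(conn, polygons):
    """polygons: {postcode: GeoJSON geometry dict}. One row per postcode, no files."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO polygons (postcode, geojson) VALUES (?, ?)",
            ((pc, json.dumps(geom)) for pc, geom in polygons.items()),
        )


def load_polygon(conn, postcode):
    """Return the stored geometry for `postcode`, or None if there isn't one."""
    row = conn.execute(
        "SELECT geojson FROM polygons WHERE postcode = ?", (postcode,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```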
You may use this data under the Open Government Licence v3.0, and must include the following copyright notices:
- Contains OS data © Crown copyright and database right 2020
- Contains Royal Mail data © Royal Mail copyright and database right 2020
- Source: Office for National Statistics licensed under the Open Government Licence v.3.0
The code to generate this data can be found on GitHub.
You can find a list of changes in versions of this data on GitHub.