A fairly common feature of modern websites is coordinate-based graphics. The most ubiquitous of these are slippy maps (a technology that lets you zoom and pan around a set of map tiles that "slip" smoothly into view), such as Google Maps, OpenStreetMap and maps built with Leaflet. But for dealing with large data sets at a zoomed-out level, graphical projections of land masses are often the preferred approach. Enter shapefiles, geoJSON and topoJSON files, and their manipulation.

Go directly to the beautiful result, just to see!

What is the goal?

Virtually every country in the world, together with its states, provinces, districts, counties, municipalities and wards, has been measured into a coordinate space by diligent geo-specialists using all sorts of survey tools. We don't have to do that. These coordinate spaces and their boundaries have been stored in files for later reuse by many different GIS programs.

A popular format for storing this hard-won information is as a shapefile. The shapefile format is a geospatial vector data format widely used by geographic software. The format describes vector features like points, lines and polygons, which in turn represent geographic entities like the ones measured above. These can be geographic (lakes and forests), political (countries and war zones) or administrative (municipalities in South Africa, for example, the subject of our discussion today).

While shapefiles contain tons of information, they are bulky and encoded in a binary format that is all but impossible to manipulate without specialized GIS software, which we don't want to use.

We must convert this binary encoding into something much more usable on a web page, something we can see, edit, join, and generally manipulate, like JSON (JavaScript Object Notation). So let's get started: convert a shapefile to the topoJSON format.

Shapefiles for South Africa can be downloaded from several sources. I found the most comprehensive single-file download at The Humanitarian Data Exchange. Look for a .zip file containing shapefiles (I found one called 'zaf_adm_2016SADB_OCHA_SHP.zip', but it could change without notice, I'd expect). Extract the archive to a place on your computer. There will be a ton of shape and other supporting files. For the purpose of this exercise, I used "zaf_admbnda_adm2_2016SADB_OCHA.shp", South Africa's political districts (Admin level 2).

Grab the tools

We're going to be using a set of command-line tools developed by Mike Bostock which run on Node.js. So, first things first, get Node set up on your system. I use a Debian Linux installation under WSL, since my main dev machine runs Windows 10. Why not just use the Windows PowerShell command line? I ran into errors, and went with an error-free Linux workflow instead...

We'll be using four different command-line tools, each adding some value in the conversion chain:

  • shp2json - converts shapefiles to geoJSON.
  • ndjson-map - manipulates geoJSON properties.
  • geostitch - removes antimeridian and polar cuts, and converts straight Cartesian line segments to geodesic segments.
  • geo2topo - converts geoJSON to topoJSON, our final input.

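Install them onto your Linux subsystem all at once with npm. Note that geostitch ships with the d3-geo-projection package, so it is included here:

                        
sudo npm install --global topojson shapefile topojson-client ndjson-cli topojson-simplify d3-geo-projection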
                    

1 - Convert shapefiles to geoJSON

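Open your favorite Linux shell and navigate to the directory that you'd like to work from. The command below reads from a source directory and writes to a build directory, so create build first if it does not exist. Then execute the shp2json command: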

                        
sudo shp2json -n source/zaf_admbnda_adm2_2016SADB_OCHA.shp > build/districts.geo.json
                    

The output file has increased in size dramatically to 29,632KB from its shapefile parent of 11,821KB, but now looks much more readable. It is also newline-delimited:

                        
{"type":"Feature","properties":{"ADM0_PCODE":"ZA","ADM0_EN":"South Africa","ADM1_PCODE":"ZA1","ADM1_ID":"WC","ADM1_EN":"Western Cape","ADM1_TYPE":"Province","ADM2_PCODE":"ZA101","ADM2_ID":"DC1","ADM2_EN":"West Coast","ADM2_TYPE":"District Municipality"},"geometry":{"type":"MultiPolygon","coordinates":[[[[18.0742700022589,-33.4126199998131], ...
{"type":"Feature","properties":{"ADM0_PCODE":"ZA","ADM0_EN":"South Africa","ADM1_PCODE":"ZA1","ADM1_ID":"WC","ADM1_EN":"Western Cape","ADM1_TYPE":"Province","ADM2_PCODE":"ZA102","ADM2_ID":"DC2","ADM2_EN":"Cape Winelands","ADM2_TYPE":"District Municipality"},"geometry":{"type":"Polygon","coordinates":[[[20.16649000261153,-32.22706999768216], ...
{"type":"Feature","properties":{"ADM0_PCODE":"ZA","ADM0_EN":"South Africa","ADM1_PCODE":"ZA1","ADM1_ID":"WC","ADM1_EN":"Western Cape","ADM1_TYPE":"Province","ADM2_PCODE":"ZA103","ADM2_ID":"DC3","ADM2_EN":"Overberg","ADM2_TYPE":"District Municipality"},"geometry":{"type":"MultiPolygon","coordinates":[[[[19.418070002328783,-34.68667999798353], ...
{"type":"Feature","properties":{"ADM0_PCODE":"ZA","ADM0_EN":"South Africa","ADM1_PCODE":"ZA1","ADM1_ID":"WC","ADM1_EN":"Western Cape","ADM1_TYPE":"Province","ADM2_PCODE":"ZA104","ADM2_ID":"DC4","ADM2_EN":"Eden","ADM2_TYPE":"District Municipality"},"geometry":{"type":"Polygon","coordinates":[[[22.391378001093923,-33.36413799707089], ...
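
Newline-delimited simply means that every line is a complete JSON document on its own, which is what lets the later tools process one feature at a time. As a rough sketch of the idea (not part of the toolchain), here is how such a file could be read in plain JavaScript. The function name parseNdjson is my own, and the two sample lines are shortened stand-ins with placeholder point geometries, not real output:

```javascript
// Parse newline-delimited geoJSON: one complete Feature per line.
function parseNdjson(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "") // ignore blank lines, e.g. after a trailing newline
    .map((line) => JSON.parse(line));
}

// Shortened stand-in lines in the same shape as the real output (placeholder geometries).
const sampleNdjson = [
  '{"type":"Feature","properties":{"ADM2_ID":"DC1","ADM2_EN":"West Coast"},"geometry":{"type":"Point","coordinates":[18.07,-33.41]}}',
  '{"type":"Feature","properties":{"ADM2_ID":"DC2","ADM2_EN":"Cape Winelands"},"geometry":{"type":"Point","coordinates":[20.17,-32.23]}}',
].join("\n") + "\n";

const parsedFeatures = parseNdjson(sampleNdjson);
console.log(parsedFeatures.length); // 2
console.log(parsedFeatures.map((f) => f.properties.ADM2_EN));
```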
                    

2 - Manipulate feature properties

Using the newly created geoJSON file as input, run ndjson-map, which evaluates a JavaScript expression against every line of a newline-delimited file. Use it to create a new property called "id" (or anything you would like), with a value built from existing properties. Then delete all the superfluous properties, of which there are many:

                        
sudo ndjson-map 'd.id={ cnt: d.properties.ADM0_PCODE, prv: d.properties.ADM1_ID, dis: d.properties.ADM2_ID }, delete d.properties, d' < build/districts.geo.json > build/districts.geo.map.json
                    

The output file has shrunk only slightly, to 29,622KB, as a result of the removal of unnecessary information, and now has only an "id" property tagged onto the back of each feature. All other properties have been deleted. Otherwise it's identical to its input file.
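
If the one-line expression is hard to read, it may help to see it written out as an ordinary function. This is only a sketch of what the expression does to each feature d; the function name reshapeFeature and the sample feature (with a placeholder point geometry) are mine:

```javascript
// The ndjson-map expression, written out as a function applied to one feature.
function reshapeFeature(d) {
  d.id = {
    cnt: d.properties.ADM0_PCODE, // country
    prv: d.properties.ADM1_ID,    // province
    dis: d.properties.ADM2_ID,    // district
  };
  delete d.properties; // drop everything else
  return d;            // the last value of the expression becomes the output line
}

// Sample feature with a placeholder geometry, not real data.
const sampleFeature = {
  type: "Feature",
  properties: { ADM0_PCODE: "ZA", ADM1_ID: "WC", ADM2_ID: "DC1", ADM2_EN: "West Coast" },
  geometry: { type: "Point", coordinates: [18.07, -33.41] },
};

const reshaped = reshapeFeature(sampleFeature);
console.log(JSON.stringify(reshaped.id)); // {"cnt":"ZA","prv":"WC","dis":"DC1"}
console.log(Object.keys(reshaped));       // [ 'type', 'geometry', 'id' ]
```

Because "id" is added after the existing keys and "properties" is then removed, "id" ends up last in each output line.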

3 - Antimeridian Cutting

Ever tried laying a basketball out flat on the ground, with the inside down, and the outside up? Well, you're going to have to cut the ball somewhere, while still leaving it in one piece, and then you'll have to stretch the edges of all the bumpy bits so that they're flat. Tricky exercise...

The same applies when coordinates plotted for a spherical world are to be mapped to a flat web page. We must do some cutting, stretching and stitching. Thankfully, Mike Bostock has provided us with a tool equal to the job. It's called geostitch and it's executed like so:

                        
sudo geostitch -n < build/districts.geo.map.json > build/districts.geo.prj.json
                    

The output from this command has not visibly changed much, and neither has the file size. What geostitch does is remove any antimeridian and polar cuts from the geometry and replace straight Cartesian line segments with geodesic ones, which prepares the data for different projections later on.

4 - Convert to topoJSON

And then finally, convert our newly created geoJSON, with its limited properties, into the final input required for mapping in a browser: topoJSON. Again, the command-line toolkit provides us with the necessary command, fittingly called geo2topo. Execute it with a quantization argument, -q, and tell it that it's dealing with a newline-delimited file with -n.

                        
sudo geo2topo -q 1e5 -n districts=build/districts.geo.prj.json > build/districts.topo.json
                    

The conversion of geoJSON to topoJSON has a huge effect on file size: from 29,622KB to just 2,935KB, about 10% of the input. The JSON is also well organized, and even more readable. But where did the file size go?

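Notice how the coordinates have been removed entirely from the polygons under "geometries" and moved into a single top-level "arcs" array. Each polygon now describes its boundary as a list of indexes into that array. An arc is a unique run of coordinates that can be reused! The border of America with Canada is the same shape as the border of Canada with America, and so on. In geoJSON those coordinates are stored twice; topoJSON stores the arc once and lets both neighbors refer to it. A negative index means the arc is walked in reverse: in the listing below, the -8 in the second district is the same arc as the 7 in the first district, traversed backwards.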
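
To make the sharing concrete, here is a minimal sketch of how a ring of coordinates can be rebuilt from arc indexes. It uses made-up toy arcs (two unit squares that share one edge) rather than the real file, and the function name ringFromArcs is my own. In practice topojson-client does this work for you:

```javascript
// Toy arcs with absolute coordinates: two unit squares sharing the edge from (1,0) to (1,1).
const toyArcs = [
  [[1, 0], [1, 1]],                 // arc 0: the shared edge
  [[1, 1], [0, 1], [0, 0], [1, 0]], // arc 1: the rest of the left square
  [[1, 0], [2, 0], [2, 1], [1, 1]], // arc 2: the rest of the right square
];

// Rebuild one polygon ring from a list of arc indexes.
function ringFromArcs(arcIndexes, arcs) {
  const ring = [];
  for (const index of arcIndexes) {
    // A negative index i refers to arc ~i (so -1 is arc 0), traversed in reverse.
    const arc = index < 0 ? arcs[~index].slice().reverse() : arcs[index];
    // After the first arc, skip each arc's first point: it repeats the previous arc's last point.
    const start = ring.length > 0 ? 1 : 0;
    for (let i = start; i < arc.length; i++) ring.push(arc[i]);
  }
  return ring;
}

const leftSquare = ringFromArcs([0, 1], toyArcs);   // uses the shared edge as stored
const rightSquare = ringFromArcs([-1, 2], toyArcs); // uses the same edge, reversed
console.log(JSON.stringify(leftSquare));  // [[1,0],[1,1],[0,1],[0,0],[1,0]]
console.log(JSON.stringify(rightSquare)); // [[1,1],[1,0],[2,0],[2,1],[1,1]]
```

The shared edge is stored once but appears in both rings, which is where much of the saving comes from.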

Also, the "arcs" coordinates are simple integers, not typical coordinates with many decimal places. They have been "quantized": the -q 1e5 argument snaps every coordinate onto an integer grid of 100,000 by 100,000 positions spanning the bounding box. The "transform" property holds the scale and translation that map grid positions back to approximate real-life coordinates. Some loss occurs. Each arc also stores only its first point as an absolute grid position; every later point is an offset from the one before it, which keeps the numbers small. Note that geo2topo does not simplify the geometry itself. Reducing the number of points required to describe a polygon is a separate step, done with toposimplify, which is not used here.
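
As a sanity check on the above, here is a small sketch that turns quantized arc points back into longitude and latitude. The transform, bounding box and the three arc points are copied from the file shown below; the function name decodeArc is my own, and topojson-client normally handles this for you:

```javascript
// Values copied from the "transform" and "bbox" properties of the topoJSON file.
const fileTransform = {
  scale: [0.00016493259877074882, 0.000127092670382225],
  translate: [16.451890000423244, -34.83417000334296],
};
const fileBbox = [16.451890000423244, -34.83417000334296, 32.94498494489935, -22.125030057790845];

// Turn one quantized arc into [longitude, latitude] pairs.
function decodeArc(arc, t) {
  let x = 0;
  let y = 0;
  return arc.map(([dx, dy]) => {
    x += dx; // a running sum turns the offsets back into grid positions
    y += dy;
    return [x * t.scale[0] + t.translate[0], y * t.scale[1] + t.translate[1]];
  });
}

// The first three points of the first arc in the file.
const decodedPoints = decodeArc([[9837, 11185], [1, 0], [1, -1]], fileTransform);
console.log(decodedPoints[0].map((v) => v.toFixed(4))); // [ '18.0743', '-33.4126' ]

// With -q 1e5 there are 100000 grid positions per axis, so one step is (max - min) / (q - 1).
const gridSize = 1e5;
const stepX = (fileBbox[2] - fileBbox[0]) / (gridSize - 1);
console.log(stepX, fileTransform.scale[0]); // the two values agree closely
```

The decoded first point lands very close to, but not exactly on, the first coordinate of the West Coast district in the geoJSON file (18.07427, -33.41262). That small difference is the loss introduced by quantization.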

The topoJSON file looks something like this:

                        
{
    "type": "Topology",
    "objects": {
        "districts": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "MultiPolygon","arcs": [[[0]],[[1]],[[2]],[[3]],[[4]],[[5]],[[6]],[[7, 8, 9, 10]]],"id": {"cnt": "ZA","prv": "WC","dis": "DC1"}},
                {"type": "Polygon","arcs": [[11, 12, 13, 14, 15, 16, 17, 18, -8, 19]],"id": {"cnt": "ZA","prv": "WC","dis": "DC2"}},
                {"type": "MultiPolygon","arcs": [[[20]],[[21]],[[22, 23, 24, 25, -16, 14, -14, 26]]],"id": {"cnt": "ZA","prv": "WC","dis": "DC3"}},
                {"type": "Polygon","arcs": [[27, 28, -24, 22, -27, -13, 29]],"id": {"cnt": "ZA","prv": "WC","dis": "DC4"}},
                { ... }]
        }
    },
    "arcs": [[[9837, 11185],[1, 0],[1, -1], ... [2, 3],[1, 3],[2, 2]]],
    "bbox": [16.451890000423244, -34.83417000334296, 32.94498494489935, -22.125030057790845],
    "transform": {"scale": [0.00016493259877074882, 0.000127092670382225],"translate": [16.451890000423244, -34.83417000334296]}
}
                    

5 - Let's draw a map!

We've built a fine topoJSON input file that is eminently readable and brings a tear to every JSON-lover's eye. So bully, what now? I can wrap it in a ball, toss it from server to client, open it....

Or, I can give it to d3's superb join algorithm to split out the texty paths in the file, generate discrete svg paths, locate them in their correct coordinate space, style and attribute them, and sit back to watch a new graphical map in the making. Add this code to a <script> tag at the bottom of the HTML body (or turn it into a reusable closure), after the d3 and topojson-client libraries have been loaded. The page also needs an element with the id "zamap" to hold the map. The script will execute as soon as the browser gets there, fetch your new topoJSON file, and do its magic.

                        
<script>
    "use strict";
    // Assumes d3 (v5 or later) and topojson-client are loaded, and that an element with id "zamap" exists.
    d3.json("/resources/districts.topo.json").then(function(mapData) {
        // One color per province; an ordinal scale builds its domain as new keys are looked up.
        let cScale = d3.scaleOrdinal(d3.schemeCategory10);
        // Map the container width to a projection scale so the country fills the container.
        let zafScale = d3.scaleLinear().domain([0,2000]).range([0,8228]);
        let vWidth = d3.select("#zamap").node().offsetWidth;
        let vHeight = vWidth * 0.93;
        let vGeopathEngine = d3.geoPath()
            .projection(d3.geoNaturalEarth1()
                .center([24.8, -28.5])
                .translate([vWidth / 2, vHeight / 2])
                .scale(zafScale(vWidth)));
        let svg = d3.select("#zamap")
            .append("svg")
            .attr("width", vWidth)
            .attr("height", vHeight);
        svg.append("g")
            .attr("class", "country")
            .selectAll("path")
            // Convert the topoJSON object back into an array of geoJSON features.
            .data(topojson.feature(mapData, mapData.objects.districts).features)
            .enter()
            .append("path")
            // d.id is an object, so build a string for the element id.
            .attr("id", function(d) { return [d.id.cnt, d.id.prv, d.id.dis].join("-"); })
            .style("fill", function(d) { return cScale(d.id.prv); })
            .style("stroke", "#fff")
            .style("stroke-width", 0.3)
            .attr("class", "locmuni")
            .attr("d", vGeopathEngine)
            .on("mouseover", function() {
                // Dim every district, then bring the hovered one to the front and outline it.
                d3.selectAll("path").attr("opacity", 0.4);
                this.parentNode.appendChild(this);
                d3.select(this)
                    .attr("opacity", 1)
                    .style("stroke", "#000");
            })
            .on("mouseout", function() {
                d3.selectAll("path")
                    .attr("opacity", 1);
                d3.select(this)
                    .style("stroke", "#fff");
            })
            .each(function(d) {
                // Keep only the id as each path's datum so the full geometry can be released.
                d3.select(this).datum(d.id);
            });
    });
</script>
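
A note on zafScale in the script: it is nothing more than a straight-line mapping from the container width to the projection scale, so the country fills its container whatever the page width. As a rough sketch of the arithmetic only, here is the same mapping without d3. The helper name linearScale is my own, and like d3's default linear scale it does not clamp values outside the domain:

```javascript
// A plain-JavaScript equivalent of d3.scaleLinear().domain([d0, d1]).range([r0, r1]).
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Same domain and range as zafScale in the script above.
const zafScaleSketch = linearScale([0, 2000], [0, 8228]);

console.log(zafScaleSketch(1000)); // 4114
console.log(zafScaleSketch(500));  // 2057
```

So a container 1000 pixels wide gets a projection scale of 4114, and one half that width gets half the scale.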
                    

And voilà! We have a pretty map of South Africa's political district demarcation (the so-called Admin Level 2 demarcation). Use it for choropleths of voting trends, crime statistics or population density, for bubble overlays... there is no end to the fun we can have with interactive maps.

South Africa - Administrative areas 2: Districts

What makes the map interactive? The fact that each map element, a district in this case, is an svg path and a first-class citizen of the DOM. It can be individually styled and attributed, and it can react to events. In this example, I have elected to highlight a district whenever the mouse hovers over it by changing its stroke color, bringing it to the front, and making the other districts partially transparent.