A fairly common feature of modern websites is the proliferation of coordinate-based graphics. The most ubiquitous of these are slippy maps (a technology that lets one zoom and pan around a set of map tiles that "slip" into view smoothly) like Google Maps and OpenStreetMap, or maps built with Leaflet. But for presenting large data sets at a zoomed-out level, graphical projections of land masses are often preferred. Enter shapefiles, GeoJSON and TopoJSON files, and their manipulation.

Go directly to the beautiful result, just to see!

What is the goal?

Virtually every country in the world, together with its states, provinces, districts, counties, municipalities and wards, has been measured into a coordinate space by diligent geo-specialists using all sorts of survey tools. We don't have to do that. These coordinate spaces and their boundaries have been stored in files for later reuse by many different GIS programs.

A popular format for storing this hard-won information is the shapefile. The shapefile format is a geospatial vector data format widely used by geographic software. It describes vector features like points, lines and polygons, which in turn represent geographic entities like the ones measured above. These can be geographic (lakes and forests), political (countries and war zones) or administrative (municipalities in South Africa, for example, the subject of our discussion today).
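To make the format a little less abstract, here is a minimal sketch in JavaScript (Node.js) that reads the fixed 100-byte header found at the start of a .shp file. The helper name readShpHeader and the synthetic buffer are assumptions made for illustration; a real run would pass in the bytes of an actual file, for instance from fs.readFileSync.

```javascript
// Sketch: parse the 100-byte main-file header of a .shp file.
// readShpHeader is a hypothetical helper, not part of any library.
function readShpHeader(buf) {
    if (buf.length < 100) throw new Error("a .shp header is 100 bytes long");
    return {
        fileCode: buf.readInt32BE(0),             // big-endian, always 9994
        fileLengthBytes: buf.readInt32BE(24) * 2, // stored as a count of 16-bit words
        version: buf.readInt32LE(28),             // little-endian, always 1000
        shapeType: buf.readInt32LE(32),           // 1 = point, 3 = polyline, 5 = polygon
        bbox: [
            buf.readDoubleLE(36), // xmin
            buf.readDoubleLE(44), // ymin
            buf.readDoubleLE(52), // xmax
            buf.readDoubleLE(60)  // ymax
        ]
    };
}

// Build a synthetic header so the sketch runs without a real file.
// The bounding box values are made up for the demonstration.
const demoHeader = Buffer.alloc(100);
demoHeader.writeInt32BE(9994, 0);
demoHeader.writeInt32BE(50, 24);   // 50 words = 100 bytes (header only)
demoHeader.writeInt32LE(1000, 28);
demoHeader.writeInt32LE(5, 32);    // polygon
demoHeader.writeDoubleLE(16.45, 36);
demoHeader.writeDoubleLE(-34.83, 44);
demoHeader.writeDoubleLE(32.94, 52);
demoHeader.writeDoubleLE(-22.13, 60);

const header = readShpHeader(demoHeader);
console.log(header);
```

The mix of big-endian and little-endian fields is one reason the raw bytes are so unfriendly to read by eye.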

While shapefiles contain tons of information, they are quite bulky, and all but impossible to manipulate without specialized GIS software, which we don't want to use. A sample from a shapefile looks like this:

                        
    Í  Ö    ÈP Öp»¢k3@S!åWAÀÖp»¢k3@ž¨êWAÀŁwJk3@þf¡þìWAÀ;ñA	k3@Á&zEðWAÀ7÷Ïk3@ ‡•ñWAÀœ·šnk3@ªïr8óWAÀ°ÁCwk3@ŸÙÎ÷WAÀHæÃ'k3@óà¯ûWAÀŁwJk3@óà¯ûWAÀóï…dk3@®áñmúWAÀ—RgÅþj3@íA½ûWAÀ—RgÅþj3@
    6XÿWAÀÉ Bk3@åܐOXAÀ;ñA	k3@œœi–XAÀK)Ø
    k3@Êd 1XAÀ£QÊàk3@­ŽèËXAÀÖp»¢k3@B†
    xXAÀ}î´k3@&}(
    XAÀóï…dk3@Ò#·XAÀ)+mÿj3@+OkXAÀmpÔÖúj3@JE¡fXAÀшùòj3@“Ü5Ã
    XAÀlKmZðj3@JE¡fXAÀ_1cîj3@áܱXAÀ%Ý^@öj3@nˆýXAÀ«\Îüj3@
    {“XAÀ«\Îüj3@ÑDÿÕXAÀy®5‡ùj3@¸’LÍXAÀÉLñ
    ïj3@…2¿}XAÀdŒÖkìj3@¸’LÍXAÀlKmZðj3@{R%"XAÀVY!¡ój3@õ,%XAÀûúËQòj3@¾¤1™*XAÀlKmZðj3@À«‹<,XAÀdŒÖkìj3@2à÷3.XAÀž8F6åj3@2à÷3.XAÀJgoïáj3@ójä,XAÀû‹ïŸàj3@´Ý”+XAÀö•˜¨Þj3@                        
                    

We must convert this binary encoding into something much more usable on a web page, something we can see, edit, join, and generally manipulate, like JSON (JavaScript Object Notation). So let's get started: convert a shapefile to the TopoJSON format.

Shapefiles for South Africa can be downloaded from several sources. I found the most comprehensive single-file download at The Humanitarian Data Exchange. Look for a .zip file containing shapefiles (I found one called 'zaf_adm_2016SADB_OCHA_SHP.zip', but it could change without notice, I'd expect). Extract the archive to a place on your computer. There will be a ton of shape and other supporting files. For the purpose of this exercise, I used

Grab the tools

We're going to be using a set of command-line tools developed by Mike Bostock which run on Node.js. So, first things first, get Node set up on your system. I use a Debian Linux installation under WSL, since my main dev machine runs Windows 10. Why not just use the Windows PowerShell command line? I ran into errors, and went with an error-free Linux workflow instead...

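We'll be using four different command-line tools, each adding some value in the conversion chain:

  • shp2json (from the shapefile package) - converts shapefiles to GeoJSON.
  • ndjson-map (from the ndjson-cli package) - transforms each line of a newline-delimited JSON file, used here to manipulate GeoJSON properties.
  • geostitch (from the d3-geo-projection package) - removes antimeridian and polar cuts, and converts straight Cartesian line segments to geodesic segments.
  • geo2topo (from the topojson package) - converts GeoJSON to TopoJSON, our final input.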
                        
sudo npm install --global topojson shapefile topojson-client ndjson-cli topojson-simplify d3-geo-projection
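
Before moving on, it may be worth confirming that the four tools actually landed on your PATH. The following is a rough Node.js sketch of such a check; findOnPath is a hypothetical helper written for this article, and it only looks for an accessible file of the right name in each PATH directory.

```javascript
const fs = require("fs");
const path = require("path");

// The four commands used in the conversion chain.
const tools = ["shp2json", "ndjson-map", "geostitch", "geo2topo"];

// Hypothetical helper: return the first PATH entry that contains `name`, or null.
function findOnPath(name, pathVar = process.env.PATH || "") {
    for (const dir of pathVar.split(path.delimiter)) {
        if (!dir) continue;
        const candidate = path.join(dir, name);
        try {
            fs.accessSync(candidate, fs.constants.X_OK);
            return candidate;
        } catch (err) {
            // Not in this directory, keep looking.
        }
    }
    return null;
}

const report = Object.fromEntries(tools.map((t) => [t, findOnPath(t)]));
for (const [tool, location] of Object.entries(report)) {
    console.log(tool.padEnd(12), location || "NOT FOUND");
}
```

Any tool reported as NOT FOUND points to a package that still needs installing, or to a global npm bin directory that is missing from PATH.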
                    

1 - Convert shapefiles to geoJSON

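Open your favorite Linux shell and navigate to the directory that you'd like to work from. The commands below read from a source directory and write to a build directory, so make sure both exist. Then execute the shp2json command: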

                        
sudo shp2json -n source/zaf_admbnda_adm3_2016SADB_OCHA.shp > build/adm3_1_gj.json
                    

The output file has increased dramatically in size, to 46,046KB from its shapefile parent's 16,346KB, but it now looks much more readable. Thanks to the -n flag it is also newline-delimited, with one feature per line:

                        
{"type":"Feature","properties":{"ADM0_PCODE":"ZA","ADM0_EN":"South Africa","ADM1_PCODE":"ZA1","ADM1_ID":"WC","ADM1_EN":"Western Cape","ADM1_TYPE":"Province","ADM2_PCODE":"ZA101","ADM2_ID":"DC1","ADM2_EN":"West Coast","ADM2_TYPE":"District Municipality","ADM3_PCODE":"ZA1011","ADM3_ID":"WC011","ADM3_EN":"Matzikama","ADM3_REF":"Matzikama","ADM3_TYPE":"Local Municipality"},"geometry":{"type":"Polygon","coordinates":[[[18.78052000039772,-30.601470010132797], ... [18.78052000039772,-30.601470010132797]]]}}
{"type":"Feature","properties":{"ADM0_PCODE":"ZA","ADM0_EN":"South Africa","ADM1_PCODE":"ZA1","ADM1_ID":"WC","ADM1_EN":"Western Cape","ADM1_TYPE":"Province","ADM2_PCODE":"ZA101","ADM2_ID":"DC1","ADM2_EN":"West Coast","ADM2_TYPE":"District Municipality","ADM3_PCODE":"ZA1012","ADM3_ID":"WC012","ADM3_EN":"Cederberg","ADM3_REF":"Cederberg","ADM3_TYPE":"Local Municipality"},"geometry":{"type":"Polygon","coordinates":[[[18.483739989409045,-31.884550004279227], ... [18.483739989409045,-31.884550004279227]]]}}
{"type":"Feature","properties":{"ADM0_PCODE":"ZA","ADM0_EN":"South Africa","ADM1_PCODE":"ZA1","ADM1_ID":"WC","ADM1_EN":"Western Cape","ADM1_TYPE":"Province","ADM2_PCODE":"ZA101","ADM2_ID":"DC1","ADM2_EN":"West Coast","ADM2_TYPE":"District Municipality","ADM3_PCODE":"ZA1013","ADM3_ID":"WC013","ADM3_EN":"Bergrivier","ADM3_REF":"Bergrivier","ADM3_TYPE":"Local Municipality"},"geometry":{"type":"Polygon","coordinates":[[[18.446639998936856,-32.46681000337889], ... [18.446639998936856,-32.46681000337889]]]}}
{"type":"Feature","properties":{"ADM0_PCODE":"ZA","ADM0_EN":"South Africa","ADM1_PCODE":"ZA1","ADM1_ID":"WC","ADM1_EN":"Western Cape","ADM1_TYPE":"Province","ADM2_PCODE":"ZA101","ADM2_ID":"DC1","ADM2_EN":"West Coast","ADM2_TYPE":"District Municipality","ADM3_PCODE":"ZA1014","ADM3_ID":"WC014","ADM3_EN":"Saldanha Bay","ADM3_REF":"Saldanha Bay","ADM3_TYPE":"Local Municipality"},"geometry":{"type":"MultiPolygon","coordinates":[[[[17.98590000368228,-33.1507499970147], ... [17.967830002252885,-32.7008099889898]]]]}}
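
Because every line is a complete JSON document, the file can be inspected with a few lines of Node.js. The sketch below assumes two shortened lines in the same shape as the output above (the coordinates are invented placeholders, not real boundary data) and counts the geometry types it finds; parseNdjson is a hypothetical helper.

```javascript
// Two shortened lines in the same shape as the shp2json -n output.
// Coordinates are made-up placeholders for illustration.
const ndjson = [
    '{"type":"Feature","properties":{"ADM3_ID":"WC011","ADM3_EN":"Matzikama"},"geometry":{"type":"Polygon","coordinates":[[[18.78,-30.6],[18.79,-30.6],[18.79,-30.61],[18.78,-30.6]]]}}',
    '{"type":"Feature","properties":{"ADM3_ID":"WC014","ADM3_EN":"Saldanha Bay"},"geometry":{"type":"MultiPolygon","coordinates":[[[[17.98,-33.15],[17.99,-33.15],[17.99,-33.16],[17.98,-33.15]]]]}}'
].join("\n");

// Hypothetical helper: one JSON.parse per non-empty line.
function parseNdjson(text) {
    return text
        .split("\n")
        .filter((line) => line.trim() !== "")
        .map((line) => JSON.parse(line));
}

const features = parseNdjson(ndjson);

// Tally how many features use each geometry type.
const typeCounts = {};
for (const f of features) {
    typeCounts[f.geometry.type] = (typeCounts[f.geometry.type] || 0) + 1;
}

console.log(features.map((f) => f.properties.ADM3_ID));
console.log(typeCounts);
```

For the real 46MB file you would stream it line by line (for example with Node's readline module) rather than load it into one string.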
                    

2 - Manipulate feature properties

Using the newly created GeoJSON file as input, execute the ndjson-map command, which evaluates a JavaScript expression against every line (available as d) of a newline-delimited file. Here it creates a new property called "id" (or anything you would like), with a value taken from an existing property, and then deletes all the superfluous properties, of which there are many:

                        
sudo ndjson-map 'd.id = d.properties.ADM3_ID, delete d.properties, d' < build/adm3_1_gj.json > build/adm3_2_id.json
                    

The output file has diminished somewhat in size to 141,654KB as a result of the removal of unnecessary information, and now has only an "id" property tagged onto the back of each element. All other properties have been deleted:

                        
{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[30.088217081852918,-23.17143398339158], ... [30.088217081852918,-23.17143398339158]]]},"id":"ZA9344001"}
{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[30.064168009745913,-23.14930733716386], ... [30.064168009745913,-23.14930733716386]]]},"id":"ZA9344002"}
{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[30.34665565213669,-23.05667251962103], ... [30.34665565213669,-23.05667251962103]]]},"id":"ZA9344003"}
{"type":"Feature","geometry":{"type":"Polygon","coordinates":[[[30.407106943330803,-23.088048934110986], ... [30.407106943330803,-23.088048934110986]]]},"id":"ZA9344004"}
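
The expression passed to ndjson-map is ordinary JavaScript: three sub-expressions joined by the comma operator, the last of which (d) becomes the output line. The sketch below replays that expression on a single shortened feature in plain Node.js, with invented placeholder coordinates, to show why "id" ends up at the back of each element.

```javascript
// The same expression given to ndjson-map, wrapped in an arrow function.
// The comma operator evaluates each part in turn and yields the last one (d).
const expression = (d) => (d.id = d.properties.ADM3_ID, delete d.properties, d);

// One shortened input line; the coordinates are placeholders for illustration.
const inputLine = '{"type":"Feature","properties":{"ADM3_ID":"WC011","ADM3_EN":"Matzikama"},"geometry":{"type":"Polygon","coordinates":[[[18.78,-30.6],[18.79,-30.6],[18.79,-30.61],[18.78,-30.6]]]}}';

const mapped = expression(JSON.parse(inputLine));
const outputLine = JSON.stringify(mapped);

console.log(outputLine);
console.log(Object.keys(mapped)); // "id" was added last, so it is serialized last
```

Keeping a property or two (a name, say) is just a matter of assigning a smaller object to d.properties instead of deleting it.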
                    

3 - Antimeridian Cutting

Ever tried laying a basketball out flat on the ground, with the inside down, and the outside up? Well, you're going to have to cut the ball somewhere, while still leaving it in one piece, and then you'll have to stretch the edges of all the bumpy bits so that they're flat. Tricky exercise...

Same applies when coordinates plotted for a spherical world are to be mapped to a flat web page. We must do some cutting, stretching and stitching. Thankfully, Mike Bostock has provided us with a pair of scissors equal to the job. It's called geostitch and it's executed like so:

                        
sudo geostitch -n < build/adm3_2_id.json > build/adm3_3_st.json
                    

The output from this command has not visibly changed much, and neither has the file size, but the geometry is now treated as lying on a sphere rather than on a flat plane, which allows for different projections later on.
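
To see why a straight line between two longitude/latitude pairs differs from a geodesic one, consider the midpoint of each. The sketch below is only an illustration of the idea, not geostitch's actual code; the function names and the two test points are assumptions chosen to make the difference obvious.

```javascript
const RAD = Math.PI / 180;

// Midpoint of the straight (planar) segment in longitude/latitude space.
function planarMidpoint([lon1, lat1], [lon2, lat2]) {
    return [(lon1 + lon2) / 2, (lat1 + lat2) / 2];
}

// Midpoint of the great-circle (geodesic) segment: convert both points to
// 3D unit vectors, add them, and convert the sum back to longitude/latitude.
function geodesicMidpoint([lon1, lat1], [lon2, lat2]) {
    const toVector = (lon, lat) => [
        Math.cos(lat * RAD) * Math.cos(lon * RAD),
        Math.cos(lat * RAD) * Math.sin(lon * RAD),
        Math.sin(lat * RAD)
    ];
    const a = toVector(lon1, lat1);
    const b = toVector(lon2, lat2);
    const s = [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
    const length = Math.hypot(s[0], s[1], s[2]);
    return [
        Math.atan2(s[1], s[0]) / RAD,
        Math.asin(s[2] / length) / RAD
    ];
}

// Two points on the 60th parallel, 90 degrees of longitude apart.
const p = [0, 60];
const q = [90, 60];

const flatMid = planarMidpoint(p, q);
const roundMid = geodesicMidpoint(p, q);
console.log("planar midpoint:  ", flatMid);
console.log("geodesic midpoint:", roundMid);
```

The geodesic midpoint sits several degrees closer to the pole than the planar one, which is the bulge you see in long-haul flight paths. Over distances as short as a municipal boundary segment the two are nearly indistinguishable, which fits the observation that the output barely changes.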

4 - Convert to topoJSON

And then finally, convert our newly created GeoJSON with limited properties to the final input required for mapping in a browser: TopoJSON. Again, the command-line toolkit provides us with the necessary command called, fittingly, geo2topo. Execute the command with a quantization argument -q (here 1e5, a grid of 100,000 steps along each axis) and tell it with -n that it's dealing with a newline-delimited file.

                        
sudo geo2topo -q 1e5 -n build/adm3_3_st.json > build/adm3_4_topo.json
                    

The conversion of geoJSON to topoJSON has a huge effect on filesize: from 141,654KB to just 11,948KB, only 8.4% of the input. The JSON is also well organized, and even more readable. But where did the filesize go?

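Notice how the coordinates have been removed entirely from the polygons in "geometries": each polygon now refers to entries in the top-level "arcs" property by index, as a series of arrays. The "arcs" property contains unique sets of coordinates that can be reused! The border of America with Canada is the same shape as the border of Canada with America, and so on. They're duplicates, so TopoJSON stores the shared line once and lets both polygons reference it. A negative index means the arc is traversed in reverse (-5, for example, is arc 4 walked backwards).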
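
A tiny hand-made topology shows how the sharing works. In this sketch a unit square is split along its diagonal into two triangles, and the diagonal is stored once as arc 1. The data and the resolveRing helper are hypothetical, and for simplicity the arcs hold absolute coordinates rather than the quantized, delta-encoded integers found in a real file.

```javascript
// Corners of a unit square: A(0,0), B(1,0), C(1,1), D(0,1).
// The diagonal C -> A is shared by both triangles and stored only once.
const arcs = [
    [[0, 0], [1, 0], [1, 1]], // arc 0: A -> B -> C
    [[1, 1], [0, 0]],         // arc 1: C -> A (the shared diagonal)
    [[1, 1], [0, 1], [0, 0]]  // arc 2: C -> D -> A
];

// Hypothetical helper: stitch a list of arc indexes into one ring.
// A negative index i refers to arc ~i (ones' complement), reversed.
function resolveRing(arcIndexes, allArcs) {
    const ring = [];
    arcIndexes.forEach((index, n) => {
        const arc = index < 0 ? allArcs[~index].slice().reverse() : allArcs[index];
        // Consecutive arcs share their junction point, so skip the repeat.
        ring.push(...(n === 0 ? arc : arc.slice(1)));
    });
    return ring;
}

const lowerTriangle = resolveRing([0, 1], arcs);  // A, B, C, A
const upperTriangle = resolveRing([-2, 2], arcs); // A, C, D, A (arc 1 reversed, then arc 2)

console.log(JSON.stringify(lowerTriangle));
console.log(JSON.stringify(upperTriangle));
```

In a country-sized file almost every boundary is shared between two neighbours, so storing each one once accounts for a good part of the saving.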

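Also, the "arcs" coordinates are simple integers, not typical coordinates with large precision. They have been "quantized", a process of limiting a fairly large precision to a very small one (i.e. down to integers from large floats). Each arc is also delta-encoded: only its first point is an absolute grid position, and every later point is an offset from the one before it, which is why most pairs are tiny. The "transform" property (a scale and a translate) maps these integers back onto a grid that nearly represents real-life coordinates. Some loss occurs. Finally, points that land on the same grid position after quantization are dropped, which reduces the number of points (vectors) required to describe a polygon. Deliberate simplification beyond that is a separate step (toposimplify) that is not applied here.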

The topoJson file looks something like this:

                        
{
    "type":"Topology",
    "objects":{
        "wards_3_st":{
            "type":"GeometryCollection",
            "geometries":[
                {"type":"Polygon","arcs":[[0,1,2,3,4,5]],"id":"ZA9344001"},
                {"type":"Polygon","arcs":[[6,-5,7,8,9,10]],"id":"ZA9344002"},
                {"type":"Polygon","arcs":[[11,12,13,14,15,16,17,18,19,20,21]],"id":"ZA9344003"},
                {"type":"Polygon","arcs":[[22,23,24,-18,-17,-16,25]],"id":"ZA9344004"},
                {"type":"Polygon","arcs":[[-4,26,27,28,29,30,31,32,33,34,35,-8]],"id":"ZA9344005"},
                {"type":"Polygon","arcs":[[36,37,38]],"id":"ZA9344006"},
                {"type":"Polygon","arcs":[[39,40,41,42,-39,43,44,45,46]],"id":"ZA9344007"},
                {"type":"Polygon","arcs":[[47,48,-41]],"id":"ZA9344008"},
                {"type":"Polygon","arcs":[[49,50,51,52,-48,-40,53,54,55,56]],"id":"ZA9344009"}
                . 
                . 
                . 
            ]
        }
        . 
        . 
        . 
    },
    "arcs":[[[82936,91710],[-1,-2],[0,-1],[0,-1],[2,-2],[2,-2],[1,-2],[1,-3],[1,-4],[1,-2],[0,-1],[-1,1],[-1,-1],[0,-1],[-1,-1],[-2,-6],
                [0,-2],[-1,-1],[-1,0],[-2,0],[-2,0],[-1,-1],[-1,-1],[-1,-1],[0,-1],[-1,-3],[-2,-3],[-1,-2],[-1,-2],[-3,-3],[-2,-4],[-1,-2],
                [0,-1],[0,-1],[1,-1],[2,-1],[3,-1],[2,-1],[1,-1],[1,-1],[0,-2],[0,-3],[2,-1],[0,-2],[0,-3],[1,-3],[0,-2],[-1,-1],[-1,-3],
                . 
                . 
                . 
                [2,0],[2,0],[2,0],[2,0],[2,0],[2,-2],[2,0],[1,0],[1,-1],[1,-1],[1,-1],[1,-2],[1,-1],[1,-2],[1,0],[0,-1],[1,-1],[0,-1],[0,-1],
                [0,-1],[1,-2],[0,-2],[0,-1],[0,-2],[0,-2],[0,-2],[0,-2],[0,-1],[0,-1],[1,0]]],
    "bbox":[16.451890000423248,-34.83417000334293,32.94498494489932,-22.125030057790866],
    "transform":{
        "scale":[0.0001649325987707484,0.0001270926703822245],
        "translate":[16.451890000423248,-34.83417000334293]
    }
}
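
The integers can be turned back into longitude and latitude by hand. The following sketch takes the "transform" values and the first five points of the first arc from the sample above, undoes the delta encoding with a running sum, and applies the scale and translate. The decodeArc function is a hypothetical helper written for this article; in practice topojson-client does this for you.

```javascript
// Values copied from the sample file above.
const transform = {
    scale: [0.0001649325987707484, 0.0001270926703822245],
    translate: [16.451890000423248, -34.83417000334293]
};

// First five points of the first arc: one absolute position, then deltas.
const arcStart = [[82936, 91710], [-1, -2], [0, -1], [0, -1], [2, -2]];

// Hypothetical helper: undo delta encoding, then apply scale and translate.
function decodeArc(arc, { scale, translate }) {
    let x = 0;
    let y = 0;
    return arc.map(([dx, dy]) => {
        x += dx; // running sum restores the absolute grid position
        y += dy;
        return [x * scale[0] + translate[0], y * scale[1] + translate[1]];
    });
}

const decoded = decodeArc(arcStart, transform);
for (const [lon, lat] of decoded) {
    console.log(lon.toFixed(5), lat.toFixed(5));
}
```

The first decoded point comes out at roughly 30.13 degrees east and 23.18 degrees south, comfortably inside the "bbox" of the file.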
                    

5 - Let's draw a map!

We've built a fine TopoJSON input file that is eminently readable and brings a tear to every JSON-lover's eye. So bully, what now? I can wrap it in a ball, toss it from server to client, open it....

Or, I can give it to d3's superb join algorithm to split out the texty paths in the file, generate discrete SVG paths, locate them in their correct coordinate space, style and attribute them, and sit back to watch a new graphical map in the making. Add this code to a <script> tag at the bottom of the HTML body (or turn it into a reusable closure), after the d3 and topojson-client scripts have been loaded and below an element with the id "zamap". It will execute as soon as the page builder gets there, fetch your new TopoJSON file, and do its magic.

                        
<script>
    // Assumes d3 (v5 or later, for the promise-based d3.json) and topojson-client
    // are already loaded, and that the page contains an element with id="zamap".
    d3.json("/resources/adm3_4_topo.json").then(function(mapData) {
        // An ordinal scale builds its domain implicitly as new ids are looked up.
        const cScale = d3.scaleOrdinal(d3.schemeCategory10);
        const vWidth = d3.select("#zamap").node().offsetWidth;
        const vHeight = 520;
        // Path generator: projects longitude/latitude onto the SVG, centred on South Africa.
        const geoPath = d3.geoPath()
            .projection(d3.geoNaturalEarth1()
                .center([24.8, -28.5])
                .translate([vWidth / 2, vHeight / 2])
                .scale(2200));
        d3.select("#zamap")
            .append("svg")
            .attr("width", vWidth)
            .attr("height", vHeight)
            .selectAll("path")
            // Convert the TopoJSON object back to GeoJSON features, one per municipality.
            // The object name (adm3_3_st) comes from the file name given to geo2topo.
            .data(topojson.feature(mapData, mapData.objects.adm3_3_st).features)
            .enter()
            .append("path")
            .attr("id", function(d) { return d.id; })
            .style("fill", function(d) { return cScale(d.id); })
            .style("stroke", "#fff")
            .style("stroke-width", 0.3)
            .attr("class", "locmuni")
            .attr("d", geoPath)
            .on("mouseover", function() {
                // Dim every path, then move the hovered one to the front.
                d3.selectAll("path").attr("opacity", 0.4);
                this.parentNode.appendChild(this);
                const vBBox = this.getBBox();
                // Double the size of the path about the centre of its bounding box.
                d3.select(this)
                    .attr("transform", "translate(" + -(vBBox.x + (vBBox.width / 2)) + "," + -(vBBox.y + (vBBox.height / 2)) + ") scale(2)")
                    .attr("opacity", 1)
                    .style("stroke", "#000");
            })
            .on("mouseout", function() {
                // Restore every path to its resting state.
                d3.selectAll("path").attr("opacity", 1);
                d3.select(this)
                    .attr("transform", null)
                    .style("stroke", "#fff");
            });
    });
</script>
                    

And voila! We have a pretty map of South Africa's local municipality demarcation (the so-called Admin Level 3 demarcation). Use it for choropleth voting trends, crime statistics, population density, bubble overlays... there is no end to the fun we can have with interactive maps.

South Africa - Administrative areas 3: municipalities

What makes the map interactive? The fact that each map element, a municipality in this case, is an SVG path and a first-class citizen of the DOM. It can be individually styled and attributed, and react to events. In this example, I have elected to highlight a municipality whenever the mouse hovers over it by doubling its size, changing its stroke color, bringing it to the front, and making background municipalities partially transparent.
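
The doubling trick relies on a small piece of arithmetic: an SVG transform of "translate(tx, ty) scale(k)" maps a point p to k * p + t, so choosing t = (1 - k) * c keeps the centre c fixed while everything else moves away from it. For k = 2 that is translate(-cx, -cy), exactly what the mouseover handler builds. The sketch below checks this with a made-up bounding box; zoomAboutCentre and applyTransform are hypothetical helpers.

```javascript
// Hypothetical helper: the translate/scale pair that zooms by k about
// the centre of a bounding box such as the one returned by getBBox().
function zoomAboutCentre(bbox, k) {
    const cx = bbox.x + bbox.width / 2;
    const cy = bbox.y + bbox.height / 2;
    return { tx: (1 - k) * cx, ty: (1 - k) * cy, k };
}

// "translate(tx, ty) scale(k)" scales the point first, then shifts it.
function applyTransform(t, [x, y]) {
    return [t.k * x + t.tx, t.k * y + t.ty];
}

// A made-up bounding box standing in for a municipality's path.
const bbox = { x: 100, y: 50, width: 40, height: 20 };
const zoom = zoomAboutCentre(bbox, 2);

const centreAfter = applyTransform(zoom, [120, 60]); // the centre of the box
const cornerAfter = applyTransform(zoom, [100, 50]); // its top-left corner

console.log("translate(" + zoom.tx + "," + zoom.ty + ") scale(" + zoom.k + ")");
console.log("centre stays at", centreAfter);
console.log("corner moves to", cornerAfter);
```

The same helper gives a gentler highlight if you pass, say, 1.5 instead of 2.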