Display Big Data With Woosmap

10 min - Author: Marine

A key task when designing our map technology stack was rendering a large number of locations while keeping response times acceptable and navigation smooth. This is a big challenge because you have to manage every part of the stack, from geographic data storage to client-side rendering.

Woosmap lets you display hundreds of thousands of locations on any browsing interface, mobile or not. To achieve this, we have integrated a “multi-scale” map rendering technology based on the tiling method.

The purpose of this article is to define tiling and explain what we’ve implemented to handle large numbers of locations while still offering a good user experience.

Tiling Method

Tiling consists of cutting “large” geographic datasets into many rectangles that can then be reassembled on demand on the client side. Nowadays, this method is used in many web-mapping applications, from Google Maps to Mapbox. Below is a basic schema of a raster tile - an image tile.

For this specific tile, the image is pre-built on the server so there is never any wait-time for the tile to be rendered - it is simply sent immediately to the client.

Tiled Maps

Tiling is needed when the size of a vector or raster layer degrades map navigation: loading takes too long and panning and zooming are no longer smooth. Tiling allows faster navigation and a generally better experience when searching for information on a map. There are many arguments in favour of implementing tiles:

Cache efficiently on the client: if you download tiles of Paris to view a map of La Defense your browser can make use of those same tiles from cache instead of downloading them again when showing neighbouring areas.
Load progressively: the center of the map will load before the outer edges, letting you pan or zoom into a particular spot even if tiles at the edges of your map view haven’t finished loading.
Simple to use: the coordinate scheme describing map tiles is simple, making it easy to implement integrating technologies on the server, web, desktop, and mobile devices.
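
To illustrate how simple such a coordinate scheme can be, here is a minimal sketch that assumes the widely used XYZ ("slippy map") convention on Web Mercator, where zoom level z splits the world into 2^z x 2^z tiles. The article does not say which scheme Woosmap uses, and the function name latlng_to_tile is hypothetical.

```python
import math


def latlng_to_tile(lat, lng, zoom):
    """Return the (x, y) index of the XYZ tile containing a WGS84 point.

    Assumes the common "slippy map" convention: tile (0, 0) is the
    north-west corner and there are 2**zoom tiles along each axis.
    """
    n = 2 ** zoom
    x = int((lng + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lng == 180 or near-polar latitudes stay inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


# Tile containing central London at zoom level 10.
london_tile = latlng_to_tile(51.5074, -0.1278, 10)
```

Because the tile index is a pure function of position and zoom, any client or server can compute it independently, which is what makes tiles easy to cache and to integrate across platforms.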

Originally, the term Tiled Maps referred primarily to images - rasterized tiles - pre-built on the server. In the past several years, a new data storage format called “vector tiles” has gained popularity. Below is a brief introduction to both of these tiling methods.

Raster Tiles

A raster image is made of pixels, each with its own color, arranged to display an image. Raster tiles are simply image tiles representing the map data. Most web mapping technologies are raster based. Their maps consist of many map tiles ordered in a pyramidal scheme. Such tiles load quickly because, most of the time, they have already been rendered on the server.

Vector Tiles

Much like raster tiles, vector tiles are simply vector representations of geographic data whereby vector features are clipped to the boundary of each tile. They are the vector data equivalent of image tiles for web mapping, applying the strengths of tiling – developed for caching, scaling and serving map imagery rapidly – to vector data.

The idea behind vector tiles is that it is more efficient to keep data styling separate from the data coordinates and attributes. The client can use a predefined set of styling rules to draw tiles of raw vector coordinates and attribute data sent by the server. This allows data to be restyled on the fly, which rasterized tiles cannot do once they have been rendered. Vector tiles have several advantages over fully rendered image tiles:

On-Demand Styling: as vectors, tiles can be styled when requested, allowing many map styles.
Small Size: vector tiles are really small, enabling fast map loads and efficient caching.
Client Resolution: raster tiles are pre-rendered at what is assumed to be a normal screen resolution. Vector tiles are drawn on the client device, so shapes are rendered as sharply as the screen resolution allows.

Benchmark of Vector vs Raster

Test Case

First of all, it is difficult to benchmark vector tiles against raster tiles: the features available with each method are not the same, and the JavaScript API used for rendering (with or without WebGL) significantly influences the results. Our test case was as follows:

Dataset of 85K locations (watch it here)
Small data attributes (e.g. id, name and address fields)
Screen resolution of 2880px x 1800px (MacBook Pro 15” Retina)
Map centered on London
No use of WebGL

Aggregated Tile Sizes

The following diagram shows the aggregated tile sizes of the vector and raster maps. Because vector data carries attribute information (required for styling, for example) that raster data doesn’t, this chart should be interpreted with caution: the results depend on your data attributes.

In our case, we are using the Google Maps DataLayer to display the vector tiles. As demonstrated in this diagram, performance degrades as markers are added. Processing more than 1,000 markers from vector tiles makes the web map unusable.

Considering the above, and because we decided not to use WebGL for browser compatibility reasons, we have developed a hybrid tiling technology that combines the two approaches, raster and vector ;-). The idea is to get the best of both worlds and better meet the needs of our product’s users.

Woosmap implementation

Multi-Scale Rendering

We’ve implemented a combination of raster tiles for the top-level scales and vector tiles for the bottom-level scales. This way, we get a fast web map at every scale while preserving smooth navigation and clear rendering on higher-resolution clients.

To switch from one to the other, we define a zoom level breakpoint. As the vector tiles are rendered on the client, it’s quite easy to offer a good user experience with them. The top level, with its raster rendering, is “une autre paire de manches” (a different kettle of fish), as the French expression goes!
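
The breakpoint logic can be sketched in a few lines. This is only an illustration: it assumes that "top-level scales" means the zoomed-out (low) zoom levels, and the breakpoint value of 12 and the function name are hypothetical, since the article does not give the real ones.

```python
# Hypothetical breakpoint: the article does not state the actual zoom level.
RASTER_VECTOR_BREAKPOINT = 12


def tile_kind_for_zoom(zoom, breakpoint=RASTER_VECTOR_BREAKPOINT):
    """Pick the tile type to request for a given zoom level.

    Below the breakpoint (zoomed out, many locations in view) pre-rendered
    raster tiles are used; at or above it (zoomed in, few locations in view)
    vector tiles are drawn on the client.
    """
    return "raster" if zoom < breakpoint else "vector"
```

Keeping the decision in a single function means the breakpoint can be tuned per dataset without touching the rest of the rendering code.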

Interactivity

To offer a fast web mapping experience, you should pay attention to interactivity. A typical map can have thousands of features, all of which need to be usable on a variety of devices. The UTFGrid specification defines a way to transport interactive data, such as tooltip content, to a map interface so that it loads progressively and performs well across legacy browsers and modern mobile devices.

In the following jsFiddle sample, you can get information (id on hover, name on click) about each dot despite the use of raster images. These features are made possible by UTF Grid tiles.

This technology was developed by Mapbox and is open source. You can read the specification on their GitHub.
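
To give an idea of how a client resolves a hover or click on a raster tile, here is a minimal sketch of a UTFGrid lookup. It follows the decoding rule of the published UTFGrid specification (subtract 1 for code points at or above 93, subtract 1 again at or above 35, then subtract 32). The function name and the tiny 2 x 2 sample grid are hypothetical; real grids are typically much finer.

```python
def utfgrid_feature_at(utfgrid, px, py, tile_size=256):
    """Return the data of the feature under pixel (px, py), or None.

    `utfgrid` is a decoded UTFGrid JSON object with "grid", "keys" and
    optionally "data" members.
    """
    grid = utfgrid["grid"]
    resolution = tile_size // len(grid)  # pixels covered by one grid cell
    code = ord(grid[py // resolution][px // resolution])
    # Undo the encoding that skips '"' (34) and '\\' (92), then the offset 32.
    if code >= 93:
        code -= 1
    if code >= 35:
        code -= 1
    code -= 32
    key = utfgrid["keys"][code]
    if key == "":  # an empty key means "no feature here"
        return None
    return utfgrid.get("data", {}).get(key)


# Hypothetical sample: one feature occupying the top-right quarter of the tile.
sample_grid = {
    "grid": [" !", "  "],
    "keys": ["", "42"],
    "data": {"42": {"id": 42, "name": "Sample store"}},
}
hovered = utfgrid_feature_at(sample_grid, 200, 10)
```

The grid is a low-resolution index of which feature covers which pixels, so the lookup is a constant-time array access and needs no geometry on the client.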

Woosmap Technology Stack

We’ve implemented an architecture based on open-source tools and open specifications, like Mapnik and UTF Grid, and proprietary Woosmap components. Below is the corresponding schema of our tiling architecture.

User Interface

Users interact with your mapping application primarily through a JavaScript library that listens to user events, requests tiles from the map server, assembles tiles in the viewport, and draws additional elements on the map, such as popups, markers, and vector shapes. We provide a small JavaScript API to display your data over Google Maps and to implement essential features, like search by address and get directions from user location. For example, the TiledView class enables the use of the raster tiles (see official doc). Our WebApp natively supports the hybrid tiling system.

Tile Cache on CDN

A tile cache is a server that sits between the browser and the map server. It checks to see if a requested map tile is already hanging around in a cache somewhere, where it can be served up quickly to short-circuit the call to the map server. If the map tile has not been generated, the tile cache gets it from the map server and saves it to speed up subsequent requests. In our configuration, we’re using a Content Delivery Network as it allows us to manage tile cache as well as other traffic to end users. The principle of a CDN is quite simple: dispersing all of your static content across multiple servers geographically closer to your users will make your web pages load much faster.

We cache tiles on demand only, without pre-seeding, to keep the data as up to date as possible. However, our CDN instructs a visitor’s browser to cache files for two hours to prevent heavy loads due to concurrent users. During this period, the browser loads the files from its local cache, speeding up page loads.
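
The on-demand behaviour described above can be sketched as a small in-memory cache. This is an illustration of the principle, not the CDN's actual mechanism: the class name, the server-side time-to-live and the render_tile callback are all hypothetical. Only the two-hour browser cache duration comes from the article, expressed here with the standard Cache-Control header.

```python
import time

BROWSER_CACHE_SECONDS = 2 * 60 * 60  # two hours, as stated in the article


class OnDemandTileCache:
    """Cache tiles the first time they are requested (no pre-seeding)."""

    def __init__(self, render_tile, ttl_seconds, clock=time.monotonic):
        self._render_tile = render_tile  # hypothetical call to the map server
        self._ttl = ttl_seconds          # hypothetical server-side lifetime
        self._clock = clock
        self._entries = {}               # (z, x, y) -> (stored_at, tile)

    def get(self, z, x, y):
        key = (z, x, y)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # cache hit: the map server is not called
        tile = self._render_tile(z, x, y)  # cache miss or expired entry
        self._entries[key] = (now, tile)
        return tile


def browser_cache_headers(max_age=BROWSER_CACHE_SECONDS):
    """HTTP header asking the browser to reuse a tile for max_age seconds."""
    return {"Cache-Control": f"public, max-age={max_age}"}
```

A short lifetime trades some cache hits for fresher data, which matches the stated goal of keeping tiles up to date while still absorbing bursts of concurrent requests.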

Map Server

Basically, the map server component takes geospatial data as input and renders graphical output. In our case, it produces a series of map tiles: uniformly sized image (raster) and JSON (vector) files that are served to the browser and assembled there as the displayed map.

Raster Tiler

For raster tiling, we’ve implemented Mapnik using its Python bindings (Mapnik is originally written in C++). It’s an open source toolkit for developing mapping applications that renders beautifully, has a developer-friendly interface and offers strong performance. Furthermore, the toolkit is aimed primarily at web-based development and has a large developer community.

Homemade Vector Tiler

For the vector tiles, we could use Mapnik, as it supports this feature natively (see this implementation of the Mapbox Vector Tile specification for Mapnik), but our data is lightweight (only point geometry) and we don’t need all the complexity of this spec. That’s why we developed our own custom vector tile server. For now, we’re not using a specific data encoding to transport the vector data from the map server to the client. This could be a great improvement, so we are seriously looking into implementing Google’s Protocol Buffers (Protobuf) data interchange format.
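
As a rough idea of what a point-only vector tiler has to do, here is a minimal sketch: compute the geographic bounds of a tile, keep the points that fall inside, and serialize them as JSON. It assumes the XYZ tile convention on Web Mercator, and the function names and the JSON layout are hypothetical; the article does not describe the real output format of the Woosmap tiler.

```python
import json
import math


def tile_bounds(z, x, y):
    """Return (west, south, east, north) in degrees for XYZ tile z/x/y."""
    n = 2 ** z

    def latitude_of_row(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    return west, latitude_of_row(y + 1), east, latitude_of_row(y)


def build_point_tile(points, z, x, y):
    """Serialize the points located inside tile z/x/y as a JSON string.

    `points` is a list of dicts with "lat", "lng" and any small attributes.
    """
    west, south, east, north = tile_bounds(z, x, y)
    features = [
        p for p in points
        if west <= p["lng"] < east and south < p["lat"] <= north
    ]
    return json.dumps({"z": z, "x": x, "y": y, "features": features})


# Hypothetical sample data: only the London point lies in tile 10/511/340.
sample_points = [
    {"id": 1, "name": "London", "lat": 51.5074, "lng": -0.1278},
    {"id": 2, "name": "Paris", "lat": 48.8566, "lng": 2.3522},
]
tile_json = build_point_tile(sample_points, 10, 511, 340)
```

With point data there is no geometry to clip at tile edges, which is why a tiler like this can stay much simpler than a full implementation of a vector tile specification. In production the filtering would more likely be done by the spatial database than in application code.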

Geospatial Data

All the geographic data we manage is stored in a spatial database, which provides a geometry type and functions that operate on it. This lets us write SQL queries that include spatial predicates like “within two miles of this latitude and longitude”, which is especially useful for guiding our users to the displayed places.
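
To make the “within two miles” predicate concrete, here is a minimal sketch of the underlying computation using the haversine great-circle distance. It only illustrates the predicate: in a spatial database the equivalent test is evaluated by built-in functions and accelerated by a spatial index. The function names and the spherical Earth radius are assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius, spherical approximation


def haversine_miles(lat1, lng1, lat2, lng2):
    """Great-circle distance in miles between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def places_within(places, lat, lng, radius_miles=2.0):
    """Return the places located within radius_miles of (lat, lng)."""
    return [
        p for p in places
        if haversine_miles(lat, lng, p["lat"], p["lng"]) <= radius_miles
    ]
```

A linear scan like this is fine for a handful of points, but it does not scale to hundreds of thousands of locations, which is precisely why the query is delegated to the database.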

Final Thoughts

Our platform can dynamically serve a large number of concurrent users displaying huge geographic datasets. The development of a hybrid tiling strategy and the use of a content delivery network make us confident that we can support heavy loads.

We are continuously working to improve the user experience and need to explore certain topics in greater detail. For instance, the Google ProtoBuf data interchange format could be adapted to our homemade tiler to improve performance. Also, using a WebGL technology on the client side, like the CanvasLayer Utility for Google Maps, would be a great improvement for smoother navigation.

If you have any questions about the content or the process described, please don’t hesitate to reach out to me through the contact page.

Useful Links

Take control of your maps, an old but great introduction to maps for developers
Why tiled maps, teachers from The Pennsylvania State University
Mapnik, the indispensable map server toolkit
Mastering the interactivity of your rasterized maps with UTF Grid