The state of glacier mapping in OpenStreetMap
After my previous looks at the OpenStreetMap coastline data quality and the situation of waterbody mapping, the glaciers (i.e. the permanently frozen water on the planet) are the last remaining hydrographic features i have not yet looked at.
Glaciers are an important element to map because they have an enormous influence on their environment. Beyond that they are also an important element simply because of their extent - glacier ice on earth covers a larger area than all rivers and lakes combined and contains more freshwater than those as well. At the same time glaciers are a very volatile element, especially with the widespread retreat of glaciers that is observed around the globe. But even without this particular element of change glaciers have always been quickly changing in reaction to local and global changes in the environment.
The way glaciers are mapped in OpenStreetMap is actually quite simple: they are polygons tagged with natural=glacier. This tag can also be applied to nodes, marking the location of a glacier without specifying the geometry, but this is not very common. In case of Antarctica, where most of the continent is covered with ice, the ice is not mapped explicitly and instead the ice free areas are. In addition there is a proposal for tags with more specific information about the glaciers.
According to taginfo there are about 21000 elements tagged natural=glacier. In addition there are about 90000 non-ice areas in Antarctica. This seems much more but note the 21000 glacier elements include several extremely large and complex multipolygons including the Greenland inland ice. The number of nodes is a much more reliable measure for the amount of data and there are about 3.1 million nodes in glacier polygons and about 3.5 million nodes in non-ice polygons in Antarctica.
When looking in more detail at this data one of the most remarkable things you see even before looking at the actual data quality is how it has been produced. OpenStreetMap is a community project based on the contributions of many people around the world so you expect the data to come from a large number of different users. When I look at the nodes to take into account all modifications to actual glacier geometry there are only about 570 different user accounts which have last touched any of the nodes that are part of a glacier polygon. Of these only 18 user accounts are last modifiers of more than 10000 nodes each, together accounting for more than 90 percent of the data. Of these 18 user accounts 10 are import accounts (or have been used for importing glacier data, mostly from CANVEC).
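The kind of tally described above can be sketched roughly as follows. This is only an illustration and not the procedure actually used for the numbers in this text: the input format (one account name per node), the function name contributor_concentration and the toy data are all assumptions.

```python
from collections import Counter


def contributor_concentration(last_editors, threshold):
    """Summarize how node authorship is distributed across user accounts.

    last_editors: iterable with one entry per node, naming the account
                  that last modified it (hypothetical input format).
    threshold:    node count above which an account counts as high volume.
    """
    counts = Counter(last_editors)
    total = sum(counts.values())
    # Accounts that are last modifier of more than `threshold` nodes.
    heavy = {user: n for user, n in counts.items() if n > threshold}
    return {
        "accounts": len(counts),
        "heavy_accounts": len(heavy),
        "heavy_share": sum(heavy.values()) / total if total else 0.0,
    }


# Toy data for illustration only, not real OSM statistics.
sample = ["import_a"] * 6 + ["mapper_b"] * 3 + ["mapper_c"]
stats = contributor_concentration(sample, threshold=2)
print(stats)
```

With real data the threshold would be 10000 and the list would have one entry for every node that is part of a glacier polygon.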
This relation (less than 5 percent of the active users contributing more than 90 percent of the data and most of this being imports) is in itself not unusual, it is fully in line with the global statistics. However the total number of only 570 user accounts actively contributing to mapping glaciers is low considering the importance of glaciers as pointed out above. Each of the eight high volume contributors (and of course also each of the ten large import accounts) focuses on particular regions and these regions will be visible in the analysis of the data quality and distribution below.
Like in case of the coastlines and waterbodies it is difficult to assess the completeness of the data without a reference. Fortunately there are a number of other open data sets available for glaciers that can be used for comparison. Most notably GLIMS aims to collect glacier data with global coverage. In addition there is extensive data from the Canadian and US official mapping agencies for North America (CANVEC and NHD). Using these three alternative sources combined the following image shows where OSM data exists and where it is clearly missing. Note NHD data is only used for Alaska since for the contiguous US GLIMS already includes extensive USGS data.
Areas with glacier data in OSM are shown in white, areas with no glaciers in OSM but in at least one of the other data sets are in red. What this map does not show is where any of the data sources contains glaciers which do not exist in reality, either because the data is outdated or because there are areas wrongly mapped as glacier. These exist particularly in OSM data since people mapping purely from aerial images frequently see glaciers where there are none or deliberately use the tag for other features - more on that later.
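The white/red classification can be expressed as a simple set operation. The following is a minimal sketch under the assumption that every data set has already been rasterized into a set of (column, row) grid cells; the function name classify_cells and the toy cell sets are made up for illustration and say nothing about how the map was actually produced.

```python
def classify_cells(osm_cells, reference_sets):
    """Split grid cells into 'covered by OSM' and 'missing from OSM'.

    osm_cells:      set of (col, row) cells containing OSM glacier data.
    reference_sets: list of sets of cells, one per comparison data set
                    (for example GLIMS, CANVEC and NHD).
    """
    # A cell counts as reference glacier if at least one data set has it.
    reference_union = set().union(*reference_sets)
    covered = set(osm_cells)             # shown in white
    missing = reference_union - covered  # shown in red
    return covered, missing


# Toy cells for illustration only.
osm = {(0, 0), (1, 0)}
glims = {(1, 0), (2, 0)}
canvec = {(2, 0), (3, 1)}
covered, missing = classify_cells(osm, [glims, canvec])
print(sorted(missing))
```

As with the map itself, this only finds glaciers missing in OSM. It cannot tell whether a cell marked as glacier in any of the sources actually contains one.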
At first glance this map looks quite good for OpenStreetMap - there is a lot of white in the map, especially due to Greenland and Antarctica. At a closer look however it becomes clear that the OpenStreetMap data has significant gaps in coverage to say the least. All the large red areas are places where OSM data is incomplete. Areas where OSM is comparatively good are New Zealand (data there is import based), Greenland, South America, the Alps and the rest of southern Europe, the Caucasus and in a way continental Russia (the last more in an 'among the blind the single-eyed is king' way). Interestingly these are also the main areas of activity of the eight most active contributors in glacier mapping mentioned above. In particular Greenland and South America are almost completely the work of just two people.
Fully missing in all the data sets and therefore not visible in the above map are most of the subantarctic islands and many of the Russian glaciers.
The question of data quality essentially boils down to three elements: How detailed is the data, how up-to-date and how reliable. None of these questions can be answered for any of the data sets as a whole. In case of GLIMS for example much of the data of the Himalaya/Tibet region and the contiguous US is very old (from the 1960s/1970s) while other areas are quite up-to-date. Below you can see a map showing the year of last modification of the nodes in the OSM glacier data. This does not tell much about the actual age of the data though. Most of the glacier mapping in OpenStreetMap is done from aerial/satellite images and these can already be quite old at the time the glaciers are mapped.
What can be seen however is that much of the glacier data in OSM has been created or modified in the last two years. Where the data does not come from imports the original source is usually no older than 1999 since that's the age of the oldest satellite images from the Landsat mosaic widely used as a basis. The basis of the Greenland mapping for example is mostly the 1999-2003 Landsat imagery. The Alps are in many parts based on fairly new aerial images but in parts there are also imports with a much older basis. South America is mostly fairly new.
To evaluate the level of detail i measured the average node distance of the polygons as i did in case of the coastlines. Analyzing the angles as well would not make much sense for the glaciers since sharp corners can exist on the outlines of glaciers naturally on any scale and therefore are no indication for the data quality. The node distance analysis can be seen in the following map. It should be kept in mind that the node distance is only a hint and not a reliable measure for the level of detail in the data.
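For illustration, an average node distance for a single polygon ring could be computed like this. This is a sketch only and not necessarily the exact method used here: it assumes a ring given as a list of (lon, lat) pairs with the first node repeated at the end, uses a spherical earth with great-circle distances, and the function names are made up.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def average_node_distance(ring):
    """Mean segment length in meters of a closed ring of (lon, lat) nodes.

    The ring is expected to repeat its first node at the end, so every
    edge of the polygon is counted exactly once.
    """
    if len(ring) < 2:
        raise ValueError("ring needs at least two nodes")
    segments = [haversine_m(*a, *b) for a, b in zip(ring, ring[1:])]
    return sum(segments) / len(segments)


# Toy ring for illustration only: a one degree square at the equator.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]
print(round(average_node_distance(square)), "m")
```

A multipolygon would need the segments of all its rings pooled before averaging. A small value indicates densely placed nodes, which as said is only a hint at the actual level of detail.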
This in a way confirms what i have written above: some areas like the Alps and most of South America are mapped in high detail. Same for the import based coverage in New Zealand and Canada (the latter very incomplete though). Antarctica, where data is import based as well, varies but in most parts is quite good as well, especially the Antarctic Peninsula. You can also see the detailed mapping i did on Franz Josef Land.
Regions with rather limited detail on the other hand are Northern Europe, the whole Himalaya-Tibet region (where data is very incomplete in addition), Alaska and Patagonia. Greenland is comparatively detailed in the southern parts despite the fairly large average node distance.
As already hinted above the OSM data is also not always reliable. These data errors come in four forms:
- Glaciers fully missing because they have not been mapped. This usually applies to larger areas as discussed above but sometimes also individual glaciers are missing in an area otherwise covered (which is much more difficult to find).
- Glaciers that are covered with debris or otherwise obscured (like due to clouds or masking in aerial/satellite images) are only partly mapped indicating a smaller glacier area than in reality. This happens frequently in OSM but also occurs in other remotely sensed glacier data like in GLIMS.
- Glaciers mapped where there are none in reality because a mapper used the glacier tag for something else (i.e. deliberate faulty mapping, mostly to get a certain result in the rendered map).
- Areas wrongly mapped as glaciers because a mapper incorrectly identified an area (most frequently snow) on aerial/satellite images as glacier (i.e. misinterpretation of source data).
All of these cases occur frequently: (1) especially in remote areas, which is where glaciers are often found, while the occurrence of (2) and (4) depends on local geographic conditions. A few examples:
- Missing glaciers in a larger area in the eastern Himalaya
- Incomplete mapping with some glaciers mapped and others missing in the Caucasus
- Incorrect glacier outline due to debris cover in Greenland
- Misused glacier tagging in Indonesia
- Incorrect mapping of snow cover as glaciers in eastern Turkey
- Significant overestimation of the glaciated area due to snow cover in Patagonia
So in summary there are a few areas where the OpenStreetMap glacier data is of good quality exceeding other freely available data sources. It is however by and large seriously incomplete and therefore in most cases insufficient for being used as a basis for maps that display glaciers.
One obvious question is of course if data imports in OpenStreetMap from the other free sources would be useful. I would answer this only partially with a yes. Since as mentioned in the beginning glaciers are changing quite rapidly a lot of data in those data sets is quite outdated. It can of course be argued that outdated data is better than no data but as far as OpenStreetMap is concerned the more important question is if such data would be supportive for the community mapping project itself, specifically if having old and outdated data in the database is more encouraging for manual updates based on up-to-date information than having no data. In addition the GLIMS data requires significant processing in many areas before it could be imported into OSM.
Christoph Hormann, November 2013