Different map projections

Steve Simon

2020-03-15

install.packages(“rnaturalearthhires”, repos=“http://packages.ropensci.org”, type=“source”)

I have been drawing some maps in R and was struggling to figure out how to display things with the appropriate map projections. Here is some code where I experimented with a variety of approaches.

The earth is a three dimensional object and there is no way to display it on a two dimensional computer screen without some distortion. Your goal is to minimize that distortion.

First things first. There are a variety of map options, and they have a variety of storage methods.

One of the tutorials I viewed had mentioned Natural Earth, an open source repository. You can get easy access to this information with two R packages: rnaturalearth and rnaturalearthdata.

What these give you is the ability to draw outlines of any and every country. You also get the outlines of states/provinces/districts in many countries. There’s a lot more, but I want to just show something simple now to get you started.

The traditional format for storing map information is shapefiles. The shapefile format was defined the 1990s and is a rather intimidating collection of multiple files. You can find a nice summary on the Wikipedia page on shapefiles or at a Library of Congress’s page on archival formats.

A newer format, simple features, has greater flexibility and a simpler structure. It’s still not simple, and I have been able to delve into the inner workings to get important details extracted.

There is a package in R, sf, that allows you to easily manipulate geographic data stored in a simple features format. This package has a nice series of vignettes, starting here.

Please note that the rnaturalearthhires package is needed below, but it cannot be downloaded directly (at least on my system). Instead you have to type in the command

install.packages("rnaturalearthhires", repos="http://packages.ropensci.org", type="source")

So to get everthing to work, even for a simple example, you need a whole bunch of packages. Here are the ones I use in this example.

suppressMessages(suppressWarnings(library(lubridate)))
suppressMessages(suppressWarnings(library(mapproj)))
suppressMessages(suppressWarnings(library(rgeos)))
suppressMessages(suppressWarnings(library(rjson)))
suppressMessages(suppressWarnings(library(rnaturalearth)))
suppressMessages(suppressWarnings(library(rnaturalearthdata)))
suppressMessages(suppressWarnings(library(sf)))
suppressMessages(suppressWarnings(library(stringr)))
suppressMessages(suppressWarnings(library(tidyverse)))

I’m going to read in the United States file using the ne_states function. The country=“United States of America” parameter excludes other countries that have states, provinces, or districts. I further want to remove Alaska and Hawaii from the file. So sorry to our 49th and 50th states, but the graphs are much more manageable for these examples when I focus just on the 48 states. The str function tells you a bit of information about how the data is organized within the states data frame.

ne_states(country="United States of America", returnclass = "sf") %>%
  filter(postal == "CO") -> co
str(co)
## Classes 'sf' and 'data.frame':   1 obs. of  84 variables:
##  $ featurecla: chr "Admin-1 scale rank"
##  $ scalerank : int 2
##  $ adm1_code : chr "USA-3522"
##  $ diss_me   : int 3522
##  $ iso_3166_2: chr "US-CO"
##  $ wikipedia : chr "http://en.wikipedia.org/wiki/Colorado"
##  $ iso_a2    : chr "US"
##  $ adm0_sr   : int 1
##  $ name      : chr "Colorado"
##  $ name_alt  : chr "CO|Colo."
##  $ name_local: chr NA
##  $ type      : chr "State"
##  $ type_en   : chr "State"
##  $ code_local: chr "US08"
##  $ code_hasc : chr "US.CO"
##  $ note      : chr NA
##  $ hasc_maybe: chr NA
##  $ region    : chr "West"
##  $ region_cod: chr NA
##  $ provnum_ne: int 0
##  $ gadm_level: int 1
##  $ check_me  : int 20
##  $ datarank  : int 1
##  $ abbrev    : chr "Colo."
##  $ postal    : chr "CO"
##  $ area_sqkm : int 0
##  $ sameascity: int NA
##  $ labelrank : int 0
##  $ name_len  : int 8
##  $ mapcolor9 : int 1
##  $ mapcolor13: int 1
##  $ fips      : chr "US08"
##  $ fips_alt  : chr NA
##  $ woe_id    : int 2347564
##  $ woe_label : chr "Colorado, US, United States"
##  $ woe_name  : chr "Colorado"
##  $ latitude  : num 39
##  $ longitude : num -106
##  $ sov_a3    : chr "US1"
##  $ adm0_a3   : chr "USA"
##  $ adm0_label: int 2
##  $ admin     : chr "United States of America"
##  $ geonunit  : chr "United States of America"
##  $ gu_a3     : chr "USA"
##  $ gn_id     : int 5417618
##  $ gn_name   : chr "Colorado"
##  $ gns_id    : int -1
##  $ gns_name  : chr NA
##  $ gn_level  : int 1
##  $ gn_region : chr NA
##  $ gn_a1_code: chr "US.CO"
##  $ region_sub: chr "Mountain"
##  $ sub_code  : chr NA
##  $ gns_level : int -1
##  $ gns_lang  : chr NA
##  $ gns_adm1  : chr NA
##  $ gns_region: chr NA
##  $ min_label : num 3.5
##  $ max_label : num 7.5
##  $ min_zoom  : num 2
##  $ wikidataid: chr "Q1261"
##  $ name_ar   : chr NA
##  $ name_bn   : chr NA
##  $ name_de   : chr "Colorado"
##  $ name_en   : chr "Colorado"
##  $ name_es   : chr "Colorado"
##  $ name_fr   : chr "Colorado"
##  $ name_el   : chr NA
##  $ name_hi   : chr NA
##  $ name_hu   : chr "Colorado"
##  $ name_id   : chr "Colorado"
##  $ name_it   : chr "Colorado"
##  $ name_ja   : chr NA
##  $ name_ko   : chr NA
##  $ name_nl   : chr "Colorado"
##  $ name_pl   : chr "Kolorado"
##  $ name_pt   : chr "Colorado"
##  $ name_ru   : chr NA
##  $ name_sv   : chr "Colorado"
##  $ name_tr   : chr "Colorado"
##  $ name_vi   : chr "Colorado"
##  $ name_zh   : chr NA
##  $ ne_id     : chr "1159315343"
##  $ geometry  :sfc_MULTIPOLYGON of length 1; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:170, 1:2] -109 -109 -109 -109 -108 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:83] "featurecla" "scalerank" "adm1_code" "diss_me" ...
names(co)
##  [1] "featurecla" "scalerank"  "adm1_code"  "diss_me"    "iso_3166_2"
##  [6] "wikipedia"  "iso_a2"     "adm0_sr"    "name"       "name_alt"  
## [11] "name_local" "type"       "type_en"    "code_local" "code_hasc" 
## [16] "note"       "hasc_maybe" "region"     "region_cod" "provnum_ne"
## [21] "gadm_level" "check_me"   "datarank"   "abbrev"     "postal"    
## [26] "area_sqkm"  "sameascity" "labelrank"  "name_len"   "mapcolor9" 
## [31] "mapcolor13" "fips"       "fips_alt"   "woe_id"     "woe_label" 
## [36] "woe_name"   "latitude"   "longitude"  "sov_a3"     "adm0_a3"   
## [41] "adm0_label" "admin"      "geonunit"   "gu_a3"      "gn_id"     
## [46] "gn_name"    "gns_id"     "gns_name"   "gn_level"   "gn_region" 
## [51] "gn_a1_code" "region_sub" "sub_code"   "gns_level"  "gns_lang"  
## [56] "gns_adm1"   "gns_region" "min_label"  "max_label"  "min_zoom"  
## [61] "wikidataid" "name_ar"    "name_bn"    "name_de"    "name_en"   
## [66] "name_es"    "name_fr"    "name_el"    "name_hi"    "name_hu"   
## [71] "name_id"    "name_it"    "name_ja"    "name_ko"    "name_nl"   
## [76] "name_pl"    "name_pt"    "name_ru"    "name_sv"    "name_tr"   
## [81] "name_vi"    "name_zh"    "ne_id"      "geometry"

There is a lot of information in the list above and more than half of it is just giving different names to each state. The last column in this data frame is geometry and this is going to be the last column of any data stored in the simple features format. It ends up being a very long string with latitude and longitude coordinates for one or more polygons. You need multiple polygons for a disconnected state like Michigan, or a state with a lot of islands, like Florida.

Similarly, the ne_countries function gets information about the boundaries of every country in the world.

countries <- ne_countries(scale = "medium", returnclass = "sf")
str(countries)
## Classes 'sf' and 'data.frame':   241 obs. of  64 variables:
##  $ scalerank : int  3 1 1 1 1 3 3 1 1 1 ...
##  $ featurecla: chr  "Admin-0 country" "Admin-0 country" "Admin-0 country" "Admin-0 country" ...
##  $ labelrank : num  5 3 3 6 6 6 6 4 2 6 ...
##  $ sovereignt: chr  "Netherlands" "Afghanistan" "Angola" "United Kingdom" ...
##  $ sov_a3    : chr  "NL1" "AFG" "AGO" "GB1" ...
##  $ adm0_dif  : num  1 0 0 1 0 1 0 0 0 0 ...
##  $ level     : num  2 2 2 2 2 2 2 2 2 2 ...
##  $ type      : chr  "Country" "Sovereign country" "Sovereign country" "Dependency" ...
##  $ admin     : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ adm0_a3   : chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ geou_dif  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ geounit   : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ gu_a3     : chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ su_dif    : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ subunit   : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ su_a3     : chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ brk_diff  : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ name      : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ name_long : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ brk_a3    : chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ brk_name  : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ brk_group : chr  NA NA NA NA ...
##  $ abbrev    : chr  "Aruba" "Afg." "Ang." "Ang." ...
##  $ postal    : chr  "AW" "AF" "AO" "AI" ...
##  $ formal_en : chr  "Aruba" "Islamic State of Afghanistan" "People's Republic of Angola" NA ...
##  $ formal_fr : chr  NA NA NA NA ...
##  $ note_adm0 : chr  "Neth." NA NA "U.K." ...
##  $ note_brk  : chr  NA NA NA NA ...
##  $ name_sort : chr  "Aruba" "Afghanistan" "Angola" "Anguilla" ...
##  $ name_alt  : chr  NA NA NA NA ...
##  $ mapcolor7 : num  4 5 3 6 1 4 1 2 3 3 ...
##  $ mapcolor8 : num  2 6 2 6 4 1 4 1 1 1 ...
##  $ mapcolor9 : num  2 8 6 6 1 4 1 3 3 2 ...
##  $ mapcolor13: num  9 7 1 3 6 6 8 3 13 10 ...
##  $ pop_est   : num  103065 28400000 12799293 14436 3639453 ...
##  $ gdp_md_est: num  2258 22270 110300 109 21810 ...
##  $ pop_year  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ lastcensus: num  2010 1979 1970 NA 2001 ...
##  $ gdp_year  : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ economy   : chr  "6. Developing region" "7. Least developed region" "7. Least developed region" "6. Developing region" ...
##  $ income_grp: chr  "2. High income: nonOECD" "5. Low income" "3. Upper middle income" "3. Upper middle income" ...
##  $ wikipedia : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ fips_10   : chr  NA NA NA NA ...
##  $ iso_a2    : chr  "AW" "AF" "AO" "AI" ...
##  $ iso_a3    : chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ iso_n3    : chr  "533" "004" "024" "660" ...
##  $ un_a3     : chr  "533" "004" "024" "660" ...
##  $ wb_a2     : chr  "AW" "AF" "AO" NA ...
##  $ wb_a3     : chr  "ABW" "AFG" "AGO" NA ...
##  $ woe_id    : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ adm0_a3_is: chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ adm0_a3_us: chr  "ABW" "AFG" "AGO" "AIA" ...
##  $ adm0_a3_un: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ adm0_a3_wb: num  NA NA NA NA NA NA NA NA NA NA ...
##  $ continent : chr  "North America" "Asia" "Africa" "North America" ...
##  $ region_un : chr  "Americas" "Asia" "Africa" "Americas" ...
##  $ subregion : chr  "Caribbean" "Southern Asia" "Middle Africa" "Caribbean" ...
##  $ region_wb : chr  "Latin America & Caribbean" "South Asia" "Sub-Saharan Africa" "Latin America & Caribbean" ...
##  $ name_len  : num  5 11 6 8 7 5 7 20 9 7 ...
##  $ long_len  : num  5 11 6 8 7 13 7 20 9 7 ...
##  $ abbrev_len: num  5 4 4 4 4 5 4 6 4 4 ...
##  $ tiny      : num  4 NA NA NA NA 5 5 NA NA NA ...
##  $ homepart  : num  NA 1 1 NA 1 NA 1 1 1 1 ...
##  $ geometry  :sfc_MULTIPOLYGON of length 241; first list element: List of 1
##   ..$ :List of 1
##   .. ..$ : num [1:10, 1:2] -69.9 -69.9 -69.9 -70 -70.1 ...
##   ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
##  - attr(*, "sf_column")= chr "geometry"
##  - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
##   ..- attr(*, "names")= chr [1:63] "scalerank" "featurecla" "labelrank" "sovereignt" ...

Notice that this data frame has a lot of statistical information (population, area, etc.) for each country. You don’t need that information for the examples shown here, but it is nice to have.

Here is a picture of the earth as it might look like if viewed from the sun. The north pole is showing because it is summertime in the northern hemisphere. All the countries that are easily visible in this perspective are outlined in black and the state of Colorado in the United States is outlined in red.

Colorado is a very interesting state with a simple boundary. The sourthern edge is 37 degrees latitude. The northern edge is 41 degrees north latitude. The eastern edge is 102 degrees, 02 minutes, 48 seconds west longitude. The western edge is 109 degrees, 02 minutes, 48 seconds west longitude.

You can calculate the length of each of these boundaries with a bit of simple math. The radius of the earth is 6,378 kilometers. So the circumference is

r <- 6378
2*pi*r
## [1] 40074.16

Each degree of latitude is

2*pi*r/360
## [1] 111.3171

So the four degrees that you would have to travel to get from the southern border (37 degrees) to the northern border (41 degrees) is

2*pi*r*4/360
## [1] 445.2684

The distance from the east to the west border is tricky. At the equator a degree of distance is 111.3 meters, but as you move away from the equator, the lines of latitude converge so a degree of longitude shrinks. A simple bit of geometry can show you that a single degree of longitude at 37 degrees is

2*pi*r*cos(37*pi/180)/360
## [1] 88.90179

but at 41 degrees, it is

2*pi*r*cos(41*pi/180)/360
## [1] 84.01208

That means that the 9 degrees that you travel from the eastern border to the western border is

2*pi*r*cos(37*pi/180)*9/360
## [1] 800.1161

at 37 degrees, but only

2*pi*r*cos(41*pi/180)*9/360
## [1] 756.1087

So the northern border of Colorado is shorter than the southern border.

And all this time you thought Colorado was a rectangle! It’s not. It is not a trapezoid either. It exists on a three dimensional globe and is more complex than any two dimensional geometric object can represent.

You need to make a projection any time you try to represent a three dimensional object like the state of Colorado on a two dimensional medium like a computer screen or a sheet of paper. Here is the Lambert projection.

projection_string <- "+proj=laea +lon_0=-105.5 +lat_0=39 +datum=WGS84 +no_defs"
co %>%
  ggplot() + 
  geom_sf(color="red", fill="white") +
  coord_sf(crs=projection_string)

Notice that it has to warp some of the straight edge boundaries in order to keep the distances correct at the northern and southern borders.

Plotting the entire continent of North America shows even more warping. Notice how the state of Alaska has been twisted almost 45 degrees to accomodate. The island of Greenland is also twisted but in the opposite direction.

The distortions of the Lambert projection are more extreme when you are trying to fit large distances into a single map. The distortions become most noticeable at the edges. There is almost no distortion near the center.

projection_string <- "+proj=laea +lon_0=-105.5 +lat_0=39 +datum=WGS84 +no_defs"
countries %>%
  filter(continent=="North America") %>%
  ggplot() + 
  geom_sf(color="red", fill="white") +
  coord_sf(crs=projection_string)

The most common projection is the Mercator projection. Here is the Mercator projection for the state of Colorado.

projection_string <- "+proj=merc +lon_0=0 +datum=WGS84 +no_defs"
co %>%
  ggplot() + 
  geom_sf(color="red", fill="white") +
  coord_sf(crs=projection_string)

No warping of the boundaries but the lengths of the northern and southern borders are displayed incorrectly as having the same length. Not only are distances distorted, but areas as well. That becomes more obvious when you look at the entire continent.

projection_string <- "+proj=merc +lon_0=-105.5 +datum=WGS84 +no_defs"
countries %>%
  filter(continent=="North America") %>%
  ggplot() + 
  geom_sf(color="red", fill="white") +
  coord_sf(crs=projection_string)

Notice how Alaska has grown to more than half the size of the rest of the United States. Alaska is big, but not that big! Also Greenland has expaned to mostrous proportions. The Mercator projection is simple, but it has a lot of distortions at the higher latitudes.

But notice how the eastern border of the state of Alaska is perfectly vertical, representing a true north/south orientation.

No projection system is perfect, and you should try different approaches to see what works best for your situation.