# Visual variables
<!-- JN: idea - add text section -->
Visual variables are methods to translate information given in variables into many types of visualizations, including maps.
Basic visual variables are color, size, and shape^[Other visual variables include position, orientation, and texture.].
All of them can influence our perception and understanding of the presented information, so it is worth knowing when and how to use them.
<!-- raster data in the plot? -->
<!-- size symbol only has four symbols (the rest have five) -->
```{r visual-variables, echo=FALSE, warning=FALSE, fig.asp = .5, fig.cap="Basic visual variables and their representations on maps", message=FALSE}
source("code/visual_variables.R")
visual_variables()
```
The use of visual variables on maps depends on two main things: (a) type of the presented variable, and (b) type of the map layer.
Figure \@ref(fig:visual-variables) shows examples of different visual variables.
Color is the most universal visual variable.
It can represent both qualitative (categorical) and quantitative (numerical) variables, and it can be applied to symbols, lines, or polygon fills (sections \@ref(color-palettes) and \@ref(color-scale-styles)).
Sizes, on the other hand, should focus on quantitative variables.
Small symbols could represent low values of a given variable, and the higher the value, the larger the symbol.
Quantitative values of line data can be shown with the widths of the lines (section \@ref(sizes)).
The use of shapes usually should be limited to qualitative variables, and different shapes can represent different categories of points (section \@ref(shapes)).
<!--JN: line type is it a shape or should we make a new group-->
Similarly, qualitative variables in lines can be presented by different line types.
Values of polygons usually cannot be represented by either shapes or sizes, as these two features are connected to the geometries of the objects.
<!-- exception - cartograms - ref to other chapter \@ref(other-types) -->
<!-- also, sometimes it is possible to use several visual variables at the same time (e.g. width lines + colors) -->
## Colors
\index{colors}
Colors, along with sizes and shapes, are most often used to express values of attributes or their properties.
Proper use of colors draws the attention of viewers and has a positive impact on the clarity of the presented information.
On the other hand, poor decisions about colors can lead to misinterpretation of the map.
Section \@ref(color-palettes) explains how colors are represented in R, how to decide which colors to use, and how to set different colors on maps.
Section \@ref(color-scale-styles) focuses on how to specify color breaks and which types of color scale styles are appropriate in different cases.
### Color palettes
\index{color palettes}
<!-- reference this bp - https://earthobservatory.nasa.gov/blogs/elegantfigures/2013/08/06/subtleties-of-color-part-2-of-6/ -->
<!-- As we discussed in ..., -->
<!-- We can express values of attributes in spatial data using colors, shapes, or sizes. -->
<!-- https://en.wikipedia.org/wiki/Color_scheme -->
\index{colors}
\index{hexadecimal form}
Colors in R are created based either on the color name or its hexadecimal form.
R understands 657 built-in color names, such as `"red"`, `"lightblue"` or `"gray90"`, that are available using the `colors()` function.
<!-- demo("colors") -->
<!-- http://www.stat.columbia.edu/~tzheng/files/Rcolor.pdf -->
Hexadecimal form, on the other hand, can represent 16,777,216 unique colors.
It consists of six digits prefixed by the `#` (hash) symbol, where red, green, and blue values are each represented by two characters.
In hexadecimal form, `00` is interpreted as `0.0`, which means a lack of a particular color, and `FF` is interpreted as `1.0`, which means that the given color has maximal intensity.
For example, `#000000` represents black color, `#FFFFFF` white color, and `#00FF00` green color.
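
The relation between color names, hexadecimal codes, and red, green, and blue components can be checked with a few base R functions.
The short sketch below is only an illustration, and the colors in it are chosen arbitrarily.

```{r}
# Number of built-in color names
length(colors())
# Red, green, and blue components (0-255) of a hexadecimal color
col2rgb("#00FF00")
# Build a hexadecimal code from components given on the 0-1 scale
green_hex = rgb(red = 0, green = 1, blue = 0)
green_hex
# Two hexadecimal characters per channel give 256 levels, so 256^3 colors
256^3
```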
<!-- hex alpha?? -->
Using a single color we are able to draw points, lines, polygon borders, or their areas.
In that scenario, all of the elements will have the same color.
However, often we want to represent different values in our data using different colors.
This is a role for color palettes.
A color palette is a set of colors used to distinguish the values of variables on maps.
\index{color palettes}
Color palettes in R are usually stored as a vector of either color names or hexadecimal representations.
For example, `c("red", "green", "blue")` or `c("#66C2A5", "#FC8D62", "#8DA0CB")`.
This allows anyone to create their own color palettes.
However, deciding which colors to use is not straightforward and usually requires thinking about several aspects.
\index{color properties}
First, what kind of variable do we want to show?
<!-- a next sentence is a simplification, as always -->
Is it a <!--qualitative-->categorical variable where each value represents a <!--orderless-->group or a <!--quantitative-->numerical variable in which values have order?
<!-- http://colorspace.r-forge.r-project.org/articles/palette_visualization.html -->
The variable type impacts how it should be presented on the map.
For categorical variables, each color usually should receive the same perceptual weight, which is done by using colors with the same brightness<!--luminance-->, but different hue<!--type of color-->.
On the other hand, for numerical variables, we should easily understand which colors represent lower and which represent higher values.
This is done by manipulating colorfulness<!--chroma,saturation--> and brightness<!--luminance-->.
For example, low values could be presented by a blue color with low colorfulness and high brightness, and with growing values, colorfulness increases and brightness decreases.
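
One possible way to experiment with these properties is the `hcl()` function from **grDevices**, which builds colors from hue (`h`), chroma (`c`, colorfulness), and luminance (`l`, brightness).
The sketch below is a minimal illustration, and the specific hue, chroma, and luminance values in it are arbitrary assumptions rather than recommendations.

```{r}
# Categorical: hue varies, chroma and luminance stay constant
pal_cat = hcl(h = seq(0, 300, by = 60), c = 60, l = 70)
# Numerical: one (blue) hue, chroma grows and luminance drops with the value
pal_num = hcl(h = 260,
              c = seq(20, 80, length.out = 6),
              l = seq(90, 30, length.out = 6))
pal_cat
pal_num
```

Both objects are plain character vectors of hexadecimal colors, so they can be used like any other manually created palette.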
\index{color perception}
The next consideration is how people <!--(reader/viewers)--> perceive colors.
Usually, we want them to be able to get a preliminary understanding of which values the colors represent without looking at the legend -- colors should be intuitive.
For example, in the case of categorical variables representing land use, we usually want to use some type of blue color for rivers, green for trees, and white for ice.
This idea also extends to numerical variables, where we should think about the association between colors and cultural values.
The blue color is usually associated with cold temperatures, while the red color is associated with heat, danger, or something undesirable.
However, we need to be aware that the connection between colors and cultural values varies between cultures.
<!-- http://uxblog.idvsolutions.com/2013/07/language-and-color.html -->
\index{color blindness}
Another thing to consider is to use a color palette that is accessible for people with color vision deficiencies (color blindness).
<!-- https://en.wikipedia.org/wiki/Color_blindness -->
There are several types of color blindness, with the red-green color blindness (*deuteranomaly*) being the most common.
It is estimated that up to about 8% of the male population and about 0.5% of the female population in some regions of the world is color blind [@birch_worldwide_2012;@sharpe_opsin_1999].
<!-- tools in R for checking for colorblindness -->
<!-- Simultaneous contrast. --><!-- background -->
The relation between the selected color palette and other map elements or the map background should also be taken into consideration.
For example, using a bright or dark background color on a map has an impact on how people will perceive different color palettes.
<!-- relation between the background col and other colors -->
<!-- using two or more palettes (e.g. lines and points): -->
<!-- color palettes then should be complementary -->
<!-- should we add: (?) -->
<!-- aesthetic -->
<!-- similar to lines types, fonts, etc, positions -->
<!-- hard to grasp, hard to learn, look for good examples and be inspired -->
\index{color palettes}
Generally, color palettes can be divided into three main types (Figure \@ref(fig:palette-types)):
- **Categorical** (also known as Qualitative) - used for presenting categorical information, for example, categories or groups.
Every color in this type of palette should receive the same perceptual weight, and the order of colors is meaningless.
Categorical color palettes are usually limited to a dozen or so different colors, as our eyes have problems distinguishing a large number of different hues.
Their use includes, for example, regions of the world or land cover categories.
- **Sequential** - used for presenting continuous variables, in which order matters.
Colors in this palette type change from low to high (or vice versa), which is usually underlined by luminance differences (light-dark contrasts).
Sequential palettes can be found in maps of GDP, population density, elevation, and many others.
- **Diverging** - used for presenting continuous variables, but where colors diverge from a central neutral value to two extremes.
Therefore, in a sense, they consist of two sequential palettes that meet at the midpoint value.
Examples of diverging palettes include maps where a certain temperature or median value of household income is used as the midpoint.
They can also be used on maps to represent difference or change.
<!-- idea: replace one diverging palette with the dark in the middle -->
<!-- do it, if (when) tmap has hcl.colors build-in -->
```{r palette-types, fig.cap="Examples of three main types of color palettes: categorical, sequential, and diverging", echo=FALSE, fig.asp=0.5}
# y - a named list of palettes
source("code/palette_figures.R")
p_cat = hcl.colors(7, "Set3")
p_seq = rev(hcl.colors(7, "YlOrBr"))
p_div = hcl.colors(7, "RdYlGn")
p_cat2 = tmaptools::get_brewer_pal("Set2", n = 7, plot = FALSE)
p_seq2 = rev(hcl.colors(7, "viridis"))
p_div2 = hcl.colors(7, "BrBG")
y = list(Categorical = list(Set3 = p_cat, Set2 = p_cat2),
         Sequential = list(YlOrBr = p_seq, viridis = p_seq2),
         Diverging = list(RdYlGn = p_div, BrBG = p_div2))
plot_palette_types(y)
```
<!-- idea: add bivariate/trivariate schemes (if/when implemented in tmap) -->
\index{color palettes}
Fortunately, a lot of work has been put into creating color palettes that are grounded in the research of perception and design.
Currently, [several dozen R packages](https://github.com/EmilHvitfeldt/r-color-palettes) contain hundreds of color palettes.
The most popular among them are **RColorBrewer** [@R-RColorBrewer] and **viridis** [@R-viridis].
**RColorBrewer** builds upon a set of perceptually ordered color palettes [@harrower_colorbrewer_2003] and the associated website at https://colorbrewer2.org.
The website not only presents all of the available color palettes but also allows filtering them based on their properties, such as being colorblind safe or print-friendly.
The **viridis** package has five color palettes that are perceptually uniform and suitable for people with color blindness.
Four palettes in this package ("viridis", "magma", "plasma", and "inferno") are derived from the work on the color palettes for [the matplotlib Python library](http://bids.github.io/colormap/).
The last one, "cividis", is based on the work of @nunez_optimizing_2018.
```{r}
RColorBrewer::brewer.pal(7, "RdBu")
viridis::viridis(7)
```
\index{color palettes}
In the last few years, the **grDevices** package, which is an internal part of R, has received several improvements in color palette handling.^[Learn more about them at https://developer.r-project.org/Blog/public/2019/04/01/hcl-based-color-palettes-in-grdevices/ and https://developer.r-project.org/Blog/public/2019/11/21/a-new-palette-for-r/index.html.]
They include the creation of the `hcl.colors()` and `palette.colors()` functions.
The `hcl.colors()` function [incorporates color palettes from several R packages](http://colorspace.r-forge.r-project.org/articles/approximations.html), including **RColorBrewer**, **viridis**, **rcartocolor** [@carto_cartocolors_2019;@R-rcartocolor], and **scico** [@crameri_geodynamic_2018;@R-scico].
You can get the list of available palette names for `hcl.colors()` using the `hcl.pals()` function and visualize all of the palettes with `colorspace::hcl_palettes(plot = TRUE)`.
The `palette.colors()` function adds [several palettes for categorical data](https://developer.r-project.org/Blog/public/2019/11/21/a-new-palette-for-r/index.html).
It includes `"Okabe-Ito"` [suited for color vision deficiencies](https://jfly.uni-koeln.de/color/) or `"Polychrome 36"` that has 36 unique colors [@coombes_polychrome_2019].
You can find the available names of the palettes for this function using `palette.pals()`.
```{r}
grDevices::hcl.colors(7, "Oslo")
grDevices::palette.colors(7, "Okabe-Ito")
```
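
The helper functions mentioned above can be called directly to look up valid palette names.
The sketch below assumes a version of R in which `hcl.pals()` and `palette.pals()` are available; the `"diverging"` type is used only as an example.

```{r}
# First few of the palette names accepted by hcl.colors()
head(hcl.pals())
# Names restricted to one palette type
pals_div = hcl.pals(type = "diverging")
pals_div
# Palette names accepted by palette.colors()
palette.pals()
```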
\index{color palettes!rainbow}
One of the most widely used color palettes is "rainbow" (the `rainbow()` function in R).
It was inspired by colors of rainbows - a set of seven colors going from red to violet.
However, this palette has a number of disadvantages, including irregular changes in brightness affecting its interpretation or being unsuitable for people with color vision deficiencies [@borland_rainbow_2007;@stauffer_somewhere_2015;@quinan_examining_2019].
Depending on a given situation, there are many palettes better suited for visualization than "rainbow", including sequential `"viridis"` and `"ag_Sunset"` or diverging `"Purple-Green"` and `"Fall"`.
All of them can be created with the `grDevices::hcl.colors()` function.
More examples showing alternatives to the "rainbow" palette are in the documentation of the **colorspace** package at
https://colorspace.r-forge.r-project.org/articles/endrainbow.html [@R-colorspace].
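
The irregular brightness of the "rainbow" palette can be inspected numerically.
The sketch below is one possible check, not a method taken from the cited papers: it converts colors to the CIE Lab color space with `grDevices::convertColor()` and compares the lightness component of `rainbow()` with that of the `"viridis"` palette from `hcl.colors()`.
The `lightness()` helper is a hypothetical function written only for this example.

```{r}
# Hypothetical helper: lightness (first Lab component) of a vector of colors
lightness = function(cols) {
  rgb_01 = t(col2rgb(cols)) / 255
  convertColor(rgb_01, from = "sRGB", to = "Lab")[, 1]
}
l_rainbow = lightness(rainbow(7))
l_viridis = lightness(hcl.colors(7, "viridis"))
round(l_rainbow)
round(l_viridis)
# Does lightness change in one direction only?
all(diff(l_viridis) > 0)
all(diff(l_rainbow) > 0)
```

A palette whose lightness goes up and down along its length makes it harder to judge which colors stand for lower and which for higher values.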
```{r, echo=FALSE, warning=FALSE, message=FALSE}
library(tmap)
library(sf)
worldvector = read_sf("data/worldvector.gpkg")
```
<!-- https://github.com/mtennekes/tmap/blob/d3b8575fa19d704cff69cdac6746fedc5b8db758/R/tmap_options.R -->
By default, the **tmap** package attempts to identify the type of the used variable.
Based on the result, it selects one of the built-in palettes: categorical `"Set3"`, sequential `"YlOrBr"`, or diverging `"RdYlGn"` (Figure \@ref(fig:tmpals)).
<!-- info about tm_layout or reference to a section about it -->
```{r, eval=FALSE}
tm_shape(worldvector) +
  tm_polygons("life_expectancy")
```
It also offers three main ways to specify color palettes using the `palette` argument: (1) a vector of colors, (2) a palette function, or (3) one of the built-in names (Figure \@ref(fig:tmpals)).
A vector of colors can be specified using color names or hexadecimal representations (Figure \@ref(fig:tmpals)).
Importantly, the length of the provided vector does not need to be equal to the number of colors in the map legend.
**tmap** automatically interpolates new colors in the case when a smaller number of colors is provided.
```{r, eval=FALSE}
tm_shape(worldvector) +
  tm_polygons("life_expectancy", palette = c("yellow", "darkgreen"))
```
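
The general idea of interpolating new colors from a short vector can be illustrated outside of **tmap** with the `colorRampPalette()` function from **grDevices**.
This is only a sketch of the technique; it is an assumption that it resembles, rather than reproduces, what **tmap** does internally.

```{r}
# Create a function that interpolates between the two provided colors
yellow_to_green = colorRampPalette(c("yellow", "darkgreen"))
# Ask for more colors than were provided
pal5 = yellow_to_green(5)
pal5
```

The first and the last color are the ones we provided, and the three colors in between are interpolated.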
Another approach is to provide the output of a palette function (Figure \@ref(fig:tmpals)).
In the example below, we derived seven colors from `"ag_GrnYl"` palette.
This palette goes from green colors to yellow ones; however, we wanted to reverse its order.
Thus, we also used the `rev()` function here.
```{r, eval=FALSE}
tm_shape(worldvector) +
  tm_polygons("life_expectancy", palette = rev(hcl.colors(7, "ag_GrnYl")))
```
The last approach is to use one of the names of the color palettes built into **tmap** (Figure \@ref(fig:tmpals)).
In this example, we used the `"YlGn"` palette that goes from yellow to green.
```{r, eval=FALSE}
tm_shape(worldvector) +
  tm_polygons("life_expectancy", palette = "YlGn")
```
You can find all of the named color palettes using an interactive app with `tmaptools::palette_explorer()`.
It is also possible to reverse the order of any named color palette by using the `-` prefix.
Therefore, `"-YlGn"` will return a palette going from green to yellow.
(ref:tmpals) Examples of four ways of specifying color palettes: (A) default sequential color palette, (B) palette created based on provided vector of colors, (C) palette created using the `hcl.colors()` function, and (D) one of the built-in palettes.
```{r tmpals, warning=FALSE, fig.cap="(ref:tmpals)", echo=FALSE}
tm_pal1 = tm_shape(worldvector) +
  tm_polygons("life_expectancy") +
  tm_layout(title = "A")
tm_pal2 = tm_shape(worldvector) +
  tm_polygons("life_expectancy", palette = c("yellow", "darkgreen")) +
  tm_layout(title = "B")
tm_pal3 = tm_shape(worldvector) +
  tm_polygons("life_expectancy", palette = rev(hcl.colors(7, "ag_GrnYl"))) +
  tm_layout(title = "C")
tm_pal4 = tm_shape(worldvector) +
  tm_polygons("life_expectancy", palette = "YlGn") +
  tm_layout(title = "D")
tmap_arrange(tm_pal1, tm_pal2, tm_pal3, tm_pal4,
             ncol = 2)
```
<!-- state that the above example of setting colors works for most of palettes -->
<!-- midpoint argument -->
The default color palette for positive numerical variables is `"YlOrBr"` as seen in Figure \@ref(fig:tmmidpoint):A.
On the other hand, when the given variable has both negative and positive values, then **tmap** uses the `"RdYlGn"` color palette, with red colors below the midpoint value, yellow color around the midpoint value, and green colors above the midpoint value.
The use of diverging color palettes can be adjusted using the `midpoint` argument.
It has a value of 0 as the default; however, it is possible to change it to any other value.
For example, suppose we want to create a map that shows countries with life expectancy below and above the median life expectancy of about 73 years.
To do that, we just need to set the `midpoint` argument to this value (Figure \@ref(fig:tmmidpoint):B).
<!-- , style = "cont" -->
```{r, eval=FALSE}
tm_shape(worldvector) +
  tm_polygons(col = "life_expectancy", midpoint = 73)
```
```{r tmmidpoint, warning=FALSE, fig.cap="Examples of (A) a map with the default sequential color palette and (B) a map with the diverging color palette around the midpoint value of 73.", fig.asp=NA, fig.height=2, echo=FALSE}
# mean(x$lifeExp, na.rm = TRUE)
tm_mp1 = tm_shape(worldvector) +
  tm_polygons(col = "life_expectancy") +
  tm_layout(title = "A")
# midpoint of 73 matches the text and the figure caption
tm_mp2 = tm_shape(worldvector) +
  tm_polygons(col = "life_expectancy", midpoint = 73) +
  tm_layout(title = "B")
tmap_arrange(tm_mp1, tm_mp2, ncol = 2)
```
Now the countries with low life expectancy are presented with red colors, yellow areas represent countries with life expectancy around the median value (the `midpoint` in our case), and the countries with high life expectancy are represented by green colors.
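
The logic of centering a diverging palette on a midpoint can be sketched in base R.
The example below is a simplified illustration and not **tmap**'s actual algorithm: the `life_exp` values are hypothetical, and the class breaks are simply placed symmetrically around the midpoint so that the middle color of the palette falls on it.

```{r}
# Hypothetical life expectancy values
life_exp = c(52, 61, 68, 73, 76, 80, 84)
midpoint = 73
# Largest distance from the midpoint defines a symmetric range of values
max_dev = max(abs(life_exp - midpoint))
breaks_div = seq(midpoint - max_dev, midpoint + max_dev, length.out = 8)
# Seven classes, one for each color of a diverging palette
pal_div = hcl.colors(7, "RdYlGn")
class_id = as.integer(cut(life_exp, breaks_div, include.lowest = TRUE))
data.frame(life_exp, class_id, color = pal_div[class_id])
```

Values close to the midpoint receive the middle color of the palette, and because the range is symmetric, the unused classes at one end show how unevenly the values are spread around the midpoint.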
The above examples all contain several polygons with missing values of a given variable.
Objects with missing values are, by default, represented by gray color and a related legend label *Missing*.
However, it is possible to change this color with the `colorNA` argument and its label with `textNA`.
**tmap** has a special way to set colors for categorical maps manually.
It works by providing a named vector to the `palette` argument.
In this vector, names of the categories from the categorical variable are the vector names, and specified colors are the vector values.
You can see it in the example below, where we plot the `"wb_region"` categorical variable (Figure \@ref(fig:tmcatpals)).
Each category in this variable (e.g., `"Sub-Saharan Africa"`) is assigned its own color (e.g., `"#14909a"`).
<!--improve colors-->
<!-- also - improve example - maybe use less colors/categories -->
```{r tmcatpals, warning=FALSE, fig.cap="An example of a categorical map with manually selected colors.", fig.asp=NA, fig.height=2}
tm_shape(worldvector) +
  tm_polygons("wb_region",
    palette = c(
      "Latin America & Caribbean" = "#11467b",
      "Europe & Central Asia" = "#ffd14d",
      "Middle East & North Africa" = "#86909a",
      "Sub-Saharan Africa" = "#14909a",
      "East Asia & Pacific" = "#7fbee9",
      "South Asia" = "#df5454",
      "North America" = "#7b1072")
  )
```
```{r, echo=FALSE, eval=FALSE}
tm_shape(worldvector) +
  tm_polygons(col = "life_expectancy")
tm_shape(worldvector) +
  tm_polygons(col = "life_expectancy", alpha = 0.5)
tm_shape(worldvector) +
  tm_polygons(col = "life_expectancy", colorNA = "purple", textNA = "I do not know!")
tm_shape(worldvector) +
  tm_polygons(col = "life_expectancy", contrast = c(0.4, 1))
```
\index{color palettes!transparency}
Finally, visualized colors can be additionally modified.
This includes setting the `alpha` argument that represents the transparency of the used colors.
By default, the colors are not transparent at all as the value of `alpha` is 1.
However, we can decrease this value down to 0, which means total transparency.
The `alpha` argument is useful in two ways: first, it allows us to see through some large objects (e.g., to see points below the polygons or a hillshade map behind the colored raster of elevation); second, it makes colors more subtle.
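
Transparency can also be stored in the color itself.
The sketch below uses `adjustcolor()` from **grDevices** to show that a transparent color is written as an eight-character hexadecimal code, in which the last two characters describe the alpha value; the red color is just an example.

```{r}
# Half-transparent red: the last two hexadecimal characters store alpha
red_half = adjustcolor("red", alpha.f = 0.5)
red_half
# Red, green, blue, and alpha components on the 0-255 scale
col2rgb(red_half, alpha = TRUE)
```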
<!-- alpha figure? -->
<!-- color to highlight?? -->
<!-- resources: -->
<!-- https://bookdown.org/hneth/ds4psy/D-2-apx-colors-essentials.html -->
<!-- https://developer.r-project.org/Blog/public/2019/11/21/a-new-palette-for-r/index.html -->
<!-- add some references about colors theory, color blindness, etc. -->
<!-- https://earthobservatory.nasa.gov/blogs/elegantfigures/2013/09/10/subtleties-of-color-part-6-of-6/ -->
### Color scale styles
<!-- intro about setting colors -->
<!-- info that generalized to points, lines, polygons, and rasters... -->
\index{Color scale styles}
`tm_polygons()` accepts three ways of specifying the fill color with the `col` argument^[To see and compare examples of every color scale style from **tmap** visit https://geocompr.github.io/post/2019/tmap-color-scales/.].
The first one is to fill all polygons with the same color.
This happens when we provide a single color value, either as a color name or its hexadecimal form (section \@ref(color-palettes)) (Figure \@ref(fig:colorscales1)).
```{r, eval=FALSE}
tm_shape(worldvector) +
  tm_polygons(col = "lightblue")
```
```{r colorscales1, warning=FALSE, fig.cap="Example of a map with all polygons filled with the same color.", echo=FALSE, fig.asp=NA, fig.height=2}
tm_one = tm_shape(worldvector) +
  tm_polygons(col = "lightblue")
tm_one
```
\index{Color scale styles}
\index{Categorical maps}
\index{Discrete maps}
\index{Continuous maps}
The second way of specifying the fill color is to provide a name of the column (variable) we want to visualize.
**tmap** behaves differently depending on the input variable type, but always automatically adds a map legend.
In general, a categorical map is created when the provided variable contains characters, factors, or is of the logical type.
However, when the provided variable is numerical, then it is possible to create either a discrete or a continuous map.
\index{Categorical maps}
An example of a categorical map can be seen in Figure \@ref(fig:colorscales2).
We created it by providing a character variable's name, `"wb_region"`, in the `col` argument^[The `tm_polygons(col = "wb_region", style = "cat")` code is run automatically in this case.].
```{r, eval=FALSE}
tm_shape(worldvector) +
  tm_polygons(col = "wb_region")
```
<!-- categorical -->
```{r colorscales2, warning=FALSE, fig.asp=NA, fig.height=2, message=FALSE, echo=FALSE, fig.cap="Example of a map in which polygons are colored based on the values of a categorical variable."}
tm_cat = tm_shape(worldvector) +
  tm_polygons(col = "wb_region")
tm_cat
```
It is possible to change the names of legend labels with the `labels` argument.
However, to change the order of legend labels, we need to provide an ordered factor variable's name instead of a character one.<!--should we explain how to do it?-->
As mentioned in section \@ref(color-palettes), we can also change the used color palette with the `palette` argument.
```{r, eval=FALSE, echo=FALSE}
tm_shape(worldvector) +
  tm_polygons(col = "wb_region", labels = as.character(1:7))
```
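
Creating an ordered factor with a chosen order of levels is a base R operation.
The sketch below uses a short, hypothetical character vector instead of the `wb_region` column, and the selected order of levels is arbitrary.

```{r}
# A hypothetical character vector of regions
region = c("South Asia", "North America", "Sub-Saharan Africa", "South Asia")
# By default, factor levels are sorted alphabetically
levels_default = levels(factor(region))
levels_default
# An ordered factor with explicitly set levels keeps the order we choose
region_ord = factor(region,
                    levels = c("Sub-Saharan Africa", "South Asia", "North America"),
                    ordered = TRUE)
levels(region_ord)
```

The same call could be applied to a column of a spatial object before plotting, so that the column is stored as an ordered factor rather than as a character vector.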
<!-- optimal number of classes? 3-7 -->
\index{Discrete maps}
Discrete maps, on the other hand, represent continuous numerical variables using discrete class intervals.
In other words, values are divided into several groups based on their properties.
Several approaches can be used to convert continuous variables to discrete ones, and each of them could result in different groups of values.
**tmap** has 14 different methods to create discrete maps<!--list??--> that can be specified with the `style` argument.
Most of them (except `"log10_pretty"`) use the **classInt** package [@R-classInt] in the background; therefore, some additional information can be found in the `?classIntervals` documentation.
By default, the `"pretty"` style is used (Figure \@ref(fig:discrete-methods):A).
This style creates breaks that are whole numbers and spaces them evenly^[For more information, see the `?pretty` function documentation.].
```{r, eval=FALSE}
tm_shape(worldvector) +
  tm_polygons(col = "gdp_per_cap")
```
It is also possible to indicate the desired number of classes using the `n` argument when the `"pretty"` style is used.
While not every `n` is possible depending on the input values, **tmap** will try to create a number of classes as close as possible to the preferred one.
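
The behavior of the underlying `pretty()` function can be explored directly in base R.
The sketch below uses a few hypothetical GDP per capita values; the requested `n` is treated only as a suggestion, so the number of returned intervals may differ from it.

```{r}
# Hypothetical GDP per capita values
gdp = c(800, 9500, 15000, 42000, 110000)
# Ask for about five intervals
breaks_pretty = pretty(gdp, n = 5)
breaks_pretty
# Number of intervals that was actually created
length(breaks_pretty) - 1
```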
The next approach is to manually select the limits of each class with the `breaks` argument (Figure \@ref(fig:discrete-methods):B).
This can be useful when we have some pre-defined breaks, or when we want to compare values between several maps.
It expects threshold values for each class; therefore, if we want to have three classes, we need to provide four thresholds.
Additionally, we can add a label to each break with the `labels` argument.
```{r, eval=FALSE}
tm_shape(worldvector) +
  tm_polygons(col = "gdp_per_cap",
              breaks = c(0, 10000, 30000, 111000),
              labels = c("low", "medium", "high"))
```
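
The relation between thresholds and classes can be checked with the base R `cut()` function, which works on a similar principle.
This is only an analogy using hypothetical values; details such as which side of each interval is closed may differ from the **tmap** defaults.

```{r}
# Hypothetical GDP per capita values
gdp = c(800, 9500, 15000, 42000, 110000)
# Four thresholds define three classes
gdp_class = cut(gdp,
                breaks = c(0, 10000, 30000, 111000),
                labels = c("low", "medium", "high"),
                include.lowest = TRUE)
data.frame(gdp, gdp_class)
```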
<!-- interval.closure -->
Another approach is to create breaks automatically using one of many existing classification methods.
Three basic methods are `"equal"`, `"sd"`, and `"quantile"` styles.
Let's consider a variable with 100 observations ranging from 0 to 10.
The `"equal"` style divides the range of values into *n* equal-sized intervals.
This style works well when the values change fairly continuously and do not contain any outliers.
In **tmap**, we can specify the number of classes with the `n` argument or the number of classes will be computed automatically <!--?nclass.Sturges-->.
For example, when we set `n` to 4, then our breaks will represent four classes ranging from 0 to 2.5, 2.5 to 5, 5 to 7.5, and 7.5 to 10.
The `"sd"` style represents how much values of a given variable vary from its mean, with each interval having a constant width of the standard deviation.
This style is used when it is vital to show how values relate to the mean.
The `"quantile"` style creates several classes with exactly the same number of objects (e.g., spatial features), but having intervals of various lengths.
This method has the advantage of not having any empty classes or classes with too few or too many values.
However, the resulting intervals from the `"quantile"` style can often be misleading, with very different values located in the same class.
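
The differences between these three styles can be sketched with base R functions.
The example below is a simplified illustration and not the **classInt** implementation; the variable `x` is a hypothetical set of 100 observations ranging from 0 to 10 and skewed toward low values.

```{r}
# 100 hypothetical observations between 0 and 10, skewed toward low values
x = 10 * ((0:99) / 99)^3
# "equal"-like breaks: four intervals of the same width
breaks_equal = seq(min(x), max(x), length.out = 5)
breaks_equal
# "quantile"-like breaks: four intervals with the same number of observations
breaks_quantile = quantile(x, probs = seq(0, 1, by = 0.25))
round(breaks_quantile, 2)
# Number of observations per class for both approaches
n_equal = table(cut(x, breaks_equal, include.lowest = TRUE))
n_quantile = table(cut(x, breaks_quantile, include.lowest = TRUE))
n_equal
n_quantile
# "sd"-like breaks: steps of one standard deviation around the mean
mean(x) + sd(x) * (-1:2)
```

For skewed data like this, equal-width intervals put most observations in the first class, while quantile intervals hold the same number of observations but have very different widths.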
To create classes that, on the one hand, contain similar values and, on the other hand, are different from the other classes, we can use an optimization method.
The most common optimization method used in cartography is the Jenks optimization method, implemented in the `"jenks"` style (Figure \@ref(fig:discrete-methods):C).
<!-- how about adding ggplot2 histograms?? -->
<!-- should we add that these methods usually do not allow to compare between datasets? -->
```{r, eval=FALSE}
tm_shape(worldvector) +
  tm_polygons(col = "gdp_per_cap",
              style = "jenks")
```
The Fisher method (`style = "fisher"`) plays a similar role and creates groups with maximized homogeneity [@fisher_grouping_1958].
A different approach is used by the `"dpih"` style, which uses kernel density estimations to select the width of the intervals [@wand_databased_1997].
You can visit `?KernSmooth::dpih` for more details.
Another group of classification methods uses existing clustering methods.
It includes k-means clustering (`"kmeans"`), bagged clustering (`"bclust"`), and hierarchical clustering (`"hclust"`).
<!-- ... -->
Finally, there are a few methods created to work well for a variable with a heavy-tailed distribution, including `"headtails"` and `"log10_pretty"`.
The `"headtails"` style is an implementation of the head/tail breaks method aimed at heavily right-skewed data.
In it, values of the given variable are divided around the mean into two parts, and the process continues iteratively for the values above the mean (the head) until the head part values are no longer heavy-tailed [@jiang_head_2013].
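
The iterative procedure described above can be sketched in a few lines of base R.
The `head_tail_breaks()` function below is a hypothetical, simplified illustration and not the code used by **classInt** or **tmap**; in particular, the stopping rule (the head must contain more than one value and at most 40% of the current values) is an assumption made for this example.

```{r}
# Hypothetical helper: simplified head/tail breaks
head_tail_breaks = function(x, thr = 0.4) {
  breaks = min(x)
  values = x
  repeat {
    # Split the current values around their mean
    m = mean(values)
    breaks = c(breaks, m)
    head_part = values[values > m]
    # Continue only while the head is a small share of the values
    keep_going = length(head_part) > 1 &&
      length(head_part) / length(values) <= thr
    values = head_part
    if (!keep_going) break
  }
  c(breaks, max(x))
}
# Hypothetical heavy-tailed values: 1, 2, 4, ..., 512
x_ht = 2^(0:9)
breaks_ht = head_tail_breaks(x_ht)
breaks_ht
```

In this example, the first break is the mean of all values, and the second break is the mean of the three values above it.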
The `"log10_pretty"` style uses a logarithmic base-10 transformation (Figure \@ref(fig:discrete-methods):D).
In this style, each class starts with a value ten times larger than the beginning of the previous class.
In other words, each following class shows us the next order of magnitude.
This style allows for a better distinction between low, medium, and high values.
However, maps with logarithmically transformed variables are usually less intuitive for the readers and require more attention from them.
```{r, eval=FALSE}
tm_shape(worldvector) +
tm_polygons(col = "gdp_per_cap",
style = "log10_pretty")
```
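The idea of one class per order of magnitude can be sketched in base R. The values below are made up, and the way the breaks are derived here is an assumption for illustration, not necessarily how **tmap** computes them.

```r
# Hypothetical values spanning several orders of magnitude
x = c(250, 1200, 8000, 45000, 110000)
# Powers of ten that enclose the range of the data
lower = floor(log10(min(x)))
upper = ceiling(log10(max(x)))
log_breaks = 10^(lower:upper)
# Assign each value to its order-of-magnitude class
log_classes = cut(x, breaks = log_breaks, include.lowest = TRUE, dig.lab = 7)
log_breaks
table(log_classes)
```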
(ref:discrete-methods) Examples of four methods of creating discrete maps: (A) default method ('pretty'), (B) the 'fixed' method with manually set breaks, (C) the 'jenks' method, and (D) the 'log10_pretty' method.
<!-- discrete -->
```{r discrete-methods, warning=FALSE, fig.asp=NA, fig.height=3.46, message=FALSE, echo=FALSE, fig.cap="(ref:discrete-methods)"}
tm_pre = tm_shape(worldvector) +
tm_polygons(col = "gdp_per_cap") +
tm_layout(title = "A")
tm_fix = tm_shape(worldvector) +
tm_polygons(col = "gdp_per_cap",
breaks = c(0, 10000, 30000, 111000),
labels = c("low", "medium", "high")) +
tm_layout(title = "B")
tm_jen = tm_shape(worldvector) +
tm_polygons(col = "gdp_per_cap",
style = "jenks") +
tm_layout(title = "C")
tm_lop = tm_shape(worldvector) +
tm_polygons(col = "gdp_per_cap",
style = "log10_pretty") +
tm_layout(title = "D")
tmap_arrange(tm_pre, tm_fix, tm_jen, tm_lop, ncol = 2, asp = NA,
outer.margins = 0)
```
<!-- The numeric variable can be either regarded as a continuous variable or a count (integer) variable. See as.count. Only applicable if style is "pretty", "fixed", or "log10_pretty". -->
\index{Continuous maps}
Continuous maps also represent continuous numerical variables, but without any discrete class intervals (Figure \@ref(fig:cont-methods)).
Three continuous methods exist in **tmap**: `cont`, `order`, and `log10`.
In all of them, colors change gradually with increasing values, but the styles differ in how values relate to colors.
The `cont` style creates a smooth, linear gradient.
In other words, the change in values is proportionally related to the change in colors.
We can see that in Figure \@ref(fig:cont-methods):A, where the value change from 20,000 to 40,000 has a similar impact on the color scale as the value change from 40,000 to 60,000.
The `cont` style is similar to the `pretty` one, where the values also change linearly.
The main difference between these styles is that we can see differences between, for example, values of 45,000 and 55,000 in the former, while both values have exactly the same color in the latter.
The `cont` style works well in situations where there is a large number of objects in vectors or a large number of cells in rasters, and where the values change continuously (do not have many outliers).
```{r, eval=FALSE}
tm_shape(worldvector) +
tm_polygons(col = "gdp_per_cap",
style = "cont")
```
However, when the presented variable is skewed or has some outliers, we can use either the `order` or the `log10` style.
The `order` style also uses a smooth gradient with a large number of colors, but the values on the legend do not change linearly (Figure \@ref(fig:cont-methods):B).
<!--JN: Martijn please check the next sentence -->
It is fairly analogous to the `quantile` style, with the values on a color scale that divides a dataset into several equal-sized groups.
```{r, eval=FALSE}
tm_shape(worldvector) +
tm_polygons(col = "gdp_per_cap",
style = "order")
```
Finally, the `log10` style is the continuous equivalent of the `log10_pretty` style (Figure \@ref(fig:cont-methods):C).
```{r, eval=FALSE}
tm_shape(worldvector) +
tm_polygons(col = "gdp_per_cap",
style = "log10")
```
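To compare the three continuous styles, it may help to compute where a few values would land on a color scale running from 0 to 1. This base R sketch uses made-up values and only approximates the idea behind each style (linear, rank-based, and logarithmic scaling); it is not **tmap**'s internal code, and `rescale01()` is a helper defined just for this example.

```r
# Hypothetical skewed values with one outlier
x = c(500, 1000, 2000, 4000, 100000)
# Rescale a numeric vector to the 0-1 range
rescale01 = function(v) (v - min(v)) / (max(v) - min(v))
pos_cont = rescale01(x)          # linear, in the spirit of "cont"
pos_order = rescale01(rank(x))   # rank-based, in the spirit of "order"
pos_log10 = rescale01(log10(x))  # logarithmic, in the spirit of "log10"
round(data.frame(x, pos_cont, pos_order, pos_log10), 3)
```

With the linear scale, the four smallest values are squeezed into a small fraction at the bottom of the color range, while the rank-based and logarithmic scales spread them out.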
```{r cont-methods, warning=FALSE, fig.asp=NA, fig.height=3.46, message=FALSE, echo=FALSE, fig.cap="Examples of three methods of creating continuous maps: (A) the ‘cont’ method, (B) the ‘order’ method, and (C) the ‘log10’ method."}
tm_con = tm_shape(worldvector) +
tm_polygons(col = "gdp_per_cap",
style = "cont") +
tm_layout(title = "A")
tm_ord = tm_shape(worldvector) +
tm_polygons(col = "gdp_per_cap",
style = "order") +
tm_layout(title = "B")
tm_log = tm_shape(worldvector) +
tm_polygons(col = "gdp_per_cap",
style = "log10") +
tm_layout(title = "C")
tmap_arrange(tm_con, tm_ord, tm_log, ncol = 2, asp = NA,
outer.margins = 0)
```
The `tm_polygons()` function also offers a third way of specifying the fill color.
When the `col` argument is set to `"MAP_COLORS"`, polygons will be colored in such a way that adjacent polygons do not get the same color (Figure \@ref(fig:colorscalesmc)).
```{r, eval=FALSE}
tm_shape(worldvector) +
tm_polygons(col = "MAP_COLORS")
```
In this case, it is also possible to change the default colors with the `palette` argument, and to make the internal algorithm search for a minimal number of colors by setting `minimize = TRUE`.
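The idea of coloring adjacent polygons differently can be illustrated with a simple greedy algorithm in base R. The adjacency list and the `greedy_coloring()` function below are hypothetical and written only for this sketch; **tmap** relies on its own implementation.

```r
# Hypothetical adjacency list: element i holds the neighbors of polygon i
neighbors = list(c(2, 3), c(1, 3), c(1, 2, 4), c(3))
greedy_coloring = function(neighbors) {
  colors = integer(length(neighbors))  # 0 means "not colored yet"
  for (i in seq_along(neighbors)) {
    used = colors[neighbors[[i]]]
    # Pick the lowest color index not used by any neighbor
    colors[i] = min(setdiff(seq_along(neighbors), used))
  }
  colors
}
polygon_colors = greedy_coloring(neighbors)
polygon_colors
```

A greedy pass like this guarantees that neighbors differ in color, but it does not necessarily find the smallest possible number of colors.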
```{r colorscalesmc, warning=FALSE, fig.cap="Example of a map with adjacent polygons having different colors.", echo=FALSE, fig.asp=NA, fig.height=2}
tm_uni = tm_shape(worldvector) +
tm_polygons(col = "MAP_COLORS")
tm_uni
```
All of the color scale styles mentioned above work not only for `tm_polygons()`; they can also be applied to `tm_symbols()` (and its derivatives: `tm_dots()`, `tm_bubbles()`, and `tm_squares()`), `tm_lines()`, `tm_fill()`, and `tm_raster()`.
The `col` argument colors symbol fills in `tm_symbols()`, lines in `tm_lines()`, and cells in `tm_raster()`.
<!-- important - mention how to change raster categorical colors (for e.g. data subsets!) -->
<!-- one color only: -->
<!-- - tm_borders -->
```{r, echo=FALSE, eval=FALSE}
data("metro", package = "tmap")
data("rivers", package = "tmap")
data("land", package = "tmap")
tm_shape(metro) +
tm_symbols(col = "pop2030", style = "cont")
tm_shape(rivers) +
tm_lines(col = "strokelwd", style = "cont")
tm_shape(land) +
tm_raster(col = "elevation", style = "cont")
```
<!-- title?? -->
## Sizes
```{r}
ei_points = read_sf("data/easter_island/ei_points.gpkg")
volcanos = subset(ei_points, type == "volcano")
```
Differences in sizes between objects are relatively easy to recognize on maps.
Sizes can be used for points, lines (line widths), or text to represent quantitative (numerical) variables, where small values are related to small objects and large values are presented by large objects.
Large sizes can also be used to attract viewers' attention.
By default, **tmap** presents points, lines, or text objects of the same size.
For example, `tm_symbols()` returns a map where each object is a circle with a consistent size^[The default value of size is 1, which corresponds to the area of symbols that have the same height as one line of text.].
We can change the sizes of all objects using the `size` argument (Figure \@ref(fig:tmsizes):A).
```{r, eval=FALSE}
tm_shape(volcanos) +
tm_symbols(size = 0.5)
```
On the other hand, if we provide the name of the numerical variable in the `size` argument (e.g., `"elevation"`), then symbol sizes are scaled proportionally to the provided values.
Objects with small values will be represented by smaller circles, while larger values will be represented by larger circles (Figure \@ref(fig:tmsizes):B).
```{r, eval=FALSE}
tm_shape(volcanos) +
tm_symbols(size = "elevation")
```
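As a rough illustration of proportional scaling, the base R sketch below rescales made-up elevation values so that the largest one gets a size of 1. It assumes that size describes symbol area, as the footnote above suggests, so a circle's radius grows with the square root of its size; the exact scaling used by **tmap** may differ.

```r
# Hypothetical elevation values
elevation = c(100, 200, 400)
# Sizes proportional to the values, with the maximum scaled to 1
symbol_size = elevation / max(elevation)
# If size is an area, the circle radius is proportional to its square root
relative_radius = sqrt(symbol_size)
data.frame(elevation, symbol_size, relative_radius)
```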
<!-- numeric only -->
<!-- size.max -->
<!-- size.lim -->
<!-- sizes.legend -->
<!-- sizes.legend.labels -->
<!-- potential tmap improvement: use of size.legend instead of sizes.legend -->
We can adjust size legend breaks with `sizes.legend` and the corresponding labels with `sizes.legend.labels` (Figure \@ref(fig:tmsizes):C).
However, this only modifies the legend, not the related objects.
```{r, eval=FALSE}
tm_shape(volcanos) +
tm_symbols(size = "elevation",
title.size = "Elevation",
sizes.legend = c(100, 600),
sizes.legend.labels = c("low", "high"))
```
For example, in the above code, we only show how symbols with elevation values of 100 and 600 look on the map.
```{r tmsizes, echo=FALSE, fig.cap="Examples of three approaches for changing sizes of symbols: (A) all symbols have a consistent size of 0.5, (B) sizes of symbols depend on the values of the elevation variable, (C) sizes of symbols have a manually created legend.", fig.asp=0.42, message=FALSE}
library(tmap)
tmsize1 = tm_shape(volcanos) +
tm_symbols(size = 0.5) +
tm_layout(title = "A")
tmsize2 = tm_shape(volcanos) +
tm_symbols(size = "elevation") +
tm_layout(title = "B")
tmsize3 = tm_shape(volcanos) +
tm_symbols(size = "elevation",
title.size = "Elevation",
sizes.legend = c(100, 600),
sizes.legend.labels = c("low", "high")) +
tm_layout(title = "C")
tmap_arrange(tmsize1, tmsize2, tmsize3, ncol = 3)
```
Similar to symbol sizes for point data, line widths can represent values of numerical variables for line data.
The `lwd` argument in `tm_lines()` creates thin lines for small values and thick lines for large values of the given variable (Figure \@ref(fig:tmlwd)).
```{r tmlwd, fig.asp=0.66, fig.cap="Example of a map where lines' widths represent values of the corresponding lines."}
ei_roads = read_sf("data/easter_island/ei_roads.gpkg")
tm_shape(ei_roads) +
tm_lines(lwd = "strokelwd")
```
In the above example, values of the `"strokelwd"` variable are divided into four groups and represented by four line widths.
Line thickness can be changed using the `scale` argument, where the value of 1 is the default, and increasing this value increases line thickness.
Also, similarly to the last example of `tm_symbols()` above, it is possible to modify the line width legend by changing its title (`title.lwd`), categories (`lwd.legend`), and their labels (`lwd.legend.labels`).
<!-- how about trying some transportation examples here (and expanding them)? -->
```{r, echo=FALSE, eval=FALSE}
tm_shape(rivers) +
tm_lines(lwd = "strokelwd")
tm_shape(rivers) +
tm_lines(lwd = "strokelwd",
scale = 2)
tm_shape(rivers) +
tm_lines(lwd = "strokelwd",
scale = 2, n = 6)
tm_shape(rivers) +
tm_lines(lwd = "strokelwd",
scale = 2,
title.lwd = "Line legend",
lwd.legend = c(1, 2, 3, 5, 10, 15),
lwd.legend.labels = LETTERS[1:6])
```
Text labels serve to name features on a map or just to highlight some of them.
Usually, the size of text labels is consistent for the same spatial objects. <!--ref to the text label layer section-->
However, text labels can also be used to represent the values of some numerical variables.
Figure \@ref(fig:tmtextsize) shows an example in which text labels show the names of different volcanos, while their sizes are related to their elevations.
<!-- This allows us to not only locate different volcanos on the map but also differentiate between less populous areas (e.g., Seattle) and more populous ones (e.g., Tokyo). -->
```{r tmtextsize, fig.asp=0.25, fig.cap="Example of a map where text sizes represent elevations of the volcanos."}
tm_shape(volcanos) +
tm_text(text = "name", size = "elevation") +
tm_layout(legend.outside = TRUE)
```
<!-- sizes.legend -->
<!-- sizes.legend.labels -->
<!-- sizes.legend.text -->
<!-- again - mention other map types - cartograms, hexmaps, etc., which even impact of polygon sizes -->
## Shapes
<!-- ??and markers -->
<!-- potential tmap improvement: do not allow to use shape for numerical vars -->
Shapes allow us to represent different categories of point data.
They can be very generic, e.g., circle or square, just to be able to differentiate between categories, but often we use symbols that we associate with different types of features.
For example, we use the letter *P* for parking lots, *I* for information centers, an airplane symbol for airports, or a bus symbol for bus stops.
To use different shapes, we should use the `shape` argument in the `tm_symbols()` function.
It expects the name of the categorical variable.
```{r, eval=FALSE}
tm_shape(ei_points) +
tm_symbols(shape = "type",
title.shape = "Type:",
shapes.labels = c("Cave entrance", "Peak", "Volcano"))
```
By default, **tmap** uses filled circles, squares, diamonds, point-up triangles, and point-down triangles^[They are represented in R by numbers from 21 to 25.].
However, it is also possible to customize the symbols used, their title, and labels.
The legend title related to shapes is modified with the `title.shape` argument, while the labels are set with the `shapes.labels` argument.
Shapes can be specified with the `shapes` argument, which accepts one of three options.
The first one is a numeric value that specifies the plotting character of the symbol^[However, this is not supported for the "view" mode.].
A complete list of available symbols and their corresponding numbers is available in the `?pch` documentation.
<!--JN: or should we add a figure with them here??-->
```{r, eval=FALSE}
tm_shape(ei_points) +
tm_symbols(shape = "type",
shapes = c(0, 2, 5))
```
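To preview the available plotting characters without opening the help page, a quick base R plot can be used. This sketch is independent of **tmap**; the two-row layout and the grey fill are arbitrary choices.

```r
# Plotting characters 0 to 25, arranged in two rows
pch_values = 0:25
x = pch_values %% 13
y = pch_values %/% 13
plot(x, y, pch = pch_values, bg = "grey", cex = 2,
     axes = FALSE, xlab = "", ylab = "", ylim = c(-0.5, 1.5))
# Label each symbol with its number
text(x, y, labels = pch_values, pos = 1, offset = 1)
```

Symbols 21 to 25 are the ones that can be filled with the `bg` color.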
The second option is to use a *grob* object.
<!-- add intro what are grobs -->
<!-- add reference to some section explaining tmap_grob (chapter 10??) -->
```{r}
# library(grid)
# library(ggplotify)
library(ggplot2)
# p1 = as.grob(~barplot(1:10))
# p2 = as.grob(expression(plot(rnorm(10), yaxt = "n", xaxt = "n", ann = FALSE, bty = "n")))
# p3 = as.grob(function() plot(sin, yaxt = "n", xaxt = "n", ann = FALSE, bty = "n"))
p4 = ggplotGrob(ggplot(data.frame(x = 1:5, y = 1:5), aes(x, y)) + geom_point() + theme_void())
```
```{r, eval=FALSE}
tm_shape(ei_points) +
tm_symbols(shape = "type",
shapes = list(p4, p4, p4),
border.col = NULL)
```
<!-- explain what's a grob -->
<!-- try different codes (including the view mode) -->
<!-- add new example -->
<!-- A grob object, which can be a ggplot2 plot object created with ggplotGrob. To specify multiple shapes, a list of grob objects is required. See example of a proportional symbol map with ggplot2 plots. -->
The last possibility is to use an icon specification created with the `tmap_icons()` function, which can use any PNG images.
The `tmap_icons()` function accepts a vector of file paths or URLs, and also allows setting the width and height of the icons.
In our example, we have three distinct groups, therefore we need to create new icons based on three images - `icon1.png`, `icon2.png`, and `icon3.png` in this case.
```{r}
my_icons = tmap_icons(c("images/icon1.png",
"images/icon2.png",
"images/icon3.png"))
```
Now, we can use the prepared icons in the `shapes` argument (Figure \@ref(fig:tmsymshape):D).
<!-- border.col = NULL -->
```{r, eval=FALSE}
tm_shape(ei_points) +
tm_symbols(shape = "type",
shapes = my_icons,
border.col = NULL)
```
```{r tmsymshape, fig.asp=1.32, echo=FALSE, fig.cap="Examples of four maps with different symbols: (A) default symbols, (B) user-defined symbols, (C) grob objects, and (D) icons."}
tmsymshape1 = tm_shape(ei_points) +
tm_symbols(shape = "type",
title.shape = "Type:",
shapes.labels = c("Cave entrance", "Peak", "Volcano")) +
tm_layout(title = "A")
tmsymshape2 = tm_shape(ei_points) +
tm_symbols(shape = "type",
shapes = c(0, 2, 5)) +
tm_layout(title = "B")
tmsymshape3 = tm_shape(ei_points) +
tm_symbols(shape = "type",
shapes = list(p4, p4, p4),
border.col = NULL) +
tm_layout(title = "C")
tmsymshape4 = tm_shape(ei_points) +
tm_symbols(shape = "type",
shapes = my_icons,
border.col = NULL) +
tm_layout(title = "D")
tmap_arrange(tmsymshape1, tmsymshape2,
tmsymshape3, tmsymshape4,
ncol = 1)
```
```{r tmlinlty, fig.asp=0.33, fig.cap="", echo=FALSE, eval=FALSE}
tm_shape(rivers) +
tm_lines(lty = 2)
```
## Mixing visual variables
The values of a given variable can be expressed by different categorical or sequential colors in polygons.
Lines can also be colored by one variable, while their widths represent values of another quantitative variable.
When we use symbols, then we are able to use colors for one qualitative or quantitative variable, sizes for a quantitative variable, and shapes for another qualitative variable.
Therefore, it is possible to mix some visual variables for symbols and lines.
This section shows only some possible examples of mixing visual variables.
Figure \@ref(fig:mixsymb):A shows symbols whose sizes are scaled based on the `sv` variable and whose colors are based on the values of `elevation`.
This can be set with the `size` and `col` arguments.
```{r, eval=FALSE}
tm_shape(ei_points) +
tm_symbols(size = "sv", col = "elevation")
```
We can also modify all of the visual variables using the additional arguments explained in the previous sections.
For example, we can set the color style (`style`), color palette (`palette`), or specify shapes (`shapes`) (Figure \@ref(fig:mixsymb):B).
```{r, eval=FALSE}
tm_shape(ei_points) +
tm_symbols(col = "elevation", style = "cont", palette = "Greens",
shape = "type", shapes = c(23, 24, 25))
```
When we plot polygons, there is only one visual variable we can use - color.
Therefore, the map legend's title in functions like `tm_polygons()` or `tm_fill()` is changed with the `title` argument.
However, what to do when we have two or more visual variables in, for example, `tm_symbols()`?
In these cases, we need to specify a corresponding suffix for each `title` argument.
The color title is set with `title.col`, size title with `title.size`, and shape title with `title.shape` (Figure \@ref(fig:mixsymb):C).
```{r, eval=FALSE}
tm_shape(ei_points) +
tm_symbols(size = "elevation", title.size = "Elevation:",
shape = "type", title.shape = "Type:")
```
```{r mixsymb, echo = FALSE, fig.asp=1, fig.cap="Examples of maps using two visual variables at the same time: (A) size and color, (B) color and shape, (C) size and shape."}
tm_mvv1 = tm_shape(ei_points) +
tm_symbols(size = "sv", col = "elevation") +
tm_layout(title = "A")
tm_mvv2 = tm_shape(ei_points) +
tm_symbols(col = "elevation", style = "cont", palette = "Greens",
shape = "type", shapes = c(23, 24, 25)) +
tm_layout(title = "B")
tm_mvv3 = tm_shape(ei_points) +
tm_symbols(size = "elevation", title.size = "Elevation:",
shape = "type", title.shape = "Type:") +
tm_layout(title = "C")
tmap_arrange(tm_mvv1, tm_mvv2, tm_mvv3, ncol = 1)
```
For line data, we can present qualitative or quantitative variables using colors and quantitative variables using sizes (line widths) (Figure \@ref(fig:mixline)).
```{r mixline, echo = FALSE, fig.asp=0.66, fig.cap="A map using two visual variables, color and size (line width), at the same time."}
tm_shape(ei_roads) +
tm_lines(col = "type", lwd = "strokelwd", palette = "Set1")
```