Get current WiFi SSID in Python

As with a lot of the posts here, this is just a quick note to make sure I can find the information again.

Print the current WLAN SSID in Python, running Raspbian (or other Linux OSes):

import subprocess

# iwgetid -r prints the SSID of the wireless network we are connected to.
# check_output() returns bytes with a trailing newline, so decode and strip.
ssid = subprocess.check_output(["iwgetid", "-r"]).decode().strip()
print(ssid)


Themes in R

There are probably better overviews out there. But these are the ones I have stumbled upon, and would like to be able to remember.

TV themes: Colours, backgrounds and so on, themed after TV shows: Brooklyn Nine-Nine, Game of Thrones, Rick & Morty, The Simpsons and Parks & Rec, among many others.

Wes Anderson: Not so much themes as colour palettes, matching Wes Anderson's films.
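To show how such palettes are typically used, here is a minimal sketch assuming the wesanderson package (one of the palette collections of that kind):

library(ggplot2)
library(wesanderson)

# Fill the bars with three colours from the "Zissou1" palette
ggplot(mtcars, aes(x = factor(cyl), fill = factor(cyl))) +
  geom_bar() +
  scale_fill_manual(values = wes_palette("Zissou1", 3))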


Errorbars on barcharts

Errorbars

An errorbar is a graphical indication of the uncertainty of a measurement, typically the standard deviation or a confidence interval.

The mean of a set of measurements is a single number. We want to illustrate how confident we are in that value.
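For a vector of raw measurements, both can be computed directly. A quick sketch in base R:

x <- rnorm(20, mean = 10, sd = 2)  # 20 simulated measurements
m <- mean(x)
s <- sd(x)                          # standard deviation
se <- s / sqrt(length(x))           # standard error of the mean
ci <- m + c(-1, 1) * qt(0.975, df = length(x) - 1) * se  # 95% confidence interval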

How do we plot that?

I am going to use three libraries:

library(ggplot2)
library(dplyr)
library(tibble)

ggplot2 for plotting, dplyr to make manipulating data easier, and tibble to support a single line later.

Then, we need some data.

data <- data.frame(
  name=letters[1:3],
  mean=sample(seq(4,15),3),
  sd=c(1,0.2,3)
)

I now have a dataframe with three variables: name, mean and standard deviation (sd).

How do we plot that?

Simple:

data %>% 
  ggplot() +
  geom_bar( aes(x=name, y=mean), stat="identity", fill="skyblue") +
  geom_errorbar(aes(x=name, ymin=mean-sd, ymax = mean+sd), width=0.1)

(Plot: barchart with errorbars.)

So. What on earth was that?

The pipe-operator %>% takes whatever is on the left-hand side, and inserts it as the first argument of whatever function is on the right-hand side.
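In other words, these two lines are equivalent:

ggplot(data)       # ordinary function call, data as the first argument
data %>% ggplot()  # the same call, written with the pipe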

What is on the right-hand-side is the ggplot() function. That would normally have data=something as its first argument. Here that is the data we constructed earlier.

To that initial plot, which is completely empty, we add a geom_bar. That plots the bars on the plot. It takes an x-value, name, and a y-value, mean. And we tell the function that rather than counting the number of observations of each x-value (the default behavior of geom_bar), it should use the y-values provided. We also want a nice light blue color for the bars.

To that bar-chart, we now add errorbars. geom_errorbar needs to know the x- and y-values of the bars, in order to place the errorbars correctly. It also needs to know where to place the upper and the lower end of each errorbar. We supply that information: ymin, the lower end, should be the mean minus the standard deviation; and the upper end, ymax, the sum of the mean and sd. Finally we need to decide how wide those errorbars should be. We do that with width=0.1. We do not actually have to, but the default width results in an ugly plot.

And there you go, a barchart with errorbars!
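By the way, ggplot2 also has geom_col(), which is shorthand for geom_bar(stat = "identity"). The same plot could be written as:

data %>% 
  ggplot() +
  geom_col(aes(x = name, y = mean), fill = "skyblue") +
  geom_errorbar(aes(x = name, ymin = mean - sd, ymax = mean + sd), width = 0.1)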

Next step

That was all very nice. However! We do not usually have a nice dataframe with means and standard deviations calculated directly. More often, we have a dataframe like this:

mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  head()
##   cyl disp
## 1   6  160
## 2   6  160
## 3   4  108
## 4   6  258
## 5   8  360
## 6   6  225

I’ll get back to what the code actually means later.

Here we have 32 observations (only 6 shown above) of the variables cyl and disp. I would now like to make a barplot of the mean value of disp for each of the three groups in cyl (4, 6 and 8). And add errorbars.

You could scroll through all the data, sort them by cyl, manually count the number of observations in each group, sum the disp values, divide, and so on.

But there is a simpler way:

mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  group_by(cyl) %>% 
  summarise(mean=mean(disp), sd=sd(disp))
## # A tibble: 3 x 3
##     cyl  mean    sd
##   <dbl> <dbl> <dbl>
## 1     4  105.  26.9
## 2     6  183.  41.6
## 3     8  353.  67.8

mtcars is a built-in dataset (cars in the US in 1974). I send that, using the pipe-operator, to the function remove_rownames, which does exactly that. We don't need them, and they will just confuse us. That result is then sent to the function select, which selects the two columns/variables cyl and disp, and discards the rest. Next, we group the data according to the value of cyl. There are three different values: 4, 6 and 8. And then we use the summarise function to calculate the mean and the standard deviation of disp for each of the three groups.

Now we should be ready to plot. We just send the result above to the plot function from before:

mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  group_by(cyl) %>% 
  summarise(mean=mean(disp), sd=sd(disp)) %>% 
  ggplot() +
    geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue") +
    geom_errorbar(aes(x=cyl, ymin=mean-sd, ymax = mean+sd), width=0.1)

(Plot: barchart of mean disp per cyl group, with errorbars.)
All we need to remember is to change “name” in the original to “cyl”. All done!

But wait! There is more!!

Those errorbars can be shown in more than one way.

Let us start by saving our means and sds in a dataframe:

data <- mtcars %>% 
  remove_rownames() %>% 
  select(cyl, disp) %>% 
  group_by(cyl) %>% 
  summarise(mean=mean(disp), sd=sd(disp))

geom_crossbar results in this:

data %>% 
ggplot() +
  geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue") +
  geom_crossbar( aes(x=cyl, y=mean, ymin=mean-sd, ymax=mean+sd))

(Plot: barchart with crossbars.)

I think it is ugly. But whatever floats your boat.

Then there is just a vertical bar, geom_linerange. I think it makes it a bit more difficult to compare the errorbars. On the other hand, it results in a plot that is a bit cleaner:

data %>% ggplot() +
  geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue") +
  geom_linerange( aes(x=cyl, ymin=mean-sd, ymax=mean+sd))

(Plot: barchart with linerange errorbars.)

And here is geom_pointrange. The mean is shown as a point. This probably works best without the bars.

data %>% ggplot() +
  geom_bar( aes(x=cyl, y=mean), stat="identity", fill="skyblue", alpha=0.5) +
  geom_pointrange( aes(x=cyl, y=mean, ymin=mean-sd, ymax=mean+sd))

(Plot: barchart with pointrange errorbars.)
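Dropping the bars entirely, as suggested, is just a matter of removing the geom_bar layer:

data %>% 
  ggplot() +
  geom_pointrange(aes(x = cyl, y = mean, ymin = mean - sd, ymax = mean + sd))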

Corresponding value to a max-value

One of our users needs to find the max value of a variable. He also needs to find the corresponding value in another variable.
As in: the maximum value in column A is in row 42. What is the value in column B, row 42?

And of course we need to do it for several groups.

Let us begin by making a dataset. Three groups in id:

library(tidyverse)
id <- 1:3
val <- c(10,20)
kor <- c("a", "b", "c")


example <- expand.grid(id,val) %>% 
  as_tibble() %>% 
  arrange(Var1) %>% 
  cbind(kor, stringsAsFactors=F) %>% 
  rename(group=Var1, value=Var2, corr = kor)

example
##   group value corr
## 1     1    10    a
## 2     1    20    b
## 3     2    10    c
## 4     2    20    a
## 5     3    10    b
## 6     3    20    c

We have six observations, divided into three groups. They all have a value, and a letter in “corr” that is the corresponding value we are interested in.

So. In group 1 we should find the maximum value 20, and the corresponding value "b".
In group 2 the max value is still 20, but the corresponding value we are looking for is "a".
And in group 3 the max value is yet again 20, but the corresponding value is now "c".

How to do that?

example %>%
  group_by(group) %>% 
  mutate(max=max(value)) %>% 
  mutate(max_corr=corr[(value==max)]) %>% 
  ungroup()
## # A tibble: 6 x 5
##   group value corr    max max_corr
##   <int> <dbl> <chr> <dbl> <chr>   
## 1     1   10. a       20. b       
## 2     1   20. b       20. b       
## 3     2   10. c       20. a       
## 4     2   20. a       20. a       
## 5     3   10. b       20. c       
## 6     3   20. c       20. c

The maximum value for all three groups is 20. And the corresponding values in the groups are b, a and c, respectively.

Isn't there an easier solution, using the summarise function? Probably. But our user needs to do this for a lot of variables. And their names have nothing in common.
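For the record, if only the rows holding the maxima are needed, a shorter route would be something like this sketch (slice_max() requires dplyr 1.0 or later):

example %>% 
  group_by(group) %>% 
  slice_max(value, n = 1) %>%   # keep the row with the largest value in each group
  ungroup()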

Openstreetmap data – for Florence

Not that advanced, but I wanted to play around a bit with plotting the raw data from Openstreetmap.

We're going to Florence this fall. It's been five years since we last visited the fair city that has played such an important role in Western history.

Openstreetmap is, as the name implies, open.

I'm going to need some libraries:

#library(OpenStreetMap)
library(osmar)
library(ggplot2)
library(broom)
library(geosphere)
library(dplyr)

osmar provides functions to interact with Openstreetmap. ggplot2 is used for the plots, broom for making some objects tidy and dplyr for manipulating data.

Getting the raw data requires me to define a bounding box, encompassing the part of Florence I would like to work with. Looking at https://www.openstreetmap.org/export#map=13/43.7715/11.2717, I choose these coordinates:

top <- 43.7770
bottom <- 43.7642
left <- 11.2443
right <- 11.2661

After that, I can define the bounding box, and tell the osmar functions at what URL we can find the relevant API (this is just the default). Then I can retrieve the data via get_osm(). I immediately save it to disk. The download takes some time, and there is no reason to do it more than once.

box <- corner_bbox(left, bottom, right, top)
src <- osmsource_api(url = "https://api.openstreetmap.org/api/0.6/")
florence <- get_osm(box, source=src)
saveRDS(florence, "florence.rda")
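On later runs the download can then be skipped, and the saved object loaded instead:

florence <- readRDS("florence.rda")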

Let's begin by making a quick plot:

plot(florence, xlim=c(left,right),ylim=c(bottom,top) )

(Plot: raw osmar plot of the bounding box.)

Note that what we get is a plot of, among other things, all lines that are partly in the box. If a line extends beyond the box, we get the whole line as well.

Looking at the data:

summary(florence$ways)
## osmar$ways object
## 6707 ways, 9689 tags, 59052 refs 
## 
## ..$attrs data.frame: 
##     id, visible, timestamp, version, changeset, user, uid 
## ..$tags data.frame: 
##     id, k, v 
## ..$refs data.frame: 
##     id, ref 
##  
## Key-Value contingency table:
##         Key         Value Freq
## 1  building           yes 4157
## 2    oneway           yes  456
## 3   highway    pedestrian  335
## 4   highway   residential  317
## 5   bicycle           yes  316
## 6       psv           yes  122
## 7   highway  unclassified  108
## 8   highway       footway  101
## 9   barrier          wall   98
## 10  surface paving_stones   87

I would like to plot the roads and buildings. For some reason there are a lot of highways, of a kind I would probably not call highways (in OSM, the key "highway" covers any kind of road or path).

Anyway, let's make a list of tags. tags() finds the elements that have a key in tag_list, way() finds the ways represented by those elements, and find() finds the IDs of the objects in florence matching this.
find_down() finds all the elements related to these IDs. And finally we take the subset of the large florence dataset that has IDs matching the IDs we found before.

tag_list <- c("highway", "bicycle", "oneway", "building")
dat <- find(florence, way(tags(k %in% tag_list)))
dat <- find_down(florence, way(dat))
dat <- subset(florence, ids = dat)

Now, in a couple of lines, I'm going to tidy the data. That removes the information about the type of each line. As I would like to be able to color highways differently from buildings, I need to keep that information.
So first I save the key-part of the tags, along with the id:

types <- data.frame(dat$ways$tags$k, dat$ways$tags$id)
names(types) <- c("type", "id")

This gives me all the key-parts of all the tags. And I’m only interested in a subset of them:

types <- types %>% 
  filter(type %in% tag_list)

types$id <- as.character(types$id)

Next, as_sp() converts the osmar object to a spatial object (just taking the lines):

dat <- as_sp(dat, "lines")

tidy() (from the broom library) converts it to a tidy tibble:

dat <- tidy(dat)

That tibble is missing the types, so those are added back:

new_df <- left_join(dat, types, by="id")

And now we can plot:

new_df %>% 
  ggplot(aes(x=long, y=lat, group=group)) +
  geom_path(aes(color=type)) +
  scale_color_brewer() +
  xlim(left, right) +
  ylim(bottom, top) +
  theme_void() +
  theme(legend.position="none")

(Plot: roads and buildings in central Florence, coloured by type.)

Nice.

What's next? Something like what is on this page: https://github.com/ropensci/osmplotr
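Something along these lines, presumably. This is an untested sketch based on osmplotr's README, reusing the bounding box coordinates from above:

library(osmplotr)

bbox <- get_bbox(c(left, bottom, right, top))
buildings <- extract_osm_objects(key = "building", bbox = bbox)  # downloads the data
map <- osm_basemap(bbox = bbox, bg = "gray20")
map <- add_osm_objects(map, buildings, col = "gray40")
print_osm_map(map)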

Get the font color of a cell in Excel

People do weird and wonderful things in Excel.

Other people then have to pull out the data from those spreadsheets.

“Other people” tend to spend a lot of time crying into their coffee.

At the moment, I am trying to pull data out of a spreadsheet, where "something" can have a value of 1, 2 or 3. That is of course marked by an "x" in a cell. I need to convert that x to a number.

That is rather simple. What is not so simple is that there can be two x'es. One, in black, denotes the current state of affairs. A second x, in red, denotes the desired future state.

So – I need a way to get the color of an x. VBA can do that:

Function GetColour(ByVal Target As Range) As Long
    ' Font.Color returns a Long (an RGB colour value)
    Application.Volatile
    GetColour = Target.Font.Color
End Function

And if I need a logical test:

Function IsBlack(ByVal Target As Range) As Boolean
    ' Black text has Font.Color = 0
    Application.Volatile
    IsBlack = (Target.Font.Color = 0)
End Function
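If the extraction happens in R rather than in Excel, the same information can be had without VBA. Here is a sketch using the tidyxl package; the file name is made up, and the format-lookup pattern is the one from tidyxl's documentation, so verify it against your own spreadsheet:

library(tidyxl)
library(dplyr)

cells <- xlsx_cells("weird_spreadsheet.xlsx")      # every cell, with a format id
formats <- xlsx_formats("weird_spreadsheet.xlsx")  # the formatting definitions

cells %>% 
  filter(character == "x") %>%                     # the cells containing an x
  mutate(font_rgb = formats$local$font$color$rgb[local_format_id]) %>% 
  select(address, character, font_rgb)             # red is typically "FFFF0000"; default black may show as NA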


Crimes against graphs

Crime is a bad thing. No doubt about it. And one of the main topics in today's debate climate is: "those 'orrible immigrants are very criminal. Look at these numbers, they prove it!" Usually written with caps-lock engaged.

Well. Maybe they are, and maybe they do. But if you want to use statistics to prove it – pretty please, do not obfuscate the numbers.

This is an example: a blog post from one of the more notable Danish newspapers. In the US it would be regarded as communist; in the rest of the world we would think of it as relatively conservative.

https://kulturkamp.blogs.berlingske.dk/2018/08/17/anmeldte-voldtaegter-og-voldsforbrydelser-er-paa-det-hoejeste-nogensinde/

The claim is that the number of reported rapes and other violent crimes in Denmark is the highest ever, and that this is because of the increasing number of immigrants in Denmark, especially Muslims. Use Google Translate if you want the details.

Again, that claim might be true. But the graphs in the post that supposedly document the claim are misleading. To say the least.

First, the numbers come from the Danish Statistical Bureau. They have a disclaimer telling us that changes to the Danish penal code mean that a number of sexual offenses have been reclassified as violent crimes since 2013. If the number of violent crimes suddenly includes crimes that did not use to be classified as violent crimes, that number will increase. Not much of a surprise. Yes, the post asks why the numbers are still increasing after that reclassification; one should expect them to level off. And again, the post may have a valid point. I don't know. But what I do know is that the graphs are misleading.

Here's why: the y-axis has been cut off. Let's recreate the graphs and take a look.

There are two graphs. The first shows the number of reported cases of rape from 1995 until today.

The second shows the total number of reported cases of violent crimes in the same period. Both sets of data come from http://www.statistikbanken.dk/.

We’re going to need some libraries:

library(ggplot2)
library(gridExtra)

Let's begin by pulling the data.

There might be better ways, but I’ve simply downloaded the data. Two files:

violence <- read.csv("tab1.csv", sep=";", skip=3, header=F)
rape <- read.csv("tab2.csv", sep=";", skip=3, header=F)
violence <- violence[1:(nrow(violence)-7),]
rape <- rape[1:(nrow(rape)-7),]

The last seven lines are the notes about changes in which cases are counted in these statistics. I think that is a pretty important point, but notes are difficult to plot.

The graph for rape, as presented in the post (on the left), and with a more sensible y-axis (on the right):

post <- ggplot(rape, aes(x=V1, y=V2)) +
  geom_line(group=1) +
  scale_x_discrete(breaks = rape$V1[seq(1, length(rape$V1), by = 20)]) +
  theme_classic()

nice <- post + ylim(0,max(rape$V2))
grid.arrange(post, nice, ncol=2)

(Plot: reported rapes; truncated y-axis on the left, zero-based y-axis on the right.)

And the one for violent crimes in general, again with the original on the left, and the better one on the right:

post <- ggplot(violence, aes(x=V1, y=V2)) +
  geom_line(group=1) +
  scale_x_discrete(breaks = violence$V1[seq(1, length(violence$V1), by = 20)]) +
  theme_classic()

nice <- post + ylim(0,max(violence$V2))
grid.arrange(post, nice, ncol=2)

(Plot: reported violent crimes; truncated y-axis on the left, zero-based y-axis on the right.)

So, still, some pretty scary increases. And the change in what is counted should give an increase. But that increase should level off, which it does not. Clearly something is not as it should be. But let's be honest, the graphs on the right are not quite as scary as the ones on the left.

Also, that change in what is counted as sexual assaults can explain the initial increase, but then the numbers should level off. That is a fair point. However, other things changed in the period. #metoo, for example. I think it would be reasonable to expect that a lot of cases that used to be brushed off as not very important are now finally being reported. The numbers might actually have leveled off without #metoo.

Anyway, my point is that if you want to use graphs to support your claims, do NOT cut off the y-axis to make them look more convincing.
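In ggplot2 the fix is cheap. Instead of computing the limits by hand with ylim() as above, expand_limits() can force the y-axis to include zero:

nice <- post + expand_limits(y = 0)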

Migrants visualized

First of all, this is in no way a statement on the immigration crisis in Europe. I do have opinions. This is more a reaction to, or reflection on, three maps I saw on this page.

Danish television channel TV2 is illustrating the number of refugees, or perhaps rather immigrants, received in EU member states in the period 2015 to 2017. This is the map showing the number of immigrants to the EU in 2015.

Note Germany. Germany welcomed the absolutely highest number of immigrants. What piqued my interest, though, is that while this might be a good illustration of the numbers, it is not really the relevant comparison. Yes, Germany welcomed more refugees than Denmark did. But Germany is a rather larger country than Denmark. For a given value of "fair", it is only fair that Germany takes more refugees than smaller countries.

A more relevant comparison might be the number of refugees compared to population. Or area. Sweden saw (at that time) no problem with welcoming a huge number of migrants because, as they said, there is a lot of unpopulated space in Sweden, plenty of room for everyone! Or perhaps GDP is a better measure; richer countries should shoulder a larger part of the challenge than poorer countries.

I'm not concerned here with what is fair. What concerns me is that the graphic is misleading. Let's make an attempt at fixing that. Or at least present a slightly different perspective on the data.

I'll try to illustrate the number of migrants as a proportion of the population in the different countries. The data is "stolen" directly from the news channel. They have it from UNHCR, Eurostat and the European Parliament.

The first step will be to get the data.

url <- "http://nyheder.tv2.dk/udland/2018-06-28-se-kortet-saa-mange-asylansoegere-har-de-forskellige-eu-lande-taget"
data <- readLines(url)

By inspection, I can see that the relevant data is in these three lines:

dat.2015 <- data[460]
dat.2016 <- data[409]
dat.2017 <- data[358]

There is a small problem. Strange Danish characters are encoded. Let's fix that:

library(stringr)
data <- str_replace_all(data,"\\\\u00d8", "Ø")
data <- str_replace_all(data,"\\\\u00e6", "æ")
data <- str_replace_all(data,"\\\\u00f8", "ø")

And load it into the separate variables again:

dat.2015 <- data[460]
dat.2016 <- data[409]
dat.2017 <- data[358]

Next, I need to get the names of the countries, and the number of migrants received in each country.

The data I’m after looks like this:

\"Bulgarien\":{\"valueheat\":20365,\"valuecolored\":\"none\",\"description\":\"\"},

And the regular expression picking that out of the data ought to be:

'\"(\p{L}+)\":{\"valueheat\":(\d+|\"\"),'

For some reason that is not working. I probably should try to figure out why, but I'm on vacation, and would rather drink cold white wine than dig too deep into the weirdness that is regular expressions in R.

Instead I’m going to use this simpler pattern:

pattern <- '(\")(\\w+)(.*?)(\\d+|\\\"\\\")'

And then fix problems later.

Extracting the data:

dat.2015 <- unlist(str_extract_all(dat.2015, pattern))
dat.2016 <- unlist(str_extract_all(dat.2016, pattern))
dat.2017 <- unlist(str_extract_all(dat.2017, pattern))

Inspecting the data reveals that the interesting parts are lines 23 to 111.

dat.2015 <- dat.2015[23:111]
dat.2016 <- dat.2016[23:111]
dat.2017 <- dat.2017[23:111]

An example of two lines:

dat.2015[11:12]
## [1] "\"Danmark\":{\"valueheat\":20935"              
## [2] "\"valuecolored\":\"none\",\"description\":\"\""

First, let's get rid of the second line. There is one of those for each country.

Secondly, I'll remove the first two characters in the first line. \" to be precise.

And thirdly, I'll split that line on \":{\"valueheat\":

Using the nice pipes makes that easy:

library(purrr)
dat.2015 <- dat.2015 %>%
  discard(str_detect, "valuecolored") %>%
  substring(2) %>%
  strsplit(split="\":{\"valueheat\":",fixed=TRUE)

That should give me a list where each element is a vector with two elements:

dat.2015[[8]]
## [1] "Finland" "32345"

Neat! Let's do that with the other years:

dat.2016 <- dat.2016 %>%
  discard(str_detect, "valuecolored") %>%
  substring(2) %>%
  strsplit(split="\":{\"valueheat\":",fixed=TRUE)

dat.2017 <- dat.2017 %>%
  discard(str_detect, "valuecolored") %>%
  substring(2) %>%
  strsplit(split="\":{\"valueheat\":",fixed=TRUE)

Now, let's get these data into some dataframes. First I unlist the data, then I pour it into a matrix to get the right shape. And then I convert the matrices to dataframes:

dat.2017 <- data.frame(matrix(unlist(dat.2017), ncol=2, byrow=T), stringsAsFactors = F)
dat.2016 <- data.frame(matrix(unlist(dat.2016), ncol=2, byrow=T), stringsAsFactors = F)
dat.2015 <- data.frame(matrix(unlist(dat.2015), ncol=2, byrow=T), stringsAsFactors = F)

There is a slight problem. Albania was the first country in the list. And the structure of the raw data was a bit different.

dat.2015[1,1]
## [1] "regions\":{\"Albanien"

Let's fix that:

dat.2015[1,1] <- "Albanien"
dat.2016[1,1] <- "Albanien"
dat.2017[1,1] <- "Albanien"

And let me just change the column names:

colnames(dat.2015) <- c("Land", "2015")
colnames(dat.2016) <- c("Land", "2016")
colnames(dat.2017) <- c("Land", "2017")

“Land” is danish for “country”.

I’m going to need just one dataframe. I get that by joining the three dataframes:

library(dplyr)
total <- left_join(dat.2015, dat.2016, by="Land")
total <- left_join(total, dat.2017, by="Land")

The numbers are saved as characters. Converting them to numeric:

total$`2015` <- as.numeric(total$`2015`)
## Warning: NAs introduced by coercion
total$`2016` <- as.numeric(total$`2016`)
## Warning: NAs introduced by coercion
total$`2017` <- as.numeric(total$`2017`)
## Warning: NAs introduced by coercion

That introduced some NAs. Countries where there are no data.

Inspecting the data, I can see that for some countries there are data for all three years. For other countries, there are no data at all. The function complete.cases() will return TRUE for a row without NAs.

Using that to get rid of countries where we don’t have complete data:

total <- total[complete.cases(total),]

Next is getting some figures for the populations.

The relevant page on Wikipedia is:

url <- 'https://da.wikipedia.org/wiki/Verdens_landes_befolkningsst%C3%B8rrelser'

Getting that:

library(XML)
library(httr)
r <- GET(url)
doc <- readHTMLTable(doc = content(r, "text"))
tabellen <- doc$'NULL'

colnames(tabellen) <- apply(tabellen[1,], 2, as.character)

I’m only interested in the country name, and the population:

tabellen <- tabellen %>%
  select(`Land (eller territorium)`, Population)

Renaming the columns:

colnames(tabellen) <- c("Land", "Population")
tabellen$Population <- as.character(tabellen$Population)
tabellen$Population <- as.numeric(str_remove_all(tabellen$Population, fixed(".")))
## Warning: NAs introduced by coercion

And while I'm at it: the second line gets rid of the factors, and the third removes the thousand separators (".").

Now I can join the dataframe containing population figures, with the dataframe containing countries and number of migrants:

total <- left_join(total, tabellen, by="Land")
## Warning: Column `Land` joining character vector and factor, coercing into
## character vector

There are three smaller problems: Cyprus, France and Ireland. The country names I get from Wikipedia contain a note. I might be able to get rid of that with code. I'm going to do it manually.
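The code version would presumably be to strip the note from the Wikipedia names before joining; a sketch, assuming the notes are bracketed suffixes (which I have not verified):

tabellen$Land <- str_remove(tabellen$Land, "\\[.*\\]$")

The manual fix: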

total[10,5] <- 4635400
total[7,5] <- 67286000
total[3,5] <- 847000

Now I have a nice dataframe with the names of the countries (in Danish), the number of migrants received in 2015, 2016 and 2017, and the population in 2018.

Now it is time to look at some maps.

library(ggplot2)
library(rworldmap)

I am going to match the countries in my dataframe with the countries I get from the map data. That requires that I have the English names for the countries in my dataframe.

This is the translation:

enland <- c("Belgium", "Bulgaria", "Cyprus", "Denmark", "Estonia", "Finland", "France", "Greece", "Netherlands", "Ireland",
          "Italy", "Croatia", "Latvia", "Lithuania", "Luxembourg", "Malta", "Poland", "Portugal", "Romania", "Slovakia",
          "Slovenia", "Spain", "United Kingdom", "Sweden", "Czech Rep.", "Germany", "Hungary", "Austria"  )

Getting that into the data frame:

total$enland <- enland

Next, let's get the map:

worldmap <- getMap()

That retrieves data for the entire world. I'm only interested in the EU:

EU <- worldmap[which(worldmap$NAME %in% enland),]
EU <- map_data(EU)

The first line extracts the part of the world map that has names in the list of countries that I have data for.
map_data() converts that into a nice structure that is suitable for passing to ggplot.

The next step is calculating the number of migrants received in each country as a proportion of that country's population:

total <- total %>%
  mutate(`2015` = `2015`/Population*100, `2016` = `2016`/Population*100, `2017`=`2017`/Population*100)

I’m mutating the columns 2015-2017 by dividing by population. And multiplying by 100 to get percentages.

The almost final step is to join my migrant proportions with the map data:

total <- left_join(total,EU, by=c("enland"="region") )

The map data does not call the countries "countries". Rather, their names are saved in the variable "region".

And now the final step. I'm going to need the data in tidy form, so I'm loading tidyr.

Then I pass the data frame to select(), where I pick out the variables I need: long(itude), lat(itude), 2015, 2016 and 2017, and the group.
That is passed to gather(), which makes a new row for each year, with the proportions in the new variables year and prop.

All that is passed to ggplot, and a layer where the polygons describing the countries are plotted. They are filled with a colour matching the proportions, and grouped by "group". This is important; grouping by country name gives weird results. I'll get back to that. color="white" draws the polygon outlines in white.

Finally, I facet the data on year.

library(tidyr)
total %>%
  select(long,lat,`2015`,`2016`,`2017`, group) %>%
  gather(year, prop,  `2015`:`2017`)%>%
  ggplot() +
    geom_polygon(aes(long, lat, fill = prop, group = group), color = "white") +
    theme_void() +
    facet_wrap(~year, ncol=2)

(Plot: three maps of the EU, countries coloured by migrants as a percentage of population, faceted by year.)

That's it!
And now the picture is slightly different. What is interesting is that Germany still takes a higher proportion of the migrants than other countries. But in 2015 they didn't. That was the year when the German chancellor Angela Merkel said the famous words "Wir schaffen das", we'll manage. But it was also the year when Hungary and Sweden welcomed migrants in numbers equalling 1.79% and 1.65% of their populations, respectively. Compare that with the fact that Germany the same year received migrants equalling 0.58% of its population.

A cynic might claim that it is no surprise that Sweden and Hungary closed their borders late in 2015.

Anyway, that is a different subject. I just think that these three maps are slightly more informative than what TV2 provided.

Also, I promised to get back to the group thingy.

Making the same plot, but grouping on country names:

total %>%
  select(long,lat,`2015`,`2016`,`2017`, group, enland) %>%
  gather(year, prop,  `2015`:`2017`)%>%
  ggplot() +
    geom_polygon(aes(long, lat, fill = prop, group = enland), color = "white") +
    theme_void()

(Plot: the same map grouped by country name, with stray lines connecting the islands to the mainland.)

What happens is that the polygons describing Italy are grouped in a way that connects the parts describing Sicily to the northern part of Italy. That looks weird. The same happens with Sardinia.

Finally: I have not been very consistent in my use of words. I have used "received" and "welcomed" interchangeably. Hungary and Denmark have not been very welcoming. But we are talking about real humans here, and welcoming simply sounds nicer than received. Complicating the situation is the fact that a lot of the arrivals were not actually what we would normally call refugees. At least not refugees from war. So I have also not been consistent in the use of "migrant" vs "refugee". That is not really my point. The point is that we should always think about how these kinds of numbers are presented.


Briefly unavailable for scheduled maintenance.

Oh the horror.

A couple of days ago something happened while updating one of my sites. “Briefly unavailable for scheduled maintenance.” was all that the page told me.

How do you fix that, given that the page in question is made with WordPress?

Log on to the server serving your page. Delete the file “.maintenance”. Log out. Refresh.

If that doesn’t help, you have bigger problems.

Why does it happen? When you update something on a WordPress site, the site is placed in maintenance mode. It's a bad idea to have people entering comments into the database at the same time you're doing maintenance on it.

Being in maintenance mode means that every request for the site is given the message "Briefly unavailable for scheduled maintenance". That is controlled by the file ".maintenance". Normally that file is deleted when the update, or whatever, is over. But sometimes it survives: the update itself went fine, but the file is still there, and your site is unavailable.