Crime is a bad thing. No doubt about it. And one of the main topics in today’s debate climate is: “Those ’orrible immigrants are very criminal. Look at these numbers, they prove it!” Usually written with caps-lock engaged.
Well. Maybe they are, and maybe they do. But if you want to use statistics to prove it – pretty please, do not obfuscate the numbers.
This is an example: a blog post from one of the more notable Danish newspapers. In the US it would be regarded as communist; in the rest of the world we would think of it as relatively conservative.
The claim is that the number of reported rapes and other violent crimes in Denmark is the highest ever, and that this is because of the increasing number of immigrants in Denmark, especially Muslims. Use Google Translate if you want the details.
Again, that claim might be true. But the graphs in the post, which supposedly document the claim, are misleading, to say the least.
First, the numbers come from the Danish Statistical Bureau. It has a disclaimer telling us that changes to the Danish penal code mean that a number of sexual offenses have been reclassified as violent crimes since 2013. If the number of violent crimes suddenly includes crimes that were not previously classified as violent, that number will increase. Not much of a surprise. Yes, the post asks why the numbers are still increasing after that reclassification; one should expect them to level off. And again, the post may have a valid point. I don’t know. But what I do know is that the graphs are misleading.
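To make the “should level off” argument concrete, here is a minimal R sketch with made-up numbers. The 2005 to 2020 range, the flat underlying level of 1000 and the 300 offenses per year moved into the category are all assumptions, not figures from the post or from the Statistical Bureau. If a reclassification moves a roughly constant number of offenses into the category, it shows up as a single jump, after which the year-over-year change returns to what it was before:

```r
# Made-up numbers for illustration only, not data from the Danish Statistical Bureau
years <- 2005:2020
underlying <- rep(1000, length(years))      # assume a flat underlying level of cases
moved_in <- ifelse(years >= 2013, 300, 0)   # assume 300 offenses/year reclassified from 2013
reported <- underlying + moved_in

# Year-over-year change, labelled by the later year of each pair
yoy <- diff(reported)
names(yoy) <- years[-1]
print(yoy)  # zero everywhere except a single jump of 300 in 2013
```

If the real series keeps growing year after year after 2013, a one-off reclassification of this kind cannot be the whole explanation.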
Here’s why: the y-axis has been cut off. Let’s recreate the graphs and take a look.
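Why does cutting off the axis matter? What the eye compares is the height of the line above the bottom of the plot. A minimal R sketch, using a hypothetical helper `apparent_ratio()` and made-up numbers, of how much bigger the last value looks than the first depending on where the y-axis starts:

```r
# Hypothetical helper: ratio of the plotted heights of two values when the
# y-axis starts at `axis_start`. With axis_start = 0 this is the true ratio.
apparent_ratio <- function(first, last, axis_start = 0) {
  (last - axis_start) / (first - axis_start)
}

# Made-up numbers, not taken from the post
apparent_ratio(400, 600)                    # 1.5: a 50% increase looks like one
apparent_ratio(400, 600, axis_start = 350)  # 5: same data, axis starting at 350
```

The data is identical in both calls; only the starting point of the axis changes, and with it the last value goes from looking 1.5 times as high as the first to looking five times as high.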
There are two graphs. The first shows the number of reported cases of rape from 1995 until today.
The second shows the total number of reported cases of violent crimes in the same period. Both data sets come from http://www.statistikbanken.dk/.
We’re going to need some libraries:
```r
library(ggplot2)    # plotting
library(gridExtra)  # grid.arrange(), for placing plots side by side
```
Let’s begin by pulling the data.
There might be better ways, but I’ve simply downloaded the data. Two files:
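```r
# Semicolon-separated exports; the first three lines are not data, so skip them
violence <- read.csv("tab1.csv", sep = ";", skip = 3, header = FALSE)
rape     <- read.csv("tab2.csv", sep = ";", skip = 3, header = FALSE)
# Drop the seven lines of footnotes at the bottom of each file
violence <- head(violence, -7)
rape     <- head(rape, -7)
```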
The last seven lines of each file are notes about changes in which cases are counted in these statistics. I think that is a pretty important point, but notes are difficult to plot.
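Dropping a fixed seven lines works for these two files, but breaks silently if the number of footnotes ever changes. A sketch of a slightly more defensive alternative: the hypothetical helper `read_statbank()` keeps only rows whose count parses as a number. It assumes (check this against your own download) that the footnote lines have no numeric value in the second column; rows with a missing count would be dropped too.

```r
# Hypothetical helper: read a semicolon-separated export and keep only rows
# whose second column is numeric. Assumes footnote lines have no number there.
read_statbank <- function(path) {
  raw <- read.csv(path, sep = ";", skip = 3, header = FALSE,
                  stringsAsFactors = FALSE)
  counts <- suppressWarnings(as.numeric(raw$V2))  # footnotes become NA
  keep <- !is.na(counts)
  data.frame(V1 = raw$V1[keep], V2 = counts[keep], stringsAsFactors = FALSE)
}
```

Usage would then be `violence <- read_statbank("tab1.csv")` and `rape <- read_statbank("tab2.csv")`, with no need to count footnote lines by hand.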
The graph for rape, as presented in the post, and with a more sensible y-axis:
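```r
# Recreate the graph from the post: ggplot2's default y-axis starts near
# the smallest value, just like the original
post <- ggplot(rape, aes(x = V1, y = V2)) +
  geom_line(aes(group = 1)) +  # join all points into one line
  # label only every 20th period to keep the x-axis readable
  scale_x_discrete(breaks = rape$V1[seq(1, length(rape$V1), by = 20)]) +
  theme_classic()
# Same plot, with the y-axis running from zero
nice <- post + ylim(0, max(rape$V2))
grid.arrange(post, nice, ncol = 2)
```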
And the one for violent crimes in general, again with the original on the left and the better one on the right:
```r
# Same recipe for all violent crimes: default axis on the left, zero-based on the right
post <- ggplot(violence, aes(x = V1, y = V2)) +
  geom_line(aes(group = 1)) +
  scale_x_discrete(breaks = violence$V1[seq(1, length(violence$V1), by = 20)]) +
  theme_classic()
nice <- post + ylim(0, max(violence$V2))
grid.arrange(post, nice, ncol = 2)
```
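The two plotting blocks above differ only in the data frame, so they can be folded into one function. A sketch, where `compare_axes()` is a hypothetical helper name and `every` sets how often an x-axis label is shown. It also swaps `ylim()` for `expand_limits(y = 0)`, which forces zero into the axis range without fixing the upper limit; `ylim()` turns values outside its limits into missing values, which cannot happen here since the upper limit is the maximum, but is easy to trip over.

```r
library(ggplot2)
library(gridExtra)

# Hypothetical helper: the "as in the post" plot (default axis) next to the
# zero-based version. Assumes columns V1 (period) and V2 (count).
compare_axes <- function(df, every = 20) {
  post <- ggplot(df, aes(x = V1, y = V2)) +
    geom_line(aes(group = 1)) +
    scale_x_discrete(breaks = df$V1[seq(1, nrow(df), by = every)]) +
    theme_classic()
  # Only guarantees that 0 is inside the y-axis range; nothing is dropped
  nice <- post + expand_limits(y = 0)
  grid.arrange(post, nice, ncol = 2)  # draws and invisibly returns the layout
}
```

With that in place, `compare_axes(rape)` and `compare_axes(violence)` produce the same two side-by-side comparisons.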
So, still, some pretty scary increases. And the change in what is counted should give an increase, but that increase should level off, which it does not. Clearly something is not as it should be. But let’s be honest: the graphs on the right are not quite as scary as the ones on the left.
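Whether the increase levels off can also be checked with numbers rather than by eye. A minimal sketch: the hypothetical helper `change_around_break()` compares the average year-over-year change before and after the 2013 reclassification, leaving out the step into 2013 itself. It assumes one row per year, sorted by year; quarterly data would need to be summed per year first. The example series is made up.

```r
# Hypothetical helper: mean year-over-year change before and after a break.
# The change *into* the break year (the one-off step) is excluded from both.
change_around_break <- function(year, count, break_year = 2013) {
  yoy <- diff(count)
  yoy_year <- year[-1]  # each change is labelled by the later year
  c(before = mean(yoy[yoy_year < break_year]),
    after  = mean(yoy[yoy_year > break_year]))
}

# Made-up series: steady growth of 10 a year plus a one-off step of 300 in 2013
years <- 2005:2020
counts <- 1000 + 10 * (years - 2005) + ifelse(years >= 2013, 300, 0)
change_around_break(years, counts)  # before = 10, after = 10
```

If `after` came out clearly larger than `before` for the real series, growth accelerated after the reclassification, which is what the post argues. It would still not tell us why.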
Also, that change in what is counted as sexual assault can explain the initial increase, but then the numbers should level off. That is a fair point. However, other things changed in the period too, #metoo for example. I think it is reasonable to expect that many cases that used to be brushed off as not very important are now finally being reported. Without #metoo, the numbers might actually have leveled off.
Anyway, my point is that if you want to use graphs to support your claims, do NOT cut off the y-axis to make them look more convincing.