Euler 48

This problem defines a sum.
11 + 22 + 33 + 44 + 55 + 66 + 77 + 88 + 99 + 1010

Lets call that S10. We are told that S10 is equal to 10405071317.

Our task is now to find the last ten digits fo S1000.

This is relatively simple – I have solved similar problems before. The challenge is that

999^999
## [1] Inf

has an awfull lot of digits. The number is so large that R tells me it is infinite.

We are only interested in the last ten digits however, and R handles ten digits without problems.

Instead of using the built-in power function, I’ll make my own.

selfPower <- function(x){
  res <- 1
  for(i in 1:x){
    res <- (res*x)%%10000000000000
  }
  res
}

The function takes x as input. Multiply the result (initialized to 1) with x, and do that x times. After each multiplication, it keeps the last 13 digits as the result. Whatever happens with the rest of the digits is uninteresting.

I’ll run that function on all numbers from 1 to 1000. And then I’ll have to add it all together.
Adding stuff up, I’ll run into the same problem. So instead of using the built-in add-function, I’m writing my own.

adding <- function(x,y){
  (x+y)%%100000000000
}

It takes two numbers. Add them, and return the last 11 digits.

Putting that together, and doing a bit of functional programming using purrr:

library(purrr)
answer <- 1:1000 %>%
  (function(x) sapply(x, selfPower)) %>%
  reduce(adding) %>%
  (function(x) x%%10000000000)

I’m passing the numbers 1:1000 to an anonymous function, that sapply’s selfPower on the x’es. Then I reduce the results using adding. And finally that result is passed to a second anonymous function, that returns the last ten digits.

Lessons learned:
1. Anonymous functions should be inside parantheses.
2. I run a version of the magrittr library, that has an interesting error in the error message when you learn lesson 1: “Error: Anonymous functions myst be parenthesized”

Project Euler 26

Reciprocal cycles.

Yeah.
1/6 = 0.1(6)

And
1/7 = 0.(142857)

Where (6) indicates that 6 is repeated.

Find the number d where 1/d has the longest recurring cycle.

Ok. So how to do this. One challenge is to calculate the decimal fraction to arbitrary precision. The recurring cycle might turn up after 7 digits. Or after 1000.
The other challenge is to figure out when the recurring cycle is actually recurring.

If I had to do this with paper and pen, I would take 1/7. The result is 0, and a remainder of 1. My first approximation of 1/7 would be 0. Then I would multiply the remainder by 10, and divide that result with 7:
10/7. The result is 1, with a remainder of 3. The approximation is now 0.1. The remainder is multiplied and divided: 30/7 = 4 and a remainder of 2. The approximation is now 1/7 = 0.14. Lets put that in a table:

Step Division made Result Remainder Approximation


 1      1/7               0            1             0
 2      10/7              1            3             0.1
 3      30/7              4            2             0.14
 4      20/7              2            6             0.142
 5      60/7              8            4             0.1428
 6      40/7              5            5             0.14285
 7      50/7              7            1             0.142857

And the I can stop. Because the next digit in the approximation is determined by the remainder of the previous division. And now I have a remainder that I have seen before. The first step where I have a remainder of 1 is step one. The next is step 7, and the cycle is 7-1 long.

I now have the structure of how to solve this.
Take a number d. Calculate 1 %% d. Save that result, lets call it X~1~, somewhere. Calculate X~2~ = X~1~*10 %% d, and save X~2~. Continue calculating X~n+1~ = 10 X~n~ %% d, until X~n+1~ has been seen before.
Do that for all positive integers d < 1000.

Lets write a function for that:

recurr <- function(d){
  res <- numeric()
  new_remainder <- 0.1
  while(length(res)==length(unique(res))){
    new_remainder <- (new_remainder*10) %% d
    res <- c(res, new_remainder)
  }
  which(res==res[length(res)])[2] - which(res==res[length(res)])[1]
}

The function takes d as input.
An empty vector, res, is defined, and a variable new_remainder is set to 0.1.
I’m using the res vector to store the consecutive list of remainders from the divisions. The moment that there are duplicate values in that vector, I have added a remainder to it that was already there. That indicates that a recurring cycle has been reached.
The next remainder that should be entered to the result vector, is found by multiplying the previous remainder by 10 and do the integer division by d. By setting the new_remainder variable to 0.1 from the beginning, I’m making sure that the first division is 1 %% d.
The new remainder is added to the result vector at the end. This is not an efficient method of growing that vector. I’ll think I’ll live with it.
The while-loop stops when a remainder has been added that was already in the result vector. That remainder will be in the last position of the vector. The length of the recurring cycle is the distance between the to instances of the last remainder added. That is length of the cycle.

What now remains is to run this function on all numbers d < 1000, and get the d, that has maximum recurr(d).

I pipe the vector 1:999 to map(), which simply applies a function on all elements of the vector.
The result is a list, so that is piped to unlist.
The order of that vector is unchanged. So the value in position 42, will be the length of the recurring cycle in 1/42. which.max() returns the position of the largest element in a vector, which is just what I need.

library(purrr)
answer <- 1:999 %>%
  map(recurr) %>%
  unlist %>%
  which.max()

Lessons learned

  1. Probably most of all that I should not shy away from a problem just because it seems a bit difficult. Not that this a difficult problem, it’s only a 5 percenter. But I passed it over more than once, because I could not get min brain around how to get those fractions.
  2. There might be a better way to see if something has occured before in a sequence of numbers. But this is working, and I think there is at least one other Euler problem that can be solved using similar logic.

Euler 100

Project Euler – problem 100

Back to the hopeless examples of probabilities from school.
In a bag there are 15 black balls and six white ones. Project Euler talks about discs, math-teachers has always used balls as examples, and they where always white and black. So I’ll stick with that.

It you draw two balls from the bag, there is a 50/50 chance of drawing 2 black balls:

(15/21)*(14/20)
## [1] 0.5

I’m told that the next set of balls in the bag with that property, is 85 black balls and 35 white ones:
(85/120)

(85/120)*(84/119)
## [1] 0.5

Find the mix of black and white balls, that gives a probability of 50/50 of drawing 2 black balls, given that there should be more than 10¹² = 1000000000000 balls in the rather large bag.

That should be straight-forward.
Lets call the number of black balls b and the number of white balls w. And lets define the total number of balls in the back as n=w+b
The probability of drawing two black balls is:

(b/n)((b-1)/(n-1)) = ½

n = w + b > 10¹²

Two equations with two unknowns.
The probability can be rearranged:

(b/n)((b-1)/(n-1)) = ½ <=>

b(b-1) / n(n-1) = ½ <=>

(b² – b) / (n² – n) = ½ <=>

b² – b = ½(n² – n) <=>

2b² – 2b = n² – n <=>

2b² – 2b – n² + n = 0

Hm. Maybe it is not that simple after all. First of all I don’t know if n is 100000000000 or 100000000001. That actually makes a pretty big difference:

1000000000001**2 - 1000000000000**2
## [1] 1.999978e+12

Second of all, I need to find integer solutions. An analytical solution might not give integer results. And I can’t have one third of a ball in the bag.

Googling “finding integer solutions to equations” give, as the first result, a link to the wikipedia article on “Diophantine equations”.
Which apparently are equations that should have integer solutions.

All right, a couple of the problems I’ve tackled earlier, and quite a lot of Project Euler problems I’ve given up on appears to be about solving these Diophantine equations.

So. Nice. The last link of the wikipedia page is to https://www.alpertron.com.ar/QUAD.HTM.
I should probably read up on the methods. But that will have to wait.

The point is, that this Diophantine equation can be solved by:

b~n+1~ = 3b~n~ + 2n~n~ -2

n~n+1~ = 4b~n~ + 3n~n~ -3

The idea is that we have a solution (b~n~, n~n~). And these two equations allows us to calculate the next solution, (b~n+1~, n~n+1~)

Lets try that, we was given that (15,21) was a solution. The next should be (85,120). Do we get that?

b <- 15
n <- 21
b_n <- 3*b + 2*n -2
n_n <- 4*b + 3*n -3
print(paste(b_n, n_n, sep=","))
## [1] "85,120"

Qap’la, it works. Nice. Now I just need to run through this until n~n+1~ gets above 10¹².

b <- 15
n <- 21
while(n<10**12){
  b_n <- 3*b + 2*n -2
  n_n <- 4*b + 3*n -3
  b <- b_n
  n <- n_n
}
answer <- b

Lessons learned:

  1. Solving Diophantine equations is at the heart of a lot of these problems. I’ve learned a new tools to handle them!
  2. If you want to subscript stuff in RMarkdown, you place a ~ on each side of what you want subscripted.

Other stuff to note: Maybe it is time someone wrote a new solver for Diophantine equations. The one I found is 19 years old. Something to do in Shiny perhaps?

Sudokus – Project Euler 96

This problem is basically about making a sudoku solver, that can be automated. If you don’t know what a sudoku is – google it and come back.

I have solved a LOT of sudokus. Manually. There are several techniques, but some of them are not that easy to code. And of course I’ll need a method that guarantees a solution.

Fortunately a brute force solution exists:

https://en.wikipedia.org/wiki/Sudoku_solving_algorithms

It is called backtracking. The process is to look at all the empty cells.

In the first one – insert the lowest allowed value, and progress to the next empty cell. Insert the lowest allowed value in it. Continue until you reach a cell with no allowed values. When you do that, go back to the previous cell, and insert the next-lowest allowed value. And then go back (forward?) to the next cell. If the cell you went back to also have no allowed values, you go one step futher back. Continue until you reach the last cell in the puzzle.

So. We’ll need a few things.

First of all we’ll need a matrix in which to keep the sudoku.

I’m going with a matrix rather than a dataframe after reading the R Inferno (https://www.burns-stat.com/pages/Tutor/R_inferno.pdf), that suggests that there is no reason to work with a dataframe if a matrix will do.

I’ll need a couple, or three (actually more like five) functions to figure out what values are allowd for a given cell in a given sudoku.

And I’ll need a way to keep track of which cells that are “frozen”, and which cells should be filled out.

Lets begin by making a matrix to keep the puzzle in. Project Euler provides us with a puzzle and the matching solution. And a file with 50 puzzles that needs to be solved.

Lets read in that file.

problems <- read.csv("p096_sudoku.txt",header=FALSE, stringsAsFactors = FALSE)

The structure is rather simple. 500 rows. First one row with an identifier, eg “Grid 01”. And then 9 rows with just the rows of digits, 0 for the empty cells.

We’ll need to pick out first the rows 1 to 10, then rows 11 to 20 etc.

Lets make a dataframe, and read it in:

start <- seq(1,500,by=10)
end <- seq(10,500,by=10)
prob_df <- data.frame(id=character(),problem=character(), stringsAsFactors = FALSE)
for(i in 1:50){
  prob_df[i,2] <- paste(problems[start[i]:end[i],][-1], collapse="")
}
prob_df[1,2]
## [1] "003020600900305001001806400008102900700000008006708200002609500800203009005010300"

Nice and simple, 50 rows with an 81 character long string of digits. This is the first puzzle in the file. It is also the puzzle where we are provided with a solution.

We now need to convert that to a matrix.

Or – we don’t need to. There are certainly techniques that would allow me to handle it as-is. But I think it is more intuitive to get it into a matrix.

So, lets do that.

mat <- matrix(as.numeric(unlist(strsplit(prob_df[1,2],""))),9,9,byrow=TRUE)  
mat
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    0    0    3    0    2    0    6    0    0
##  [2,]    9    0    0    3    0    5    0    0    1
##  [3,]    0    0    1    8    0    6    4    0    0
##  [4,]    0    0    8    1    0    2    9    0    0
##  [5,]    7    0    0    0    0    0    0    0    8
##  [6,]    0    0    6    7    0    8    2    0    0
##  [7,]    0    0    2    6    0    9    5    0    0
##  [8,]    8    0    0    2    0    3    0    0    9
##  [9,]    0    0    5    0    1    0    3    0    0

Neat. Split the string in individual characters. Unlist it, and cast it to numeric. Then pass it all to matrix(), noting that we want 9 rows and 9 columns. And that the matrix should be filled by row.

Depending on where you count from, the first empty cell is 1,2. Given the values that are already filled in, what values are allowed in that cell?

First, what values are allowed based on the row? From inspection it is obvious that only the values 1, 4, 5, 6, 7, 8 and 9 are allowed. Lets write a function for that:

rowall <- function(x,mat){
  setdiff(1:9, mat[x,])
}
rowall(1,mat)
## [1] 1 4 5 7 8 9

The function takes a row-number, in this example 1, and a matrix. mat[x,] returns the values in that row. And setdiff returns the values that are in 1:9 but not in the list of values already in the row.

More or less the exact same thing for the column:

# returnerer tilladte værdier i kolonne y i en matrix mat
colall <- function(y,mat){
  setdiff(1:9, mat[,y])
}
colall(2, mat)
## [1] 1 2 3 4 5 6 7 8 9

What about the frame, the 3×3 set of numbers?

I need a subset of the matrix. For our example, cell 1,2, I need the columns 1, 2 and 3. And the rows 1, 2 and 3. Or a vector 1:3. Thankfully there is symmetry, so column 1 gives the same vector as row 1. I need a function that returns 1:3 when I pass it 1, 2 or 3. 4:6 for the values 4, 5 or 6. and 7:9 for the values 7, 8 or 9.

My good colleague Henrik has tried to solve the same problem using Python. And he suggested that the way to do it is by integer division. In this way:

framelookup <- list(a=c(1,2,3),b=c(4,5,6),c=c(7,8,9))

getint <- function(x){
  x <- (x -1)%/%3+1
  unlist(framelookup[x])
}
  • Take the value x.
  • Substract 1 and divide the result by 3.
  • Then add 1. For x = 1,2 or 3, the result is 1, for x = 4, 5 or 6, the result is 2. And for 7 through 9, we get 3.
  • We can then use that to look up the vector we need in a list, and return the unlisted number.

At first I had done it with three if-statements. This solution is a bit more elegant, and Henrik thinks that if-statements should be outlawed. So in the interest of keeping the peace in our office, I’m going with that.

Now I can write a function that returns the values allowed in a given cell in a given matrix, given the values that are already filled out in the frame that cell is in:

fraall <- function(x,y,mat){
  res <- mat[getint(x),getint(y)]
  setdiff(1:9, res)
}

The function takes the coordinates of the cell, and a matrix. Based on that, getint() returns the vectors describing the frame, and saves that subframe in res. Then I use the setdiff() function in the same way I’ve done previously.

That gives me all the constraints. What values are allowed based on row, column and frame. The intersection between these three vectors gives me the allowed values for the cell. It is now simple to write a function that returns the values allowed for a given cell in a given matrix:

allowed <- function(x,y,mat){
  res <- intersect(rowall(x,mat), colall(y,mat))
  res <- intersect(res, fraall(x,y,mat))
  return(res)
}

Actually this is all I need for solving quite a lot of sudokus. Go through all the empty cells. Figure out what values are allowed. If only one value is allowed, plot it into the cell. Repeat until there are no more empty cells where only one value is allowed. A lot of sudokus can be solved with only that method.

The ones that cannot be solved in this way can be solved by the backtracking algorithm. But that is slow, so it might be a good idea to simplify the those puzzles by filling out what can be filled out with this method first.
So let’s write a function that does exactly that.

First I’ll need a list of the empty cells. The which() function can help me.

test <- which(mat ==0, arr.ind=T)
nrow(test)
## [1] 49

Which parts of the matrix is equal to 0? and arr.ind tells that what we want returned (when which is used on an array) is the array indeces. There are 56 0’es in the puzzle.

The matrix test has this structure:

head(test)
##      row col
## [1,]   1   1
## [2,]   3   1
## [3,]   4   1
## [4,]   6   1
## [5,]   7   1
## [6,]   9   1

The first pair of values is 2,1, the next is 4,1 etc. And if I take each one of those, looks at the allowed values for those positions in the matrix, and, if there is only one value allowed, place that value in that position, I get a step closer to the solution.

This does exactly that:

for(i in 1:nrow(test)){
  all <- allowed(test[i,1],test[i,2],mat)
  if(length(all)==1){
    mat[test[i,1],test[i,2]] <- all
  }
}
nrow(which(mat ==0, arr.ind=T))
## [1] 43

And, hey presto, 6 empty cells were filled. The logic is: Find the list of allowed values based on the values in the list of empty cells. If the length is 1, set the current cell to that value.

Now I need to get a new list of empty cells:

test <- which(mat ==0, arr.ind=T)

And I could do it again:

for(i in 1:nrow(test)){
  all <- allowed(test[i,1],test[i,2],mat)
  if(length(all)==1){
    mat[test[i,1],test[i,2]] <- all
  }
}
nrow(which(mat ==0, arr.ind=T))
## [1] 29

14 previously empty cells filled!

Lets make a function that does that again and again, until there is no more improvement.

We’ll start by writing a function that does it one time:

singlevalues <- function(mat){
  test <- which(mat ==0, arr.ind=T)
  for(i in 1:nrow(test)){
    all <- allowed(test[i,1],test[i,2],mat)
  if(length(all)==1){
    mat[test[i,1],test[i,2]] <- all
  }
  }
  return(mat)
}

Then I need to run that several times. When the number of empty cells after running it is the same as the number of empty cells before, stop. No more values can be filled in.

repsinglevalues <- function(mat){
  oldlen <- 1
  newlen <- 0
  while(oldlen > newlen){
  oldlen <- length(which(mat == 0))
  mat <- singlevalues(mat)
  newlen <- length(which(mat == 0))
  if(newlen==0) break()
  }
  return(mat)
}

Take a matrix. Set oldlen to 1, and newlen to 0. As long ans oldlen is larger than newlen, do this:
Set oldlen to the number of 0’es in the matrix. Run the singlevalues function from above. Find out how many empty cells, or zeroes there are now, and set newlen to that. If newlen is equal to 0, stop everything, otherwise repeat. And finally return the changed matrix.

Lets test it:

repsinglevalues(mat)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    4    8    3    9    2    1    6    5    7
##  [2,]    9    6    7    3    4    5    8    2    1
##  [3,]    2    5    1    8    7    6    4    9    3
##  [4,]    5    4    8    1    3    2    9    7    6
##  [5,]    7    2    9    5    6    4    1    3    8
##  [6,]    1    3    6    7    9    8    2    4    5
##  [7,]    3    7    2    6    8    9    5    1    4
##  [8,]    8    1    4    2    5    3    7    6    9
##  [9,]    6    9    5    4    1    7    3    8    2

Yeah! The sudoku is solved!

But not every puzzle can be solved in that way. Now for the brutish way…

First of all, lets get a fresh copy of the original puzzle read in. And a fresh list of empyt cells.

All righty! Nu kan jeg løse en del sudokuer alene ved denne metode. Så er der resten…

mat <- matrix(as.numeric(unlist(strsplit(prob_df[1,2],""))),9,9,byrow=TRUE)  
test <- which(mat ==0, arr.ind=T)
nrow(test)
## [1] 49

Back to the 49 empty cells.

The algorithm is as follows.

  • Set i <- 1
  • Get the row and column indeces from test[i,1] and test[i,2] respectively.
  • Identify the list of allowed values for that cell.
  • Set the value of the cell to the lowest of the allowed values.
  • Increment i <- i + 1.
  • Get the row and column indeces again.
  • Get the list of allowed values for that cell. If there are no allowed values, decrement i <- i -1.
  • If there are allowed values, set the value of the cell to the lowest of the allowed values.
  • When we return to a cell that already has a value, we should not set it to the lowest allowed value. We should set it to the next lowest. That is the lowest of the allowed values, excluding the value that was already there.
  • If we get to a cell with no allowed values (excluding the one that might already be there), the value of the cell should be set to 0, and i should be decrementet.

This function does that.

solve <- function(mat){
  i <- 1
  test <- which(mat ==0, arr.ind=T)
  while(nrow(which(mat==0,arr.ind=T))>0){
    allowedvalues <- allowed(test[i,1],test[i,2],mat)
    currentvalue <- mat[test[i,1],test[i,2]]
    allowedvalues <- allowedvalues[allowedvalues>currentvalue]
    if(length(allowedvalues)==0){
      mat[test[i,1],test[i,2]] <- 0
      i <- i -1
    } else{
      mat[test[i,1],test[i,2]] <- min(allowedvalues)
      i <- i + 1
    }
  }
  mat
  }

The function takes a matrix. It sets the counter/pointer i to 1.

Then a test-matrix is generated, containing all the positions that are empty.

While the number of empty places in the matrix is larger than zero do this:

  • Find the allowed values.
  • Find the current value in the cell.
  • Remove all allowed values that are smaller than the current value. In that way, taking the minimum of the remaining allowed values, will always give us the next lowest value. And if the original value was 0, we will get the lowest.
  • If there are no allowed values, set the value of the cell to zero, and decrement the counter.
  • If there are allowed values, set the value of the cell to the lowest of the remaining allowed values, and increment the counter.
  • End by returning the now filled matrix.

Does it work?

solve(mat)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    4    8    3    9    2    1    6    5    7
##  [2,]    9    6    7    3    4    5    8    2    1
##  [3,]    2    5    1    8    7    6    4    9    3
##  [4,]    5    4    8    1    3    2    9    7    6
##  [5,]    7    2    9    5    6    4    1    3    8
##  [6,]    1    3    6    7    9    8    2    4    5
##  [7,]    3    7    2    6    8    9    5    1    4
##  [8,]    8    1    4    2    5    3    7    6    9
##  [9,]    6    9    5    4    1    7    3    8    2

It does! Qap’la!

I now have two functions that solves sudokus. solve(), that bruteforces it and will always get a solution. And
repsinglevalues() that uses logic, gets a solution sometimes, but not always.

I also have a dataframe with 50 sudokus that I need to solve.

And what was the task again? Take the three first digits in the first row. Concatenate them to a single three digit number. Do that for all 50 puzzles. Add them all up. That should be the answer.

answer <- 0
for(i in 1:50){
  mat <- matrix(as.numeric(unlist(strsplit(prob_df[i,2],""))),9,9,byrow=TRUE)  
  mat <- repsinglevalues(mat)
  mat <- solve(mat)
  answer <<- answer + as.numeric(paste(mat[1,1:3],collapse=""))
}
  • Set a variable answer to zero.
  • Pick out the sudokus one by one.
  • Run the logic-solving function on it.
  • Then run the brute-force function on the result of that.
  • Pick out the three first digits in row 1, collapse them, cast as numeric, and add to the answer-variable.

Done!

Lessons learned

  1. Errors in the logic are a bit difficult to locate. I forgot to take into account that the list of allowed values in the brute-force solution does not include the value that is already in the cell. That made the function run for eternity. Or something pretty close. Had I not made that mistake, I would finished this problem a day earlier.
  2. Technically the lesson has not been learned yet. But below there is a note ot speed. Adding just one extra value to the puzzle reduces the time it takes to bruteforce the solution significantly.
  3. Set-functions, here specifically setdiff(), are neat!

Notes on speed

OK. I needed a function for returning a vector for getting the relevant frame in the matrix. As I mentioned, my colleague Henrik thought it should be done wit integer division rather than if-statements.

This was the way I originally did it:

ifgetint <- function(x){
if(x %in% c(1:3)){
  res <- c(1:3)
} else if(x %in% c(4:6)){
  res <- c(4:6)
  } else {  
      res <- c(7:9)
  }
  return(res)
}

But what way is faster? Lets find out:

library(microbenchmark)
mbm <- microbenchmark(ifgetint, getint, times=1000)
mbm
## Unit: nanoseconds
##      expr min lq   mean median uq  max neval cld
##  ifgetint  46 61 63.678     63 65 1236  1000   a
##    getint  49 63 67.615     65 66 2916  1000   a

Actually my original way of doing it was a little bit faster.

library(ggplot2)
autoplot(mbm)
plot of chunk unnamed-chunk-23

So. I found two methods for solving sudokus. It would be interesting to compare them. I can’t do that on just any sudoku. I need to use one that can be solved with both methods. Luckily the first problem in the set can do just that.

mat <- matrix(as.numeric(unlist(strsplit(prob_df[1,2],""))),9,9,byrow=TRUE)  
solve(mat)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    4    8    3    9    2    1    6    5    7
##  [2,]    9    6    7    3    4    5    8    2    1
##  [3,]    2    5    1    8    7    6    4    9    3
##  [4,]    5    4    8    1    3    2    9    7    6
##  [5,]    7    2    9    5    6    4    1    3    8
##  [6,]    1    3    6    7    9    8    2    4    5
##  [7,]    3    7    2    6    8    9    5    1    4
##  [8,]    8    1    4    2    5    3    7    6    9
##  [9,]    6    9    5    4    1    7    3    8    2
repsinglevalues(mat)
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
##  [1,]    4    8    3    9    2    1    6    5    7
##  [2,]    9    6    7    3    4    5    8    2    1
##  [3,]    2    5    1    8    7    6    4    9    3
##  [4,]    5    4    8    1    3    2    9    7    6
##  [5,]    7    2    9    5    6    4    1    3    8
##  [6,]    1    3    6    7    9    8    2    4    5
##  [7,]    3    7    2    6    8    9    5    1    4
##  [8,]    8    1    4    2    5    3    7    6    9
##  [9,]    6    9    5    4    1    7    3    8    2
mbm <- microbenchmark(solve(mat),repsinglevalues(mat), times=100)
mbm
## Unit: milliseconds
##                  expr       min        lq      mean   median        uq
##            solve(mat) 47.627037 53.516049 54.225652 54.26330 54.815257
##  repsinglevalues(mat)  6.746542  7.066549  7.702925  7.22182  7.509268
##       max neval cld
##  63.72794   100   b
##  11.50775   100  a

No surprise there. The logical method is much faster. Something like 8 times faster.

autoplot(mbm)
plot of chunk unnamed-chunk-25

How much of a difference does it make when part of the sudoku has been solved already? By inspection (ie, trial and error) I find sudoku number 11. The original puzzle has 53 empty cells. By filling out what can be filled out using logic, that is reduced to 52 empty cells.

How much faster is it to solve sudoku number 11, when an additional cell has been filled out using logic?
mat is the original sudoku. redmat is the sudoku partially solved by logic.

i <- 11
mat <- matrix(as.numeric(unlist(strsplit(prob_df[i,2],""))),9,9,byrow=TRUE)  
redmat <- repsinglevalues(mat)

mbm <- microbenchmark(solve(mat),solve(redmat), times=10)
mbm
## Unit: milliseconds
##           expr      min       lq     mean   median       uq      max neval
##     solve(mat) 754.9624 769.3149 797.2162 780.1627 788.8825 892.5340    10
##  solve(redmat) 534.7156 545.7596 550.4211 548.9691 552.2282 565.9535    10
##  cld
##    b
##   a

Wow! That really makes a difference!! Filling in just a single extra value using logic cuts about one third of the time it takes to solve the sudoku.

autoplot(mbm)
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

plot of chunk unnamed-chunk-27

melt() – tidy data

Just a short note to help me remember the melt()-function.

Lets create some messy data:

ID <- 1:10
T1 <- runif(10,10,20)
T2 <- runif(10,20,30)
T3 <- runif(10,30,40)
df <- data.frame(ID=ID, T1=T1, T2=T2, T3=T3)

This is heavily inspired by a practical problem a student came to us with. There is 10 different patients, at time = T1, there is a certain value measured on the patient. At time = T2 the same value is measured. And again at time = T3, where T1<T2<T3.

We would now like to plot the development of those values as a function of time (or just T1,T2 and T3).

How to do that?

Using reshape2, and making sure we have the tidyverse packages loaded:

library(tidyverse)
library(reshape2)
clean <- melt(df, id.vars=c("ID"), value.names=c("T1", "T2", "T3"))
head(clean)
##   ID variable    value
## 1  1       T1 18.43940
## 2  2       T1 15.12141
## 3  3       T1 17.46355
## 4  4       T1 18.79450
## 5  5       T1 11.20996
## 6  6       T1 17.82611

Nice, we now have tidy data.

Note that melt() also takes a variable “variable.name”, in case we have a different sort of mess.

Now it is easy to plot:

plot <- ggplot(clean, aes(x=variable, y=value, group=ID)) +
  geom_line()
plot

plot of chunk unnamed-chunk-4

Neat.

Udgivet i R

Euler 2

The second Project Euler problem.

Given the Fibonacci sequence, calculate all terms below 4 million. What is the sum of the even terms?

We are going to use the library “numbers” a lot!
It includes a function, that allows us to generate the Fibonacci-sequence. But how many terms should we get?
By inspection we find that the 34th term is:

library(numbers)
fibonacci(34)
## [1] 5702887

Therefore we only need the 33 first terms:

(f <- fibonacci(33, sequence = TRUE))
##  [1]       1       1       2       3       5       8      13      21
##  [9]      34      55      89     144     233     377     610     987
## [17]    1597    2584    4181    6765   10946   17711   28657   46368
## [25]   75025  121393  196418  317811  514229  832040 1346269 2178309
## [33] 3524578

We only have to sum the even terms:

sum(f[!f%%2])
## [1] 4613732

Lets split that up a bit. %% i modulus. It returns what is “left” in the division.
4%%2 returns 0, because there is no remainder – because 4 is an equal number.
When we do that on all the terms in the sequence, we get:

f%%2
##  [1] 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0

0 for the equal terms, 1 for the unequal terms.
0 is equivalent to FALSE, and 1 to TRUE.
And ! negates the logical value:

!f%%2
##  [1] FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE
## [12]  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE
## [23] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE  TRUE

Compare that with the original sequence – now the value is FALSE for the unequal numbers, and true for the equal.
We can use that to pick out the equal terms in the sequence:

f[!f%%2]
##  [1]       2       8      34     144     610    2584   10946   46368
##  [9]  196418  832040 3524578

And then it is simple enough to just sum them:

sum(f[!f%%2])
## [1] Censored

Lessons learned:
1. You should always save your Euler solutions, otherwise you will have to reconstruct them when you decide to put them all on your website three years later.
2. Numbers is a really usefull library.

Merging PDF-files

You may need to merge several pdf-files into one.

There are several tools. Some of them you have to pay for.

If you are in a linux environment, try this:

gs -dBATCH -dNOPAUSE -q -sDEVICE=pdfwrite -sOutputFile=final.pdf filea.pdf fileb.pdf

continue with filec.pdf etc.

Some say it is slow.

Others say, yes it is slow, but it produces smaller final files, and are able to handle “difficult pdfs”.

I say it is nice that it can be done from the command line.

Euler 1

The very first Euler problem. And a relatively simple one.

Find all natural numbers below 1000, that are multiples of 3 or 5. What is the sum?

threes <- seq(3,999,by=3)
fives <- seq(5,999,by=5)
both <- unique(c(threes,fives))
sum(both)
## [1] Censored

seq(3, 999, by=3) gives us all numbers below 1000 that are a multiple of 3. Fives contains the numbers that are multiples of 5. Both is the union – unique removes duplicate elements. And sum finally gives the result.

Lessons learned:

  1. Make sure you always save your work. Otherwise you will have to reproduce it.

Euler 3

Problem 3 in Project Euler

What is the largest prime factor in 600851475143

Super simple. The library numbers have a function primeFactors(). It simply returns all prime factors.
Put that inside a max(), and you have the result.

library(numbers)
max(primeFactors(600851475143))
## [1] Censored

Using the library purrr, it can be written like this, and this is a way of coding that I need a bit more practice in, so lets do it that way as well.

library(purrr)
600851475143 %>%
  primeFactors() %>%
  max()
## [1] Censored

The %>% are a pipe function. It takes whats on the left, and passes it to whats on the right. In this case the number is piped to the function primeFactors(), and the result of that function is piped to the max() function. And you get the result.

Lesson learned – nope. I already knew that numbers is a pretty useful library.

Euler 49

There is a sequence 1487, 4817 and 8147 that has three properties:
1. All numbers are primes
2. They are all permutations of each other
3. Each term increases by a certain number (in this case 3330)

Project Euler claims that there are two such sequences with four-digit numbers. Our task is to find the other.
The answer is the three primes with the required properties, pasted together in ascending order.

As usual I’ll load the numbers library. And the first task must be to find all four-digit primes.

library(numbers)
cand <- Primes(1000,9999)

But we will not need these prime. We already know that answer:

cand <- cand[-which(cand %in% c(1487, 4817, 8147))]

If one of these four-digit primes is part of such a sequence, there must exist at least two other primes, which are permutations of it.
Lets begin by generating that list
The library gtools provides the function permutations()

library(gtools)

listing <- function(x){
  b <- unlist(strsplit(as.character(x),""))
  c <- permutations(4,4,v=b, set=FALSE)
  d <- apply(c, 1, paste, collapse="")
  d <- as.numeric(d)
  sort(unique(d[d %in% cand])) 
}

We take a four-digit number as input, converts it to characters via as.character(), and uses strsplit() to get a character vector with for numbers.
Then permutations. The first “4” indicates that the source vector (b) has 4 elements. The second “4” that we want results that contains 4 elements. That we take “b” as the source vector – i.e. the elements from which the permuations should be made. And set=FALSE is a lgical flag. Default is TRUE, and will remove duplicates. That is not what we want, so it is set to FALSE.
Then all the permutations are collapsed to four-digit strings, converted to numeric, and duplicates are removed with unique(), returning a list of all permuations of the digits in the input, that are in the list of candidate primes.

Thats nice. But we know that there should be three numbers in the sequence. This function simply tests if there are at least three numbers in the listing:

poss <- function(x){
  length(listing(x)) >= 3
}

That was the simple part. We can now take all the four-digit primes, excluding the three we already know about, make permutations of them, and see if there are at least three permutations among them, that are prime.

Now it gets a bit more complicated. Given a set of four-digit numbers, we need to locate a subset of three, A, B and C. The differences C-B and B-A should be equal.
There are probably more elegant ways to do that, but this is the way I found.
Given a number x, get all possible permutations of that:
listing(x)
Get all the combinations of three of these numbers:
comb(listing(x),3)
Now we have a matrix, with all the possible combinations in the columns.
Find the differences between the first and the second row, and the second and the third row, for all columns:
apply(apply(combn(listing(x), 3),2,diff)
If the two differences are equal, they fullfil the third property these numbers should have. Therefor we need to find the differende between the two differences. It that is 0, we have our result:
apply(apply(combn(listing(x), 3),2,diff),2,diff)==0])
That gives a logical vector, that is TRUE whenever the difference between the three numbers is the same. We get the numbers by:
combn(listing(x), 3)[,apply(apply(combn(listing(x), 3),2,diff),2,diff)==0]
I’ll define that as a function:

corrComb <- function(x){
  combn(listing(x), 3)[,apply(apply(combn(listing(x), 3),2,diff),2,diff)==0]  
}

It would be nice to have that as a logical test as well. I’m running out of ideas for reasonable function names:

test <- function(x){
  as.logical(sum(corrComb(x)))
}

Very simple – if there is a combination fulfilling the requirements, return TRUE.

Now, lets put it all together. I’ll use the library purrr, which allows me to use these nifty pipes.

Taking the candidate primes, we keep the ones that have three or more permutations that are prime, and in our candidate list. Of those, we keep the ones that have fulfill the requirement that the pairwise differences between three of them are equal.
Those are piped to corrComb, that returns the three numbers. They are piped to sort, so we get them in ascending order, and then I paste them together.

library(purrr)
answer <- cand %>%
  keep(poss) %>%
  keep(test) %>%
  corrComb%>%
  sort %>%
  paste(.,collapse="")

Lessons learned.

  1. Those pipes are pretty neat.
  2. The apply family is also pretty cool
  3. Read the documentation. I spent some time being confused about the results from permuations() until I discovered the set-flag