Project Euler 84

Monopoly – Project Euler problem 84

This is basically a question of building a Monopoly simulator.

We begin at square GO. Roll two six-sided dice, and advance.

If we land on the “Go to jail” (G2J) square, we go to the JAIL square.

If we land on a CC square, we draw a card from the “community chest” stack of cards. There are 16 cards there, only two changes the position of the player: “Advance to GO” and “Go to Jail”

If we land on a CH square, we draw a card from the Chance-stack. There are 16 cards here, only ten changes the position:

  • Advance to GO
  • Go to JAIL
  • Go to C1
  • Go to E3
  • Go to H2
  • Go to R1
  • Go to next R (railway company)
  • Go to next R
  • Go to next U (utility company)
  • Go back 3 squares.

There are three CH squares, and three CC.

A player being sent to jail, is assumed to buy himself out immediately. And if a player rolls three consecutive doubles, he does not advance the result of the third roll, but goes directly to jail.

The two stacks of cards are shuffled at the beginning.

Find the probability of landing on a given square, and return the three squares that are most likely to land on.

Numbering the squares from 00 to 39, the answer is the concatenated string representing these three squares. For 6 sided dice, the most likely squares to end on after a roll of the dice, are squares 10, 24 and 00. The answer is 102400.

I am going to need some dice:

dice <- function() sample(1:6, 2, replace=TRUE)

I’m gonna need a community chest function. And I am going to take a chance. That is, I am going to asume that the whole “shuffle the deck at the start, and place drawn card at bottom” is going to be equivalent to just picking a random card:

pool <- rep(0,14)
pool <- c(pool,1,11)
cc <- function(cs){
  res <- sample(pool,1)
  ifelse(res==0,cs,res)
}

It takes the current square, builds a vector containing 14 repetitions of 0, one instance of “1” corresponding to GO, and one of “11”, corresponding to jail. Then it returns one of the values.

And a chance function. Again, I am going to make the same assumption as with the community chest function:

ch_pool <- rep(0,6)
ch_pool <- c(ch_pool,1:10)
ch <- function(cs){
 chance <- sample(ch_pool,1) 
 if(chance==0) res <- cs
 if(chance==1) res <- 1
 if(chance==2) res <- 11
 if(chance==3) res <- 12
 if(chance==4) res <- 25
 if(chance==5) res <- 40
 if(chance==6) res <- 6
 if(chance==7&cs==8) res <- 16
 if(chance==7&cs==23) res <- 26
 if(chance==7&cs==37) res <- 6
 if(chance==8&cs==8) res <- 16
 if(chance==8&cs==23) res <- 26
 if(chance==8&cs==37) res <- 6
 if(chance==9&cs==8) res <- 13
 if(chance==9&cs==23) res <- 29
 if(chance==9&cs==37) res <- 13
 if(chance==10) res <- cs-3
 return(res)
}

This is basically the same as the community function. Just a bit longer.

I am also going to need a structure for the playing board:

hit <- numeric(40)

A third chance I’m taking. That the part about hitting three doubles in a row, and then going to jail, will also disappear if I just run this long enough:

curr_square <- 1

for(i in 1:10000000){
  curr_square <- ((curr_square + sum(dice())-1)%%40 + 1) 
  if(curr_square == 31) curr_square <- 11
  if(curr_square %in% c(2,18,34)) curr_square <- cc(curr_square)
  if(curr_square %in% c(8,23,37)) curr_square <- ch(curr_square)
  hit[curr_square] <- hit[curr_square] + 1
}

I begin at square 1. And then I am going to repeat the following 1.000.000 times:
Roll the dice, add the result to the current square. If the result is more than 40, do some simple math to continue from square 1.

If the square I’m landing on is square 31, set new square to 11 – that is the jail.

If it is 2, 18 or 34, run the cc – community chest function. That will return the square that we land on after drawing a card from that stack.

If it is 8, 23 or 37, we have landed on a chance square. Run the ch – chance function. That will return the square that we land on after drawing a card from that stack.

That should give me the frequencies with which the player have landed on each square.

Now I can rank the results:

freq <- rank(hit)

That gives me the rank. Number 40 is the square that was hit most times.

I need to translate that to the numbering of the squares that is used for the result.

squares <- c("00","01","02","03","04","05","06","07","08","09",10:39)
paste(squares[freq %in% c(40,39,38)], collapse="")
## [1] "001024"

pasting together the squares hit most, second most, and third-most times. That is fortunately the result given in the problem.

Now lets try with a new set of dice:

dice <- function() sample(1:4, 2, replace=TRUE)

And the same code:

hit <- numeric(40)
curr_square <- 1

for(i in 1:10000000){
  curr_square <- ((curr_square + sum(dice())-1)%%40 + 1) 
  if(curr_square == 31){ curr_square <- 11}
  if(curr_square %in% c(2,18,34)) curr_square <- cc(curr_square)
  if(curr_square %in% c(8,23,37)) curr_square <- ch(curr_square)
  hit[curr_square] <- hit[curr_square] + 1
}

freq <- rank(hit)

answer <- paste(squares[freq %in% c(40,39,38)], collapse="")

And we are rewarded with the nice green tickmark.

This was not as difficult as I thought. But maybe it would be wise to look at the forum for the problem, and study some of the non-brute force solutions.

Oh – another lesson learned. In the code above, I run through 10.000.000 throw with the dice. I actually wanted to throw the dice 1.000.000 times. Always count your zeroes!

Also – 100.000 thow is enough to get the right result.

Project Euler 179

Project Euler 179

14 has four divisors: 1, 2, 7 and 14.

14+1, ie. 15, also has four divisors: 1, 3, 5 and 15.

How many integers, n, where 1 < n < 107, has the same number of divisors as n + 1?

That is surprisinly simple. Start by figuring out how many divisors all the numbers have.

n <- 10000000
t <- integer(n)



for(s in 1:n){
  t[seq(s,n, by=s)] <- t[seq(s,n, by=s)] + 1
}

First I make a vector with the length 107 containing only zeroes.

Element 1 coresponds to n = 1, element 47 to n = 47.

Every single element in that vector has 1 as a divisor.

Every other element has 2 as a divisor.

Every third element has 3 as a divisor, and so on.

So for every element, in the sequence 2 to 107, in steps of 2, I add 1.

The same for every element in the sequence 3 to 107 in steps of 3.

And so on.

That takes a loooong time to do.

Now I have a vector t, where the value in t[n] is the number of divisors in n.

I now need to compare them.

If I have two identical vectors q and r. And add a 0 to the beginning of one of them, and a 0 at the end of the other. I will have two vectors, where element i+1 in the first is the same as element i in the second (or the other way around). So when I compare q[47] with r[47], I actually compare q[47] with q[48]. That is the comparison I need. When these to values are identicial, I have an iteger with the same number of divisors as the following integer.

Apoligies for the not so fluent prose. It’s pretty warm and I’m rather tired.

Anyway:

q <- c(0,t)
r <- c(t,0)

I now just need to compare them, and sum the result of that comparison:

answer <- sum(q==r)

Lesson learned

Just because a problem has a high number, doesn’t mean that it is particularly difficult.

dplyr::count()

dplyr::count()

A short note for myself.

Given a dataframe, eg mtcars:

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

These cars have 4, 6 or 8 cylinders. How many cars fall in each category?

count(mtcars, cyl)
## # A tibble: 3 x 2
##     cyl     n
##   <dbl> <int>
## 1    4.    11
## 2    6.     7
## 3    8.    14

They have 3, 4 or 5 gears.

How many cars have 8 cylinders, and 3 gears?

count(mtcars, cyl, gear)
## # A tibble: 8 x 3
##     cyl  gear     n
##   <dbl> <dbl> <int>
## 1    4.    3.     1
## 2    4.    4.     8
## 3    4.    5.     2
## 4    6.    3.     2
## 5    6.    4.     4
## 6    6.    5.     1
## 7    8.    3.    12
## 8    8.    5.     2

This is equivalent to first grouping the observations by cylinders and gears. And then counting using tally():

mtcars %>% 
  group_by(cyl,gear) %>%
  tally()
## # A tibble: 8 x 3
## # Groups:   cyl [?]
##     cyl  gear     n
##   <dbl> <dbl> <int>
## 1    4.    3.     1
## 2    4.    4.     8
## 3    4.    5.     2
## 4    6.    3.     2
## 5    6.    4.     4
## 6    6.    5.     1
## 7    8.    3.    12
## 8    8.    5.     2

If, for some obscure reason, we want to count the number of carburetors in each category, we can do that:

count(mtcars, cyl, gear, wt=carb)
## # A tibble: 8 x 3
##     cyl  gear     n
##   <dbl> <dbl> <dbl>
## 1    4.    3.    1.
## 2    4.    4.   12.
## 3    4.    5.    4.
## 4    6.    3.    2.
## 5    6.    4.   16.
## 6    6.    5.    6.
## 7    8.    3.   37.
## 8    8.    5.   12.

The cars with 8 cylinders, and 3 gears, have among them a total of 37 carburetors.

Project Euler – 58

Project Euler 58 – Spiral Primes

All right. Lets make a square, by jotting down all numbers from 1 in a spiral like this:

matrix(c(37, 36, 35, 34, 33, 32, 31,38, 17, 16, 15, 14, 13, 30,39, 18,  5,  4,  3, 12, 29,40, 19,  6,  1,  2, 11, 28,41, 20,  7,  8,  9, 10, 27,42, 21, 22, 23, 24, 25, 26,43, 44, 45, 46, 47, 48, 49),7,7,byrow=TRUE)
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## [1,]   37   36   35   34   33   32   31
## [2,]   38   17   16   15   14   13   30
## [3,]   39   18    5    4    3   12   29
## [4,]   40   19    6    1    2   11   28
## [5,]   41   20    7    8    9   10   27
## [6,]   42   21   22   23   24   25   26
## [7,]   43   44   45   46   47   48   49

We’re going anti-clockwise.

What Project Euler thinks is interesting, is that 8 out of the 13 numbers lying on the two diagonals, are prime.

That gives a primal ratio of 8/13, or about 62%

We can continue adding layers to the spiral, and make squares with increasing side length.

We are now tasked with finding the side length of a the square spiral for which the ratio of primes along the two diagonals falls below 10% (for the first time).

The task boils down to: Figure out what numbers are on the diagonals of a square with a given side length. What is the proportion of primes in that set of numbers.

The final number in the spiral of a nxn square is, nxn. That number is in the south-eastern corner of the square.

The number in the south-wester corner is n2 – (n-1). The number in the north-wester corner is n2 – 2(n-1). And the one in the north-eastern corner is n2 – 3(n-1).

The numbers on the diagonals are all corner-numbers in a square. If I want all the numbers on the diagonals in the 7×7 square above, I calculate the corner-numbers for n = 3, n = 5 and n = 7, and collect them all in a vector, along with the center piece, 1.

I’m going to define a numbers-vector containing the numbers on the diagonal, and add 1 to it.

numbers <- numeric()
numbers <- c(1, numbers)

Having that vector, I need to calculate the fraction of the elements in it, that are prime. The numbers library is handy for that:

library(numbers)

prime_fraction <- function(v){
  sum(isPrime(v))/length(v)
}

The function takes a vector as input, runs isPrime() on it. That function returns a logical vector, with TRUE where the element is prime. The sum of that vector is the number of primes in the input vector. Dividing that with the number of elements in the vector, length(), returns the fraction of primes in the input vector.

Now all that remains is adding diagonal numbers to the vector, check the fraction of primes, and continue until that fraction is below 0.1. I’m going to run the first iteration by hand. The fraction of primes in the initial numbers vector is 0, and that is of course less than 0.1, but not the answer I’m looking for.

i <- 3
numbers <- c(numbers, i*i, i*i-(i-1), i*i-2*(i-1), (i*i-3*(i-1)))
i <- i + 2
while(prime_fraction(numbers)>0.1){
  numbers <- c(numbers, i*i, i*i-(i-1), i*i-2*(i-1), i*i-3*(i-1))
  i <- i + 2
}
answer <- i - 2

OK. That gives me the correct answer. One problem: It takes forever. Well, not forever. But too long. The problem is the isPrime() function. A small test tells me that checking the primality of a vector containing 5385 elements, takes about 70 milliseconds. That is pretty quick. But when I do that several thousand times, it adds up.
Also I’m destined to spend eternity in the first circle of R-hell for growing the vector in perhaps the most inefficient way.

Is there a simpler way?

What I’m really looking for is two numbers. The number of primes in the diagonals. And the total number of numbers in the diagonals.

Each diagonal in an nxn square contains n numbers. The center is counted twice, so the total number is given by 2*n – 1.

What about the number of primes? If I instead of adding the four corners to the numbers vector, and testing that for primes, just test the four numbers, and add the number of primes among them to a counter. That should be quicker, as I’m only checking each number for primality once. What should make it even faster, is that I only have to check three numbers. The south-eastern number is always a square, so definitely not prime.

Lets try again. As before, I’m priming the pump, and beginning with the 5×5 square:

i <- 5
primes <- 3

while(primes/(2*i - 1) > 0.1){
  primes <- primes +sum(isPrime(c(i*i-(i-1), i*i-2*(i-1), i*i-3*(i-1))))
  i <- i + 2
}
answer <- i

Much, much faster!

Lessons learned

There usually is a faster way to do things. Also, and that was a surprise. Getting the length of a 5385-element long vector is not that slower than evaluating 2*2693 – 1.

test <- runif(5385)
leng <- function(){
  length(test)
}

i <- 2693
mult <- function(){
  2*i - 1
}

library(microbenchmark)
mbm <- microbenchmark(mult, leng, times = 10000)
## Warning in microbenchmark(mult, leng, times = 10000): Could not measure a
## positive execution time for 71 evaluations.
mbm
## Unit: nanoseconds
##  expr min lq     mean median uq    max neval
##  mult   0  0 193.7454      1  1 373167 10000
##  leng   0  0 197.7259      0  1 767366 10000

Depending on what else is happening on the computer, it might even be faster to take the length than to do simple arithmetic.

library(ggplot2)
autoplot(mbm)
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 9946 rows containing non-finite values (stat_ydensity).

plot of chunk unnamed-chunk-7

Not that it makes that much of a difference. But I find it interesting.

Project Euler 35

Project Euler 35 – Circular primes

197 is an example of a circular prime. It is itself a prime. And if we rotate the digits, 971 and 719 are also primes.

Given that 2, 3, 5 and 7 are by definition circular primes – not that we rotate that much, how many circular primes are there below one million?

Lets begin by generating a vector containing all primes below 1000000:

library(numbers)
prim <- Primes(1000000)
length(prim)
## [1] 78498

That is 78498 numbers. Is there any way to reduce that number?

There is. Any prime containg an even digit, will rotate to a number ending in an even digit. That number cannot be a prime.

The same applies to primes containing the digit 5.

Of course there are two special cases, 2 and 5.

Lets reduce the number of primes we need to consider. First make a function checking if there are any “illegal” digits in the number:

poss <- function(x){
  !any(unlist(strsplit(as.character(x),"")) %in% c(0,2,4,5,6,8))  
}

The function splits the number in individual digits, checks if they are in the list of forbidden digits, and return fals if any of the digits of the number is in the list of forbidden digits.

Now paring down the list of primes:

prim <- prim[sapply(prim,poss)]

That leaves us with 1111 possible primes.

Now we need to figure out how to rotate the numbers:

rotate <- function(x){
  res <- integer()
  while(length(res)<as.integer(log10(x)+1)){
    x <- x%/%10 + x%%10*(10**(as.integer(log10(x)+1)-1))
    res <- c(res,x)
  }
  all(res %in% prim)
}

Lets take a four digit number, 1234, as an example.

I begin by initializing a variable res to keep the rotated numbers.

as.integer(log10(x)+1) gives me the length of the number. Here, 4.

Rotating a four digit number will yield four numbers. As long as there are less than four numbers in the result-vector res, I need to rotate at least once more.

I want the first rotation to give the number 4123.

Performing an integer division by 10 gives me the first three digits of the number, here 123. Modulo 10 gives me the last, here 4.

I get the rotation by adding the first three digits of the number to the last number multiplied by 10 to the power of the number of digits in the original number minus 1.

103 is 1000, multiplying by 4 is 4000, adding that to 123 returns 4123. Repeat until there are four numbers in the result-vector.
Then check if all the elements of the result-vector are in the list of primes. If they all are, return TRUE, otherwise return FALSE.

Now it is simply a question of getting rid of all the remaining candidate primes that does not rotate to primes:

prim <- prim[sapply(prim,rotate)]

The lenght is the number of primes that are circular as defined above. I just need to add the two special cases 2 and 5, and I have the answer:

answer <- length(prim) + 2

Project Euler 46

Project Euler 46.

Christian Goldbach apparently proposed, that every odd composite number, can be written as the sum of a prime, and twice a square. As in:

9 = 7 + 2 * 12

I had to look up what a composite number is. It is basically a number that is not a prime.

The conjecture was false. And we should now find the smallest, odd, composite number, that can not be written as the sum of a prime and twice a square.

I was not able to find a shortcut, and apparently no one else has.

So I brute forced it.

I started with loading the trusted numbers, and purrr libraries:

library(numbers)
library(purrr)

Then I guessed, correctly as it turned out, that the number I’m looking for, is less than 100000.

That means, that the prime that can take part in the sum, must also be less than 100000. So – generating two vectors, one of primes below 100000, and candidate odd composite numbers below 100000:

prim <- Primes(100000)
test <- seq(3,100000,by=2)

1 is not included. I’m not sure if it is ruled out for not being a composite. But it is trival that it cannot be a sum of a prime and any thing else (positive). So I’m starting at 3.

Actually test does not contain only composite numbers. Getting rid of the primes is the next step:

test <- setdiff(test, prim)

setdiff() is rather useful. It takes two sets – vectors, and removes all the elements in the last from the first.

Next up is getting candidate squares.
As I’m squaring them, I don’t need to look at squares of numbers larger than 707. Since 708 squared and doubled is larger than 100000.

squares <- 1:707
squares <- 2*squares**2

Next I’m writing a function to test if any x can be expressed as the sum of twice a square and a prime.

x – squares will give me a vector with the result of substracting each of the elements in squares from x. I do not need to consider values in squares that are larger than x. Subtracting those from x would give a negative number – and that is not a prime.

To do that:

x – squares[squares<x]

Next, are the elements in that vector prime? The easy way is to test if they are in the list of primes. I use the %in% operator for that:

(x – squares[squares<x]) %in% prim

That will return a logical vector, with a value TRUE or FALSE for each element in the vector resulting from the subtraction, depending on whether it is in the list of primes. If any element is TRUE, we have a result.

The any() function does that. If any element in the vector is true, any() will return true.

Putting all of that into a function:

testing <- function(x){
  any((x - squares[squares<x]) %in% prim)
}

I now have a function that returns true, if a number x can be written as the sum of a prime and twice a square.

Finally I filter that through the discard() function – basically getting rid of all the numbers in my test-vector, that can be written as this sum. And finally passing those values to the min() function. And I have my answer:

answer <- test %>%
  discard(function(x) testing(x)) %>%
  min()

Project Euler 38

Multiplying 192 with 1, 2 and 3:

192 x 1 = 192

192 x 2 = 384

192 x 3 = 576

Concatenating the products, we get 192384576. That number is 1-9 pandigital (as in all digits 1 to 9 is represented in the number exactly one time).

The same can be achieved by multiplying 9 with 1, 2, 3, 4 and 5. That gives us the pandigital concatenation 918273645.

We should now find the largest 1-9 pandigital number, formed as a concatenation in the described way, by multiplying an integer with 1, 2,…n, where n > 1.

Testing with 918273645. Nope, that is not the solution. That would be too easy 🙂

OK. Lets call the integer we’re multiplying, m.

And as we’re using R, lets call the numbers we are multiplying it with, for n, as in the vector 1:n.

The concatenated product will have to have 9 as its first digit. The first digits comes from multiplying m with 1. Therefore, the first digit in m must be 9.

We need to end up with a total of 9 digits in the concatenated product. If m has two digits, multiplying with 1 will give us two digits in the result. Multiplying with 2 will give us 3 digits in the result (any two digit number larger than 50 will, multiplied by 2, give a three digit result. And we know that m should begin with 9).

Multiplying with 3 will give another 3 digits in the result, bringing us to at total of 8. Multiplying with 4 brings the total number of digits in the concatenated result to 11. There can be only 9.

Chosing a three digit m gives similar problems.

But a four digit m, with 9 as the first digit, gives us four digits multiplying by 1 and five when we multiply by 2. Giving us a total of 9 digits in the result.

What we know so far:

  • The integer we have to multiply has four digits, and begins with 9.
  • We will have to multiply it by 1 and 2. Or 1:2.

That should be possible to brute-force.

Multiply all four digit integers where the first digit is 9 by 1:2.

Collapse the results to a single string.

Keep all the mumbers that are pandigital.

Take the maximum of it.

I don’t need to check all the numbers. I know that the first four digits must be larger than 9182, since this is the first four digits in the example. The highest number is not 9999, it is 9876.

t <- 9183:9876

Using the purrr library:

library(purrr)
answer <- t %>%
  keep(function(x) length(unique(setdiff(unlist(strsplit(paste(x*1:2,collapse=""),split="")),0)))==9)

Ok, lets stop there. What am I doing?

I pass all the possible four digit integers to keep(). keep() applies a function to each of the integers, and keeps the ones where the function evaluates to true.

The function in question is an anonymous function of x:

length(unique(setdiff(unlist(strsplit(paste(x*1:2,collapse=“”),split=“”)),0)))==9

x, the integer from t, is multiplied wiht 1:2. That results in a vector where the first element is x multiplyed by 1, and the second x multiplied by 2.

That is then put into the paste() function, with the parameter collapse=“”. The result is that the vector from just before, is concatenated to just one string.

That string is splitted by strsplit into individual characters. The result is a list. Which is bloody annoying to work with.
So I put everything into unlist(), that returns a vector.

Setdiff(x,0) removes any zeroes from that vector.

Having removed the zeroes, unique() returns all the unique values in the vector. If there is 9, the number is 1 to 9 pandigital.

And if it is, the integer is kept by keep.

It doesn’t really give me the result. It returns the four digit integers beginning with 9, that, multiplied by 1:2 and concatenated gives a pandigital result.

But the largest of them is the integer I should multiply by 1:2. So lets expand the code a bit:

answer <- t %>%
  keep(function(x) length(unique(setdiff(unlist(strsplit(paste(x*1:2,collapse=""),split="")),0)))==9) %>%
  max() %>%
  (function(x) x*1:2) %>%
  (function(x) paste(x, collapse="")) %>%
  as.numeric()

max() gives me the largest of the four digit integers. The next function mulitplies it by 1:2. After that, the result is collapsed. And finally converted to numeric by as.numeric(). Not really necessary, but I’m doing it anyway.

Lessons learned

When using pipes to do stuff, anonymous functions are useful. But should be put into parantheses.

Project Euler 32

Project Euler 32

More pandigitality. Or whatever it’s called.

39 x 186 = 7254

And that is interesting, because 39 186 7254 is pandigital. As in all the digits 1 to 9 is represented exactly once.

The task is: Find all the products, where multiplicand (39), multiplier (186) and product (7254) is 1 through 9 pandigital.

First question I need to answer is:

How many digits can there be in the multiplicand and multiplier?

If there is 1 digit in the multiplicand, there has to be 4 digits in the multiplier, and 4 in the product.

If there is only 3 in the multiplier, we would have to have 5 digits in the product, which is impossible when we multiply a 1 digit number with a 3 digit number.

And if there was 5 digits in the multiplier, we would get 5 digits in the product, bringing the total number of digits in the identity to 11.

Therefore: If there is 1 digit in the multiplicand, there has to be 4 digits in the multiplier. We will get some products with 5 digits, but we’ll filter them out.

What if there are 2 digits in the multiplicand?

Then there has to be 3 digits in the multiplier, and 4 in the the product.

If there are 2 digits in the multiplier, the highest number of digits we can get in the product, is 4. And then we would only have 8 digits in total.

If on the other hand we have 2 digits in the multiplicand, and 4 digits in the multiplier, we would get 5 digits in the product, bringing the total number of digits to 11.

So. There must be 1 or 2 digits in the multiplicand. And 4 or 3 digits respectively in the multiplier.

And there must never be more than 4 digits in the product.

Lets begin by making all possible combinations of multiplicands and multipliers.

multiplicand <- 1:9
multiplier <- 1023:9876
res1 <- expand.grid(multiplicand,multiplier)

multiplicand <- 10:98
multiplier <- 102:987
res2 <- expand.grid(multiplicand,multiplier)

testcases <- rbind(res1,res2)

All possibilities for 1 digit in the multiplicand, combined with all possibilities for a 4-digit multiplier.

Combined (rbind) with all possibilities for 2-digit multiplicands, and all possibilites for 3-digit multipliers.

Next I’ll need a function that returns TRUE if its input it 1 throug 9 pandigital:

isPan <-  function(x){ (length(unique(setdiff(unlist(strsplit(x,split="")),0)))==9)&&(nchar(x)==9) }
isPan <- Vectorize(isPan)

I also need it to be vectorized. This is not at perfect way to do it. But I can’t figure out another way.

And now we’ll use the lovely pipes to pass all the test cases through a series of functions:

library(dplyr)

answer <- testcases %>%
  mutate(product= Var1*Var2) %>%
  filter(product <= 9999) %>%
  mutate(concat = paste(Var1, Var2, product, sep="")) %>%
  filter(isPan(concat)) %>%
  select(product) %>%
  unique() %>%
  sum()

First I make a new column with the mutate() function, called product, that is the multiplication of Var1 and Var2 (multiplicand and multiplier respectively)

Next I filter out all the product that has more than 4 digits.

Next I make another new column, where I paste together the multiplicand, the multiplier and the product.

I filter, using my isPan() function. If the concatenation is pandigital, I keep it. Otherwise I throw the row away.

Now I only have rows where the concatenation of multiplicand, multiplier and product is pandigital.

But I’m actually only interested in the product, so I select the column “product”

Some of the products are repeated, unique takes care of that.

And then I just need to sum them.

Lessons learned

Vectorized functions are essential for this methodology. But can be difficult to write.

Project Euler 44

Project Euler 44 – Pentagon numbers

A pentagonal number is generated by the formula:

P~n~ = n(3n-1)/2

I must now find the pair of pentagonal numbers P~j~ and P~k~ where the sum P~j~ + P~k~ is also pentagonal.

And where P~j~ – P~k~ is pentagonal.

And where D = |P~j~ – P~k~| is minimised.

The answer is D.

First I’ll make a function generating pentagonal numbers:

pent <- function(x){
  x*(3*x - 1)/2
}

Nice and vectorized.

I can check it by calculating the first ten pentagonal numbers:

t <- 1:10
pent(t)
##  [1]   1   5  12  22  35  51  70  92 117 145

And see that they are the same as the first ten pentagonal numbers listed in the problemt text.

Next, lets make a nice long list of pentagonal numbers:

t <- 1:10000
t <- pent(t)

Next I’ll make all the pair-wise combinations of these numbers:

s <- expand.grid(t,t)

And then I’ll filter those combinations:

library(dplyr)

answer<- s %>%
  filter((Var1 + Var2) %in% t) %>%
  filter(abs(Var1 - Var2) %in% t) %>%
  transmute(D = Var1 - Var2) %>%
  abs() %>%
  min()

First – only retain the combinations where the sum of the two pentagonal numbers is itself in the list of pentagonal numbers.

Then, only retain the combinations where the absolute difference is also in the list of pentagonal numbers.

Then calculate a new row D, as the difference.

Pass that to abs() to get the absolute difference.

And finally to min() to get the minimum.

The final two steps are actually unnecessary, since I only find one pair of pentagonal numbers that meets the requirement.

Lessons learned

Hm. Not really. Why is the first I find the solution? Noone else seems to know.

Stacked barcharts with ggplot2

Barcharts to percentages

Maybe not the most correct headline.

Anyway. Assume we have five cases, with two values for a given treatment, each labelled “AVO” and “INT”. How to get a barchart, where each cases is one bar, and the bars are coloured based on the percentages of the two values?

Lets make some sample data:

set.seed(47)
cases <- 1:5
cases <- c(cases, cases)
treat1 <- rep("AVO", 5)
treat2 <- rep("INT", 5)
treat <- c(treat1, treat2)
val <- runif(10)

df <- data.frame(cases=cases, treatment = treat, value = val, stringsAsFactors = FALSE)

How does that look?

df
##    cases treatment     value
## 1      1       AVO 0.9769620
## 2      2       AVO 0.3739160
## 3      3       AVO 0.7615020
## 4      4       AVO 0.8224916
## 5      5       AVO 0.5735444
## 6      1       INT 0.6914124
## 7      2       INT 0.3890619
## 8      3       INT 0.4689460
## 9      4       INT 0.5433097
## 10     5       INT 0.9248920

OK, nice and tidy data.

What I want is a bar for case 1, filled with percentages. One color with 0.976/(0.976+0.691) and another color for 0.691/(0.976+0.691).

library(ggplot2)
ggplot(df)+
  geom_col(aes(y=value, x=cases, fill=treatment), position = "fill")

plot of chunk unnamed-chunk-3

We’re using geom_col. That is a shorthand for geom_bar with stat=“identity”, and is the shortcut to expressing values in the barchart.

The y-value is the value from df, the x-value is the cases. And we’re colouring the bars based on the value in treatment. The position=“fill” stack the bars, and normalize the heights. If we wanted to just stack the elements, we would use position=“stack”:

ggplot(df)+
  geom_col(aes(y=value, x=cases, fill=treatment), position = "stack")

plot of chunk unnamed-chunk-4

This is a great, like really great, cheatsheet for ggplot:

https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf

Udgivet i R